JP2019528643A - Method, apparatus and system for detecting scene change frames

Info

Publication number: JP2019528643A
Application number: JP2019510927A
Authority: JP
Inventors: ▲ジエ▼ 熊; 友▲慶▼ ▲楊▼; 一宏黄
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2016-08-23
Filing date: 2017-08-22
Publication date: 2019-10-10
Also published as: CN107770538A; KR20190039265A; EP3499460A4; US20190260999A1; WO2018036481A1; CN107770538B; US10917643B2; EP3499460A1

Abstract

The present invention discloses a method, an apparatus, and a system for detecting a scene change frame in a video. To detect scene change frames among P frames, whether the largest P frame (Pmax) among all P frames in a GOP of the video is a scene change frame is determined based on the relative relationship between the size Pkmax of Pmax and the median or mean of the sizes of a plurality of I frames between Pmax and the closest scene change frame preceding Pmax, or on the relative relationship between Pkmax and the median or mean of the sizes of a plurality of P frames in the GOP. This effectively reduces missed detections of scene change frames.

Description

The present invention relates to the field of video technologies, and in particular, to a method, an apparatus, and a system for detecting a scene change frame.

With the development of communications technologies, video services such as IPTV (Internet Protocol Television) and OTT services are in wide commercial use. To guarantee the quality of a video service, video quality must be evaluated so that corrective measures can be taken in time and the service keeps operating normally. How to evaluate video quality accurately is therefore an important problem that urgently needs to be solved.

A video segment consists of a sequence of consecutive video frames and generally includes more than one scene. For example, a video segment may include four scenes, where scene 1 and scene 3 are shots of a soccer field and scene 2 and scene 4 are shots of the spectator stands.

When video quality is evaluated, the positions at which the scene changes, that is, the positions of scene change frames, must first be detected, and video quality is then evaluated scene by scene. For example, the video coding loss that arises during encoding depends not only on the video coding type, frame rate, resolution, and bit rate but also on scene complexity, so evaluating coding loss requires first detecting where the scene changes. Scene change detection therefore needs to be accurate when video quality is evaluated.

When the video frames of a video are encoded, they are encoded as frames of different types, for example, I frames, P frames, and B frames. An I frame is an intra-predicted frame: only data within the frame itself is referenced during encoding. A P frame is a predicted frame, in other words a unidirectional difference frame, which records the difference between this frame and the preceding I frame (or P frame). A B frame is a bidirectionally interpolated predicted frame, in other words a bidirectional difference frame, which records the differences between this frame and both the preceding frame and the following frame.

A method for detecting scene change frames is provided in the IPTV monitoring solution of the standard ITU-T P.1201.2. However, the prior art detects scene change frames only among I frames, whereas in practice many scene change frames are P frames. As a result, the prior art misses scene change frames during detection.

Embodiments of the present invention provide a method and an apparatus for detecting a scene change frame, so as to avoid the missed detection of scene change frames that occurs in the prior art.

According to a first aspect, a method for detecting a scene change frame is provided. A video includes N groups of pictures (GOPs), where N is an integer greater than or equal to 2, and the method includes:
determining the largest P frame P_max among all P frames in the Kth GOP, where the size of P_max is Pkmax, K is a variable ranging from M to N, and 1 ≤ M ≤ N; and
determining that P_max is a scene change frame when the relative value between Pkmax and the I-frame median or mean is greater than or equal to a first threshold and the relative value between Pkmax and the P-frame median or mean is greater than or equal to a second threshold, where
the I-frame median or mean is the median or mean of the sizes of a plurality of I frames between P_max and the closest scene change frame preceding P_max,
the P-frame median or mean is the median or mean of the sizes of a plurality of P frames in the Kth GOP, the first threshold is greater than 0 and less than 1, and the second threshold is greater than 1.
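
The two-condition test of the first aspect can be sketched in Python as follows. This is a minimal illustration, not the patented formula: the text here does not reproduce how the "relative value" is computed, so the sketch assumes a plain size ratio, uses the median rather than the mean, and excludes P_max itself from the P-frame reference. The function name and argument names are hypothetical, and frame sizes are assumed to be positive.

```python
from statistics import median


def detect_scene_change_p_frame(p_sizes, i_sizes, first_threshold, second_threshold):
    """Check the largest P frame of one GOP against both thresholds.

    p_sizes: sizes (e.g. in bytes) of the P frames of the Kth GOP, in order.
    i_sizes: sizes of the I frames between P_max and the closest preceding
             scene change frame.
    Returns (index of P_max in p_sizes, True if judged a scene change frame).
    """
    if not p_sizes or not i_sizes:
        return None, False
    # Step 1: locate P_max, the largest P frame in the GOP.
    k = max(range(len(p_sizes)), key=lambda idx: p_sizes[idx])
    p_max_size = p_sizes[k]
    # Reference P size: median of the other P frames (an assumed choice).
    others = p_sizes[:k] + p_sizes[k + 1:]
    if not others:
        return k, False
    # Assumption: each "relative value" is a plain ratio of sizes.
    r_i = p_max_size / median(i_sizes)
    r_p = p_max_size / median(others)
    # Step 2: both conditions must hold for P_max to be a scene change frame.
    return k, (r_i >= first_threshold and r_p >= second_threshold)
```

With P-frame sizes [10, 12, 90, 11, 9], I-frame sizes [100, 120], a first threshold of 0.5, and a second threshold of 3.0, the third P frame is flagged: it is close to I-frame size and far larger than its neighbours.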

With the method provided in the first aspect of the present invention, scene change frames among P frames can be detected when scene change frames in a video are detected, which effectively reduces missed detections of scene change frames.

In a first possible implementation of the first aspect, the relative value between Pkmax and the I-frame median or mean may be calculated according to the formula […].

[…] can thereby be reflected simply and effectively in […].

In one implementation, the relative value between Pkmax and the I-frame median or mean is calculated according to the formula […], and correspondingly, the first threshold may be calculated according to the formula […].

In another implementation, the relative value between Pkmax and the I-frame median or mean is calculated according to the formula […], and correspondingly, the first threshold may be calculated according to the formula […].

I_threshold is the first threshold, I_median is the median or mean of the sizes of all I frames in the video, and P_median is the median or mean of the sizes of all P frames in the video.

Because I_median is the median or mean of the sizes of all I frames in the video and P_median is the median or mean of the sizes of all P frames in the video, an effective threshold can be calculated accurately using the formula […], so that scene change frames among P frames can be detected accurately.

With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, the relative value between Pkmax and the P-frame median or mean of the Kth GOP may be calculated according to the formula […].
[…] can thereby be reflected simply and effectively in […].

The median or mean of the sizes of the P frames in the Kth GOP may in particular be calculated according to the following formula:
F(P_-m, …, P_-1, P_1, …, P_n),
where P_-m, …, and P_-1 denote the P frames preceding P_max in the Kth GOP, P_1, …, and P_n denote the P frames following P_max in the Kth GOP, and F calculates the median or mean of the sizes of P_-m, …, P_-1 and P_1, …, P_n;
m = min(num_before_P_frames, max_num),
n = min(num_after_P_frames, max_num), where
num_before_P_frames is the number of P frames preceding P_max in the Kth GOP, num_after_P_frames is the number of P frames following P_max in the Kth GOP, and max_num is a preset number of frames to be taken into account.
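
The window selection described above can be sketched in Python as follows. The function name, the argument names, and the `use_mean` switch are hypothetical; the sketch only follows the stated definitions of m, n, and F (median or mean over up to max_num P frames on each side of P_max, excluding P_max itself).

```python
from statistics import mean, median


def local_p_reference(p_sizes, max_index, max_num, use_mean=False):
    """Median (or mean) of the P-frame sizes surrounding P_max in one GOP.

    p_sizes:   sizes of the P frames of the Kth GOP, in order.
    max_index: position of P_max within p_sizes.
    max_num:   preset number of frames to consider on each side.
    """
    num_before_p_frames = max_index
    num_after_p_frames = len(p_sizes) - max_index - 1
    m = min(num_before_p_frames, max_num)
    n = min(num_after_p_frames, max_num)
    # P_-m .. P_-1 followed by P_1 .. P_n; P_max itself is left out.
    window = p_sizes[max_index - m:max_index] + p_sizes[max_index + 1:max_index + 1 + n]
    if not window:
        return None  # P_max is the only P frame in the GOP
    return mean(window) if use_mean else median(window)
```

Capping the window at max_num frames per side keeps the reference local to P_max, so a long GOP whose content drifts over time does not distort the comparison.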

Because the P-frame median or mean is calculated using the above formula, which takes into account a subset of the video frames that are closest to the first video frame and lie in the same GOP as the first video frame, the accuracy of detecting scene change frames among P frames is further improved.

According to a second aspect, a method for performing video quality assessment is provided. A video includes N groups of pictures (GOPs), where N is an integer greater than or equal to 2, and the method includes:
determining the largest P frame P_max among all P frames in the Kth GOP, where the size of P_max is Pkmax, K is a variable ranging from M to N, and 1 ≤ M ≤ N; and
determining that P_max is a scene change frame when either of the following is determined:
the relative value between Pkmax and the I-frame median or mean is greater than or equal to a first threshold, the relative value between Pkmax and the P-frame median or mean is greater than or equal to a second threshold, and no B frame exists in the Kth GOP; or
the relative value between Pkmax and the I-frame median or mean is greater than or equal to the first threshold, the relative value between Pkmax and the P-frame median or mean is greater than or equal to the second threshold, a B frame exists in the Kth GOP, and the relative value between Pkmax and the B-frame median or mean is greater than or equal to a third threshold, where
the I-frame median or mean is the median or mean of the sizes of a plurality of I frames between P_max and the closest scene change frame preceding P_max,
the P-frame median or mean is the median or mean of the sizes of a plurality of P frames in the Kth GOP,
the B-frame median or mean is the median or mean of the sizes of all B frames between P_max and the closest scene change frame preceding P_max, the first threshold is greater than 0 and less than 1, the second threshold is greater than 1, and the third threshold is greater than 1.
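
The branching of the second aspect (with and without B frames) can be sketched as plain decision logic. The relative values are passed in already computed, because the formulas for them are not reproduced in this text; the function and parameter names are hypothetical.

```python
def is_scene_change_with_b(r_i, r_p, r_b, has_b_frames,
                           first_threshold, second_threshold, third_threshold):
    """Decide whether P_max is a scene change frame under the second aspect.

    r_i, r_p, r_b: relative values of the size of P_max against the I-, P-
                   and B-frame references (r_b may be None without B frames).
    has_b_frames:  True if the Kth GOP contains at least one B frame.
    """
    # The I-frame and P-frame conditions are required in both branches.
    if r_i < first_threshold or r_p < second_threshold:
        return False
    # Without B frames, the two conditions above are sufficient.
    if not has_b_frames:
        return True
    # With B frames, the third condition must hold as well.
    return r_b is not None and r_b >= third_threshold
```

Keeping the decision separate from the computation of the relative values makes it easy to swap in whichever formula an implementation actually uses.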

With the method provided in the second aspect of the present invention, scene change frames among P frames can be detected when scene change frames in a video are detected, which effectively reduces missed detections of scene change frames. In addition, because the detection takes into account not only the sizes of I frames and P frames but also the sizes of B frames, the accuracy of detecting scene change frames among P frames is further improved.

In a first possible implementation of the second aspect, the relative value between Pkmax and the I-frame median or mean may be calculated according to the formula […].

In one implementation, the relative value between Pkmax and the I-frame median or mean may be calculated according to the formula […], and correspondingly, the first threshold may be calculated according to the formula […].

In another implementation, the relative value between Pkmax and the I-frame median or mean may be calculated according to the formula […], and correspondingly, the first threshold may be calculated according to the formula […].

With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, the relative value between Pkmax and the P-frame median or mean of the Kth GOP may be calculated according to the formula […].
[…] can thereby be reflected simply and effectively in […].

With reference to the second aspect or the first or second possible implementation of the second aspect, in a third possible implementation, the relative value between Pkmax and the B-frame median or mean may be calculated according to the formula […].

In one implementation, the relative value between Pkmax and the B-frame median or mean is calculated according to the formula […], and correspondingly, the third threshold may be calculated according to the formula […].

In another implementation, the relative value between Pkmax and the B-frame median or mean is calculated according to the formula […], and correspondingly, the third threshold may be calculated according to the formula […].

B_threshold is the third threshold, P_median is the median or mean of the sizes of all P frames in the video, and B_median is the median or mean of the sizes of all B frames in the video.

Because P_median is the median or mean of the sizes of all P frames in the video and B_median is the median or mean of the sizes of all B frames in the video, an effective threshold can be calculated accurately using the formula […], so that scene change frames among P frames can be detected accurately.

With reference to the third possible implementation of the second aspect, in a fourth possible implementation, after P_max in the Kth GOP is determined to be a scene change frame, the median or mean of the sizes of the P frames in the video other than those determined to be scene change frames is used as a new P_median, and a new B_threshold is calculated according to the formula […]. The new B_threshold is used to determine whether P_max in the next GOP is a scene change frame.
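
The threshold update can be sketched in Python as follows. Because the formula for B_threshold is not reproduced in this text, the sketch takes it as a caller-supplied function (`threshold_formula`, a hypothetical parameter) and only shows the part the text does specify: recomputing P_median without the P frames already judged to be scene change frames. The median is used here; the mean would be an equally valid reading.

```python
from statistics import median


def updated_third_threshold(p_sizes, scene_change_flags, b_median, threshold_formula):
    """Recompute B_threshold after a P frame has been judged a scene change.

    p_sizes:            sizes of all P frames seen so far in the video.
    scene_change_flags: parallel list, True where the P frame was judged
                        to be a scene change frame.
    b_median:           median (or mean) of the sizes of all B frames.
    threshold_formula:  callable (p_median, b_median) -> threshold; supplied
                        by the caller, since the formula is not given here.
    """
    # Drop the P frames already identified as scene change frames.
    kept = [size for size, flagged in zip(p_sizes, scene_change_flags) if not flagged]
    if not kept:
        return None  # no ordinary P frames left to form a reference
    new_p_median = median(kept)
    return threshold_formula(new_p_median, b_median)
```

For example, passing `lambda p, b: p / b` as a stand-in formula with P-frame sizes [10, 90, 12, 14], the second frame flagged, and a B_median of 4 gives a new threshold of 3.0; the flagged frame of size 90 no longer inflates the reference.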

In the fourth possible implementation of the second aspect, the third threshold B_threshold can be updated in real time, so the influence of P frames already determined to be scene change frames is removed promptly, which further improves the accuracy of detecting scene change frames among P frames.

According to a third aspect, a detection apparatus for detecting a scene change frame in a video is provided. The video includes N GOPs, where N is an integer greater than or equal to 2, and the detection apparatus includes a first determining unit and a second determining unit.

The first determining unit is configured to determine the largest P frame P_max among all P frames in the Kth GOP, where the size of P_max is Pkmax, K is a variable ranging from M to N, and 1 ≤ M ≤ N.
The second determining unit is configured to determine that P_max is a scene change frame when the relative value between Pkmax and the I-frame median or mean is greater than or equal to a first threshold and the relative value between Pkmax and the P-frame median or mean is greater than or equal to a second threshold, where
the I-frame median or mean is the median or mean of the sizes of a plurality of I frames between P_max and the closest scene change frame preceding P_max,
the P-frame median or mean is the median or mean of the sizes of a plurality of P frames in the Kth GOP, the first threshold is greater than 0 and less than 1, and the second threshold is greater than 1.

With the detection apparatus provided in the third aspect of the present invention, scene change frames among P frames can be detected when scene change frames in a video are detected, which effectively reduces missed detections of scene change frames.

In a first possible implementation of the third aspect, the second determining unit may in particular calculate the relative value between Pkmax and the I-frame median or mean according to the formula […].

In one implementation, the second determining unit calculates the relative value between Pkmax and the I-frame median or mean according to the formula […] and, correspondingly, may calculate the first threshold according to the formula […].

In another implementation, the second determining unit calculates the relative value between Pkmax and the I-frame median or mean according to the formula […] and, correspondingly, may calculate the first threshold according to the formula […].

With reference to the third aspect or the first possible implementation of the third aspect, in a second possible implementation, the second determining unit may in particular calculate the relative value between Pkmax and the P-frame median or mean of the Kth GOP according to the formula […].
[…] can thereby be reflected simply and effectively in […].

The second determining unit may in particular calculate the median or mean of the sizes of the P frames in the Kth GOP according to the following formula:
F(P_-m, …, P_-1, P_1, …, P_n),
where P_-m, …, and P_-1 denote the P frames preceding P_max in the Kth GOP, P_1, …, and P_n denote the P frames following P_max in the Kth GOP, and F calculates the median or mean of the sizes of P_-m, …, P_-1 and P_1, …, P_n;
m = min(num_before_P_frames, max_num),
n = min(num_after_P_frames, max_num), where
num_before_P_frames is the number of P frames preceding P_max in the Kth GOP, num_after_P_frames is the number of P frames following P_max in the Kth GOP, and max_num is a preset number of frames to be taken into account.

According to a fourth aspect, a detection apparatus for detecting a scene change frame in a video is provided. The video includes N groups of pictures (GOPs), where N is an integer greater than or equal to 2, and the detection apparatus includes a first determining unit and a second determining unit.

The first determining unit is configured to determine the largest P frame P_max among all P frames in the Kth GOP, where the size of P_max is Pkmax, K is a variable ranging from M to N, and 1 ≤ M ≤ N.
The second determining unit is configured to determine that P_max is a scene change frame when either of the following is determined:
the relative value between Pkmax and the I-frame median or mean is greater than or equal to a first threshold, the relative value between Pkmax and the P-frame median or mean is greater than or equal to a second threshold, and no B frame exists in the Kth GOP; or
the relative value between Pkmax and the I-frame median or mean is greater than or equal to the first threshold, the relative value between Pkmax and the P-frame median or mean is greater than or equal to the second threshold, a B frame exists in the Kth GOP, and the relative value between Pkmax and the B-frame median or mean is greater than or equal to a third threshold, where
the I-frame median or mean is the median or mean of the sizes of a plurality of I frames between P_max and the closest scene change frame preceding P_max,
the P-frame median or mean is the median or mean of the sizes of a plurality of P frames in the Kth GOP,
the B-frame median or mean is the median or mean of the sizes of all B frames between P_max and the closest scene change frame preceding P_max, the first threshold is greater than 0 and less than 1, the second threshold is greater than 1, and the third threshold is greater than 1.

With the detection apparatus provided in the fourth aspect of the present invention, scene change frames among P frames can be detected when scene change frames in a video are detected, which effectively reduces missed detections of scene change frames. In addition, because the detection takes into account not only the sizes of I frames and P frames but also the sizes of B frames, the accuracy of detecting scene change frames among P frames is further improved.

In a first possible implementation of the fourth aspect, the second determining unit may in particular calculate the relative value between Pkmax and the I-frame median or mean according to the formula […].

With reference to the fourth aspect or the first possible implementation of the fourth aspect, in a second possible implementation, the second determining unit may in particular calculate the relative value between Pkmax and the P-frame median or mean of the Kth GOP according to the formula […].
[…] can thereby be reflected simply and effectively in […].

With reference to the fourth aspect or the first or second possible implementation of the fourth aspect, in a third possible implementation, the second determining unit may in particular calculate the relative value between Pkmax and the B-frame median or mean according to the formula […].

I_threshold is the first threshold, I_median is the median or mean of the sizes of all I frames in the video, and B_median is the median or mean of the sizes of all B frames in the video.

Because I_median is the median or mean of the sizes of all I frames in the video and B_median is the median or mean of the sizes of all B frames in the video, an effective threshold can be calculated accurately using the formula […], so that scene change frames among P frames can be detected accurately.

With reference to the third possible implementation of the fourth aspect, in a fourth possible implementation, after P_max in the Kth GOP is determined to be a scene change frame, the median or mean of the sizes of the P frames in the video other than those determined to be scene change frames is used as a new P_median, and a new B_threshold is calculated according to the formula […]. The new B_threshold is used to determine whether P_max in the next GOP is a scene change frame.

In the fourth possible implementation of the fourth aspect, the third threshold B_threshold can be updated in real time, so the influence of P frames already determined to be scene change frames is removed promptly, which further improves the accuracy of detecting scene change frames among P frames.

According to a fifth aspect, a detection apparatus for detecting a scene change frame in a video is provided. The detection apparatus includes a processor and a memory.

The memory is configured to store computer operation instructions.

The processor is configured to execute the computer operation instructions stored in the memory, so that the detection apparatus performs the method provided in the first aspect or any possible implementation of the first aspect, or in the second aspect or any possible implementation of the second aspect.

With the detection apparatus provided in the fifth aspect of the present invention, scene change frames among P frames can be detected when scene change frames in a video are detected, which effectively reduces missed detections of scene change frames.

According to a sixth aspect, a detection device is provided. The detection device includes a media unit and a detection apparatus.

The media unit is configured to obtain a video and send the video to the detection apparatus.

The detection apparatus is configured to obtain the video from the media unit and to perform the operations performed by the detection apparatus provided in the third aspect or any possible implementation of the third aspect, the fourth aspect or any possible implementation of the fourth aspect, or the fifth aspect or any possible implementation of the fifth aspect.

With the detection device provided in the sixth aspect of the present invention, scene change frames among P frames can be detected when scene change frames in a video are detected, which effectively reduces missed detections of scene change frames.

According to a seventh aspect, a system for performing video quality assessment is provided. The system includes a video server, a transmission device, and a video terminal. A video stream sent by the video server is transmitted to the video terminal through the transmission device.

The transmission device or the video terminal may in particular include the detection apparatus provided in the third aspect or any possible implementation of the third aspect, the fourth aspect or any possible implementation of the fourth aspect, or the fifth aspect or any possible implementation of the fifth aspect.

The system further includes a first detection apparatus, which may in particular be the detection apparatus provided in the third aspect or any possible implementation of the third aspect, the fourth aspect or any possible implementation of the fourth aspect, or the fifth aspect or any possible implementation of the fifth aspect. The transmission device 2020 or the video terminal 2030 is connected to the first detection apparatus, and the first detection apparatus obtains the video stream through the transmission device or video terminal connected to it.

With the system provided in the seventh aspect of the present invention, scene change frames among P frames can be detected when scene change frames in a video are detected, which effectively reduces missed detections of scene change frames.

To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly describes the accompanying drawings required for describing the embodiments or the prior art. The accompanying drawings in the following description show only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.

Schematic diagrams of the network configuration of the video system 100 according to Embodiment 1 of the present invention (two figures).
Schematic diagrams of a GOP according to Embodiment 1 of the present invention (three figures).
Schematic flowcharts of the method according to Implementation A of Embodiment 1 of the present invention (three figures).
Schematic flowcharts of the method according to Implementation B of Embodiment 1 of the present invention (three figures).
Schematic diagrams of examples of a GOP according to Embodiment 1 of the present invention (three figures).
Schematic structural diagram of the detection apparatus 200 according to Embodiment 2 of the present invention.
Schematic structural diagram of the detection apparatus 1000 according to Embodiment 3 of the present invention.
Schematic structural diagram of the detection device 400 according to Embodiment 4 of the present invention.
Schematic structural diagrams of the system 2000 according to Embodiment 5 of the present invention (three figures).

The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

FIG. 1A is a schematic diagram of the network configuration of a video system 100 according to an embodiment of the present invention. The video system 100 includes a video server 110, one or more transmission devices 120, and a video terminal 130. A video stream sent by the video server 110 is delivered to the video terminal 130 through the transmission devices 120.

The video system 100 may specifically be the IPTV system shown in FIG. 1B. In the IPTV system, the video server 110 is specifically a video headend (video HE). The transmission devices 120 specifically include network devices such as a core router (CR), a broadband network gateway (BNG), and an optical line terminal (OLT). The video terminal 130 is specifically a set top box (STB).

In the video systems shown in FIG. 1A and FIG. 1B, when a video stream is sent from the video server to the video terminal, changes in network conditions may cause abnormal phenomena in the video stream such as packet loss, delay, jitter, or packet disorder. These phenomena can produce display artifacts, frame freezes, and similar defects in the image shown on the screen of the video terminal, which degrades the user's viewing experience. The user's video experience therefore has to be monitored by evaluating video quality.

When video quality is evaluated, the position at which the scene changes, that is, the position of the scene change frame, usually has to be detected first. Video quality is then evaluated scene by scene.

For example, the coding loss introduced during video encoding depends not only on the video coding type, frame rate, resolution, and bit rate but also on the complexity of the scene. To evaluate the coding loss, the positions at which the scene changes therefore have to be detected first.

In another example, if packet loss occurs during video transmission, the decoder in the video terminal usually performs error compensation on the damaged frame by using the video content of the corresponding region of the preceding frame as the content of the damaged region. The smaller the content difference between the damaged frame and the preceding frame, the better the compensation. If the damaged frame is a scene change frame, however, its content differs almost completely from that of the preceding frame, so the compensation is at its worst. Therefore, when the impact of packet loss on video quality is evaluated, whether the damaged frame is a scene change frame has to be taken into account.

In another example, an encoded video frame sequence includes a plurality of groups of pictures (Group of Pictures, GOP). As shown in FIG. 2A, each GOP starts with an I frame, which is followed by several P frames and B frames, and ends with the frame before the next I frame. An I frame is an intra-coded frame, a P frame is a forward-referenced frame, and a B frame is a bidirectionally referenced frame. If packet loss occurs in a frame of a GOP, the resulting decoding error propagates to the subsequent video frames and usually ends at the last frame of the GOP. As shown in FIG. 2B, if the fourth frame in GOP1 is damaged, the error is generally considered to propagate until the last frame of the GOP. However, if the GOP contains a scene change frame, the content of that frame differs almost completely from the content of the preceding frame, so intra-frame predictive coding is usually applied when it is encoded (to most of the macroblocks of the scene change frame). Therefore, if a frame before the scene change frame in the GOP is damaged, error propagation ends at the scene change frame. As shown in FIG. 2C, the sixth frame of GOP1 is a scene change frame, and if the fourth frame is damaged, error propagation ends at the sixth frame. Therefore, when the impact of packet loss on video quality is evaluated, scene change frames have to be detected.
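The propagation rule described above can be illustrated with a minimal Python sketch. The function name, the 1-based frame positions, and the GOP length of 9 are assumptions made for illustration, and the sketch treats the scene change frame itself as unaffected, which is one reading of "error propagation ends at the scene change frame".

```python
def frames_hit_by_loss(gop_len, damaged, scene_change=None):
    """Return the 1-based positions of frames affected by a loss in one GOP.

    gop_len, damaged and scene_change are hypothetical parameter names.
    """
    stop = gop_len + 1  # by default the error runs to the last frame of the GOP
    if scene_change is not None and scene_change > damaged:
        stop = scene_change  # propagation is cut off at the scene change frame
    return list(range(damaged, stop))


print(frames_hit_by_loss(9, 4))     # no scene change frame after the loss
print(frames_hit_by_loss(9, 4, 6))  # scene change frame at position 6
```

With a scene change frame at position 6, only frames 4 and 5 are counted as affected, which is why the detector's result matters for the loss assessment.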

Note that the order of frames described in this embodiment of the present invention is the temporal order of the frames in the video. For example, a video with a duration T (for example, 10 seconds) includes video frame 1 at time t1 and video frame 2 at time t2. If t1 is less than t2, for example, t1 is 1 second 30 milliseconds and t2 is 5 seconds 40 milliseconds, video frame 1 is before video frame 2.

In a specific implementation, a detection apparatus for detecting scene change frames may be deployed in the video system. The detection apparatus may be deployed on any device through which the video stream passes (for example, a transmission device 120 or the video terminal 130), or may be attached in bypass mode to any such device so that it obtains the video stream by mirroring.

FIG. 3A and FIG. 3B are schematic flowcharts of a method according to Embodiment 1 of the present invention. The method in Embodiment 1 can be applied to the video system 100 shown in FIG. 1A and FIG. 1B and is performed by the detection apparatus.

In Embodiment 1 of the present invention, scene change frames are detected in a video (hereinafter, the video to be detected). The video to be detected may be read from a video file or obtained from a captured video stream. It may be a complete video or a segment of a video. For a relatively long video, a measurement time window is usually set, and the video segment inside the window is detected. For example, with a measurement time window of 10 seconds, the segment from 0 to 10 seconds is detected first as the video to be detected, then the segment from 10 to 20 seconds, and so on.

Before detection, the detection module may first determine the type (for example, I frame, P frame, or B frame) and the size of each video frame of the video to be detected.

For example, the video stream is obtained in real time, information about the video frames is extracted from the packets that belong to the video stream within the measurement time window (for example, 10 to 20 seconds), and the size of each video frame (in bytes) is calculated. The size is calculated as follows. Among the packets in the measurement time window, the start identifier of the current video frame is first located in a packet header. The payload length of the packet containing that start identifier and the payload lengths of the subsequent packets are then accumulated until the start identifier of the next video frame is found. The accumulated total is the size of the current video frame. For a specific implementation of the frame size calculation, see ITU-T Recommendation P.1201.2.
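The accumulation described above can be sketched as follows. This is a minimal illustration, not the procedure of P.1201.2: each packet is assumed to be already reduced to a hypothetical pair `(starts_new_frame, payload_length)`, and packets seen before the first start identifier are ignored.

```python
from typing import Iterable, List, Tuple


def frame_sizes(packets: Iterable[Tuple[bool, int]]) -> List[int]:
    """Sum payload lengths per frame; a packet with a start identifier opens a frame."""
    sizes: List[int] = []
    current = None  # no frame is open until the first start identifier is seen
    for starts_new_frame, payload_len in packets:
        if starts_new_frame:
            if current is not None:
                sizes.append(current)  # the previous frame is complete
            current = 0
        if current is not None:
            current += payload_len
    if current is not None:
        sizes.append(current)  # last frame in the window (may be truncated)
    return sizes


packets = [(True, 184), (False, 184), (False, 100), (True, 184), (False, 50)]
print(frame_sizes(packets))
```

The last frame of a window is reported even though its end was not confirmed by a following start identifier; a real detector would have to decide how to treat such a truncated frame.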

Next, the types of all video frames within the measurement time window are determined. Specifically, the type of a video frame can be determined based on the random_access_indicator field in the packet header. For an I frame, the type can be determined from random_access_indicator regardless of whether the video is encrypted. For a non-I frame, if the video is not encrypted, the type can be read directly from the frame header of the video frame. If the video is encrypted or the frame header is lost, the GOP mode can first be estimated from the frame sizes or from the presentation time stamps (PTS) of the frames. The GOP mode is usually PBBPBB or PBBBPBBB, and it can be described using the difference between the current PTS value and the previous PTS value. Once the GOP mode is known, the types of all lost or encrypted video frames can be determined. For a specific implementation of the frame type determination, see ITU-T Recommendation P.1201.2.
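As an illustration of the I-frame check, the sketch below reads random_access_indicator from a packet, assuming the stream is carried in 188-byte MPEG-2 transport stream packets, where this flag sits in the adaptation field. The source does not name the container, so the TS layout and the function name are assumptions.

```python
def random_access_indicator(ts_packet: bytes) -> bool:
    """Return True if a 188-byte MPEG-TS packet sets random_access_indicator."""
    if len(ts_packet) != 188 or ts_packet[0] != 0x47:
        raise ValueError("not a 188-byte TS packet with sync byte 0x47")
    adaptation_field_control = (ts_packet[3] >> 4) & 0x3
    has_adaptation_field = adaptation_field_control in (2, 3)
    if not has_adaptation_field or ts_packet[4] == 0:
        return False  # no adaptation field, or it carries no flags byte
    return bool(ts_packet[5] & 0x40)  # second-highest bit of the flags byte


# Hand-built demo packet: adaptation field present, length 7, flag set.
demo = bytes([0x47, 0x40, 0x11, 0x30, 0x07, 0x40]) + bytes(182)
print(random_access_indicator(demo))
```

Because the flag lives in the transport header rather than in the elementary stream, it stays readable when the payload is encrypted, which matches the statement above about I frames.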

After the two steps above, the video to be detected can be divided into several GOPs. A video to be detected usually includes a plurality of GOPs. In the example of FIG. 5A, the video to be detected includes N GOPs, where N is an integer greater than or equal to 2. In FIG. 5A, a video frame with a solid fill is an I frame, a video frame with a hatched fill is a P frame, and an unfilled video frame is a B frame.
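Once type and size are known for each frame, the division into GOPs can be sketched as below. The `(type, size)` tuple representation and the decision to skip frames that precede the first I frame are assumptions for illustration.

```python
def split_into_gops(frames):
    """Group (type, size) frames into GOPs; every I frame opens a new GOP."""
    gops = []
    for frame in frames:
        if frame[0] == "I":
            gops.append([])
        if gops:  # frames before the first I frame are skipped
            gops[-1].append(frame)
    return gops


frames = [("P", 3), ("I", 20), ("B", 2), ("P", 6), ("I", 18), ("P", 5)]
for number, gop in enumerate(split_into_gops(frames), start=1):
    print(number, gop)
```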

One video frame is one image. When the video frames of a video are encoded, they are encoded as frames of different types, for example, I frames, P frames, and B frames. An I frame is an intra-predicted frame: only data within the frame is referenced during encoding, so an I frame contains complete image data. A P frame is a predicted frame, in other words a unidirectional difference frame, and records the difference between this frame and the preceding I frame (or P frame). A B frame is a bidirectionally interpolated predicted frame, in other words a bidirectional difference frame, and records the differences between this frame and both the preceding frame and the following frame.

An I frame is usually larger than a P frame, and a P frame is usually larger than a B frame. In general, an I frame is 2 to 5 times the size of a P frame, and a P frame is 2 to 5 times the size of a B frame.

Because the content of a scene change frame differs considerably from the content of the preceding frame, even when the scene change frame is encoded as a P frame, intra-frame predictive coding is applied to most of its macroblocks, with reference to other macroblocks in the same frame. The encoded scene change frame is therefore relatively large. If the size of a P frame exceeds half the size of an I frame, the P frame is likely to be a scene change frame. Therefore, when a scene change frame that is a P frame is to be detected, the size of the P frame relative to the I frames can be taken into account.

However, when the image content of the video changes quickly, for example, in shots with intense motion such as a fast soccer match, the correlation between two adjacent video frames is low. A frame that is not a scene change frame is then still encoded as a P frame with reference to the preceding video frame, but the compression ratio is relatively low and the P frame is relatively large, possibly even more than half the size of an I frame. In this case, the size difference between adjacent P frames that are not scene change frames is small, whereas the size difference between a P frame that is a scene change frame and an adjacent P frame that is not is relatively large. Therefore, when a scene change frame that is a P frame is to be detected, the size ratio of adjacent P frames can also be taken into account.

Based on the above analysis, implementation A of Embodiment 1 of the present invention, which detects scene change frames in the video to be detected, is described in detail below with reference to FIG. 3A. The video includes N GOPs, where N is an integer greater than or equal to 2.

The video to be detected may be a segment of a video in a video file, or a segment of a video in a video stream, such as a video stream sent by the video server to the video terminal. Correspondingly, the detection apparatus may be deployed on any device through which the video stream passes (such as a transmission device 120 or the video terminal 130), or may be attached in bypass mode to any such device so that it obtains the video stream by mirroring.

As shown in FIG. 3A, the method provided in implementation A of Embodiment 1 of the present invention includes the following steps.

The detection apparatus performs the following operations on each GOP, starting from the Mth of the N GOPs, to determine whether the largest P frame P_max among all P frames in that GOP is a scene change frame, where M is greater than or equal to 1 and less than or equal to N.

Specifically, detection may start at the first GOP (that is, M = 1) or at a GOP after the first GOP. For example, when the video to be detected is the segment at the beginning of a video stream, the frame sizes in the first two GOPs usually have no reference value, so detection usually starts at the third GOP (M = 3).

Step 102: Determine the largest P frame P_max among all P frames in the Kth GOP, where the size of P_max is P^k_max, K is a variable ranging from M to N, and 1 ≤ M ≤ N.

Step 103: If it is determined that the relative value between P^k_max and the I-frame median or average is greater than or equal to a first threshold, and that the relative value between P^k_max and the P-frame median or average is greater than or equal to a second threshold, determine that P_max is a scene change frame. Here, the I-frame median or average is the median or average of the sizes of a plurality of I frames between P_max and the scene change frame that is closest to and before P_max, and the P-frame median or average is the median or average of the sizes of a plurality of P frames in the Kth GOP. The first threshold is greater than 0 and less than 1, and the second threshold is greater than 1.

Conversely, if it is determined that the relative value between P^k_max and the I-frame median or average is less than the first threshold, or that the relative value between P^k_max and the P-frame median or average is less than the second threshold, P_max is determined not to be a scene change frame.
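The decision in step 103 can be sketched as follows. The formulas for the relative values are not reproduced in this text, so the sketch assumes, purely for illustration, that each relative value is a plain ratio of P_max's size to the reference size. The default thresholds 0.53 and 1.51 are the example values given elsewhere in this description; whether 0.53 belongs with a ratio of this form is not confirmed here. All function and parameter names are hypothetical.

```python
def is_scene_change_p_frame(p_max_size, i_ref, p_ref,
                            first_threshold=0.53, second_threshold=1.51):
    """Apply the two-threshold test of step 103 to the largest P frame of a GOP.

    i_ref: median or average size of the I frames since the previous scene change.
    p_ref: median or average size of the P frames in the same GOP.
    """
    rel_i = p_max_size / i_ref  # assumed form of the first relative value
    rel_p = p_max_size / p_ref  # assumed form of the second relative value
    return rel_i >= first_threshold and rel_p >= second_threshold


print(is_scene_change_p_frame(60, 100, 20))  # large against both references
print(is_scene_change_p_frame(60, 100, 50))  # high-motion GOP: P frames are all large
```

The second call shows why both tests are combined: in a high-motion GOP a P frame can exceed half the I-frame size without standing out from its neighbouring P frames, and it is then rejected.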

As shown in FIG. 5B, the second P frame in the Kth GOP is P_max, the I frame in the Kth GOP is not a scene change frame, the (K−1)th GOP contains no scene change frame, and the third P frame in the (K−2)th GOP (hereinafter, P'_max) is a scene change frame. In this case, the scene change frame closest to and before P_max is P'_max.

In a specific implementation, the I frame in the first GOP may be determined to be a scene change frame before step 102. If no scene change frame exists between P_max in the Kth GOP and the I frame in the first GOP, the scene change frame closest to and before P_max is the I frame in the first GOP. If the video to be detected is a segment of a video, and scene change frames have already been detected in an earlier segment (hereinafter, the previous video segment), the scene change frame closest to and before P_max may be located in the previous video segment.

The I-frame median or average can be calculated as the median or average of the sizes of all or some of the I frames between P_max and the scene change frame that is closest to and before P_max. As shown in FIG. 5B, the second P frame in the Kth GOP is P_max, the scene change frame closest to and before P_max is P'_max, two I frames I_k and I_k−1 lie between P_max and P'_max, and the I-frame value is the average of the sizes of I_k and I_k−1.
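The FIG. 5B situation can be sketched as below: collect the I frames that lie strictly between the previous scene change frame and P_max, then take their mean or median. The list-of-tuples representation, the index arguments, and the sample sizes are assumptions for illustration. The case with no I frame in between is simply rejected, because this passage does not say how it is handled.

```python
from statistics import mean, median


def i_frame_reference(frames, p_max_index, prev_scene_change_index, use_median=False):
    """frames: list of (type, size) in temporal order; indices are list positions."""
    sizes = [size for kind, size in frames[prev_scene_change_index + 1:p_max_index]
             if kind == "I"]
    if not sizes:
        raise ValueError("no I frame between the two positions")
    return median(sizes) if use_median else mean(sizes)


# P'_max at index 0, I_(k-1) = 100 bytes, I_k = 120 bytes, P_max at index 6.
frames = [("P", 60), ("B", 5), ("I", 100), ("P", 20), ("I", 120), ("P", 22), ("P", 70)]
print(i_frame_reference(frames, p_max_index=6, prev_scene_change_index=0))
```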

The P-frame median or average can be the median or average of the sizes of all or some of the P frames in the Kth GOP. In a preferred implementation, it is calculated as F(P_−m, …, P_−1, P_1, …, P_n), where P_−m, …, P_−1 are the P frames before P_max in the Kth GOP, P_1, …, P_n are the P frames after P_max in the Kth GOP, and F calculates the median or average of the sizes of P_−m, …, P_−1 and P_1, …, P_n, with
m = min(num_before_P_frames, max_num),
n = min(num_after_P_frames, max_num),
where num_before_P_frames is the number of P frames before P_max in the Kth GOP, num_after_P_frames is the number of P frames after P_max in the Kth GOP, and max_num is a preset number of frames to be considered. As shown in FIG. 5C, num_before_P_frames is 7, num_after_P_frames is 4, and max_num is set to 6. In this case, m is 6 and n is 4.
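The selection of the neighbouring P frames can be sketched as follows, using the FIG. 5C numbers (7 P frames before P_max, 4 after, max_num = 6). The function name, the list representation, and the sample sizes are assumptions; the sketch takes the m P frames immediately before P_max and the n immediately after it.

```python
from statistics import mean, median


def p_frame_reference(p_sizes, p_max_pos, max_num=6, use_median=True):
    """p_sizes: P-frame sizes of one GOP in order; p_max_pos: list index of P_max."""
    before = p_sizes[:p_max_pos]
    after = p_sizes[p_max_pos + 1:]
    m = min(len(before), max_num)  # m = min(num_before_P_frames, max_num)
    n = min(len(after), max_num)   # n = min(num_after_P_frames, max_num)
    neighbours = before[len(before) - m:] + after[:n]  # P_max itself is excluded
    f = median if use_median else mean
    return m, n, f(neighbours)  # f raises StatisticsError if there are no neighbours


p_sizes = [5, 4, 6, 5, 4, 5, 6, 30, 5, 4, 6, 5]  # P_max (30) has 7 before, 4 after
print(p_frame_reference(p_sizes, p_max_pos=7))
```

Excluding P_max from its own reference keeps a single very large frame from inflating the value it is compared against.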

In a specific implementation, [symbol missing in source] can be calculated according to the following formula: [formula missing in source]

The first threshold may be preset, and the same first threshold may be used for different GOPs of the same video to be detected. When the calculation follows [formula missing in source], the first threshold is set to 0.53; when it follows [formula missing in source], the first threshold is set to 0.47.

Alternatively, to further improve detection accuracy, the first threshold may be obtained by calculation and adjusted dynamically. The process of calculating and dynamically adjusting the first threshold is as follows.

Before the Mth GOP of the video to be detected is examined, the first threshold is first calculated according to the following formula: [formula missing in source]

I_median is the median or average of the sizes of all I frames of the video to be detected, and P_median is the median or average of the sizes of all P frames of the video to be detected. The median is used as an example. If the video to be detected includes nine P frames with sizes 3, 5, 3, 6, 4, 7, 3, 5, and 4, the sizes in ascending order are 3, 3, 3, 4, 4, 5, 5, 6, and 7, so P_median is 4. If the video to be detected includes three I frames with sizes 15, 12, and 18, the sizes in ascending order are 12, 15, and 18, so I_median is 15.
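The worked example above can be checked with a few lines of Python using the standard `statistics` module. The variable names are chosen here for illustration.

```python
from statistics import median

p_sizes = [3, 5, 3, 6, 4, 7, 3, 5, 4]  # sizes of the nine P frames
i_sizes = [15, 12, 18]                 # sizes of the three I frames

p_median = median(p_sizes)
i_median = median(i_sizes)
print(sorted(p_sizes), p_median)
print(sorted(i_sizes), i_median)
```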

Next, each time a new P frame is determined to be a scene change frame, for example, each time P_max in the Kth GOP is determined to be a scene change frame, the median or average of the sizes of the P frames in the video to be detected, excluding the P frames already determined to be scene change frames, is used as the new P_median. A new I_threshold is calculated according to the formula [formula missing in source], and the new I_threshold may be used to determine whether P_max in the next GOP (the GOP following the Kth GOP) is a scene change frame.
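The P_median part of this update can be sketched as below. The formula that turns P_median into I_threshold is not reproduced in this text, so the sketch stops at the new P_median. The `(size, is_scene_change)` representation and the function name are assumptions.

```python
from statistics import median


def updated_p_median(p_frames):
    """p_frames: (size, is_scene_change) for every P frame of the video to be detected."""
    sizes = [size for size, is_scene_change in p_frames if not is_scene_change]
    return median(sizes)  # the same pattern works with statistics.mean


p_frames = [(3, False), (5, False), (30, True), (4, False)]
print(updated_p_median(p_frames))  # the 30-byte scene change frame is left out
```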

Because the first threshold I_threshold can be updated in real time, the influence of P frames already determined to be scene change frames is removed promptly, which further improves the accuracy of detecting scene change frames among the P frames.

A specific implementation may be as follows. If the first threshold is obtained according to the formula [formula missing in source], then [formula missing in source]; alternatively, if the first threshold is obtained according to the formula [formula missing in source], then [formula missing in source].

The second threshold is usually preset, and the same second threshold may be used for different GOPs of the same video to be detected. For example, the second threshold is set to 1.51.

Implementation A of Embodiment 1 of the present invention may further include step 101.

Step 101: Detect scene change frames among the I frames in the Mth to Nth GOPs of the N GOPs.

In step 101, whether the I frame in the Kth GOP is a scene change frame can be determined based on the ratio of the size of the I frame in the Kth GOP to the size of the I frame in the (K−1)th GOP, the ratio of the average size of all P frames in the (K−1)th GOP to the average size of all P frames in the Kth GOP, and the ratio of the average size of all B frames in the (K−1)th GOP to the average size of all B frames in the Kth GOP. A specific implementation is as follows.

1. Calculate the ratio r_I of the size of the I frame in the Kth GOP to the size of the I frame in the (K−1)th GOP.

2. Calculate the ratio r_P of the average size of all P frames in the (K−1)th GOP to the average size of all P frames in the Kth GOP.

3. Calculate the ratio r_B of the average size of all B frames in the (K−1)th GOP to the average size of all B frames in the Kth GOP.

4. If the ratio r_I is greater than a first threshold or less than a second threshold, evaluate condition (1) and condition (2) below; otherwise, determine that the I frame in the Kth GOP is not a scene change frame.

Condition (1): r_P is less than a third threshold, or r_P is greater than a fourth threshold.

Condition (2): r_B is less than a fifth threshold, or r_B is greater than a sixth threshold.

If both condition (1) and condition (2) are satisfied, the I frame in the Kth GOP is determined to be a scene change frame; otherwise, it is determined not to be a scene change frame.
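The four steps and two conditions above can be sketched as a single predicate. The six thresholds are passed in as a dictionary keyed 1 to 6; the numeric values used in the example are placeholders chosen for illustration only and are not taken from this description or from P.1201.2.

```python
def is_scene_change_i_frame(r_i, r_p, r_b, th):
    """Step 101 test for the I frame of the Kth GOP.

    r_i: size(I, K) / size(I, K-1)
    r_p: mean P size in GOP K-1 / mean P size in GOP K
    r_b: mean B size in GOP K-1 / mean B size in GOP K
    th:  thresholds keyed 1..6 (hypothetical container)
    """
    if not (r_i > th[1] or r_i < th[2]):
        return False  # I-frame size did not change enough between GOPs
    cond1 = r_p < th[3] or r_p > th[4]
    cond2 = r_b < th[5] or r_b > th[6]
    return cond1 and cond2


th = {1: 1.5, 2: 0.67, 3: 0.8, 4: 1.25, 5: 0.8, 6: 1.25}  # placeholder values
print(is_scene_change_i_frame(2.0, 0.5, 1.5, th))
print(is_scene_change_i_frame(1.0, 0.5, 1.5, th))
```

The thresholds here are separate parameters of the I-frame test and are independent of the first and second thresholds applied to P_max.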

For specific details of the above implementation, see ITU-T Recommendation P.1201.2.

In a specific implementation, the I frame in the first GOP may be directly determined to be a scene change frame. If K is not 1, the above method can be used to determine whether the I frame in the Kth GOP is a scene change frame.

In a specific implementation, as shown in FIG. 3B, implementation A may be carried out using implementation J: step 101 is performed first, and steps 102 and 103 are performed afterwards. That is, scene change frames are first detected among the I frames in the Mth to Nth GOPs, and scene change frames are then detected among the P frames in the Mth to Nth GOPs. For example, the I frame in GOP1 is first determined to be a scene change frame, then whether the I frames in GOP M (for example, GOP1) to GOP N are scene change frames is determined, and then whether P_max in each of GOP M (for example, GOP1) to GOP N is a scene change frame is determined.

Alternatively, in a specific implementation, as shown in FIG. 3C, implementation A may be carried out using implementation K: step 101 is performed in combination with steps 102 and 103, and scene change frames are detected in video frame order. That is, following the order of the GOPs and starting from the Mth (for example, the first) GOP, scene change frames are detected GOP by GOP; within the current GOP, it is first determined whether the I frame is a scene change frame, and then whether P_max is a scene change frame. For example, the I frame in GOP 1 is first determined to be a scene change frame, and then it is determined in turn whether P_max in GOP 1, the I frame in GOP 2, P_max in GOP 2, the I frame in GOP 3, P_max in GOP 3, ..., the I frame in GOP N, and P_max in GOP N are scene change frames. In other words, when K is less than N, after it is determined whether P_max in the Kth GOP is a scene change frame, it is determined whether the I frame in the (K+1)th GOP is a scene change frame.
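The difference between implementation J and implementation K is only the order in which frames are examined. The sketch below lists that order for GOPs M to N; the function names and the tuple representation are hypothetical and serve only to make the two orderings concrete.

```python
def order_implementation_j(m, n):
    """Implementation J: all I frames first, then all P_max frames."""
    i_checks = [("I", k) for k in range(m, n + 1)]
    p_checks = [("P_max", k) for k in range(m, n + 1)]
    return i_checks + p_checks


def order_implementation_k(m, n):
    """Implementation K: GOP by GOP, I frame first, then P_max."""
    order = []
    for k in range(m, n + 1):
        order.append(("I", k))
        order.append(("P_max", k))
    return order
```

With implementation K, the result for P_max in the Kth GOP is already known when the I frame in the (K+1)th GOP is examined, which is what allows a distance check against the most recent scene change frame.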

When implementation A is carried out using implementation K, a distance check may be applied first, because the probability that the scene changes repeatedly within a short time is relatively low. Before it is determined whether an I frame is a scene change frame, the distance between the I frame and the preceding scene change frame (hereinafter referred to as the first distance) is calculated. If the first distance is less than or equal to a distance threshold, the I frame is determined not to be a scene change frame; otherwise, whether the I frame is a scene change frame may be further determined according to the method provided in the standard ITU-T P.1201.2. A specific implementation is as follows. When K is less than N, after it is determined whether P_max in the Kth GOP is a scene change frame, if the distance between the I frame in the (K+1)th GOP (hereinafter referred to as the current I frame) and the scene change frame that is closest to and precedes the current I frame is determined to be less than or equal to the distance threshold, the current I frame is determined not to be a scene change frame; otherwise, whether the current I frame is a scene change frame may be further determined according to the method provided in the standard ITU-T P.1201.2.

The distance between two video frames described in the present invention is measured as a number of video frames: the distance between the Xth video frame and the Yth video frame is Y − X, and the distance between two adjacent video frames is 1.
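The distance definition and the distance check on I frames can be sketched as follows. The function names are hypothetical, and frames are assumed to be identified by their index in the video.

```python
def frame_distance(x, y):
    """Distance between the Xth and Yth video frames (adjacent frames -> 1)."""
    return y - x


def passes_distance_gate(current_i_index, previous_scene_change_index,
                         distance_threshold):
    """Return False when the I frame is ruled out by the distance check.

    A True result only means that the I frame still has to be examined
    by the size-based method; it is not yet a scene change decision.
    """
    first_distance = frame_distance(previous_scene_change_index,
                                    current_i_index)
    return first_distance > distance_threshold
```

Note that the comparison is strict: an I frame whose first distance equals the threshold is still rejected.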

The distance threshold may be preset, and the same distance threshold may be used for different GOPs of the same video to be detected.

Alternatively, to further improve detection accuracy, the distance threshold may be determined by calculation and adjusted dynamically. The process of determining and dynamically adjusting the distance threshold is as follows.

Before scene change frames among the I frames in the Mth to Nth GOPs of the N GOPs are detected, an initial distance threshold is first determined. The initial distance threshold may be one of the following three lengths:
(1) the length of the longest GOP among the N GOPs;
(2) the average of the lengths of the N GOPs; and
(3) a length L, where the number of GOPs having the length L is the largest among the N GOPs.

The length of a GOP described in the present invention is the number of video frames included in the GOP.

For example, the video to be detected includes eight GOPs whose lengths are 10, 6, 8, 7, 8, 7, 9, and 8. According to method (1), the initial distance threshold is determined to be 10. According to method (2), the initial distance threshold is determined to be 8 (the average is 63/8 = 7.875, which is taken as 8). According to method (3), the initial distance threshold is determined to be 8, because the largest number of GOPs have the length 8.

When encoding is performed using a fixed GOP length, all GOPs have the same length, and therefore the initial distance thresholds calculated by the above three methods are the same.
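The three methods and the worked example can be reproduced with the Python standard library. This is a sketch under one assumption: the text gives 8 for an average of 7.875, so the average is rounded to the nearest integer here. The function name is hypothetical.

```python
from statistics import mean, mode


def initial_distance_thresholds(gop_lengths):
    """Return the candidate thresholds for methods (1), (2), and (3)."""
    longest = max(gop_lengths)             # method (1): longest GOP
    average = round(mean(gop_lengths))     # method (2): rounding is an assumption
    most_common = mode(gop_lengths)        # method (3): most frequent length
    return longest, average, most_common


print(initial_distance_thresholds([10, 6, 8, 7, 8, 7, 9, 8]))
```

For a fixed GOP length, for example twelve frames per GOP, all three candidates coincide.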

Next, each time a new scene change frame is determined, if the distance between the new scene change frame and the scene change frame that is closest to and precedes it (hereinafter referred to as the second distance) is less than the distance threshold, the distance threshold is updated to the second distance.
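The update rule only ever lowers the threshold. A minimal sketch, with a hypothetical function name and frames identified by index:

```python
def update_distance_threshold(distance_threshold, previous_scene_change_index,
                              new_scene_change_index):
    """Lower the threshold to the second distance when that distance is smaller."""
    second_distance = new_scene_change_index - previous_scene_change_index
    if second_distance < distance_threshold:
        return second_distance
    return distance_threshold
```

Starting from an initial threshold of 8, two scene change frames found 5 frames apart lower the threshold to 5, while a later pair 20 frames apart leaves it unchanged.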

In implementation A of Embodiment 1 of the present invention, when scene change frames in a video are detected, scene change frames among the P frames are also detected: whether the largest P frame P_max among all P frames in a GOP of the video is a scene change frame is determined based on the relative relationship between the size P^k_max of P_max and the median or average of the sizes of a plurality of I frames between P_max and the scene change frame that is closest to and precedes P_max, or based on the relative relationship between P^k_max and the median or average of the sizes of a plurality of P frames in the GOP. This effectively reduces missed detections of scene change frames.

In addition, in video coding, a P frame is usually larger than a B frame, typically 2 to 5 times the size of a B frame. Because the content of a scene change frame differs substantially from the content of the preceding frame, intra-frame predictive coding is performed on most macroblocks of a scene change frame even when it is encoded as a P frame. The encoded scene change frame is therefore relatively large. If a P frame is smaller than twice the size of a B frame, the P frame is unlikely to be a scene change frame. Therefore, the relative relationship between P frames and B frames can be taken into account when detecting a scene change frame that is a P frame.

Based on the above analysis, implementation B of Embodiment 1 of the present invention, which detects scene change frames in a video to be detected, is described in detail below with reference to FIG. 4A. The video includes N GOPs, where N is an integer greater than or equal to 2.

As shown in FIG. 4A, the method provided in implementation B of Embodiment 1 of the present invention includes the following steps.

Step 202: Step 202 is the same as step 102, and details are not described here again.

Step 203: P_max, whose size is P^k_max, is determined to be a scene change frame in either of the following cases:
(1) the relative value between P^k_max and the I-frame median or average is greater than or equal to the first threshold, the relative value between P^k_max and the P-frame median or average is greater than or equal to the second threshold, and no B frame exists in the Kth GOP; or
(2) the relative value between P^k_max and the I-frame median or average is greater than or equal to the first threshold, the relative value between P^k_max and the P-frame median or average is greater than or equal to the second threshold, a B frame exists in the Kth GOP, and the relative value between P^k_max and the B-frame median or average is greater than or equal to the third threshold.
Here, the I-frame median or average is the median or average of the sizes of a plurality of I frames between P_max and the scene change frame that is closest to and precedes P_max; the P-frame median or average is the median or average of the sizes of a plurality of P frames in the Kth GOP; and the B-frame median or average is the median or average of the sizes of all B frames between P_max and the scene change frame that is closest to and precedes P_max. The first threshold is greater than 0 and less than 1, the second threshold is greater than 1, and the third threshold is greater than 1.

Conversely, P_max is determined not to be a scene change frame if the relative value between P^k_max and the I-frame median or average is less than the first threshold, or the relative value between P^k_max and the P-frame median or average is less than the second threshold, or the relative value between P^k_max and the B-frame median or average is less than the third threshold.
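The decision in step 203 can be sketched as follows. Several points are assumptions made only for illustration: the relative value is taken to be a plain ratio, the median (rather than the average) is used throughout, the thresholds 0.5 and 2.0 are arbitrary, and all function and parameter names are hypothetical. The B-frame list is assumed to be non-empty whenever the GOP contains B frames.

```python
from statistics import median


def relative_value(p_max_size, reference_size):
    """Assumed form of the relative value: a plain ratio."""
    return p_max_size / reference_size


def p_max_is_scene_change(p_max_size, i_sizes, p_sizes, b_sizes,
                          gop_has_b_frames,
                          first=0.5, second=2.0, third=2.87):
    """Decide whether P_max is a scene change frame (implementation B sketch).

    i_sizes: sizes of I frames since the preceding scene change frame.
    p_sizes: sizes of P frames in the Kth GOP.
    b_sizes: sizes of B frames since the preceding scene change frame.
    """
    if relative_value(p_max_size, median(i_sizes)) < first:
        return False
    if relative_value(p_max_size, median(p_sizes)) < second:
        return False
    if not gop_has_b_frames:
        return True          # no B frames: the first two checks decide
    return relative_value(p_max_size, median(b_sizes)) >= third
```

The B-frame check is applied only when the GOP contains B frames, so a stream encoded without B frames is decided by the I-frame and P-frame comparisons alone.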

The method for determining that the relative value between P^k_max and the I-frame median or average is greater than or equal to the first threshold and the method for determining that the relative value between P^k_max and the P-frame median or average is greater than or equal to the second threshold are the same as those described for step 102, and details are not described here again.

The third threshold may be preset, and the same third threshold may be used for different GOPs of the same video to be detected. The preset value depends on which of two formulas is used to calculate the relative value between P^k_max and the B-frame median or average: with one formula the third threshold is set to 2.87, and with the other it is set to 1.87.
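The two preset values differ by exactly 1, which is what one would expect if one formula is a plain ratio and the other a relative difference. The formulas themselves are not reproduced in this text, so the following is only an assumed reading, with $P^{k}_{\max}$ the size of P_max and $\bar{B}$ a hypothetical symbol for the B-frame median or average:

```latex
R_{\text{ratio}} = \frac{P^{k}_{\max}}{\bar{B}}, \qquad
R_{\text{diff}} = \frac{P^{k}_{\max} - \bar{B}}{\bar{B}} = R_{\text{ratio}} - 1,
\qquad\text{so}\qquad
R_{\text{ratio}} \ge 2.87 \iff R_{\text{diff}} \ge 1.87 .
```

Under this reading the two presets express the same requirement: P_max must be at least 2.87 times the B-frame median or average.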

Alternatively, to further improve detection accuracy, the third threshold may be obtained by calculation and adjusted dynamically. The process of calculating and dynamically adjusting the third threshold is as follows.

Before the Mth GOP of the video to be detected is processed, the third threshold is first calculated by a formula in which B_threshold is the third threshold, P_median is the median or average of the sizes of all P frames in the video to be detected, and B_median is the median or average of the sizes of all B frames in the video to be detected.

Next, each time a new P frame is determined to be a scene change frame (that is, when P_max in the Kth GOP is determined to be a scene change frame), the median or average of the sizes of the P frames in the video to be detected, excluding the P frames already determined to be scene change frames, is used as a new P_median. A new B_threshold is then calculated according to the formula, and the new B_threshold may be used to determine whether P_max in the next GOP is a scene change frame.

Because the third threshold B_threshold can be updated in real time, the influence of P frames already determined to be scene change frames is eliminated promptly, which further improves the accuracy of detecting scene change frames among the P frames.
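The update of P_median can be sketched as below. The formula that maps P_median and B_median to B_threshold is not reproduced in this text, so it is passed in as a caller-supplied function; the parameter `formula`, the function names, and the (size, flag) representation of P frames are all hypothetical.

```python
from statistics import median


def updated_p_median(p_frames):
    """Median size of the P frames not marked as scene change frames.

    p_frames: list of (size, is_scene_change) pairs.
    """
    remaining = [size for size, is_scene_change in p_frames
                 if not is_scene_change]
    return median(remaining)


def updated_b_threshold(p_frames, b_median, formula):
    """Recompute B_threshold with a caller-supplied formula (placeholder)."""
    return formula(updated_p_median(p_frames), b_median)
```

Excluding the detected scene change frames keeps unusually large P frames from inflating P_median, so the threshold used for the next GOP reflects ordinary P frames only.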

Implementation B of Embodiment 1 of the present invention may further include step 201. Step 201 is the same as step 101, and details are not described here again.

In a specific implementation, implementation B may also be carried out, as shown in FIG. 4B and FIG. 4C, in ways similar to the two ways of carrying out implementation A (implementation J and implementation K).

In implementation B of Embodiment 1 of the present invention, when scene change frames in a video are detected, scene change frames among the P frames are also detected: whether the largest P frame P_max among all P frames in a GOP of the video is a scene change frame is determined based on the relative relationship between the size P^k_max of P_max and the median or average of the sizes of a plurality of I frames between P_max and the scene change frame that is closest to and precedes P_max, the relative relationship between P^k_max and the median or average of the sizes of a plurality of P frames in the GOP, or the relative relationship between P^k_max and the median or average of the sizes of all B frames between P_max and the scene change frame that is closest to and precedes P_max. This effectively reduces missed detections of scene change frames. In addition, because the sizes of B frames are considered along with the sizes of I frames and P frames when scene change frames among the P frames are detected, the accuracy of detecting scene change frames among the P frames is further improved.

Based on Embodiment 1 of the present invention, Embodiment 2 of the present invention provides a detection apparatus 200 for detecting scene change frames in a video. The video includes N GOPs, where N is an integer greater than or equal to 2. As shown in FIG. 6, the detection apparatus 200 includes a first determination unit 210 and a second determination unit 220.

A first implementation of Embodiment 2 corresponds to implementation A of Embodiment 1, and the details are as follows.

The first determination unit 210 is configured to determine the largest P frame P_max among all P frames in the Kth GOP, where the size of P_max is P^k_max, K is a variable ranging from M to N, and 1 ≤ M ≤ N.

The second determination unit 220 is configured to determine that P_max is a scene change frame when it is determined that the relative value between P^k_max and the I-frame median or average is greater than or equal to a first threshold and the relative value between P^k_max and the P-frame median or average is greater than or equal to a second threshold. Here, the I-frame median or average is the median or average of the sizes of a plurality of I frames between P_max and the scene change frame that is closest to and precedes P_max; the P-frame median or average is the median or average of the sizes of a plurality of P frames in the Kth GOP; the first threshold is greater than 0 and less than 1; and the second threshold is greater than 1.

Specifically, the first determination unit 210 may be configured to perform step 102 of the method of implementation A of Embodiment 1, and the second determination unit 220 may be configured to perform step 103 of the method of implementation A of Embodiment 1.

Further, the second determination unit 220 may be configured to detect scene change frames among the I frames in the Mth to Nth GOPs of the N GOPs, and specifically to perform step 101 of the method of implementation A of Embodiment 1.

A second implementation of Embodiment 2 corresponds to implementation B of Embodiment 1, and the details are as follows.

The second determination unit 220 is configured to determine that P_max is a scene change frame in either of the following cases:
(1) the relative value between P^k_max and the I-frame median or average is greater than or equal to the first threshold, the relative value between P^k_max and the P-frame median or average is greater than or equal to the second threshold, and no B frame exists in the Kth GOP; or
(2) the relative value between P^k_max and the I-frame median or average is greater than or equal to the first threshold, the relative value between P^k_max and the P-frame median or average is greater than or equal to the second threshold, a B frame exists in the Kth GOP, and the relative value between P^k_max and the B-frame median or average is greater than or equal to the third threshold.
Here, the I-frame median or average is the median or average of the sizes of a plurality of I frames between P_max and the scene change frame that is closest to and precedes P_max; the P-frame median or average is the median or average of the sizes of a plurality of P frames in the Kth GOP; and the B-frame median or average is the median or average of the sizes of all B frames between P_max and the scene change frame that is closest to and precedes P_max. The first threshold is greater than 0 and less than 1, the second threshold is greater than 1, and the third threshold is greater than 1.

Specifically, the first determination unit 210 may be configured to perform step 202 of the method of implementation B of Embodiment 1, and the second determination unit 220 may be configured to perform step 203 of the method of implementation B of Embodiment 1.

Further, the second determination unit 220 may be configured to detect scene change frames among the I frames in the Mth to Nth GOPs of the N GOPs, and specifically to perform step 201 of the method of implementation B of Embodiment 1.

In Embodiment 2 of the present invention, when scene change frames in a video are detected, scene change frames among the I frames are detected, and scene change frames among the P frames are also detected: whether the largest P frame P_max among all P frames in a GOP of the video is a scene change frame is determined based on the relative relationship between the size P^k_max of P_max and the median or average of the sizes of a plurality of I frames between P_max and the scene change frame that is closest to and precedes P_max, or based on the relative relationship between P^k_max and the median or average of the sizes of a plurality of P frames in the GOP. This effectively reduces missed detections of scene change frames. In the second implementation of Embodiment 2, because the sizes of B frames are considered along with the sizes of I frames and P frames when scene change frames among the P frames are detected, the accuracy of detecting scene change frames among the P frames is further improved.

In accordance with Embodiment 1 of the present invention, Embodiment 3 of the present invention provides a detection apparatus 1000. As shown in FIG. 7, the detection apparatus 1000 includes a processor 1010 and a memory 1020, and the processor 1010 and the memory 1020 communicate with each other over a bus.

The memory 1020 is configured to store computer operating instructions. The memory 1020 may include a high-speed RAM and may further include a non-volatile memory, such as at least one magnetic disk memory.

The processor 1010 is configured to execute the computer operating instructions stored in the memory 1020. The processor 1010 may specifically be a central processing unit (CPU), which is the core unit of a computer.

The processor 1010 executes the computer operating instructions so that the detection apparatus 1000 performs the method of Embodiment 1.

In Embodiment 3 of the present invention, scene change frames among the P frames can be detected when scene change frames in a video are detected, which effectively reduces missed detections of scene change frames.

In accordance with Embodiments 1 to 3 of the present invention, Embodiment 4 of the present invention provides a detection device 400. As shown in FIG. 8, the detection device 400 includes a media unit 4010 and a detection apparatus 4020.

The media unit 4010 is configured to obtain a video (hereinafter referred to as the video to be detected) and send the video to the detection apparatus 4020. Specifically, the media unit 4010 may read the video to be detected from a video file, or may obtain it from a received media stream sent by a video server. The video to be detected may be a complete video or a video segment of a video. When the video to be detected is a video segment, the media unit 4010 may send the video in which the segment is located (that is, the video that includes the segment) to the detection apparatus 4020, and the detection apparatus 4020 examines the segment of the received video to detect scene change frames in the video to be detected.

The detection apparatus 4020 may specifically be the detection apparatus 200 provided in Embodiment 2 or the detection apparatus 1000 provided in Embodiment 3. It obtains the video to be detected from the media unit 4010 and performs the operations performed by the detection apparatus 200 provided in Embodiment 2 or the detection apparatus 1000 provided in Embodiment 3.

The detection apparatus 4020 may further evaluate, based on the detected scene change frames, the quality of the video to be detected or the quality of the video in which the video to be detected is located.

In Embodiment 4 of the present invention, scene change frames among the P frames can be detected when scene change frames in a video are detected, which effectively reduces missed detections of scene change frames.

In accordance with Embodiments 1 to 3 of the present invention, Embodiment 5 of the present invention provides a system 2000 for performing video quality assessment. As shown in FIG. 9A, the system 2000 includes a video server 2010, a transmission device 2020, and a video terminal 2030. A video stream sent by the video server 2010 is delivered to the video terminal 2030 through the transmission device 2020.

In a specific implementation, the transmission device 2020 or the video terminal 2030 may include the detection apparatus 200 provided in Embodiment 2 or the detection apparatus 1000 provided in Embodiment 3. In a specific implementation, both the transmission device 2020 and the video terminal 2030 may include the detection apparatus 200 provided in Embodiment 2 or the detection apparatus 1000 provided in Embodiment 3. The transmission device 2020 or the video terminal 2030 may specifically be the detection device 400 provided in Embodiment 4.

In another specific implementation, the system further includes a detection apparatus 2040. As shown in FIG. 9B and FIG. 9C, the detection apparatus 2040 may specifically be the detection apparatus 200 provided in Embodiment 2 or the detection apparatus 1000 provided in Embodiment 3. The transmission device 2020 or the video terminal 2030 is connected to the detection apparatus 2040, and the detection apparatus 2040 obtains the video stream through the transmission device 2020 or the video terminal 2030 to which it is connected. In a specific implementation, the transmission device 2020 and the video terminal 2030 may each be connected to a separate detection apparatus 2040.

In Embodiment 5 of the present invention, scene change frames among the P frames can be detected when scene change frames in a video are detected, which effectively reduces missed detections of scene change frames.

当業者であれば、本明細書に開示されている実施形態で説明された例との組み合わせにおいて、ユニットおよびアルゴリズムステップが、電子ハードウェアまたはコンピュータソフトウェアと電子ハードウェアとの組み合わせによって実施され得ることを認識することができる。機能がハードウェアとソフトウェアのどちらによって実行されるかは、技術的解決策の特定の用途および設計制約条件に依存する。当業者であれば、特定の用途ごとに、説明された機能を実施するために異なる方法を使用することができるが、その実施態様は本発明の範囲を超えると考えられるべきではない。 Those skilled in the art that, in combination with the examples described in the embodiments disclosed herein, the units and algorithm steps can be implemented by electronic hardware or a combination of computer software and electronic hardware. Can be recognized. Whether the function is performed by hardware or software depends on the specific application and design constraints of the technical solution. One skilled in the art can use different methods to perform the described functions for each particular application, but that embodiment should not be considered beyond the scope of the present invention.

簡便かつ簡単な説明のために、上記のシステム、装置、およびユニットの詳細な動作プロセスについては、上記の方法の実施形態における対応するプロセスを参照することができ、ここでは詳細は再度説明していないことが、当業者によって明確に理解され得る。 For the sake of simplicity and simplicity, the detailed operating processes of the above systems, devices, and units can be referred to the corresponding processes in the above method embodiments, where details are described again. It can be clearly understood by those skilled in the art.

本願で提供されているいくつかの実施形態に関して、開示されているシステム、装置、および方法が他の方法で実施され得ることを理解されたい。例えば、説明されている装置の実施形態は単なる例である。例えば、ユニットの分割は、単なる論理的な機能の分割であり、実際の実施態様では他の分割であってもよい。例えば、複数のユニットまたはコンポーネントが、別のシステムとして組み合わされるか、もしくは統合されてもよいし、一部の特徴が、無視されるか、もしくは実行されなくてもよい。さらに、提示したまたは述べた相互結合または直接的な結合もしくは通信接続は、いくつかのインタフェースを使用して実施されてもよい。装置またはユニット間の間接的な結合または通信接続は、電子的形態、機械的形態、または他の形態で実施されてもよい。 It should be understood that with respect to some embodiments provided herein, the disclosed systems, devices, and methods may be implemented in other ways. For example, the described apparatus embodiment is merely an example. For example, the unit division is merely logical function division, and may be another division in an actual implementation. For example, multiple units or components may be combined or integrated as separate systems, and some features may be ignored or not performed. Furthermore, the presented or described mutual coupling or direct coupling or communication connection may be implemented using several interfaces. Indirect coupling or communication connections between devices or units may be implemented in electronic form, mechanical form, or other form.

別々の部分として説明されているユニットは、物理的に別々であってもなくてもよく、ユニットとして提示されている部分は、物理的なユニットであってもなくてもよく、1つの位置に配置されても、複数のネットワークユニットに分散されてもよい。ユニットの一部または全部は、実施形態の解決策の目的を達成するために実際の必要に応じて選択されてもよい。 Units that are described as separate parts may or may not be physically separate, and parts that are presented as units may or may not be physical units, It may be arranged or distributed to a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

また、本発明の実施形態における機能ユニットは、1つの処理ユニットに統合されてもよいし、これらのユニットの各々は、物理的に単独で存在してもよいし、2つ以上のユニットが、1つのユニットに統合されてもよい。 Further, the functional units in the embodiments of the present invention may be integrated into one processing unit, each of these units may physically exist alone, or two or more units may be It may be integrated into one unit.

When the functions are implemented in the form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions, may be implemented in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing descriptions are merely specific implementations of the present invention and do not limit the protection scope of the present invention. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

100 Video system
110 Video server
120 Transmission device
130 Video terminal
200 Detection device
210 First determination unit
220 Second determination unit
400 Detection device
1000 Detection device
1010 Processor
1020 Memory
2000 System
2010 Video server
2020 Transmission device
2030 Video terminal
2040 Detection device
4010 Media unit
4020 Detection device

According to a first aspect, a method for detecting a scene change frame is provided. The video includes N groups of pictures (GOPs), where N is an integer greater than or equal to 2, and the method includes:
determining a maximum P frame P_max among all P frames in the K-th GOP, where the size of P_max is $P_{max}^{k}$, K is a variable ranging from M to N, and 1 ≤ M ≤ N; and
determining that P_max is a scene change frame when it is determined that a first relative value is greater than or equal to a first threshold and a second relative value is greater than or equal to a second threshold,
where the first relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of a plurality of I frames between P_max and the scene change frame that is closest to and precedes P_max, the second relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of a plurality of P frames in the K-th GOP, the first threshold is greater than 0 and less than 1, and the second threshold is greater than 1.
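The decision rule of the first aspect can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the "relative value" is assumed to be a simple ratio with the size of P_max in the numerator, the median is used rather than the average, the P-frame median is taken over all P frames of the GOP including P_max itself, and the function name, argument names, and threshold values are hypothetical.

```python
from statistics import median


def largest_p_frame_is_scene_change(p_sizes, i_sizes_since_last_change,
                                    first_threshold, second_threshold):
    """Return (index_of_P_max, is_scene_change) for one GOP.

    p_sizes: sizes of all P frames in the K-th GOP.
    i_sizes_since_last_change: sizes of the I frames between P_max and the
        closest preceding scene change frame.
    Assumption: "relative value" is modelled as a plain ratio.
    """
    if not (0 < first_threshold < 1) or second_threshold <= 1:
        raise ValueError("expected 0 < first_threshold < 1 < second_threshold")
    if not p_sizes or not i_sizes_since_last_change:
        return None, False

    # Locate the largest P frame in the GOP.
    idx = max(range(len(p_sizes)), key=lambda n: p_sizes[n])
    p_max_size = p_sizes[idx]

    # Compare against the I-frame median and the in-GOP P-frame median.
    first_relative = p_max_size / median(i_sizes_since_last_change)
    second_relative = p_max_size / median(p_sizes)

    is_change = (first_relative >= first_threshold
                 and second_relative >= second_threshold)
    return idx, is_change
```

With hypothetical thresholds of 0.5 and 3.0, a GOP whose P frames have sizes 10, 12, 90, and 11 and whose reference I frames have sizes 100 and 120 would flag the third P frame, because it is both comparable in size to an I frame and much larger than a typical P frame of the same GOP.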

According to a second aspect, a method for performing video quality assessment is provided. The video includes N groups of pictures (GOPs), where N is an integer greater than or equal to 2, and the method includes:
determining a maximum P frame P_max among all P frames in the K-th GOP, where the size of P_max is $P_{max}^{k}$, K is a variable ranging from M to N, and 1 ≤ M ≤ N; and
determining that P_max is a scene change frame when it is determined that a first relative value is greater than or equal to a first threshold, a second relative value is greater than or equal to a second threshold, and no B frame exists in the K-th GOP, or when it is determined that the first relative value is greater than or equal to the first threshold, the second relative value is greater than or equal to the second threshold, a B frame exists in the K-th GOP, and a third relative value is greater than or equal to a third threshold,
where the first relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of a plurality of I frames between P_max and the scene change frame that is closest to and precedes P_max, the second relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of a plurality of P frames in the K-th GOP, the third relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of all B frames between P_max and the scene change frame that is closest to and precedes P_max, the first threshold is greater than 0 and less than 1, the second threshold is greater than 1, and the third threshold is greater than 1.
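The additional B-frame condition of the second aspect can be illustrated as follows. This is a minimal sketch under the same assumptions as before (ratio as the "relative value", median rather than average); the function name, parameters, and the choice to raise an error when B-frame sizes are missing are hypothetical and not taken from the patent.

```python
from statistics import median


def p_max_is_scene_change_with_b(p_sizes, i_sizes, b_sizes, gop_has_b_frames,
                                 first_threshold, second_threshold,
                                 third_threshold):
    """Decide whether the largest P frame of a GOP is a scene change frame.

    p_sizes: sizes of the P frames in the K-th GOP.
    i_sizes / b_sizes: sizes of the I / B frames between P_max and the
        closest preceding scene change frame.
    gop_has_b_frames: whether the K-th GOP contains any B frame.
    """
    p_max_size = max(p_sizes)

    # The first two conditions are shared by both branches of the rule.
    if p_max_size / median(i_sizes) < first_threshold:
        return False
    if p_max_size / median(p_sizes) < second_threshold:
        return False

    # Without B frames in the GOP, the two conditions above are sufficient.
    if not gop_has_b_frames:
        return True

    # With B frames, P_max must also be large relative to the B-frame median.
    if not b_sizes:
        raise ValueError("B-frame sizes are required when the GOP has B frames")
    return p_max_size / median(b_sizes) >= third_threshold
```

The third check only tightens the decision: a P frame that passes the first two comparisons is still rejected if it is not sufficiently larger than the recent B frames.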

According to a fourth aspect, a detection device for detecting a scene change frame in a video is provided. The video includes N groups of pictures (GOPs), where N is an integer greater than or equal to 2, and the detection device includes a first determination unit and a second determination unit.

Based on the detected scene change frames, the detection device 4020 may further evaluate the quality of the video under detection, or the quality of the video in which the video segment is located.

Claims

1. A method for detecting a scene change frame in a video, wherein the video includes N groups of pictures (GOPs), N being an integer greater than or equal to 2, the method comprising:
determining a maximum P frame P_max among all P frames in the K-th GOP, wherein the size of P_max is $P_{max}^{k}$, K is a variable ranging from M to N, and 1 ≤ M ≤ N; and
determining that P_max is a scene change frame when it is determined that a first relative value is greater than or equal to a first threshold and a second relative value is greater than or equal to a second threshold,
wherein the first relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of a plurality of I frames between P_max and the scene change frame that is closest to and precedes P_max, the second relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of a plurality of P frames in the K-th GOP, the first threshold is greater than 0 and less than 1, and the second threshold is greater than 1.

2. The method according to claim 1, further comprising calculating the first threshold according to the formula [formula missing] or according to the formula [formula missing], wherein I_threshold is the first threshold, I_median is the median or average of the sizes of all I frames of the video, and P_median is the median or average of the sizes of all P frames of the video.

3. The method according to claim 2, wherein, when the first threshold is obtained according to the formula [formula missing], [condition missing]; or, when the first threshold is obtained according to the formula [formula missing], [condition missing].

4. The method according to claim 2 or 3, further comprising: after P_max in the K-th GOP is determined to be a scene change frame, using the median or average of the sizes of the P frames in the video other than the P frames determined to be scene change frames as a new P_median, and calculating a new I_threshold according to the above formula, wherein the new I_threshold is used to determine whether P_max in the next GOP is a scene change frame.
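As an illustration only (not part of the claims), the update of P_median after a P frame has been classified as a scene change frame could be sketched as below. The threshold formula itself is not reproduced in this text, so the sketch takes it as a caller-supplied function `threshold_fn(i_median, p_median)`; that parameter, the function names, and the use of the median rather than the average are all assumptions.

```python
from statistics import median


def updated_p_median(p_sizes, scene_change_indices):
    """Median size of the video's P frames, excluding the P frames that
    have already been determined to be scene change frames."""
    remaining = [size for n, size in enumerate(p_sizes)
                 if n not in scene_change_indices]
    return median(remaining)


def updated_i_threshold(i_sizes, p_sizes, scene_change_indices, threshold_fn):
    """Recompute I_threshold from I_median and the refreshed P_median.

    threshold_fn stands in for the patent's formula, which is not
    available here; the caller must supply it.
    """
    i_median = median(i_sizes)
    p_median = updated_p_median(p_sizes, scene_change_indices)
    return threshold_fn(i_median, p_median)
```

Excluding the already detected scene change P frames keeps unusually large frames from inflating P_median, so the threshold applied to the next GOP reflects ordinary P frames only.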

5. The method according to any one of claims 1 to 4, further comprising:
detecting a scene change frame among the I frames in the M-th GOP through the N-th GOP of the N GOPs; and
when K is less than N, determining whether the I frame in the (K+1)-th GOP is a scene change frame after determining whether P_max in the K-th GOP is a scene change frame,
wherein detecting a scene change frame among the I frames in the M-th GOP through the N-th GOP of the N GOPs specifically comprises: when it is determined that a first distance is less than or equal to a distance threshold, determining that the I frame in the (K+1)-th GOP is not a scene change frame, the first distance being the distance between the I frame in the (K+1)-th GOP and the scene change frame that is closest to and precedes the I frame in the (K+1)-th GOP.
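As an illustration only (not part of the claims), the distance filter applied to the I frame of the (K+1)-th GOP might look like the sketch below. It assumes that distance is measured as a difference of frame indices; the function and parameter names are hypothetical. The filter only rules an I frame out; an I frame that passes would still need whatever further checks the method applies.

```python
def i_frame_passes_distance_filter(i_frame_index, last_scene_change_index,
                                   distance_threshold):
    """Return False when the I frame is too close to the previous scene
    change frame to be a scene change frame itself.

    Assumption: positions are frame indices, so the "first distance" is
    their difference.
    """
    first_distance = i_frame_index - last_scene_change_index
    # A first distance at or below the threshold excludes the I frame.
    return first_distance > distance_threshold
```

For example, with a distance threshold of 200 frames, an I frame 150 frames after the last scene change frame is rejected, while one 300 frames after it remains a candidate.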

6. The method according to claim 5, further comprising setting the distance threshold, wherein the distance threshold is one of the following three lengths:
the length of the longest GOP among the N GOPs;
the median or average of the lengths of the N GOPs; and
a length L, wherein the number of GOPs having the length L is the largest among the N GOPs.
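As an illustration only (not part of the claims), the three candidate definitions of the distance threshold can be computed from a list of GOP lengths as sketched below. The function name and the `method` keywords are hypothetical; GOP lengths are assumed to be frame counts.

```python
from collections import Counter
from statistics import mean, median


def initial_distance_threshold(gop_lengths, method="longest"):
    """Pick the distance threshold from the lengths of the N GOPs."""
    if not gop_lengths:
        raise ValueError("at least one GOP length is required")
    if method == "longest":
        # Length of the longest GOP.
        return max(gop_lengths)
    if method == "median":
        return median(gop_lengths)
    if method == "average":
        return mean(gop_lengths)
    if method == "most_common":
        # Length L shared by the largest number of GOPs.
        return Counter(gop_lengths).most_common(1)[0][0]
    raise ValueError(f"unknown method: {method}")
```

Which of the three options works best is not stated in this text; the longest-GOP option gives the most conservative filter, since it excludes the most I frames.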

7. The method according to claim 5 or 6, wherein, when a new scene change frame is determined, if a second distance is less than the distance threshold, the distance threshold is updated to the second distance, the second distance being the distance between the new scene change frame and the scene change frame that is closest to and precedes the new scene change frame.
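As an illustration only (not part of the claims), the update rule amounts to keeping the smaller of the current threshold and the newly observed spacing between scene change frames. The sketch assumes positions are frame indices; the names are hypothetical.

```python
def update_distance_threshold(distance_threshold, new_scene_change_index,
                              previous_scene_change_index):
    """Shrink the distance threshold when two scene change frames are found
    closer together than the current threshold."""
    second_distance = new_scene_change_index - previous_scene_change_index
    # Updating only when the second distance is smaller equals taking the minimum.
    return min(distance_threshold, second_distance)
```

The threshold can therefore only decrease as detection proceeds, adapting to the shortest scene observed so far.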

8. A method for detecting a scene change frame in a video, wherein the video includes N groups of pictures (GOPs), N being an integer greater than or equal to 2, the method comprising:
determining a maximum P frame P_max among all P frames in the K-th GOP, wherein the size of P_max is $P_{max}^{k}$, K is a variable ranging from M to N, and 1 ≤ M ≤ N; and
determining that P_max is a scene change frame when it is determined that a first relative value is greater than or equal to a first threshold, a second relative value is greater than or equal to a second threshold, and no B frame exists in the K-th GOP, or when it is determined that the first relative value is greater than or equal to the first threshold, the second relative value is greater than or equal to the second threshold, a B frame exists in the K-th GOP, and a third relative value is greater than or equal to a third threshold,
wherein the first relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of a plurality of I frames between P_max and the scene change frame that is closest to and precedes P_max, the second relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of a plurality of P frames in the K-th GOP, the third relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of all B frames between P_max and the scene change frame that is closest to and precedes P_max, the first threshold is greater than 0 and less than 1, the second threshold is greater than 1, and the third threshold is greater than 1.

9. The method according to claim 8, further comprising calculating the first threshold according to the formula [formula missing] or according to the formula [formula missing], wherein I_threshold is the first threshold, I_median is the median or average of the sizes of all I frames of the video, and P_median is the median or average of the sizes of all P frames of the video.

10. The method according to claim 8 or 9, further comprising calculating the third threshold according to the formula [formula missing] or according to the formula [formula missing], wherein B_threshold is the third threshold, P_median is the median or average of the sizes of all P frames of the video, and B_median is the median or average of the sizes of all B frames of the video.

11. The method according to any one of claims 8 to 10, further comprising: after P_max in the K-th GOP is determined to be a scene change frame, using the median or average of the sizes of the P frames in the video other than the P frames determined to be scene change frames as a new P_median, and calculating a new I_threshold according to the above formula, wherein the new I_threshold is used to determine whether P_max in the next GOP is a scene change frame.

12. The method according to any one of claims 8 to 11, further comprising:
detecting a scene change frame among the I frames in the M-th GOP through the N-th GOP of the N GOPs; and
when K is less than N, determining whether the I frame in the (K+1)-th GOP is a scene change frame after determining whether P_max in the K-th GOP is a scene change frame,
wherein detecting a scene change frame among the I frames in the M-th GOP through the N-th GOP of the N GOPs specifically comprises: when it is determined that a first distance is less than or equal to a distance threshold, determining that the I frame in the (K+1)-th GOP is not a scene change frame, the first distance being the distance between the I frame in the (K+1)-th GOP and the scene change frame that is closest to and precedes the I frame in the (K+1)-th GOP.

13. The method according to claim 12, further comprising setting the distance threshold, wherein the distance threshold is one of the following three lengths:
the length of the longest GOP among the N GOPs;
the median or average of the lengths of the N GOPs; and
a length L, wherein the number of GOPs having the length L is the largest among the N GOPs.

14. The method according to claim 13, wherein, when a new scene change frame is determined, if a second distance is less than the distance threshold, the distance threshold is updated to the second distance, the second distance being the distance between the new scene change frame and the scene change frame that is closest to and precedes the new scene change frame.

15. A detection device for detecting a scene change frame in a video, wherein the video includes N groups of pictures (GOPs), N being an integer greater than or equal to 2, and the detection device includes a first determination unit and a second determination unit, wherein:
the first determination unit is configured to determine a maximum P frame P_max among all P frames in the K-th GOP, wherein the size of P_max is $P_{max}^{k}$, K is a variable ranging from M to N, and 1 ≤ M ≤ N; and
the second determination unit is configured to determine that P_max is a scene change frame when it is determined that a first relative value is greater than or equal to a first threshold and a second relative value is greater than or equal to a second threshold,
wherein the first relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of a plurality of I frames between P_max and the scene change frame that is closest to and precedes P_max, the second relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of a plurality of P frames in the K-th GOP, the first threshold is greater than 0 and less than 1, and the second threshold is greater than 1.

16. The detection device according to claim 15, wherein the second determination unit is further configured to calculate the first threshold according to the formula [formula missing] or according to the formula [formula missing], wherein I_threshold is the first threshold, I_median is the median or average of the sizes of all I frames of the video, and P_median is the median or average of the sizes of all P frames of the video.

17. The detection device according to claim 16, wherein, when the first threshold is obtained according to the formula [formula missing], [condition missing]; or, when the first threshold is obtained according to the formula [formula missing], [condition missing].

18. The detection device according to claim 16 or 17, wherein the second determination unit is further configured to: after determining that P_max in the K-th GOP is a scene change frame, use the median or average of the sizes of the P frames in the video other than the P frames determined to be scene change frames as a new P_median, and calculate a new I_threshold according to the above formula, wherein the new I_threshold is used to determine whether P_max in the next GOP is a scene change frame.

19. The detection device according to any one of claims 15 to 18, wherein the second determination unit is further configured to:
detect a scene change frame among the I frames in the M-th GOP through the N-th GOP of the N GOPs; and
when K is less than N, determine whether the I frame in the (K+1)-th GOP is a scene change frame after determining whether P_max in the K-th GOP is a scene change frame,
wherein the second determination unit being configured to detect a scene change frame among the I frames in the M-th GOP through the N-th GOP of the N GOPs specifically includes: when it is determined that a first distance is less than or equal to a distance threshold, determining that the I frame in the (K+1)-th GOP is not a scene change frame, the first distance being the distance between the I frame in the (K+1)-th GOP and the scene change frame that is closest to and precedes the I frame in the (K+1)-th GOP.

20. The detection device according to claim 19, wherein the second determination unit is further configured to set the distance threshold, and the distance threshold is one of the following three lengths:
the length of the longest GOP among the N GOPs;
the median or average of the lengths of the N GOPs; and
a length L, wherein the number of GOPs having the length L is the largest among the N GOPs.

21. The detection device according to claim 19 or 20, wherein the second determination unit is further configured to: when a new scene change frame is determined, if a second distance is less than the distance threshold, update the distance threshold to the second distance, the second distance being the distance between the new scene change frame and the scene change frame that is closest to and precedes the new scene change frame.

22. A detection device for detecting a scene change frame in a video, wherein the video includes N groups of pictures (GOPs), N being an integer greater than or equal to 2, and the detection device includes a first determination unit and a second determination unit, wherein:
the first determination unit is configured to determine a maximum P frame P_max among all P frames in the K-th GOP, wherein the size of P_max is $P_{max}^{k}$, K is a variable ranging from M to N, and 1 ≤ M ≤ N; and
the second determination unit is configured to determine that P_max is a scene change frame when it is determined that a first relative value is greater than or equal to a first threshold, a second relative value is greater than or equal to a second threshold, and no B frame exists in the K-th GOP, or when it is determined that the first relative value is greater than or equal to the first threshold, the second relative value is greater than or equal to the second threshold, a B frame exists in the K-th GOP, and a third relative value is greater than or equal to a third threshold,
wherein the first relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of a plurality of I frames between P_max and the scene change frame that is closest to and precedes P_max, the second relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of a plurality of P frames in the K-th GOP, the third relative value is the relative value between $P_{max}^{k}$ and the median or average of the sizes of all B frames between P_max and the scene change frame that is closest to and precedes P_max, the first threshold is greater than 0 and less than 1, the second threshold is greater than 1, and the third threshold is greater than 1.

23. The detection device according to claim 22, wherein the second determination unit is further configured to calculate the first threshold according to the formula [formula missing] or according to the formula [formula missing], wherein I_threshold is the first threshold, I_median is the median or average of the sizes of all I frames of the video, and P_median is the median or average of the sizes of all P frames of the video.

24. The detection device according to claim 22 or 23, wherein the second determination unit is further configured to calculate the third threshold according to the formula [formula missing] or according to the formula [formula missing], wherein I_threshold is the first threshold, I_median is the median or average of the sizes of all I frames of the video, and B_median is the median or average of the sizes of all B frames of the video.

25. The detection device according to any one of claims 22 to 24, wherein the second determination unit is further configured to: after determining that P_max in the K-th GOP is a scene change frame, use the median or average of the sizes of the P frames in the video other than the P frames determined to be scene change frames as a new P_median, and calculate a new I_threshold according to the above formula, wherein the new I_threshold is used to determine whether P_max in the next GOP is a scene change frame.

26. The detection device according to any one of claims 22 to 25, wherein the second determination unit is further configured to:
detect a scene change frame among the I frames in the M-th GOP through the N-th GOP of the N GOPs; and
when K is less than N, determine whether the I frame in the (K+1)-th GOP is a scene change frame after determining whether P_max in the K-th GOP is a scene change frame,
wherein the second determination unit being configured to detect a scene change frame among the I frames in the M-th GOP through the N-th GOP of the N GOPs specifically includes: when it is determined that a first distance is less than or equal to a distance threshold, determining that the I frame in the (K+1)-th GOP is not a scene change frame, the first distance being the distance between the I frame in the (K+1)-th GOP and the scene change frame that is closest to and precedes the I frame in the (K+1)-th GOP.

27. The detection device according to claim 26, wherein the second determination unit is further configured to set the distance threshold, and the distance threshold is one of the following three lengths:
the length of the longest GOP among the N GOPs;
the median or average of the lengths of the N GOPs; and
a length L, wherein the number of GOPs having the length L is the largest among the N GOPs.

28. The detection device according to claim [claim reference garbled in source], wherein the second determination unit is further configured to: when a new scene change frame is determined, if a second distance is less than the distance threshold, update the distance threshold to the second distance, the second distance being the distance between the new scene change frame and the scene change frame that is closest to and precedes the new scene change frame.