JP2011129979A

JP2011129979A - Image processor

Info

Publication number: JP2011129979A
Application number: JP2009283841A
Authority: JP
Inventors: Naoya Hayashi; 直哉林
Original assignee: Renesas Electronics Corp
Current assignee: Renesas Electronics Corp
Priority date: 2009-12-15
Filing date: 2009-12-15
Publication date: 2011-06-30

Abstract

PROBLEM TO BE SOLVED: To reliably detect a scene change in a moving image compressed by a compression system that uses inter-frame or inter-field motion compensation.

SOLUTION: A reduced image generation unit 130 acquires, from an MPEG-2 stream and in encoding order, reduced images of a plurality of pictures constituting the moving image. A frequency component acquisition unit 150 extracts the frequency components of each of the reduced images obtained by the reduced image generation unit 130. A scene change degree calculation unit 160 rearranges the frequency components of the reduced images into display order and obtains, as a scene change degree indicating how likely a scene change is, the difference between the frequency components for each pair of consecutive reduced images, or for each pair of reduced images of the leading I pictures of two consecutive GOPs (Groups Of Pictures).

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an image processing technique, and more specifically to a technique for detecting scene changes in a compressed moving image.

Scene change detection, which identifies the points at which the scene switches in a compressed moving image, is performed for various purposes. For example, a video recording apparatus often sets a scene change location as the starting point for video editing, and scene change images are often used to search the scenes of a recorded video. Furthermore, a digital broadcast stream may be recorded at a different resolution or compression ratio, or transcoded; in these cases, image quality can be improved by using scene change information in the encoding performed at recording time. Here, “transcoding” means recording with a different compression encoding method, for example recording with the encoding method changed from MPEG-2 to H.264. For convenience, compression at a different resolution or compression ratio, transcoding, and the like are hereinafter collectively referred to as “recompression”.

When a scene change occurs, the current picture is no longer correlated with past pictures. When compressing with a method that uses inter-frame motion compensation, applying inter-frame motion compensation between uncorrelated frames can degrade image quality. Therefore, in the recompression described above, a picture at which a scene change occurs needs to be detected before that picture is encoded.

Several techniques for detecting scene changes in a compressed moving image are described here, taking MPEG-2 as an example of the original compression method of the moving image. In the description of the present invention, unless otherwise noted, a “picture” means a constituent unit of a moving image and covers both a “frame” and a “field”.

MPEG-2 employs encoding that combines inter-frame or inter-field motion compensation with an orthogonal transform (DCT) applied to each block. A block is obtained by dividing the picture into units of 8 pixels × 8 lines. In the commonly used 4:2:0 format, encoding is performed in units of macroblocks (MBs), each consisting of four blocks of the luminance signal (Y) and one block of each of the two color difference signals (Cb, Cr). There are four types of MB: an intra MB, which is encoded within the picture; a forward prediction MB, which creates a motion-compensated prediction value by referring to a past encoded picture; a backward prediction MB, which creates a motion-compensated prediction value by referring to a future encoded picture; and a bidirectional prediction MB, which creates a motion-compensated prediction value by referring to both past and future encoded pictures. There are three types of picture: an I picture, which can use only intra MBs; a P picture, which can use only intra MBs and forward prediction MBs; and a B picture, which can use all types of MB. Because a B picture is encoded after the future I or P picture that it refers to, the order of pictures in the stream differs from the display order when B pictures are present. In addition, a group of pictures of about 0.5 to 1 second that starts with an I picture in the stream constitutes a GOP (Group Of Pictures), which serves as the unit of random access and editing.

Patent Document 1 discloses a technique that counts the amount of data of each picture of a compressed moving image and compares the count with a threshold to determine whether the picture is a scene change (claim 7 of Patent Document 1).

Patent Document 2 discloses a technique that, for a picture that may be a scene change, calculates the ratio between the number of intra-coded macroblocks and the number of the other, predicted macroblocks, compares the ratio with a threshold, and determines from the result whether the picture is a scene change.
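As a rough illustration of the macroblock-ratio test attributed to Patent Document 2, the following Python sketch compares the share of intra MBs in a picture with a threshold. The function name, the use of the intra share of all MBs rather than a raw intra-to-predicted ratio, and the threshold of 0.5 are assumptions made for this sketch and are not taken from the cited document.

```python
def intra_share_reaches(intra_mbs, predicted_mbs, threshold=0.5):
    """Return True when the share of intra MBs in a picture reaches the threshold.

    threshold=0.5 is a hypothetical value chosen only for illustration.
    """
    total = intra_mbs + predicted_mbs
    if total == 0:
        return False  # no macroblocks were counted, so nothing can be decided
    return intra_mbs / total >= threshold


# A P picture dominated by intra MBs is flagged; a well-predicted one is not.
print(intra_share_reaches(900, 100))
print(intra_share_reaches(100, 900))
```

Note that an I picture consists only of intra MBs, so a test of this kind carries no information for I pictures.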

Patent Document 3 discloses a technique that obtains the code amount for each picture type, applies a predetermined operation to the obtained code amounts to calculate a feature amount representing the correlation between two pictures, and detects a scene change by comparing the feature amount with a threshold.

A scene change picture is not correlated with the picture immediately preceding it in display order, which makes inter-frame prediction difficult. As a result, the code amount of a scene change picture increases, or the ratio of the numbers of predicted MBs changes. Each of the techniques described above uses this property to detect scene changes.

With reference to FIGS. 7 to 11, scene change detection by the above techniques is described for the commonly used “M = 1” and “M = 3” types of stream. An “M = 1” type stream is a stream that has only I pictures and P pictures, and an “M = 3” type stream is a stream in which two B pictures are sandwiched between I or P pictures. In each figure, “I”, “B”, and “P” indicate the picture type, and the number following the picture type indicates the picture's position in display order. For ease of understanding, the order of the pictures at display time is also shown. Shading in the figures indicates the picture at which the scene change occurs.
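The labeling convention above (picture type followed by the display number) can be sketched in Python as follows. The list holds the “M = 3” stream order that appears in the descriptions of FIGS. 8 to 11; the function name `display_order` is a hypothetical helper introduced for this example.

```python
def display_order(coding_order):
    """Sort labels such as 'P5' by the display number that follows the type letter."""
    return sorted(coding_order, key=lambda label: int(label[1:]))


# Order of the pictures in an "M = 3" stream (encoding order).
m3_stream = ["I2", "B0", "B1", "P5", "B3", "B4", "I8", "B6", "B7"]

print(display_order(m3_stream))
# → ['B0', 'B1', 'I2', 'B3', 'B4', 'P5', 'B6', 'B7', 'I8']
```

The two B pictures that precede each I or P picture in display order come after it in the stream, which is why the two orders differ.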

FIG. 7 shows an example of an “M = 1” type stream in which a scene change occurs at a P picture that follows a P picture. As illustrated, the stream consists of I pictures and P pictures, and the scene change occurs at P3, which follows P2.

In this case, because the correlation between P2 and P3 is low, P3, which is motion-compensated at encoding time with forward reference to P2, tends to contain more intra MBs and to have larger motion vectors in its forward prediction MBs, and its code amount also increases. In such a case, the techniques described above can detect P3 as a scene change.

FIG. 8 shows an example of an “M = 3” type stream in which a scene change occurs at a P picture that follows a B picture. As illustrated, two B pictures are sandwiched between I or P pictures in the stream, and the scene change occurs at P5, which follows I2, B0, and B1.

In this case, P5, which is motion-compensated at encoding time with forward reference to I2, contains more intra MBs. B3 and B4, which are motion-compensated with forward reference to I2 and backward reference to P5, contain fewer backward prediction MBs and bidirectional prediction MBs. The techniques described above can therefore detect P5 as a scene change.

FIG. 9 shows an example of an “M = 3” type stream in which a scene change occurs at a B picture that follows a P picture. As illustrated, the scene change occurs at B3, which follows I2, B0, B1, and P5 in the stream.

In this case, P5, which is motion-compensated with forward reference to I2, contains more intra MBs. B3 and B4, which are motion-compensated with forward reference to I2 and backward reference to P5, contain fewer forward prediction MBs and bidirectional prediction MBs. The techniques described above can therefore detect B3 as a scene change.

FIG. 10 shows an example of an “M = 3” type stream in which a scene change occurs at a B picture that follows a B picture. As illustrated, the scene change occurs at B4, which follows I2, B0, B1, P5, and B3 in the stream.

In this case too, P5, which is motion-compensated with forward reference to I2, contains more intra MBs. B3, which is motion-compensated with forward reference to I2 and backward reference to P5, contains fewer backward prediction MBs and bidirectional prediction MBs, whereas B4, which is motion-compensated with the same references, contains fewer forward prediction MBs and bidirectional prediction MBs. The techniques described above can therefore detect B4 as a scene change.

FIG. 11 shows an example of an “M = 3” type stream in which a scene change occurs at an I picture that follows a B picture. As illustrated, the scene change occurs at I8, which follows I2, B0, B1, P5, B3, and B4 in the stream.

In this case, B6 and B7, which are motion-compensated with forward reference to P5 and backward reference to I8, contain fewer backward prediction MBs and bidirectional prediction MBs. The techniques described above can therefore detect I8 as a scene change.

Patent Document 4 discloses a technique for acquiring feature amounts used to determine whether still images are similar. This technique generates a reduced image of a still image, performs frequency analysis on the reduced image, and acquires the DC component and some of the AC components as the image feature amount. For a moving image, some or all of the frames are extracted from the moving image data, a reduced image is generated for each, frequency analysis is performed on each reduced image to acquire the DC component and some of the AC components as a frame feature amount, and these frame feature amounts are collected to form the feature amount of the moving image.

Japanese Patent Laid-Open No. 06-022304
JP 2005-505165 A
Japanese Patent Laid-Open No. 10-023421
JP 2000-259832 A

However, the techniques of Patent Documents 1 to 3 may fail to detect a scene change or may detect one erroneously.

For example, as shown in FIG. 12, when a scene change occurs at an I picture (I7 in the figure) in an “M = 1” type stream, all macroblocks of I7 are intra MBs to begin with, so the scene change cannot be detected by comparing the ratios of MB encoding types. Moreover, the code amount of I7 does not necessarily differ greatly from that of the leading I picture of the preceding GOP, so the scene change may not be detectable by comparing code amounts either.

Also, as shown in FIG. 13, when an “M = 3” type stream uses a closed GOP structure to facilitate editing and a scene change occurs at the B picture (B6) that follows the leading I picture (I8) of a GOP, B6 is in any case restricted from referring to the last P picture of the immediately preceding GOP, so the scene change cannot be detected by comparing the ratios of MB encoding types. The same applies, as shown in FIG. 14, when a scene change occurs at an I picture (I6) in an “M = 3” type stream and the picture following that I picture is a P picture (P9) rather than a B picture.

Furthermore, in moving images that are hard to predict with inter-frame motion compensation, such as fade-ins, fade-outs, cross-fades, and zooms, the number of predicted MBs, the amount of motion vector data, and so on differ from the usual. For example, the number of intra MBs increases, or, in B pictures, fewer MBs refer to the temporally more distant picture. In these cases, the techniques of Patent Documents 1 to 3 may produce false detections.

It is also conceivable to use the technique of Patent Document 4 to detect scene changes in a moving image. For example, the moving image is decoded to obtain still images (frames), the DC component and some of the AC components obtained by frequency analysis of each frame are acquired as the feature amount of the frame, and a scene change is detected based on the difference in feature amounts between frames.

However, in the recompression described above, it is desirable to recompress in the order of the pictures in the stream (that is, the encoding order) so that the encoding type of each picture does not change. The images obtained by decoding for recompression are therefore normally output not in display order but in the order in which they appear in the stream, that is, the original encoding order. This makes it difficult to detect scene changes in a stream that uses B pictures. For example, in the stream shown in FIG. 6, the scene change occurs at P5, but if feature amounts are compared in encoding order, the difference is expected to be large three times: between B1 and P5, between P5 and B3, and between B4 and I8.
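The three large differences mentioned above can be reproduced with a small Python sketch. It makes the simplifying assumption that the feature difference between two pictures is large exactly when they belong to different scenes; the helper names `scene_of` and `boundaries` are hypothetical and introduced only for this example.

```python
coding_order = ["I2", "B0", "B1", "P5", "B3", "B4", "I8", "B6", "B7"]
SCENE_CHANGE_AT = 5  # display number of the first picture of the new scene (P5)


def scene_of(label):
    """Return 0 for the old scene and 1 for the new scene, judged by display number."""
    return 0 if int(label[1:]) < SCENE_CHANGE_AT else 1


def boundaries(order):
    """List adjacent pairs in the given order that belong to different scenes."""
    return [(a, b) for a, b in zip(order, order[1:]) if scene_of(a) != scene_of(b)]


in_coding_order = boundaries(coding_order)
in_display_order = boundaries(sorted(coding_order, key=lambda s: int(s[1:])))

print(in_coding_order)   # → [('B1', 'P5'), ('P5', 'B3'), ('B4', 'I8')]
print(in_display_order)  # → [('B4', 'P5')]
```

Compared in encoding order, the single scene change shows up as three boundaries; compared in display order, it shows up once, at P5.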

One aspect of the present invention is an image processing apparatus for a moving image compressed by a compression method that uses inter-frame or inter-field motion compensation. The image processing apparatus includes a reduced image generation unit, a frequency component acquisition unit, and a scene change degree calculation unit.

The reduced image generation unit acquires, from the data of the moving image and in encoding order, reduced images of a plurality of frames constituting the moving image.

The frequency component acquisition unit extracts the frequency components of each of the reduced images obtained by the reduced image generation unit.

The scene change degree calculation unit rearranges the frequency components of the reduced images into display order and obtains, as a scene change degree indicating how likely a scene change is, the difference between the frequency components for each pair of consecutive reduced images, or for each pair of reduced images of the leading I pictures of two consecutive GOPs.

Expressions of the apparatus of the above aspect as a method or a system, a program that causes a computer to operate as the apparatus, and the like are also valid as aspects of the present invention.

According to the technique of the present invention, scene changes can be reliably detected in a moving image.

FIG. 1 is a diagram showing a recording apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram showing the scene change detection unit in the recording apparatus shown in FIG. 1.
FIG. 3 is a diagram for explaining the processing of a specific example by the scene change detection unit shown in FIG. 2.
FIG. 4 is a diagram showing an example of the numbers of the various MB types obtained for each frame of a stream containing a white fade-out and a scene change.
FIG. 5 is a diagram showing the differences in frequency components between adjacent frames obtained by the scene change detection unit shown in FIG. 2 for a stream containing a white fade-out and a scene change.
FIG. 6 is a diagram showing the scene change detection unit in a recording apparatus according to the second embodiment of the present invention.
FIG. 7 is a diagram showing an example of a stream in which a scene change can be detected by the prior art (part 1).
FIG. 8 is a diagram showing an example of a stream in which a scene change can be detected by the prior art (part 2).
FIG. 9 is a diagram showing an example of a stream in which a scene change can be detected by the prior art (part 3).
FIG. 10 is a diagram showing an example of a stream in which a scene change can be detected by the prior art (part 4).
FIG. 11 is a diagram showing an example of a stream in which a scene change can be detected by the prior art (part 5).
FIG. 12 is a diagram showing an example of a stream in which a scene change is difficult to detect by the prior art (part 1).
FIG. 13 is a diagram showing an example of a stream in which a scene change is difficult to detect by the prior art (part 2).
FIG. 14 is a diagram showing an example of a stream in which a scene change is difficult to detect by the prior art (part 3).

Embodiments of the present invention are described below with reference to the drawings. For clarity, the following description and drawings are omitted and simplified as appropriate. Each element shown in the drawings as a functional block that performs various processing can be implemented in hardware by a CPU, memory, and other circuits, and in software by a program loaded into memory or the like.

<First Embodiment>

FIG. 1 shows a recording apparatus 100 according to the first embodiment of the present invention. The recording apparatus 100 records a moving image, here an MPEG-2 stream of a digital broadcast, and recompresses it with the H.264 method when recording. As shown in FIG. 1, the recording apparatus 100 includes a recording unit 110 and a scene change detection unit 120.

The recording unit 110 includes an MPEG-2 decoder 112 and an H.264 encoder 114. The MPEG-2 decoder 112 decodes the MPEG-2 stream and outputs the result to the H.264 encoder 114. The H.264 encoder 114 recompresses the moving image output from the MPEG-2 decoder 112 with the H.264 method, referring during recompression to the scene change detection information from the scene change detection unit 120.

The scene change detection unit 120 performs scene change detection on the MPEG-2 stream and supplies scene change detection information, which indicates the locations (pictures) of scene changes, to the H.264 encoder 114.

FIG. 2 shows the scene change detection unit 120. The scene change detection unit 120 includes a reduced image generation unit 130, a frequency component acquisition unit 150, a scene change degree calculation unit 160, and a scene change determination unit 180.

The reduced image generation unit 130 partially decodes the MPEG-2 stream, acquires in encoding order the reduced images of the pictures constituting the moving image represented by the stream, and outputs them to the frequency component acquisition unit 150. The frequency component acquisition unit 150 extracts the frequency components of each reduced image from the reduced image generation unit 130 and outputs them to the scene change degree calculation unit 160. The scene change degree calculation unit 160 rearranges the frequency components of the reduced images obtained by the frequency component acquisition unit 150 into display order and obtains the difference in frequency components for each pair of consecutive images. These differences are output from the scene change degree calculation unit 160 to the scene change determination unit 180 as scene change degrees indicating how likely a scene change is. The scene change determination unit 180 determines whether a scene change has occurred by comparing the scene change degree calculated by the scene change degree calculation unit 160 with a predetermined threshold. Specifically, when the scene change degree is equal to or greater than the threshold, it determines that a scene change has occurred and outputs scene change detection information indicating the location to the H.264 encoder 114.
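The roles of the scene change degree calculation unit 160 and the scene change determination unit 180 can be sketched in Python as follows. The text does not define how the difference of frequency components is measured, so the sum of absolute coefficient differences used here is an assumption, as are the function names, the data layout, and the sample values.

```python
def scene_change_degree(a, b):
    """Sum of absolute differences between two equally sized coefficient lists.

    The metric is an assumption; the source only speaks of a "difference".
    """
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_scene_changes(coded, threshold):
    """Find scene changes from per-picture coefficients given in encoding order.

    coded: list of (display_number, coefficients) tuples in encoding order.
    Returns the display numbers of pictures whose degree, measured against the
    preceding picture in display order, is equal to or greater than the threshold.
    """
    ordered = sorted(coded, key=lambda item: item[0])  # rearrange into display order
    hits = []
    for (_, prev), (num, cur) in zip(ordered, ordered[1:]):
        if scene_change_degree(prev, cur) >= threshold:
            hits.append(num)
    return hits


# Hypothetical two-coefficient features for I2, B0, B1, P5, B3, B4 (encoding order).
coded = [(2, [10, 0]), (0, [10, 1]), (1, [11, 0]),
         (5, [90, 4]), (3, [10, 1]), (4, [9, 0])]
print(detect_scene_changes(coded, threshold=50))  # → [5]
```

Sorting by display number before differencing is what keeps the out-of-order P picture from producing extra boundaries.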

The operation of each functional block of the scene change detection unit 120 is described with reference to the specific example shown in FIG. 3.
The reduced image generation unit 130 includes a partial variable length decoding unit 132, a partial inverse quantization unit 134, an average component restoration unit 136, an adder 138, a selector 140, a prediction pixel creation unit 142, a frame memory 144, and a reduced picture generation unit 146. The reduced image generation unit 130 operates in the same way as an ordinary MPEG-2 decoder, except that it decodes only the low-frequency components of each picture of the moving image represented by the MPEG-2 stream, so it is described only briefly here.

The partial variable length decoding unit 132 decodes, from the picture header, the data needed to decode the low-frequency components, such as the picture encoding type and the quantization matrix, and decodes the MB encoding type, motion vectors, and quantization width from each MB header.

The partial inverse quantization unit 134 and the average component restoration unit 136 decode the first DCT coefficient of each block. Specifically, for an intra MB they decode the DC coefficient prediction error, which corresponds to the frequency (0, 0) component, and for other MBs they decode the frequency (0, 0) component of the prediction error coefficients. For an intra MB, the DC coefficient is then decoded using the DC coefficient prediction value.

The adder 138 adds the output of the average component restoration unit 136 and the output of the selector 140 to obtain the average component of a block. The selector 140 outputs the intra prediction value (128 for an 8-bit image) for an intra MB and the prediction value created by the prediction pixel creation unit 142 for any other MB. That is, for an intra MB, the adder 138 adds the DC component from the average component restoration unit 136 and the intra prediction value from the selector 140 to obtain the average component of each block of the intra MB. For any other MB, it adds the frequency (0, 0) component of the prediction error coefficients from the average component restoration unit 136 and the prediction value, created by the prediction pixel creation unit 142 and passed through the selector 140, to obtain the average component of each block of the MB.
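The combined behavior of the selector 140 and the adder 138 for a single block can be sketched in Python as below. The function name and parameters are hypothetical, and the sketch assumes the restored component has already been scaled to pixel units; the value 128 for 8-bit images comes from the text.

```python
INTRA_PREDICTION_8BIT = 128  # intra prediction value for 8-bit images, per the text


def block_average(restored_component, is_intra, predicted_value=None):
    """Return the average component of one block.

    restored_component: output of the average component restoration step,
        assumed here to be in pixel units already.
    predicted_value: motion-compensated prediction, required for non-intra blocks.
    """
    if is_intra:
        selected = INTRA_PREDICTION_8BIT  # selector picks the fixed intra value
    else:
        if predicted_value is None:
            raise ValueError("non-intra blocks need a motion-compensated prediction")
        selected = predicted_value  # selector picks the prediction from the reference
    return restored_component + selected  # adder


print(block_average(-28, is_intra=True))                      # → 100
print(block_average(5, is_intra=False, predicted_value=97))   # → 102
```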

For MBs other than intra MBs, the prediction pixel creation unit 142 creates a prediction value using the motion vector obtained by the partial variable length decoding unit 132, reduced to 1/8, and the forward reference image or backward reference image stored in the frame memory 144. The motion vector is reduced to 1/8 to match the reduction of each block of 8 pixels × 8 lines to a single pixel consisting of its average component.

For each picture, the reduced picture generation unit 146 collects the average components of the blocks output from the adder 138 to generate the reduced image of that picture. Because the block size is 8 pixels × 8 lines, the image created by the reduced picture generation unit 146 is 1/8 × 1/8 the size of the fully decoded image. For example, when each picture of the input MPEG-2 stream is 1920 pixels × 1080 lines, the reduced image output from the reduced picture generation unit 146 is 240 pixels × 136 lines for luminance (Y) and 120 pixels × 68 lines for each color difference signal (Cb, Cr).
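The sizes in the example above can be checked with a short Python calculation. The sketch assumes that the coded picture is padded up to a multiple of the 16-pixel macroblock size (1080 lines become 1088), which is one way to account for the 136 lines quoted in the text, and that chroma is subsampled by two in both directions (4:2:0). The function name is hypothetical.

```python
def reduced_size(width, height, block=8, macroblock=16):
    """Return ((Y width, Y lines), (C width, C lines)) of the reduced image for 4:2:0."""
    def pad(n):
        # Round up to a whole number of macroblocks (assumed padding rule).
        return -(-n // macroblock) * macroblock

    coded_w, coded_h = pad(width), pad(height)
    luma = (coded_w // block, coded_h // block)              # one pixel per 8x8 block
    chroma = (coded_w // 2 // block, coded_h // 2 // block)  # 4:2:0 halves both axes
    return luma, chroma


print(reduced_size(1920, 1080))  # → ((240, 136), (120, 68))
```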

The reduced picture generation unit 146 sequentially outputs the generated reduced images to the frequency component acquisition unit 150. For I pictures and P pictures, it also outputs the reduced images to the frame memory 144. These reduced images are used by the prediction pixel creation unit 142 as the forward reference image or backward reference image of subsequent P pictures or B pictures.

As shown in FIG. 3, the reduced image generation unit 130 outputs the reduced images in the same order as they appear in the MPEG-2 stream, that is, in encoding order.

The frequency component acquisition unit 150 includes a blocking unit 152 and a forward DCT conversion unit 154. It extracts the frequency components of each reduced image from the reduced picture generation unit 146 and supplies them to the scene change degree calculation unit 160.

The blocking unit 152 reduces each of the Y, Cb, and Cr planes of the reduced image to a block of 8 pixels × 8 pixels and outputs the blocks to the forward DCT conversion unit 154. For example, when the input MPEG-2 stream is 1920 pixels × 1080 lines, Y of the reduced image is divided into regions of 30 pixels × 16 lines and the average value of each region is obtained; Cb and Cr are each divided into regions of 15 pixels × 8 lines and the average value of each region is obtained.
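
The region averaging can be sketched as follows. The function name `to_8x8` and the test plane are illustrative. With the figures quoted, 8 regions of 16 lines cover 128 of the 136 luminance lines; this sketch simply ignores the remaining rows, which is an assumption, since the text does not say how they are treated.

```python
def to_8x8(plane, region_w, region_h):
    """Average region_w x region_h regions of a plane into an 8 x 8 block."""
    block = []
    for by in range(8):
        row = []
        for bx in range(8):
            total = 0
            for y in range(by * region_h, (by + 1) * region_h):
                for x in range(bx * region_w, (bx + 1) * region_w):
                    total += plane[y][x]
            row.append(total / (region_w * region_h))
        block.append(row)
    return block

# Hypothetical 240 x 136 luminance plane holding a horizontal ramp
luma_plane = [[x for x in range(240)] for _ in range(136)]
luma_block = to_8x8(luma_plane, 30, 16)
```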

The forward DCT conversion unit 154 performs an 8 × 8 DCT and, for each reduced image, obtains 8 × 8 transform coefficients (DCT coefficients) for each of Y, Cb, and Cr, which it outputs to the scene change degree calculation unit 160. That is, in the present embodiment, the frequency component acquisition unit 150 acquires DCT coefficients as the frequency components.
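
For reference, a direct (unoptimized) 8 × 8 forward DCT might look like the sketch below. It uses the orthonormal DCT-II scaling, which is an assumption; the text does not specify the normalization, and a real implementation would normally use a fast algorithm. The name `dct_8x8` is hypothetical.

```python
import math

def dct_8x8(block):
    """2-D DCT-II of an 8 x 8 block, returned as coeffs[m][n] (orthonormal scaling)."""
    def scale(k):
        return math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)

    coeffs = [[0.0] * 8 for _ in range(8)]
    for m in range(8):        # vertical frequency index
        for n in range(8):    # horizontal frequency index
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * m * math.pi / 16)
                              * math.cos((2 * x + 1) * n * math.pi / 16))
            coeffs[m][n] = scale(m) * scale(n) * total
    return coeffs

# A flat block has energy only in the (0, 0) coefficient
flat = [[100.0] * 8 for _ in range(8)]
flat_coeffs = dct_8x8(flat)
```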

As shown in FIG. 3, the frequency component acquisition unit 150 also outputs the frequency components in encoding order.

The scene change degree calculation unit 160 includes a buffer 162, a buffer 164, a selector 166, a buffer 168, and a difference calculation unit 170.

The buffer 162, the buffer 164, and the selector 166 cooperate to rearrange the frequency components output from the frequency component acquisition unit 150 into display order and output them to the difference calculation unit 170 and the buffer 168. As shown in FIG. 3, the selector 166 outputs the frequency components in display order.
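
The reordering from encoding order to display order can be sketched with the usual MPEG-2 rule: a B picture is output as soon as it arrives, while an I or P picture is held until the next I or P picture arrives. This sketch uses a single holding slot as a stand-in for the two buffers and the selector described above; the function name and the tuple layout are hypothetical.

```python
def to_display_order(coded):
    """Reorder (picture_type, payload) items from encoding order to display order."""
    held = None                    # most recent I/P picture, waiting for its B pictures
    for ptype, payload in coded:
        if ptype == "B":
            yield ptype, payload   # B pictures precede the held reference in display order
        else:
            if held is not None:
                yield held         # the previous reference picture can now be output
            held = (ptype, payload)
    if held is not None:
        yield held                 # flush the last reference picture

# Payloads here are display-order picture numbers, used only to check the result
coded = [("I", 2), ("B", 0), ("B", 1), ("P", 5), ("B", 3), ("B", 4)]
display = [number for _, number in to_display_order(coded)]
```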

The difference calculation unit 170 obtains the difference between the frequency components of the reduced image currently output from the selector 166 and the frequency components of the immediately preceding reduced image stored in the buffer 168, and outputs the difference to the scene change determination unit 180. As shown in FIG. 3, the buffer 168 outputs the frequency components of the reduced image one before the reduced image whose frequency components the selector 166 is currently outputting.

In the present embodiment, the difference calculation unit 170 weights the frequency components so that the weighting coefficient becomes larger as the frequency becomes lower, and then obtains the difference in frequency components between the two reduced images. Specifically, the difference is obtained according to equation (1) below. Although there are 64 frequency components, the weighting coefficients of the high-frequency components can be set to 0 to reduce the number of frequency components that must be stored.

$$\text{Difference} = \sum_{m,n} w(m,n)\,\lvert F_i(m,n) - F_j(m,n) \rvert \tag{1}$$

where
$F_i(m,n)$: frequency $(m,n)$ component of reduced image $i$
$F_j(m,n)$: frequency $(m,n)$ component of reduced image $j$
$w(m,n)$: weighting coefficient for frequency $(m,n)$
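
Equation (1) translates directly into code. In the sketch below, `scene_change_degree` evaluates the weighted sum of absolute differences, and `example_weights` is a purely hypothetical weight table that decreases with frequency and is zero from a cutoff upward; the actual weight values are not given in the text.

```python
def scene_change_degree(f_i, f_j, weights):
    """Weighted sum of absolute differences between two 8 x 8 coefficient sets."""
    return sum(weights[m][n] * abs(f_i[m][n] - f_j[m][n])
               for m in range(8) for n in range(8))

def example_weights(cutoff=4):
    """Hypothetical weights: larger at low frequency, zero where m + n >= cutoff."""
    return [[1.0 / (1 + m + n) if m + n < cutoff else 0.0 for n in range(8)]
            for m in range(8)]

f_a = [[0.0] * 8 for _ in range(8)]
f_b = [[0.0] * 8 for _ in range(8)]
f_b[0][0] = 10.0    # low-frequency change, weight 1.0
f_b[7][7] = 100.0   # high-frequency change, weight 0.0
degree = scene_change_degree(f_a, f_b, example_weights())
```

Because the high-frequency weights are zero, only the coefficients below the cutoff would need to be stored for the comparison.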

The scene change determination unit 180 compares each scene change degree calculated by the scene change degree calculation unit 160 with a threshold in turn. When the scene change degree is equal to or greater than the threshold, it determines that a scene change has occurred and outputs scene change detection information indicating the location to the H.264 encoder 114.
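
The threshold comparison amounts to the following sketch. The function name, the sample values, and the threshold of 50 are illustrative only; the text does not give a threshold value.

```python
def detect_scene_changes(degrees, threshold):
    """Return the indices whose scene change degree is at or above the threshold."""
    return [index for index, degree in enumerate(degrees) if degree >= threshold]

# Hypothetical scene change degrees in display order
degrees = [1.5, 2.0, 50.0, 3.0]
changes = detect_scene_changes(degrees, threshold=50.0)
```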

According to the recording apparatus 100 of the present embodiment, the frequency components of the reduced images generated from the moving image data are acquired and rearranged into display order, so that a scene change can be detected from the difference in frequency components between two frames that are consecutive in display order. Scene changes can therefore be detected reliably.

FIG. 4 shows the number of MBs of each type obtained for each picture of a stream of 1920-pixel × 1080-line images. For ease of understanding, the horizontal axis shows picture numbers in display order. This stream is of the "M = 3" type, with I pictures at picture numbers 3, 18, 33, 48, and 51. The stream contains a white fade-out from about picture 43 to about picture 53, and a scene change occurs at the P picture numbered 57.

As shown in the figure, the numbers of intra MBs and predicted MBs do not clearly distinguish the fade-out from the scene change. For example, in the P pictures numbered 45 and 54, the number of intra MBs rises to a majority, about 4000 and 5400 respectively, so the prior art may regard them as scene changes. In the B picture numbered 52, the numbers of backward and bidirectional MBs are also smaller than usual, so it too may be regarded as a scene change. Furthermore, in the last B picture of a GOP, such as pictures 17 and 32, the numbers of forward-predicted and bidirectionally predicted MBs are smaller than usual, so these pictures may likewise be regarded as scene changes.

FIG. 5 shows the frequency component differences obtained by the difference calculation unit 170 when the recording apparatus 100 of the present embodiment processes the same stream. As the figure shows, the difference at the scene change stands out clearly from the differences calculated elsewhere, including the white fade-out. The scene change is therefore detected with high accuracy. The white fade-out also produces larger differences than the other portions, so it can be identified and used as encoding information during recompression.

<Second Embodiment>

The second embodiment of the present invention is also a recording apparatus. It has the same configuration as the recording apparatus 100 except that its scene change detection unit differs from the scene change detection unit 120 of the recording apparatus 100. Only the scene change detection unit 220 of the recording apparatus according to the second embodiment is described here.

FIG. 6 shows the scene change detection unit 220. In FIG. 6, components identical to those of the scene change detection unit 120 shown in FIG. 2 are given the same reference numerals, and their description is omitted.

As shown in FIG. 6, the scene change degree calculation unit 260 of the scene change detection unit 220 differs from the scene change degree calculation unit 160 of the scene change detection unit 120. The scene change degree calculation unit 260 is the same as the scene change degree calculation unit 160 except that a calculator 262 is added between the selector 166 and the buffer 168.

The calculator 262 receives the output of the selector 166 (the frequency component [i] of the current picture) and the output of the buffer 168 (the past frequency component [i]), calculates their average (the frequency component [i]), and outputs it to the buffer 168. In other words, the calculator 262 and the buffer 168 form an infinite impulse response filter in the time direction for each frequency component output from the selector 166. The processing performed by the calculator 262 and the buffer 168 is given by equation (2).

$$\text{frequency component}[i] = \frac{\text{frequency component of current picture}[i] + \text{past frequency component}[i]}{2} \tag{2}$$

In the recording apparatus 100 of the first embodiment, the scene change degree calculation unit 160 obtains the difference between the frequency components of the reduced image currently output from the selector 166 (the current picture frequency component [i]) and the frequency components of the immediately preceding reduced image stored in the buffer 168. In some streams, however, brightness or color tone changes within a short time, for example because the lighting changes. In such a stream the low-frequency components fluctuate from picture to picture, and even the recording apparatus 100 may falsely detect a scene change. The recording apparatus of the second embodiment is designed to handle such streams: its scene change degree calculation unit 260 applies an infinite impulse response filter in the time direction to each frequency component output by the selector 166 before outputting it to the difference calculation unit 170.

In this way, the difference is taken not against the frequency components of the single immediately preceding picture but against frequency components that also reflect earlier pictures, which improves the accuracy of scene change detection even for images in which the lighting changes.
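
The second embodiment can be sketched as a running average that stands in for the calculator 262 and the buffer 168. Each picture is compared against the filtered history, and the history is then updated by averaging it with the current picture. Seeding the history with the first picture, flattening the coefficients into one list per picture, and the function name are assumptions made for illustration.

```python
def filtered_degrees(components, weights):
    """Scene change degrees against a time-direction IIR-filtered history.

    components : list of per-picture frequency components (flat lists), display order
    weights    : one weighting coefficient per frequency component
    """
    history = None
    degrees = []
    for current in components:
        if history is None:
            history = list(current)   # seed with the first picture (assumption)
            continue
        degrees.append(sum(w * abs(c - h)
                           for w, c, h in zip(weights, current, history)))
        # Average of the current picture and the past value: the IIR update
        history = [(c + h) / 2 for c, h in zip(current, history)]
    return degrees

# One frequency component per picture: a step from 0 to 8
steps = filtered_degrees([[0.0], [8.0], [8.0]], [1.0])
```

After the step, the degree decays (8, then 4) because the history moves only halfway toward the new level with each picture.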

The present invention has been described above on the basis of the embodiments. The embodiments are examples, and various changes, additions, and omissions may be made to them without departing from the gist of the present invention. Those skilled in the art will understand that variations with such changes, additions, or omissions also fall within the scope of the present invention.

For example, a video recording apparatus detects the GOP in which a scene change has occurred in order to search for scenes in recorded video or to set the starting point of video editing. Video is often edited in GOP units and not necessarily in picture units. Therefore, when detecting scene changes for editing, reduced images can be generated and frequency components extracted in the same manner as in the scene change detection unit 120 or the scene change detection unit 220, and the GOP in which a scene change has occurred can be detected by obtaining the difference in frequency components between the reduced images of the leading I pictures of two consecutive GOPs and comparing it with a threshold. I pictures are chosen because they are simpler to partially decode than P pictures or B pictures.
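
A GOP-level variant might be sketched as follows: only the first I picture of each GOP is examined, and its frequency components are compared with those of the previous GOP's leading I picture. The tuple layout, the function names, the unweighted difference, and the choice to report the later GOP of each pair are all assumptions made for illustration.

```python
def changed_gops(pictures, degree_fn, threshold):
    """Return indices of GOPs whose leading I picture differs from the previous GOP's.

    pictures  : iterable of (gop_index, picture_type, components) in stream order
    degree_fn : function returning the scene change degree of two component lists
    """
    previous = None
    seen = set()
    hits = []
    for gop, ptype, components in pictures:
        if ptype != "I" or gop in seen:
            continue               # only the leading I picture of each GOP is used
        seen.add(gop)
        if previous is not None and degree_fn(previous, components) >= threshold:
            hits.append(gop)       # report the later GOP of the pair (assumption)
        previous = components
    return hits

def plain_degree(a, b):
    """Unweighted sum of absolute differences, used here only as a stand-in."""
    return sum(abs(x - y) for x, y in zip(a, b))

stream = [(0, "I", [0.0]), (0, "P", [9.0]), (1, "I", [1.0]), (2, "I", [20.0])]
gops = changed_gops(stream, plain_degree, threshold=10.0)
```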

A video recording apparatus may also decode and play back one stream while recording another, or record a plurality of streams at once. In these cases, because of the performance limits of the MPEG-2 decoder, it may not be possible to decode every stream being recorded to video, that is, to recompress it. The input stream is then recorded as it is and recompressed later, but even so it is convenient to have scene changes detected in advance for video editing. To that end, scene change detection can be performed while the input stream is being recorded, in the same manner as in the scene change detection unit 120, and the scene change information attached to the stream as auxiliary information. Whether implemented in hardware or software, the method of the present invention requires less processing than an MPEG-2 decoder and is easy to implement.

100 Recording apparatus
110 Recording unit
112 MPEG-2 decoder
114 H.264 encoder
120 Scene change detection unit
130 Reduced image generation unit
132 Partial variable length decoding unit
134 Partial inverse quantization unit
136 Average component restoration unit
138 Adder
140 Selector
142 Prediction pixel creation unit
144 Frame memory
146 Reduced picture generation unit
150 Frequency component acquisition unit
152 Blocking unit
154 Forward DCT conversion unit
160 Scene change degree calculation unit
162 Buffer
164 Buffer
166 Selector
168 Buffer
170 Difference calculation unit
180 Scene change determination unit
220 Scene change detection unit
260 Scene change degree calculation unit
262 Calculator

Claims

1. An image processing apparatus comprising:
a reduced image generation unit that acquires reduced images of a plurality of pictures constituting a moving image from moving image data compressed by a compression method using inter-frame or inter-field motion compensation;
a frequency component acquisition unit that extracts the frequency components of each of the plurality of reduced images obtained by the reduced image generation unit; and
a scene change degree calculation unit that rearranges the frequency components of the plurality of reduced images into display order and obtains, as a scene change degree indicating the likelihood of a scene change, the difference between the frequency components acquired by the frequency component acquisition unit for each pair of consecutive reduced images or for the reduced images of the leading I pictures of each pair of consecutive GOPs.

2. The image processing apparatus according to claim 1, wherein the scene change degree calculation unit obtains the difference after weighting the frequency components acquired by the frequency component acquisition unit so that the weighting coefficient increases as the frequency decreases.

3. The image processing apparatus according to claim 1, wherein the scene change degree calculation unit obtains the difference after applying infinite impulse response filtering in the time direction to the frequency components rearranged into display order.

4. The image processing apparatus according to claim 1, wherein the frequency component acquisition unit performs a forward DCT on the plurality of reduced images and acquires DCT coefficients as the frequency components.

5. The image processing apparatus according to claim 4, wherein the moving image is compressed by a compression method that encodes orthogonal transform coefficients of blocks in a picture, or orthogonal transform coefficients of prediction error blocks obtained by inter-frame or inter-field motion compensation with reference to past and/or future pictures, and
the reduced image generation unit obtains the reduced images by partially decoding the orthogonal transform coefficients of each picture of the moving image.

6. The image processing apparatus according to any one of claims 1 to 5, further comprising a scene change determination unit that determines that a scene change has occurred when the scene change degree calculated by the scene change degree calculation unit is equal to or greater than a predetermined threshold.