JP2008005536A

JP2008005536A - Method and apparatus for detecting scene cuts in block-based video coding system

Info

Publication number: JP2008005536A
Application number: JP2007211889A
Authority: JP
Inventors: Prashanth Kuchibhotla
Original assignee: MediaTek Inc
Current assignee: MediaTek Inc
Priority date: 1996-02-26
Filing date: 2007-08-15
Publication date: 2008-01-10

Abstract

PROBLEM TO BE SOLVED: To provide a method and an apparatus for detecting scene cuts.

SOLUTION: A scene cut detector compares predicted macroblocks from an anchor image 140 with input macroblocks from an input image 120 on a macroblock-by-macroblock basis, generating a residual macroblock 142 that represents the difference between each predicted macroblock and the corresponding input macroblock. After each comparison, a variance 146 of the residual macroblock and a variance 148 of the input macroblock are computed, and the residual variance is compared with the input macroblock variance. Whenever the variance of the residual macroblock exceeds the variance of the input macroblock, a counter 152 is incremented. The scene cut detector repeats this process until every macroblock of the predicted image has been compared with its corresponding input macroblock. If the count value exceeds a threshold level 154 while an input image is being processed, the scene cut detector sets a scene cut indicator flag.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to block-based video coding techniques and, more particularly, to a method and apparatus for detecting scene cuts in a video sequence within a block-based video coding system.

Block-based video coding systems typically use coding techniques that exploit both spatial redundancy within an image (intra-picture) and temporal redundancy between the images of a sequence (inter-picture). Such block-based image coding systems include those that use the well-known Moving Picture Experts Group (MPEG) video coding standards, namely ISO/IEC International Standard 11172-2 (1994) (generally referred to as MPEG-1) and 13818-2 (draft of January 20, 1995) (generally referred to as MPEG-2). To exploit the redundancy in an input video sequence and efficiently code it into a transmittable bitstream, block-based coding techniques assume that successive pictures in the input video sequence contain substantially similar information, that is, that the image scene changes very little from picture to picture. A scene cut occurring in the picture sequence violates this basic assumption underlying efficient coding. Consequently, after a scene change (scene cut), a block-based coding technique must use a large number of bits to code the first picture following the change. Since the number of bits available for coding any one image is usually limited, a scene cut can cause substantial coding errors, resulting in substantial distortion of the decoded picture.

Accordingly, there is a need in the art for a method and apparatus that detect the occurrence of a scene cut before a picture is coded, so that the coding system can take appropriate measures to avoid substantial coding errors.

The disadvantages of the prior art are overcome by an embodiment of the present invention: a scene cut detector that can be incorporated into a conventional block-based video coding system. The scene cut detector compares predicted macroblocks from a predicted image with input macroblocks from an input image on a macroblock-by-macroblock basis, generating residual macroblocks that represent the difference between each predicted macroblock and the corresponding input macroblock. After each comparison, the variance of the residual macroblock and the variance of the input macroblock are computed. These two variances are compared by means of a decision function, and a counter is incremented depending on the result of that comparison. The scene cut detector repeats this process until every macroblock of the predicted image has been compared with its corresponding input macroblock. If the count value exceeds a threshold while the input image is being processed, the scene cut detector determines that the input image belongs to a new scene and sets a scene cut indicator flag accordingly.

The teachings of embodiments of the present invention can be readily understood by considering the following description in conjunction with the accompanying drawings.

FIG. 1 shows a block diagram of a block-based coding system 100 (specifically, an MPEG encoder) incorporating an embodiment of the present invention. The input signal to the system at port 102 is a preprocessed image that has been divided into a plurality of blocks, which are provided sequentially as input to the system. In the MPEG standards, these blocks of pixels are commonly known as macroblocks, e.g., 16×16 pixel blocks. Although the following disclosure uses MPEG terminology, the term macroblock is intended to describe a block of pixels of any size that is used as the basis for motion compensation.

The system computes a sequence of predicted macroblocks (P) from the system output signal. Illustratively, each predicted macroblock is generated by decoding the output signal in the same way that a receiver of the transmitted output signal decodes the received signal. A subtractor 106 subtracts the predicted macroblock from the input macroblock to generate a residual signal on path 107 (in the art, also simply called a residual or residual macroblock).

If the predicted macroblock is substantially similar to the input macroblock, the residual is relatively small and is easily coded using few bits. In such a scenario, the input macroblock is said to be motion compensated. However, if the difference between the predicted macroblock and the input macroblock is large, the residual is difficult to code. In that case it is easier for the system to code the input macroblock directly rather than the motion-compensated residual macroblock. This choice is known as coding mode selection. Coding the input macroblock (I) is called intra coding, and coding the residual is called inter coding. The choice between these two modes is known as the intra-inter decision (IID).

The IID is made by IID circuit 110, which sets coding mode switch 108. The IID is computed by first calculating the variance of the residual macroblock (Var R) and the variance of the input macroblock (Var I), and the coding decision is based on these values. Several functions can be used to make this decision. For example, if Var R is less than Var I, the IID selects inter mode. Conversely, if Var I is less than Var R, the IID selects intra mode.
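As an illustration of the simple decision rule above, the following sketch computes Var R and Var I and applies the direct comparison. It is a minimal sketch under assumptions not stated in the source: a macroblock is a list of rows of integer samples, "variance" means the population variance over all samples in the block, and a tie (Var R equal to Var I) is resolved as intra. The function names are hypothetical.

```python
# Minimal sketch of the simple intra/inter decision (IID).
# Assumptions (not from the source): macroblocks are lists of rows of ints,
# variance is the population variance, ties are resolved as intra.

def block_variance(block):
    """Population variance of all samples in a 2-D block."""
    samples = [v for row in block for v in row]
    mean = sum(samples) / len(samples)
    return sum((v - mean) ** 2 for v in samples) / len(samples)


def residual(input_mb, predicted_mb):
    """Residual macroblock: input minus prediction, sample by sample."""
    return [[i - p for i, p in zip(irow, prow)]
            for irow, prow in zip(input_mb, predicted_mb)]


def simple_iid(input_mb, predicted_mb):
    """Return 'inter' if Var R < Var I, otherwise 'intra'."""
    var_r = block_variance(residual(input_mb, predicted_mb))
    var_i = block_variance(input_mb)
    return "inter" if var_r < var_i else "intra"
```

For example, a block that is predicted exactly has Var R = 0 and is estimated as inter, whereas a badly mismatched prediction inflates Var R above Var I and yields intra.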

The selected block is processed by discrete cosine transform (DCT) block 112. The DCT generates coefficients that represent its input signal. Quantizer 114 quantizes the coefficients to produce an output block at port 104. Rate control block 116 controls the quantization scale (step size) used to quantize the coefficients.

To generate correct predicted blocks and to achieve efficient half-pel motion vector generation, the encoder needs access to the decoded images. To provide this access, the output of quantizer 114 is passed through both inverse quantizer 118 and inverse DCT 120. Ideally, the output of the inverse DCT is identical to the input of DCT 112. In inter mode, a decoded macroblock is generated by summing the output of the inverse DCT and the predicted macroblock. In intra mode, the decoded macroblock is simply the output of the inverse DCT. The decoded macroblock is then stored in frame store 124, which accumulates these "reconstructed" macroblocks until they make up a complete reconstructed frame of image information. Motion vector predictor 126 uses the reconstructed frame to produce the motion vectors used to generate predicted macroblocks for the next input image.
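The local decoding loop described above can be sketched as follows. To keep the example short, it assumes the DCT and inverse DCT cancel exactly and omits them, modelling only the lossy quantize/dequantize step with a plain uniform quantizer; MPEG's actual quantization rules are not reproduced, and all names are hypothetical.

```python
# Sketch of the encoder's local decoding loop. Assumptions (not from the
# source): the DCT/IDCT pair is treated as an exact identity and omitted,
# and a plain uniform quantizer stands in for MPEG quantization.

def quantize(block, step):
    """Uniform quantization with the step size chosen by rate control.
    Note: Python's round() rounds exact halves to the nearest even level."""
    return [[round(v / step) for v in row] for row in block]


def dequantize(levels, step):
    """Inverse quantizer: scale the levels back by the step size."""
    return [[q * step for q in row] for row in levels]


def reconstruct(coded_block, step, mode, predicted_mb=None):
    """Rebuild the macroblock the way a receiver would decode it.

    coded_block is the input macroblock (intra) or the residual (inter).
    """
    decoded = dequantize(quantize(coded_block, step), step)
    if mode == "inter":
        # Inter mode: add the prediction back to the decoded residual.
        return [[d + p for d, p in zip(drow, prow)]
                for drow, prow in zip(decoded, predicted_mb)]
    # Intra mode: the decoded block is the reconstruction itself.
    return decoded
```

A larger step size (as selected by rate control) makes the reconstruction coarser, which is why the encoder must predict from this decoded copy rather than from the original input.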

To generate motion vectors, motion vector predictor 126 comprises three components: a full-pel motion estimator 128, a half-pel motion estimator 130, and a motion mode block 132. Full-pel motion estimator 128 is a "coarse" motion vector generator that finds a coarse match between a macroblock of a preceding image and the current input macroblock. The preceding image is called the anchor image. In the MPEG standards, anchor images are the I or P frames within an image sequence known as a group of pictures (GOP). A motion vector represents the relative position at which the coarse match between the two macroblocks was found. The coarse motion vector generator produces motion vectors accurate to one picture element (pel).
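A coarse full-pel estimator of the kind described above is commonly built as an exhaustive block-matching search. The sketch below is one such search, assuming a sum-of-absolute-differences (SAD) matching cost and a square search window; the source does not specify either (it defers to the methods incorporated by reference), so both are assumptions, as are the function names.

```python
# Sketch of an exhaustive full-pel block-matching search.
# Assumptions (not from the source): SAD matching cost, square search window,
# images as lists of rows, vector (dx, dy) points from the current block
# to the matching block in the anchor image.

def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of cur at (bx, by) and the block of ref
    displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_pel_search(cur, ref, bx, by, n, search_range):
    """Return the full-pel vector (dx, dy) with the lowest SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the anchor image.
            if not (0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]
```

The cost grows with the square of the search range, which is consistent with the description's remark that full-pel estimation is the computationally heavy stage.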

The accuracy of the full-pel motion estimator is improved by the half-pel motion estimator, which uses the full-pel motion vectors and the reconstructed macroblocks from frame store 124 to compute motion vectors with half-pel accuracy. The half-pel motion vectors are then sent to motion mode block 132. Typically there are multiple motion vectors for each macroblock, and mode block 132 selects the one that best represents the motion of each input macroblock.
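Half-pel refinement can be sketched as a search over the eight half-pel positions surrounding the full-pel vector, plus the full-pel position itself. The sketch assumes SAD as the matching cost and interpolates half-pel samples with a rounded bilinear average; these choices and the function names are illustrative assumptions, not details given in the source.

```python
# Sketch of half-pel refinement around a full-pel vector.
# Assumptions (not from the source): SAD cost, rounded bilinear averaging
# for half-pel samples, vectors returned in half-pel units.

def sample_half(ref, y2, x2):
    """Sample ref at half-pel coordinates (y2 / 2, x2 / 2)."""
    y0, x0 = y2 // 2, x2 // 2
    y1, x1 = y0 + y2 % 2, x0 + x2 % 2
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1] + 2) // 4


def half_pel_refine(cur, ref, bx, by, n, mv):
    """Try the 9 half-pel offsets around full-pel vector mv = (dx, dy).
    Returns the best vector in half-pel units."""
    h, w = len(ref), len(ref[0])
    best = None
    for oy in (-1, 0, 1):
        for ox in (-1, 0, 1):
            vx, vy = 2 * mv[0] + ox, 2 * mv[1] + oy
            # Every interpolated sample must lie inside the anchor image.
            if not (2 * bx + vx >= 0 and 2 * (bx + n - 1) + vx <= 2 * (w - 1)
                    and 2 * by + vy >= 0
                    and 2 * (by + n - 1) + vy <= 2 * (h - 1)):
                continue
            cost = sum(abs(cur[by + y][bx + x]
                           - sample_half(ref, 2 * (by + y) + vy,
                                         2 * (bx + x) + vx))
                       for y in range(n) for x in range(n))
            if best is None or cost < best[0]:
                best = (cost, vx, vy)
    return best[1], best[2]
```

For content shifted by half a pixel horizontally, the refinement returns (1, 0) in half-pel units where the full-pel search could only return (0, 0).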

The full-pel estimator is a more computationally intensive task than the half-pel estimator. For this reason, in many implementations it is computed independently in dedicated hardware, and all full-pel motion vectors are often computed before half-pel processing begins.

The MPEG encoder system described above is a conventional system available as a set of integrated circuits, model L64120, from LSI Logic, Inc. of Milpitas, California. Importantly, this MPEG encoder stores the full-pel motion vectors for an entire frame before the half-pel estimator begins operating.

Motion estimation and motion compensation rest on the basic assumption that the current picture differs little from the previously occurring picture (the anchor image). When a scene change (also called a scene cut) occurs, however, the anchor picture is substantially different from the current picture. The predicted macroblocks are therefore very inaccurate and the residuals are large. As a result, for most input macroblocks of the picture, the IID selects coding of the input macroblock (intra mode) instead of coding of the residual (inter mode). It should be emphasized that this coding decision also occurs in the absence of scene changes, and that a normally coded picture may contain a mixture of intra-coded and inter-coded macroblocks. However, the proportion of intra-coded macroblocks increases greatly when a scene cut occurs. The scene cut detector of an embodiment of the present invention analyzes all macroblocks of a picture and then determines whether a scene cut has occurred. It does so by counting the number of intra-coded macroblocks and comparing the count with a threshold. Specifically, if the proportion of intra-coded macroblocks in any given frame exceeds the threshold, that frame is deemed to follow a scene cut.
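The frame-level rule can be written as a single test on the fraction of intra decisions. This is a minimal sketch assuming the per-macroblock decisions are already available as strings; the default threshold of one third follows the value given later in this description, and the function name is hypothetical.

```python
# Sketch of the frame-level scene cut rule.
# Assumption (not from the source): decisions are 'intra'/'inter' strings.

def frame_follows_scene_cut(decisions, threshold=1 / 3):
    """True if the fraction of intra decisions in the frame exceeds threshold."""
    intra = sum(1 for d in decisions if d == "intra")
    return intra / len(decisions) > threshold
```

A normal frame with a few intra macroblocks stays below the threshold, while a frame after a scene cut, where most macroblocks are intra, exceeds it.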

In a typical MPEG encoder, the actual IID is made after the half-pel motion vectors have been generated and the best motion vector has been selected. Because full-pel estimator 128 generates the motion vectors for an entire frame before the encoder codes the first macroblock, the inventive scene cut detector 134, which monitors these full-pel results, can generate an IID estimate for every macroblock, that is, an estimate of the decision the IID would make when actually analyzing the residual. The scene cut detector comprises an IID estimator 136 connected in series with an intra-coded macroblock counter 138. Counter 138 generates a scene cut decision (flag) indicating that the scene cut detector has determined that a scene cut has occurred.

FIG. 2 shows a detailed block diagram of the inventive scene cut detector 134 of MPEG encoder 100. Full-pel motion estimator 128 is provided with the input macroblocks (I) together with the appropriate I or P anchor image in which the predicted macroblocks are to be found. The anchor image is stored in frame memory 140. Full-pel motion vector generator 141 generates a motion vector for each input macroblock using any of many well-known methods, including those disclosed in U.S. Pat. No. 5,351,095, issued September 27, 1994, and U.S. patent application Ser. No. 08/300,023, filed September 2, 1994, both incorporated herein by reference. Using the full-pel motion vectors and the anchor image from anchor image store 140, motion compensator 145 generates a predicted macroblock (P) for each input macroblock (I).
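Motion compensator 145 can be sketched as a block copy from the anchor image at the position displaced by the full-pel motion vector. This assumes images are lists of rows and that the displaced block lies inside the anchor image (an error is raised otherwise); the function name is hypothetical.

```python
# Sketch of full-pel motion compensation (motion compensator 145).
# Assumptions (not from the source): images are lists of rows and the
# vector (dx, dy) is applied to the block's top-left corner (bx, by).

def predict_macroblock(anchor, bx, by, mv, n=16):
    """Copy the n x n anchor block displaced by the full-pel vector mv."""
    dx, dy = mv
    h, w = len(anchor), len(anchor[0])
    if not (0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
        raise ValueError("displaced block falls outside the anchor image")
    return [row[bx + dx:bx + dx + n] for row in anchor[by + dy:by + dy + n]]
```

Because no interpolation is involved at full-pel accuracy, the prediction is an exact copy of anchor samples.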

The input macroblocks (path 120) and the predicted macroblocks (P) form the inputs to scene cut detector 134. IID estimator 136 computes the full-pel residual by subtracting the predicted macroblock from the input macroblock (subtractor 142). The IID estimator then uses variance blocks 146 and 148 to compute the variance of the input macroblock (Var I) and the variance of the full-pel residual (Var R). IID circuit 150 then makes its IID estimate based on these variances. Counter 152 counts the number of intra-mode decisions, and block 154 compares the count with a count threshold. If the count exceeds the threshold while a given picture is being processed, the intra MB counter generates a scene cut decision flag.
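The data flow of FIG. 2 can be sketched end to end as follows. The sketch assumes lists of rows, population variance, and a decision rule in the form recited in claim 1 with hypothetical parameter values; it returns as soon as the intra count exceeds the threshold, mirroring the statement that the flag may be raised while a picture is still being processed.

```python
# Sketch of scene cut detector 134 (FIG. 2). Assumptions (not from the
# source): macroblocks are lists of rows of ints, variance is the population
# variance, and the default decision parameters are placeholders.

def _variance(block):
    samples = [v for row in block for v in row]
    mean = sum(samples) / len(samples)
    return sum((v - mean) ** 2 for v in samples) / len(samples)


def estimate_intra(var_r, var_i, var_r_threshold=0.0, shift=0.0):
    """IID estimate in the form of claim 1; True means 'intra'.
    With the default (hypothetical) parameters this reduces to
    Var R > 0 and Var R > Var I."""
    return var_r > var_r_threshold and var_r > var_i + shift


def detect_scene_cut(input_mbs, predicted_mbs, threshold_fraction=0.33,
                     decide=estimate_intra):
    """Return True (scene cut flag set) as soon as the intra count exceeds
    threshold_fraction of the macroblocks in the picture."""
    threshold = threshold_fraction * len(input_mbs)
    count = 0
    for inp, pred in zip(input_mbs, predicted_mbs):
        # Subtractor 142: full-pel residual = input - prediction.
        res = [[i - p for i, p in zip(irow, prow)]
               for irow, prow in zip(inp, pred)]
        # Variance blocks 146 and 148.
        var_r, var_i = _variance(res), _variance(inp)
        # IID circuit 150 feeding counter 152.
        if decide(var_r, var_i):
            count += 1
            # Block 154: the flag can be raised before the picture is finished.
            if count > threshold:
                return True
    return False
```

Passing a different `decide` callable lets the same loop use any of the decision functions discussed for FIG. 3.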

FIG. 3 shows a graph 300 of typical decision functions used by the IID estimator to compare Var R with Var I. The simplest decision function is linear function 302, implemented with a comparator so that the IID estimate is intra mode (region 310) when Var I is less than Var R. Conversely, when Var R is less than Var I, the IID estimate is inter mode (region 312).

Although simple, however, a linear function does not give the best results. Nonlinear function 304 is therefore more representative of a typical function. It is vertical at a particular value 306 of Var R and linear thereafter. In operation, macroblocks with relatively small Var R values are coded in inter mode, and all Var R values greater than value 306 are compared using the direct comparison of function 302.

Because the half-pel motion estimator defines motion vectors more accurately, a macroblock whose Var R is only slightly larger than its Var I, for example a point close to but just below curve 302 (in the intra region), may shift above the curve (into the inter region) once the more accurate half-pel estimate is obtained. The IID of the MPEG encoder would then use inter-mode coding, while the IID estimate of the scene cut detector would nonetheless predict intra-mode coding. To compensate for this anomaly, function 308 is generally used. Function 308 is similar in shape to function 304 but is shifted slightly below it. Inaccurate IID estimates are thus avoided: values that would lie near the curve now fall in the inter-mode region.
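The three decision functions of FIG. 3 can be expressed as follows. The breakpoint value 306 and the shift applied by function 308 are not given numerically in the source, so V306 and SHIFT below are hypothetical placeholders; the form of function 308 (Var R above a first threshold and above Var I plus a shift value) follows the claims, and this sketch assumes that the claims' first threshold corresponds to breakpoint 306.

```python
# Sketch of the FIG. 3 decision functions; each returns 'intra' or 'inter'.
# V306 and SHIFT are hypothetical placeholder values, not from the source.

V306 = 100.0   # assumed Var R breakpoint 306
SHIFT = 50.0   # assumed shift of function 308


def decide_302(var_r, var_i):
    """Linear function 302: intra whenever Var I < Var R."""
    return "intra" if var_i < var_r else "inter"


def decide_304(var_r, var_i, v306=V306):
    """Nonlinear function 304: small Var R is always inter; above the
    breakpoint, fall back to the direct comparison of function 302."""
    if var_r <= v306:
        return "inter"
    return decide_302(var_r, var_i)


def decide_308(var_r, var_i, v306=V306, shift=SHIFT):
    """Function 308: like 304, but the intra boundary is shifted so that
    borderline blocks are estimated as inter (form recited in claim 1)."""
    if var_r <= v306:
        return "inter"
    return "intra" if var_r > var_i + shift else "inter"
```

With these placeholder values, a macroblock with Var R = 120 and Var I = 100 is estimated as intra by functions 302 and 304 but as inter by function 308, which is the borderline case the shift is meant to absorb.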

Of course, the functions shown in FIG. 3 are merely illustrative; other linear and nonlinear functions may be used to obtain particular results. Embodiments of the invention encompass any function used within the IID estimator.

During a normal sequence of macroblocks (no scene cut), the predicted macroblocks will mostly yield inter-mode decisions even at full-pel accuracy, so the scene cut flag will not be set. Note that the proportion of intra-mode decisions made after half-pel refinement is lower still, owing to the added accuracy of the motion vectors.

When a scene cut occurs, it does not matter whether the system generates motion vectors with half-pel or full-pel accuracy: motion estimation is inaccurate in either case, and a substantial number of intra-mode macroblocks is selected at both the full-pel and half-pel stages. A coarse count of intra-mode decisions from an IID estimator operating at full-pel accuracy is therefore sufficient for detecting scene cuts.

The threshold is generally set so that it is exceeded when 33% (one third) of the estimated decisions are intra-mode decisions. Of course, the threshold can be set to any proportion required by the expected content of the image sequence.
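As a worked example of this threshold, the sketch below converts the 33% setting into a macroblock count for a given frame size, assuming 16×16 macroblocks with partial blocks at the frame edge counted as whole macroblocks. The frame size used in the usage note is only an illustrative assumption, as are the function names.

```python
# Sketch converting the percentage threshold into a macroblock count.
# Assumptions (not from the source): 16 x 16 macroblocks, partial edge
# blocks rounded up to whole macroblocks.

def macroblock_count(width, height, mb_size=16):
    """Number of macroblocks per frame (ceiling division on each axis)."""
    return -(-width // mb_size) * -(-height // mb_size)


def intra_count_threshold(width, height, percent=33):
    """Count value the intra counter must exceed to set the flag."""
    return macroblock_count(width, height) * percent / 100
```

For a 720x480 frame there are 45 × 30 = 1350 macroblocks, so with the 33% setting the flag is set once more than 445.5, that is at least 446, macroblocks are estimated as intra.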

In addition to detecting scene cuts, embodiments of the invention are also useful for detecting pictures that will not code well. For example, when an image scene changes unpredictably, as happens when an object emerges from or disappears behind another object in the scene, two successive pictures of the sequence representing the scene will be substantially different. For such images, the encoder may exceed the coding bit budget for the picture sequence, that is, it may code the image using more bits than can be transmitted over the communication channel. The inventive detector is used to detect, before coding begins, pictures that will be difficult to predict and code, so that the encoder can change its coding strategy to avoid exceeding the bit budget.

One particular apparatus that uses the scene cut detector flag is described in U.S. patent application Ser. No. 08/606,622, filed February 26, 1996, concurrently with this application, and incorporated herein by reference. In response to the flag, that apparatus changes the quantization scale so that the encoder codes the picture more coarsely and stays within the bit budget.

Although a single embodiment incorporating the teachings of embodiments of the present invention has been shown and described in detail, those skilled in the art can readily devise many other varied embodiments that still incorporate these teachings.

FIG. 1 shows a block diagram of a block-based coding system incorporating the scene cut detector of an embodiment of the present invention.
FIG. 2 shows a detailed block diagram of the scene cut detector of an embodiment of the present invention.
FIG. 3 shows a graph of the decision functions used by the IID estimator of an embodiment of the present invention.

Claims

A method for detecting a scene cut between a first image and a second image in a sequence of input images in a block-based video encoder, wherein each input image of the sequence of input images is divided into a plurality of macroblocks, at least one motion vector is calculated for each of the macroblocks, and a sequence of predicted images is generated, each predicted image comprising a plurality of predicted macroblocks derived from the input image and the motion vector, the method comprising:
(a) generating a residual macroblock based on a difference between a predicted macroblock from the first image and an input macroblock from the second image;
(b) calculating a variance of the input macroblock and a variance of the residual macroblock;
(c) comparing the variance of the input macroblock with the variance of the residual macroblock, and incrementing a count value if the variance of the residual macroblock exceeds a first threshold and is greater than the sum of the variance of the input macroblock and a shift value;
(d) repeating steps (a), (b), and (c) for all macroblocks of the second image; and
(e) setting a flag indicating the occurrence of the scene cut when the count value exceeds a second threshold.

The method of claim 1, wherein the second threshold is set to 33% of the total number of macroblocks in the input image.

An apparatus for detecting a scene cut between a first image and a second image in a sequence of input images in a block-based video encoder, wherein each input image of the sequence of input images is divided into a plurality of macroblocks, at least one motion vector is calculated for each of the macroblocks, and a sequence of predicted images is generated, each predicted image comprising a plurality of predicted macroblocks derived from the input image and the motion vector, the apparatus comprising:
a subtractor configured to generate a residual macroblock based on a difference between a predicted macroblock from the first image and an input macroblock from the second image;
a residual variance generator configured to calculate a variance of the residual macroblock;
an input variance generator configured to calculate a variance of the input macroblock;
a decision circuit configured to generate a first output if the variance of the residual macroblock exceeds a first threshold and is greater than the sum of the variance of the input macroblock and a shift value, and to generate a second output otherwise;
a counter configured to count the first outputs; and
threshold means configured to set a flag indicating the occurrence of the scene cut when the count value counted by the counter exceeds a second threshold.

The apparatus of claim 3, wherein the second threshold is set to 33% of the total number of macroblocks in the input image.