JP4399794B2 - Image coding apparatus and image coding method

Info

Publication number: JP4399794B2
Application number: JP2004270082A
Authority: JP
Inventors: 基晴上田
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2004-09-16
Filing date: 2004-09-16
Publication date: 2010-01-20
Anticipated expiration: 2024-09-16
Also published as: JP2006086861A

Description

The present invention relates to an image encoding apparatus for encoding moving images in accordance with schemes such as MPEG2 (Moving Picture Experts Group Phase 2), and in particular to an image encoding apparatus capable of suppressing visual degradation during image encoding.

In recent years, services have been put into practical use that distribute information over various transmission paths, such as satellite broadcasts, terrestrial broadcasts, and telephone lines, using digitized image signals compressed by high-efficiency coding. When distributing moving images and audio, such services use the international standard MPEG2 as the high-efficiency coding scheme. MPEG2 compresses the information amount of an image signal by exploiting the correlation between adjacent pixels (spatial direction) and the correlation between adjacent frames or adjacent fields (temporal direction).

Image encoding under the MPEG2 standard proceeds by the following algorithm. First, temporally consecutive image frames are divided into reference frames and predicted frames. A reference frame is encoded using only spatial correlation, so the original image can be restored from the encoded data of that frame alone. A predicted frame, by contrast, is encoded using both temporal correlation with a reference frame and spatial correlation, which yields higher coding efficiency than for a reference frame. A predicted frame is restored from the restored reference frame together with the encoded data of the predicted frame.

Next, the coding structure used in MPEG2 image encoding is described concretely with reference to FIG. 4. In FIG. 4, picture types are numbered where necessary so that they can be distinguished. The I picture (I frame), a reference frame marked "I" in FIG. 4(A), appears periodically and serves as the reference for decoding. Predicted frames come in two kinds: the P picture (P frame), marked "P" in FIG. 4(A), which is encoded by prediction only from a temporally preceding (past) reference frame, and the B picture (B frame), marked "B" in FIG. 4(A), which is predictively encoded from two reference frames, one temporally preceding and one following (past and future). The arrows in FIG. 4(A) indicate the prediction directions for the P and B pictures. A P picture is itself a predicted frame and also serves as a reference frame for other P pictures and B pictures.

The image signal of an I picture is divided into processing units called macroblocks, each 16 horizontal pixels by 16 vertical pixels with respect to the luminance signal. The data of each macroblock is further divided into two-dimensional blocks of 8 × 8 pixels, and each block undergoes DCT (Discrete Cosine Transform) processing, a kind of orthogonal transform.

The signal after DCT processing represents the frequency components of the two-dimensional block, and in typical images the components concentrate in the low-frequency range. In addition, degradation of high-frequency components is visually less noticeable than degradation of low-frequency components. The information amount is therefore compressed by quantizing low-frequency components finely and high-frequency components coarsely, and then variable-length coding each nonzero coefficient together with the length of the run of zero-valued coefficients that precedes it.
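The quantization and zero-run steps described above can be sketched as follows. This is a minimal illustration rather than the MPEG2 procedure itself: the function names, the truncating division, and the assumption that the coefficients have already been placed in scan order are all choices made for the example.

```python
def quantize_block(coeffs, qmatrix, qscale):
    """Quantize a block of DCT coefficients (illustrative sketch).

    Each coefficient is divided by its quantization-matrix entry times the
    quantization scale. Larger matrix entries, typically placed at high
    frequencies, therefore quantize more coarsely. Truncation toward zero
    is an assumption of this sketch, not the rounding rule of the standard.
    """
    return [[int(c / (q * qscale)) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qmatrix)]


def run_level_pairs(scanned):
    """Convert scan-ordered coefficients into (zero_run, level) pairs.

    Trailing zeros produce no pair; a real coder would signal them with an
    end-of-block code, which is omitted here.
    """
    pairs = []
    run = 0
    for level in scanned:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs
```

The (zero_run, level) pairs are what a variable-length code table would then map to codewords; that table is outside the scope of this sketch.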

Like that of an I picture, the image signal of a P picture is divided into macroblocks of 16 horizontal pixels by 16 vertical pixels with respect to the luminance signal. For a P picture, a motion vector relative to the reference frame is calculated for each macroblock. Motion vectors are generally detected by block matching. In block matching, the sum of absolute differences (or the sum of squared differences) is computed between the pixels of the macroblock and the pixels of a 16 × 16 block taken from the reference frame at the macroblock's horizontal and vertical position displaced by a candidate motion vector. The candidate that minimizes this sum is output as the detected motion vector.
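A full-search version of this block matching can be sketched as below. The function names, the search range parameter, and the policy of skipping candidates that fall outside the reference frame are assumptions made for illustration; practical encoders usually use faster search strategies.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by the candidate vector (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def find_motion_vector(cur, ref, bx, by, size=16, search=4):
    """Exhaustively test every displacement within +/- search pixels and
    return (dx, dy, cost) for the candidate with the smallest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            top, left = by + dy, bx + dx
            # Skip candidates whose block would leave the reference frame.
            if top < 0 or left < 0 or top + size > height or left + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]
```

Replacing `abs(...)` with a squared difference gives the sum-of-squared-differences variant mentioned in the text.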

For each pixel of the macroblock, the difference from the corresponding pixel of the two-dimensional block extracted according to the motion vector is taken. When an accurate motion vector has been detected, the information amount of the difference block is far smaller than that of the original macroblock, so coarser quantization than for an I picture becomes possible. In practice, a choice is made between encoding the difference block and encoding the non-difference block (intra block) (prediction mode decision), and the selected block undergoes the same DCT and variable-length coding as in an I picture to compress the information amount.

B pictures are processed in the same way as P pictures, except that reference frames (I or P pictures) exist both before and after the B picture in time, and motion vectors are detected against each of them. For a B picture there are three prediction options: prediction from the preceding reference frame (forward prediction), prediction from the following reference frame (backward prediction), and the pixel-by-pixel average of the two prediction blocks (average prediction). The prediction mode decision chooses among four modes: these three plus the mode that uses the intra block alone.

Because a B picture can be predicted from reference frames both before and after it in time, its prediction efficiency is higher still than that of a P picture. B pictures are therefore generally quantized more coarsely than P pictures. The block selected for a B picture undergoes the same encoding process as in I and P pictures.

Decoding a B picture involves prediction from a temporally later reference frame, so that reference frame must be encoded before the B picture. During encoding, the input image signal is therefore reordered, as shown in FIG. 4(B), so that each B picture is placed after the I or P picture that serves as its reference frame, and is then encoded. In other words, the input order of the original images is rearranged at encoding time to match the order required for decoding. In decoding, as shown in FIG. 4(C), the inverse of the reordering in FIG. 4(B) is applied before output, so that the decoded images are reproduced in the order of the input image signal.
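The reordering between display order and coding order can be sketched as follows. The tuple representation and the use of a stored display index to restore the output order are assumptions of this example, and the picture sequence in the usage note is illustrative rather than a reproduction of FIG. 4.

```python
def display_to_coding_order(pictures):
    """Move each run of B pictures behind the reference picture that follows it.

    `pictures` is a list of (picture_type, display_index) tuples in display
    order, where picture_type is "I", "P", or "B".
    """
    coded = []
    pending_b = []
    for pic in pictures:
        if pic[0] == "B":
            # A B picture must wait until its future reference is coded.
            pending_b.append(pic)
        else:
            coded.append(pic)
            coded.extend(pending_b)
            pending_b = []
    # Simplification: trailing B pictures with no later reference are
    # appended as-is; a real encoder would wait for the next reference.
    coded.extend(pending_b)
    return coded


def coding_to_display_order(coded):
    """Undo the reordering by sorting on the stored display index."""
    return sorted(coded, key=lambda pic: pic[1])
```

For a display sequence I0 B1 B2 P3 B4 B5 P6, the sketch yields the coding order I0 P3 B1 B2 P6 B4 B5, and sorting by display index restores the original sequence.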

Next, a typical encoding apparatus and decoding apparatus for MPEG2 image encoding are described, beginning with a typical conventional encoding apparatus. FIG. 5 is a block diagram showing an example of a typical conventional image encoding apparatus. In FIG. 5, a digital image signal (input image signal) entering at input terminal 201 is supplied to and stored in input image memory 202, where it is delayed so that it can be reordered into the order in which it is to be encoded under the coding syntax. The digital image signal output from input image memory 202 then undergoes macroblock extraction in two-dimensional block conversion circuit 203.

Macroblock data for a reference frame is supplied through subtractor 204 to orthogonal transform circuit 205, where DCT processing is performed in units of 8 horizontal pixels by 8 vertical pixels to calculate DCT coefficients. The DCT coefficients are then gathered back into macroblock units of 16 horizontal pixels by 16 vertical pixels with respect to the luminance signal and sent to quantization circuit 206. Quantization circuit 206 quantizes the coefficients, for example by dividing each DCT coefficient by a different value according to a quantization matrix whose values differ for each frequency component. The quantized DCT coefficients are sent to encoding circuit 214, which performs variable-length or fixed-length coding by looking up the address in encoding table 215 that corresponds to each coefficient. Multiplexer 216 then multiplexes the encoded data from encoding circuit 214 with additional information from two-dimensional block conversion circuit 203, such as the location of the macroblock within the picture. The result is stored temporarily in image stream buffer 218 and then output from output terminal 219 as a bitstream (output image bitstream).

The DCT coefficients quantized by quantization circuit 206 are also decoded: inverse quantization circuit 212 and inverse orthogonal transform circuit 213 apply inverse quantization and inverse DCT, and the result is supplied through adder 210 and deblock circuit 211 to reference image memory 209 and stored there. The image stored in reference image memory 209 is used when encoding predicted frames.

For a predicted frame, motion vector detection circuit 207 obtains the motion vector between images from the macroblock data extracted from input image memory 202 and the image stored in reference image memory 209. The motion vector is supplied to motion compensated prediction circuit 208, which extracts prediction blocks from the reference image held in reference image memory 209. Motion compensated prediction circuit 208 selects the optimal prediction mode based on the extracted prediction blocks, and the difference signal relative to the input image block to be encoded is sent to orthogonal transform circuit 205. This difference signal is processed in the same way as the blocks of a reference frame described above: the DCT coefficients are quantized and, together with the motion vector and the prediction mode, pass from multiplexer 216 through image stream buffer 218 and are output from output terminal 219 as the output image bitstream.

For code amount control, code amount control circuit 217 compares the code amount of the bitstream output from multiplexer 216 with the target code amount and controls the fineness of quantization (the quantization scale) in quantization circuit 206 so as to approach the target. For the three picture types (frame types) described above, which differ in information amount, the target code amount for each frame is calculated from the set encoding bit rate using the characteristics and frequency of occurrence of each picture type.

The target code amount is also constrained so that neither overflow nor underflow occurs in a stream buffer that virtually simulates the decoding apparatus (called the VBV (Video Buffering Verifier) buffer). The quantization scale is determined by using the fact that the scale and the output code amount are generally in roughly inverse proportion: for each frame type, the quantization scale value corresponding to the target code amount is calculated and used for quantization. The quantization scale is then varied block by block in the direction that brings the output closer to the target code amount, so that the encoded stream is kept within the target code amount.
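The two constraints in this paragraph can be sketched as below, under a deliberately simplified buffer model. The function names, the constant per-frame refill, and the 1 to 31 scale bounds (chosen to mirror the range of the MPEG2 quantiser scale code) are assumptions of this example, not details given in the text.

```python
def estimate_qscale(prev_bits, prev_avg_q, target_bits, q_min=1, q_max=31):
    """Estimate a quantization scale for a target code amount.

    Relies on the rough inverse proportionality between scale and output
    bits: bits * scale is treated as constant for a given picture type.
    """
    q = prev_bits * prev_avg_q / target_bits
    return min(max(q, q_min), q_max)


def clamp_target_to_vbv(target_bits, fullness, bits_per_frame, buffer_size):
    """Limit a frame's target so a simplified decoder buffer model holds.

    Upper bound: the decoder cannot remove more bits than the buffer holds
    (underflow). Lower bound: after removing the frame and refilling by
    bits_per_frame, the buffer must not exceed its size (overflow).
    """
    upper = fullness
    lower = max(0, fullness + bits_per_frame - buffer_size)
    return min(max(target_bits, lower), upper)
```

Halving the target relative to the previous frame of the same type doubles the estimated scale, which is the behavior the inverse-proportion assumption implies.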

Next, a typical conventional decoding apparatus is described. FIG. 6 is a block diagram showing an example of a typical conventional decoding apparatus. In FIG. 6, an image bitstream (image stream) entering at input terminal 101 is first accumulated in image stream buffer 102. The image bitstream carries the virtually simulated buffer value, and the decoding described below begins only after that amount of the bitstream has accumulated in image stream buffer 102, which prevents the buffer from failing and decoding from stalling. From the image bitstream output by image stream buffer 102, variable-length decoding circuit 103 separates additional information such as the quantization scale, prediction mode, and motion vector, and decodes the quantized DCT coefficients.

The decoded DCT coefficients are processed in the same way as in inverse quantization circuit 212 and inverse orthogonal transform circuit 213 of the encoder (the image encoding apparatus shown in FIG. 5): inverse quantization circuit 105 and inverse orthogonal transform circuit 111 perform inverse quantization and inverse DCT, and the resulting intra block or difference block is supplied to adder 107. For a predicted block, motion compensated prediction circuit 106 uses the prediction mode and motion vector decoded by variable-length decoding circuit 103 to extract a prediction block from the reference image signal read from reference image memory 109 (the image signal of an I or P picture already stored before this processing). Adder 107 then adds the decoded intra block or difference block to the prediction block extracted by motion compensated prediction circuit 106, restoring the image signal of the macroblock.

The macroblock data (macroblock image signal) restored by adder 107 is supplied to deblock circuit 108 and converted back into an image signal in picture scan order. An I or P picture is written to reference image memory 109, whereas a B picture is stored temporarily in output frame memory 110 and then output as an image signal (output image signal). The I or P picture data held in reference image memory 109 is, in accordance with the image output timing shown in FIGS. 4(A) to 4(C), stored temporarily in output frame memory 110 and then output as an image signal in the same way as a B picture.

In image coding techniques such as the MPEG2 standard that allow the code amount to vary from one frame to another, methods such as the following are generally used to control the output image bitstream so that it meets the target code amount.

For example, the target code amount for a given period is allocated to individual frames according to the information amount of the encoded images of each picture type. Specifically, let Bits be the code amount required by the previously encoded picture of each type and AvgQ the average quantization scale value of that picture type. An approximate complexity C for each picture type can then be calculated as the product of the code amount and the average quantization scale value:
C(T) = Bits(T) * AvgQ(T)
(T: picture type)

Using the approximate complexity C of each picture type, let TotalBits be the target code amount given within a fixed period assumed by the coding control (for example, F frames), and let Fnum be the number of frames of each picture type within that period. The per-frame target code amounts Budget(I), Budget(P), and Budget(B) for I, P, and B pictures are then calculated as:
Budget(I) = {TotalBits * C(I)} / {Fnum(I)*C(I) + Fnum(P)*C(P) + Fnum(B)*C(B)}
Budget(P) = {TotalBits * C(P)} / {Fnum(I)*C(I) + Fnum(P)*C(P) + Fnum(B)*C(B)}
Budget(B) = {TotalBits * C(B)} / {Fnum(I)*C(I) + Fnum(P)*C(P) + Fnum(B)*C(B)}
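The complexity and budget formulas above translate directly into code. The following sketch uses dictionaries keyed by picture type; the function names and the sample numbers in the usage note are illustrative assumptions.

```python
PICTURE_TYPES = ("I", "P", "B")


def complexity(bits, avg_q):
    """Approximate complexity C(T) = Bits(T) * AvgQ(T)."""
    return bits * avg_q


def frame_budgets(total_bits, c, fnum):
    """Per-frame target code amount Budget(T) for each picture type.

    `c` maps picture type to its approximate complexity, and `fnum` maps
    picture type to the number of frames of that type in the period.
    """
    denom = sum(fnum[t] * c[t] for t in PICTURE_TYPES)
    return {t: total_bits * c[t] / denom for t in PICTURE_TYPES}
```

With complexities of 400, 200, and 100 for I, P, and B pictures, frame counts of 1, 4, and 10, and TotalBits of 2,200,000, the budgets come to 400,000, 200,000, and 100,000 bits per frame. Multiplying each budget by its frame count and summing returns TotalBits, which is the property the shared denominator guarantees.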

The target code amount calculated in this way is a target for future input images, set from the encoding difficulty of past input images. Code amount control based on it therefore works well as long as the scene does not change and the difficulty varies little. However, it is slow to follow scene changes: with fade-ins, scene changes, motion caused by movement of the picture, changes in resolution, and the like, the control may be disturbed, and the quality of the encoded image may degrade severely.

To cope with scene changes, a control method is also known that identifies fade-ins, fade-outs, and scene changes by referring to changes in the luminance component of the input image. In this method, a scene change is recognized by measuring, for example, the frame-to-frame rate of change of the in-frame sum of the luminance signal (DC component), and the target code amount of each frame is corrected according to that rate of change. This control method is disclosed, for example, in Patent Document 1 below.
Patent Document 1: JP-A-8-317387 (FIGS. 1, 3, and 4; paragraphs 0015 to 0036)
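The luminance-based measurement summarized above can be sketched as follows. This is only an illustration of the general idea, not the method of the cited document: the function names, the relative-change definition, and the threshold value are all assumptions.

```python
def dc_change_rate(prev_frame, cur_frame):
    """Relative frame-to-frame change of the in-frame luminance sum.

    Frames are 2-D lists of luminance samples.
    """
    prev_sum = sum(sum(row) for row in prev_frame)
    cur_sum = sum(sum(row) for row in cur_frame)
    if prev_sum == 0:
        # Avoid division by zero when the previous frame is entirely black.
        return float("inf") if cur_sum > 0 else 0.0
    return (cur_sum - prev_sum) / prev_sum


def is_scene_change(prev_frame, cur_frame, threshold=0.2):
    """Flag a change when the luminance sum moves by more than `threshold`.

    The threshold is a hypothetical tuning value chosen for the example.
    """
    return abs(dc_change_rate(prev_frame, cur_frame)) > threshold
```

Two frames with different content but equal luminance sums produce a change rate of zero, so a detector of this kind does not respond to them.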

The technique disclosed in Patent Document 1 can improve the efficiency of coding control for scene changes in which only the luminance component of the input image changes (for example, scenes with large, visually conspicuous luminance changes). However, there are many cases in which the luminance component does not change during a fade-in, fade-out, or scene change. Examples include a cut between two scenes shot at the same location and a fade-in or fade-out from a gray level rather than a black level. For such scene changes, the technique of Patent Document 1 cannot detect the change accurately.

Moreover, when a scene change is detected, the technique of Patent Document 1 assigns each frame to be encoded, one after another, a target code amount that follows the variation. If, for example, a complex image appears through a fade-in, too much code is allocated during the fade-in, and a sufficient code amount may no longer be available for the complex image immediately after the fade-in ends.

A scheme generally called two-pass encoding measures the encoding difficulty of future input images in advance and controls the code amount based on the result. However, this scheme needs the same image twice as the material to be encoded, which imposes the constraint that the input must be image data already recorded on some other recording medium.

Two-pass encoding is particularly difficult when a signal from a camera is encoded as it arrives or when edited image data is encoded directly. When the input image is encoded directly, therefore, stable code amount control must be achieved without two-pass encoding, by predicting or detecting scene changes and adjusting the code amount allocation accordingly.

In view of the above problems, an object of the present invention is to provide an image encoding apparatus capable of good image encoding control with suppressed image quality degradation in scenes where the video changes smoothly (for example, fade scenes).

To achieve the above object, the present invention provides an image encoding apparatus that encodes an image signal, comprising:
intra-frame information amount calculation means that calculates the intra-frame complexity and the DC component of each of a plurality of temporally consecutive frames of the image signal to be encoded and outputs the results as the intra-frame information amount;
intra-frame information amount holding means that holds the intra-frame complexity of each of the plurality of frames output from the intra-frame information amount calculation means;
scene detection means that determines whether the scene displayed by the plurality of frames is in a fade state by examining, based on the intra-frame complexity and the DC component of the plurality of frames held in the intra-frame information amount holding means, the temporal changes in that intra-frame complexity and DC component;
intra-frame information amount prediction means that, based on the intra-frame complexity of the plurality of frames held in the intra-frame information amount holding means and on the fade-state determination by the scene detection means, takes as the intra-frame complexity of the frame following the plurality of frames a predicted value calculated from the gradient of the intra-frame complexity of the plurality of frames when the scene is in the fade state, and the intra-frame complexity of the most recent of the plurality of frames when it is not; and
code amount control means that holds an approximate complexity calculated from the code amount and the average quantization scale of the last encoded frame of each picture type, and also holds, as the latest measured complexity, the intra-frame complexity corresponding to the last encoded frame of each picture type, and that operates on the basis of the intra-frame complexity of the subsequent frame predicted by the intra-frame information amount prediction means and the fade-state determination by the scene detection means as follows. When the scene is not in the fade state, it corrects only the approximate complexity of the I picture, using the intra-frame complexity of the subsequent frame and the latest measured complexity of the I picture, and sets the target code amount of each picture type using the corrected approximate complexity, the total target code amount within a fixed period, and the number of frames of each picture type. When the scene is in the fade state, it converts the approximate complexity frame by frame for all of the assumed frames relevant to the target code amount, using the intra-frame complexity of the subsequent frame and the latest measured complexity of each picture type, and sets the target code amount of each picture type using the total target code amount for that assumed number of frames.

また、上記の目的を達成するため、本発明によれば、画像信号の符号化を行う画像符号化装置によって実行される画像符号化方法であって、
前記符号化の対象となる前記画像信号の時間的に連続する複数フレームのそれぞれに関するフレーム内複雑度及びＤＣ成分を算出して、その算出結果をフレーム内情報量として出力するフレーム内情報量算出ステップと、
前記フレーム内情報量算出ステップにおいて出力された前記複数フレームのそれぞれに関する前記フレーム内複雑度を保持するフレーム内情報量保持ステップと、
前記フレーム内情報量保持ステップにおいて保持された前記複数フレームの前記フレーム内複雑度及び前記ＤＣ成分に基づき、前記複数フレームの前記フレーム内複雑度及び前記ＤＣ成分の時間的変化を検証することによって、前記複数のフレームによって表示されるシーンがフェード状態か否かを判定するシーン検出ステップと、
前記フレーム内情報量保持ステップにおいて保持された前記複数フレームの前記フレーム内複雑度と、前記シーン検出ステップにおける前記フェード状態か否かの判定結果に基づいて、前記フェード状態である場合には、前記複数フレームの前記フレーム内複雑度の勾配を用いて算出された予測値を、前記複数フレームの後続フレームのフレーム内複雑度とする一方、前記フェード状態でない場合には、前記複数フレームのうちの最新フレームのフレーム内複雑度を、前記後続のフレームのフレーム内複雑度とするフレーム内情報量予測ステップと、
最後に符号化された各ピクチャタイプのフレームの符号量と量子化スケール平均値とから計算された複雑度の近似値を保持する一方、最後に符号化された各ピクチャタイプのフレームに対応する前記フレーム内複雑度を最新の実測複雑度として保持し、前記フレーム内情報量予測ステップにおいて予測された前記後続フレームの前記フレーム内複雑度と、前記シーン検出ステップにおける前記フェード状態か否かの判定結果とに基づいて、前記フェード状態でない場合には、前記後続フレームの前記フレーム内複雑度と、Ｉピクチャの前記最新の実測複雑度とを用いて、Ｉピクチャの複雑度の近似値のみの補正を行い、前記補正後の複雑度の近似値と、一定時間内の目標符号量総和と、各ピクチャタイプが用いられるフレーム数とを用いて、各ピクチャタイプの目標符号量を設定する一方、前記フェード状態である場合には、前記後続フレームの前記フレーム内複雑度と、各ピクチャタイプの前記最新の実測複雑度とを用いて、目標符号量に関連する想定フレーム数すべてに対して、前記複雑度の近似値をフレームごとに変換して、前記目標符号量に関連する想定フレーム数の目標符号量総和を用いて、各ピクチャタイプの目標符号量を設定する符号量制御ステップとを、
有する画像符号化方法が提供される。 In order to achieve the above object, according to the present invention, there is provided an image encoding method executed by an image encoding apparatus that encodes an image signal,
An intra-frame information amount calculation step of calculating an intra-frame complexity and a DC component for each of a plurality of temporally continuous frames of the image signal to be encoded, and outputting the calculation result as an intra-frame information amount;
An intra-frame information amount holding step for holding the intra-frame complexity for each of the plurality of frames output in the intra-frame information amount calculation step;
Based on the intra-frame complexity and the DC component of the plurality of frames held in the intra-frame information amount holding step, by verifying temporal changes of the intra-frame complexity and the DC component of the plurality of frames, A scene detection step for determining whether or not a scene displayed by the plurality of frames is in a fade state;
An intra-frame information amount prediction step of, based on the intra-frame complexity of the plurality of frames held in the intra-frame information amount holding step and on the determination result in the scene detection step as to whether or not the scene is in the fade state, setting a prediction value calculated using the gradient of the intra-frame complexity of the plurality of frames as the intra-frame complexity of a subsequent frame of the plurality of frames when the scene is in the fade state, and setting the intra-frame complexity of the latest frame among the plurality of frames as the intra-frame complexity of the subsequent frame when the scene is not in the fade state;
A code amount control step of holding an approximate value of the complexity calculated from the code amount and the quantization scale average value of the most recently encoded frame of each picture type, and holding the intra-frame complexity corresponding to the most recently encoded frame of each picture type as the latest measured complexity; and, based on the intra-frame complexity of the subsequent frame predicted in the intra-frame information amount prediction step and on the determination result in the scene detection step as to whether or not the scene is in the fade state: when the scene is not in the fade state, correcting only the approximate value of the complexity of the I picture using the intra-frame complexity of the subsequent frame and the latest measured complexity of the I picture, and setting the target code amount of each picture type using the corrected approximate value of the complexity, the total target code amount within a certain time, and the number of frames in which each picture type is used; and when the scene is in the fade state, converting the approximate value of the complexity frame by frame for all of the assumed number of frames related to the target code amount, using the intra-frame complexity of the subsequent frame and the latest measured complexity of each picture type, and setting the target code amount of each picture type using the total target code amount for the assumed number of frames related to the target code amount,
An image encoding method is provided.

本発明に係る画像符号化装置は、入力画像の周波数成分やＤＣ成分などのフレーム内情報量を参照することによって確実に映像が滑らかに変化するようなシーン（例えば、フェードシーン（フェードインやフェードアウトのシーン））の検出を行うとともに、当該フレーム内情報量の時間連続性を利用して、未来のフレームのフレーム内情報量を予測し、その予測結果に基づいて、目標符号量などの符号量制御パラメータの補正を行うように構成されており、例えば、フェードシーンにおいて、画質劣化を抑制した良好な画像符号化制御を実現することが可能となる。 The image coding apparatus according to the present invention reliably detects scenes in which the video changes smoothly (for example, fade scenes, that is, fade-in and fade-out scenes) by referring to intra-frame information amounts such as the frequency components and the DC component of the input image. It also uses the temporal continuity of the intra-frame information amount to predict the intra-frame information amount of future frames, and corrects code amount control parameters such as the target code amount based on the prediction result. As a result, good image coding control that suppresses image quality deterioration can be realized, for example, in fade scenes.

以下、図面を参照しながら、本発明の実施の形態における画像符号化装置について説明する。図１は、本発明の実施の形態における画像符号化装置の一例を示すブロック図である。なお、図１に示す画像符号化装置は、上述した従来の技術に係る画像符号化装置（図５参照）と共通する構成要素を有しており、ここでは、これらの共通する構成要素の説明については省略する。 Hereinafter, an image coding apparatus according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing an example of an image coding apparatus according to an embodiment of the present invention. Note that the image encoding device shown in FIG. 1 has components in common with the above-described conventional image encoding device (see FIG. 5), and the description of these common components is omitted here.

図１に示す画像符号化装置は、フレーム内情報量（アクティビティ（画面のフレーム内複雑度）などの画面内相関関係を表す周波数成分や、輝度平均値などの各画素に係る情報を表すＤＣ成分）の時間的推移に基づいて、一定時間後の入力フレームの複雑度を推測するためのアルゴリズムを実行することが可能である。すなわち、図１に示す画像符号化装置は、複数のフレームに係るアクティビティの変化を検出することによって、シーンの状態を推測するとともに将来の入力画像を予測し、その予測結果に基づいて、目標符号量の割り当てを行うことによって、例えば、カメラなどのリアルタイムで生成された入力画像に対して、入力画像内の素材の変化に対する追従の遅延や劣化の少ない画像符号化処理を実現し、特にフェードシーンに対して、良好な画像符号量の制御を実現することが可能である。以下、具体的に、図１に示す画像符号化装置の構成及び動作について説明する。 The image encoding apparatus shown in FIG. 1 can execute an algorithm that estimates the complexity of an input frame a certain time ahead, based on the temporal transition of the intra-frame information amount (a frequency component representing intra-picture correlation, such as the activity (the intra-frame complexity of the picture), and a DC component representing per-pixel information, such as the luminance average value). That is, the image encoding apparatus shown in FIG. 1 detects changes in the activity over a plurality of frames, thereby estimating the state of the scene and predicting future input images, and allocates the target code amount based on the prediction result. For input images generated in real time, for example by a camera, this realizes image encoding processing with little delay in following changes in the material of the input image and little deterioration, and in particular realizes good code amount control for fade scenes. The configuration and operation of the image encoding device shown in FIG. 1 are described in detail below.

図１に示す本発明に係る画像符号化装置は、図５に示す従来の技術に係る画像符号化装置に加えて、アクティビティ算出回路３０１、アクティビティ保持メモリ３０２、シーン検出回路３０３、アクティビティ予測回路３０４を有している。これらの構成要素は、本発明に関連したアクティビティに係る処理を行うために設けられている。また、符号量制御回路３０５は、図５に示す符号量制御回路２１７の機能に加えて、さらに、上述のアクティビティに係る処理結果に基づいた処理を行うための機能を有している。なお、その他の構成要素に関しては、図５に示す従来の技術に係る符号化装置に設けられた構成要素と同一の機能を有しているので、ここでは説明を省略する。 The image coding apparatus according to the present invention shown in FIG. 1 includes an activity calculation circuit 301, an activity holding memory 302, a scene detection circuit 303, and an activity prediction circuit 304 in addition to the components of the conventional image coding apparatus shown in FIG. 5. These components are provided to perform the activity-related processing of the present invention. In addition to the function of the code amount control circuit 217 shown in FIG. 5, the code amount control circuit 305 has a function for performing processing based on the results of the activity-related processing described above. Since the other components have the same functions as the corresponding components of the conventional encoding apparatus shown in FIG. 5, their description is omitted here.

画像符号化装置の入力端子２０１から入力された入力画像信号（デジタル画像信号）は、入力画像メモリ２０２に供給されるとともに、アクティビティ算出回路３０１にも供給される。このアクティビティ算出回路３０１は、各フレームの画像の複雑度の算出を行うことが可能である。アクティビティ算出回路３０１で算出されるアクティビティとしては、例えば、隣接する画素間の差分（例えば、輝度や色差信号などの任意の画素情報の差分）の大きさから計算される各フレームの画像の水平・垂直方向の高周波数成分の値が利用可能である。 An input image signal (digital image signal) input from the input terminal 201 of the image encoding device is supplied to the input image memory 202 and also to the activity calculation circuit 301. The activity calculation circuit 301 can calculate the complexity of the image of each frame. As the activity calculated by the activity calculation circuit 301, it is possible to use, for example, the value of the horizontal and vertical high-frequency components of the image of each frame, calculated from the magnitude of the differences between adjacent pixels (for example, differences in arbitrary pixel information such as luminance or color difference signals).

具体的には、以下の式のように、入力画像の上下左右の隣り合う画素の差分絶対値を積算することによってアクティビティを示すパラメータ（アクティビティFAct）の算出が可能である。
FAct=Σ{abs(Pixel[x,y]-Pixel[x+1,y])+abs(Pixel[x,y]−Pixel[x,y+1])}
/(Width*Height*α)
ただし、Σは、xが0からWidth-2、yが0からHeight-2までの積算を表す。また、αは定数、absは絶対値、Widthは画像全体の水平方向の幅、Heightは画像全体の垂直方向の高さ、Pixelは任意の画素情報を表す。 Specifically, the parameter indicating the activity (activity FAct) can be calculated by accumulating the absolute difference values of the adjacent pixels on the upper, lower, left, and right sides of the input image as in the following expression.
FAct = Σ{abs(Pixel[x,y] - Pixel[x+1,y]) + abs(Pixel[x,y] - Pixel[x,y+1])}
 / (Width*Height*α)
Here, Σ denotes summation over x from 0 to Width-2 and y from 0 to Height-2. α is a constant, abs is the absolute value, Width is the horizontal width of the entire image, Height is the vertical height of the entire image, and Pixel is arbitrary pixel information.

上記のように、アクティビティ算出回路３０１からは、例えば、各画素情報の差分の積算が、画面の大きさWidth×Heightに合わせて正規化されることによって算出されたアクティビティが出力される。 As described above, the activity calculation circuit 301 outputs, for example, an activity calculated by normalizing the sum of differences of each pixel information in accordance with the screen size Width × Height.
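
The activity formula above can be sketched in a few lines of Python. This is only an illustrative sketch, not the circuit described in the patent: the function name `frame_activity`, the list-of-rows image representation, and the default `alpha=1.0` are assumptions made for the example.

```python
def frame_activity(pixels, alpha=1.0):
    """Return FAct for a frame given as a list of equal-length pixel rows.

    `pixels[y][x]` is any per-pixel quantity (e.g. luminance).
    `alpha` is the normalisation constant from the formula; 1.0 is an
    assumed placeholder, since the text does not give a value.
    """
    height = len(pixels)
    width = len(pixels[0])
    total = 0
    # The sum runs over x = 0..Width-2 and y = 0..Height-2, as in the text,
    # so every pixel has a right-hand and a lower neighbour.
    for y in range(height - 1):
        for x in range(width - 1):
            p = pixels[y][x]
            total += abs(p - pixels[y][x + 1]) + abs(p - pixels[y + 1][x])
    # Normalise by the picture size so frames of different sizes compare.
    return total / (width * height * alpha)
```

A completely flat frame yields an activity of zero, while a frame with strong differences between neighbouring pixels yields a large value.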

また、さらに、アクティビティ算出回路３０１では、例えば、以下の式のように、入力画像の輝度平均値FDCも同時に算出され、出力される。
FDC=Σpixel[x,y]/(Width*Height) Furthermore, in the activity calculation circuit 301, for example, the luminance average value FDC of the input image is simultaneously calculated and output as in the following equation.
FDC = Σpixel [x, y] / (Width * Height)

これらの値FAct（アクティビティ）、FDC（輝度平均値）は、アクティビティ算出回路３０１から出力された後、アクティビティ保持メモリ３０２に格納される。アクティビティ保持メモリ３０２は、これらのアクティビティ算出回路３０１から出力されたアクティビティFAct、輝度平均値FDCを蓄積できるように構成されており、例えば、過去Ｎフレーム分のアクティビティFAct、輝度平均値FDCを格納するとともに、新しいアクティビティFAct、輝度平均値FDCを格納する際には、一番古い値（Ｎ＋１フレーム前の値）が破棄されるように構成されている。 These values FAct (activity) and FDC (luminance average value) are output from the activity calculation circuit 301 and then stored in the activity holding memory 302. The activity holding memory 302 is configured to accumulate the activity FAct and the luminance average value FDC output from these activity calculation circuits 301. For example, the activity holding memory 302 stores the activity FAct and luminance average value FDC for the past N frames. At the same time, when storing a new activity FAct and luminance average value FDC, the oldest value (the value before N + 1 frame) is discarded.
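
As a rough software analogue of the luminance average and the activity holding memory, the following sketch keeps a bounded history and silently drops the oldest entry when a new one arrives. The class name `ActivityMemory`, the `capacity` parameter, and the use of `collections.deque` are assumptions for illustration; the patent describes a hardware memory, not this data structure.

```python
from collections import deque


def luminance_mean(pixels):
    """Return FDC: the mean of all pixel values in the frame."""
    height = len(pixels)
    width = len(pixels[0])
    return sum(sum(row) for row in pixels) / (width * height)


class ActivityMemory:
    """Holds the most recent FAct/FDC values (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest item on overflow,
        # mirroring "the oldest value is discarded" in the text.
        self.fact = deque(maxlen=capacity)
        self.fdc = deque(maxlen=capacity)

    def push(self, fact, fdc):
        self.fact.append(fact)
        self.fdc.append(fdc)
```

With `capacity=3`, pushing a fourth pair of values leaves only the three most recent pairs in the memory.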

例えば、図４（Ａ）〜（Ｃ）に示すような符号化シンタックス（周期Ｍ＝３）で符号化体系が構成されており、Ｉピクチャ及びＰピクチャの符号化を行う場合には、自分自身のフレームを含めて過去Ｎフレームの算出されたアクティビティFAct及び輝度平均値FDCが格納されており、１フレーム前及び２フレーム前に関しては未符号化のフレームとなる。一方、Ｂピクチャの符号化を行う場合には、自分自身のフレームに対して未来のフレームの３フレーム（フレーム内符号済みＩ／Ｐピクチャが１フレーム）と過去Ｎ−３フレームの算出されたアクティビティが格納されている。 For example, suppose the encoding scheme uses the encoding syntax shown in FIGS. 4A to 4C (period M = 3). When an I picture or a P picture is encoded, the calculated activity FAct and luminance average value FDC of the past N frames, including the frame itself, are stored, and the frames one and two frames earlier have not yet been encoded. On the other hand, when a B picture is encoded, the stored values are the calculated activities of three frames that lie in the future relative to the frame itself (one of which is an already encoded I/P picture) and of the past N−3 frames.

一方、シーン検出回路３０３は、上記のアクティビティ保持メモリ３０２に格納されたアクティビティFAct及び輝度平均値FDCを読み出して、フレーム間におけるこれらの値の連続性を調べることによって、フェードイン／アウトなどの検出を行う。ここで、シーン検出回路３０３におけるフェードイン／アウトの検出アルゴリズムについて、図２及び図３を参照しながら説明する。 On the other hand, the scene detection circuit 303 reads out the activity FAct and the luminance average value FDC stored in the activity holding memory 302 and examines the continuity of these values between frames, thereby detecting fade-in, fade-out, and the like. The fade-in/out detection algorithm of the scene detection circuit 303 is described below with reference to FIGS. 2 and 3.

図２は、本発明の実施の形態における画像符号化装置のシーン検出回路のフェードイン／アウトの検出アルゴリズムの一例を示すフローチャートの１ページ目である。なお、アクティビティ保持メモリ３０２に格納されている最新のフレーム（すなわち、Ｎ番目のフレーム）のアクティビティ及び輝度平均値をFAct(N)及びFDC(N)、１フレーム前のフレーム（すなわち、Ｎ−１番目のフレーム）のアクティビティ及び輝度平均値をFAct(N−1)及びFDC(N-1)、Ｎフレーム前のフレーム（すなわち、０番目のフレーム）のアクティビティ及び輝度平均値をFAct(0)及びFDC(0)と記載することにする。また、Ｎは定数、ＰはＮ−１より小さい定数である。 FIG. 2 is the first page of a flowchart illustrating an example of a fade-in / out detection algorithm of the scene detection circuit of the image encoding device according to the embodiment of the present invention. Note that the activity and luminance average value of the latest frame (that is, the Nth frame) stored in the activity holding memory 302 are FAct (N) and FDC (N), and the previous frame (that is, N−1). The activity and luminance average value of the first frame) are FAct (N−1) and FDC (N−1), and the activity and luminance average value of the frame N frames before (ie, the 0th frame) are FAct (0) and It will be described as FDC (0). N is a constant, and P is a constant smaller than N-1.

シーン検出回路３０３では、フェードイン／アウトのシーンを検出するために、Ｎフレーム間のアクティビティFActの変化が計測され、このアクティビティFActが一様な変化をしているか否かが調べられる。具体的には、まず、変数Ｉ、ＪがそれぞれＪ＝０、Ｉ＝Ｊ＋１に設定され（ステップＳ１、Ｓ２）、FAct(J)とFAct(N)との大きさの比較が行われる（ステップＳ３）。 In the scene detection circuit 303, in order to detect a fade-in / out scene, a change in activity FAct between N frames is measured, and it is checked whether or not the activity FAct is changing uniformly. Specifically, first, variables I and J are set to J = 0 and I = J + 1, respectively (steps S1 and S2), and the magnitudes of FAct (J) and FAct (N) are compared (steps). S3).

ここで、FAct(J)≦FAct(N)の場合（ステップＳ３で『はい』）、すなわち、最新のフレームのアクティビティがＮ−Ｊフレーム前のアクティビティより大きい場合か同値の場合には、Ｎ−Ｊフレーム前から最新のフレームまでのアクティビティの変化が広義の単調増加の関係にあるか否かが調べられる。すなわち、変数Ｉを初期値Ｊ＋１からＮ−１までインクリメントしながら、すべての変数Ｉに対して、FAct(I-1)≦FAct(I)≦FAct(N)の関係となっているか否かが調べられる（ステップＳ４〜Ｓ６）。そして、すべての変数Ｉに対して、FAct(I-1)≦FAct(I)≦FAct(N)の関係が成立する場合には、Ｎ−Ｊフレーム前から最新のフレームまでのアクティビティが広義の単調増加の関係にあり、Ｊフレーム〜最新のフレーム（Ｎフレーム）のシーンがフェードイン状態の可能性があると判定される（ステップＳ７：仮フェードイン判定）。 Here, if FAct(J) ≦ FAct(N) (“Yes” in step S3), that is, if the activity of the latest frame is greater than or equal to the activity N−J frames earlier, it is checked whether the activity increases monotonically in the broad sense (is non-decreasing) from N−J frames earlier to the latest frame. That is, while the variable I is incremented from its initial value J+1 up to N−1, it is checked whether FAct(I-1) ≦ FAct(I) ≦ FAct(N) holds for every value of I (steps S4 to S6). If the relationship holds for every I, the activity is monotonically non-decreasing from N−J frames earlier to the latest frame, and it is determined that the scene from frame J to the latest frame (frame N) may be in a fade-in state (step S7: provisional fade-in determination).

また、ステップＳ４で任意の変数Ｉに関してFAct(I-1)≦FAct(I)≦FAct(N)の関係が成立しない場合には、Ｊフレーム〜最新のフレーム（Ｎフレーム）のシーンがフェードインではないと判断されて、ステップＳ８に進み、変数Ｊが１つインクリメント（Ｊ＝Ｊ＋１）される。そして、変数Ｊが定数Ｍを超えるまで（ステップＳ９で『いいえ』）、上述のフェード判定が行われる一方、変数Ｊが定数Ｍを超えた場合（ステップＳ９で『はい』）には、最終的に、判定対象となる入力画像（アクティビティ保持メモリ３０２に格納されているＮフレームの入力画像）に関して、フェード未検出と判定される（ステップＳ１４）。 If the relationship FAct (I-1) ≦ FAct (I) ≦ FAct (N) does not hold for an arbitrary variable I in step S4, the scene from J frame to the latest frame (N frame) fades in. Therefore, the process proceeds to step S8, and the variable J is incremented by one (J = J + 1). The above-described fade determination is performed until the variable J exceeds the constant M (“No” in step S9). On the other hand, if the variable J exceeds the constant M (“Yes” in step S9), the final determination is made. Furthermore, regarding the input image to be determined (the input image of N frames stored in the activity holding memory 302), it is determined that the fade has not been detected (step S14).

一方、FAct(J)≦FAct(N)ではない場合（ステップＳ３で『いいえ』）、すなわち、FAct(J)＞FAct(N)となっており、最新のフレームのアクティビティがＮ−Ｊフレーム前のアクティビティよりも小さい場合には、Ｎ−Ｊフレーム前から最新のフレームまでのアクティビティの変化が広義の単調減少の関係にあるか否かが調べられる。この場合には、変数Ｉを初期値Ｊ＋１からＮ−１までインクリメントしながら、すべての変数Ｉに対して、FAct(I-1)≧FAct(I)≧FAct(N)の関係となっているか否かが調べられる（ステップＳ１０〜Ｓ１２）。 On the other hand, if FAct(J) ≦ FAct(N) does not hold (“No” in step S3), that is, if FAct(J) ＞ FAct(N) and the activity of the latest frame is smaller than the activity N−J frames earlier, it is checked whether the activity decreases monotonically in the broad sense (is non-increasing) from N−J frames earlier to the latest frame. In this case, while the variable I is incremented from its initial value J+1 up to N−1, it is checked whether FAct(I-1) ≧ FAct(I) ≧ FAct(N) holds for every value of I (steps S10 to S12).

そして、上述したステップＳ４〜Ｓ６における広義の単調増加の関係にあるか否かを調べる場合と同様に、すべての変数Ｉに対して、FAct(I-1)≧FAct(I)≧FAct(N)の関係が成立する場合には、Ｎ−Ｊフレーム前から最新のフレームまでのアクティビティが広義の単調減少の関係にあり、Ｊフレーム〜最新のフレーム（Ｎフレーム）のシーンがフェードアウト状態の可能性があると判定される（ステップＳ１３：仮フェードアウト判定）。また、広義の単調減少の関係が成立しない場合には、ステップＳ８に進んで、変数Ｊが１つインクリメントされて、再びフェード判定が行われる一方、変数Ｊが定数Ｍを超えた場合には、フェード未検出と判定される。 Then, as in the case of checking whether or not there is a broad monotonic increase relationship in steps S4 to S6 described above, for all variables I, FAct (I-1) ≧ FAct (I) ≧ FAct (N ) Relationship is established, the activity from the NJ frame to the latest frame is in a monotonically decreasing relationship, and the scene from the J frame to the latest frame (N frame) may be in a fade-out state. Is determined (step S13: provisional fade-out determination). On the other hand, if the broad monotonic decrease relationship is not established, the process proceeds to step S8, where the variable J is incremented by one and the fade determination is performed again. On the other hand, if the variable J exceeds the constant M, It is determined that no fade has been detected.

以上のように、シーン検出回路３０３では、Ｎ−Ｊフレーム前（ただし、０≦Ｊ≦Ｍ）から最新のフレームまでのシーンのフェード判定が行われ、その結果、アクティビティが広義の単調増加の関係にある仮フェードイン判定（ステップＳ７）、アクティビティが広義の単調減少の関係にある仮フェードアウト判定（ステップＳ１３）、フェード未検出（ステップＳ１４）のうちのいずれか１つの判定が行われる。 As described above, the scene detection circuit 303 performs a fade determination for a scene from NJ frames before (however, 0 ≦ J ≦ M) to the latest frame, and as a result, the activity has a monotonically increasing relationship. The temporary fade-in determination (step S7), the temporary fade-out determination (step S13) in which the activity has a monotonous decrease relationship in a broad sense, and the fade not detected (step S14) are performed.
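
The provisional fade determination of steps S1 to S14 can be sketched as follows. This is a hedged illustration of the flow described above, not the circuit itself: the function name `provisional_fade`, the return convention, and the parameter name `max_start` (standing in for the upper limit on J, called M in the text) are assumptions. The list `fact` is assumed to hold FAct(0) through FAct(N), so the latest frame is the last element.

```python
def provisional_fade(fact, max_start):
    """Return ("fade_in", J), ("fade_out", J) or (None, None).

    fact      -- activities FAct(0)..FAct(N), oldest first
    max_start -- largest start index J that is tried (assumed < N-1)
    """
    n = len(fact) - 1
    for j in range(0, min(max_start, n - 1) + 1):
        inner = range(j + 1, n)  # I = J+1 .. N-1
        if fact[j] <= fact[n]:
            # Steps S4-S6: broad-sense monotonic increase towards FAct(N).
            if all(fact[i - 1] <= fact[i] <= fact[n] for i in inner):
                return ("fade_in", j)  # step S7
        else:
            # Steps S10-S12: broad-sense monotonic decrease towards FAct(N).
            if all(fact[i - 1] >= fact[i] >= fact[n] for i in inner):
                return ("fade_out", j)  # step S13
        # Otherwise step S8: try the next, shorter interval.
    return (None, None)  # step S14: fade not detected
```

Because J is advanced only when the test fails, the sketch reports the longest interval ending at the latest frame that satisfies the condition. Note that the comparisons are non-strict, so a perfectly constant activity sequence also passes the increase test; the luminance check that follows in the text is what narrows the result further.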

また、上述の処理によって、仮フェードイン判定又は仮フェードアウト判定がなされた場合には、さらに、輝度平均値FDCの変化を調べることによって、フェードシーンか否かの判定を行うことが望ましい。図３は、本発明の実施の形態における画像符号化装置のシーン検出回路のフェードイン／アウトの検出アルゴリズムの一例を示すフローチャートの２ページ目である。なお、図３に示す仮フェード検出（ステップＳ２１）は、図２において、仮フェードイン判定（ステップＳ７）又は仮フェードアウト判定（ステップＳ１３）の判定結果が得られた状態を表している。 In addition, when a temporary fade-in determination or a temporary fade-out determination is made by the above-described processing, it is desirable to further determine whether or not the scene is a fade scene by examining a change in the luminance average value FDC. FIG. 3 is the second page of the flowchart illustrating an example of the fade-in / out detection algorithm of the scene detection circuit of the image encoding device according to the embodiment of the present invention. The temporary fade detection (step S21) shown in FIG. 3 represents a state in which the determination result of the temporary fade-in determination (step S7) or the temporary fade-out determination (step S13) is obtained in FIG.

シーン検出回路３０３は、仮フェードイン判定又は仮フェードアウト判定の判定結果を得た場合（ステップＳ２１）には、さらに、当該仮フェード検出の判定結果が得られた判定対象区間（Ｊフレームから最新のフレームまでの区間）のＤＣ成分の推移を観測する。この判定では、例えば、判定対象区間における輝度平均値FDCが、増加傾向又は減少傾向の状態にあるか否かが検証される。 When the scene detection circuit 303 obtains the determination result of the temporary fade-in determination or the temporary fade-out determination (step S21), the scene detection circuit 303 further determines the determination target section (the latest frame from the J frame) where the determination result of the temporary fade detection is obtained. Observe the transition of the DC component during the interval up to the frame. In this determination, for example, it is verified whether the luminance average value FDC in the determination target section is in an increasing tendency or a decreasing tendency state.

具体的には、まず、変数ＩがＩ＝Ｊ＋１（ただし、Ｊは、上述の仮フェード検出の判定結果が得られた判定対象区間を特定する定数）に設定され（ステップＳ２２）、FDC(J)とFDC(N)との大きさの比較が行われる（ステップＳ２３）。 Specifically, first, the variable I is set to I = J + 1 (where J is a constant that identifies the determination target section from which the above-described provisional fade detection determination result is obtained) (step S22), and FDC (J ) And FDC (N) are compared (step S23).

ここで、FDC(J)≦FDC(N)の場合（ステップＳ２３で『はい』）、すなわち、最新のフレームの輝度平均値がＮ−Ｊフレーム前の輝度平均値より大きい場合か同値の場合には、Ｎ−Ｊフレーム前から最新のフレームまでの輝度平均値の変化が増加傾向にあるか否かが調べられる。すなわち、変数Ｉを初期値Ｊ＋１からＮ−１までインクリメントしながら、すべての変数Ｉに対して、FDC(I-1)-β≦FDC(I)≦FDC(N)+βの関係となっているか否かが調べられる（ステップＳ２４〜Ｓ２６）。なお、上記の式の所定値βは、輝度平均値の変化の誤差（隣接するフレーム間において、輝度平均値が±βの範囲内だけの揺らぎ）を許容するために設定される値である。所定値βは、例えば、Ｎ−Ｊフレーム前から最新のフレームまでのＤＣ差分値FDC(N)-FDC(J)の１０分の１程度の値に設定されることが望ましい。 Here, if FDC(J) ≦ FDC(N) (“Yes” in step S23), that is, if the luminance average value of the latest frame is greater than or equal to the luminance average value N−J frames earlier, it is checked whether the luminance average value tends to increase from N−J frames earlier to the latest frame. That is, while the variable I is incremented from its initial value J+1 up to N−1, it is checked whether FDC(I-1)-β ≦ FDC(I) ≦ FDC(N)+β holds for every value of I (steps S24 to S26). The predetermined value β in this expression is set to tolerate error in the change of the luminance average value (fluctuation of the luminance average value within a range of ±β between adjacent frames). The predetermined value β is preferably set, for example, to about one tenth of the DC difference value FDC(N)-FDC(J) from N−J frames earlier to the latest frame.

そして、すべての変数Ｉに対して、FDC(I-1)-β≦FDC(I)≦FDC(N)+βの関係が成立する場合には、Ｎ−Ｊフレーム前から最新のフレームまでの輝度平均値の変化が増加傾向にあり、Ｊフレーム〜最新のフレーム（Ｎフレーム）のシーンがフェードシーンであると決定される（ステップＳ２７：フェード検出）。なお、ステップＳ２７におけるフェード検出では、ステップＳ２１における仮フェードイン判定又は仮フェードアウト判定の判定結果に応じて、フェードイン検出又はフェードアウト検出の決定がなされる。また、ステップＳ２４で任意の変数Ｉに関してFDC(I-1)-β≦FDC(I)≦FDC(N)+βの関係が成立しない場合には、当該判定対象区間に係るシーンはフェードシーンではないと決定される（ステップＳ３２：フェード未検出）。 If the relationship of FDC (I-1) −β ≦ FDC (I) ≦ FDC (N) + β holds for all variables I, the frame from N−J frame to the latest frame The change in the luminance average value tends to increase, and the scene from the J frame to the latest frame (N frame) is determined to be a fade scene (step S27: fade detection). In the fade detection in step S27, the fade-in detection or the fade-out detection is determined according to the determination result of the temporary fade-in determination or the temporary fade-out determination in step S21. If the relationship of FDC (I-1) −β ≦ FDC (I) ≦ FDC (N) + β is not established for an arbitrary variable I in step S24, the scene related to the determination target section is a fade scene. (Step S32: Fade not detected).

一方、FDC(J)≦FDC(N)ではない場合（ステップＳ２３で『いいえ』）、すなわち、FDC(J)＞FDC(N)となっており、最新のフレームの輝度平均値がＮ−Ｊフレーム前の輝度平均値よりも小さい場合には、Ｎ−Ｊフレーム前から最新のフレームまでのアクティビティの変化が減少傾向にあるか否かが調べられる。すなわち、変数Ｉを初期値Ｊ＋１からＮ−１までインクリメントしながら、すべての変数Ｉに対して、FDC(I-1)+β≧FDC(I)≧FDC(N)-βの関係となっているか否かが調べられる（ステップＳ２８〜Ｓ３０）。なお、上述したステップＳ２４における増加傾向か否かを調べる場合と同様に、このステップＳ２８においても、輝度平均値の変化の誤差を許容するために、上述の所定値βが設定されることが望ましい。 On the other hand, if FDC (J) ≦ FDC (N) is not satisfied (“No” in step S23), that is, FDC (J)> FDC (N), and the luminance average value of the latest frame is N−J. If it is smaller than the average luminance value before the frame, it is checked whether or not the change in activity from before the NJ frame to the latest frame tends to decrease. That is, while the variable I is incremented from the initial value J + 1 to N−1, the relationship of FDC (I−1) + β ≧ FDC (I) ≧ FDC (N) −β is established for all variables I. It is checked whether or not it is present (steps S28 to S30). As in the case of checking whether or not there is an increasing tendency in step S24 described above, in this step S28, it is desirable to set the predetermined value β described above in order to allow an error in the change of the luminance average value. .

そして、すべての変数Ｉに対して、FDC(I-1)+β≧FDC(I)≧FDC(N)-βの関係が成立する場合には、Ｎ−Ｊフレーム前から最新のフレームまでの輝度平均値の変化が減少傾向にあり、Ｊフレーム〜最新のフレーム（Ｎフレーム）のシーンがフェードシーンであると決定される（ステップＳ３１：フェード検出）。なお、ステップＳ３１におけるフェード検出では、ステップＳ２１における仮フェードイン判定又は仮フェードアウト判定の判定結果に応じて、フェードイン検出又はフェードアウト検出の決定がなされる。また、ステップＳ２８で任意の変数Ｉに関してFDC(I-1)+β≧FDC(I)≧FDC(N)-βの関係が成立しない場合には、当該判定対象区間に係るシーンはフェードシーンではないと決定される（ステップＳ３２：フェード未検出）。 When the relationship of FDC (I-1) + β ≧ FDC (I) ≧ FDC (N) −β is established for all variables I, the frame from the previous N−J frame to the latest frame The change in the average luminance value tends to decrease, and the scene from the J frame to the latest frame (N frame) is determined to be a fade scene (step S31: fade detection). In the fade detection in step S31, the fade-in detection or the fade-out detection is determined according to the determination result of the temporary fade-in determination or the temporary fade-out determination in step S21. Further, if the relationship of FDC (I-1) + β ≧ FDC (I) ≧ FDC (N) −β is not established with respect to an arbitrary variable I in step S28, the scene related to the determination target section is a fade scene. (Step S32: Fade not detected).

なお、図３に示す輝度平均値を利用した検出アルゴリズムにおいて、輝度平均値FDCの増加傾向及び減少傾向の両方の検証を行っている理由は、例えば、白地に黒い文字が浮かび上がる（あるいは白地上の黒い文字が消えていく）ような、輝度平均値の高い画像から輝度平均値を下げる画像が浮かび上がる（あるいは消えていく）フェードと、黒地に白い文字が浮かび上がる（あるいは黒地上の白い文字が消えていく）ような、輝度平均値の低い画像から輝度平均値を上げる画像が浮かび上がる（あるいは消えていく）フェードの両方を考慮できるようにするためである。 In the detection algorithm using the average luminance value shown in FIG. 3, the reason why both the increasing tendency and decreasing tendency of the average luminance value FDC are verified is, for example, that black characters appear on a white background (or a white ground) When the average brightness value fades from an image with a high average brightness value (or disappears), such as a black character disappears), a white character appears on a black background (or white character on the black ground) This is because it is possible to consider both fades in which an image whose luminance average value is increased from an image whose luminance average value is low (such as disappears).
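
The luminance-based confirmation of steps S21 to S32 can be sketched in the same style. The function name `confirm_fade` and its signature are hypothetical. The default tolerance follows the text's suggestion of about one tenth of FDC(N)-FDC(J); taking the absolute value of that difference for the decreasing case is an assumption made here so that β stays non-negative.

```python
def confirm_fade(fdc, j, beta=None):
    """Return True if the luminance means FDC(J)..FDC(N) trend one way.

    fdc  -- luminance averages FDC(0)..FDC(N), oldest first
    j    -- start index found by the provisional fade determination
    beta -- tolerance for frame-to-frame fluctuation (assumed default:
            one tenth of |FDC(N) - FDC(J)|)
    """
    n = len(fdc) - 1
    if beta is None:
        beta = abs(fdc[n] - fdc[j]) / 10.0
    inner = range(j + 1, n)  # I = J+1 .. N-1
    if fdc[j] <= fdc[n]:
        # Steps S24-S26: increasing tendency, allowing dips of up to beta.
        return all(fdc[i - 1] - beta <= fdc[i] <= fdc[n] + beta for i in inner)
    # Steps S28-S30: decreasing tendency, allowing rises of up to beta.
    return all(fdc[i - 1] + beta >= fdc[i] >= fdc[n] - beta for i in inner)
```

A `True` result corresponds to fade detection (step S27 or S31) and `False` to fade not detected (step S32). Whether the fade is a fade-in or a fade-out is still decided by the provisional activity-based result, since the luminance may move in either direction for both kinds of fade.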

以上の検出アルゴリズムによって、シーン検出回路３０３は、アクティビティFActや、更には輝度平均値FDCを参照して、フェードシーンの検出を行うことが可能である。シーン検出回路３０３においてフェードシーンが検出された場合には、シーン検出回路３０３からアクティビティ予測回路３０４に対して、当該フェードシーンを特定するための値（例えば、上述の最新のフレームを特定する値Ｎや、フェードシーンの区間を特定する値Ｊの値）や、アクティビティの増加（フェードイン）及び減少（フェードアウト）のどちらが検出されたのかを示すフラグが供給される。 With the above detection algorithm, the scene detection circuit 303 can detect a fade scene with reference to the activity FAct and further the luminance average value FDC. When the scene detection circuit 303 detects a fade scene, the scene detection circuit 303 sends a value for specifying the fade scene to the activity prediction circuit 304 (for example, the value N for specifying the latest frame described above). And a flag indicating whether an increase (fade in) or a decrease (fade out) in activity is detected.

アクティビティ予測回路３０４は、シーン検出回路３０３からの情報に応じて、未来のフレームに係るアクティビティの予測を行う。アクティビティ予測回路３０４は、例えばＩ／Ｐピクチャの場合には、未符号化のＢフレームの２フレームを含めた現在のフレーム（Ｎフレーム）の２フレーム前のフレーム（Ｎ−２フレーム）から、符号量制御に係るコントロールの対象となる一定時間後(例えば、Ｌフレーム後)のフレーム（Ｎ＋Ｌフレーム）までの間のアクティビティの予測値（予測アクティビティ値PredAct）の生成を行う。予測アクティビティ値としては、Ｎ−２フレームからＮフレームまでのアクティビティに関しては、実測アクティビティ値が利用可能である一方、Ｎ＋１フレームからＮ＋Ｌフレームまでのアクティビティに関しては、アクティビティ値の予測を行って、その予測結果が利用される。 The activity prediction circuit 304 predicts an activity related to a future frame in accordance with information from the scene detection circuit 303. For example, in the case of an I / P picture, the activity prediction circuit 304 encodes from the frame (N-2 frame) two frames before the current frame (N frame) including two unencoded B frames. A predicted activity value (predicted activity value PredAct) is generated until a frame (N + L frame) after a certain time (for example, after L frames), which is a control target related to the quantity control. As the predicted activity value, the measured activity value can be used for the activities from the N-2 frame to the N frame, while the activity value is predicted for the activities from the N + 1 frame to the N + L frame. The result is used.

なお、シーン検出回路３０３がフェードシーンではないと判断したフレームに関しては、アクティビティ予測回路３０４は、例えば、シーンの変化がないという想定で、各フレームの予測アクティビティ値PredActに、最新のフレーム（Ｎフレーム）の実測アクティビティ値を出力することが可能である。 For a frame that the scene detection circuit 303 determines is not a fade scene, for example, the activity prediction circuit 304 assumes that there is no scene change, and the predicted activity value PredAct of each frame includes the latest frame (N frames). ) Can be output.

一方、シーン検出回路３０３によってフェードシーンであると判断されている場合には、Ｋフレームの予測アクティビティ値PredAct(K)（ただし、変数Ｋは、Ｎ＋１≦Ｋ≦Ｎ＋Ｌ）は、Ｊフレーム、Ｎフレーム、Ｎ−１フレームのそれぞれの実測アクティビティ値から予測される。 On the other hand, when the scene detection circuit 303 determines that the scene is a fade scene, the predicted activity value PredAct (K) of the K frame (where the variable K is N + 1 ≦ K ≦ N + L) is J frame, N frame. , N-1 frames are predicted from the actually measured activity values.

すなわち、FAct(J)≦FAct(N)の場合には、
PredAct(K) = FAct(J)
＋min[{FAct(N)-FAct(J)}*(K-J)/(J-N), {FAct(N)-FAct(N-1)}*(K-J)]
と予測可能である。 That is, if FAct (J) ≦ FAct (N),
PredAct(K) = FAct(J)
 + min[{FAct(N)-FAct(J)}*(K-J)/(J-N), {FAct(N)-FAct(N-1)}*(K-J)]
can be used as the prediction.

また、FAct(J)＞FAct(N)の場合には、
PredAct(K) = FAct(J)
−min[{FAct(J)-FAct(N)}*(K-J)/(J-N), {FAct(N-1)-FAct(N)}*(K-J)]
と予測可能である。 If FAct (J)> FAct (N),
PredAct(K) = FAct(J)
 - min[{FAct(J)-FAct(N)}*(K-J)/(J-N), {FAct(N-1)-FAct(N)}*(K-J)]
can be used as the prediction.

なお、上述の２つの式では、Ｊフレーム〜Ｎフレームの勾配と、Ｎ−１フレーム〜Ｎフレームの勾配のうちの小さい値を予測勾配として採用している。また、上述の式によって算出される予測アクティビティ値PredAct(K)は、最小想定値MinAct及び最大想定値MaxActによって帯域が制限される（すなわち、MinAct＜PredAct(I)＜MaxAct）。 Note that the two formulas above adopt, as the prediction gradient, the smaller of the gradient from frame J to frame N and the gradient from frame N−1 to frame N. In addition, the predicted activity value PredAct(K) calculated by the above formulas is limited in range by the minimum assumed value MinAct and the maximum assumed value MaxAct (that is, MinAct ＜ PredAct(I) ＜ MaxAct).
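
The prediction can be sketched as below, under explicitly stated assumptions. The text writes the divisor of the first gradient as (J-N); this sketch assumes the frame span N-J is what is intended, so that the gradient has the same sign as the activity change, in line with the statement that the smaller of the two gradients is used. The function name `predict_activity`, the use of `j=None` to mean "no fade detected", and the inclusive clamping are likewise assumptions for illustration.

```python
def predict_activity(fact, k, j=None, min_act=0.0, max_act=float("inf")):
    """Return PredAct(K) for a future frame K (K > N).

    fact -- measured activities FAct(0)..FAct(N), oldest first
    j    -- fade start index J, or None when no fade was detected
    """
    n = len(fact) - 1
    if j is None:
        # Non-fade case: assume the scene does not change.
        return fact[n]
    span = n - j  # ASSUMPTION: divisor taken as N-J, not (J-N) as printed
    if fact[j] <= fact[n]:
        # Smaller of the long-term and the most recent rising gradient.
        slope = min((fact[n] - fact[j]) / span, fact[n] - fact[n - 1])
        pred = fact[j] + slope * (k - j)
    else:
        # Smaller of the long-term and the most recent falling gradient.
        slope = min((fact[j] - fact[n]) / span, fact[n - 1] - fact[n])
        pred = fact[j] - slope * (k - j)
    # Limit the prediction to the assumed range [min_act, max_act].
    return max(min_act, min(pred, max_act))
```

Since K-J is positive, taking the minimum of the two gradients before multiplying by (K-J) is equivalent to taking the minimum of the two products in the formulas. Because the extrapolation starts from FAct(J), a fade that has recently slowed down produces a conservative prediction.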

また、Ｂピクチャの場合も同様に、現在のフレームに係る符号化シンタックスを考慮すると、符号化するフレームがＮ−３フレームとなり、符号化済みのＮ−１フレーム又はＮ−２フレームを含めてＮフレームまでの実測値が出ているため、Ｎ−３フレームからＮフレームまでは、実測アクティビティ値が利用可能である一方、Ｎ＋１フレームからＮ＋Ｌフレームまでは、Ｉ／Ｐピクチャの場合と同様に、アクティビティ値の予測を行って、その予測結果が利用される。 Similarly, in the case of a B picture, if the encoding syntax related to the current frame is taken into consideration, the frame to be encoded is an N-3 frame, including the encoded N-1 frame or N-2 frame. Since actual measurement values up to N frames are available, actual activity values can be used from N-3 frames to N frames, while from N + 1 frames to N + L frames, as in the case of I / P pictures, The activity value is predicted, and the prediction result is used.

The code amount control circuit 305 corrects and calculates the target code amount using the predicted activity value PredAct generated and output by the activity prediction circuit 304. The code amount control circuit 305 holds, for each picture type T, the approximate complexity value C(T) = Bits(T)*AvgQ(T), calculated from the code amount Bits(T) and the average quantization scale AvgQ(T) of the most recently encoded picture of that type. It also holds the measured activity value SuvAct(T) of the most recently encoded picture of each type.

The scene detection circuit 303 supplies the code amount control circuit 305 with a flag indicating whether or not the scene is a fade scene. When the flag indicates that the scene is not a fade scene, the code amount control circuit 305 uses the predicted activity value PredAct to correct only the approximate complexity value C(I) of the I picture. When the scene is not a fade scene, PredAct is the measured activity value of the frame of that I picture, or the measured activity value of the frame closest in inter-frame distance to the next I picture.

That is, the correction formula for the approximate complexity value C(I) of the I picture can be expressed as
Crev(I) = C(I)*PredAct(NextI)/SuvAct(I)
where NextI denotes the frame in which the next I picture, as determined by the coding syntax, appears.

Using Crev(I) obtained by the above formula, the target code amount Budget for each of the picture types I, P, and B can be set as follows.
Budget(I) = {TotalBits*Crev(I)}/{Fnum(I)*Crev(I)+Fnum(P)*C(P)+Fnum(B)*C(B)}
Budget(P) = {TotalBits*C(P)}/{Fnum(I)*Crev(I)+Fnum(P)*C(P)+Fnum(B)*C(B)}
Budget(B) = {TotalBits*C(B)}/{Fnum(I)*Crev(I)+Fnum(P)*C(P)+Fnum(B)*C(B)}
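
The non-fade case can be illustrated with a short Python sketch. The function name, the dictionary layout, and the numbers in the usage note are hypothetical. Fnum(T) is taken here to be the number of frames of picture type T covered by TotalBits, following the wording of the claims; this is a sketch under those assumptions rather than the circuit's actual implementation.

```python
def non_fade_budgets(total_bits, c, fnum, pred_act_next_i, suv_act_i):
    """Per-picture-type target code amounts when no fade is detected (sketch).

    c:    approximate complexity C(T) per picture type, keys "I", "P", "B".
    fnum: assumed number of frames of each picture type within total_bits.
    Only the I-picture complexity is corrected, as described in the text.
    """
    # Crev(I) = C(I) * PredAct(NextI) / SuvAct(I)
    crev_i = c["I"] * pred_act_next_i / suv_act_i

    # Common denominator of the three Budget expressions.
    denom = fnum["I"] * crev_i + fnum["P"] * c["P"] + fnum["B"] * c["B"]

    return {
        "I": total_bits * crev_i / denom,
        "P": total_bits * c["P"] / denom,
        "B": total_bits * c["B"] / denom,
    }
```

For example, with C = {I: 400, P: 200, B: 100}, Fnum = {I: 1, P: 4, B: 10}, PredAct(NextI) = 120, SuvAct(I) = 100, and TotalBits = 2,280,000, the sketch gives Crev(I) = 480 and budgets of 480,000, 200,000, and 100,000 bits for I, P, and B pictures respectively. The budgets weighted by the frame counts add up to TotalBits.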

On the other hand, when the flag supplied from the scene detection circuit 303 indicates a fade scene, the code amount control circuit 305 converts the approximate complexity value C(T) frame by frame for all of the assumed frames related to the target code amount, and then obtains the target code amount. With N denoting the picture to be encoded, the approximate complexity values C(T(W)) for frame N to frame N+L are corrected (where T(W) is the picture type of frame W, and W ranges from N to N+L).

For example, when the picture type T(W) is an I picture, the value can be corrected as
Crev(W) = C(I)*PredAct(W)/SuvAct(I)

Likewise, when the picture type T(W) is a P picture, the value can be corrected as
Crev(W) = C(P)*γ + C(P)*{PredAct(W)/SuvAct(P)}*(1-γ)
where γ is a constant (0 < γ < 1).

Likewise, when the picture type T(W) is a B picture, the value can be corrected as
Crev(W) = C(B)*Δ + C(B)*{PredAct(W)/SuvAct(B)}*(1-Δ)
where Δ is a constant (0 < Δ < 1).

Using the corrected approximate complexity values Crev(W) for each picture type T(W), the target code amount is finally obtained by the following equation.
Budget(N) = TotalBits*Crev(N)/ΣCrev(W)
where Σ denotes the sum over W = N to N+L.
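
The fade-scene calculation can be sketched in Python as below. The function name and argument layout are hypothetical, and the default values of γ and Δ (0.5 each) are placeholders chosen only for illustration, since the text states only that both constants lie between 0 and 1. The I-picture case is expressed as a blend weight of 0, which reduces to the I-picture formula in the text.

```python
def fade_budget(total_bits, picture_types, pred_act, c, suv_act, gamma=0.5, delta=0.5):
    """Target code amount of frame N during a fade (illustrative sketch).

    picture_types: picture type ("I", "P" or "B") of frames N .. N+L, in order.
    pred_act:      measured or predicted activity of the same frames.
    c, suv_act:    C(T) and SuvAct(T) per picture type.
    gamma, delta:  assumed constants in (0, 1); the source gives no values.
    """
    weight = {"I": 0.0, "P": gamma, "B": delta}

    crev = []
    for pic_type, act in zip(picture_types, pred_act):
        w = weight[pic_type]
        ratio = act / suv_act[pic_type]
        # Crev(W) = C(T)*w + C(T)*{PredAct(W)/SuvAct(T)}*(1-w)
        crev.append(c[pic_type] * w + c[pic_type] * ratio * (1.0 - w))

    # Budget(N) = TotalBits * Crev(N) / sum of Crev(W) for W = N .. N+L
    return total_bits * crev[0] / sum(crev)
```

Because the denominator includes the corrected complexity of every frame up to N+L, frames whose activity is predicted to rise later in a fade-in reduce the share given to the current frame, which corresponds to reserving bits for the more complex frames that follow.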

The corrected target code amount finally obtained in this way is supplied from the code amount control circuit 305 to the quantization circuit 206. As in conventional image encoding processing, the quantization circuit 206 uses this corrected value as the new target code amount and controls the quantization scale so that the code amount of the image bitstream approaches the new target code amount.

With the above operations, fade scenes are reliably detected by referring to the activity and the average luminance of the input image, and the activity of future frames (the predicted activity value) is predicted using the temporal continuity of the activity. Code amount control parameters such as the target code amount can then be corrected based on the prediction result, which makes it possible to suppress image quality degradation, particularly in fade scenes.

Further, according to the present invention, even for a scene in which a complex video fades in from a uniform scene such as black or gray, the trend of the fade scene can be recognized in advance and the target code amount can be set for each frame. The target code amount can therefore be allocated with the amount of information that the complex video will require in the future reserved in advance. As a result, even in cases where the conventional technique has difficulty allocating an appropriate code amount to a complex image immediately after the end of a fade-in, so that image quality deteriorates, the present invention can achieve good code amount control for such an image.

Although the above embodiment mainly describes image coding control for fade scenes, the image coding control according to the present invention can also be applied to scenes in which the video changes smoothly (for example, scene changes caused by screen panning or effects). In such cases, good code amount control can be achieved in the same way as for the fade scenes described above.

In the above embodiment, hardware elements such as circuits and schematic blocks are illustrated and described as examples of the components of the image encoding device according to the present invention. As with conventional image encoding devices, however, these hardware elements can also be implemented by computer-executable software (a program).

The image encoding device according to the present invention can achieve good image coding control with suppressed image quality degradation in scenes where the video changes smoothly (for example, fade scenes), and is applicable to image coding techniques for encoding moving images such as MPEG2.

A block diagram showing an example of the image encoding device according to the embodiment of the present invention.
The first page of a flowchart showing an example of the fade-in/fade-out detection algorithm of the scene detection circuit of the image encoding device according to the embodiment of the present invention.
The second page of a flowchart showing an example of the fade-in/fade-out detection algorithm of the scene detection circuit of the image encoding device according to the embodiment of the present invention.
A diagram schematically showing the arrangement of images during processing and output in conventional MPEG2 image coding: (A) shows the coding structure used in MPEG2 image coding, (B) shows the reordering of the encoding order during MPEG2 image encoding, and (C) shows the stream arrival order and the decoded image output order during MPEG2 image decoding.
A block diagram showing an example of a typical conventional image encoding device.
A block diagram showing an example of a typical conventional decoding device.

Explanation of symbols

101, 201 Input terminal
102, 218 Image stream buffer
103 Variable length decoding circuit
104, 215 Encoding table
105, 212 Inverse quantization circuit
106, 208 Motion compensation prediction circuit
107, 210 Adder
108, 211 Deblock circuit
109, 209 Reference image memory
110 Output frame memory
111, 213 Inverse orthogonal transform circuit
112, 219 Output terminal
202 Input image memory
203 Two-dimensional block conversion circuit
204 Subtractor
205 Orthogonal transform circuit
206 Quantization circuit
207 Motion vector detection circuit
214 Encoding circuit
216 Multiplexer
217, 305 Code amount control circuit
301 Activity calculation circuit
302 Activity holding memory
303 Scene detection circuit
304 Activity prediction circuit

Claims

An image encoding device for encoding an image signal, comprising:
intra-frame information amount calculation means for calculating an intra-frame complexity and a DC component for each of a plurality of temporally continuous frames of the image signal to be encoded, and outputting the calculation results as an intra-frame information amount;
intra-frame information amount holding means for holding the intra-frame complexity for each of the plurality of frames output from the intra-frame information amount calculation means;
scene detection means for determining whether or not a scene displayed by the plurality of frames is in a fade state, by verifying temporal changes in the intra-frame complexity and the DC component of the plurality of frames based on the intra-frame complexity and the DC component of the plurality of frames held in the intra-frame information amount holding means;
intra-frame information amount prediction means for, based on the intra-frame complexity of the plurality of frames held in the intra-frame information amount holding means and the result of the determination by the scene detection means as to whether or not the scene is in the fade state, setting, when in the fade state, a predicted value calculated using the gradient of the intra-frame complexity of the plurality of frames as the intra-frame complexity of a frame subsequent to the plurality of frames, and setting, when not in the fade state, the intra-frame complexity of a frame as the intra-frame complexity of the subsequent frame; and
code amount control means for holding an approximate complexity value calculated from the code amount and the average quantization scale of the most recently encoded frame of each picture type, and holding the intra-frame complexity corresponding to the most recently encoded frame of each picture type as the latest measured complexity, wherein, based on the intra-frame complexity of the subsequent frame predicted by the intra-frame information amount prediction means and the result of the determination by the scene detection means as to whether or not the scene is in the fade state, the code amount control means, when not in the fade state, corrects only the approximate complexity value of the I picture using the intra-frame complexity of the subsequent frame and the latest measured complexity of the I picture, and sets the target code amount of each picture type using the corrected approximate complexity value, the total target code amount within a certain time, and the number of frames in which each picture type is used, and, when in the fade state, converts the approximate complexity value frame by frame for all of the assumed frames related to the target code amount using the intra-frame complexity of the subsequent frame and the latest measured complexity of each picture type, and sets the target code amount of each picture type using the total target code amount of the assumed frames related to the target code amount.

An image encoding method executed by an image encoding device that encodes an image signal, comprising:
an intra-frame information amount calculation step of calculating an intra-frame complexity and a DC component for each of a plurality of temporally continuous frames of the image signal to be encoded, and outputting the calculation results as an intra-frame information amount;
an intra-frame information amount holding step of holding the intra-frame complexity for each of the plurality of frames output in the intra-frame information amount calculation step;
a scene detection step of determining whether or not a scene displayed by the plurality of frames is in a fade state, by verifying temporal changes in the intra-frame complexity and the DC component of the plurality of frames based on the intra-frame complexity and the DC component of the plurality of frames held in the intra-frame information amount holding step;
an intra-frame information amount prediction step of, based on the intra-frame complexity of the plurality of frames held in the intra-frame information amount holding step and the result of the determination in the scene detection step as to whether or not the scene is in the fade state, setting, when in the fade state, a predicted value calculated using the gradient of the intra-frame complexity of the plurality of frames as the intra-frame complexity of a frame subsequent to the plurality of frames, and setting, when not in the fade state, the intra-frame complexity of a frame as the intra-frame complexity of the subsequent frame; and
a code amount control step of holding an approximate complexity value calculated from the code amount and the average quantization scale of the most recently encoded frame of each picture type, and holding the intra-frame complexity corresponding to the most recently encoded frame of each picture type as the latest measured complexity, wherein, based on the intra-frame complexity of the subsequent frame predicted in the intra-frame information amount prediction step and the result of the determination in the scene detection step as to whether or not the scene is in the fade state, when not in the fade state, only the approximate complexity value of the I picture is corrected using the intra-frame complexity of the subsequent frame and the latest measured complexity of the I picture, and the target code amount of each picture type is set using the corrected approximate complexity value, the total target code amount within a certain time, and the number of frames in which each picture type is used, and, when in the fade state, the approximate complexity value is converted frame by frame for all of the assumed frames related to the target code amount using the intra-frame complexity of the subsequent frame and the latest measured complexity of each picture type, and the target code amount of each picture type is set using the total target code amount of the assumed frames related to the target code amount.