JP2005086249A

JP2005086249A - Dynamic image coding method and dynamic image coding apparatus

Info

Publication number: JP2005086249A
Application number: JP2003312751A
Authority: JP
Inventors: Akiyuki Tanizawa; 昭行谷沢; Yoshihiro Kikuchi; 義浩菊池; Shinichiro Koto; 晋一郎古藤; Takeshi Nakajo; 健中條; Wataru Asano; 渉浅野; Naomi Takeda; 奈穂美武田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2003-09-04
Filing date: 2003-09-04
Publication date: 2005-03-31
Anticipated expiration: 2023-09-04
Also published as: JP4130617B2

Abstract

PROBLEM TO BE SOLVED: To provide a dynamic image coding method that codes moving pictures faster by suppressing the increase in the amount of computation without lowering coding efficiency.

SOLUTION: In a dynamic image coding method that divides an input image signal into a plurality of pixel blocks, selects one mode from a plurality of coding modes for each pixel block, and codes each pixel block in the selected coding mode, estimation parameters that differ for each pixel block and each coding mode are generated, the generated code amount and coding distortion of each pixel block of the input image signal are estimated using the estimation parameters, and an optimum coding mode is determined on the basis of the estimated generated code amount and the estimated coding distortion.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a moving picture coding method capable of adaptively selecting one mode from a plurality of coding modes for each pixel block.

A moving picture coding method using predictive coding with a plurality of prediction modes and a plurality of block shapes is under deliberation jointly by ITU-T and ISO/IEC as ITU-T Rec. H.264 and ISO/IEC 14496-10 (hereinafter, H.264).

In H.264 intra-frame predictive coding, either a 4×4 or a 16×16 prediction block shape can be selected for each prediction block, and prediction can be performed using any of a plurality of prediction modes. The intra-frame predictive coding of the conventional MPEG-1, MPEG-2, and MPEG-4 schemes offered few selectable prediction modes. In H.264, the prediction block shapes are subdivided into small blocks such as 16×16 and 4×4 pixel blocks, so the optimal coding mode can be selected from a rich set of prediction modes according to the characteristics of the image.

In inter-frame predictive coding, prediction has conventionally used block sizes of 8×8 pixels or larger. H.264 allows prediction with a 4×4 pixel block size and, through motion compensation from a plurality of already coded reference images, achieves more accurate prediction than conventional methods. Coding efficiency is thus improved by increasing the number of prediction modes selectable for each block and selecting a coding mode with higher prediction efficiency.

A code amount versus coding distortion (rate-distortion) optimization method has also been proposed in which coding control is performed by the Lagrange undetermined multiplier method with the generated code amount as the constraint. This method selects the coding mode with the highest coding efficiency from the generated code amount actually obtained by encoding and the coding distortion (for example, the squared error or mean squared error between the original image and the locally decoded image). Its drawback is that, as the number of coding modes and block shapes grows, encoding must be repeated for every possible combination of modes, so the actual computation time increases.

A moving picture coding method using Lagrange undetermined multipliers has also been proposed (Reference 1). In this method, for inter-frame predictive coding, motion vector information is used to build a referenced-degree table, where the referenced degree expresses how heavily the block to be coded is referenced from other frames. When the code amount allocation is determined from this table, the referenced degree and the Lagrange undetermined multiplier are placed in one-to-one correspondence. The Lagrangian cost is then obtained from the given undetermined multiplier, and encoding is performed repeatedly. Because this method derives the undetermined multiplier from the referenced degree but still measures the generated code amount and coding distortion by actual encoding, the amount of computation increases, and no speed-up is achieved for cases such as H.264 in which the number of prediction modes or block shapes has grown.

In addition, to suppress degradation of image data caused by noise, a moving picture coding method has been proposed in which a quantization step is determined for each block so that the code amount per frame is constant, the generated code amount and coding distortion of the target block are estimated from the data coded up to that point, and the prediction mode with the smallest sum of estimated generated code amount and estimated coding distortion is selected (Reference 2). Because this method considers only the mode decision between inter-frame and intra-frame predictive coding, it does not address the case in which a single coding mode can take a plurality of prediction modes. The estimate for a coding mode does not vary with the prediction mode, so coding efficiency is not improved.

特開２００２−２１８４６７ 特開２００１−３２０３１２
JP2002-218467 JP 2001-332031 A

As described above, in a moving picture coding method in which various coding modes can be selected for each block, selecting the optimal coding mode has the problem that the amount of computation needed to obtain the generated code amount and coding distortion grows as the number of prediction mode types increases. A further problem is that the smaller the target code amount becomes, the more visible the differences in image quality characteristics between adjacent blocks become, and the subjective image quality degrades.

An object of the present invention is to provide a faster moving picture coding method that suppresses the increase in the amount of computation without lowering coding efficiency, by using estimation parameters to estimate, for each prediction mode corresponding to a plurality of pixel blocks, the generated code amount and coding distortion that actual encoding would produce. Furthermore, a function that reduces the error relative to adjacent blocks is introduced into the cost function, which reduces the image quality degradation caused by differences in prediction characteristics between blocks and improves subjective image quality.

A first aspect of the present invention provides a moving picture coding method that divides an input image signal into a plurality of pixel blocks, selects one mode from a plurality of coding modes for each pixel block, and codes each pixel block in the selected coding mode, the method being characterized by generating estimation parameters that differ for each pixel block and each coding mode, estimating the generated code amount and coding distortion of the pixel block of the input image signal using the estimation parameters, and determining an optimal coding mode on the basis of the estimated generated code amount and coding distortion.

A second aspect of the present invention provides a moving picture coding apparatus comprising: means for generating estimation parameters that differ for each pixel block and each coding mode; means for estimating the generated code amount and coding distortion of a pixel block of an input image signal using the estimation parameters; means for determining an optimal coding mode from a plurality of coding modes on the basis of the estimated generated code amount and coding distortion; and means for coding each pixel block in the optimal coding mode.

According to the present invention, the generated code amount R and the coding distortion D are calculated using estimation parameters that differ for each prediction mode, and the mode decision is made hierarchically or with the coding distortion of adjacent pixels taken into account. This makes it possible to greatly reduce the amount of computation required for the mode decision while suppressing image quality degradation.

FIG. 1 is a block diagram showing the configuration of a moving picture coding apparatus according to an embodiment of the present invention.

As shown in FIG. 1, the moving picture signal is input to the encoding unit 114. The subtractor 101 of the encoding unit 114 is connected to the variable-length coding unit 111 via an orthogonal transform unit 102 (for example, a discrete cosine transformer), which orthogonally transforms the input signal, and a quantization unit 103, which quantizes the orthogonal transform coefficients (DCT coefficients). The output terminal of the quantization unit 103 is connected to the frame memory 107 via an inverse quantization unit 104, an inverse orthogonal transform unit 105, and an adder 106, which together constitute a local decoder. The output terminal of the frame memory 107 is connected to the input terminals of an inter-frame prediction unit 108 and an intra-frame prediction unit 109, described later. The output terminals of the inter-frame prediction unit 108 and the intra-frame prediction unit 109 are connected to an MB (macroblock) prediction mode selection unit 115, described later. The output terminal of the MB prediction mode selection unit 115 is connected to both the subtractor 101 and the adder 106.

The output terminal of the variable-length coding unit 111 is connected to the output buffer 113 via the multiplexing unit 112. The encoding control unit 110 is provided to control the encoding unit 114.

In the above configuration, the input moving picture signal is divided into a plurality of pixel blocks, and each pixel block is input to the intra-frame prediction unit 109 and the inter-frame prediction unit 108. In the intra-frame prediction unit 109 or the inter-frame prediction unit 108, an optimal prediction mode is selected from a plurality of prediction modes using the reference frame stored in the frame memory 107, and a predicted pixel signal is generated using the selected prediction mode. The MB prediction mode selection unit 115 then selects the optimal prediction mode on the basis of the predicted pixel signals. That is, based on the predicted pixel signal generated in the optimal prediction mode selected by the intra-frame prediction unit 109 and the predicted pixel signal generated in the optimal prediction mode selected by the inter-frame prediction unit 108, the MB prediction mode selection unit 115 selects whichever of the inter-frame and intra-frame prediction modes is optimal. The predicted pixel signal corresponding to the selected prediction mode is input to the subtractor 101, which calculates the prediction residual signal between the predicted pixel signal and the input image signal. This prediction residual signal is input to the orthogonal transform unit 102 and orthogonally transformed (for example, by DCT).

The orthogonal transform coefficients are quantized by the quantization unit 103. The quantized orthogonal transform coefficients are variable-length coded by the variable-length coding unit 111 together with information on the prediction method, such as the prediction mode information output from the MB prediction mode selection unit 115 and the quantization coefficients. The coded data are multiplexed by the multiplexing unit 112 and output as coded data through the output buffer 113.

The quantized orthogonal transform coefficients are also locally decoded via the inverse quantization unit 104 and the inverse orthogonal transform unit 105. The locally decoded signal, that is, the decoded prediction residual signal, is added to the prediction signal in the adder 106 and stored in the frame memory 107 as a reference frame.

The encoding control unit 110 performs feedback control of the generated code amount, quantization characteristic control, and the like, and thereby carries out rate control of the generated code amount, control of the prediction units, and control of the encoding process as a whole.

Specific prediction modes are described below with reference to FIGS. 2 to 6.

In the predictive coding of this embodiment, each macroblock can take a plurality of block shapes, each with its own prediction modes. For example, for the luminance signal in intra-frame prediction in H.264 and similar schemes, two macroblock types have been proposed: a macroblock made up of sixteen 4×4 pixel blocks and a macroblock made up of a single 16×16 pixel block. The 4×4 pixel block has nine prediction modes, and the 16×16 pixel block has four prediction modes.

FIG. 5 shows the block shapes of the macroblock used for H.264 intra-frame prediction. In H.264 and similar schemes, the frame to be coded is divided into macroblocks of 16×16 pixels, and for intra-frame prediction each macroblock is further divided into sixteen 4×4 pixel blocks. For 4×4 pixel blocks, intra-frame prediction is carried out as sixteen successive 4×4 predictions.
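The partitioning of a macroblock into sixteen 4×4 blocks can be sketched as follows. This is a minimal Python illustration; the function name `split_macroblock` and the raster ordering of the sub-blocks are assumptions made for the example and are not specified in the text.

```python
def split_macroblock(mb, size=4):
    """Split a square macroblock (list of pixel rows) into size x size blocks.

    Blocks are returned in raster order (left to right, top to bottom),
    which is an assumption of this sketch.
    """
    n = len(mb)
    return [
        [row[x:x + size] for row in mb[y:y + size]]
        for y in range(0, n, size)
        for x in range(0, n, size)
    ]


# A 16x16 macroblock whose pixel value encodes its position (row * 16 + col).
macroblock = [[r * 16 + c for c in range(16)] for r in range(16)]
blocks = split_macroblock(macroblock)
```

Each of the sixteen returned blocks would then be predicted in turn, with previously decoded blocks supplying the reference pixels.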

FIG. 6 shows all prediction modes for a 4×4 pixel block in H.264 intra-frame prediction: the vertical, horizontal, DC, diagonal down-left, diagonal down-right, vertical-right, horizontal-down, vertical-left, and horizontal-up prediction modes. The symbols A to M denote reference pixel signals that have already been coded. For example, the vertical prediction mode predicts each column in the vertical direction from the reference pixels A, B, C, and D, respectively. The DC prediction mode computes the average of reference pixels A to D and J to M and predicts all pixels of the 4×4 block from this average.
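Two of these modes are simple enough to sketch directly. The Python below is an illustrative sketch only: the function names are hypothetical, blocks are stored as `block[row][col]`, and the round-to-nearest integer mean in the DC predictor is an assumption of this example.

```python
def predict_vertical(above):
    """Vertical mode: every row repeats the four reference pixels above the block."""
    return [list(above) for _ in range(4)]


def predict_dc(refs):
    """DC mode: all 16 pixels take the (rounded) mean of the reference pixels."""
    mean = (sum(refs) + len(refs) // 2) // len(refs)
    return [[mean] * 4 for _ in range(4)]


vertical_block = predict_vertical([1, 2, 3, 4])
dc_block = predict_dc([10, 20, 30, 40, 10, 20, 30, 40])
```

The remaining directional modes follow the same pattern but interpolate the reference pixels along their respective directions.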

FIG. 2 shows the configuration of the intra-frame prediction unit 109 of FIG. 1. The reference image signal 205 obtained from the frame memory 107 of FIG. 1 is divided or arranged by the block shape control unit 201 into the shape used by each pixel block. Depending on the pixel block shape, the 4×4 block intra-frame prediction unit 202 and the 16×16 block intra-frame prediction unit 203 within the pixel-block intra-frame prediction unit 211 select the intra-frame prediction mode and the block intra-frame prediction mode for each pixel block shape.

The intra-block prediction mode determination unit 204 performs a mode decision among the plurality of prediction modes obtained by the pixel-block intra-frame prediction units 202 and 203 (for example, selecting the prediction mode with the smallest squared error between the decoded pixel signal and the input image signal). That is, the intra-block prediction mode determination unit 204 determines the optimal prediction mode and outputs the predicted pixel signal 207, the decoded pixel signal 208, the orthogonal transform coefficients 209, the quantization parameter 210, and the prediction mode information 212. Instead of the mode decision described above, the intra-block prediction mode determination unit 204 may use a mode decision based on the estimation parameters described later, or a mode decision based on measured values of the generated code amount and coding distortion.

FIGS. 3 and 4 show the 4×4 block intra-frame prediction unit 202 and the 16×16 block intra-frame prediction unit 203, respectively. When the block shape control unit 201 of FIG. 2 selects the 4×4 pixel block, the reference pixel signal 205 is input to the 4×4 prediction unit 301 of FIG. 3. The 4×4 prediction unit 301 supplies the predicted pixel signal 309 to the prediction mode determination unit 304, which then determines the optimal prediction mode for the selected pixel block.

Specifically, the predicted pixel signal 309 is subtracted from the input image signal 206 by the subtractor 310 to generate the prediction residual signal 311. The prediction residual signal 311 is sent to the code amount estimation unit 302 and the coding distortion estimation unit 303. The code amount estimation unit 302 calculates the estimated generated code amount R^ (312) using the prediction mode 319, the quantization parameter 320, and the prediction residual signal 311. Similarly, the coding distortion estimation unit 303 calculates the estimated coding distortion D^ (313) between the decoded pixel signal 318 and the input image signal 206 using the prediction mode 319 and the predicted pixel signal 309. Using the estimated generated code amount R^ and the estimated coding distortion D^, the prediction mode determination unit 304 calculates the cost of each prediction mode according to the Lagrange undetermined multiplier method, selects the prediction mode with the minimum cost as the optimal prediction mode, and outputs the prediction mode information 319. The prediction mode determination unit 304 also selects and outputs the predicted pixel signal 321 and the quantization parameter 320.

The predicted pixel signal 321 is subtracted from the input image signal by the subtractor 314 to generate the prediction residual signal 315. The prediction residual signal 315 is orthogonally transformed by the orthogonal transform unit 305 and quantized by the quantization unit 306. The quantized data are input to the adder 316 via the inverse quantization unit 307 and the inverse orthogonal transform unit 308 and added to the predicted pixel signal 321, which yields the decoded pixel signal 318. The decoded pixel signal 318 is used as the reference pixel signal for the next 4×4 block.
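The residual, transform, quantization, and reconstruction chain can be sketched for one 4×4 block as below. This is a simplified Python illustration under stated assumptions: a symmetric 4×4 Hadamard matrix stands in for the actual orthogonal transform, the quantizer is a plain uniform quantizer with step `qstep`, and the name `local_decode` is hypothetical.

```python
# Symmetric 4x4 Hadamard matrix used as a stand-in transform (assumption).
# Its rows are mutually orthogonal, so H * H = 4 * I.
H = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]


def matmul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def local_decode(source, prediction, qstep):
    """Return the decoded 4x4 block for a given prediction and quantizer step."""
    residual = [[s - p for s, p in zip(srow, prow)]
                for srow, prow in zip(source, prediction)]
    coeffs = matmul(matmul(H, residual), H)                       # forward transform
    levels = [[round(c / qstep) for c in row] for row in coeffs]  # quantization
    dequant = [[lv * qstep for lv in row] for row in levels]      # inverse quantization
    rec = matmul(matmul(H, dequant), H)                           # inverse transform, scaled by 16
    return [[p + round(r / 16) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, rec)]


source = [[10 + r + c for c in range(4)] for r in range(4)]
prediction = [[8] * 4 for _ in range(4)]
lossless = local_decode(source, prediction, qstep=1)
```

With `qstep=1` the block is reconstructed exactly; a coarse step discards the residual, so the decoded block falls back to the prediction.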

Because computing the prediction residual signal 315 is redundant, the prediction residual signal input to the estimation units 302 and 303 may instead be stored and input directly to the orthogonal transform unit 305. The mode decision process described above is performed in turn for each of the sixteen 4×4 pixel blocks in the macroblock. The prediction modes of the 4×4 pixel blocks form the macroblock's 4×4 prediction combination candidate, which is input to the intra-block prediction mode determination unit 204.

In the 16×16 block intra-frame prediction unit 203 of FIG. 4, when the block shape control unit 201 of FIG. 2 selects the 16×16 pixel block, the reference pixel signal 205 is input to the 16×16 block prediction unit 401. The 16×16 block prediction unit 401 generates the predicted pixel signal 409 by referring to the reference image signal 205. When the predicted pixel signal 409 is input to the 16×16 block prediction mode determination unit 404, that unit determines the optimal prediction mode for the selected pixel block shape.

Specifically, the predicted pixel signal 409 is subtracted from the input image signal 206 by the subtractor 410 to generate the prediction residual signal 411. The prediction residual signal 411 is sent to the code amount estimation unit 402 and the coding distortion estimation unit 403. The code amount estimation unit 402 calculates the estimated generated code amount R^ (412) using the prediction mode 419, the quantization parameter 420, and the prediction residual signal 411. Similarly, the coding distortion estimation unit 403 calculates the estimated coding distortion D^ (413) between the decoded pixel signal 418 and the input image signal 206 using the prediction mode 419 and the predicted pixel signal 409. Using the estimated code amount R^ and the estimated coding distortion D^, the prediction mode determination unit 404 calculates the cost of each mode according to the Lagrange undetermined multiplier method, selects the prediction mode with the minimum cost as the optimal prediction mode, and outputs the prediction mode information 419. The 16×16 block prediction mode determination unit 404 also outputs the quantization parameter 420 and the predicted pixel signal 421.

The predicted pixel signal 421 is subtracted from the input image signal by the subtractor 414 to generate the prediction residual signal 415. The prediction residual signal 415 is orthogonally transformed by the orthogonal transform unit 405 and quantized by the quantization unit 406. The quantized data are input to the adder 416 via the inverse quantization unit 407 and the inverse orthogonal transform unit 408 and added to the predicted pixel signal 421, which yields the decoded pixel signal 418. The decoded pixel signal 418 is used as the reference pixel signal for the next macroblock.

As described above, the prediction mode determination units 304 and 404 in FIGS. 3 and 4 use the Lagrange undetermined multiplier method. Let J be the Lagrangian cost, R the generated code amount, and D the coding distortion. The cost then takes the standard Lagrangian form J = D + λ·R, where λ is the Lagrange multiplier, which depends on the quantization parameter.
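The role of λ can be illustrated with a short Python sketch. It assumes the standard Lagrangian form J = D + λ·R for the cost; the function names and the numeric (D, R) pairs are made up for the example.

```python
def lagrangian_cost(distortion, rate, lam):
    """Cost J = D + lambda * R for one candidate mode (assumed standard form)."""
    return distortion + lam * rate


def best_mode(candidates, lam):
    """candidates maps a mode name to its (D, R) pair; return the cheapest mode."""
    return min(candidates, key=lambda m: lagrangian_cost(*candidates[m], lam))


# Hypothetical (distortion, rate) pairs for two modes.
candidates = {"vertical": (100.0, 20.0), "dc": (80.0, 40.0)}
low_lambda_choice = best_mode(candidates, lam=0.5)   # distortion dominates
high_lambda_choice = best_mode(candidates, lam=2.0)  # rate dominates
```

A small λ favors the mode with lower distortion, while a large λ favors the mode that spends fewer bits, which is why λ is tied to the quantization parameter.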

The Lagrange undetermined multiplier method reduces a maximization problem with constraints to a maximization problem without constraints. Its use for mode selection in moving picture coding is proposed in Thomas Wiegand and Bernd Girod, “Multi-Frame Motion-Compensated Prediction for Video Transmission”, Kluwer Academic Publishers, 2001.

The coding distortion D is calculated as the error between the original image and the locally decoded image, which is obtained by orthogonally transforming, quantizing, inverse quantizing, and inverse orthogonally transforming the prediction residual signal (315, 415) and then adding the prediction signal. The generated code amount R is obtained only after entropy coding the orthogonal transform coefficients immediately after quantization. The computational load therefore increases when there are multiple modes or block shapes.
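The measured distortion referred to here is commonly a squared-error figure. A minimal Python sketch follows; the function names `sse` and `mse` are chosen for this example.

```python
def sse(original, decoded):
    """Sum of squared errors between two equally sized blocks (lists of rows)."""
    return sum((o - d) ** 2
               for orow, drow in zip(original, decoded)
               for o, d in zip(orow, drow))


def mse(original, decoded):
    """Mean squared error: SSE divided by the number of pixels."""
    pixels = sum(len(row) for row in original)
    return sse(original, decoded) / pixels


original = [[1, 2], [3, 4]]
decoded = [[1, 1], [3, 6]]
```

Obtaining `decoded` for every candidate mode requires the full transform and quantization chain, which is the cost the estimation approach is meant to avoid.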

In the moving picture coding method of this embodiment, the estimated coding distortion D^ and the estimated generated code amount R^ are calculated from the variance σ² of the prediction residual signal and the quantization parameter QP, so that the orthogonal transform, quantization, inverse quantization, inverse orthogonal transform, and similar processing can be omitted from the prediction mode loop. Moreover, no additional components such as a variable-length coding unit or a coding distortion calculation unit are needed, so the amount of computation and the hardware cost are reduced while image quality degradation is suppressed.
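The variance σ² of the prediction residual is cheap to compute compared with a transform and quantization pass. The Python sketch below uses the population variance (dividing by the number of pixels), which is an assumption of this example; the function name is hypothetical.

```python
def residual_variance(source, prediction):
    """Population variance of the prediction residual (source - prediction)."""
    residual = [s - p
                for srow, prow in zip(source, prediction)
                for s, p in zip(srow, prow)]
    mean = sum(residual) / len(residual)
    return sum((r - mean) ** 2 for r in residual) / len(residual)


variance = residual_variance([[1, 2], [3, 4]], [[0, 0], [0, 0]])
```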

The processing of the intra-frame prediction unit 109 is described with reference to the flowchart of FIG. 11. First, the variable min_cost, which holds the running minimum cost, is initialized. Within the prediction mode loop, intra-frame prediction (step 2), code amount estimation (step 4), coding distortion estimation (step 5), and evaluation of the cost J (step 6) are performed in order. Whether min_cost > J holds is then determined (step 8). If the result is YES, the minimum cost is updated (step 9) and the prediction mode is updated (step 10).

After the prediction mode has been updated, or when the min_cost > J determination is NO, whether prediction has finished is determined (step 11). The prediction mode loop of steps 2 to 10 is repeated once for each available prediction mode, after which the prediction loop ends.

When the prediction loop has ended and the prediction mode giving the minimum cost has been determined, orthogonal transform (step 12), quantization (step 13), inverse quantization (step 14), and inverse orthogonal transform (step 15) are performed in order, and the reference pixels are updated (step 16). The processing then ends.
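The loop structure of the flowchart can be sketched in Python as follows. The predictor and the two estimators are passed in as callables because their internals are described elsewhere; all function and variable names other than `min_cost` are hypothetical, and the cost is assumed to be D^ + λ·R^.

```python
def choose_mode(modes, predict, estimate_rate, estimate_distortion, lam):
    """Pick the mode with the smallest estimated Lagrangian cost."""
    min_cost = float("inf")                      # initialize the minimum cost
    best_mode, best_pred = None, None
    for mode in modes:                           # prediction mode loop
        pred = predict(mode)                     # intra-frame prediction
        r_hat = estimate_rate(mode, pred)        # code amount estimation
        d_hat = estimate_distortion(mode, pred)  # coding distortion estimation
        cost = d_hat + lam * r_hat               # cost J evaluation
        if min_cost > cost:                      # min_cost > J ?
            min_cost = cost                      # update minimum cost
            best_mode, best_pred = mode, pred    # update prediction mode
    return best_mode, best_pred, min_cost


# Stub estimators with made-up numbers, for illustration only.
rates = {"vertical": 10, "horizontal": 5, "dc": 8}
dists = {"vertical": 50, "horizontal": 70, "dc": 40}
result = choose_mode(
    ["vertical", "horizontal", "dc"],
    predict=lambda m: m.upper(),
    estimate_rate=lambda m, p: rates[m],
    estimate_distortion=lambda m, p: dists[m],
    lam=2,
)
```

Only the returned winner would then go through the transform, quantization, and reconstruction steps, once per block rather than once per candidate mode.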

In this embodiment, the above transform processing need only be performed for the prediction mode with the minimum cost. Moreover, because the code amount is estimated, variable-length coding is not required. The processing is therefore much faster than the conventional method.
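The prediction mode loop of FIG. 11 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the cost takes the conventional Lagrangian form J = D^ + λ·R^, and the names `select_mode_estimated`, `estimate_rate`, `estimate_distortion`, and `lam` are hypothetical.

```python
import math

def select_mode_estimated(modes, estimate_rate, estimate_distortion, lam):
    """Pick the prediction mode with the smallest estimated cost.

    estimate_rate and estimate_distortion are caller-supplied callables
    (hypothetical) returning R^ and D^ for a mode. The cost form
    J = D^ + lam * R^ is an assumption for illustration.
    """
    min_cost = math.inf            # initialise min_cost
    best_mode = None
    for mode in modes:             # prediction mode loop (steps 2 to 10)
        r_hat = estimate_rate(mode)        # code amount estimation (step 4)
        d_hat = estimate_distortion(mode)  # coding distortion estimation (step 5)
        cost = d_hat + lam * r_hat         # cost J evaluation (step 6)
        if min_cost > cost:                # min_cost > J ? (step 8)
            min_cost = cost                # update minimum cost (step 9)
            best_mode = mode               # update prediction mode (step 10)
    return best_mode, min_cost
```

Only the mode returned by this loop then goes through orthogonal transform, quantization, inverse quantization, and inverse orthogonal transform.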

Next, the methods for estimating the code amount and the coding distortion are described. The coding distortion is approximated from the quantization parameter QP and the prediction mode I by the following equation.

The generated code amount depends on the prediction mode, the quantization parameter, the input image signal, and so on, but for intra-frame prediction it can be approximated by the following equation.

Here, I is the prediction mode, a_I, b_I, c_I, and d_I are estimation parameters that depend on the prediction mode I, and σ² is the variance of the prediction residual signal. D^ denotes the estimated coding distortion, R^ the estimated code amount, and QP the quantization parameter. Using different estimation parameters for each prediction mode to estimate the coding distortion and the generated code amount improves the estimation accuracy.

The estimation parameters a_I, b_I, c_I, and d_I can be determined in advance by learning from a plurality of sample images, as follows. First, the prediction mode is fixed for every pixel block of a sample image, the various sample images are coded while the quantization parameter is varied step by step, the relationship between generated code amount and coding distortion is obtained for each image, and the variance of the prediction residual signal is measured at the same time.

FIG. 7 shows an example of coding distortion estimation. The horizontal axis is the value of the quantization parameter QP, and the vertical axis is the coding distortion D obtained from measurement. The points are measured values obtained by coding, and the solid line is the fitted curve. To estimate the coding distortion, the relationship between the quantization parameter and the measured coding distortion D obtained in the fixed prediction mode at each quantization parameter value is fitted with an exponential approximation, which determines the estimation parameters for that prediction mode. To estimate the generated code amount for each block, the relationship between the code amount obtained in the fixed prediction mode and log(σ²/D^) is fitted with a linear or exponential approximation, which determines the estimation parameters for that prediction mode. Using the estimation parameters determined in advance in this way, the coding distortion estimate and the generated code amount estimate are calculated from the variance of the prediction residual signal and the quantization parameter QP. This provides a moving picture coding method that omits complex processing from the prediction mode loop, reduces the amount of computation and the hardware cost, and suppresses image quality degradation.
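The fitting procedure described above can be sketched as follows. The functional forms are assumptions chosen to match the description (an exponential fit of D against QP and a linear fit of R against log(σ²/D^)): D^ = a·exp(b·QP) and R^ = c·log(σ²/D^) + d. Which of a_I, b_I, c_I, d_I belongs to which model is likewise assumed, and all function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_distortion_params(qps, distortions):
    """Fit the assumed model D = a * exp(b * QP) by a line fit on log D."""
    b, log_a = fit_line(qps, [math.log(d) for d in distortions])
    return math.exp(log_a), b

def estimate_distortion(a, b, qp):
    """Estimated coding distortion D^ for one prediction mode."""
    return a * math.exp(b * qp)

def fit_rate_params(variances, d_hats, rates):
    """Fit the assumed model R = c * log(var / D^) + d."""
    xs = [math.log(v / d) for v, d in zip(variances, d_hats)]
    return fit_line(xs, rates)

def estimate_rate(c, d, variance, d_hat):
    """Estimated generated code amount R^ for one block."""
    return c * math.log(variance / d_hat) + d
```

In this sketch one parameter set (a, b, c, d) would be fitted per prediction mode from the sample-image measurements and then looked up during coding.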

In another embodiment in which mode determination uses estimation parameters, a few frames are actually coded at coding time using a plurality of reference quantization parameters, and the generated code amount R and the coding distortion D of the image to be coded are measured. The resulting code amount versus coding distortion curve is approximated, and the estimation parameters a_I, b_I, c_I, and d_I are calculated for each macroblock. Coding is then performed with these estimation parameters. Determining the estimation parameters by coding a few frames beforehand provides a parameter set matched to the content being coded, which yields image quality with high coding efficiency.

Next, a second embodiment is described. In the second embodiment, the mode determination method using estimation parameters introduced in the first embodiment is applied hierarchically to the pixel blocks. That is, the input image signal is divided into a plurality of large pixel blocks, and each large pixel block is further divided into a plurality of small pixel blocks. In intra-frame prediction in H.264, for example, a macroblock is divided into a large pixel block of 16×16 pixels and small pixel blocks of 4×4 pixels, as shown in FIG. 5.

For the 16×16 pixel block there are only four prediction modes, whereas for the 4×4 pixel block there are nine. With 4×4 pixel blocks, an enormous number of combinations must be considered to obtain the prediction for one macroblock, and the amount of computation increases. Therefore, for the small pixel blocks, which require many calculations, mode determination uses the generated code amount estimate and the coding distortion estimate based on the estimation parameters described above, in order to limit the increase in computation. For the large pixel block, the number of prediction modes is small, so to improve image quality each mode is actually coded, and mode determination uses the measured generated code amount and coding distortion. Using such a hierarchical structure of large and small pixel blocks prevents image quality degradation while greatly reducing the amount of computation.
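The difference in workload between the two block sizes can be made concrete with a short calculation. The sketch below only counts candidates; the constant and function names are hypothetical, and the per-block count assumes each 4×4 block is evaluated independently.

```python
# A 16x16 macroblock holds (16 / 4)^2 = 16 blocks of 4x4 pixels.
BLOCKS_4X4_PER_MACROBLOCK = (16 // 4) ** 2
MODES_4X4 = 9      # prediction modes per 4x4 block
MODES_16X16 = 4    # prediction modes per 16x16 block

def per_block_evaluations(blocks, modes):
    """Mode evaluations when every block is tested independently."""
    return blocks * modes

def joint_combinations(blocks, modes):
    """Number of distinct mode assignments over all blocks of a macroblock."""
    return modes ** blocks

if __name__ == "__main__":
    print(per_block_evaluations(BLOCKS_4X4_PER_MACROBLOCK, MODES_4X4))
    print(joint_combinations(BLOCKS_4X4_PER_MACROBLOCK, MODES_4X4))
    print(MODES_16X16)
```

Even the independent per-block count is far larger than the four evaluations needed for the 16×16 block, which is why the estimate-based decision is reserved for the small blocks.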

The configuration shown in FIG. 3 is an example in which, for the small pixel blocks, the generated code amount estimate and the coding distortion estimate are calculated using the estimation parameters and mode determination is performed; the corresponding processing is shown in the flowchart of FIG. 11. This method is accurate because it uses different estimation parameters for each prediction mode. It is also very fast because no complex processing is performed inside the prediction mode loop.

For the large pixel block, on the other hand, the configuration in which the generated code amount and the coding distortion are measured and mode determination is performed is shown in FIG. 8, and the corresponding processing is shown in the flowchart of FIG. 12.

FIG. 8 shows the pixel block intra-frame prediction unit 203 of FIG. 2. When the reference pixel signal 205 is input to the 16×16 prediction unit 501, the 16×16 prediction unit 501 outputs a predicted pixel signal 512. The subtractor 509 subtracts the predicted pixel signal 512 from the input image signal 206 to generate a prediction residual signal 511. The prediction residual signal 511 is orthogonally transformed (for example, by DCT) by the orthogonal transform unit 502. The resulting orthogonal transform coefficients are quantized by the quantization unit 503. The quantized transform coefficients are sent to the variable-length coding unit 506, where they are variable-length coded to obtain the measured generated code amount R (513).

The quantized orthogonal transform coefficients are also inverse quantized by the inverse quantization unit 504 and then decoded by the inverse orthogonal transform unit 505 to generate a locally decoded signal. This locally decoded signal is added to the predicted pixel signal 512 obtained from the prediction unit 501. The sum is input to the coding distortion calculation unit 507 as the decoded pixel signal 516. The coding distortion calculation unit 507 calculates the measured coding distortion D (514) and the measured generated code amount R (513) on the basis of the input image signal 206 and the decoded pixel signal 516. The measured coding distortion D (514) and the measured generated code amount R (513) are input to the 16×16 pixel block prediction mode determination unit 508.

The 16×16 pixel block prediction mode determination unit 508 uses the Lagrange multiplier method to calculate the Lagrangian cost from the measured coding distortion D (514) and the measured generated code amount R (513), and outputs the prediction mode with the minimum cost as the final prediction mode, together with the decoded pixel signal 516, the orthogonal transform coefficients 515, the quantization parameter 519, and so on.

According to the flowchart of FIG. 12, the variable min_cost, which holds the minimum cost, is first initialized (step 01). After intra-frame prediction (step 02) is performed, orthogonal transform (step 12), quantization (step 13), inverse quantization (step 14), inverse orthogonal transform (step 15), variable-length coding (step 17), and coding distortion calculation (step 18) are performed in order within the prediction mode loop. In this case, the amount of computation grows with the number of prediction modes, but the generated code amount and the coding distortion can be calculated accurately, so a high-quality coded image with high coding efficiency is obtained.

The cost is then evaluated on the basis of the coding distortion D (step 06), and it is determined whether min_cost > J (step 8). If the result is YES, the minimum cost is updated (step 9) and the prediction mode is updated (step 10).

After the prediction mode is updated, or when the result of the min_cost > J determination is NO, it is determined whether prediction is finished (step 11). The prediction loop ends once the prediction mode loop of steps 2 to 10 has been repeated for every prediction mode. The reference pixels are then updated (step 16), and the processing ends.

The mode determination method that actually performs coding and calculates the generated code amount R and the coding distortion D increases the amount of computation for small pixel blocks, which have many prediction modes, but not significantly for large pixel blocks, which have few. Taking advantage of these characteristics, the small pixel blocks, where the computation could otherwise grow, use the mode determination described in the first embodiment, based on the generated code amount estimate and the coding distortion estimate obtained with the estimation parameters. For the large pixel blocks, to improve image quality, mode determination uses an accurate Lagrangian cost calculated from the measured generated code amount and coding distortion. Performing mode determination with this hierarchical structure greatly reduces the amount of computation while limiting any loss of coding efficiency, and also reduces hardware cost.

In another embodiment aimed at improving image quality, the optimal combination of prediction modes for the small pixel blocks is not limited to a single one. Mode determination using the estimation parameters described above is performed, and information on a plurality of prediction modes, selected from the candidate combinations for the small pixel blocks, is passed to the large pixel block stage. At the large pixel block stage, the measurement-based mode determination described above is used to select the optimal prediction mode, or combination of prediction modes, from among the candidates predicted in the large pixel block prediction modes and the candidate combinations passed from the small pixel block stage. This improves coding efficiency in cases where the generated code amount and coding distortion estimates calculated with the estimation parameters are inaccurate.

FIG. 9 shows an example of the intra-block prediction mode determination unit 204 of FIG. 2 for H.264. In the pixel block intra-frame prediction unit 211 of FIG. 2, the 4×4 pixel block combination candidates (combinations of 4×4 prediction modes) predicted by the 4×4 block intra-frame prediction unit 202 are sent to the intra-block prediction mode determination unit 204. That is, 4×4 pixel combination candidate 1 (601), 4×4 pixel combination candidate 2 (602), and 4×4 pixel combination candidate 3 (603) are input to the candidate data control unit 608.

In the 16×16 block intra-frame prediction unit 203 of FIG. 4, the prediction information for 16×16 prediction mode 0 (604), 16×16 prediction mode 1 (605), 16×16 prediction mode 2 (606), and 16×16 prediction mode 3 (607) is sent from the 16×16 prediction unit 401, bypassing the code amount estimation unit 402, the coding distortion estimation unit 408, the orthogonal transform unit 405, the quantization unit 406, the inverse quantization unit 407, and the inverse orthogonal transform unit 408, to the variable-length coding unit 609 and the coding distortion calculation unit 610 via the candidate data control unit 608. The variable-length coding unit 609 and the coding distortion calculation unit 610 measure the generated code amount R (614) and the coding distortion D (615), respectively. These values are used to calculate the Lagrangian cost and perform mode determination, the optimal prediction mode or combination of prediction modes is determined, and the decoded pixel signal 613, the prediction mode information 616, the orthogonal transform coefficients 617, and the quantization parameter 618 are output. Here, the 16×16 block prediction mode determination unit 203 may select only one of the prediction modes or may output all of them. By measuring the generated code amount and the coding distortion for the plurality of small pixel block combination candidates and the large pixel block prediction candidates and performing mode determination in this way, the accuracy of mode determination for the small pixel blocks improves, and image quality improves.
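The two-stage decision with multiple candidates can be sketched as follows. This is an illustrative outline under assumptions, not the circuit of FIG. 9: candidates are opaque labels, `estimated_cost` and `measured_cost` are caller-supplied callables, and all names are hypothetical.

```python
import heapq

def top_k_candidates(candidates, estimated_cost, k):
    """Keep the k small-block combination candidates with the lowest estimated cost."""
    return heapq.nsmallest(k, candidates, key=estimated_cost)

def final_decision(small_candidates, large_candidates, measured_cost):
    """Choose among all forwarded candidates using the measured Lagrangian cost."""
    pool = list(small_candidates) + list(large_candidates)
    return min(pool, key=measured_cost)

if __name__ == "__main__":
    est = {"s1": 10.0, "s2": 11.0, "s3": 12.0, "s4": 30.0}
    meas = {"s1": 15.0, "s2": 9.0, "s3": 14.0,
            "m0": 12.0, "m1": 13.0, "m2": 16.0, "m3": 20.0}
    kept = top_k_candidates(est, est.get, 3)
    print(kept, final_decision(kept, ["m0", "m1", "m2", "m3"], meas.get))
```

In the usage above the candidate with the best estimate is not the one with the best measured cost, which is the situation the multi-candidate scheme is meant to handle.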

In the third embodiment, a new cost function is introduced to improve subjective image quality. The encoder configuration may be that of the first or the second embodiment; the difference in coding distortion from adjacent blocks, which reflects subjective image quality, is introduced into the cost function used for mode determination.

The probability of obtaining the optimal decoded pixel signal LD is maximized under the constraint that the code amount R is smaller than Rc.

Here, the probability P(LD|I,QP) is assumed to follow a Gibbs distribution and is expanded by Bayes' theorem into the following equation.

Here, C = P(I,QP); the joint probability that both the prediction mode I and the quantization parameter QP are given is a constant. The likelihood function in the first term and the prior function in the second term above are each Gibbs distributions, and their energy functions (costs) are defined by the following equations under the constraint R < Rc.

The likelihood function represents the conventional Lagrangian cost and is consistent with the framework of the Lagrange multiplier method, which minimizes the coding distortion under the constraint R < Rc. The prior probability function represents the strength of the correlation between the current block and its adjacent blocks; keeping the coding distortion of the adjacent blocks and the current block within a certain range improves the subjective quality of the whole image. Here, D(I,QP) is the coding distortion, S denotes all pixels in the pixel block, and N denotes all pixels in the adjacent pixel blocks. FIG. 10 shows the relationship between adjacent pixel blocks. The coding distortion is calculated using as adjacent blocks the left, upper, upper-left, and upper-right blocks of the block to be coded, together with the reference block located at the same position as the block to be coded in an already coded reference frame. The likelihood function η is a multiplier that changes the strength of the correlation and is expressed, for example, by the following equation.

When the error relative to the adjacent blocks is smaller than the threshold θ_th, λ_α is selected; when it is larger, λ_β is selected. η is a parameter that introduces the strength of the correlation with adjacent blocks into the coding distortion of the Lagrangian cost. λ_α is set so that blocks with similar image quality characteristics tend to take prediction modes with similar coding distortion, and a small λ_β is set so that, for blocks with very different image quality characteristics, the conventional Lagrangian cost is not greatly affected. Changing these parameters effectively keeps the coding distortion of the current pixel block and the adjacent pixel blocks uniform while weakening the correlation between blocks with different characteristics. This is equivalent to extending the Lagrangian cost into a maximization problem with two constraints, the additional one being a constraint on coding distortion. Substituting [Equation 6] and [Equation 7] into [Equation 5] and applying the mean-field approximation gives the following equation.

Here, when both sides are partially differentiated with respect to R_s, the second term does not affect the equation because of the mean-field approximation, giving

That is, the result is consistent with the framework of the Lagrange multiplier method. Thus, without changing that framework, coding distortion is kept uniform across blocks with similar image quality characteristics, while the cost is not greatly affected for blocks with very different characteristics. Block noise is therefore visually reduced, and the subjective image quality of the entire image frame can be improved.
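The threshold rule for the multiplier η described above can be sketched as follows. The function and argument names are hypothetical, and the handling of an error exactly equal to θ_th (treated here as the "large" case) is an assumption, since the text only covers the smaller and larger cases.

```python
def select_eta(neighbor_error, theta_th, lambda_alpha, lambda_beta):
    """Return the correlation multiplier for one block.

    neighbor_error: difference in coding distortion between the current block
                    and its adjacent blocks (how it is measured is left to the caller).
    theta_th:       threshold separating similar from dissimilar blocks.
    lambda_alpha:   multiplier used for similar blocks (error below threshold).
    lambda_beta:    small multiplier used for dissimilar blocks.
    """
    if neighbor_error < theta_th:
        return lambda_alpha
    # Equality falls through to lambda_beta; this tie-break is an assumption.
    return lambda_beta
```

With a small λ_β, blocks whose distortion differs strongly from their neighbours are left close to the conventional Lagrangian decision.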

As described above, according to the present invention, there are a plurality of coding modes, and different parameters are set for each block and each mode. The optimal coding mode is estimated using these parameters according to the input image signal.

Also, in the present invention, a frame is divided hierarchically from large blocks to small blocks, and different coding modes are set for each layer. Because the number of modes increases toward the lower layers, the coding mode in the lower layers is determined using estimation parameters; that is, mode determination is performed by a fast estimation algorithm. In the upper layers the number of candidates is small, so mode determination is based on the generated code amount and the coding distortion obtained by actual coding. In other words, layers with many mode combinations use fast mode determination based on estimation parameters, and layers with few modes use high-accuracy mode determination based on measurement.

Furthermore, in the present invention, the Lagrange multiplier method is used for mode determination, and a new term is introduced into the Lagrangian cost calculation, without changing the framework of the method, so that the coding mode is determined in a way that avoids differences in distortion between blocks.

The moving picture coding apparatus according to the present invention as described above is suitable for image compression in moving picture transmission systems and moving picture recording apparatuses.

FIG. 1 is a block circuit diagram of a moving picture coding apparatus according to an embodiment of the present invention.
FIG. 2 is a block circuit diagram of the intra-frame prediction unit of FIG. 1.
FIG. 3 is a block circuit diagram of the 4×4 pixel block intra-frame prediction unit of FIG. 2, which uses estimation parameters.
FIG. 4 is a block circuit diagram of the 16×16 pixel block intra-frame prediction unit of FIG. 2, which uses estimation parameters.
FIG. 5 is a diagram showing an example of pixel block division and prediction modes according to an embodiment of the present invention.
FIG. 6 is a diagram showing an example of the prediction modes of a 4×4 pixel block in H.264 according to an embodiment of the present invention.
FIG. 7 is a diagram showing an example of selecting estimation parameters according to an embodiment of the present invention.
FIG. 8 is a diagram showing an example of mode determination by measuring the code amount and the coding distortion according to an embodiment of the present invention.
FIG. 9 is a diagram showing mode determination from a plurality of candidates according to an embodiment of the present invention.
FIG. 10 is a diagram showing an example of the adjacent pixel blocks used in the cost function according to an embodiment of the present invention.
FIG. 11 is a flowchart showing intra-frame prediction using estimation parameters according to an embodiment of the present invention.
FIG. 12 is a flowchart showing intra-frame prediction in which mode determination is performed by measuring the code amount and the coding distortion according to an embodiment of the present invention.

Explanation of symbols

101: subtractor; 102: orthogonal transform unit; 103: quantization unit; 104: inverse quantization unit; 105: inverse orthogonal transform unit; 106: adder; 107: frame memory; 108: inter-frame prediction unit; 109: intra-frame prediction unit; 110: coding control unit; 111: variable-length coding unit; 112: multiplexing unit; 113: output buffer; 114: coding unit; 115: MB prediction mode selection unit; 201: block shape control unit; 202: 4×4 block intra-frame prediction unit; 203: 16×16 block intra-frame prediction unit; 204: intra-block prediction mode determination unit; 205: reference image signal; 206: input image signal; 207: predicted pixel signal; 208: decoded pixel signal; 209: orthogonal transform coefficients; 210: quantization parameter; 211: pixel block intra-frame prediction unit

Claims

A moving picture coding method in which an input image signal is divided into a plurality of pixel blocks, one mode is selected from a plurality of coding modes for each pixel block, and coding is performed for each pixel block in the selected coding mode, wherein
estimation parameters that differ for each pixel block and each coding mode are generated, the generated code amount and the coding distortion of the pixel block of the input image signal are estimated using the estimation parameters, and an optimal coding mode is determined based on the estimated generated code amount and the estimated coding distortion.

A moving picture coding method in which a moving image signal is divided into a plurality of first pixel blocks, each first pixel block is divided into a plurality of second pixel blocks, one coding mode is selected from a plurality of coding modes for each pixel block size, and coding is performed for each pixel block in the selected coding mode, wherein
for the second pixel blocks, the generated code amount and the coding distortion of the pixel block are estimated using estimation parameters generated for each pixel block and each coding mode, and one candidate among the plurality of coding modes, or candidate combinations of the plurality of coding modes, are determined based on the estimated generated code amount and coding distortion, and
for the first pixel block, the coding mode is determined from the one candidate or the plurality of candidate combinations based on measured values of the generated code amount and the coding distortion in each coding mode.

A moving picture coding method in which each image frame of an input moving image signal is divided into a plurality of pixel blocks, one mode is selected from a plurality of coding modes for each pixel block, and coding is performed for each pixel block in the selected coding mode, wherein
a first coding distortion of an already coded pixel block adjacent to the coding target pixel block in the image frame is detected, a second coding distortion of an already coded pixel block located at the same position as the coding target pixel block in an already coded image frame is detected, and a coding mode is determined that reduces the error between at least one of the first and second coding distortions and the coding distortion of the coding target pixel block.

Means for generating different estimation parameters for each pixel block and for each encoding mode;
Means for estimating a generated code amount and encoding distortion of a pixel block of an input image signal using the estimation parameter;
Means for determining an optimum coding mode from a plurality of coding modes based on the estimated generated code amount and coding distortion;
Means for encoding each pixel block in the optimal encoding mode;
A moving picture encoding apparatus comprising the above means.

Means for dividing a moving image signal into a plurality of first pixel blocks, and dividing the first pixel block into a plurality of second pixel blocks;
Means for generating different estimation parameters for each pixel block and for each encoding mode;
With respect to the second pixel block, using the estimation parameters generated for each pixel block and each coding mode, means for estimating the generated code amount and coding distortion of the pixel block;
Means for determining one candidate from the plurality of encoding modes or a combination candidate of the plurality of encoding modes based on the estimated generated code amount and encoding distortion;
Means for determining, with respect to the first pixel block, the coding mode from the one candidate or the plurality of combination candidates based on the actual values of the generated code amount and the coding distortion in each coding mode;
Means for encoding for each pixel block in the determined encoding mode;
A moving picture encoding apparatus comprising the above means.

In a video encoding device in which each image frame of the input moving image signal is divided into a plurality of pixel blocks, one mode is selected from a plurality of encoding modes for each pixel block, and each pixel block is encoded in the selected encoding mode,
Means for detecting a first encoding distortion of an encoded pixel block adjacent to an encoding target pixel block in an image frame;
Means for detecting a second encoding distortion of an encoded pixel block of an already encoded image frame at the same position as the encoding target pixel block;
Means for determining an encoding mode in which an error between at least one of the first and second encoding distortions and the encoding distortion of the encoding target pixel block is reduced;
Means for encoding each pixel block in the determined encoding mode;
A moving picture encoding apparatus comprising the above means.