JP2010288181A

JP2010288181A - Moving image encoding method, moving image encoding apparatus, and moving image encoding program

Info

Publication number: JP2010288181A
Application number: JP2009141995A
Authority: JP
Inventors: Shigeru Fukushima; 茂福島
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2009-06-15
Filing date: 2009-06-15
Publication date: 2010-12-24

Abstract

PROBLEM TO BE SOLVED: To drastically reduce prediction errors and improve encoding efficiency through error averaging, by selecting as the predicted image signal the signal with the minimum error evaluation value from among a plurality of motion compensation predicted image signals and one or more composite motion compensation predicted image signals.

SOLUTION: Using a motion vector, a motion compensation and image composition selector 111 applies motion compensation prediction to reference image signals up-sampled separately by filters 108-110, generating one motion compensation predicted image signal for each filter. The selector 111 then synthesizes pairs of motion compensation predicted image signals to generate synthetic image signals. Finally, it adopts as the predicted image signal the image signal with the minimum error evaluation value among the three motion compensation predicted image signals and the three synthetic image signals.

COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a moving image encoding method, a moving image encoding apparatus, and a moving image encoding program, and more particularly to a moving image encoding method, apparatus, and program that incorporate motion compensation prediction processing.

In compression coding of moving images, as typified by MPEG (Moving Picture Experts Group), motion compensated predictive coding, which reduces the code amount by exploiting the correlation between frames, is widely used. Motion compensated predictive coding requires a motion vector that indicates the relative position of each block of the image being encoded with respect to the reference image. The smaller the error evaluation value between the blocks related by the motion vector, the higher the accuracy of motion compensation prediction and the higher the encoding efficiency.

Here, the error evaluation value generally refers either to a prediction error obtained by block matching, in which the difference between blocks is expressed as a sum of absolute differences or a sum of squared differences, or to a code-amount-equivalent value based on the motion vector code amount and the DCT (Discrete Cosine Transform) coefficient code amount.
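The two block-difference measures named above can be sketched as follows. This is a minimal illustration, assuming blocks are given as equally sized lists of rows of integer pixel values; the function names `sad` and `ssd` are illustrative and do not come from the patent.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


# Hypothetical 2x2 blocks: the block being encoded and a candidate from the reference image.
target = [[10, 12], [14, 16]]
candidate = [[11, 12], [13, 18]]
print(sad(target, candidate))  # 1 + 0 + 1 + 2 = 4
print(ssd(target, candidate))  # 1 + 0 + 1 + 4 = 6
```

The sum of squared differences penalizes large individual errors more strongly than the sum of absolute differences, which is one reason an encoder might prefer one measure over the other.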

Two techniques are known for improving the accuracy of motion compensation prediction: (1) bidirectional prediction using B pictures (bidirectionally predicted pictures), and (2) motion compensation prediction with fractional accuracy.

Bidirectional prediction using B pictures, technique (1) above, sets two reference images and synthesizes the two images obtained by motion compensation prediction with the motion vector detected for each reference image, thereby reducing the prediction error through error averaging. However, bidirectional prediction requires two motion vectors, one more than unidirectional prediction. Consequently, when the increase in motion vector information exceeds the decrease in prediction error information, encoding efficiency cannot be improved.

Motion compensation prediction with fractional accuracy, technique (2) above, performs motion compensation prediction after interpolating the pixels of the reference image (referred to in this specification as "upsampling"). Upsampling uses the pixels actually present at integer pixel positions on the reference image to calculate pixel values at 1/2-pixel and 1/4-pixel positions, and the motion vector is then searched using the pixel values at those positions, which allows more accurate motion compensation prediction. Because the motion of objects in the video signal can be represented accurately with fractional accuracy, encoding efficiency can be improved further.

Motion compensation prediction with fractional accuracy is now described in more detail. As a technique for upsampling a reference image to fractional accuracy, MPEG-4 AVC (Advanced Video Coding), the latest moving image compression coding standard, standardized by the first joint technical committee established jointly by ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission), obtains the fractional-accuracy pixel values by filtering the reference image with a 6-tap filter, which is a linear filter.

With this 6-tap filter, suppose for example that the reference image signal in which a motion vector is to be searched by block matching against a block of the encoding target image signal is an image signal represented by the one-pixel-accuracy pixel positions indicated by circles in FIG. 11. The pixel value of a 1/2-pixel-accuracy pixel located midway between one-pixel-accuracy pixels, such as the pixel indicated by b, is generated from the pixel values of the six one-pixel-accuracy pixels E, F, G, H, I, and J arranged horizontally around pixel b. The pixel value of the 1/2-pixel-accuracy pixel indicated by h is generated from the pixel values of the six one-pixel-accuracy pixels A, C, G, M, R, and T arranged vertically around pixel h. Similarly, the pixel value of each of the 1/2-pixel-accuracy pixels aa, bb, cc, dd, ee, ff, gg, hh, b, m, s, and so on is generated from the pixel values of six one-pixel-accuracy pixels arranged only vertically or only horizontally around that pixel (that is, by applying the 6-tap filter in the vertical direction only or in the horizontal direction only).

On the other hand, the pixel value of the 1/2-pixel-accuracy pixel indicated by j in FIG. 11 is generated from the pixel values of the six 1/2-pixel-accuracy pixels cc, dd, h, m, ee, and ff arranged horizontally around pixel j (or from the pixel values of the six 1/2-pixel-accuracy pixels aa, bb, b, s, gg, and hh arranged vertically). Because the pixel values of the six pixels cc, dd, h, m, ee, and ff used to generate pixel j are themselves each generated from the six one-pixel-accuracy pixels arranged vertically around them, the pixel value of pixel j is ultimately obtained by applying the 6-tap filter in both the vertical and horizontal directions.

Furthermore, the pixel values of the 1/4-pixel-accuracy pixels f, i, k, and q are each generated from the previously calculated pixel j and the pixel value of the adjacent 1/2-pixel-accuracy pixel b, h, m, or s. Because the pixel values of f, i, k, and q also use the value of pixel j, they too are obtained by applying the 6-tap filter in both the vertical and horizontal directions. The same applies to the other 1/4-pixel-accuracy pixels whose values are obtained from pairs of 1/2-pixel-accuracy pixels. In summary, the 6-tap filter yields pixels whose values are calculated by the formulas shown in FIG. 12. In FIG. 11, the positions of pixels upsampled by applying the 6-tap filter in both the vertical and horizontal directions are indicated by crosses.

The upsampling method that generates 1/4-pixel-accuracy pixel values using a 6-tap filter whose coefficients are fixed to (1, -5, 20, 20, -5, 1), as shown in FIG. 12, has two advantages that follow from the fixed coefficients. The encoding apparatus needs no computation to determine filter coefficients, and both the encoding and decoding apparatuses only need a circuit for the single prescribed filtering method, so the circuit scale is small.
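The fixed 6-tap filtering described above can be sketched in one dimension as follows. The coefficients (1, -5, 20, 20, -5, 1) come from the text; the rounding offset, the division by 32, the clipping to 8 bits, and the rounded average for the 1/4-pixel value follow the usual MPEG-4 AVC conventions and are stated here as assumptions, since the formulas of FIG. 12 are not reproduced in this text. The function names are illustrative.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # fixed coefficients named in the text; they sum to 32


def clip8(value):
    """Clamp a value to the 8-bit pixel range (assumed bit depth)."""
    return max(0, min(255, value))


def half_pel(samples):
    """Half-pixel value at the centre of six integer-position samples.

    `samples` plays the role of (E, F, G, H, I, J); the result lies between
    the third and fourth sample, like pixel b in the description.
    """
    acc = sum(c * s for c, s in zip(TAPS, samples))
    return clip8((acc + 16) >> 5)  # +16 rounds, >>5 divides by 32


def quarter_pel(left, right):
    """Quarter-pixel value as the rounded average of two neighbouring values."""
    return (left + right + 1) >> 1


# Hypothetical row of samples with an edge between the third and fourth pixel.
row = [100, 100, 100, 200, 200, 200]
b = half_pel(row)            # (4800 + 16) >> 5 = 150
a = quarter_pel(row[2], b)   # (100 + 150 + 1) >> 1 = 125
print(b, a)
```

Applying `half_pel` first along columns and then along rows would give a two-dimensional position such as pixel j; this sketch stops at one dimension to keep the rounding behaviour easy to follow.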

However, if every reference image is upsampled with fixed filter coefficients regardless of its condition, optimal upsampling cannot be achieved when, for example, the reference image is degraded, and encoding efficiency falls. This is because filters with fixed coefficients are usually designed without taking coding degradation into account. As a result, optimal upsampling cannot be applied when motion compensation prediction uses a reference image that contains coding degradation.

Patent Document 1 and similar proposals therefore determine the filter coefficients for each reference image so that the error evaluation value between the original image and the motion compensated predicted image is minimized, in order to apply optimal upsampling to each reference image.

The determination of filter coefficients that minimize the error evaluation value between the original image and the motion compensated predicted image is now described concretely. The filter coefficients are determined on the encoding apparatus side. First, the reference image that is the target of motion compensation prediction is upsampled using a filter with fixed filter coefficients, and a motion vector is detected.

Next, the coefficients of the upsampling filter are determined so that the error evaluation value between the original image to be encoded and the image obtained by motion compensation prediction using the detected motion vector becomes small. For example, they are calculated by the following equation using a non-separable two-dimensional Wiener filter.

In the above equation, $S_{x,y}$ represents the original image and $P_{x,y}$ represents the reference image. Further, $(e^{SP})^2$ is the error evaluation value (here the sum of squared differences is used), and the filter coefficients $h^{SP}_{i,j}$ are determined so that $(e^{SP})^2$ is minimized.
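One form of squared prediction error that is consistent with these definitions is given below. It is offered as an assumption rather than as the patent's own equation: the displaced coordinates $\tilde{x}, \tilde{y}$ (the position in the reference image after applying the detected motion vector) and the summation ranges are illustrative notation that does not appear in the source.

```latex
\left(e^{SP}\right)^{2}
  = \sum_{x}\sum_{y}
    \left( S_{x,y}
         - \sum_{i}\sum_{j} h^{SP}_{i,j}\, P_{\tilde{x}+i,\;\tilde{y}+j}
    \right)^{2}
```

Under this form, setting the partial derivative of $(e^{SP})^2$ with respect to each coefficient $h^{SP}_{i,j}$ to zero yields a system of linear equations in the coefficients, which is the usual way a Wiener filter is solved.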

Various filter coefficient determination algorithms of this kind have been proposed. Because there is a trade-off between encoding efficiency and the amount of computation, the method to adopt should be chosen with the balance between the two in mind.

Suppose that a given filter coefficient determination algorithm yields the 6-tap filter coefficients (2, -4, 18, 18, -4, 2). Upsampling the pixels of the reference image with this 6-tap filter improves encoding efficiency for the picture as a whole. Locally, however (for example, block by block), the calculated filter coefficients do not necessarily give better encoding efficiency than other filter coefficients.

For this reason, techniques such as that of Patent Document 2 have also been proposed, in which a plurality of filtering methods are prepared and the filter coefficients are selected adaptively, for example block by block, to improve encoding efficiency further.

Patent Document 1: Japanese Translation of PCT Application Publication No. 2008-536414
Patent Document 2: Japanese Translation of PCT Application Publication No. 2008-544708

As described above, when a conventional moving image encoding apparatus generates a motion compensated predicted image with fractional accuracy, it determines the coefficients of the upsampling filter so that the error with respect to the original image becomes small. As a result, the prediction error to be transmitted decreases and encoding efficiency improves. Encoding efficiency can be improved further by preparing a plurality of filter coefficients and selecting among them adaptively in a predetermined unit such as a block.

However, merely selecting filter coefficients adaptively from such a plurality of candidates does not change the fact that two motion vectors are required when bidirectional prediction is used to reduce the prediction error further through error averaging. Therefore, when the increase in motion vector information exceeds the decrease in prediction error information obtained by bidirectional prediction, encoding efficiency cannot be improved. Moreover, when only one reference image can be used, as in a P picture (forward predicted picture), there is no means of reducing the prediction error further through error averaging.

The present invention has been made in view of the above points. Its object is to provide a moving image encoding method, a moving image encoding apparatus, and a moving image encoding program that greatly reduce the prediction error through error averaging and improve encoding efficiency, by selecting as the predicted image signal the signal with the minimum error evaluation value from among a plurality of motion compensated predicted image signals and one or more synthesized motion compensated predicted image signals generated by synthesizing two or more of the motion compensated predicted image signals.

To achieve the above object, the moving image encoding method of the present invention is a moving image encoding method that generates a reference image signal by locally decoding an encoded signal obtained by encoding an encoding target image signal in an input moving image signal, and performs motion compensated predictive encoding of a current encoding target image signal in the input moving image signal using a predicted image signal obtained by motion compensation prediction of the reference image signal. The method is characterized by including: a first step of detecting a motion vector of the current encoding target image signal in the input moving image signal; a second step of generating, from an already generated reference image signal, motion compensated predicted image signals each corresponding to one of two or more filter processes, by applying the filter processes and motion compensation processing based on the same motion vector; a third step of synthesizing two or more of the generated motion compensated predicted image signals, in mutually different combinations, to generate one or more synthesized motion compensated predicted image signals; a fourth step of selecting, from among the plurality of motion compensated predicted image signals and the synthesized motion compensated predicted image signals, the signal with the minimum error evaluation value and outputting it as the predicted image signal; a fifth step of orthogonally transforming and quantizing the difference signal between the current encoding target image signal and the predicted image signal selected in the fourth step to generate a quantized signal, which is the encoded signal; a sixth step of adding reference image signal specifying information, motion vector specifying information, and filter process specifying information, which respectively specify the already generated reference image signal, the same motion vector, and the filter process used to generate the selected predicted image signal, to a moving image encoded signal generated by applying predetermined encoding processing to the quantized signal, and outputting the result; and a seventh step of generating a new reference image signal by local decoding processing in which the selected predicted image signal is added to a signal obtained by inverse quantization and inverse orthogonal transform of the quantized signal.
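The third and fourth steps can be sketched as follows. This is a minimal illustration under stated assumptions: the motion compensated predictions for each filter are taken as already computed with one shared motion vector, synthesis is a rounded average of two predictions, and the error evaluation value is the sum of squared differences. The source only says "synthesize" and "error evaluation value", so both choices, and every function name, are hypothetical.

```python
from itertools import combinations


def ssd(block_a, block_b):
    """Sum of squared differences, used here as the error evaluation value."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def average(pred_a, pred_b):
    """Synthesize two predictions by a rounded per-pixel average (assumed method)."""
    return [[(a + b + 1) >> 1 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(pred_a, pred_b)]


def select_prediction(target, mc_predictions):
    """Return (filter ids, block) of the candidate with the minimum error.

    `mc_predictions` maps a filter id to the motion compensated prediction
    obtained with that filter and the same motion vector.
    """
    candidates = {(fid,): block for fid, block in mc_predictions.items()}
    for (fa, pa), (fb, pb) in combinations(mc_predictions.items(), 2):
        candidates[(fa, fb)] = average(pa, pb)
    best = min(candidates, key=lambda key: ssd(target, candidates[key]))
    return best, candidates[best]


# Hypothetical 1x2 blocks; keys 108-110 mirror the three filters in the abstract.
target = [[100, 104]]
predictions = {
    108: [[96, 100]],
    109: [[104, 108]],
    110: [[90, 120]],
}
mode, predicted = select_prediction(target, predictions)
print(mode, predicted)  # the 108/109 composite matches the target exactly
```

With three filters this yields three single-filter candidates and three pairwise composites, matching the six candidates named in the abstract. The selected key is the kind of information the sixth step would signal to the decoder, and no second motion vector is needed.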

Here, the third step may generate the one or more synthesized motion compensated predicted image signals by weighting and synthesizing, at a set synthesis ratio, each of two or more of the motion compensated predicted image signals in mutually different combinations. In that case, when the fourth step selects a synthesized motion compensated predicted image signal and outputs it as the predicted image signal, the sixth step may additionally add to the moving image encoded signal synthesis ratio information indicating the synthesis ratio used to generate the predicted image signal.
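Weighted synthesis at a set ratio can be sketched as follows. The candidate ratios (1/4, 2/4, 3/4), the integer rounding, and the use of the sum of squared differences are illustrative assumptions; the source states only that a synthesis ratio is set and that information indicating it is signaled when a composite is selected.

```python
def ssd(block_a, block_b):
    """Sum of squared differences, used here as the error evaluation value."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def blend(pred_a, pred_b, num, den):
    """Weighted composite: pred_a weighted num/den, pred_b weighted (den - num)/den."""
    half = den // 2  # rounding offset
    return [[(num * a + (den - num) * b + half) // den
             for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(pred_a, pred_b)]


def best_ratio(target, pred_a, pred_b, ratios=((1, 4), (2, 4), (3, 4))):
    """Try each hypothetical ratio and return (ratio, composite) with minimum error."""
    composites = {r: blend(pred_a, pred_b, *r) for r in ratios}
    ratio = min(composites, key=lambda r: ssd(target, composites[r]))
    return ratio, composites[ratio]


# Hypothetical blocks: the target lies closer to pred_a than to pred_b.
target = [[100, 100]]
pred_a = [[96, 96]]
pred_b = [[112, 112]]
ratio, composite = best_ratio(target, pred_a, pred_b)
print(ratio, composite)  # (3, 4) gives (3*96 + 112 + 2) // 4 = 100
```

An encoder following this sketch would signal the chosen ratio only when a composite wins, which is the case the text describes for the synthesis ratio information.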

To achieve the above object, the moving image encoding apparatus of the present invention is a moving image encoding apparatus that generates a reference image signal by locally decoding an encoded signal obtained by encoding an encoding target image signal in an input moving image signal, and performs motion compensated predictive encoding of a current encoding target image signal in the input moving image signal using a predicted image signal obtained by motion compensation prediction of the reference image signal. The apparatus is characterized by having: motion vector detection means for detecting a motion vector of the current encoding target image signal in the input moving image signal; motion compensated predicted image signal generation means for generating, from an already generated reference image signal, motion compensated predicted image signals each corresponding to one of two or more filter processes, by applying the filter processes and motion compensation processing based on the same motion vector; synthesized motion compensated predicted image signal generation means for synthesizing two or more of the generated motion compensated predicted image signals, in mutually different combinations, to generate one or more synthesized motion compensated predicted image signals; predicted image signal selection means for selecting, from among the plurality of motion compensated predicted image signals and the synthesized motion compensated predicted image signals, the signal with the minimum error evaluation value and outputting it as the predicted image signal; quantized signal generation means for orthogonally transforming and quantizing the difference signal between the current encoding target image signal and the predicted image signal selected by the predicted image signal selection means to generate a quantized signal, which is the encoded signal; encoding means for adding reference image signal specifying information, motion vector specifying information, and filter process specifying information, which respectively specify the already generated reference image signal, the same motion vector, and the filter process used to generate the selected predicted image signal, to a moving image encoded signal generated by applying predetermined encoding processing to the quantized signal, and outputting the result; and reference image signal generation means for generating a new reference image signal by local decoding processing in which the selected predicted image signal is added to a signal obtained by inverse quantization and inverse orthogonal transform of the quantized signal.

Here, the synthesized motion compensated predicted image signal generation means may generate the one or more synthesized motion compensated predicted image signals by weighting and synthesizing, at a set synthesis ratio, each of two or more of the motion compensated predicted image signals in mutually different combinations. In that case, when the predicted image signal selection means selects a synthesized motion compensated predicted image signal and outputs it as the predicted image signal, the encoding means may additionally add to the moving image encoded signal synthesis ratio information indicating the synthesis ratio used to generate the predicted image signal.

Furthermore, to achieve the above object, the moving image encoding program of the present invention is a moving image encoding program that causes a computer to execute moving image encoding processing in which a reference image signal is generated by locally decoding an encoded signal obtained by encoding an encoding target image signal in an input moving image signal, and motion compensated predictive encoding of a current encoding target image signal in the input moving image signal is performed using a predicted image signal obtained by motion compensation prediction of the reference image signal. The program is characterized by causing the computer to execute: a first step of detecting a motion vector of the current encoding target image signal in the input moving image signal; a second step of generating a plurality of motion compensated predicted image signals from an already generated reference image signal, each by filter processing using one of a plurality of filters having mutually different filter coefficients and by motion compensation processing based on the same motion vector; a third step of synthesizing two or more of the plurality of motion compensated predicted image signals, in mutually different combinations, to generate one or more synthesized motion compensated predicted image signals; a fourth step of selecting, from among the plurality of motion compensated predicted image signals and the synthesized motion compensated predicted image signals, the signal with the minimum error evaluation value and outputting it as the predicted image signal; a fifth step of orthogonally transforming and quantizing the difference signal between the current encoding target image signal and the predicted image signal selected in the fourth step to generate a quantized signal, which is the encoded signal; a sixth step of adding reference image signal specifying information, motion vector specifying information, and filter specifying information, which respectively specify the already generated reference image signal, the same motion vector, and the filter used to generate the selected predicted image signal, to a moving image encoded signal generated by applying predetermined encoding processing to the quantized signal, and outputting the result; and a seventh step of generating a new reference image signal by local decoding processing in which the selected predicted image signal is added to a signal obtained by inverse quantization and inverse orthogonal transform of the quantized signal.

Here, the third step of the moving image encoding program of the present invention may generate the one or more synthesized motion compensated predicted image signals by weighting and synthesizing, at a set synthesis ratio, each of two or more of the motion compensated predicted image signals in mutually different combinations. In that case, when the fourth step selects a synthesized motion compensated predicted image signal and outputs it as the predicted image signal, the sixth step may additionally add to the moving image encoded signal synthesis ratio information indicating the synthesis ratio used to generate the predicted image signal.

According to the present invention, the signal with the minimum error evaluation value is selected as the predicted image signal from among a plurality of motion compensated predicted image signals and one or more synthesized motion compensated predicted image signals generated by synthesizing two or more of the motion compensated predicted image signals. Error averaging thereby greatly reduces the prediction error and improves encoding efficiency.

In addition, according to the present invention, a single motion vector suffices to generate the plurality of motion compensated predicted image signals, so a highly accurate predicted image can be generated without any increase in motion vector information.

FIG. 1 is a block diagram of a first embodiment of the moving image encoding apparatus of the present invention.
FIG. 2 is a flowchart explaining the operation of the motion compensation / image synthesis selection unit 111 in FIG. 1.
FIG. 3 is an explanatory diagram of an example of an image synthesis method.
FIG. 4 is an explanatory diagram of an example of the image synthesis modes used by the motion compensation / image synthesis selection unit 111 in FIG. 1.
FIG. 5 is a diagram showing an example of the syntax of the image synthesis mode information.
FIG. 6 is a diagram showing an example of the semantics of mix_mode in FIG. 5.
FIG. 7 is a diagram showing an example of the syntax of weighted prediction.
FIG. 8 is a diagram showing an example of the semantics of weight_idx in FIG. 7.
FIG. 9 is a block diagram of a second embodiment of the moving image encoding apparatus of the present invention.
FIG. 10 is a block diagram of an example of a moving image decoding apparatus.
FIG. 11 is an explanatory diagram (part 1) of the upsampling filter used in MPEG-4 AVC.
FIG. 12 is an explanatory diagram (part 2) of the upsampling filter used in MPEG-4 AVC.

Next, embodiments of the present invention will be described in detail with reference to the drawings. In motion-compensated prediction, the present invention applies a plurality of upsampling methods to a reference image to generate a plurality of motion-compensated predicted images, and uses an image obtained by combining the generated motion-compensated predicted images as the predicted image, thereby improving the coding efficiency.

(First embodiment)
FIG. 1 shows a block diagram of a first embodiment of a moving image encoding apparatus according to the present invention.

The moving image encoding apparatus 100 of the present embodiment comprises a local decoding unit 102, a subtractor 103, an orthogonal transform/quantization unit 104, a motion vector detection unit 112, and an entropy encoding unit 113.

The local decoding unit 102 comprises an inverse quantization/inverse orthogonal transform unit 105, an adder 106, a reference image memory 107, three filters 108, 109, and 110, and a motion compensation/image synthesis selection unit 111. The predicted image signal obtained by local decoding is output from the motion compensation/image synthesis selection unit 111 to the subtractor 103. The motion compensation/image synthesis selection unit 111 also supplies image synthesis mode information to the entropy encoding unit 113.

The subtractor 103 subtracts the predicted image signal supplied by the motion compensation/image synthesis selection unit 111 (described later) from the input image signal (input moving image signal), which is the moving image signal from the input terminal 101, and generates a prediction error signal that is the difference between the two. The orthogonal transform/quantization unit 104 orthogonally transforms and quantizes the prediction error signal from the subtractor 103 to generate a quantized signal.

The inverse quantization/inverse orthogonal transform unit 105 inversely quantizes and inversely orthogonally transforms the quantized signal output from the orthogonal transform/quantization unit 104. The adder 106 adds the signal output from the inverse quantization/inverse orthogonal transform unit 105 to the predicted image signal from the motion compensation/image synthesis selection unit 111 to generate a locally decoded reference image signal. The reference image memory 107 stores the locally decoded reference image signal.
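The local decoding loop described above can be sketched as follows. This is a minimal illustration, not the apparatus itself: the orthogonal transform is omitted, the quantizer is a plain scalar quantizer, and the function name `encode_block` and the parameter `qstep` are hypothetical.

```python
def encode_block(input_block, predicted_block, qstep=4):
    """Return (quantized levels, locally decoded block) for one block.

    Assumption: the orthogonal transform is skipped and a uniform scalar
    quantizer with step size `qstep` stands in for units 104 and 105.
    """
    # Subtractor 103: prediction error = input - prediction.
    residual = [x - p for x, p in zip(input_block, predicted_block)]
    # Stand-in for unit 104: quantize the prediction error.
    levels = [round(r / qstep) for r in residual]
    # Stand-in for unit 105: inverse quantization.
    recon_residual = [level * qstep for level in levels]
    # Adder 106: prediction + decoded error = locally decoded reference.
    reconstructed = [p + r for p, r in zip(predicted_block, recon_residual)]
    return levels, reconstructed
```

The point of the sketch is that the reference stored in memory 107 is built from the quantized error, so the encoder predicts from the same samples a decoder would have.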

Filters 108, 109, and 110 each filter the same reference image signal read from the reference image memory 107, using mutually different filter coefficients, and thereby upsample the reference image signal.

An example of the filtering methods used by filters 108, 109, and 110 is introduced here. Because the output signal of filter 108 is also used for upsampling in the motion vector detection unit 112, it is desirable that filter 108 use fixed filter coefficients. For example, filter 108 is assumed to be the 6-tap filter used in MPEG-4 AVC (filter coefficients: 1, -5, 20, 20, -5, 1).
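As a rough illustration of upsampling with the 6-tap coefficients quoted above, the following sketch interpolates one half-sample position in a row of 8-bit samples. The coefficients come from the text; the normalization by 32 with rounding, the clipping to 0..255, the edge clamping, and the name `half_pel` are assumptions made for this example.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # coefficients quoted in the text; they sum to 32


def half_pel(row, i):
    """Interpolate the half-sample position between row[i] and row[i + 1]."""
    acc = 0
    for k, tap in enumerate(TAPS):
        # Clamp indices at the row borders (edge padding is an assumption).
        j = min(max(i - 2 + k, 0), len(row) - 1)
        acc += tap * row[j]
    # Normalize by 32 with rounding, then clip to the 8-bit sample range.
    return min(max((acc + 16) >> 5, 0), 255)
```

For a linear ramp such as 0, 10, 20, ..., the interpolated value between 30 and 40 comes out as 35, which is the expected midpoint.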

In contrast, the filter coefficients of filters 109 and 110 are switched on a picture-by-picture basis. For example, the filter coefficients of filter 109 are determined using a Wiener filter, which performs the filtering of the expression given earlier as Equation 1, so that the error between the original image of the picture to be encoded (the input image signal) and the decoded image (the reference image signal locally decoded in the local decoding unit 102) is minimized. However, computing the optimal filter coefficients for the picture to be encoded requires first performing motion-compensated prediction for all blocks in the picture, determining the filter coefficients, and then selecting the optimal filter for each block again, which in effect amounts to two-pass encoding. Therefore, filter 109 is not used when one-pass encoding is required.

Filter 110 reuses, unchanged, the filter coefficients that filter 109 calculated with the Wiener filter for the immediately preceding encoded picture. Because these coefficients have already been determined, this approach allows one-pass encoding and can also reduce the code amount needed to transmit filter coefficients. The filtering methods described here are only examples; the use of filter coefficients generated by other methods also falls within the scope of the present invention.

The motion vector detection unit 112 detects the motion vector of the current image signal to be encoded in the input image signal, from the input image signal and a signal obtained by upsampling, with filter 108, the already decoded reference image signal stored in the reference image memory 107.

Using the motion vector detected by the motion vector detection unit 112, the motion compensation/image synthesis selection unit 111 performs motion-compensated prediction on each of the image signals obtained by upsampling the reference image signal with filters 108 to 110. The motion compensation/image synthesis selection unit 111 then selects or combines the motion-compensated predicted image signals to generate a predicted image signal with a small prediction error relative to the original image signal. The method of generating this predicted image signal is described in detail later.

The entropy encoding unit 113 entropy-encodes the image signal that has been orthogonally transformed and quantized by the orthogonal transform/quantization unit 104 and outputs a bitstream of the encoded moving image signal. The entropy encoding unit 113 also entropy-encodes additional information (image synthesis mode information), such as information indicating the upsampling method applied to the reference image signal in the motion compensation/image synthesis selection unit 111 and information indicating how the upsampled image signals were combined, and adds it to the encoded moving image signal. The entropy encoding unit 113 outputs the bitstream, consisting of the entropy-encoded moving image signal, the image synthesis mode information, and so on, to the outside of the apparatus via the output terminal 114.

As described later, the image synthesis mode information includes: reference image signal specifying information, which identifies the reference image signal used to generate the predicted image signal selected by the motion compensation/image synthesis selection unit 111; motion vector specifying information, which identifies the single motion vector used in common for filters 108 to 110; and filter processing specifying information, which identifies, among filters 108 to 110, the filter processing that produced the reference image signal used to generate the selected predicted image signal.

Thus, the moving image encoding apparatus 100 of this embodiment is characterized in that it provides three filters for upsampling the reference image signal and selects one predetermined signal as the predicted image signal from among the three motion-compensated predicted image signals obtained with those filters and the plurality of motion-compensated predicted image signals obtained by combining them.

Next, the motion compensation/image synthesis selection unit 111, which is the main part of this embodiment, is described in detail with reference to the flowchart of FIG. 2 and other figures. The flowchart of FIG. 2 is executed for each motion compensation block.

First, the motion compensation/image synthesis selection unit 111 performs motion-compensated prediction, using the motion vector detected by the motion vector detection unit 112, on each of the reference image signals upsampled separately by filters 108, 109, and 110, and generates a motion-compensated predicted image signal for each (step S11). Next, for each pair of filters among the three filters 108, 109, and 110 (here, filters 108 and 109, filters 109 and 110, and filters 108 and 110), the motion compensation/image synthesis selection unit 111 combines the two motion-compensated predicted image signals obtained in step S11 from the reference image signals upsampled by that pair, and generates a composite image signal (step S12).

The most common way to combine image signals is to take the average of the two image signals. FIG. 3 schematically shows image synthesis using the average. In FIG. 3, within the upsampled reference image 160 produced by the first of the two filters, 161 denotes the reference image block corresponding to the encoding target block 151 in the encoding target image 150 (the input image signal), and 162 denotes the motion-compensated reference image block indicated, relative to block 161, by the motion vector detected by the motion vector detection unit 112. Likewise, within the upsampled reference image 170 produced by the second of the two filters, 171 denotes the reference image block corresponding to the encoding target block 151 in the encoding target image 150, and 172 denotes the motion-compensated reference image block indicated, relative to block 171, by the same motion vector.

In this case, the motion compensation/image synthesis selection unit 111 divides the sum of the two motion-compensated reference image blocks 162 and 172 by 2 to calculate the composite image signal, which is their average. To calculate the average more accurately, 1 is added to the sum of reference image blocks 162 and 172 before dividing by 2; this corresponds to rounding to the nearest integer. Synthesis methods other than the average (weighted prediction) are described later.
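The two averaging variants described above can be written out directly. The sketch below assumes integer samples stored as nested lists; the function name `average_blocks` and the `rounding` switch are hypothetical.

```python
def average_blocks(block_a, block_b, rounding=True):
    """Average two equally sized blocks of integer samples.

    With rounding=True, 1 is added to each sum before the division by 2,
    which rounds halves up instead of truncating them.
    """
    offset = 1 if rounding else 0
    return [
        [(a + b + offset) >> 1 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(block_a, block_b)
    ]
```

For samples 10 and 11, truncation gives 10 while the rounded form gives 11, which is the difference the text refers to.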

Finally, the motion compensation/image synthesis selection unit 111 adopts, as the predicted image signal, the image signal with the minimum error evaluation value from among the three motion-compensated predicted image signals obtained from the reference image signals upsampled separately by the three filters 108, 109, and 110, and the three composite image signals generated in step S12 (step S13).

In this way, the motion compensation/image synthesis selection unit 111 determines the image synthesis mode, that is, which of the three motion-compensated predicted image signals obtained from the reference image signals upsampled separately by the three filters 108, 109, and 110 to select, or which two of the three motion-compensated predicted image signals to select and combine.
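Steps S11 to S13 can be sketched as a search over six candidates. The text does not say how the error evaluation value is computed, so the sum of absolute differences (SAD) used here is an assumption, as are the names `sad` and `select_prediction` and the use of flat sample lists keyed by filter number.

```python
from itertools import combinations


def sad(block, target):
    """Sum of absolute differences (assumed error evaluation value)."""
    return sum(abs(p - t) for p, t in zip(block, target))


def select_prediction(target, mc_predictions):
    """Pick the best single or averaged prediction.

    mc_predictions maps a filter number (e.g. 108) to the motion-compensated
    prediction obtained with that filter (output of step S11).
    """
    # Single-filter candidates.
    candidates = {(f,): pred for f, pred in mc_predictions.items()}
    # Step S12: average every pair of predictions, with rounding.
    for fa, fb in combinations(sorted(mc_predictions), 2):
        pa, pb = mc_predictions[fa], mc_predictions[fb]
        candidates[(fa, fb)] = [(a + b + 1) >> 1 for a, b in zip(pa, pb)]
    # Step S13: keep the candidate with the smallest error.
    best = min(candidates, key=lambda mode: sad(candidates[mode], target))
    return best, candidates[best]
```

When two single predictions err in opposite directions, their average can match the target more closely than either alone, which is the error-averaging effect the text relies on.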

FIG. 4 is an explanatory diagram of the image synthesis modes of this embodiment. There are three cases, indicated by S21, S22, and S23, in which the motion-compensated predicted image signal generated from the reference image signal upsampled by one of the three filters 108, 109, and 110 is selected on its own. There are also three cases, indicated by S24, S25, and S26, in which an image signal obtained by combining two motion-compensated predicted image signals, generated from the reference image signals upsampled by two of the three filters 108, 109, and 110, is selected. The image synthesis mode of this embodiment therefore selects one of these six options in total.

The number of image synthesis mode options depends on how many motion-compensated predicted image signals output via the filters are available for use on their own. For example, in the simplest case of two filters, there are three options in total: two single motion-compensated predicted image signals and one composite image signal.
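The two counts given in the text (three options for two filters, six for three filters) are consistent with counting every single filter plus every pair of filters. The generalization to an arbitrary number of filters in the sketch below is an assumption, and `num_mix_modes` is a hypothetical name.

```python
from math import comb


def num_mix_modes(num_filters):
    """Singles plus pairwise composites: n + C(n, 2) (assumed general form)."""
    return num_filters + comb(num_filters, 2)
```

Under this assumption, four filters would give ten options, so the signalling cost of the mode grows faster than the number of filters.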

The moving image decoding apparatus cannot decode the stream unless it can identify which signals were selected or combined in the image synthesis mode. The image synthesis mode information is therefore entropy-encoded by the entropy encoding unit 113 in FIG. 1 and transmitted as information within the bitstream.

Next, the method of transmitting the image synthesis mode information is described.

FIG. 5 shows an example of the syntax of the image synthesis mode. First, a flag (use_adaptive_filter_flag) indicating whether an adaptive filter is used is sent for each slice, where a slice is a 16-line strip obtained by dividing the picture every 16 lines. This division into 16-line slices is only an example; one picture may instead form one slice. The flag is set to on when a filter other than the default filter is used. In this embodiment, filter 108 is the default filter, and filters 109 and 110 are user-defined filters.

Next, as shown in FIG. 5, the number of filters defined in addition to the default (num_of_filter) is sent for each slice. In this embodiment, num_of_filter is 2. The filter coefficients of each filter are then sent. Not all coefficients need to be sent; for example, the symmetry of a filter can be exploited to reduce the number of coefficients transmitted.

A 16-line slice consists of a plurality of macroblocks, each 16 lines high. For each macroblock (or sub-macroblock), where a macroblock consists of a luminance block of 16 × 16 pixels and two chrominance blocks of 8 × 8 pixels each, the image synthesis mode (mix_mode) is sent as shown in FIG. 5. The mix_mode is sent together with the reference image index (ref_idx) and the motion vector difference (mvd).

FIG. 6 shows an example of the semantics, that is, the meaning of the syntax, of the image synthesis mode (mix_mode). In FIG. 6, the value 0 indicates that filter #1 is used, 1 indicates filter #2, 2 indicates filter #3, 4 indicates filters #1 and #2, 5 indicates filters #2 and #3, and 6 indicates filters #1 and #3. Here, filters #1, #2, and #3 are filters 108, 109, and 110 in FIG. 1, respectively.
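The mix_mode semantics listed above amount to a small lookup table, which a decoder could use to decide which filtered predictions to produce. The table values below are taken from the text; the dictionary names, the function `filters_for_mix_mode`, and the choice to reject unlisted values such as 3 are assumptions for illustration.

```python
# mix_mode value -> filter numbers (#1, #2, #3), as listed in the text.
MIX_MODE_FILTERS = {
    0: (1,),
    1: (2,),
    2: (3,),
    4: (1, 2),
    5: (2, 3),
    6: (1, 3),
}

# Filter number -> reference numeral of the filter in FIG. 1.
FILTER_UNIT = {1: 108, 2: 109, 3: 110}


def filters_for_mix_mode(mix_mode):
    """Return the FIG. 1 filters whose outputs are selected or averaged."""
    if mix_mode not in MIX_MODE_FILTERS:
        # Values not listed in the text are treated as invalid here.
        raise ValueError(f"undefined mix_mode: {mix_mode}")
    return tuple(FILTER_UNIT[n] for n in MIX_MODE_FILTERS[mix_mode])
```

A result with one entry means that prediction is used as is; a result with two entries means the two predictions are combined.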

However, FIG. 6 is only an example for the case of a P picture. Ideally, the events most likely to be selected should be assigned the smallest numbers. This assignment may be fixed, or the order may be made changeable through the slice header or similar.

The syntax for transmitting the image synthesis mode information is also only an example; for instance, a flag sent per slice may instead be sent per picture.

Thus, in motion-compensated prediction, the moving image encoding apparatus 100 of this embodiment applies a plurality of upsampling methods to the reference image signal to generate a plurality of motion-compensated predicted image signals, and uses as the predicted image signal one image signal selected from among these motion-compensated predicted image signals and a plurality of composite image signals obtained by combining them. Because combining averages the errors, the prediction error is greatly reduced and the coding efficiency is improved.

In other words, even for P pictures, for which only unidirectional prediction could conventionally be used, an image signal obtained by combining a plurality of motion-compensated predicted image signals, comparable to bidirectional prediction for B pictures, can be used as the predicted image signal. As a result, the prediction error is reduced and the coding efficiency is greatly improved compared with the case where no synthesis is used.

In conventional bidirectional prediction for B pictures, two motion vectors are needed to generate the plurality of motion-compensated predicted images, so the amount of motion vector information increases compared with unidirectional prediction. In contrast, the method used by the moving image encoding apparatus 100 of this embodiment, which generates a plurality of motion-compensated predicted image signals using three filters and selects one predicted image signal from among them, needs only one motion vector, so the amount of motion vector information does not increase. This resolves the problem found in conventional bidirectional prediction for B pictures, where the prediction error decreases but the code amount of the motion vectors increases by more than the saving, and a highly accurate predicted image signal can be generated without increasing the motion vector information.

Because the upsampling filters 108, 109, and 110 used in this embodiment are linear filters, a composite image of a plurality of filters could also be expressed and transmitted as a single filter. That approach, however, has the following drawbacks.
(1) The coefficients of the filter that generates the composite image must be transmitted, which increases the code amount.
(2) An integer filter of limited bit depth cannot exactly represent a filter that generates the same image as the composite image of this embodiment.
(3) Computation is required to calculate the filter coefficients for the single-filter representation.
(4) To support the weighted prediction described later, filter coefficients corresponding to each weighting coefficient must be calculated and transmitted.
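Drawback (2) can be illustrated with exact rational arithmetic. In the sketch below, `FILTER_A` uses the 6-tap coefficients quoted earlier in the text, while `FILTER_B` is a purely hypothetical second filter chosen for the example; the function names are also hypothetical. Rounding of the individual filter outputs is ignored, so this shows only the idealized linear case.

```python
from fractions import Fraction

FILTER_A = (1, -5, 20, 20, -5, 1)  # taps quoted in the text (sum 32)
FILTER_B = (0, -4, 20, 20, -4, 0)  # hypothetical second filter (sum 32)


def apply_taps(taps, samples):
    """Unnormalized filter output for one position."""
    return sum(t * s for t, s in zip(taps, samples))


def merged_taps(taps_a, taps_b):
    """Taps of the single filter equivalent to averaging both outputs."""
    return [Fraction(a + b, 2) for a, b in zip(taps_a, taps_b)]
```

By linearity the merged filter reproduces the average of the two outputs, but several of its taps are halves (for example 1/2 and -9/2), so they cannot be stored as integers at the same normalization; this is the kind of precision issue the list refers to.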

It is therefore clear that the composite image generation method of this embodiment is superior to generating the composite image of a plurality of filters with a single filter.

Next, another example of the image synthesis method is described. The embodiment above used the average of two motion-compensated predicted image signals as the synthesis method, but the synthesis method is not limited to this. As in the weighted prediction used in MPEG-4 AVC, the predicted image signal may be obtained by combining two motion-compensated predicted image signals at a predetermined ratio. In this case, an even more efficient predicted image signal can be generated.

FIG. 7 shows an example of the syntax of weighted prediction. First, as shown in FIG. 7, a flag (adaptive_filter_weighted_flag) indicating whether weighted prediction is performed is included in the bitstream and sent for each slice. When weighted prediction is used, the precision of the weighted prediction (weighted_accuracy_idx) is specified. Here, the precision, which corresponds to the denominator in the image synthesis calculation, is 3 when weighted_accuracy_idx = 0 and 4 when weighted_accuracy_idx = 1.

For each macroblock (or sub-macroblock), as shown in FIG. 7, a weighting index (weighted_idx), defined separately for each weighted prediction precision (weighted_accuracy_idx), is sent.

FIG. 8 shows an example of the semantics of the weighting index (weighted_idx). When weighted_accuracy_idx = 0, weighted_idx = 0 means that the images are combined with a ratio of 2 for the first filtered image and 1 for the second filtered image. For example, the filtered image of filter 108 is given a ratio of 2 and the filtered image of filter 109 a ratio of 1. Likewise, weighted_idx = 1 means that the images are combined with a ratio of 1 for the first filtered image and 2 for the second filtered image.

When weighted_accuracy_idx = 1, weighted_idx = 0 means that the images are combined with a ratio of 2 for the first filtered image and 2 for the second filtered image. For example, the filtered image of filter 108 is given a ratio of 2 and the filtered image of filter 109 a ratio of 2. Also with weighted_accuracy_idx = 1, weighted_idx = 1 means that the images are combined with a ratio of 3 for the first filtered image and 1 for the second filtered image, and weighted_idx = 2 means that the images are combined with a ratio of 1 for the first filtered image and 3 for the second filtered image.
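The weighting semantics described above can be collected into one table and applied to two predictions. The ratios in `WEIGHT_TABLE` are those given in the text; the rounding offset (half the denominator), the flat-list representation, and the function name `weighted_blend` are assumptions for this sketch.

```python
# weighted_accuracy_idx -> weighted_idx -> (ratio of 1st image, ratio of 2nd image)
WEIGHT_TABLE = {
    0: {0: (2, 1), 1: (1, 2)},             # ratios sum to the denominator 3
    1: {0: (2, 2), 1: (3, 1), 2: (1, 3)},  # ratios sum to the denominator 4
}


def weighted_blend(pred_a, pred_b, weighted_accuracy_idx, weighted_idx):
    """Combine two predictions at the signalled ratio."""
    wa, wb = WEIGHT_TABLE[weighted_accuracy_idx][weighted_idx]
    denom = wa + wb
    # Adding denom // 2 before dividing rounds to nearest (an assumption).
    return [
        (wa * a + wb * b + denom // 2) // denom
        for a, b in zip(pred_a, pred_b)
    ]
```

With samples 30 and 60, the 2:1 ratio yields 40 and the 1:2 ratio yields 50, so the result moves toward whichever prediction carries the larger ratio.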

The weighting ratio can be determined in various ways, but the algorithm with the best coding efficiency is to prepare a plurality of candidate weighting ratios, actually combine the images according to each candidate ratio, and adopt the ratio that minimizes the error evaluation value between the combined image and the original image.
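The exhaustive search described above can be sketched in a few lines. The error evaluation value is again assumed to be the sum of absolute differences, and the names `blend`, `sad`, and `best_weights` are hypothetical.

```python
def blend(pred_a, pred_b, wa, wb):
    """Weighted combination with round-to-nearest (rounding is an assumption)."""
    denom = wa + wb
    return [(wa * a + wb * b + denom // 2) // denom for a, b in zip(pred_a, pred_b)]


def sad(block, target):
    """Sum of absolute differences (assumed error evaluation value)."""
    return sum(abs(p - t) for p, t in zip(block, target))


def best_weights(target, pred_a, pred_b, candidates):
    """Try every candidate ratio and keep the one with the smallest error."""
    return min(candidates, key=lambda w: sad(blend(pred_a, pred_b, *w), target))
```

A call such as `best_weights(target, pred_a, pred_b, [(2, 1), (1, 2), (2, 2), (3, 1), (1, 3)])` evaluates the five ratios mentioned in the text; the cost is one blend and one error computation per candidate.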

(Second embodiment)
FIG. 9 shows a block diagram of a second embodiment of the moving image encoding apparatus according to the present invention. In FIG. 9, components identical to those in FIG. 1 are given the same reference numerals, and their description is omitted. The moving image encoding apparatus 200 of this embodiment comprises a local decoding unit 201, a subtractor 103, an orthogonal transform/quantization unit 104, a motion vector detection unit 112, and an entropy encoding unit 113; the configuration of the local decoding unit 201 differs from that of the local decoding unit 102 in FIG. 1.

The local decoding unit 201 comprises an inverse quantization/inverse orthogonal transform unit 105, an adder 106, a reference image memory 107, a filter 108, a motion compensation unit 202, filters 203 and 204, and an image synthesis/selection unit 205. The predicted image signal obtained by local decoding is output from the image synthesis/selection unit 205 to the subtractor 103. The image synthesis/selection unit 205 also supplies image synthesis mode information to the entropy encoding unit 113.

Using the motion vector detected by the motion vector detection unit 112, the motion compensation unit 202 performs motion-compensated prediction on the image signal obtained by upsampling the reference image signal with filter 108.

Filters 203 and 204 have mutually different filter coefficients and each separately filters the image motion-compensated by the motion compensation unit 202. Unlike filter 108, filters 203 and 204 are not upsampling filters; they perform filtering in which the output image signal has the same number of samples as the input image signal.

The image synthesis/selection unit 205 selects, as the predicted image signal, the image signal with the minimum error evaluation value (the one image signal with the smallest prediction error relative to the input image signal) from among the two motion-compensated predicted image signals filtered by filters 203 and 204, respectively, and one composite image signal obtained by combining these two signals by a predetermined method. The method of generating this predicted image signal is the same as that described in the first embodiment, so its description is omitted.

The subtractor 103 generates a prediction error signal by subtracting the predicted image signal selected and output by the image synthesis / selection unit 205 from the input image signal, and supplies it to the orthogonal transform / quantization unit 104.

The difference from the moving image encoding apparatus 100 of the first embodiment is as follows. The apparatus 100 upsamples the reference image signal with three filters 108 to 110 having mutually different filter coefficients, and selects from or combines the three resulting upsampled motion compensated predicted image signals. The moving image encoding apparatus 200 of the present embodiment, by contrast, applies the filters 203 and 204 to a single upsampled motion compensated predicted image signal, and selects from or combines the two resulting image signals.

However, like the moving image encoding apparatus 100 of the first embodiment, the moving image encoding apparatus 200 of the present embodiment also generates a plurality of motion compensated predicted image signals and selects the image signal having the minimum error evaluation value from among those signals and a synthesized image signal obtained by combining them. Averaging the errors therefore greatly reduces the prediction error, and the encoding efficiency can be improved.

The moving image encoding apparatus 200 of the present embodiment also solves the problem of conventional bidirectional prediction for B pictures, in which the prediction error decreases but the code amount of the motion vectors increases by more than that saving. It can generate a highly accurate predicted image signal without any increase in motion vector information.

Next, a moving image decoding apparatus that decodes a bitstream produced by the moving image encoding apparatus of the present invention will be described.

FIG. 10 is a block diagram of an example of the moving image decoding apparatus. As shown in FIG. 10, the moving image decoding apparatus 300 comprises an entropy decoding unit 302 that entropy-decodes a bitstream input via an input terminal 301, a motion compensation / image synthesis unit 303, an inverse quantization / inverse orthogonal transform unit 304, an adder 305, a reference image memory 306, three upsampling filters 307, 308, and 309, and an output terminal 310.

The input terminal 301 receives a bitstream generated by the moving image encoding apparatus 100 of FIG. 1, or a bitstream obtained by having a computer execute the operations of the components of the moving image encoding apparatus 100.

The entropy decoding unit 302 entropy-decodes the bitstream input to the input terminal 301 to recover the orthogonally transformed and quantized image signal, and also decodes side information such as the motion vector and the image synthesis mode information.

Using the motion vector supplied from the entropy decoding unit 302, the motion compensation / image synthesis unit 303 performs motion compensated prediction on the upsampled reference image signals supplied from the filters 307 to 309, respectively, and generates a plurality of motion compensated predicted image signals. Then, based on the image synthesis mode information supplied from the entropy decoding unit 302, it performs the image synthesis corresponding to that mode using these motion compensated predicted image signals. The image synthesis method is the same as that of the moving image encoding apparatus described above.

The inverse quantization / inverse orthogonal transform unit 304 applies inverse quantization and inverse orthogonal transformation to the orthogonally transformed and quantized image signal output from the entropy decoding unit 302, thereby decoding the difference signal (prediction error signal). The adder 305 adds the prediction error signal output from the inverse quantization / inverse orthogonal transform unit 304 to the motion compensated predicted image signal output from the motion compensation / image synthesis unit 303 to decode the image signal, and outputs it to both the output terminal 310 and the reference image memory 306. The reference image memory 306 stores the decoded image signal supplied from the adder 305 as a reference image signal.
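The role of the adder 305 and the reference image memory 306 can be sketched as below. The function name, the list-based memory, and the clipping to an 8-bit sample range are assumptions for illustration; the source only states that the two signals are added and the result is stored as a reference.

```python
def reconstruct(prediction_error, prediction, reference_memory, max_value=255):
    """Add the decoded prediction error to the predicted signal.

    The result is both returned (output terminal) and appended to the
    reference memory for use in later predictions.
    """
    decoded = [
        min(max(e + p, 0), max_value)  # clip to sample range (assumed 8-bit)
        for e, p in zip(prediction_error, prediction)
    ]
    reference_memory.append(decoded)
    return decoded


reference_memory = []
decoded = reconstruct([2, -3, 0, 5], [100, 50, 254, 253], reference_memory)
```

Because the decoder adds the same predicted signal that the encoder subtracted, both sides must build that prediction identically, which is why the filters and the image synthesis method mirror those of the encoder.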

The filters 307, 308, and 309 each use mutually different filter coefficients to separately filter the same reference image signal supplied from the reference image memory 306, generate upsampled predicted image signals, and output them to the motion compensation / image synthesis unit 303. The filters 307, 308, and 309 correspond in configuration to the filters 108, 109, and 110 of FIG. 1.

The moving image decoding apparatus 300 is characterized in that it provides three filters (filters 307 to 309) for upsampling the reference image signal, and in that the motion compensation / image synthesis unit 303 uses as the predicted image an image signal generated, in accordance with the image synthesis mode information from the entropy decoding unit 302, from the three reference image signals upsampled by these three filters.

Next, the motion compensation / image synthesis unit 303, which is the principal part of the moving image decoding apparatus 300, will be described in detail.

The motion compensation / image synthesis unit 303 first applies motion compensated prediction, using the single motion vector supplied from the entropy decoding unit 302, to the upsampled reference image signals supplied from the filters 307 to 309, and generates three motion compensated predicted image signals.
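The idea of one motion vector producing three different predictions can be shown with a small one-dimensional sketch. Everything specific here is an assumption for illustration: 2x upsampling, two-tap interpolation, half-pel vector units, and the tap values standing in for the filters 307 to 309.

```python
def upsample_2x(row, taps):
    """Double the sample count: keep each sample, then add one interpolated
    sample weighted by taps = (left_weight, right_weight)."""
    out = []
    last = len(row) - 1
    for i, left in enumerate(row):
        right = row[min(i + 1, last)]  # replicate the final sample at the edge
        out.append(float(left))
        out.append(taps[0] * left + taps[1] * right)
    return out


def motion_compensate(upsampled, mv_half_pel, start, length):
    """Read `length` samples at full-pel spacing, displaced by a half-pel vector."""
    last = len(upsampled) - 1
    return [
        upsampled[min(max(2 * (start + i) + mv_half_pel, 0), last)]
        for i in range(length)
    ]


# Hypothetical tap pairs standing in for filters 307, 308 and 309.
FILTER_TAPS = [(0.5, 0.5), (0.75, 0.25), (0.25, 0.75)]

reference_row = [0, 8, 16, 24]
motion_vector = 1  # one vector, in half-pel units, shared by all three paths

predictions = [
    motion_compensate(upsample_2x(reference_row, taps), motion_vector, 0, 3)
    for taps in FILTER_TAPS
]
```

The same vector addresses the same fractional positions in each upsampled reference, but the interpolated values differ with the filter, so three distinct predictions result without sending any additional motion vector.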

Next, when the image synthesis mode information supplied from the entropy decoding unit 302 indicates that two specified motion compensated predicted image signals are to be combined, the motion compensation / image synthesis unit 303 combines the two motion compensated predicted image signals designated by the image synthesis mode from among the three. The usual way to combine them is to calculate the average of the two image signals, as described with reference to FIG. 3, but the weighted image synthesis described with reference to FIGS. 7 and 8 may also be used. In either case, the image synthesis method indicated by the image synthesis mode information is adopted.

When the image synthesis mode information supplied from the entropy decoding unit 302 instead indicates that one specified signal among the three motion compensated predicted image signals is to be selected, the motion compensation / image synthesis unit 303 selects and outputs the one motion compensated predicted image signal designated by the image synthesis mode.
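The decoder-side handling of the image synthesis mode information might look like the sketch below. The numeric mode values and the table layout are hypothetical, since the source does not define the bitstream syntax; the six entries follow the abstract's three single signals and three pairwise combinations.

```python
# Hypothetical mode numbering: 0-2 select one prediction, 3-5 combine a pair.
MODE_TABLE = {
    0: (0,), 1: (1,), 2: (2,),
    3: (0, 1), 4: (0, 2), 5: (1, 2),
}


def build_prediction(mode, predictions, weights=None):
    """Select or combine predictions as directed by the decoded mode.

    `weights` is an optional (w0, w1) pair for weighted synthesis;
    plain averaging is used when it is omitted.
    """
    picks = [predictions[i] for i in MODE_TABLE[mode]]
    if len(picks) == 1:
        return list(picks[0])
    w0, w1 = weights if weights is not None else (0.5, 0.5)
    return [w0 * a + w1 * b for a, b in zip(*picks)]


preds = [[4, 12, 20], [2, 10, 18], [6, 14, 22]]
selected = build_prediction(1, preds)                # pass one signal through
averaged = build_prediction(5, preds)                # average signals 1 and 2
weighted = build_prediction(3, preds, (0.75, 0.25))  # weighted mix of 0 and 1
```

The decoder performs no error evaluation of its own: it simply reproduces the choice the encoder signaled.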

In this way, the motion compensation / image synthesis unit 303 supplies the motion compensated predicted image signal obtained by selection or synthesis to the adder 305 as the predicted image signal. The adder 305 adds this predicted image signal to the decoded prediction error signal from the inverse quantization / inverse orthogonal transform unit 304 and outputs the decoded image signal.

According to the moving image decoding apparatus 300, a plurality of motion compensated predicted image signals are generated during motion compensated prediction by applying a plurality of upsampling methods to the reference image signal. From among these signals and a plurality of synthesized image signals obtained by combining them, the image signal designated by the image synthesis mode information is used as the predicted image signal, so that averaging the errors greatly reduces the prediction error and the encoding efficiency can be improved.

Conventional bidirectional prediction for B pictures requires two motion vectors, so the amount of motion vector information increases compared with unidirectional prediction. In contrast, with a method such as that of the moving image decoding apparatus 300, in which a plurality of motion compensated predicted image signals are generated using three filters and one predicted image signal is selected from among them, a single motion vector suffices, so the amount of motion vector information does not increase. This solves the problem of conventional B-picture bidirectional prediction, in which the prediction error decreases but the code amount of the motion vectors increases by more than that saving, and a highly accurate predicted image signal can be generated without any increase in motion vector information.

The present invention is not limited to the embodiments described above. For example, the number of motion compensated predicted images from which a synthesized image is generated is not limited to two and may be three or more. It is also evident that the present invention can be applied to modes, such as the direct mode and skip mode of MPEG-4 AVC, in which the motion vector is determined automatically by a predetermined calculation method and no motion vector information is transmitted. For example, a predicted image may be generated by transmitting only the image synthesis mode information for a motion vector determined automatically by a predetermined calculation method, or by automatically determining the image synthesis mode as well as the motion vector.

The present invention can also be applied to B pictures. For example, for a B picture in MPEG-4 AVC, a reference image is selected from each of two lists (L0 and L1), and the reference images can be combined. When the present invention is applied, images can be synthesized for each list (L0 and L1), and the synthesized images can then be combined further. In other words, after the images of each list have been synthesized according to the present invention, the images selected in the respective lists can additionally be combined in the manner of an MPEG-4 AVC B picture.
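The two-stage combination for B pictures can be sketched as follows. Plain averaging at both stages and two filtered predictions per list are assumptions chosen to keep the example short; the source leaves the synthesis method within each list open.

```python
def average(a, b):
    """Sample-wise average of two equally sized signals."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def b_picture_prediction(l0_predictions, l1_predictions):
    """Synthesize within each list first, then combine the two list results."""
    l0 = average(*l0_predictions)  # stage 1: synthesis within list L0
    l1 = average(*l1_predictions)  # stage 1: synthesis within list L1
    return average(l0, l1)         # stage 2: combination across lists


l0_preds = ([10, 20], [14, 24])  # two filtered predictions from the L0 reference
l1_preds = ([20, 30], [24, 34])  # two filtered predictions from the L1 reference
b_pred = b_picture_prediction(l0_preds, l1_preds)
```

Each list still needs only its own single motion vector, so the per-list synthesis adds no motion vector information beyond what a B picture already carries.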

As described above, the present invention is characterized by generating a plurality of motion compensated predicted images using a single motion vector and combining those images to form the predicted image, and every encoding apparatus of that kind is included in the present invention.

The present invention also includes a moving image encoding program for causing a computer to realize the functions of the moving image encoding apparatuses 100 and 200 described above. This program may be read from a recording medium and loaded into the computer, or may be transmitted via a communication network and loaded into the computer. Furthermore, the present invention can also be applied to a transmission apparatus that packetizes and transmits the bitstream generated by the moving image encoding apparatuses 100 and 200.

The present invention relates to a moving image encoding apparatus and a moving image encoding program, and is highly useful in particular as a technique for improving the encoding efficiency of moving image encoding.

100, 200 Moving image encoding apparatus
101 Image signal input terminal
102, 201 Local decoding unit
103 Subtractor
104 Orthogonal transform / quantization unit
105 Inverse quantization / inverse orthogonal transform unit
106 Adder
107 Reference image memory
108, 109, 110 Upsampling filter
111 Motion compensation / image synthesis selection unit
112 Motion vector detection unit
113 Entropy encoding unit
114 Bitstream output terminal
202 Motion compensation unit
203, 204 Filter
205 Image synthesis / selection unit

Claims

A moving picture encoding method in which a reference image signal is generated by locally decoding an encoded signal obtained by encoding an encoding target image signal in an input moving image signal, and a current encoding target image signal in the input moving image signal is subjected to motion compensated predictive encoding using a predicted image signal obtained by performing motion compensated prediction on the reference image signal, the method comprising:
a first step of detecting a motion vector of the current encoding target image signal in the input moving image signal;
a second step of generating, from the generated reference image signal, a motion compensated predicted image signal corresponding to each of two or more filter processes, by the two or more filter processes and a motion compensation process based on the same motion vector;
a third step of generating one or more combined motion compensated predicted image signals by combining two or more motion compensated predicted image signals of different combinations among the plurality of generated motion compensated predicted image signals;
a fourth step of selecting the signal having the smallest error evaluation value from among the plurality of motion compensated predicted image signals and the combined motion compensated predicted image signals, and outputting it as the predicted image signal;
a fifth step of orthogonally transforming and quantizing a difference signal between the current encoding target image signal and the predicted image signal selected in the fourth step to generate a quantized signal that is the encoded signal;
a sixth step of adding, to a moving image encoded signal generated by performing a predetermined encoding process on the quantized signal, reference image signal specifying information, motion vector specifying information, and filter process specifying information for specifying, respectively, the generated reference image signal, the same motion vector, and the filter process that were used to generate the selected predicted image signal; and
a seventh step of generating a new reference image signal by local decoding, by adding the selected predicted image signal to a signal obtained by inverse quantization and inverse orthogonal transform of the quantized signal.

The moving picture encoding method according to claim 1, wherein, in the third step, the one or more combined motion compensated predicted image signals are generated by weighting and combining, at a set synthesis ratio, two or more motion compensated predicted image signals of different combinations among the plurality of motion compensated predicted image signals, and
in the sixth step, when the combined motion compensated predicted image signal is selected in the fourth step and output as the predicted image signal, synthesis ratio information indicating the synthesis ratio used to generate that predicted image signal is further added to the moving image encoded signal.

A moving picture encoding apparatus in which a reference image signal is generated by locally decoding an encoded signal obtained by encoding an encoding target image signal in an input moving image signal, and a current encoding target image signal in the input moving image signal is subjected to motion compensated predictive encoding using a predicted image signal obtained by performing motion compensated prediction on the reference image signal, the apparatus comprising:
motion vector detecting means for detecting a motion vector of the current encoding target image signal in the input moving image signal;
motion compensated predicted image signal generating means for generating, from the generated reference image signal, a motion compensated predicted image signal corresponding to each of two or more filter processes, by the two or more filter processes and a motion compensation process based on the same motion vector;
combined motion compensated predicted image signal generating means for generating two or more combined motion compensated predicted image signals by combining two or more motion compensated predicted image signals of different combinations among the plurality of generated motion compensated predicted image signals;
predicted image signal selecting means for selecting the signal having the smallest error evaluation value from among the plurality of motion compensated predicted image signals and the combined motion compensated predicted image signals, and outputting it as the predicted image signal;
quantized signal generating means for generating a quantized signal that is the encoded signal by orthogonally transforming and quantizing a difference signal between the current encoding target image signal and the predicted image signal selected by the predicted image signal selecting means;
encoding means for adding, to a moving image encoded signal generated by performing a predetermined encoding process on the quantized signal, reference image signal specifying information, motion vector specifying information, and filter process specifying information for specifying, respectively, the generated reference image signal, the same motion vector, and the filter process that were used to generate the selected predicted image signal; and
reference image signal generating means for generating a new reference image signal by local decoding, by adding the selected predicted image signal to a signal obtained by inverse quantization and inverse orthogonal transform of the quantized signal.

The moving picture encoding apparatus according to claim 3, wherein the combined motion compensated predicted image signal generating means generates the combined motion compensated predicted image signals by weighting and combining, at a set synthesis ratio, two or more motion compensated predicted image signals of different combinations among the plurality of motion compensated predicted image signals, and
the encoding means, when the predicted image signal selecting means selects the combined motion compensated predicted image signal and outputs it as the predicted image signal, further adds synthesis ratio information indicating the synthesis ratio used to generate that predicted image signal to the moving image encoded signal.

A moving picture encoding program for causing a computer to execute moving picture encoding processing in which a reference image signal is generated by locally decoding an encoded signal obtained by encoding an encoding target image signal in an input moving image signal, and a current encoding target image signal in the input moving image signal is subjected to motion compensated predictive encoding using a predicted image signal obtained by performing motion compensated prediction on the reference image signal,
the program causing the computer to execute:
a first step of detecting a motion vector of the current encoding target image signal in the input moving image signal;
a second step of generating, from the generated reference image signal, a motion compensated predicted image signal corresponding to each of two or more filter processes, by the two or more filter processes and a motion compensation process based on the same motion vector;
a third step of generating one or more combined motion compensated predicted image signals by combining two or more motion compensated predicted image signals of different combinations among the plurality of generated motion compensated predicted image signals;
a fourth step of selecting the signal having the smallest error evaluation value from among the plurality of motion compensated predicted image signals and the combined motion compensated predicted image signals, and outputting it as the predicted image signal;
a fifth step of orthogonally transforming and quantizing a difference signal between the current encoding target image signal and the predicted image signal selected in the fourth step to generate a quantized signal that is the encoded signal;
a sixth step of adding, to a moving image encoded signal generated by performing a predetermined encoding process on the quantized signal, reference image signal specifying information, motion vector specifying information, and filter process specifying information for specifying, respectively, the generated reference image signal, the same motion vector, and the filter process that were used to generate the selected predicted image signal; and
a seventh step of generating a new reference image signal by local decoding, by adding the selected predicted image signal to a signal obtained by inverse quantization and inverse orthogonal transform of the quantized signal.

The moving picture encoding program according to claim 5, wherein, in the third step, the one or more combined motion compensated predicted image signals are generated by weighting and combining, at a set synthesis ratio, two or more motion compensated predicted image signals of different combinations among the plurality of motion compensated predicted image signals, and
in the sixth step, when the combined motion compensated predicted image signal is selected in the fourth step and output as the predicted image signal, synthesis ratio information indicating the synthesis ratio used to generate that predicted image signal is further added to the moving image encoded signal.