JP3700230B2

JP3700230B2 - Motion compensation method in video coding

Info

Publication number: JP3700230B2
Application number: JP361696A
Authority: JP
Inventors: 雄一郎中屋; 芳典鈴木; 哲伊達
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1996-01-12
Filing date: 1996-01-12
Publication date: 2005-09-28
Anticipated expiration: 2016-01-12
Also published as: JPH09200763A

Description

【０００１】
【発明の属する技術分野】
本発明は、動画像符号化における動き補償方法に関するものである。
【０００２】
【従来の技術】
動画像の高能率符号化において、時間的に近接するフレーム間の類似性を活用する動き補償は情報圧縮に大きな効果を示すことが知られている。現在の画像符号化技術の主流となっている動き補償方式は、動画像符号化方式の国際標準であるＭＰＥＧ１およびＭＰＥＧ２にも採用されている半画素精度のブロックマッチングである。この方式では、符号化しようとする画像を多数のブロックに分割し、ブロックごとにその動きベクトルを水平・垂直方向に隣接画素間距離の半分の長さを最小単位として求める。この処理を数式を用いて表現すると以下のようになる。符号化しようとするフレーム（現フレーム）の予測画像をＰ(ｘ, ｙ)、参照画像（Ｐと時間的に近接しており、既に符号化が完了しているフレームの復号画像）をＲ(ｘ, ｙ)とする。また、ｘとｙは整数であるとして、ＰとＲでは座標値が整数である点に画素が存在すると仮定する。このとき、ＰとＲの関係は、
【０００３】
【数１】

【０００４】
で表される。ただし、画像はｎ個のブロックに分割されるとして、Ｂiは画像のｉ番目のブロックに含まれる画素、(ｕi, ｖi)はｉ番目のブロックの動きベクトルを表している。
【０００５】
半画素精度のブロックマッチングでは、ｕiとｖiはそれぞれ画素間距離の半分、つまりこの場合は１／２を最小単位として求められることになる。したがって、座標値が整数ではなく、参照画像において実際には画素が存在しない点（以後、このような点を内挿点とよぶ）の輝度値を求めることが必要となる。この際の処理としては、周辺４画素を用いた共１次内挿が使われることが多い。この内挿方式を数式で記述すると、座標値の小数成分をαとβ（０≦α, β＜１）として、参照画像の内挿点(ｘ＋α, ｙ＋β)における輝度値Ｒ(ｘ＋α, ｙ＋β)は、
【０００６】
【数２】

【０００７】
で表される。
【０００８】
半画素精度のブロックマッチングは上で述べた通り、現在広く用いられているが、ＭＰＥＧ１やＭＰＥＧ２より高い情報圧縮率が必要となるアプリケーションではさらに高度な動き補償方式が要求される。ブロックマッチングの欠点はブロック内のすべての画素が同一の動きベクトルを持たなければならない点にある。そこでこの問題を解決するために、隣接する画素が異なる動きベクトルを持つことを許容する動き補償方式が最近提案されている。以下にこの方式の一例である空間変換に基づく動き補償に関して簡単に説明する。
【０００９】
空間変換に基づく動き補償では、予測画像Ｐと参照画像Ｒの関係は、
【００１０】
【数３】

【００１１】
で表される。ただし、画像はｎ個の小領域（パッチ）に分割されるとして、Ｐiは画像のｉ番目のパッチに含まれる画素を表している。また、変換関数ｆi(ｘ, ｙ)とｇi(ｘ, ｙ)は現フレームの画像と参照画像との間の空間的な対応を表現している。このとき、Ｐi内の画素(ｘ, ｙ)の動きベクトルは、(ｘ−ｆi(ｘ, ｙ),ｙ−ｇi(ｘ, ｙ))で表すことができる。ところで、ブロックマッチングは変換関数が定数である方式として、空間変換に基づく動き補償の特殊な例として解釈することもできる。しかし、本明細書で空間変換に基づく動き補償という言葉を用いるときには、ブロックマッチングはその中に含まないこととする。
【００１２】
変換関数の形としては、アフィン変換
【００１３】
【数４】

【００１４】
を用いた例（中屋他、「３角形パッチに基づく動き補償の基礎検討」、電子情報通信学会技術報告、IE90-106、平2-03参照）、共１次変換
【００１５】
【数５】

【００１６】
を用いた例（ G. J. Sullivan and R. L. Baker, "Motion compensation for video compression using control grid interpolation", Proc. ICASSP '91 , M9.1, pp.2713-2716, 1991-05）
などが報告されている。ここでａij、ｂijはパッチごとに推定される動きパラメータである。実際の画像符号化を行う場合には、ａij、ｂijを直接伝送するのではなく、パッチの頂点の動きベクトルが伝送される。例えば変換関数としてアフィン変換を採用し、３角形のパッチを用いればパッチの３個の頂点の動きベクトルから動きパラメータａijを計算することができる。したがって、受信側では受信した頂点の動きベクトルから送信側と同じ変換関数を構成することが可能となる。一方、変換関数として共１次変換を用いた場合には、長方形のパッチを用いてそのパッチの４個の頂点の動きベクトルを伝送すれば同様の処理を実現することができる。以下では、変換関数にアフィン変換を用いた場合に関して説明するが、この説明は共１次変換を用いた場合についても、ほぼそのまま適用することができる。
【００１７】
変換関数が確定しても空間変換に基づく動き補償には様々なバリエーションを考えることができるが、その一例を図１に示す。この例では、パッチの境界において動きベクトルが連続的に変化するように制約されている。以下では、参照画像１０１を用いて現フレームの原画像１０２の予測画像を合成することを考える。このために、まず現フレームは複数の多角形のパッチに分割され、パッチ分割された画像１０８となる。パッチの頂点は格子点とよばれ、各格子点は複数のパッチに共有される。例えば、パッチ１０９は、格子点１１０、１１１、１１２から構成され、これらの格子点は他のパッチの頂点を兼ねている。こうして画像を複数のパッチに分割した後に、動き推定が行なわれる。ここに示す例では、動き推定は各格子点を対象として参照画像との間で行なわれる。この結果、動き推定後の参照画像１０３で各パッチは変形されたものとなる。例えば、パッチ１０９は、変形されたパッチ１０４に対応している。これは、動き推定の結果、格子点１０５、１０６、１０７がそれぞれ１１０、１１１、１１２に移動したと推定されたためである。予測画像はパッチ内の各画素に関して変換関数を計算し、数３にしたがって参照画像の中から対応する点の輝度値を求めることにより合成される。このように一方の画像の一部に変形操作を加えて他の画像に貼り付ける処理のことをテキスチャマッピング、またはイメージワーピングとよぶ。これは、上で述べた通り３個の頂点の動きベクトルから数４の６個の動きパラメータを計算し、画素ごとに数４を計算することにより実現することができる。
【００１８】
空間変換に基づく動き補償は、回転や拡大など、ブロックマッチングでは対応できない動きパターンにも対応出来ることに特徴がある。その一方で処理の演算量が多いという問題があるが、これを簡略化する方法として特願平06-193971で示されている方法などが考案されている。
【００１９】
図１で示した空間変換に基づく動き補償方式のもう一つの欠点として、動きベクトルの不連続に対応できないことが挙げられる。一般的に動画像においては動物体と静止物体の境界部分等において動きベクトルの不連続が発生する。しかし、図１で示した方式では動きベクトルの連続性（パッチ内はもちろん、パッチの境界においても動きベクトルは連続的に変化すること）が仮定されているため、復号化後の再生画像内において動物体と静止物体の境界部分で不自然な歪みが発生するなどの問題が発生する。
【００２０】
【発明が解決しようとする課題】
ブロックマッチングでは単純な平行移動に基づく動きモデルが用いられているため、十分に画像内の物体の動きを近似することができない。一方空間変換に基づく動き補償では動きベクトルの連続性が仮定されているため、再生画像の動物体と静止物体の境界部分等で不自然な歪みを発生する問題が生じる。
【００２１】
【課題を解決するための手段】
画像をブロックに分割し、それぞれのブロックに複数の動きベクトルが関与するようにする。そして、これらの動きベクトルを使用して、複数の候補の中で最適な動き補償方式を選択できるようにする。
【００２２】
【発明の実施の形態】
図２に画像を横ｎ個、縦ｍ個（ｎ、ｍは正の整数）のブロックに分割し、１個または複数のブロックの頂点が重なる位置に（ｎ＋１）×（ｍ＋１）個の格子点と呼ばれる点を配置した様子を示す。例えば２０１、２０２はブロック、２０３、２０４、２０５、２０６は格子点である。なお、格子点は必ずしもブロックの頂点の上に存在しているとしなくても良いが、本明細書では図２に示した位置に存在しているとする。
【００２３】
各格子点は動きベクトルを持つことができるとする。この動きベクトルの推定方法としては、中心に格子点を持つブロックによってブロックマッチングを行うなどの方法を考えることができる。各ブロック内の予測画像は、そのブロックの頂点に位置する４個の格子点の動きベクトルを用いて合成されるとする。例えばブロック２０１を考えた場合、格子点２０３、２０４、２０５、２０６の４個の格子点の動きベクトルが使用される。このように、画像内のすべてのブロックには４個の格子点が関与するようにすることができる。
【００２４】
４個の格子点を用いて動き補償を行う方法は複数考えられる。その１つとして、４個の格子点の中から１個を選択し、その格子点の動きベクトルをブロック全体の動きベクトルとする方法が挙げられる。例えば図２のブロック２０１では、格子点２０３、２０４、２０５、２０６の中から最適な動きベクトルを持つ格子点を選択することができる。また、４個の格子点の動きベクトルの平均値をブロックの動きベクトルとしても良い。さらに、１個または２個の格子点の動きベクトルが他と大きく異なるような場合には、４個の動きベクトルの水平、垂直成分それぞれの中から最大値と最小値を除外してから平均を求める方法も考えることができる。この予測画像の合成方法はブロックマッチングと同じであり、単純な演算で良好な予測特性を得ることができる。しかし、物体の回転、拡大・縮小や同一ブロック内に異なる方向に動く複数の物体が存在する場合には、十分な特性が得られない。
【００２５】
図３には画像を複数の直角３角形に分割した例を示す。ブロック３０１の例では格子点３０３、３０５、３０４によって構成される３角形と格子点３０４、３０５、３０６によって構成される３角形に分割される。ブロック３０２の例では、格子点３０７、３１０、３０８によって構成される３角形と格子点３０７、３０９、３１０によって構成される３角形に分割される。このような分割を行った場合の予測画像の合成方法として、まずそれぞれの部分に、ブロックに関与している４個の動きベクトルの中から１個ずつの動きベクトルを割り当てる方法が考えられる。例えばブロック３０１の例では格子点３０３、３０５、３０４によって構成される３角形に対しては格子点３０３の動きベクトルを、格子点３０４、３０５、３０６によって構成される３角形に対しては格子点３０６の動きベクトルを割り当てることができる。この方法は上で述べた、ブロック全体に１個の動きベクトルを割り当てる方式と比較して処理がやや複雑であるが、その分対応できる動きの範囲が広くなっている。例えば３０１のブロック分割において動物体と静止物体の境界が格子点３０４と３０５を結ぶ対角線上に位置しているような場合には、高い予測特性を示すことが予想できる。また、ブロック分割の方法としてブロック３０１に示した例とブロック３０２に示した例の両方の中から一方を選択できるようにすれば、対応できる動きのパターンの範囲をさらに広げることができる。なお、ブロックを２個の部分に分割する方法として、図３で示した方法以外にも手段があることは明らかである。図３の方式以外の分割においてもそれぞれの部分に４個の動きベクトルの中から１個を割り当てれば同様の処理を実現することができる。
【００２６】
図３の分割を用いたもう１つの予測画像の合成方法として、数４で示したアフィン変換を用いた方法を考えることができる。例えばブロック３０１の例では、格子点３０３、３０５、３０４の動きベクトルを用いることによって、これらの格子点によって構成される３角形に対してアフィン変換に基づく予測画像の合成を行うことができる。また、同様の処理は格子点３０５、３０６、３０４によって構成される３角形に対しても行うことができる。ここでは、「従来の技術」の中で説明した空間変換に基づく動き補償と共通の技術が使用されている。この方法は上で述べた方法と比べてさらに処理が複雑となるが、物体の回転、拡大・縮小に対応できる重要な特徴を持っている。ただ、ブロック内で動きベクトルは連続となるので、ブロック内に異なる動きをする複数の物体が存在するような場合には良い特性は得られない。また、この方法においてもブロックの分割に関して２通りの方法が選択できるようにすることによって、予測特性をさらに改善することができる。
【００２７】
アフィン変換を用いた例と良く似た予測画像の合成方法として数５で示した共１次変換を用いた方法がある。例えば、図２のブロック２０１の場合、格子点２０３、２０４、２０５、２０６の４個の動きベクトルを用いることによって数５でｂijによって表される８個のパラメータを決定することができる。こうしてパッチ内の画素に対して空間変換に基づく動き補償と共通の技術を用いて予測画像の合成を行えばよい。この共１次変換を用いた方法はアフィン変換を用いた方法と良く似た特性を示すが、処理がやや複雑である。しかし、動きのパターンによってはアフィン変換を用いた場合より良い特性を示すことがあるため、予測画像の合成方法の選択肢の中に入れておくと便利である。
【００２８】
図４はブロックを４個の部分に分割する予測画像の合成方法を示した例である。格子点４０２、４０３、４０４、４０５によって構成される４角形のブロック４０１は４個の小ブロック４０６、４０７、４０８、４０９に分割される。そして、例えば小ブロック４０６には格子点４０２の動きベクトル、小ブロック４０７には格子点４０４の動きベクトル、小ブロック４０８には格子点４０３の動きベクトル、小ブロック４０９には格子点４０５の動きベクトルを割り当てることによって４個の小ブロックによるブロックマッチングを実現できる。この方法はブロックを４個の部分に分割するため、上で述べた方法と比較してより多くの異なる動きをする物体が存在するような場合にも対応することができる。ただし、物体の回転、拡大・縮小には十分に対応することができない。なお、ブロックを４個の部分に分割する方法として、図４で示した方法以外にも手段があることは明らかである。図４の方式以外の分割においてもそれぞれの部分に４個の動きベクトルの中から１個を割り当てれば同様の処理を実現することができる。
【００２９】
以上述べてきたように、ここでとりあげたブロック内の予測画像の合成方式は、それぞれに長所と欠点を持ち合わせている。動画像は、一般的に画像内の領域ごとに動きのパターンが大きく異なるという特徴を持っている。例えば静止領域、平行移動する領域、回転する領域、動領域と静止領域の境界などでは全く動きのパターンが異なっている。したがって画像をブロックに分割して領域ごとの局所的な対応ができるようにしたとしても、予測画像の合成方法が１通りしかない（＝従来のブロックマッチングなど）では十分な予測特性を得ることができない。そこで、上で挙げた多数の予測画像の合成方法の中からブロックごとに最適な方法を選択することができるようにすることによって、予測特性を向上させることが可能となる。予測画像の誤差のみを考えるのであれば、異なる特徴を持った予測画像の合成方法をより多数持った方が特性を上げる上で有利となる。
【００３０】
予測画像の合成方法はブロックごとに必ずしも１個のみを選ぶ必要はない。２個の合成方法を選択し（例えばアフィン変換とブロック４分割）、両者の間で輝度値の平均値をとったものをブロック内の予測画像とすることによって、さらに予測特性を向上させることも可能である。一般に異なる予測画像を平均化することによってノイズ除去とローパスフィルタの効果が得られ、予測特性が改善されることが知られている。ＭＰＥＧ１やＭＰＥＧ２などの標準動画像符号化方式において、２つの方向（正方向と逆方法）から予測した画像を平均化する操作を行うこと（＝両方向予測）ができるようになっているのはこのためである。なお、本明細書に記載した発明が、この両方向予測にも適用できることは明らかである。
【００３１】
【発明の効果】
本発明により、動画像符号化の動き補償処理における予測特性を向上させることができる。
【図面の簡単な説明】
【図１】空間変換に基づく動き補償の処理の例を示した図である。
【図２】ブロックと格子点の配置の例を示した図である。
【図３】ブロックを２個の直角３角形に分割して動き補償を行う方式の例を示した図である。
【図４】ブロックを４個の小ブロックに分割して動き補償を行う方式の例を示した図である。
【符号の説明】
１０１…参照画像、１０２…現フレームの原画像、１０３…動き推定後の参照画像のパッチと格子点、１０４、１０９…パッチ、１０５〜１０７、１１０〜１１２、２０３〜２０６、３０３〜３１０、４０２〜４０５…格子点、２０１、２０２、３０１、３０２、４０１…ブロック、４０６〜４０９…小ブロック。
[0001]
[Technical field of the invention]
The present invention relates to a motion compensation method in moving picture coding.
[0002]
[Prior art]
In high-efficiency coding of moving images, motion compensation, which exploits the similarity between temporally adjacent frames, is known to be highly effective for information compression. The mainstream motion compensation method in current image coding technology is block matching with half-pixel accuracy, which is also adopted in MPEG1 and MPEG2, the international standards for moving image coding. In this method, the image to be encoded is divided into a large number of blocks, and a motion vector is obtained for each block in the horizontal and vertical directions, with half the distance between adjacent pixels as the minimum unit. This process can be expressed mathematically as follows. Let P(x, y) be the predicted image of the frame to be encoded (the current frame), and let R(x, y) be the reference image (the decoded image of a frame that is temporally close to P and has already been encoded). Assume that x and y are integers and that, in both P and R, pixels exist at the points whose coordinate values are integers. The relationship between P and R is then given by Equation 1.
[0003]
[Equation 1]

[0004]
In Equation 1, the image is assumed to be divided into n blocks, B_i denotes the pixels contained in the i-th block, and (u_i, v_i) is the motion vector of the i-th block.
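The image for Equation 1 does not survive in this text. A form consistent with the definitions above can be sketched as follows. The sign of the displacement is an assumption, chosen to agree with the motion vector (x - f_i(x, y), y - g_i(x, y)) that the description gives later for Equation 3, and the index range 0 ≤ i < n is likewise assumed.

```latex
P(x, y) = R(x - u_i,\; y - v_i), \qquad (x, y) \in B_i, \quad 0 \le i < n
```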
[0005]
In block matching with half-pixel accuracy, u_i and v_i are each obtained with half the distance between pixels, that is, 1/2 in this case, as the minimum unit. It is therefore necessary to obtain the luminance value at points whose coordinate values are not integers and where no pixel actually exists in the reference image (such points are hereinafter called interpolation points). Bilinear interpolation using the four surrounding pixels is often used for this purpose. With α and β (0 ≦ α, β < 1) denoting the fractional parts of the coordinate values, the luminance value R(x + α, y + β) at the interpolation point (x + α, y + β) of the reference image is given by Equation 2.
[0006]
[Equation 2]

[0007]
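Equation 2 draws on the four pixels R(x, y), R(x + 1, y), R(x, y + 1), and R(x + 1, y + 1) that surround the interpolation point.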
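Since the image for Equation 2 is not reproduced here, the standard bilinear interpolation formula, R(x + α, y + β) = (1 - α)(1 - β)R(x, y) + α(1 - β)R(x + 1, y) + (1 - α)βR(x, y + 1) + αβR(x + 1, y + 1), can be sketched in code. The function name `bilinear_sample`, the `ref[y][x]` indexing, and the clamping at the image edge are assumptions made for illustration, not details taken from the specification.

```python
import math
from typing import Sequence


def bilinear_sample(ref: Sequence[Sequence[float]], fx: float, fy: float) -> float:
    """Luminance of the reference image at a possibly fractional point (fx, fy).

    ref is indexed as ref[y][x]; the point is assumed to lie inside the image.
    """
    x, y = math.floor(fx), math.floor(fy)   # integer parts
    a, b = fx - x, fy - y                   # fractional parts alpha and beta
    # Assumption: neighbours beyond the last row/column are clamped to the edge.
    x1 = min(x + 1, len(ref[0]) - 1)
    y1 = min(y + 1, len(ref) - 1)
    return ((1 - a) * (1 - b) * ref[y][x] + a * (1 - b) * ref[y][x1]
            + (1 - a) * b * ref[y1][x] + a * b * ref[y1][x1])
```

With half-pixel motion vectors, α and β only take the values 0 and 1/2, so each interpolated value is the mean of one, two, or four pixels.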
[0008]
As described above, block matching with half-pixel accuracy is widely used at present. However, applications that require a higher information compression rate than MPEG1 or MPEG2 call for a more advanced motion compensation method. The drawback of block matching is that all pixels in a block must have the same motion vector. To solve this problem, motion compensation methods that allow adjacent pixels to have different motion vectors have recently been proposed. Motion compensation based on spatial transformation, one example of such a method, is briefly described below.
[0009]
In motion compensation based on spatial transformation, the relationship between the predicted image P and the reference image R is given by Equation 3.
[0010]
[Equation 3]

[0011]
In Equation 3, the image is assumed to be divided into n small regions (patches), and P_i denotes the pixels contained in the i-th patch. The transformation functions f_i(x, y) and g_i(x, y) express the spatial correspondence between the image of the current frame and the reference image. The motion vector of a pixel (x, y) in P_i can then be written as (x - f_i(x, y), y - g_i(x, y)). Block matching can also be interpreted as a special case of motion compensation based on spatial transformation, namely one in which the transformation function is a constant. In this specification, however, the term motion compensation based on spatial transformation does not include block matching.
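The image for Equation 3 is likewise missing from this text. Based on the definitions of f_i, g_i, and P_i above, it can be sketched as follows; the index range 0 ≤ i < n is an assumption.

```latex
P(x, y) = R\bigl(f_i(x, y),\; g_i(x, y)\bigr), \qquad (x, y) \in P_i, \quad 0 \le i < n
```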
[0012]
Reported forms of the transformation function include the affine transformation
[0013]
[Equation 4]

[0014]
(Nakaya et al., "Basic study of motion compensation based on triangular patches", IEICE Technical Report, IE90-106, 1990-03) and the bilinear transformation
[0015]
[Equation 5]

[0016]
(G. J. Sullivan and R. L. Baker, "Motion compensation for video compression using control grid interpolation", Proc. ICASSP '91, M9.1, pp. 2713-2716, 1991-05).
Here, a_ij and b_ij are motion parameters estimated for each patch. In actual image coding, a_ij and b_ij are not transmitted directly; instead, the motion vectors of the patch vertices are transmitted. For example, if the affine transformation is adopted as the transformation function and triangular patches are used, the motion parameters a_ij can be calculated from the motion vectors of the three vertices of the patch. The receiving side can therefore construct the same transformation function as the transmitting side from the received vertex motion vectors. When the bilinear transformation is used as the transformation function, the same processing can be realized by using rectangular patches and transmitting the motion vectors of the four vertices of each patch. The following description uses the affine transformation as the transformation function, but it applies almost unchanged to the bilinear transformation.
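The images for Equations 4 and 5 are not reproduced in this text. The usual forms of the two transformations can be sketched as below. The description only fixes the number of parameters (six a_ij for the affine case, eight b_ij for the bilinear case); the ordering and zero-based indexing of the parameters shown here are assumptions.

```latex
\begin{aligned}
\text{Affine:}\quad   & f_i(x, y) = a_{i0}\,x + a_{i1}\,y + a_{i2}, \\
                      & g_i(x, y) = a_{i3}\,x + a_{i4}\,y + a_{i5} \\[4pt]
\text{Bilinear:}\quad & f_i(x, y) = b_{i0}\,xy + b_{i1}\,x + b_{i2}\,y + b_{i3}, \\
                      & g_i(x, y) = b_{i4}\,xy + b_{i5}\,x + b_{i6}\,y + b_{i7}
\end{aligned}
```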
[0017]
Even once the transformation function is fixed, motion compensation based on spatial transformation admits many variations; one example is shown in FIG. 1. In this example, the motion vector is constrained to change continuously across patch boundaries. Consider synthesizing a predicted image of the original image 102 of the current frame from the reference image 101. First, the current frame is divided into a plurality of polygonal patches, giving the patch-divided image 108. The vertices of the patches are called grid points, and each grid point is shared by several patches. For example, patch 109 is formed by grid points 110, 111, and 112, which also serve as vertices of other patches. After the image has been divided into patches in this way, motion estimation is performed. In the example shown here, motion estimation is performed for each grid point against the reference image. As a result, each patch appears deformed in the reference image 103 after motion estimation. For example, patch 109 corresponds to the deformed patch 104, because motion estimation found that grid points 105, 106, and 107 moved to 110, 111, and 112, respectively. The predicted image is synthesized by evaluating the transformation function for each pixel in the patch and obtaining the luminance value of the corresponding point in the reference image according to Equation 3. Applying a deformation to part of one image and pasting it onto another in this way is called texture mapping or image warping. As described above, this can be realized by calculating the six motion parameters of Equation 4 from the motion vectors of the three vertices and then evaluating Equation 4 for each pixel.
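The step of deriving six affine parameters from three vertex motion vectors can be sketched as follows. This is a minimal illustration, not the procedure of the specification: the function names, the parameter order (f = c0·x + c1·y + c2, g = c3·x + c4·y + c5), and the use of Cramer's rule are assumptions. It follows the convention stated earlier that a pixel's motion vector is (x - f(x, y), y - g(x, y)), so each vertex (x, y) with motion vector (u, v) maps to the reference point (x - u, y - v).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve3(pts: Sequence[Point], t: Sequence[float]) -> Tuple[float, float, float]:
    """Solve c0*x + c1*y + c2 = t at three non-collinear points (Cramer's rule)."""
    (x0, y0), (x1, y1), (x2, y2) = pts
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)
    if det == 0:
        raise ValueError("vertices are collinear")
    c0 = (t[0] * (y1 - y2) - y0 * (t[1] - t[2]) + (t[1] * y2 - t[2] * y1)) / det
    c1 = (x0 * (t[1] - t[2]) - t[0] * (x1 - x2) + (x1 * t[2] - x2 * t[1])) / det
    c2 = (x0 * (y1 * t[2] - y2 * t[1]) - y0 * (x1 * t[2] - x2 * t[1])
          + t[0] * (x1 * y2 - x2 * y1)) / det
    return c0, c1, c2


def affine_from_vertices(verts: List[Point], mvs: List[Point]) -> Tuple[float, ...]:
    """Six affine parameters from three vertices and their motion vectors."""
    f_targets = [x - u for (x, _), (u, _) in zip(verts, mvs)]  # f at each vertex
    g_targets = [y - v for (_, y), (_, v) in zip(verts, mvs)]  # g at each vertex
    return _solve3(verts, f_targets) + _solve3(verts, g_targets)
```

When all three vertices share the same motion vector, the result reduces to a pure translation, which is the block matching case.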
[0018]
Motion compensation based on spatial transformation has the advantage that it can handle motion patterns that block matching cannot, such as rotation and enlargement. On the other hand, it requires a large amount of computation; methods for simplifying the processing, such as the one shown in Japanese Patent Application No. Hei 06-193971, have been devised.
[0019]
Another disadvantage of the motion compensation method based on spatial transformation shown in FIG. 1 is that it cannot cope with discontinuities in the motion vectors. In moving images, motion vector discontinuities generally occur at places such as the boundary between a moving object and a stationary object. However, the method shown in FIG. 1 assumes continuity of the motion vectors (the motion vector changes continuously not only within a patch but also across patch boundaries), so problems arise such as unnatural distortion at the boundary between a moving object and a stationary object in the reproduced image after decoding.
[0020]
[Problems to be solved by the invention]
Because block matching uses a motion model based on simple translation, it cannot adequately approximate the motion of objects in an image. Motion compensation based on spatial transformation, on the other hand, assumes continuity of the motion vectors, so unnatural distortion occurs in the reproduced image at places such as the boundary between a moving object and a stationary object.
[0021]
[Means for Solving the Problems]
The image is divided into blocks, and a plurality of motion vectors are associated with each block. Using these motion vectors, the optimal motion compensation method can be selected from among a plurality of candidates.
[0022]
[Embodiments of the invention]
FIG. 2 shows an image divided into n blocks horizontally and m blocks vertically (n and m are positive integers), with (n + 1) × (m + 1) points, called grid points, placed at the positions where the vertices of one or more blocks coincide. For example, 201 and 202 are blocks, and 203, 204, 205, and 206 are grid points. The grid points need not lie on the vertices of the blocks, but in this specification they are assumed to be at the positions shown in FIG. 2.
[0023]
Assume that each grid point can have a motion vector. One possible way of estimating this motion vector is to perform block matching with a block centered on the grid point. Assume also that the predicted image in each block is synthesized using the motion vectors of the four grid points located at the vertices of the block. For block 201, for example, the motion vectors of the four grid points 203, 204, 205, and 206 are used. In this way, four grid points can be associated with every block in the image.
[0024]
There are several ways of performing motion compensation using four grid points. One is to select one of the four grid points and use its motion vector as the motion vector of the entire block. For block 201 in FIG. 2, for example, the grid point with the optimal motion vector can be selected from grid points 203, 204, 205, and 206. Alternatively, the average of the motion vectors of the four grid points may be used as the motion vector of the block. Furthermore, when the motion vectors of one or two grid points differ greatly from the others, the average may be computed after removing the maximum and minimum values from the horizontal components and from the vertical components of the four motion vectors. In each case the predicted image is synthesized in the same way as in block matching, and good prediction characteristics can be obtained with simple computation. However, sufficient characteristics cannot be obtained when objects rotate or are enlarged or reduced, or when a single block contains several objects moving in different directions.
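The averaging variants described above can be sketched as follows. The function names are hypothetical, and motion vectors are represented as (horizontal, vertical) tuples for illustration only.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def mean_vector(mvs: List[Vec]) -> Vec:
    """Plain average of the grid-point motion vectors."""
    n = len(mvs)
    return (sum(u for u, _ in mvs) / n, sum(v for _, v in mvs) / n)


def trimmed_mean_vector(mvs: List[Vec]) -> Vec:
    """Per component, drop the maximum and minimum, then average the rest."""
    def trimmed(values: List[float]) -> float:
        s = sorted(values)
        return sum(s[1:-1]) / (len(s) - 2)
    return (trimmed([u for u, _ in mvs]), trimmed([v for _, v in mvs]))
```

With four vectors of which one is an outlier, for example (1, 1), (1, 1), (1, 1), and (9, -7), the plain average is pulled to (3, -1) while the trimmed average stays at (1, 1).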
[0025]
FIG. 3 shows an example in which an image is divided into right triangles. Block 301 is divided into the triangle formed by grid points 303, 305, and 304 and the triangle formed by grid points 304, 305, and 306. Block 302 is divided into the triangle formed by grid points 307, 310, and 308 and the triangle formed by grid points 307, 309, and 310. One way of synthesizing the predicted image under such a division is to assign to each part one of the four motion vectors associated with the block. In block 301, for example, the motion vector of grid point 303 can be assigned to the triangle formed by grid points 303, 305, and 304, and the motion vector of grid point 306 to the triangle formed by grid points 304, 305, and 306. This method is slightly more complex than the method described above, in which a single motion vector is assigned to the whole block, but it covers a correspondingly wider range of motion. For example, it can be expected to predict well when, in the division of block 301, the boundary between a moving object and a stationary object lies on the diagonal connecting grid points 304 and 305. If either the division shown for block 301 or the division shown for block 302 can be selected, the range of motion patterns that can be handled widens further. Clearly, a block can be divided into two parts in ways other than those shown in FIG. 3. With such other divisions, the same processing can be realized by assigning one of the four motion vectors to each part.
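A minimal sketch of the two-triangle assignment follows. FIG. 3 is not reproduced here, so the corner layout (top-left, top-right, bottom-left, bottom-right), the choice of which corner vector each triangle receives, and the rule that pixels on the diagonal go to the first triangle are all assumptions; `triangle_mv_map` is a hypothetical name.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def triangle_mv_map(size: int, mv_tl: Vec, mv_tr: Vec, mv_bl: Vec, mv_br: Vec,
                    anti_diagonal: bool = True) -> List[List[Vec]]:
    """Per-pixel motion vectors for a size x size block split into two right triangles."""
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            if anti_diagonal:
                # Diagonal from top-right to bottom-left: the upper-left triangle
                # takes the top-left vector, the lower-right the bottom-right one.
                row.append(mv_tl if x + y <= size - 1 else mv_br)
            else:
                # Diagonal from top-left to bottom-right: the upper-right triangle
                # takes the top-right vector, the lower-left the bottom-left one.
                row.append(mv_tr if x >= y else mv_bl)
        rows.append(row)
    return rows
```

Each pixel is then predicted from the reference image displaced by its assigned vector, exactly as in block matching.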
[0026]
Another way of synthesizing the predicted image with the division of FIG. 3 is to use the affine transformation of Equation 4. In block 301, for example, the motion vectors of grid points 303, 305, and 304 can be used to synthesize an affine-transformation-based predicted image for the triangle formed by these grid points. The same processing can be applied to the triangle formed by grid points 305, 306, and 304. This uses the same technique as the motion compensation based on spatial transformation described under "Prior art". The method is more complex than those described above, but it has the important property that it can handle rotation and enlargement or reduction of objects. However, because the motion vectors are continuous within the block, good characteristics cannot be obtained when the block contains several objects with different motions. In this method too, the prediction characteristics can be further improved by allowing a choice between the two ways of dividing the block.
[0027]
A method of synthesizing the predicted image that closely resembles the affine example uses the bilinear transformation of Equation 5. For block 201 in FIG. 2, for example, the eight parameters b_ij of Equation 5 can be determined from the four motion vectors of grid points 203, 204, 205, and 206. The predicted image for the pixels in the patch can then be synthesized with the same technique as motion compensation based on spatial transformation. The method using the bilinear transformation behaves much like the method using the affine transformation, but its processing is somewhat more complex. Depending on the motion pattern, however, it may perform better than the affine transformation, so it is useful to include it among the candidate methods of synthesizing the predicted image.
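For an axis-aligned square block, a bilinear transformation fixed by the four corner motion vectors is equivalent to bilinearly interpolating those vectors inside the block, because both are the unique function of the form c0·xy + c1·x + c2·y + c3 that matches the four corner values. The sketch below uses that equivalence; the function name, the corner naming, and the local coordinates running from 0 to `size` are assumptions for illustration.

```python
from typing import Tuple

Vec = Tuple[float, float]


def bilinear_mv(x: float, y: float, size: float,
                mv_tl: Vec, mv_tr: Vec, mv_bl: Vec, mv_br: Vec) -> Vec:
    """Motion vector at local position (x, y) in a block spanning 0..size on each axis."""
    s, t = x / size, y / size  # normalized position inside the block

    def mix(k: int) -> float:
        top = (1 - s) * mv_tl[k] + s * mv_tr[k]
        bottom = (1 - s) * mv_bl[k] + s * mv_br[k]
        return (1 - t) * top + t * bottom

    return (mix(0), mix(1))
```

The predicted pixel at (x, y) is then read from the reference image at the position displaced by this vector, with interpolation where the position is fractional.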
[0028]
FIG. 4 shows an example of a method of synthesizing the predicted image in which a block is divided into four parts. The quadrangular block 401 formed by grid points 402, 403, 404, and 405 is divided into four small blocks 406, 407, 408, and 409. Block matching with four small blocks can then be realized by assigning, for example, the motion vector of grid point 402 to small block 406, that of grid point 404 to small block 407, that of grid point 403 to small block 408, and that of grid point 405 to small block 409. Because this method divides the block into four parts, it can handle cases in which more objects with different motions are present than the methods described above can. However, it cannot adequately handle rotation and enlargement or reduction of objects. Clearly, a block can be divided into four parts in ways other than that shown in FIG. 4. With such other divisions, the same processing can be realized by assigning one of the four motion vectors to each part.
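The four-way division can be sketched as below. FIG. 4 is not reproduced here, so the assumption that each small block takes the vector of its nearest corner grid point, the corner naming, and the name `quadrant_mv_map` are illustrative only.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def quadrant_mv_map(size: int, mv_tl: Vec, mv_tr: Vec,
                    mv_bl: Vec, mv_br: Vec) -> List[List[Vec]]:
    """Per-pixel motion vectors for a block split into four equal small blocks."""
    half = size // 2
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            if y < half:
                row.append(mv_tl if x < half else mv_tr)  # upper small blocks
            else:
                row.append(mv_bl if x < half else mv_br)  # lower small blocks
        rows.append(row)
    return rows
```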
[0029]
As described above, each of the methods of synthesizing the predicted image within a block has its own advantages and disadvantages. A characteristic of moving images is that the motion pattern generally differs greatly from region to region within the image. For example, the motion patterns of a stationary region, a translating region, a rotating region, and the boundary between a moving region and a stationary region are completely different. Therefore, even if the image is divided into blocks so that each region can be handled locally, sufficient prediction characteristics cannot be obtained when only one method of synthesizing the predicted image is available (as in conventional block matching). The prediction characteristics can be improved by making it possible to select, for each block, the optimal method from among the many synthesis methods listed above. If only the error of the predicted image is considered, having a larger number of synthesis methods with different characteristics is advantageous.
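Per-block selection can be sketched as follows. The specification does not state the criterion for "optimal"; the sum of absolute differences (SAD) against the original block is used here purely as an assumption, and the function and candidate names are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]


def sad(a: Image, b: Image) -> float:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def select_method(original: Image, candidates: Dict[str, Image]) -> Tuple[str, Image]:
    """Return the name and predicted block of the candidate closest to the original."""
    name = min(candidates, key=lambda k: sad(original, candidates[k]))
    return name, candidates[name]
```

A real encoder would also have to signal the chosen method to the decoder, so a practical criterion might weigh the cost of that side information as well.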
[0030]
It is not always necessary to select only one method of synthesizing the predicted image for each block. The prediction characteristics can be improved further by selecting two synthesis methods (for example, the affine transformation and the four-way block division) and using the average of their luminance values as the predicted image in the block. It is generally known that averaging different predicted images has the effect of noise removal and low-pass filtering and improves the prediction characteristics. This is why standard video coding systems such as MPEG1 and MPEG2 allow images predicted from two directions (forward and backward) to be averaged, which is known as bidirectional prediction. Clearly, the invention described in this specification can also be applied to bidirectional prediction.
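Averaging two predicted blocks can be sketched as below. Whether and how the average is rounded to integer luminance values is not stated in the specification; this illustration keeps floating-point values, and `average_prediction` is a hypothetical name.

```python
from typing import List

Image = List[List[float]]


def average_prediction(p1: Image, p2: Image) -> Image:
    """Pixel-wise mean of two predicted blocks of the same size."""
    return [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(p1, p2)]
```

The averaged block can itself be treated as one more candidate when the synthesis method is chosen for each block.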
[0031]
[Effects of the invention]
According to the present invention, it is possible to improve prediction characteristics in motion compensation processing of moving image coding.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating an example of motion compensation processing based on spatial transformation.
FIG. 2 is a diagram showing an example of arrangement of blocks and grid points.
FIG. 3 is a diagram showing an example of a method for performing motion compensation by dividing a block into two right triangles.
FIG. 4 is a diagram illustrating an example of a method of performing motion compensation by dividing a block into four small blocks.
[Explanation of symbols]
101 ... reference image, 102 ... original image of the current frame, 103 ... patches and grid points of the reference image after motion estimation, 104, 109 ... patches, 105-107, 110-112, 203-206, 303-310, 402-405 ... grid points, 201, 202, 301, 302, 401 ... blocks, 406-409 ... small blocks.

Claims

1. A motion compensation method for video encoding or decoding, wherein:
the image is divided into n horizontal by m vertical (n and m are positive integers) square or rectangular blocks such that each block has four grid points, each having a motion vector;
a motion-compensated predicted image in the block is synthesized by motion compensation using at least one of the motion vectors of the four grid points of the block;
the method of synthesizing the motion-compensated predicted image is selected for each block from two or more candidates; and
the candidates include a method that uses, as the predicted image in the block, the average of two predicted images of the block generated by two different synthesis methods.

2. The motion compensation method according to claim 1, wherein the candidates include a method in which all pixels in the block follow one of the four motion vectors.

3. The motion compensation method according to claim 1, wherein the candidates include a method in which all pixels in the block follow the average of the four motion vectors.

4. The motion compensation method according to claim 1, wherein the candidates include a method in which the block is divided into two or more parts and the pixels in each part follow one of the motion vectors of the grid points present in that part.

5. The motion compensation method according to claim 4, wherein the block is divided into four parts.

6. The motion compensation method according to claim 1, wherein the candidates include a method in which the block is divided diagonally into two right triangles, three of the motion vectors of the grid points present in the right triangle are selected, and the motion vectors within the right triangle are obtained by the affine transformation determined by the three selected motion vectors.

7. The motion compensation method according to claim 1, wherein the candidates include a method of obtaining the motion vectors in the block by the bilinear transformation determined by the four motion vectors.