JP3634878B2 - Image encoding device

Info

Publication number: JP3634878B2
Application number: JP19397194A
Authority: JP
Inventors: 雄一郎中屋; 淳一木村
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-08-18
Filing date: 1994-08-18
Publication date: 2005-03-30
Anticipated expiration: 2020-03-30
Also published as: JPH0865676A

Description

【０００１】
【産業上の利用分野】
本発明は、同一パッチに所属するすべての画素が共通の動きベクトルを持つ制約がなく、画素の動きベクトルの水平・垂直成分が隣接画素間距離の整数倍以外の値を取り得る動き補償方式を実行する画像符号化装置に関するものである。
【０００２】
【従来の技術】
動画像の高能率符号化において、時間的に近接するフレーム間の類似性を活用する動き補償は情報圧縮に大きな効果を示すことが知られている。現在の画像符号化技術の主流となっている動き補償方式は、動画像符号化方式の国際標準であるＭＰＥＧ１およびＭＰＥＧ２にも採用されている半画素精度のブロックマッチングである。この方式では、符号化しようとする画像を多数のブロックに分割し、ブロックごとにその動きベクトルを水平・垂直方向に隣接画素間距離の半分の長さを最小単位として求める。この処理を数式を用いて表現すると以下のようになる。符号化しようとするフレーム（現フレーム）の予測画像をＰ（ｘ，ｙ）、参照画像（Ｐと時間的に近接しており、既に符号化が完了しているフレームの復号画像）をＲ（ｘ，ｙ）とする。また、ｘとｙは整数であるとして、ＰとＲでは座標値が整数である点に画素が存在すると仮定する。このとき、ＰとＲの関係は、
【０００３】
【数１】

【０００４】
で表される。ただし、画像はｎ個のブロックに分割されるとして、Ｂｉは画像のｉ番目のブロックに含まれる画素、（ｕｉ，ｖｉ）はｉ番目のブロックの動きベクトルを表している。動きベクトル（ｕｉ，ｖｉ）の推定方式として最も一般的に用いられているのは、（ｕｉ，ｖｉ）に一定の探索範囲を設け（例えば−１５≦ｕｉ，ｖｉ≦１５）、その中でブロック内の予測誤差Ｅｉ（ｕｉ，ｖｉ）を最小化するものを探索するという方式である。予測誤差の評価基準として平均絶対誤差を用いた場合、Ｅｉ（ｕｉ，ｖｉ）は、符号化しようとしているフレームの原画像をＩ（ｘ，ｙ）として、
【０００５】
【数２】

【０００６】
で表される。ただし、Ｎｉはｉ番目のブロックに含まれる画素の数である。このように、異なる動きベクトルについてそれぞれ予測誤差を評価し、この誤差が最も小さい動きベクトルを探索する処理をマッチング処理とよぶ。一定の探索範囲の中ですべての（ｕｉ，ｖｉ）についてＥｉ（ｕｉ，ｖｉ）を計算し、その最小値を探索することを全探索とよぶ。
【０００７】
半画素精度のブロックマッチングでは、ｕｉとｖｉはそれぞれ画素間距離の半分、つまりこの場合は１／２を最小単位として求められることになる。したがって、座標値が整数ではなく、参照画像において実際には画素が存在しない点（以後、このような点を内挿点とよぶ）の輝度値を求めることが必要となる。この際の処理としては、周辺４画素を用いた共１次内挿が使われることが多い。この内挿方式を数式で記述すると、座標値の小数成分をｐとｑ（０≦ｐ，ｑ＜１）として、参照画像の内挿点（ｘ＋ｐ，ｙ＋ｑ）における輝度値Ｒ（ｘ＋ｐ，ｙ＋ｑ）は、
【０００８】
【数３】

【０００９】
で表される。
【００１０】
半画素精度のブロックマッチングを行うシステムでは、まず広い探索範囲に対して１画素精度の全探索を行なって大まかに動きベクトルを推定した後に、この動きベクトルの周辺のごく狭い範囲（例えば、縦横±半画素の範囲）に対して半画素精度の全探索を行なう２段階探索が広く用いられている。そして、第２段階の探索を行う際には、まず内挿点における輝度値を事前に求めてから探索を行う方法が良く用いられる。この方法の処理の例を図１に示す。この例では縦横４画素のブロックを使用している。参照画像において、座標値が整数でもともと画素が存在する点を「○」で表し、新たに輝度値を求めた内挿点を「×」で表す。また、現フレームの原画像のブロックの画素を「□」で表す。第１段階の探索で得られた動きベクトルを（ｕｃ，ｖｃ）とする。マッチングの例１０１は、第１段階の探索で、動きベクトルが（ｕｃ，ｖｃ）であるときのマッチングの様子を示している。予測誤差の評価は重なった「○」と「□」の間で行われる。例１０２、１０３、１０４はそれぞれ第２段階の探索で動きベクトルが（ｕｃ＋１／２，ｖｃ）、（ｕｃ＋１／２，ｖｃ＋１／２）、（ｕｃ − １／２，ｖｃ − １／２）であるときを表している。これらの例では、予測誤差の評価は重なった「×」と「□」の間で行われる。この図からわかる通り、第２段階の探索における探索範囲を縦横±１／２画素とした場合、６５個（図１の「×」の数）の内挿点の輝度値を求めることにより、動きベクトル８個分のマッチング処理をカバーすることができる。このとき、輝度値を求めた内挿点はすべてマッチングに使用される。マッチングのたびに参照画像において内挿の計算をすると、４×４×８＝１２８回の内挿を行う必要がある。このように内挿演算の回数を減らすことができるのは、参照画像の同じ内挿点が複数回使用されるためである。
【００１１】
半画素精度のブロックマッチングは上で述べた通り、現在広く用いられているが、ＭＰＥＧ１やＭＰＥＧ２より高い情報圧縮率が必要となるアプリケーションではさらに高度な動き補償方式が要求される。ブロックマッチングの欠点はブロック内のすべての画素が同一の動きベクトルを持たなければならない点にある。そこでこの問題を解決するために、隣接する画素が異なる動きベクトルを持つことを許容する動き補償方式が最近提案されている。以下にこの方式の一例である空間変換に基づく動き補償に関して簡単に説明する。
【００１２】
空間変換に基づく動き補償では、予測画像Ｐと参照画像Ｒの関係は、
【００１３】
【数４】

【００１４】
で表される。ただし、画像はｎ個の小領域（パッチ）に分割されるとして、Ｐｉは画像のｉ番目のパッチに含まれる画素を表している。また、変換関数ｆｉ（ｘ，ｙ）とｇｉ（ｘ，ｙ）は現フレームの画像と参照画像との間の空間的な対応を表現している。このとき、Ｐｉ内の画素（ｘ，ｙ）の動きベクトルは、（ｘ−ｆｉ（ｘ，ｙ），ｙ−ｇｉ（ｘ，ｙ））で表すことができる。ところで、ブロックマッチングは変換関数が定数である方式として、空間変換に基づく動き補償の特殊な例として解釈することもできる。しかし、本明細書で空間変換に基づく動き補償という言葉を用いるときには、ブロックマッチングはその中に含まないこととする。
【００１５】
変換関数の形としては、アフィン変換
【００１６】
【数５】

【００１７】
を用いた例（中屋他、「３角形パッチに基づく動き補償の基礎検討」、電子情報通信学会技術報告、ＩＥ９０−１０６、平２−０３参照）、共１次変換
【００１８】
【数６】

【００１９】
を用いた例（Ｇ．Ｊ．ＳｕｌｌｉｖａｎａｎｄＲ．Ｌ．Ｂａｋｅｒ， ”Ｍｏｔｉｏｎｃｏｍｐｅｎｓａｔｉｏｎｆｏｒｖｉｄｅｏｃｏｍｐｒｅｓｓｉｏｎｕｓｉｎｇｃｏｎｔｒｏｌｇｒｉｄｉｎｔｅｒｐｏｌａｔｉｏｎ”，Ｐｒｏｃ．ＩＣＡＳＳＰ ’９１，Ｍ９．１，ｐｐ．２７１３−２７１６，１９９１−０５）、透視変換
【００２０】
【数７】

【００２１】
を用いた例（Ｖ．ＳｅｆｅｒｄｉｓａｎｄＭ．Ｇｈａｎｂａｒｉ， ”Ｇｅｎｅｒａｌａｐｐｒｏａｃｈｔｏｂｌｏｃｋ−ｍａｔｃｈｉｎｇｍｏｔｉｏｎｅｓｔｉｍａｔｉｏｎ’’，ＯｐｔｉｃａｌＥｎｇｉｎｅｅｒｉｎｇ，ｖｏｌ．３２，ｎｏ．７，ｐｐ．１４６４−１４７４，１９９３−０７）などが報告されている。ここでａｉｊ、ｂｉｊ、ｃｉｊはパッチごとに推定される動きパラメータである。符号化装置で得られる予測画像と同じものを受信側で得るためには、画像符号化装置は何らかの形でパッチごとに変換関数の動きパラメータが特定できる情報を、動き情報として受信側に伝送すればよい。例えば、変換関数にアフィン変換を用い、パッチの形状が３角形であるとする。この場合は、６個の動きパラメータを直接伝送しても、パッチの３個の頂点の動きベクトルを伝送しても、受信側で６個の動きパラメータを再生することができる（共１次変換を用いた場合には４角形のパッチを採用すれば同様の処理が可能である）。以下では、変換関数にアフィン変換を用いた場合に関して説明するが、この説明は他の変換を用いた場合についても、ほぼそのまま適用することができる。
【００２２】
変換関数が確定しても空間変換に基づく動き補償には様々なバリエーションを考えることができるが、その一例を図２に示す。この例では、パッチの境界において動きベクトルが連続的に変化するように制約されている。以下では、参照画像２０１を用いて現フレームの原画像２０２の予測画像を合成することを考える。このために、まず現フレームは複数の多角形のパッチに分割され、パッチ分割された画像２０８となる。パッチの頂点は格子点とよばれ、各格子点は複数のパッチに共有される。例えば、パッチ２０９は、格子点２１０、２１１、２１２から構成され、これらの格子点は他のパッチの頂点を兼ねている。こうして画像を複数のパッチに分割した後に、動き推定が行なわれる。ここに示す例では、動き推定は各格子点を対象として参照画像との間でを行なわれる。この結果、動き推定後の参照画像２０３で各パッチは変形されたものとなる。例えば、パッチ２０９は、変形されたパッチ２０４に対応している。これは、動き推定の結果、格子点２０５、２０６、２０７がそれぞれ２１０、２１１、２１２に移動したと推定されたためである。予測画像はパッチ内の各画素に関して変換関数を計算し、数４にしたがって参照画像の中から対応する点の輝度値を求めることにより合成される。これは、この例では３個の頂点の動きベクトルから数５の６個の動きパラメータを計算し、画素ごとに数５を計算することにより実現される。上で述べた通り、動き情報として格子点の動きベクトルを伝送しても、動きパラメータを伝送しても良いが、この例では１個の格子点が複数のパッチの頂点を兼ねている分だけ前者の方が効率的である。
【００２３】
空間変換に基づく動き補償においてもブロックマッチングと同様に、マッチングに基づく動き推定が有効であることが指摘されている。マッチングに基づく動き補償のアルゴリズムの一例を以下に示す。この方式は６角マッチングとよばれ、上の例のようにパッチの境界で動きベクトルが連続的に変化する場合に有効である。この方式は、以下の２つの処理により構成されている。
【００２４】
（１）ブロックマッチングによる格子点の大まかな動き推定
（２）修正アルゴリズムによる動きベクトルの修正
（１）の処理では、格子点を含むブロック（大きさは任意）に対してブロックマッチングを適用し、このブロックの動きベクトルを格子点の大まかな動きベクトルとする。この処理の目的はあくまで格子点の大まかな動きベクトルを求めることであって、必ずブロックマッチングを用いなければならないわけではない。（２）の処理の様子は図３に示す。この図は参照画像におけるパッチと格子点の一部を示したもの（図２の画像２０３に相当する）である。したがって、この図の中で格子点の位置を変化させることは、その格子点の動きベクトルを変化させることを意味する。格子点３０１の動きベクトルを修正する場合、まずこの格子点が関与するすべてのパッチによって構成される多角形３０２の頂点にあたる、格子点３０３〜３０８の動きベクトルを固定する。こうして格子点３０１の動きベクトルを一定の範囲内で変化させる（例えば格子点３０１を格子点３０９の位置に移動させる）と、その結果多角形３０２が含むパッチ内の予測誤差も変化する。そして、探索範囲内で多角形３０２内の予測誤差を最小にした動きベクトルが、格子点３０１の修正された動きベクトルとして登録される。こうして格子点３０１の修正が終了し、他の格子点に移動してから同様の修正を続ける。一旦すべての格子点に対して修正を行なった後に、もう一度最初の格子点から繰り返し修正を行なえば、さらに予測誤差を小さくすることができる。この繰り返しの回数としては、２〜３回が適当であることが報告されている。
【００２５】
修正アルゴリズムにおける典型的な探索範囲は縦横±３画素である。この場合、１個の格子点について１回の修正で４９回のマッチングが多角形３０２内で行なわれる。一方で、１個のパッチは３個の格子点の修正アルゴリズムに関与するため、パッチ内の各画素に対して合計で１４７回予測誤差の評価が行なわれることになる。さらにこの修正が繰り返し行なわれれば、そのたびに誤差評価の回数はさらに増えることになる。この結果、誤差評価が行なわれるたびに対象となる画素に対して内挿の計算が行なわれ、演算量が膨大となる。
【００２６】
空間変換に基づく動き補償における内挿演算の問題は、半画素精度のブロックマッチングにおける同様の問題と比較して以下の点で本質的な違いがあるためにやっかいである。空間変換に基づく動き補償では、たとえ格子点の動きベクトルの水平・垂直成分を１／２の整数倍に制限したとしても、各画素の動きベクトルの水平・垂直成分は同様に１／２の整数倍とはならない。また、一般的に各画素の動きベクトルの小数点以下の成分は任意の値がとれるため、マッチング処理において参照画像の同じ内挿点の輝度値が複数回用いられることはむしろまれである。
【００２７】
【発明が解決しようとする課題】
空間変換に基づく動き補償においてマッチングに基づく動き推定を行なう場合、輝度値の内挿に要する演算量が膨大となる問題が発生する。本発明の目的は、輝度値の内挿の計算回数を減らし、少ない演算量で動き推定の処理を実現することにある。
【００２８】
【課題を解決するための手段】
動き推定処理の前に、参照画像においてｘ座標とｙ座標がそれぞれ１／ｍ１と１／ｍ２の整数倍である点（ｍ１とｍ２は正の整数）の輝度値を内挿により求めた高精細参照画像を用意する。したがって、高精細参照画像においては、ｘ座標とｙ座標がそれぞれ１／ｍ１と１／ｍ２の整数倍である点に画素が存在することになる。動き推定処理において、座標値が整数でない位置における参照画像の輝度値が必要となった場合には、高精細参照画像の中でこの座標に最も近い位置に存在する画素の輝度値で近似することにより、内挿演算の回数を減らす目的は達成される。
【００２９】
【作用】
上記の高精細参照画像を作成する処理では、原画像の１画像あたりｍ１×ｍ２−１回の内挿計算が必要となる。しかし、一旦この高精細化処理を終えてしまうと、動き推定処理においてこれ以上に内挿の演算を行なう必要はなくなる。「従来の技術」でとりあげた空間変換に基づく動き補償の例では、動き推定処理で１画素あたり１４７回以上の内挿演算が必要である。したがって、ｍ１＝ｍ２＝２とすれば、内挿演算の回数は５０分の１程度にすることができ、演算量の大幅な低減につながる。
【００３０】
【実施例】
第１の実施例として、参照画像全体を高精細化してから動き推定処理を行う方式を示す。まず、高精細参照画像Ｒ’を作成する。輝度値の内挿方式に共１次内挿（数３）を用いるとして、Ｒ’は以下の式で表される。
【００３１】
【数８】

【００３２】
ただし、ｓとｔは整数であり、かつ０≦ｓ＜ｍ１，０≦ｔ＜ｍ２であるとする。また、Ｒ’ではｘ、ｙ、ｓ、ｔがすべて整数である点に画素が存在するとする。ｓ＝ｔ＝０である点は、もともと参照画像に存在する画素に対応しており、それ以外の点の輝度値は内挿によって求められる。なお、以下では簡単のため、特にｍ１＝ｍ２＝ｍ（ｍは正の整数）である場合について実施例を挙げる。
【００３３】
図４に高精細参照画像を活用する画像符号化装置の例を示す。なお、図の矢印はデータの流れを表しており、アドレス信号などは省略してある。この装置では、動き推定部４０１が動き推定の処理を担当している。参照画像４０４は高精細化の処理を行う参照画像高精細化回路４０５で処理された後、高精細参照画像４０６としてフレームメモリ４０７に蓄えられ、マッチング処理回路４０９へ近似輝度値情報４０８を提供する。一方で現フレームの原画像４０２もフレームメモリ４０３に蓄えられ、マッチング処理回路４０９で動き推定のために活用される。マッチング処理回路が出力した動き情報４１５は受信側に伝送されるが、符号化装置内でも予測画像合成回路４１０における予測画像の合成のために活用される。合成された予測画像と現フレームの原画像４１１との差は減算器４１２で求められ、予測誤差４１３として、予測誤差符号化器４１４で符号化され、予測誤差情報４１６として伝送される。従来の方式では、変換関数の演算、内挿処理、予測誤差の評価がすべてマッチング回路で行なわれていたのに対し、本実施例では内挿処理を事前に参照画像高精細化回路で行なうことにより演算量を低減させている。また、高精細参照画像を用いることにより、変換関数に求められる演算精度を下げることができ、さらに処理を簡略化することができる。これは、変換関数の演算に誤差が発生した場合に、高精細参照画像で近似に用いる画素が異なるものにならない範囲であれば、動き推定の結果に影響がないためである。なお、ここで内挿により輝度値を求められた高精細参照画像の画素は、すべてがマッチング処理で使用されるとは限らない。この点は「従来の技術」でとりあげたブロックマッチングの例とは異なっている。
【００３４】
図５にｍ＝２として、輝度値の内挿に共１次内挿を用いた場合の参照画像高精細化回路４０５の例を示す。この図でも矢印はデータの流れを表しており、図４と同じ参照番号は同じものをさしている。入力の参照画像信号４０４は、上から下へラインごとに左から右へ画素の輝度値を与えるとする。この信号を２個の画素遅延回路５０２、５０３と１個のライン遅延回路５０１より構成された回路に加えることにより、上下左右に隣接する４個の画素の輝度値５０４〜５０７を得ることができる。これらの輝度値それぞれにに対して積算器５０８〜５１１を用いて内挿位置に応じて重み付け係数を掛け、この結果を加算器５１２〜５１４に加える。この結果をさらに加算器５１５とシフトレジスタ５１６に加えることにより、４で割った後に小数点以下を四捨五入する演算を実現する。以上の処理の結果、高精細参照画像Ｒ’の４画素分の輝度値５１７〜５２０を出力４０６として得ることができる。
【００３５】
図６に、マッチング処理回路４０９内において、高精細参照画像Ｒ’を用いて参照画像の内挿点における輝度値の近似値を得る回路の一例を示す。図４と同じ参照番号は同じものをさしている。ここでは、変換関数を計算することにより、２進固定小数点表現の座標値６０１、６０２が与えられていると仮定する。また、ｍは図５と同様に２であるとして、高精細参照画像Ｒ’はフレームメモリ４０７に蓄えられているとする。座標値６０１、６０２は１／４を加える加算器６０３、小数点以下２桁目以下を切り捨てる回路６０４を通過することにより、小数点以下２桁目を四捨五入され、１／２の整数倍に変換される。この結果得られる座標値６０５、６０６は、高精細参照画像Ｒ’において画素の存在する点の座標値に対応している。この座標値は座標−アドレス変換回路６０７によりフレームメモリ４０７のアドレスに変換され、目的の近似輝度値情報４０８を得る。なお、この例では変換関数の小数点以下３桁目以下の成分は全く使用されていない。したがって、変換関数の小数点以下２桁目以上に影響しない範囲の演算誤差は動き推定結果に影響を与えないことになる。これは上で述べたように、高精細参照画像を用いることにより、変換関数に要求される演算精度が下がったためである。
【００３６】
上で挙げた実施例では内挿演算の回数が減少する一方で、高精細参照画像を格納するために参照画像の４倍の大きさの画像を蓄えるメモリが必要となる。そこで内挿演算の回数は上の実施例よりは多くなるものの、必要となるメモリ容量の小さい方式の実施例を示す。この方式では、現フレームの原画像と参照画像の必要な部分を少しずつ取り込みながら参照画像を高精細化して動き推定に用いる。隣接画素間距離は、現フレームの原画像、参照画像共に水平・垂直方向に１であるとする。なお、ここでは動き推定方式に６角マッチングが使用されていると仮定して、６角マッチングにおける修正処理を実行する回路を中心に説明する。６角マッチングのもう一方の処理である格子点の大まかな動き推定は、「従来の技術」で説明した通り、格子点を含むブロックに対してブロックマッチングを実行することにより行われる。
【００３７】
図７に現フレームの原画像の一部における格子点７０３〜７１１の位置を示す。格子点の間隔を縦横にＮｇ、格子点ごとの動きベクトルの探索範囲を水平・垂直方向に±Ｎｓであるとすれば、参照画像の縦横２Ｎｇ＋２Ｎｓの範囲７０１と、現フレームの原画像の縦横２Ｎｇの範囲７０２（斜線の範囲）に含まれる画素を使えば、格子点７０３に関する６角マッチングの修正処理を行うことができる（実際にはこれらより狭い範囲でも良いが、処理を簡略化するために正方形の領域を用いる）。したがって、修正処理を行う装置はこれらの範囲に含まれる画素の輝度値をあらかじめ読み込むことにより、以後の処理を外部のフレームメモリとは独立した状態で実行することができる。またこの場合、格子点７０３の修正処理を行う前に格子点７０８の修正処理を行うようにすれば、既に範囲７０１および７０２の画素の一部が修正処理を行う装置に読み込まれていることになる。したがって、このときには図８に示すように参照画像の範囲８０１および現フレームの原画像の範囲８０２のみを追加して読み込むようにすれば良い（図８において図７と同じ参照番号は同じものをさしている）。この追加読み込みのときには、格子点７０８の動き推定処理に使用した画素情報の一部は必要なくなるので、この情報が入っていたメモリの上に範囲８０１および８０２の情報を書き込んでも良い。このようにして、動き推定を行う格子点が左から右に移動するごとに、新たに必要な情報のみを読み込むようにすれば処理を簡略化することができる。
【００３８】
図９に、図７および８に示した方式に従って６角マッチングの修正処理を行う装置の動き推定部９０９の例を示す。この図において矢印はデータの流れを表しており、図４と同じ参照番号は同じものをさしている。動き推定部９０９は、図４の動き推定部４０１と異なる構成で同じ働きをする。入力の現フレームの原画像４０２と参照画像４０４はそれぞれフレームメモリ９０１と９０３に蓄えられる。これに対してまず格子点の大まかな動き推定が回路９０２で実行され、求められた動きベクトルに従って参照画像における格子点の座標の情報が格子点座標メモリ９０４に蓄えられる。続いて修正処理部９０５が６角マッチングにおける修正処理を行う。以下では、図８の例にならって、直前に格子点７０８に対する処理が行われた場合の格子点７０３に対する修正処理に関して説明する。修正処理部９０５は、高精細化回路９０７とマッチング処理回路９０６からなっている。まず高精細化回路９０７は、参照画像が蓄えられたフレームメモリ９０３から新たに必要な範囲（図８の例では、範囲８０１）の画素の輝度値情報を読み出す。この情報に対して内挿処理を行い、動き推定において必要となる範囲の高精細参照画像を作成して、マッチング処理回路９０６に与える。マッチング処理回路では、同様に現フレームの原画像のフレームメモリ９０１からも新たに必要な範囲（図８の例では、範囲８０２）の輝度値情報を読み込む。マッチング処理回路は自分自身で修正処理に必要な範囲の高精細参照画像と現フレームの原画像を蓄えるフレームメモリを持っており、このメモリを利用して処理を行う。マッチング処理回路はさらに格子点座標メモリ９０４から参照画像における格子点の座標情報の中で新たに必要となったもの（図８の例では、格子点７０４、７０６、７１１の座標情報。これは格子点７０７、７０８、７１０の座標情報が前の処理で使用されているため。）を読み込み、６角マッチングの修正処理を行う。この処理結果に従い、修正された参照画像における格子点の座標（図８の例では、格子点７０３の座標）を格子点座標メモリ９０４に書き込む。以上で格子点７０３の修正は終了し、修正処理部は図８の格子点７０４の修正処理に移る。修正処理がすべて終了すると、格子点座標メモリ９０４に蓄えられた情報は動きベクトル演算回路９０８において格子点ごとの動きベクトルに変換されて動き情報４１５として出力される。また、予測画像の予測誤差を計算するために、現フレームの原画像の情報４１１も出力される。
【００３９】
図１０に図９の処理において並列処理を導入した例を示す。図９と同じ参照番号は同じものをさしている。この例では、６角マッチングにおける修正処理を行う修正処理部が複数存在し、処理を分担している。参照画像と現フレームの原画像を蓄えているフレームメモリ９０１と９０３から輝度値情報を読み出すためには、共通のデータバス１００１とアドレスバス１００２を利用する。また、参照画像における格子点の座標を蓄えた格子点座標メモリ９０４から情報を読み出したり、情報を書き込む場合には共通のデータバス１００５とアドレスバス１００４を用いる。これらのバスを介して、格子点の大まかな動き推定を行う回路９０２、６角マッチングの修正処理を行う回路９０５、１００３は情報の転送を行う。修正処理部９０５と１００３は同じ構成である。同様の構成の修正処理部をさらに加えることのより、修正処理をさらに高速に実行することができる。修正処理部は輝度値情報の読み込みと格子点座標情報の読み込みと書き込みのとき以外はほぼ独立に処理を行うことができるため、メモリへのアクセスの衝突を避けて並列に処理を行うことができる。
【００４０】
図８、９、１０に示した実施例では、修正処理において参照画像の１画素について約（２＋２Ｎｓ／Ｎｇ）×（ｍ×ｍ − １）回の内挿演算が必要であり、図４で示した実施例の約（２＋２Ｎｓ／Ｎｇ）倍の回数を必要とする。しかし、高精細参照画像全体を蓄えるメモリが必要ないため、全体で必要なメモリ容量を小さくすることができる。
【００４１】
回路における乗除算のしやすさを考慮すると、ｍは２のべき乗であることが望ましい。ｍを小さくすると回路規模を小さくすることができる。しかし、一方で動き推定における座標（動きベクトル）の近似精度が悪くなり、数２の演算で予測誤差の大小関係が逆転しやすくなるために動き推定結果に狂いが生じ、予測特性は劣化する。これに対し、ｍを大きくすると逆の現象が起こる。回路規模を考慮すれば、ｍの値としては４以下が望ましい。また、予測特性を考慮すれば、ｍの値としては２以上が望ましい。したがって、両者のバランスを考えてｍの値としては２、４が適当である。許容される予測誤差と回路規模に応じて適当なｍの値は選択すれば良い。
【００４２】
画素密度が縦横にｍ倍の高精細参照画像を用いて動き推定を行なうことは、数３における変換関数ｆｉ（ｘ，ｙ）とｇｉ（ｘ，ｙ）の値が１／ｍの整数倍となるように制限される（変換関数の最小単位が隣接画素間距離の１／ｍとなる）ことを意味する。しかし、これはあくまで動き推定の際の制限で、予測画像の合成の際にもこの制限を守る必要はない。一方で、空間変換に基づく動き補償では送信側と受信側における予測画像のミスマッチ（送信側と受信側における変換関数の演算精度が異なることが原因で、両者で合成される予測画像に不一致が生じること）を防ぐため、予測画像を合成するときの変換関数の演算精度に関して何らかの基準を設定する必要がある。この基準を設定する方法の一つとして、動き推定の際と同様に予測画像を合成する際の変換関数に最小単位を設ける方法がある。しかし、上で述べた理由から、このときの最小単位は必ずしも動き推定における最小単位と同じ１／ｍである必要はない。一般的に動き推定が終了し、動きパラメータが確定していても、予測画像の合成における変換関数の演算精度を上げると予測誤差は減少する。したがって、予測画像の合成のときには動き推定のときよりも動きベクトルの最小単位を小さくすることにより、予測誤差を減少させることができる。予測画像の合成の際には、輝度値の内挿演算は１画素につき１回のみ行なわれるため、この演算が多少複雑になっても動き推定の場合ほど全体の処理量に大きな影響を及ぼさない。
【００４３】
なお、以下の変形も本発明に含まれることは明らかである。
【００４４】
（１）輝度値の内挿に用いる関数として、本明細書では共１次内挿をとりあげたが、これ以外の関数を用いても良い。関数が複雑になると、内挿演算の回数を減らすことの効果は大きくなる。
【００４５】
（２）変換関数の種類として、本明細書ではアフィン変換を中心にとりあげたが、これ以外の変換関数を用いても良い。同一パッチ内の画素が共通の動きベクトルに従う必要がなく、画素の動きベクトルの垂直・水平成分が隣接画素間距離の整数倍以外の値を取り得る限り、本発明は有効である。
【００４６】
（３）パッチの形状は、画素の集合を特定するものであれば良く、特に本明細書でとりあげた３角形でなくても良い。
【００４７】
（４）空間変換に基づく動き補償に関し、本明細書ではパッチの境界で動きベクトルが連続的に変化する方式をとりあげた。しかし、例えばパッチごとに動きパラメータをそのまま伝送する方式など、パッチ境界で不連続を許容する方式であっても良い。
【００４８】
（５）動き推定アルゴリズムとして、本明細書ではブロックマッチングと６角マッチングをとりあげたが、これは他のマッチングに基づく方式であっても良い。予測誤差の評価が多数回行なわれる方式である限り、本発明は有効である。
【００４９】
（６）空間変換に基づく動き補償において、本明細書でとりあげた例のように伝送される動き情報が格子点の動きベクトルでなくても良い。動き情報は、上記項目（４）でとりあげた例のように、パッチごとの変換関数を特定するものであれば良い。
【００５０】
（７）実施例では、ｍ１＝ｍ２の場合について述べたが、両者が異なっていても良い。
【００５１】
（８）本明細書では、現フレームのパッチ構造を固定して参照画像のパッチを変形させる方式に関して説明したが、逆に参照画像のパッチ構造を固定して現フレームのパッチを変形させる方式であっても良い。
【００５２】
（９）本明細書では、１個の予測画像を合成するために用いる参照画像の数は１個として説明したが、複数の参照画像を用いる方式であっても良い。
【００５３】
【発明の効果】
本発明により、同一パッチに所属するすべての画素が共通の動きベクトルを持つ制約がなく、かつ画素の動きベクトルの水平・垂直成分が隣接画素間距離の整数倍以外の値を取り得る動き補償方式の動き推定処理において、輝度値の内挿の演算を行なう回数を減らすことが可能となる。
【図面の簡単な説明】
【図１】半画素精度のブロックマッチングにおける２段階探索の処理の例を示した図である。
【図２】空間変換に基づく動き補償の処理の例を示した図である。
【図３】空間変換に基づく動き補償における動き推定処理の例として、６角マッチングとよばれる方式の処理を示した図である。
【図４】高精細参照画像を活用する画像符号化装置の例を示した図である。
【図５】輝度値の内挿により、参照画像を高精細化する回路の例を示した図である。
【図６】変換関数の演算結果から、高精細参照画像における輝度値を得る回路の例を示した図である。
【図７】６角マッチングの修正処理で使用する画素の範囲を示した図である。
【図８】６角マッチングの修正処理において隣の格子点に続いて修正処理を行う場合に、新たに必要となる画素の範囲を示した図である。
【図９】現フレームの原画像と参照画像の必要な部分を少しずつ取り込みながら参照画像を高精細化して動き推定に用いる方式を示した図である。
【図１０】現フレームの原画像と参照画像の必要な部分を少しずつ取り込みながら動き推定に用いる方式に並列処理を導入した場合を示した図である。
【符号の説明】
１０１、１０２、１０３、１０４…半画素精度のブロックマッチングにおけるマッチング処理の例、２０１…参照画像、２０２…現フレームの原画像、２０３…動き推定後の参照画像のパッチと格子点、２０４、２０９…パッチ、２０５〜２０７、２１０〜２１２、３０１、３０３〜３０９、７０３〜７１１…格子点、２０８…現フレームの原画像のパッチと格子点、３０２…修正処理におけるマッチングの対象となる多角形、４０１、９０９…動き推定部、４０２、４１１…現フレームの原画像、４０３、４０７、９０１、９０３…フレームメモリ、４０４…参照画像、４０５…参照画像高精細化回路、４０６…高精細参照画像、４０８…近似輝度値、４０９…マッチング処理回路、４１０…予測画像合成回路、４１２…減算器、４１３…予測誤差、４１４…予測誤差符号化器、４１５…動き情報、４１６…予測誤差情報、５０１…ライン遅延回路、５０２、５０３…画素遅延回路、５０４〜５０７…参照画像の画素の輝度値、５０８〜５１１…乗算器、５１２〜５１５、６０３…加算器、５１６…シフトレジスタ、５１７〜５２０…高精細参照画像の画素の輝度値、６０１、６０２…参照画像における座標値の固定小数点表現（変換関数の演算結果）、６０４…小数点以下２桁目以下（２進表現）切り捨て回路、６０５、６０６…高精細参照画像における座標値、６０７…座標−アドレス変換回路、７０１…参照画像において修正に使用される範囲、７０２…現フレームの原画像において修正に使用される範囲、８０１…参照画像において追加される範囲、８０２…現フレームの原画像において追加される範囲、９０２…６角マッチングにおける格子点の大まかな動き推定を行う回路、９０４…参照画像における格子点の座標メモリ、９０５、１００３…６角マッチングの修正処理部、９０６…マッチング処理回路、９０７…参照画像の一部の高精細化回路、９０８…動きベクトル演算回路、１００１…輝度値情報のデータバス、１００２…輝度値情報のアドレスバス、１００４…格子点の座標情報のアドレスバス、１００５…格子点の座標情報のデータバス。
[0001]
[Industrial Field of Application]
The present invention relates to an image encoding device that executes a motion compensation method in which the pixels belonging to the same patch are not constrained to share a common motion vector, and in which the horizontal and vertical components of a pixel's motion vector can take values other than integer multiples of the distance between adjacent pixels.
[0002]
[Prior Art]
In high-efficiency coding of moving images, motion compensation, which exploits the similarity between temporally adjacent frames, is known to be highly effective for information compression. The motion compensation method that dominates current image coding technology is block matching with half-pixel accuracy, which is also adopted in MPEG1 and MPEG2, the international standards for moving image coding. In this method, the image to be encoded is divided into a large number of blocks, and a motion vector is obtained for each block with half the distance between adjacent pixels as the minimum unit in the horizontal and vertical directions. This process can be expressed mathematically as follows. Let P(x, y) be the predicted image of the frame to be encoded (the current frame), and let R(x, y) be the reference image (the decoded image of an already encoded frame that is temporally close to P). Assume that x and y are integers and that pixels exist in P and R at the points whose coordinate values are integers. The relationship between P and R is then expressed by
[0003]
[Equation 1]

[0004]
where the image is assumed to be divided into n blocks, Bi denotes the pixels contained in the i-th block of the image, and (ui, vi) denotes the motion vector of the i-th block. The most common way to estimate the motion vector (ui, vi) is to set a fixed search range for (ui, vi) (for example, −15 ≦ ui, vi ≦ 15) and to search within it for the vector that minimizes the prediction error Ei(ui, vi) within the block. When the mean absolute error is used as the evaluation criterion for the prediction error, Ei(ui, vi) is given, with I(x, y) denoting the original image of the frame to be encoded, by
[0005]
[Equation 2]

[0006]
where Ni is the number of pixels contained in the i-th block. The process of evaluating the prediction error for each of several motion vectors in this way and searching for the motion vector with the smallest error is called matching. Computing Ei(ui, vi) for every (ui, vi) within a given search range and searching for the minimum is called a full search.
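Equations 1 and 2 are not reproduced in this text. A reconstruction from the surrounding definitions might read as follows. The sign of the displacement is an assumption, chosen to agree with the later statement that the motion vector of a pixel is (x − fi(x, y), y − gi(x, y)); the original may use the opposite sign.

```latex
\begin{align}
P(x, y) &= R(x - u_i,\; y - v_i), \qquad (x, y) \in B_i \tag{1}\\
E_i(u_i, v_i) &= \frac{1}{N_i} \sum_{(x, y) \in B_i}
  \bigl|\, I(x, y) - R(x - u_i,\; y - v_i) \,\bigr| \tag{2}
\end{align}
```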
[0007]
In block matching with half-pixel accuracy, ui and vi are each obtained with half the inter-pixel distance, that is, 1/2 in this case, as the minimum unit. It is therefore necessary to obtain the luminance value at points whose coordinate values are not integers and where no pixel actually exists in the reference image (hereinafter such a point is called an interpolation point). Bilinear interpolation using the four surrounding pixels is often used for this purpose. Expressed mathematically, with p and q (0 ≦ p, q < 1) denoting the fractional parts of the coordinate values, the luminance value R(x + p, y + q) at the interpolation point (x + p, y + q) of the reference image is given by
[0008]
[Equation 3]

[0009]
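Equation 3 is not reproduced in this text. Bilinear interpolation over the four surrounding pixels has the standard form below, offered here as a reconstruction rather than as the exact notation of the original:

```latex
R(x+p,\, y+q) = (1-q)\bigl[(1-p)\,R(x, y) + p\,R(x+1, y)\bigr]
              + q\bigl[(1-p)\,R(x, y+1) + p\,R(x+1, y+1)\bigr] \tag{3}
```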
[0010]
Systems that perform block matching with half-pixel accuracy widely use a two-stage search: a full search with one-pixel accuracy is first performed over a wide search range to estimate the motion vector roughly, and a full search with half-pixel accuracy is then performed over a very narrow range around this motion vector (for example, ± half a pixel vertically and horizontally). In the second-stage search, a common approach is to compute the luminance values at the interpolation points in advance and then perform the search. An example of this processing is shown in FIG. 1. This example uses a block of 4 × 4 pixels. In the reference image, points whose coordinate values are integers and where pixels originally exist are denoted by “○”, and interpolation points whose luminance values are newly computed are denoted by “×”. The pixels of the block of the original image of the current frame are denoted by “□”. Let (uc, vc) be the motion vector obtained in the first-stage search. Matching example 101 shows the matching in the first-stage search when the motion vector is (uc, vc). The prediction error is evaluated between the overlapping “○” and “□”. Examples 102, 103, and 104 show the second-stage search when the motion vector is (uc + 1/2, vc), (uc + 1/2, vc + 1/2), and (uc − 1/2, vc − 1/2), respectively. In these examples, the prediction error is evaluated between the overlapping “×” and “□”. As the figure shows, when the search range of the second stage is ± 1/2 pixel vertically and horizontally, computing the luminance values of 65 interpolation points (the number of “×” in FIG. 1) covers the matching for 8 motion vectors. All interpolation points whose luminance values are computed are used in the matching. If interpolation were instead computed in the reference image at every matching, 4 × 4 × 8 = 128 interpolations would be required. The number of interpolation operations can be reduced in this way because the same interpolation point of the reference image is used multiple times.
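The counts of 65 and 128 quoted above can be checked with a short sketch. It only enumerates positions and does not touch any image data; the function name and the half-pixel integer coordinate trick are illustrative choices, not part of the original description.

```python
from itertools import product


def interpolation_counts(block=4):
    """Count interpolations for a +/- 1/2 pixel second-stage search.

    Coordinates are stored in half-pixel units so they stay integers:
    pixel x of the block displaced by d half-pixels lands on 2*x + d.
    """
    # The 8 candidate displacements around (uc, vc); (0, 0) is excluded
    # because it was already evaluated on integer pixels in stage one.
    offsets = [(dx, dy) for dx, dy in product((-1, 0, 1), repeat=2)
               if (dx, dy) != (0, 0)]

    naive = 0        # interpolate again at every matching
    shared = set()   # interpolate each distinct point only once
    for dx, dy in offsets:
        for x, y in product(range(block), repeat=2):
            naive += 1
            shared.add((2 * x + dx, 2 * y + dy))
    return len(offsets), naive, len(shared)


if __name__ == "__main__":
    vectors, naive, shared = interpolation_counts()
    print(vectors, naive, shared)  # prints: 8 128 65
```

The gap between 128 and 65 is exactly the reuse of interpolation points that the paragraph describes.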
[0011]
As described above, block matching with half-pixel accuracy is widely used at present, but applications that require a higher compression ratio than MPEG1 or MPEG2 call for a more advanced motion compensation method. The drawback of block matching is that all pixels in a block must have the same motion vector. To solve this problem, motion compensation methods that allow adjacent pixels to have different motion vectors have recently been proposed. Motion compensation based on spatial transformation, one example of such methods, is briefly described below.
[0012]
In motion compensation based on spatial transformation, the relationship between the predicted image P and the reference image R is expressed by
[0013]
[Equation 4]

[0014]
where the image is assumed to be divided into n small regions (patches), and Pi denotes the pixels contained in the i-th patch of the image. The transformation functions fi(x, y) and gi(x, y) express the spatial correspondence between the image of the current frame and the reference image. The motion vector of a pixel (x, y) in Pi can then be expressed as (x − fi(x, y), y − gi(x, y)). Block matching can also be interpreted as a special case of motion compensation based on spatial transformation in which the transformation function is a constant. In this specification, however, the term motion compensation based on spatial transformation does not include block matching.
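Equation 4 is not reproduced in this text. From the definitions in this paragraph it can be reconstructed as:

```latex
P(x, y) = R\bigl(f_i(x, y),\; g_i(x, y)\bigr), \qquad (x, y) \in P_i \tag{4}
```

Setting fi(x, y) = x − ui and gi(x, y) = y − vi for constants ui and vi gives every pixel of the patch the same motion vector (ui, vi), which is the block matching case mentioned above.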
[0015]
Reported forms of the transformation function include the affine transformation
[0016]
[Equation 5]

[0017]
(see Nakaya et al., “Basic study of motion compensation based on triangular patches”, IEICE Technical Report, IE90-106, March 1990), the bilinear transformation
[0018]
[Equation 6]

[0019]
(see G. J. Sullivan and R. L. Baker, “Motion compensation for video compression using control grid interpolation”, Proc. ICASSP '91, M9.1, pp. 2713-2716, May 1991), and the perspective transformation
[0020]
[Equation 7]

[0021]
(see V. Seferidis and M. Ghanbari, “General approach to block-matching motion estimation”, Optical Engineering, vol. 32, no. 7, pp. 1464-1474, July 1993). Here aij, bij, and cij are motion parameters estimated for each patch. To obtain on the receiving side the same predicted image as that obtained by the encoding device, the image encoding device only needs to transmit to the receiving side, as motion information, information that in some form identifies the motion parameters of the transformation function for each patch. For example, suppose that the affine transformation is used as the transformation function and that the patches are triangular. In this case, the six motion parameters can be reproduced on the receiving side whether the six motion parameters are transmitted directly or the motion vectors of the three vertices of the patch are transmitted (when the bilinear transformation is used, the same processing is possible with quadrilateral patches). The following description deals with the case where the affine transformation is used as the transformation function, but it applies almost unchanged when other transformations are used.
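Equations 5 to 7 are not reproduced in this text. The three transformation families are commonly written as below. The numbering and ordering of the parameters aij, bij, and cij are assumptions; only the parameter count of six for the affine case is confirmed by the description.

```latex
\begin{align}
f_i(x,y) &= a_{i1}x + a_{i2}y + a_{i3}, &
g_i(x,y) &= a_{i4}x + a_{i5}y + a_{i6} \tag{5}\\
f_i(x,y) &= b_{i1}xy + b_{i2}x + b_{i3}y + b_{i4}, &
g_i(x,y) &= b_{i5}xy + b_{i6}x + b_{i7}y + b_{i8} \tag{6}\\
f_i(x,y) &= \frac{c_{i1}x + c_{i2}y + c_{i3}}{c_{i7}x + c_{i8}y + c_{i9}}, &
g_i(x,y) &= \frac{c_{i4}x + c_{i5}y + c_{i6}}{c_{i7}x + c_{i8}y + c_{i9}} \tag{7}
\end{align}
```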
[0022]
Even once the transformation function has been fixed, many variations of motion compensation based on spatial transformation are conceivable; one example is shown in FIG. 2. In this example, the motion vectors are constrained to vary continuously across patch boundaries. Consider synthesizing a predicted image of the original image 202 of the current frame using the reference image 201. For this purpose, the current frame is first divided into a number of polygonal patches, giving the patch-divided image 208. The vertices of the patches are called grid points, and each grid point is shared by several patches. For example, the patch 209 is formed by the grid points 210, 211, and 212, and these grid points also serve as vertices of other patches. After the image has been divided into patches in this way, motion estimation is performed. In the example shown here, motion estimation is performed for each grid point against the reference image. As a result, each patch appears deformed in the reference image 203 after motion estimation. For example, the patch 209 corresponds to the deformed patch 204, because motion estimation has found that the grid points 205, 206, and 207 moved to 210, 211, and 212, respectively. The predicted image is synthesized by evaluating the transformation function for each pixel in the patch and obtaining the luminance value of the corresponding point in the reference image according to Equation 4. In this example, this is done by computing the six motion parameters of Equation 5 from the motion vectors of the three vertices and evaluating Equation 5 for each pixel. As stated above, either the motion vectors of the grid points or the motion parameters may be transmitted as the motion information, but in this example the former is more efficient because each grid point serves as a vertex of several patches.
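Recovering the six affine parameters from three vertex motion vectors can be sketched as follows. The function names and the parameter order (a1..a6, with f = a1·x + a2·y + a3 and g = a4·x + a5·y + a6) are assumptions made for illustration; the motion vector convention (u, v) = (x − f, y − g) is the one stated in the description.

```python
def affine_from_vertices(vertices, motion_vectors):
    """Recover (a1..a6) of f = a1*x + a2*y + a3, g = a4*x + a5*y + a6.

    vertices       : three (x, y) grid points of a triangular patch
    motion_vectors : their motion vectors (u, v), with the convention
                     (u, v) = (x - f(x, y), y - g(x, y))
    """
    (x0, y0), (x1, y1), (x2, y2) = vertices
    # Corresponding positions of the vertices in the reference image.
    targets = [(x - u, y - v)
               for (x, y), (u, v) in zip(vertices, motion_vectors)]

    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate patch: vertices are collinear")

    def solve(t0, t1, t2):
        # Solve a*x_k + b*y_k + c = t_k for k = 0, 1, 2 (Cramer's rule
        # after subtracting the k = 0 equation from the other two).
        a = ((t1 - t0) * (y2 - y0) - (t2 - t0) * (y1 - y0)) / det
        b = ((x1 - x0) * (t2 - t0) - (x2 - x0) * (t1 - t0)) / det
        c = t0 - a * x0 - b * y0
        return a, b, c

    f_params = solve(*(t[0] for t in targets))
    g_params = solve(*(t[1] for t in targets))
    return f_params + g_params


def warp(params, x, y):
    """Evaluate the affine transformation functions at pixel (x, y)."""
    a1, a2, a3, a4, a5, a6 = params
    return a1 * x + a2 * y + a3, a4 * x + a5 * y + a6
```

When all three vertices share the same motion vector, the result reduces to a pure translation, which matches the block matching special case.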
[0023]
It has been pointed out that matching-based motion estimation is effective for motion compensation based on spatial transformation, as it is for block matching. An example of such a matching-based algorithm is given below. The method is called hexagonal matching and is effective when the motion vectors vary continuously across patch boundaries, as in the example above. It consists of the following two processes.
[0024]
(1) Rough motion estimation of the grid points by block matching
(2) Correction of the motion vectors by a correction algorithm
In process (1), block matching is applied to a block (of arbitrary size) containing a grid point, and the motion vector of this block is taken as the rough motion vector of the grid point. The purpose of this process is only to obtain a rough motion vector for each grid point, so block matching is not strictly required. Process (2) is illustrated in FIG. 3, which shows some of the patches and grid points in the reference image (corresponding to the image 203 in FIG. 2). Changing the position of a grid point in this figure therefore means changing the motion vector of that grid point. To correct the motion vector of the grid point 301, the motion vectors of the grid points 303 to 308, which are the vertices of the polygon 302 formed by all the patches that involve the grid point 301, are first fixed. When the motion vector of the grid point 301 is then varied within a given range (for example, by moving the grid point 301 to the position of the grid point 309), the prediction error within the patches contained in the polygon 302 also changes. The motion vector that minimizes the prediction error within the polygon 302 over the search range is registered as the corrected motion vector of the grid point 301. This completes the correction of the grid point 301, and the same correction is then applied to the other grid points in turn. Once all grid points have been corrected, repeating the correction from the first grid point reduces the prediction error further. It has been reported that two to three repetitions are appropriate.
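The correction step (2) can be sketched as a loop over grid points. This is a minimal illustration, not the circuit of the patent: `cost` is a hypothetical callback that returns the prediction error inside the polygon surrounding a grid point, and positions are restricted to integer steps for brevity.

```python
from itertools import product


def refine_grid_points(positions, cost, search=3, passes=2):
    """Iteratively correct grid-point positions in the reference image.

    positions : dict mapping grid-point id -> (x, y) in the reference image
    cost      : hypothetical callback cost(point_id, positions) returning
                the prediction error in the polygon around point_id
    search    : half-width of the square search range
    passes    : number of sweeps over all grid points
    """
    for _ in range(passes):
        for pid in positions:
            x0, y0 = positions[pid]
            best, best_err = (x0, y0), None
            # Try every displacement in the search range while all other
            # grid points keep their current positions.
            for dx, dy in product(range(-search, search + 1), repeat=2):
                trial = dict(positions)
                trial[pid] = (x0 + dx, y0 + dy)
                err = cost(pid, trial)
                if best_err is None or err < best_err:
                    best, best_err = trial[pid], err
            positions[pid] = best
    return positions
```

Each grid point costs (2·search + 1)² calls to `cost` per sweep, which is where the computational load discussed in the description comes from.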
[0025]
A typical search range in the correction algorithm is ± 3 pixels vertically and horizontally. In this case, 49 matchings are performed within the polygon 302 in one correction of one grid point. Since each patch is involved in the correction of three grid points, the prediction error is evaluated a total of 147 times for each pixel in the patch. If the correction is repeated, the number of error evaluations increases further with each repetition. An interpolation is computed for each target pixel every time an error is evaluated, so the amount of computation becomes enormous.
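The counts in this paragraph follow directly from the search range and from each triangular patch having three vertices:

```latex
\begin{align*}
\text{matchings per grid point} &= (2 \cdot 3 + 1)^2 = 49,\\
\text{error evaluations per pixel per pass} &= 3 \times 49 = 147.
\end{align*}
```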
[0026]
The interpolation problem in motion compensation based on spatial transformation is troublesome because it differs in an essential way from the corresponding problem in block matching with half-pixel accuracy. In motion compensation based on spatial transformation, even if the horizontal and vertical components of the motion vectors of the grid points are restricted to integer multiples of 1/2, the horizontal and vertical components of the motion vector of each pixel are not in general integer multiples of 1/2. Moreover, since the fractional part of each pixel's motion vector can generally take an arbitrary value, the luminance value at the same interpolation point of the reference image is rarely used more than once in the matching process.
[0027]
[Problems to Be Solved by the Invention]
When matching-based motion estimation is performed in motion compensation based on spatial transformation, the amount of computation required to interpolate luminance values becomes enormous. An object of the present invention is to reduce the number of luminance interpolation calculations and to realize motion estimation with a small amount of computation.
[0028]
[Means for Solving the Problems]
Before the motion estimation process, a high-definition reference image is prepared by computing, through interpolation, the luminance values at the points of the reference image whose x and y coordinates are integer multiples of 1/m1 and 1/m2, respectively (m1 and m2 are positive integers). In the high-definition reference image, pixels therefore exist at the points whose x and y coordinates are integer multiples of 1/m1 and 1/m2, respectively. When the motion estimation process needs the luminance value of the reference image at a position whose coordinate values are not integers, the value is approximated by the luminance value of the pixel in the high-definition reference image closest to that position. This achieves the object of reducing the number of interpolation operations.
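A minimal sketch of the two steps, precomputing the high-definition reference image and then approximating by the nearest pixel, is given below. The function names, the list-of-rows image layout, and the clamping at the image border are assumptions made for illustration.

```python
import math


def bilinear(ref, x, y):
    """Bilinear interpolation of ref (a list of rows) at real-valued (x, y)."""
    h, w = len(ref), len(ref[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    p, q = x - x0, y - y0
    # Clamp the right and lower neighbors at the border (an assumption).
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    top = (1 - p) * ref[y0][x0] + p * ref[y0][x1]
    bottom = (1 - p) * ref[y1][x0] + p * ref[y1][x1]
    return (1 - q) * top + q * bottom


def build_high_definition(ref, m1, m2):
    """Precompute luminance at every multiple of 1/m1 in x and 1/m2 in y."""
    h, w = len(ref), len(ref[0])
    return [[bilinear(ref, X / m1, Y / m2) for X in range(w * m1)]
            for Y in range(h * m2)]


def approx_luminance(hd, m1, m2, x, y):
    """Approximate R(x, y) by the nearest pixel of the high-definition image."""
    X = min(max(int(math.floor(x * m1 + 0.5)), 0), len(hd[0]) - 1)
    Y = min(max(int(math.floor(y * m2 + 0.5)), 0), len(hd) - 1)
    return hd[Y][X]
```

For example, with `ref = [[0, 4], [8, 12]]` and m1 = m2 = 2, a lookup at (0.4, 0.6) returns the precomputed value at (0.5, 0.5), namely 6.0, where exact bilinear interpolation would give 6.4. The lookup itself involves no interpolation arithmetic.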
[0029]
[Operation]
Creating the high-definition reference image described above requires m1 × m2 − 1 interpolation calculations per pixel of the original image. Once this high-definition processing is finished, however, no further interpolation is needed in the motion estimation process. In the example of motion compensation based on spatial transformation described under “Prior Art”, the motion estimation process requires 147 or more interpolation operations per pixel. With m1 = m2 = 2, the number of interpolation operations can therefore be reduced to about 1/50, which greatly reduces the amount of computation.
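The ratio quoted above can be checked by comparing the interpolations per pixel needed to build the high-definition image with the 147 interpolations per pixel of the conventional search:

```latex
\frac{m_1 m_2 - 1}{147} = \frac{2 \cdot 2 - 1}{147} = \frac{3}{147} = \frac{1}{49} \approx \frac{1}{50}
```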
[0030]
[Embodiments]
As a first embodiment, a method is described in which the entire reference image is converted to high definition before motion estimation is performed. First, a high-definition reference image R′ is created. Assuming that bilinear interpolation (Equation 3) is used to interpolate the luminance values, R′ is expressed by the following equation.
[0031]
[Equation 8]

[0032]
Here, s and t are integers with 0 ≦ s < m1 and 0 ≦ t < m2. In R′, pixels exist at the points where x, y, s, and t are all integers. Points with s = t = 0 correspond to pixels that originally exist in the reference image, and the luminance values of the other points are obtained by interpolation. For simplicity, the embodiments below deal with the case m1 = m2 = m (m is a positive integer).
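Equation 8 is not reproduced in this text. Substituting p = s/m1 and q = t/m2 into bilinear interpolation gives one plausible form; indexing R′ by fractional coordinates is an assumption about the original notation.

```latex
R'\!\left(x + \tfrac{s}{m_1},\; y + \tfrac{t}{m_2}\right)
 = \left(1 - \tfrac{t}{m_2}\right)
   \left[\left(1 - \tfrac{s}{m_1}\right) R(x, y) + \tfrac{s}{m_1}\, R(x+1, y)\right]
 + \tfrac{t}{m_2}
   \left[\left(1 - \tfrac{s}{m_1}\right) R(x, y+1) + \tfrac{s}{m_1}\, R(x+1, y+1)\right] \tag{8}
```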
[0033]
FIG. 4 shows an example of an image encoding device that uses a high-definition reference image. The arrows in the figure represent the flow of data; address signals and the like are omitted. In this device, the motion estimation unit 401 performs the motion estimation. The reference image 404 is processed by the reference image high-definition circuit 405, stored in the frame memory 407 as the high-definition reference image 406, and supplies approximate luminance value information 408 to the matching processing circuit 409. The original image 402 of the current frame is likewise stored in the frame memory 403 and used for motion estimation by the matching processing circuit 409. The motion information 415 output by the matching processing circuit is transmitted to the receiving side and is also used within the encoding device to synthesize the predicted image in the predicted image synthesis circuit 410. The difference between the synthesized predicted image and the original image 411 of the current frame is obtained by the subtractor 412 and, as the prediction error 413, is encoded by the prediction error encoder 414 and transmitted as prediction error information 416. In the conventional method, the evaluation of the transformation function, the interpolation, and the evaluation of the prediction error were all performed in the matching circuit. In this embodiment, the amount of computation is reduced by performing the interpolation in advance in the reference image high-definition circuit. Using a high-definition reference image also lowers the computational accuracy required of the transformation function, which simplifies the processing further. This is because an error in evaluating the transformation function does not affect the motion estimation result as long as it does not change which pixel of the high-definition reference image is used for the approximation. Note that not every pixel of the high-definition reference image whose luminance value has been obtained by interpolation is necessarily used in the matching process. In this respect the embodiment differs from the block matching example described under “Prior Art”.
[0034]
FIG. 5 shows an example of the reference image enhancement circuit 405 when m = 2 and bilinear interpolation is used for luminance value interpolation. In this figure, the arrows indicate the flow of data, and the same reference numerals as those in FIG. 4 indicate the same items. The input reference image signal 404 is assumed to supply pixel luminance values from left to right within each line, line by line from top to bottom. Applying this signal to a circuit composed of two pixel delay circuits 502 and 503 and one line delay circuit 501 yields the luminance values 504 to 507 of four vertically and horizontally adjacent pixels. Each of these luminance values is multiplied by a weighting coefficient that depends on the interpolation position by the multipliers 508 to 511, and the products are summed by the adders 512 to 514. The sum is then passed through the adder 515 and the shift register 516, which together divide it by 4 and round the result to the nearest integer. As a result of the above processing, the luminance values 517 to 520 of four pixels of the high-definition reference image R ′ are obtained as the output 406.
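The operation of the circuit in FIG. 5 can be sketched in software as follows. This is only an illustrative sketch for m = 2: the function name `enhance_reference_2x` is hypothetical, the replication of edge pixels at the right and bottom borders is an assumption (the specification does not describe border handling), and adding 2 before the 2-bit right shift is one assumed way to realize "divide by 4 and round" with an adder and a shift register.

```python
def enhance_reference_2x(ref):
    """Return a 2x-density image R' built by bilinear interpolation (m = 2).

    ref is a list of rows of integer luminance values. Edge pixels are
    replicated at the right and bottom borders (an assumption).
    """
    h, w = len(ref), len(ref[0])
    out = [[0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            x1, y1 = min(x + 1, w - 1), min(y + 1, h - 1)
            a, b = ref[y][x], ref[y][x1]      # upper-left, upper-right
            c, d = ref[y1][x], ref[y1][x1]    # lower-left, lower-right
            # Weighted sums scaled by 4, one per sub-position (s, t).
            sums = {
                (0, 0): 4 * a,
                (1, 0): 2 * a + 2 * b,
                (0, 1): 2 * a + 2 * c,
                (1, 1): a + b + c + d,
            }
            for (s, t), total in sums.items():
                # Add 2, then shift right by 2: divide by 4, rounding half up.
                out[2 * y + t][2 * x + s] = (total + 2) >> 2
    return out
```

The point with s = t = 0 reproduces the original pixel, and the other three points per pixel are the interpolated values, which matches the m × m − 1 = 3 interpolations per reference pixel implied by the text.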
[0035]
FIG. 6 shows an example of a circuit in the matching processing circuit 409 that obtains an approximate luminance value at an interpolation point of the reference image by using the high-definition reference image R ′. The same reference numerals as those in FIG. 4 denote the same components. Here, it is assumed that the coordinate values 601 and 602, in binary fixed-point representation, are given as the result of calculating the conversion function. It is further assumed that m is 2, as in FIG. 5, and that the high-definition reference image R ′ is stored in the frame memory 407. The coordinate values 601 and 602 are passed through an adder 603 that adds 1/4 and a circuit 604 that truncates the second and lower fractional digits (in binary). This rounds each value at the second fractional digit to the nearest integral multiple of 1/2. The resulting coordinate values 605 and 606 correspond to the coordinates of a point at which a pixel exists in the high-definition reference image R ′. These coordinate values are converted into an address in the frame memory 407 by the coordinate-address conversion circuit 607 to obtain the desired approximate luminance value information 408. In this example, the components of the conversion function at the third fractional digit and below are not used at all. Therefore, a calculation error that does not affect the second fractional digit or higher of the conversion function does not affect the motion estimation result. This is why, as described above, using the high-definition reference image lowers the calculation accuracy required of the conversion function.
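The rounding performed by the adder 603 and the truncation circuit 604 can be illustrated with a short sketch. The names `FRAC_BITS`, `to_half_pel`, and `approx_luminance` are hypothetical, the choice of 8 fractional bits is an assumption (the specification does not give a word length), and a two-dimensional list stands in for the frame memory 407 and the coordinate-address conversion circuit 607. Non-negative coordinates are assumed.

```python
FRAC_BITS = 8  # hypothetical number of fractional bits in the fixed-point format

def to_half_pel(coord_fixed, frac_bits=FRAC_BITS):
    """Round a fixed-point coordinate to the nearest multiple of 1/2.

    The result is returned in half-pixel units, i.e. as an integer
    index into the 2x-density image R'.
    """
    quarter = 1 << (frac_bits - 2)  # the value 1/4 in fixed point
    # Add 1/4, then drop every bit below the 1/2 bit.
    return (coord_fixed + quarter) >> (frac_bits - 1)

def approx_luminance(hd_ref, x_fixed, y_fixed, frac_bits=FRAC_BITS):
    """Look up the pixel of hd_ref nearest to the fixed-point point (x, y)."""
    row = to_half_pel(y_fixed, frac_bits)
    col = to_half_pel(x_fixed, frac_bits)
    return hd_ref[row][col]
```

For example, with 8 fractional bits the coordinate 1.3 is stored as 332 and maps to index 3 (that is, 1.5). Perturbing the stored value by a few least significant bits leaves the index unchanged, which is the tolerance to calculation error that the paragraph above describes.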
[0036]
In the embodiment described above, the number of interpolation operations is reduced, but storing the high-definition reference image requires a memory that holds an image four times as large as the reference image. The following embodiment therefore shows a method that requires less memory, at the cost of more interpolation operations than the embodiment above. In this method, the necessary portions of the original image of the current frame and of the reference image are read in a little at a time, and the reference image is converted to high definition portion by portion for use in motion estimation. The distance between adjacent pixels is assumed to be 1 in both the horizontal and vertical directions for both the original image of the current frame and the reference image. Here, assuming that hexagonal matching is used as the motion estimation method, the description focuses on a circuit that executes the correction process of hexagonal matching. The rough motion estimation of the grid points, which is the other process in hexagonal matching, is performed by executing block matching on a block containing each grid point, as described in "Prior Art".
[0037]
FIG. 7 shows the positions of grid points 703 to 711 in a part of the original image of the current frame. If the interval between grid points is Ng vertically and horizontally, and the search range of the motion vector of each grid point is ± Ns horizontally and vertically, the hexagonal matching correction process for the grid point 703 can be performed using the pixels of the reference image within the range 701, which measures 2Ng + 2Ns vertically and horizontally, and the pixels of the original image of the current frame within the range 702 (shaded area), which measures 2Ng vertically and horizontally. (A narrower range would actually suffice, but square areas are used to simplify the processing.) Therefore, an apparatus that performs the correction process can execute the subsequent processing independently of the external frame memory by reading in advance the luminance values of the pixels included in these ranges. In this case, if the correction process for the grid point 708 is performed before the correction process for the grid point 703, some of the pixels in the ranges 701 and 702 have already been read by the apparatus that performs the correction process. It is then sufficient, as shown in FIG. 8, to read in only the additional reference image range 801 and the additional current frame original image range 802 (in FIG. 8, reference numbers shared with the earlier figures denote the same items). At the time of this additional reading, part of the pixel information used for the motion estimation of the grid point 708 is no longer needed, so the information in the ranges 801 and 802 may overwrite the memory in which that information was stored. In this way, the processing can be simplified by reading in only the newly required information each time the grid point under motion estimation moves from left to right.
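The incremental reading of the reference image can be sketched as follows. This is a minimal illustration under stated assumptions: the functions `reference_columns` and `new_reference_columns` are hypothetical, the range 701 is taken to be centered on the grid point, column ranges are half-open, and the previous grid point is assumed to lie exactly Ng pixels to the left.

```python
def reference_columns(gx, ng, ns):
    """Half-open column range [start, end) of the reference image needed
    for the correction of a grid point at horizontal position gx."""
    return gx - ng - ns, gx + ng + ns

def new_reference_columns(gx, ng, ns):
    """Columns that must be newly read when the grid point at gx - ng
    (the left neighbor) was processed immediately before."""
    _, previous_end = reference_columns(gx - ng, ng, ns)
    _, end = reference_columns(gx, ng, ns)
    return previous_end, end
```

Under these assumptions the newly read strip is always Ng columns wide and 2Ng + 2Ns rows high, regardless of the search range in the horizontal direction.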
[0038]
FIG. 9 shows an example of a motion estimation unit 909 of an apparatus that performs the hexagonal matching correction process according to the method described above. In this figure, arrows indicate the flow of data, and the same reference numerals as in FIG. 4 indicate the same items. The motion estimation unit 909 has a different configuration from the motion estimation unit 401 of FIG. 4 but performs the same function. The input original image 402 of the current frame and the reference image 404 are stored in the frame memories 901 and 903, respectively. First, the circuit 902 performs rough motion estimation of the lattice points, and the coordinates of the lattice points in the reference image, determined from the motion vectors thus obtained, are stored in the lattice point coordinate memory 904. Subsequently, the correction processing unit 905 performs the correction process of hexagonal matching. The following describes, following the example of FIG. 8, the correction process for the grid point 703 when the process for the grid point 708 was performed immediately before. The correction processing unit 905 includes a high-definition circuit 907 and a matching processing circuit 906. First, the high-definition circuit 907 reads the luminance value information of the pixels in the newly required range (the range 801 in the example of FIG. 8) from the frame memory 903, which stores the reference image. It interpolates this information to create the portion of the high-definition reference image needed for motion estimation and supplies it to the matching processing circuit 906. Similarly, the matching processing circuit reads the luminance value information of the newly required range (the range 802 in the example of FIG. 8) from the frame memory 901, which stores the original image of the current frame. The matching processing circuit has its own memory that stores the portions of the high-definition reference image and of the original image of the current frame needed for the correction process, and performs its processing using this memory. The matching processing circuit also reads the newly required coordinate information of the lattice points in the reference image from the lattice point coordinate memory 904 (in the example of FIG. 8, the coordinate information of the lattice points 704, 706, and 711; the coordinate information of the lattice points 707, 708, and 710 was already used in the previous process), and performs the hexagonal matching correction process. According to the result, the corrected coordinates of the lattice point in the reference image (in the example of FIG. 8, the coordinates of the lattice point 703) are written into the lattice point coordinate memory 904. This completes the correction of the grid point 703, and the correction processing unit proceeds to the correction process for the grid point 704. When all the correction processes are completed, the information stored in the lattice point coordinate memory 904 is converted into a motion vector for each lattice point by the motion vector calculation circuit 908 and output as motion information 415. In addition, the information 411 of the original image of the current frame is also output so that the prediction error of the predicted image can be calculated.
[0039]
FIG. 10 shows an example in which parallel processing is introduced into the processing of FIG. 9. The same reference numerals as those in FIG. 9 denote the same components. In this example, there are a plurality of correction processing units that perform the correction process of hexagonal matching, and the processing is shared among them. A common data bus 1001 and address bus 1002 are used to read the luminance value information from the frame memories 901 and 903, which store the original image of the current frame and the reference image. Likewise, a common data bus 1005 and address bus 1004 are used to read information from or write information into the grid point coordinate memory 904, which stores the coordinates of the grid points in the reference image. The circuit 902, which performs rough motion estimation of the lattice points, and the circuits 905 and 1003, which perform the hexagonal matching correction process, transfer information via these buses. The correction processing units 905 and 1003 have the same configuration. Adding further correction processing units of the same configuration allows the correction process to be executed at higher speed. Since each correction processing unit can operate almost independently except when reading luminance value information and when reading and writing grid point coordinate information, parallel processing can be performed while avoiding memory access conflicts.
[0040]
In the embodiments shown in FIGS. 8, 9, and 10, the correction process requires about (2 + 2Ns / Ng) × (m × m − 1) interpolation operations per pixel of the reference image, which is about (2 + 2Ns / Ng) times as many as in the embodiment that stores the entire high-definition reference image. However, since a memory for storing the entire high-definition reference image is not necessary, the memory capacity required as a whole can be reduced.
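The trade-off can be made concrete with a small calculation. The function names and the parameter values used in the example are hypothetical and chosen only for illustration; the formulas are the approximate counts stated above, with m × m − 1 interpolations per reference pixel assumed for the embodiment that enhances the whole reference image once.

```python
from fractions import Fraction

def interpolations_per_pixel_full(m):
    """Interpolations per reference pixel when the entire high-definition
    reference image is synthesized once (m*m points, one is the pixel itself)."""
    return m * m - 1

def interpolations_per_pixel_windowed(m, ng, ns):
    """Approximate interpolations per reference pixel when only the range
    needed for each grid point is enhanced, as in FIGS. 8 to 10."""
    return (2 + Fraction(2 * ns, ng)) * (m * m - 1)
```

With m = 2, Ng = 16, and Ns = 8, for instance, the windowed method needs about 9 interpolations per reference pixel against 3 for the full method, a factor of 2 + 2Ns / Ng = 3.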
[0041]
Considering the ease of multiplication and division in the circuit, m is preferably a power of 2. Reducing m reduces the circuit scale. However, it also degrades the accuracy with which coordinates (motion vectors) are approximated during motion estimation, so the ordering of the prediction errors computed by Equation 2 is more easily reversed; the motion estimation result is then disturbed and the prediction characteristics deteriorate. Increasing m has the opposite effects. In view of circuit scale, m is preferably 4 or less; in view of prediction characteristics, m is preferably 2 or more. Balancing the two, 2 and 4 are appropriate values of m. An appropriate value of m may be selected according to the allowable prediction error and circuit scale.
[0042]
Performing motion estimation using a high-definition reference image whose pixel density is m times the original vertically and horizontally amounts to restricting the values of the conversion functions fi (x, y) and gi (x, y) in Equation 3 to integral multiples of 1 / m (that is, the minimum unit of the conversion function is 1 / m of the distance between adjacent pixels). However, this restriction applies to motion estimation and need not be observed when synthesizing the predicted image. On the other hand, in motion compensation based on spatial transformation, some standard for the calculation accuracy of the conversion function during predicted image synthesis must be set in order to prevent a mismatch between the predicted images on the transmitting side and the receiving side (a mismatch that arises when the two sides compute the conversion function with different accuracy and therefore synthesize different predicted images). One way to set this standard is to give the conversion function a minimum unit when synthesizing the predicted image, as in motion estimation. For the reason described above, however, this minimum unit need not be the same 1 / m used in motion estimation. In general, even after motion estimation is complete and the motion parameters are fixed, the prediction error decreases as the calculation accuracy of the conversion function used in synthesizing the predicted image increases. The prediction error can therefore be reduced by making the minimum unit of the motion vector smaller when synthesizing the predicted image than when estimating motion. When synthesizing the predicted image, the luminance value interpolation is performed only once per pixel, so even if this calculation is somewhat complicated, the overall processing amount is not greatly affected, unlike in motion estimation.
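The effect of using a finer minimum unit at synthesis time can be illustrated numerically. The helper `quantize` is hypothetical, and rounding to the nearest multiple (with halves rounded up) is an assumption; the specification only states that the conversion function is restricted to integral multiples of a minimum unit.

```python
from fractions import Fraction
import math

def quantize(value, unit_denominator):
    """Round value to the nearest integral multiple of 1/unit_denominator."""
    scaled = Fraction(value) * unit_denominator
    return Fraction(math.floor(scaled + Fraction(1, 2)), unit_denominator)

# A conversion-function value of 27/16 (1.6875 pixels), chosen for illustration.
x = Fraction(27, 16)
estimation_value = quantize(x, 2)  # minimum unit 1/m with m = 2
synthesis_value = quantize(x, 8)   # minimum unit 1/d with d = 8
```

Here the value used in motion estimation is 3/2, off by 3/16 of a pixel, while the value used in synthesis is 7/4, off by only 1/16 of a pixel. This is the sense in which a minimum unit 1 / d with d ≧ m lets the predicted image follow the conversion function more closely than the estimate did.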
[0043]
Obviously, the following modifications are also included in the present invention.
[0044]
(1) As a function used for luminance value interpolation, bilinear interpolation is taken up in this specification, but other functions may be used. As the function becomes more complex, the effect of reducing the number of interpolation operations increases.
[0045]
(2) As the type of the conversion function, the present specification focuses on the affine transformation, but other conversion functions may be used. The present invention is effective as long as the pixels in the same patch do not need to follow a common motion vector and the vertical and horizontal components of the pixel motion vector can take values other than an integral multiple of the distance between adjacent pixels.
[0046]
(3) The shape of the patch is not particularly limited as long as it identifies a set of pixels, and may not be a triangle particularly described in this specification.
[0047]
(4) Regarding motion compensation based on spatial transformation, this specification has taken up a method in which motion vectors continuously change at patch boundaries. However, a method that allows discontinuity at patch boundaries, such as a method of transmitting motion parameters as they are for each patch, may be used.
[0048]
(5) As the motion estimation algorithm, block matching and hexagonal matching are taken up in this specification, but this may be based on other matching methods. As long as the prediction error is evaluated many times, the present invention is effective.
[0049]
(6) In motion compensation based on spatial transformation, the motion information transmitted as in the example given in this specification may not be a motion vector at a lattice point. The motion information may be any information that specifies a conversion function for each patch, as in the example taken up in item (4) above.
[0050]
(7) In the embodiment, the case of m1 = m2 has been described, but both may be different.
[0051]
(8) In this specification, the method of deforming the patches of the reference image while fixing the patch structure of the current frame has been described, but conversely a method of fixing the patch structure of the reference image and deforming the patches of the current frame may be used.
[0052]
(9) In the present specification, the number of reference images used for synthesizing one predicted image has been described as one, but a method using a plurality of reference images may be used.
[0053]
[Effects of the Invention]
According to the present invention, the number of luminance value interpolation operations can be reduced in the motion estimation process of a motion compensation method in which all pixels belonging to the same patch are not restricted to having a common motion vector and in which the horizontal and vertical components of a pixel's motion vector can take values other than integral multiples of the distance between adjacent pixels.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating an example of a two-stage search process in block matching with half-pixel accuracy.
FIG. 2 is a diagram illustrating an example of motion compensation processing based on spatial transformation.
FIG. 3 is a diagram showing processing of a method called hexagonal matching as an example of motion estimation processing in motion compensation based on spatial transformation.
FIG. 4 is a diagram illustrating an example of an image encoding device that utilizes a high-definition reference image.
FIG. 5 is a diagram illustrating an example of a circuit that increases the definition of a reference image by interpolating luminance values.
FIG. 6 is a diagram illustrating an example of a circuit that obtains a luminance value in a high-definition reference image from a calculation result of a conversion function.
FIG. 7 is a diagram showing a pixel range used in hexagonal matching correction processing.
FIG. 8 is a diagram illustrating a pixel range that is newly required when correction processing is performed following an adjacent grid point in correction processing for hexagonal matching.
FIG. 9 is a diagram showing a method used for motion estimation by increasing the definition of a reference image while gradually capturing necessary portions of an original image and a reference image of a current frame.
FIG. 10 is a diagram illustrating a case where parallel processing is introduced into a method used for motion estimation while capturing necessary portions of an original image and a reference image of a current frame little by little.
[Explanation of symbols]
101, 102, 103, 104 ... examples of matching processing in half-pixel-accuracy block matching, 201 ... reference image, 202 ... original image of current frame, 203 ... patches and grid points of reference image after motion estimation, 204, 209 ... patches, 205 to 207, 210 to 212, 301, 303 to 309, 703 to 711 ... grid points, 208 ... patches and grid points of original image of current frame, 302 ... polygon to be matched in the correction process, 401, 909 ... motion estimation unit, 402, 411 ... original image of current frame, 403, 407, 901, 903 ... frame memory, 404 ... reference image, 405 ... reference image enhancement circuit, 406 ... high-definition reference image, 408 ... approximate luminance value, 409 ... matching processing circuit, 410 ... predicted image synthesis circuit, 412 ... subtractor, 413 ... prediction error, 414 ... prediction error encoder, 415 ... motion information, 416 ... prediction error information, 501 ... line delay circuit, 502, 503 ... pixel delay circuit, 504 to 507 ... luminance values of reference image pixels, 508 to 511 ... multiplier, 512 to 515, 603 ... adder, 516 ... shift register, 517 to 520 ... luminance values of pixels of high-definition reference image, 601, 602 ... fixed-point representation of coordinate value in reference image (calculation result of conversion function), 604 ... circuit for truncating the second and lower fractional digits (binary representation), 605, 606 ... coordinate value in high-definition reference image, 607 ... coordinate-address conversion circuit, 701 ... range used for correction in reference image, 702 ... range used for correction in original image of current frame, 801 ... range added in reference image, 802 ... range added in original image of current frame, 902 ... circuit for rough motion estimation of grid points in hexagonal matching, 904 ... coordinate memory for grid points in reference image, 905, 1003 ... hexagonal matching correction processing unit, 906 ... matching processing circuit, 907 ... high-definition circuit for part of reference image, 908 ... motion vector calculation circuit, 1001 ... data bus for luminance value information, 1002 ... address bus for luminance value information, 1004 ... address bus for coordinate information of grid points, 1005 ... data bus for coordinate information of grid points.

Claims

An image encoding device comprising:
means for executing a motion compensation method in which all pixels belonging to the same patch are not restricted to having a common motion vector, and in which the horizontal and vertical components of a pixel's motion vector can take values other than integral multiples of the distance between adjacent pixels;
means for synthesizing a high-definition reference image, in which the pixel density is m1 times and m2 times that of the reference image in the horizontal and vertical directions respectively (m1 and m2 are positive integers), by interpolating luminance values at points where no pixel exists in the reference image; and
a memory for storing the high-definition reference image,
wherein, when performing motion estimation, the means for executing the motion compensation method determines the position of the point in the reference image corresponding to the position of a pixel of the predicted image by using a conversion function representing the relationship between the reference image and the predicted image and, if no pixel of the reference image exists at the position of the determined point, approximates the motion vector by approximating the luminance value of the determined point with the luminance value of the pixel located closest to this point in the high-definition reference image, and wherein, as a result of this approximation of the motion vector, the position of a point in the reference image that corresponds to a pixel position of the predicted image and at which no pixel of the reference image exists may not coincide with the position of the pixel located closest to this point in the high-definition reference image.

An image encoding apparatus comprising:
means for executing a motion compensation method in which all pixels belonging to the same patch are not restricted to having a common motion vector, and in which the horizontal and vertical components of a pixel's motion vector can take values other than integral multiples of the distance between adjacent pixels;
means for synthesizing a part of a high-definition reference image, in which the pixel density is m1 times and m2 times that of the reference image in the horizontal and vertical directions respectively (m1 and m2 are positive integers), by interpolating, for a part of the reference image, luminance values at points where no pixel exists; and
a memory for storing the part of the high-definition reference image,
wherein, when performing motion estimation, the means for executing the motion compensation method determines the position of the point in the part of the reference image corresponding to the position of a pixel of the predicted image by using a conversion function representing the relationship between the reference image and the predicted image and, if no pixel of the reference image exists at the position of the determined point, approximates the motion vector by approximating the luminance value of the determined point with the luminance value of the pixel located closest to this point in the part of the high-definition reference image, and wherein, as a result of this approximation of the motion vector, the position of a point in the part of the reference image that corresponds to a pixel position of the predicted image and at which no pixel of the reference image exists may not coincide with the position of the pixel located closest to this point in the part of the high-definition reference image.

The image encoding apparatus according to claim 1, further comprising means for limiting the horizontal and vertical components of the motion vector of each pixel, when the predicted image is synthesized, to integral multiples of 1 / d1 and 1 / d2 of the distance between adjacent pixels, respectively (d1 and d2 are positive integers), wherein m1 ≦ d1 and m2 ≦ d2.

The image encoding apparatus according to claim 1, wherein the values of m1 and m2 are each an integer of 2 or more.

The image encoding apparatus according to claim 1, wherein the values of m1 and m2 are 2 to the power of w1 and 2 to the power of w2, respectively (w1 and w2 are positive integers).

The image encoding apparatus according to claim 1, wherein the values of m1 and m2 are both 2.

The image encoding apparatus according to claim 1, wherein the values of m1 and m2 are both 4.