JP3781194B2 - Motion vector field coding

Info

Publication number: JP3781194B2
Application number: JP51630897A
Authority: JP
Inventors: Nieweglowski, Jacek; Karczewicz, Marta
Original assignee: Nokia Oyj
Current assignee: Nokia Oyj
Priority date: 1995-10-20
Filing date: 1995-10-20
Publication date: 2006-05-31
Anticipated expiration: 2015-10-20
Also published as: EP0856228B1; KR19990064293A; AU3701495A; WO1997016025A1; JP2000512440A; HK1017553A1; DE69511119T2; DE69511119D1; KR100381061B1; US6163575A; EP0856228A1

Description

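Field of the Invention
The present invention relates generally to video compression. More precisely, it relates to a method of encoding an estimated motion field to produce the motion information of a video sequence.
Background of the Invention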
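Motion-compensated prediction is a key element of the majority of video coding schemes. FIG. 1 is a schematic diagram of an encoder that compresses video sequences using motion compensation. Its essential elements are a motion-compensated prediction block 1, a motion estimator 2 and a motion field encoder 3. A motion-compensating video encoder operates by compressing the prediction error $E_n(x,y)$, the difference between the incoming frame being coded, $I_n(x,y)$, called the current frame, and a prediction frame $\hat{I}_n(x,y)$:

$$E_n(x,y) = I_n(x,y) - \hat{I}_n(x,y) \qquad (1)$$

The prediction frame $\hat{I}_n(x,y)$ is constructed by the motion-compensated prediction block 1 from the pixel values of the previous, or some other already coded, frame, denoted $\tilde{I}_{\mathrm{ref}}(x,y)$ and called the reference frame, and from the motion vectors of the pixels between the current frame and the reference frame. The motion vectors are calculated by the motion field estimator 2, and the resulting vector field is coded in some way before being applied to the prediction block 1. The prediction frame is then:

$$\hat{I}_n(x,y) = \tilde{I}_{\mathrm{ref}}\big[x+\Delta x(x,y),\ y+\Delta y(x,y)\big] \qquad (2)$$

The pair of numbers $[x+\Delta x(x,y),\ y+\Delta y(x,y)]$ is called the motion vector of the pixel at location $(x,y)$ in the current frame, and $\Delta x(x,y)$ and $\Delta y(x,y)$ are the horizontal and vertical displacements of this pixel. The set of motion vectors of all pixels in the current frame $I_n(x,y)$ is called the motion vector field. The coded motion vector field is also transmitted to the decoder as motion information.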
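The relationship between the reference frame, the motion vector field and the prediction error can be sketched as follows. This is a minimal illustration, not the encoder of FIG. 1: it assumes integer displacements, frames stored as nested lists, and clamping at the frame border (the text does not say how out-of-frame references are handled). The names `predict_frame` and `prediction_error` are hypothetical.

```python
def predict_frame(ref, dx, dy):
    """Prediction frame: pred[y][x] = ref[y + dy[y][x]][x + dx[y][x]]."""
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Assumption: references outside the frame are clamped to the border.
            sx = min(max(x + dx[y][x], 0), w - 1)
            sy = min(max(y + dy[y][x], 0), h - 1)
            pred[y][x] = ref[sy][sx]
    return pred


def prediction_error(cur, pred):
    """Prediction error E_n = I_n - predicted I_n, pixel by pixel."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]


# Toy example: the current frame is the reference shifted one pixel to the right,
# so every pixel points one pixel to the left in the reference (dx = -1, dy = 0).
ref = [[10, 20, 30], [40, 50, 60]]
cur = [[10, 10, 20], [40, 40, 50]]
dx = [[-1] * 3 for _ in range(2)]
dy = [[0] * 3 for _ in range(2)]
pred = predict_frame(ref, dx, dy)
err = prediction_error(cur, pred)
```

With a motion field that matches the true motion, the prediction error is zero everywhere and only the motion information needs to be transmitted.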
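In the decoder of FIG. 2, each pixel of the current frame $I_n(x,y)$ is reconstructed by finding its prediction $\hat{I}_n(x,y)$ in the reference frame $\tilde{I}_{\mathrm{ref}}(x,y)$. The motion-compensated prediction block 21 generates the prediction frame from the received motion information and the reference frame $\tilde{I}_{\mathrm{ref}}(x,y)$ (in this picture the reference frame is the same as the current frame). In the prediction error decoder 22, the decoded prediction error $E_n(x,y)$ is added to the prediction frame, yielding the original current frame $I_n$.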
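The general object of motion-compensated (MC) prediction is to minimise the amount of information that must be sent to the decoder. It should minimise the amount of prediction error, measured for example as the energy of $E_n(x,y)$, and also the amount of information needed to represent the motion vector field.
H. Nguyen and E. Dubois, "Representation of motion information for image coding," Proc. Picture Coding Symposium '90, Cambridge, Massachusetts, March 26-28, 1990, pp. 841-845, gives a review of motion field coding techniques. As a rule of thumb, reducing the prediction error requires a more sophisticated motion field, that is, more bits must be spent on coding it. The overall goal of video coding is therefore to code the motion vector field as compactly as possible while keeping the measure of prediction error as low as possible.
The motion field estimation block 2 (FIG. 1) calculates, for all the pixels of a given segment, motion vectors that minimise some measure of prediction error in that segment, for example the square prediction error. Motion field estimation techniques differ both in the model of the motion field and in the algorithm used to minimise the chosen measure of prediction error.
Because a frame contains a very large number of pixels, it is inefficient to transmit a separate motion vector for each pixel. In most video coding schemes the current frame is instead divided into larger image segments so that all the motion vectors of a segment can be described by a few parameters. Image segments may be square blocks, for example the 16 × 16 pixel blocks used in codecs conforming to the international standards ISO/IEC MPEG-1 or ITU-T H.261, or they may be arbitrarily shaped regions obtained, for instance, by a segmentation algorithm. In practice a segment contains at least a few tens of pixels.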
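To represent the motion vectors of the pixels in a segment compactly, it is desirable that their values be described by a function of a few parameters. Such a function is called a motion vector field model. A known group of models is the linear motion models, in which the motion vectors are linear combinations of motion field basis functions. In such models the motion vectors of an image segment are described by the general formula:

$$\Delta x(x,y) = \sum_{i=1}^{N} c_i f_i(x,y), \qquad \Delta y(x,y) = \sum_{i=N+1}^{N+M} c_i f_i(x,y) \qquad (3)$$

where the parameters $c_i$ are called motion coefficients and are transmitted to the decoder. The functions $f_i(x,y)$ are called motion field basis functions; they are fixed and known to both the encoder and the decoder.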
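A linear motion model of this kind can be evaluated as in the sketch below. The affine basis (1, x, y) and the coefficient values are illustrative assumptions only, and `motion_vector` is a hypothetical helper name; the text fixes neither the basis nor N and M at this point.

```python
def motion_vector(c, basis_x, basis_y, x, y):
    """Evaluate (dx, dy) at pixel (x, y) for a linear motion model.

    c holds the N coefficients of the horizontal component followed by
    the M coefficients of the vertical component.
    """
    n = len(basis_x)
    dx = sum(ci * f(x, y) for ci, f in zip(c[:n], basis_x))
    dy = sum(ci * f(x, y) for ci, f in zip(c[n:], basis_y))
    return dx, dy


# Assumed example basis: affine motion, N = M = 3.
affine = [lambda x, y: 1.0, lambda x, y: float(x), lambda x, y: float(y)]
coeffs = [1.0, 0.5, 0.0, -2.0, 0.0, 0.25]
vec = motion_vector(coeffs, affine, affine, 4, 8)
```

Only `coeffs` would need to be transmitted for the segment, because the basis functions are known to both encoder and decoder.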
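The problem when using a linear motion model of the above form is how to minimise the number of motion coefficients $c_i$ sent to the decoder while keeping the measure of the prediction error $E_n(x,y)$ as low as possible. This is done in the encoder by the motion field encoding block 3 (see FIG. 1), after the computationally very complex motion field estimation performed by block 2. It is therefore crucial that motion field encoding be computationally simple, so that it imposes no additional burden on the encoder.
The total number of motion coefficients that must be sent to the decoder depends both on the number of segments in the image and on the number of motion coefficients per segment. There are therefore at least two ways to reduce the total number of motion coefficients.
The first is to reduce the number of segments by combining (merging) segments that can be predicted with a common motion vector field without a large increase in prediction error. The number of segments in a frame can be reduced because adjacent segments can very often be predicted well with the same set of motion coefficients. The process of combining such segments is called motion assisted merging.
The second is to select for each segment a motion model that achieves a satisfactorily low prediction error with as few coefficients as possible. Since the amount and complexity of motion vary between frames and between segments, it is inefficient always to use all $N+M$ motion coefficients per segment. The minimum number of motion coefficients that yields a satisfactorily low prediction error must be found for every segment. This adaptive selection of coefficients is called motion coefficient removal.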
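FIG. 3 shows a frame divided into segments. Prior art techniques for motion coefficient coding include several techniques for motion assisted merging. After the motion vectors of all the segments have been estimated, motion assisted merging is performed by considering every pair of adjacent segments $S_i$ and $S_j$ together with their motion coefficients $c_i$ and $c_j$. The area of the combined segments $S_i$ and $S_j$ is denoted $S_{ij}$. If the area $S_{ij}$ can be predicted with a single set of motion coefficients $c_{ij}$ without an excessive increase in prediction error over the error resulting from separate predictions of $S_i$ and $S_j$, then $S_i$ and $S_j$ are merged. Methods for motion assisted merging differ essentially in how they find a single set of motion coefficients $c_{ij}$ that predicts the combined segments well.
One method is known as merging by exhaustive motion estimation. It estimates "from scratch" a new set of motion coefficients $c_{ij}$ for every pair of adjacent segments $S_i$ and $S_j$. If the prediction error for $S_{ij}$ does not increase excessively, the segments $S_i$ and $S_j$ are merged. Although this method can select very well the segments that can be merged, it is not feasible in practice because it would typically increase the complexity of the encoder by several orders of magnitude.
Another method is known as merging by motion field extension. It tests whether the area $S_{ij}$ can be predicted using either set of motion parameters $c_i$ or $c_j$ without an excessive increase in prediction error. Its computational complexity is very low because it requires no new motion estimation. However, it very often fails to merge segments, because motion compensation with the coefficients calculated for one segment very rarely predicts the adjacent segment well too.
Yet another method is known as merging by motion field fitting. Here the motion coefficients $c_{ij}$ are calculated by approximation, by evaluating a few motion vectors in each of the segments. Some motion vectors of segments $S_i$ and $S_j$ are depicted in FIG. 4. The motion field for the segment $S_{ij}$ is obtained by fitting a common motion vector field through these vectors using some known fitting method. The disadvantage is that the motion field obtained by fitting is not precise enough and often leads to an unacceptable increase in prediction error.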
Methods that estimate motion with several models and select the most suitable one are proposed in the following two documents:
H. Nicolas and C. Labit, "Region-based motion estimation using deterministic relaxation schemes for image sequence coding," Proc. 1994 International Conference on Acoustics, Speech and Signal Processing, pp. III265-268;
P. Cicconi and H. Nicolas, "Efficient region-based motion estimation and symmetry oriented segmentation for image sequence coding," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 4, No. 3, June 1994, pp. 357-364.
These methods try to adapt the motion model to the complexity of the motion by performing motion estimation with different models and selecting the most suitable one. Their main disadvantages are high computational complexity and the small number of different motion field models that can be tested in practice.
None of the methods described above solves by itself the problem of minimising the number of motion coefficients $c_i$ sent to the decoder while keeping the measure of the prediction error $E_n(x,y)$ as low as possible.
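Summary of the Invention
An object of the present invention is to provide a motion field encoder that substantially reduces the amount of motion field vector information produced by known motion estimation methods without a large increase in prediction error. The complexity of the motion field encoder must be low so that it can be implemented in practice on available signal processors or general-purpose microprocessors.
According to the invention, the motion field encoder comprises three main blocks.
The first main block is called the QR motion analyzer. Its task is to find a new representation of the input motion field produced by the motion field estimator. This new representation is passed to the second main block. The operations in the first main block comprise several steps of matrix operations. In the first step the prediction frame is linearised using some known approximation method so that it becomes linear with respect to the motion vectors. In the second step a matrix $E_i$ and a vector $y_i$ are constructed for minimisation of the square prediction error. In the third step the well-known QR factorisation algorithm is used to decompose $E_i$ into the product of two matrices $Q_i$ and $R_i$, and an auxiliary vector $z_i$ is calculated from the factor matrix $Q_i$ and the vector $y_i$. Part of the matrix $R_i$ and the auxiliary vector $z_i$ are passed to the second main block.
The second main block, called the segment merging block, merges pairs of adjacent segments after determining whether their combined area can be predicted using a common motion field. In terms of matrix operations, a matrix equation is first formed, and the factor matrices are then processed using known matrix computation methods. The result is a matrix equation in which one matrix contains terms from which the square prediction error in the area of the merged segments is easily calculated. If the change in square prediction error is acceptable according to a chosen criterion, the segments are merged.
After all pairs of segments have been considered, the output of the segment merging block is:
i. a new segmentation of the image with a reduced number of segments,
ii. for each new segment, the matrix $R^1_{ij}$ and the vector $z^1_{ij}$,
iii. merging information, which is sent to the decoder and helps it identify the segments that were merged.
The third main block is called the coefficient removal block. It receives as input the new segmentation of the current frame and, for every segment, the matrices $R^1_k$, $z^1_k$ and $c_k$ produced by the segment merging block. The motion vectors of every segment are represented by a number of motion coefficients. For each segment the block determines whether the motion field model can be simplified without an excessive increase in prediction error. Some basis functions are removed from the motion model, so that fewer coefficients are needed to describe the simplified model.
The operations in this third main block are matrix operations: the matrix equation is first modified by removing one column and row of the factor matrices and is then triangularised. Removing one column and row corresponds to removing one basis function from the motion model. The change in the square prediction error for the segment caused by removing one basis function equals the square of one term in the equation.
If the change in prediction error is acceptable according to a chosen criterion, one coefficient is removed from the coefficient set. Further coefficients can be removed for the segment by repeating these matrix operations. After a sufficient number of coefficients has been removed, the final motion coefficients for the segment are calculated by solving the resulting system of linear equations, using one of the well-known algorithms such as back-substitution.
For every processed segment the third main block outputs selection information indicating which basis functions were removed from the motion field model, together with new motion coefficients corresponding to the remaining basis functions. Both the selection information and the motion coefficients are sent to the decoder.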
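Brief Description of the Drawings
FIG. 1 is a schematic diagram of a known encoder.
FIG. 2 is a schematic diagram of a known decoder.
FIG. 3 shows adjacent segments for merging.
FIG. 4 illustrates merging by motion field approximation.
FIG. 5 shows the motion field encoder according to the invention.
FIG. 6 is a schematic diagram of the QR motion analyzer.
Description of the Illustrated Embodiments
FIG. 5 shows the motion field encoder according to the invention. It corresponds to block 3 in FIG. 1, but it also has the reference frame and the current frame as inputs. The third input to this block is the motion vector field $[\Delta x(\cdot), \Delta y(\cdot)]$ produced by the motion field estimation block 2 of FIG. 1.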
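Assume that the output of the video encoder is the compressed frame divided into segments, each accompanied by motion coefficients. Then, for a segment $S_i$ consisting of $P$ pixels with coordinates $(x_i, y_i)$, $i = 1, 2, \ldots, P$, the task of the motion field encoder is to find the motion coefficients $c = (c_1, c_2, \ldots, c_{N+M})$ of a compressed motion vector field $[\tilde{\Delta}x(\cdot),\ \tilde{\Delta}y(\cdot)]$ whose motion vectors are described by a linear motion model of the form:

$$\tilde{\Delta}x(x,y) = \sum_{i=1}^{N} c_i f_i(x,y) \qquad (4a)$$

$$\tilde{\Delta}y(x,y) = \sum_{i=N+1}^{N+M} c_i f_i(x,y) \qquad (4b)$$

such that it minimises the square prediction error:

$$\sum_{i=1}^{P} \Big( I_n(x_i,y_i) - \tilde{I}_{\mathrm{ref}}\big(x_i + \tilde{\Delta}x(x_i,y_i),\ y_i + \tilde{\Delta}y(x_i,y_i)\big) \Big)^2 \qquad (5)$$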
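To fulfil this task the motion field encoder consists of three main building blocks: the QR motion analysis block, the segment merging block and the motion coefficient removal block. The segment merging block and the motion coefficient removal block reduce the amount of motion information, at the cost of an increase in the square prediction error.
The objective of the QR motion analyzer is to find a new representation of the motion field. This representation is used later in the other two blocks for fast and flexible determination of motion coefficients for merged segments and for coefficient removal. The operation of the QR motion analyzer consists of the following steps.
Step 1 is linearisation of the error. In this step the prediction frame $\tilde{I}_{\mathrm{ref}}(\cdot)$ in equation (5) is approximated, using some known approximation method, so that it becomes linear with respect to $[\tilde{\Delta}x(\cdot),\ \tilde{\Delta}y(\cdot)]$. Each element of the sum in formula (5) then becomes a linear combination of the coefficients $c_i$:

$$\sum_{j=1}^{P} \big( e_{j,1} c_1 + e_{j,2} c_2 + \cdots + e_{j,N+M}\, c_{N+M} - y_j \big)^2 \qquad (6)$$

Step 2 is construction of matrices. It relies on the fact that minimising formula (6) is fully equivalent to minimising the matrix expression

$$(E_i c_i - y_i)^T (E_i c_i - y_i)$$

where $E_i$ and $y_i$ are:

$$E_i = \begin{bmatrix} e_{1,1} & e_{1,2} & \cdots & e_{1,N+M} \\ e_{2,1} & e_{2,2} & \cdots & e_{2,N+M} \\ \vdots & \vdots & & \vdots \\ e_{P,1} & e_{P,2} & \cdots & e_{P,N+M} \end{bmatrix}, \qquad y_i = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_P \end{bmatrix}$$

Step 3 is QR factorisation. The well-known QR factorisation algorithm is described in G. H. Golub and C. Van Loan, "Matrix Computations," 2nd edition, The Johns Hopkins University Press, 1989. This algorithm is used to decompose $E_i$ into a product of two matrices:

$$E_i = Q_i R_i \qquad (8)$$

In this step an auxiliary vector $z_i$ is also calculated:

$$z_i = Q_i^T y_i \qquad (9)$$

In step 4 the output of the QR motion analysis block is calculated. It consists of the matrix $R^1_i$, formed from the first $N+M$ rows of $R_i$, and the vector $z^1_i$, formed from the first $N+M$ elements of $z_i$.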
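Steps 2 to 4 can be sketched in a few lines of standard-library Python. The sketch assumes that $E_i$ and $y_i$ are already available from the linearisation step, and instead of forming $Q_i$ explicitly it applies Givens rotations to the augmented system $[E_i \mid y_i]$, which gives the same $R^1_i$ and $z^1_i = $ first rows of $Q_i^T y_i$. The function names are hypothetical, and the choice of Givens rotations for this block is an assumption.

```python
import math


def triangularise(rows):
    """Rotate an augmented system [A | b] (list of rows) to upper-triangular form, in place."""
    m, n = len(rows), len(rows[0]) - 1
    for col in range(min(m, n)):
        for r in range(col + 1, m):
            a, b = rows[col][col], rows[r][col]
            if b == 0.0:
                continue
            h = math.hypot(a, b)
            cs, sn = a / h, b / h
            # Givens rotation of rows `col` and `r`; it zeroes rows[r][col].
            for k in range(col, n + 1):
                p, q = rows[col][k], rows[r][k]
                rows[col][k] = cs * p + sn * q
                rows[r][k] = cs * q - sn * p
    return rows


def qr_analyse(E, y):
    """Return (R1, z1): the first N+M rows of R and of Q^T y."""
    rows = [[float(v) for v in e] + [float(t)] for e, t in zip(E, y)]
    triangularise(rows)
    n = len(E[0])
    return [row[:n] for row in rows[:n]], [row[n] for row in rows[:n]]


def back_substitute(R, z):
    """Solve the upper-triangular system R c = z."""
    n = len(z)
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (z[i] - sum(R[i][k] * c[k] for k in range(i + 1, n))) / R[i][i]
    return c


# Toy data with two coefficients; y = E @ [1, 2] exactly.
E = [[1, 0], [1, 1], [1, 2]]
y = [1, 3, 5]
R1, z1 = qr_analyse(E, y)
c = back_substitute(R1, z1)
```

Whatever the number of pixels P in the segment, the stored representation is only an $(N+M) \times (N+M)$ triangular matrix and a vector of $N+M$ elements.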
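In the segment merging block, merging is performed for pairs of adjacent segments $S_i$ and $S_j$ (see FIG. 4) by determining whether their combined area $S_{ij}$ can be predicted using a common motion field described by motion coefficients $c_{ij}$. The merging operation consists of the following steps.
Step 1 comprises matrix calculation. The invention utilises the previously unknown property that the motion coefficients $c_{ij}$ can be found by solving the system of linear equations:

$$\begin{bmatrix} R^1_i \\ R^1_j \end{bmatrix} c_{ij} = \begin{bmatrix} z^1_i \\ z^1_j \end{bmatrix} \qquad (10)$$

where $R^1_i$, $z^1_i$ and $R^1_j$, $z^1_j$ have already been produced by the QR analysis block for segments $S_i$ and $S_j$ respectively.
Step 2 is triangularisation of the system obtained in step 1. The matrices $R^1_i$ and $R^1_j$ are upper triangular, so according to the teaching of the aforementioned document the system (10) has the form:

$$\begin{bmatrix} \times & \times & \cdots & \times \\ 0 & \times & \cdots & \times \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \times \\ \times & \times & \cdots & \times \\ 0 & \times & \cdots & \times \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \times \end{bmatrix} c_{ij} = \begin{bmatrix} z^1_{i,1} \\ z^1_{i,2} \\ \vdots \\ z^1_{i,N+M} \\ z^1_{j,1} \\ z^1_{j,2} \\ \vdots \\ z^1_{j,N+M} \end{bmatrix} \qquad (11)$$

where the symbol × denotes a non-zero element. According to the teaching of the aforementioned document, the system is triangularised by a series of multiplications of rows by scalars followed by additions of the rows, i.e. it is converted to the form:

$$\begin{bmatrix} \times & \times & \cdots & \times \\ 0 & \times & \cdots & \times \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \times \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} c_{ij} = \begin{bmatrix} z^1_1 \\ z^1_2 \\ \vdots \\ z^1_{N+M} \\ q_1 \\ \vdots \\ q_{N+M} \end{bmatrix} \qquad (12)$$

In step 3 the merging error is evaluated. The change $\Delta E_{ij}$ in the square prediction error in the area $S_{ij}$ caused by merging segments $S_i$ and $S_j$ is calculated according to the teaching of the aforementioned document as:

$$\Delta E_{ij} = \sum_{k=1}^{N+M} q_k^2 \qquad (13)$$

Finally, in step 4, the segments are merged if the change in square prediction error in formula (13) is acceptable according to a chosen criterion. For the resulting new segment $S_{ij}$ the matrix $R^1_{ij}$ and the vector $z^1_{ij}$ are built by taking the first $N+M$ rows of the system (12), i.e. they are given by:

$$R^1_{ij} = \begin{bmatrix} \times & \times & \cdots & \times \\ 0 & \times & \cdots & \times \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \times \end{bmatrix}, \qquad z^1_{ij} = \begin{bmatrix} z^1_1 \\ z^1_2 \\ \vdots \\ z^1_{N+M} \end{bmatrix} \qquad (14)$$

After all pairs of segments in the frame have been considered, the output of the segment merging block is obtained. It consists of three kinds of information. First, it gives a new segmentation of the image with a reduced number of segments. Second, for each new segment the block outputs the matrix $R^1_{ij}$ and the vector $z^1_{ij}$. Third, it gives merging information, which is sent to the decoder and helps it identify the segments that were merged.
The motion coefficients $c_{ij} = (c_1, c_2, \ldots, c_{N+M})$ for the segment $S_{ij}$ could now be calculated by solving the system of equations $R^1_{ij} c_{ij} = z^1_{ij}$, but this calculation is unnecessary if the next block, the coefficient removal block, is used.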
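The merging steps can be illustrated with a small sketch: stack the two triangular systems as in (10), triangularise, and read $\Delta E_{ij}$ from the lower half of the right-hand side as in (13). The helper names are hypothetical and the toy numbers are chosen only so the result can be checked by hand.

```python
import math


def triangularise(rows):
    """Rotate an augmented system [A | b] to upper-triangular form with Givens rotations."""
    m, n = len(rows), len(rows[0]) - 1
    for col in range(min(m, n)):
        for r in range(col + 1, m):
            a, b = rows[col][col], rows[r][col]
            if b == 0.0:
                continue
            h = math.hypot(a, b)
            cs, sn = a / h, b / h
            for k in range(col, n + 1):
                p, q = rows[col][k], rows[r][k]
                rows[col][k] = cs * p + sn * q
                rows[r][k] = cs * q - sn * p
    return rows


def merge_cost(R_i, z_i, R_j, z_j):
    """Return (delta_E, R_ij, z_ij) for merging two segments."""
    n = len(z_i)
    # System (10): the two triangular systems stacked on top of each other.
    rows = [list(r) + [z] for r, z in zip(R_i + R_j, z_i + z_j)]
    triangularise(rows)                              # form (12)
    delta_e = sum(row[n] ** 2 for row in rows[n:])   # formula (13)
    R_ij = [row[:n] for row in rows[:n]]             # formula (14)
    z_ij = [row[n] for row in rows[:n]]
    return delta_e, R_ij, z_ij


# One-coefficient toy model: two pixels per segment, constant motion.
# Segment i is fitted exactly by c = 2, segment j by c = 4.
s = math.sqrt(2.0)
delta_e, R_ij, z_ij = merge_cost([[s]], [2 * s], [[s]], [4 * s])
c_ij = z_ij[0] / R_ij[0][0]
```

Here the common coefficient is 3 and each of the four pixels contributes a squared error of 1, so the merge costs $\Delta E_{ij} = 4$ without any new motion estimation.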
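The operation of the coefficient removal block is now considered. The block receives as input the new division of the current frame into segments and, for every segment $S_k$, the matrices $R^1_k$, $z^1_k$ and $c_k$ produced previously by the segment merging block. The motion vectors of every segment are represented by $N+M$ motion coefficients.
For a given segment $S_k$ the motion coefficient removal block determines whether the motion field model can be simplified without an excessive increase in prediction error. A simplified motion field model is obtained when some basis functions are removed from the model of equation (3) described in the background section of this specification. Fewer coefficients are needed to describe such a simplified model.
The following procedure is performed for each segment to find out whether the $i$th basis function (and the $i$th coefficient) can be removed from the motion field model.
Step 1 includes matrix modification, in which the system of linear equations:

$$R^1_k c_k = z^1_k \qquad (15)$$

is modified by removing the $i$th column from $R^1_k$ and the $i$th element from $c_k$.
Step 2 includes matrix triangularisation, in which the system (15) is triangularised in the known manner by a series of multiplications of rows by scalars followed by additions of the rows, i.e. it is converted to the form:

$$\begin{bmatrix} \times & \times & \cdots & \times \\ 0 & \times & \cdots & \times \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \times \\ 0 & 0 & \cdots & 0 \end{bmatrix} c_k = \begin{bmatrix} z^1_1 \\ z^1_2 \\ \vdots \\ z^1_{N+M-1} \\ q_i \end{bmatrix} \qquad (16)$$

Step 3 includes error evaluation. The change in the square prediction error for the segment caused by removal of the $i$th coefficient is simply equal to the term $q_i^2$ in equation (16).
Step 4 includes removal of coefficients. If the change in prediction error is acceptable according to a chosen criterion, the coefficient $c_i$ is removed from the coefficient set. The new number of coefficients is then $N+M-1$. The matrix $R^1_k$ and the vector $z^1_k$ are modified to the form:

$$R^1_k = \begin{bmatrix} \times & \times & \cdots & \times \\ 0 & \times & \cdots & \times \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \times \end{bmatrix}, \qquad z^1_k = \begin{bmatrix} z^1_1 \\ z^1_2 \\ \vdots \\ z^1_{N+M-1} \end{bmatrix} \qquad (17)$$

The number of coefficients for the segment can be reduced further by using the matrices (17) in equation (15) and repeating steps 1-4.
Step 5 includes coefficient calculation. This step starts after a sufficient number of coefficients has been removed. The final motion coefficients for segment $S_k$ are calculated by solving the system of linear equations:

$$R^1_k c_k = z^1_k \qquad (18)$$

where the matrix $R^1_k$ and the vector $z^1_k$ are the result of the previous steps 1-4. The equation can be solved using one of the well-known algorithms, for example back-substitution.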
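Steps 1 to 4 of the removal procedure can be sketched as follows: drop column i from the triangular system (15), re-triangularise to form (16), and read the cost $q_i^2$ from the last right-hand-side entry. The function names and the toy numbers are assumptions made for illustration.

```python
import math


def triangularise(rows):
    """Rotate an augmented system [A | b] to upper-triangular form with Givens rotations."""
    m, n = len(rows), len(rows[0]) - 1
    for col in range(min(m, n)):
        for r in range(col + 1, m):
            a, b = rows[col][col], rows[r][col]
            if b == 0.0:
                continue
            h = math.hypot(a, b)
            cs, sn = a / h, b / h
            for k in range(col, n + 1):
                p, q = rows[col][k], rows[r][k]
                rows[col][k] = cs * p + sn * q
                rows[r][k] = cs * q - sn * p
    return rows


def remove_coefficient(R, z, i):
    """Return (q_i squared, reduced R, reduced z) after removing basis function i."""
    n = len(z)
    # Step 1: delete column i of R (and, implicitly, element i of c).
    rows = [row[:i] + row[i + 1:] + [zk] for row, zk in zip(R, z)]
    triangularise(rows)                       # step 2, form (16)
    q_sq = rows[n - 1][n - 1] ** 2            # step 3: last right-hand-side entry
    R_new = [row[:n - 1] for row in rows[:n - 1]]   # step 4, form (17)
    z_new = [row[n - 1] for row in rows[:n - 1]]
    return q_sq, R_new, z_new


# Toy system with two coefficients.
R = [[1.0, 1.0], [0.0, 1.0]]
z = [3.0, 1.0]
cost_second, R_a, z_a = remove_coefficient(R, z, 1)
cost_first, R_b, z_b = remove_coefficient(R, z, 0)
```

Comparing `cost_second` and `cost_first` shows which basis function is cheaper to drop; in this toy case removing the second coefficient increases the square error by 1 and removing the first by 2.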
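For every processed segment the motion coefficient removal block outputs selection information that tells the decoder which basis functions were removed from the motion field model. It also outputs new motion coefficients corresponding to the remaining basis functions. Both the selection information and the motion coefficients are sent to the decoder.
As a result of all the steps in all these blocks, the motion field encoder according to the invention produces merging information telling the decoder which segments were merged, selection information telling the decoder which basis functions were removed, and motion coefficient information.
Compared with the prior art, the main advantage of the present invention is its ability to reduce the amount of motion information substantially without a large increase in prediction error. In addition, the complexity of the overall system is low, which allows practical implementation on available signal processors or general-purpose microprocessors.
The segment merging block has the unique ability to find the motion vectors of combined segments from the given motion vectors estimated for the separate segments. It can be shown that the motion vectors it produces are in fact optimal in terms of keeping the square error for the combined segment minimal. This is why the block can dramatically reduce the number of segments with only a very modest increase in square prediction error.
The motion coefficient removal block is a very powerful tool for instantaneous adaptation of the motion model to the actual amount and type of motion in the video scene. It can easily test the result of prediction (the value of the square prediction error for a segment) with a very large number of models, for example with all possible combinations of motion field basis functions. No other known method has this degree of flexibility. A strong advantage of this scheme is that it does not need to repeat the motion estimation process and is therefore computationally simple.
By performing QR motion analysis after motion estimation, the motion field encoder can find new motion coefficients for any desired combination of image segments, or for any desired model of the motion field in a segment, by solving very simple systems of linear equations.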
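Description of the Preferred Embodiment
The preferred embodiment uses the following quadratic polynomial motion vector field model with 12 coefficients:

$$\Delta x(x,y) = c_1 + c_2 x + c_3 y + c_4 xy + c_5 x^2 + c_6 y^2$$

$$\Delta y(x,y) = c_7 + c_8 x + c_9 y + c_{10} xy + c_{11} x^2 + c_{12} y^2$$

In practice this model handles even very complex motion in video sequences well and yields good prediction results.
In the QR motion analysis block, the linearisation in step 1 uses the Taylor expansion of $\tilde{I}_{\mathrm{ref}}(\cdot)$ at every pixel $(x_i, y_i)$, $i = 1, 2, \ldots, P$, around the points:

$$x'_i = x_i + \Delta x(x_i, y_i)$$

$$y'_i = y_i + \Delta y(x_i, y_i)$$

Using the property that $\sum a^2 = \sum (-a)^2$, the prediction error becomes:

$$\sum_{i=1}^{P} \Big( \tilde{I}_{\mathrm{ref}}(x'_i,y'_i) + \big(\tilde{\Delta}x(x_i,y_i) - \Delta x(x_i,y_i)\big)\, G_x(x'_i,y'_i) + \big(\tilde{\Delta}y(x_i,y_i) - \Delta y(x_i,y_i)\big)\, G_y(x'_i,y'_i) - I_n(x_i,y_i) \Big)^2$$

The auxiliary values $g_j(x,y)$ are calculated using the formula:

$$g_j(x_i,y_i) = \begin{cases} f_j(x_i,y_i)\, G_x(x'_i,y'_i), & j = 1, 2, \ldots, N \\ f_j(x_i,y_i)\, G_y(x'_i,y'_i), & j = N+1, N+2, \ldots, N+M \end{cases}$$

where the functions $f_j(x_i,y_i)$ are the basis functions defined in equations (4a) and (4b).
The matrix $E$ and the vector $y$ in equation (9) are built using the formulas:

$$E = \begin{bmatrix} g_1(x_1,y_1) & g_2(x_1,y_1) & \cdots & g_{N+M}(x_1,y_1) \\ g_1(x_2,y_2) & g_2(x_2,y_2) & \cdots & g_{N+M}(x_2,y_2) \\ \vdots & \vdots & & \vdots \\ g_1(x_P,y_P) & g_2(x_P,y_P) & \cdots & g_{N+M}(x_P,y_P) \end{bmatrix}$$

$$y = \begin{bmatrix} I_n(x_1,y_1) - \tilde{I}_{\mathrm{ref}}(x'_1,y'_1) + G_x(x'_1,y'_1)\,\Delta x(x_1,y_1) + G_y(x'_1,y'_1)\,\Delta y(x_1,y_1) \\ \vdots \\ I_n(x_P,y_P) - \tilde{I}_{\mathrm{ref}}(x'_P,y'_P) + G_x(x'_P,y'_P)\,\Delta x(x_P,y_P) + G_y(x'_P,y'_P)\,\Delta y(x_P,y_P) \end{bmatrix}$$

$G_x(x,y)$ and $G_y(x,y)$ are the values of the horizontal and vertical gradients of the reference frame $\tilde{I}_{\mathrm{ref}}(x,y)$. [The formula used to calculate them is missing from this text.]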
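FIG. 6 is a schematic diagram of the QR motion analyzer. The row selection block selects only the first $N+M$ rows of the input matrix. In the segment merging block the following strategy is used for merging segments:
a. A threshold $T$ is selected which corresponds to the allowed increase of square prediction error in the whole frame.
b. For all pairs of adjacent segments, $\Delta E_{ij}$ is calculated using equation (13).
c. The pair of segments with the smallest $\Delta E_{ij}$ is merged.
d. Points b-c are repeated until the sum of $\Delta E_{ij}$ over all merged pairs of segments exceeds $T$.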
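Points a to d describe a greedy loop, which might look like the sketch below. The cost function `delta_e`, the adjacency test and the toy values are all hypothetical stand-ins (a real encoder would evaluate formula (13)). The stopping rule is also an interpretation: this sketch stops before the merge that would push the accumulated cost above T, whereas the text only says to repeat until the sum exceeds T.

```python
def greedy_merge(segment_ids, adjacent, delta_e, threshold):
    """Repeatedly merge the cheapest adjacent pair while the total cost stays within threshold."""
    segments = [frozenset([s]) for s in segment_ids]
    total = 0.0
    while True:
        # Point b: cost of every pair of currently adjacent segments.
        pairs = [(delta_e(a, b), a, b)
                 for i, a in enumerate(segments) for b in segments[i + 1:]
                 if any(adjacent(p, q) for p in a for q in b)]
        if not pairs:
            break
        cost, a, b = min(pairs, key=lambda t: t[0])   # point c
        if total + cost > threshold:                  # point d (assumed stopping rule)
            break
        total += cost
        segments = [s for s in segments if s not in (a, b)] + [a | b]
    return segments, total


# Toy stand-in for formula (13): difference between the mean "motion" of two segments.
motion = {0: 1.0, 1: 1.1, 2: 5.0}

def mean(seg):
    return sum(motion[s] for s in seg) / len(seg)

result, spent = greedy_merge(
    [0, 1, 2],
    adjacent=lambda p, q: abs(p - q) == 1,
    delta_e=lambda a, b: abs(mean(a) - mean(b)),
    threshold=1.0,
)
```

The same loop structure applies to the coefficient removal strategy, with $q_i^2$ from equation (16) in place of $\Delta E_{ij}$.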
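A sequence of Givens rotations is used to triangularise the system of equations (11).
In the motion coefficient removal block the following strategy is used for coefficient removal:
a. A threshold $T$ is selected which corresponds to the allowed increase of square prediction error in the whole frame.
b. For all segments and all basis functions, $q_i^2$ is calculated using equation (16).
c. The basis function of the segment with the smallest $q_i^2$ is removed.
d. Points b-c are repeated until the sum of all $q_i^2$ corresponding to the basis functions removed in the various segments exceeds $T$.
The system of equations (16) is triangularised using a sequence of Givens rotations.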
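The final motion coefficients of the segments are calculated by solving equation (18) using the back-substitution algorithm.
The pixel values of $\tilde{I}_{\mathrm{ref}}(x,y)$ are defined only for integer coordinates $x$ and $y$. In the many cases where $x$ or $y$ is non-integer, the pixel value is calculated by bilinear interpolation of the closest pixels with integer coordinates.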
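Bilinear interpolation at a non-integer position can be sketched as below. The sketch assumes the frame is a nested list indexed as `frame[y][x]` and that the requested position lies inside the frame; `bilinear` is a hypothetical helper name.

```python
import math


def bilinear(frame, x, y):
    """Interpolate frame at real-valued (x, y) from its four nearest integer-grid pixels."""
    h, w = len(frame), len(frame[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)   # stay inside the frame
    fx, fy = x - x0, y - y0                            # fractional offsets
    top = (1 - fx) * frame[y0][x0] + fx * frame[y0][x1]
    bottom = (1 - fx) * frame[y1][x0] + fx * frame[y1][x1]
    return (1 - fy) * top + fy * bottom


frame = [[0.0, 10.0], [20.0, 30.0]]
centre = bilinear(frame, 0.5, 0.5)
corner = bilinear(frame, 1.0, 0.0)
```

At integer coordinates the function returns the stored pixel value unchanged, so it can be used uniformly for all motion-compensated positions.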
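The system can be implemented in a variety of ways without departing from the scope of the invention. For instance, different linear motion models can be used in equation (3). Different methods can be used to linearise the term in formula (5). Different criteria may be used to decide whether to merge two segments. The strategy for deciding whether a given basis function should be removed from the model may vary. Triangularisation of the matrices in equations (10) and (15) can be performed using various algorithms, and the final coefficients can be calculated by solving equation (18) with any of a number of known algorithms for solving systems of linear equations. Finally, different interpolation methods can be used to obtain values of $\tilde{I}_{\mathrm{ref}}(x,y)$ at non-integer coordinates.
Field of the Invention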
The present invention relates generally to video compression. More precisely, the present invention relates to a method for generating motion information in a video sequence by encoding an estimated motion field.
Background of the Invention
Motion-compensated prediction is a key element of the majority of video coding schemes. FIG. 1 is a schematic diagram of an encoder that compresses video sequences using motion compensation. Its essential elements are a motion-compensated prediction block 1, a motion estimator 2 and a motion field encoder 3. A motion-compensating video encoder operates by compressing the prediction error $E_n(x,y)$, the difference between the incoming frame being coded, $I_n(x,y)$, called the current frame, and a prediction frame $\hat{I}_n(x,y)$:

$$E_n(x,y) = I_n(x,y) - \hat{I}_n(x,y) \qquad (1)$$

The prediction frame $\hat{I}_n(x,y)$ is constructed by the motion-compensated prediction block 1 from the pixel values of the previous, or some other already coded, frame, denoted $\tilde{I}_{\mathrm{ref}}(x,y)$ and called the reference frame, and from the motion vectors of the pixels between the current frame and the reference frame. The motion vectors are calculated by the motion field estimator 2, and the resulting vector field is coded in some way before being applied to the prediction block 1. The prediction frame is then:

$$\hat{I}_n(x,y) = \tilde{I}_{\mathrm{ref}}\big[x+\Delta x(x,y),\ y+\Delta y(x,y)\big] \qquad (2)$$

The pair of numbers $[x+\Delta x(x,y),\ y+\Delta y(x,y)]$ is called the motion vector of the pixel at location $(x,y)$ in the current frame, and $\Delta x(x,y)$ and $\Delta y(x,y)$ are the horizontal and vertical displacements of this pixel. The set of motion vectors of all pixels in the current frame $I_n(x,y)$ is called the motion vector field. The coded motion vector field is also transmitted to the decoder as motion information.
In the decoder of FIG. 2, each pixel of the current frame $I_n(x,y)$ is reconstructed by finding its prediction $\hat{I}_n(x,y)$ in the reference frame $\tilde{I}_{\mathrm{ref}}(x,y)$. The motion-compensated prediction block 21 generates the prediction frame from the received motion information and the reference frame $\tilde{I}_{\mathrm{ref}}(x,y)$ (in this picture the reference frame is the same as the current frame). In the prediction error decoder 22, the decoded prediction error $E_n(x,y)$ is added to the prediction frame, yielding the original current frame $I_n$.
The general object of motion-compensated (MC) prediction is to minimise the amount of information that must be sent to the decoder. It should minimise the amount of prediction error, measured for example as the energy of $E_n(x,y)$, and also the amount of information needed to represent the motion vector field.
H. Nguyen and E. Dubois, "Representation of motion information for image coding," Proc. Picture Coding Symposium '90, Cambridge, Massachusetts, March 26-28, 1990, pp. 841-845, gives a review of motion field coding techniques. As a rule of thumb, reducing the prediction error requires a more sophisticated motion field, that is, more bits must be spent on coding it. The overall goal of video coding is therefore to code the motion vector field as compactly as possible while keeping the measure of prediction error as low as possible.
The motion field estimation block 2 (FIG. 1) calculates, for all the pixels of a given segment, motion vectors that minimise some measure of prediction error in that segment, for example the square prediction error. Motion field estimation techniques differ both in the model of the motion field and in the algorithm used to minimise the chosen measure of prediction error.
Because a frame contains a very large number of pixels, it is inefficient to transmit a separate motion vector for each pixel. In most video coding schemes the current frame is instead divided into larger image segments so that all the motion vectors of a segment can be described by a few parameters. Image segments may be square blocks, for example the 16 × 16 pixel blocks used in codecs conforming to the international standards ISO/IEC MPEG-1 or ITU-T H.261, or they may be arbitrarily shaped regions obtained, for instance, by a segmentation algorithm. In practice a segment contains at least a few tens of pixels.
To represent the motion vectors of the pixels in a segment compactly, it is desirable that their values be described by a function of a few parameters. Such a function is called a motion vector field model. A known group of models is the linear motion models, in which the motion vectors are linear combinations of motion field basis functions. In such models the motion vectors of an image segment are described by the general formula:

$$\Delta x(x,y) = \sum_{i=1}^{N} c_i f_i(x,y), \qquad \Delta y(x,y) = \sum_{i=N+1}^{N+M} c_i f_i(x,y) \qquad (3)$$

where the parameters $c_i$ are called motion coefficients and are transmitted to the decoder. The functions $f_i(x,y)$ are called motion field basis functions; they are fixed and known to both the encoder and the decoder.
The problem when using a linear motion model of the above form is how to minimise the number of motion coefficients $c_i$ sent to the decoder while keeping the measure of the prediction error $E_n(x,y)$ as low as possible. This is done in the encoder by the motion field encoding block 3 (see FIG. 1), after the computationally very complex motion field estimation performed by block 2. It is therefore crucial that motion field encoding be computationally simple, so that it imposes no additional burden on the encoder.
The total number of motion coefficients that need to be sent to the decoder depends on both the number of segments in the image and the number of motion coefficients per segment. Therefore, there are at least two ways to reduce the total number of motion coefficients.
The first method is to reduce the number of segments by combining (merging) segments that can be predicted with a common motion vector field without significantly increasing the prediction error. It is very often possible to predict neighboring segments with the same set of motion coefficients, so that the number of segments in a frame can be reduced. The process of joining such segments is called motion assisted merging.
The second method is to select a motion model for each segment that can achieve a sufficiently low prediction error with as few coefficients as possible. Since the amount and complexity of motion varies from frame to frame and from segment to segment, it is not efficient to always use all N + M motion coefficients per segment. The minimum number of motion coefficients that can achieve a satisfactorily low prediction error needs to be determined for all segments. The process of adaptive selection of such coefficients is called motion coefficient removal.
FIG. 3 shows a frame divided into segments. Prior art techniques for motion coefficient coding include several techniques for motion assisted merging. After the motion vectors of all the segments have been estimated, motion assisted merging is performed by considering every pair of adjacent segments $S_i$ and $S_j$ together with their motion coefficients $c_i$ and $c_j$. The area of the combined segments $S_i$ and $S_j$ is denoted $S_{ij}$. If the area $S_{ij}$ can be predicted with a single set of motion coefficients $c_{ij}$ without an excessive increase in prediction error over the error resulting from separate predictions of $S_i$ and $S_j$, then $S_i$ and $S_j$ are merged. Methods for motion assisted merging differ essentially in how they find a single set of motion coefficients $c_{ij}$ that predicts the combined segments well.
One method is known as merging by exhaustive motion estimation. It estimates "from scratch" a new set of motion coefficients $c_{ij}$ for every pair of adjacent segments $S_i$ and $S_j$. If the prediction error for $S_{ij}$ does not increase excessively, the segments $S_i$ and $S_j$ are merged. Although this method can select very well the segments that can be merged, it is not feasible in practice because it would typically increase the complexity of the encoder by several orders of magnitude.
Another method is known as merging by motion field extension. It tests whether the area $S_{ij}$ can be predicted using either set of motion parameters $c_i$ or $c_j$ without an excessive increase in prediction error. Its computational complexity is very low because it requires no new motion estimation. However, it very often fails to merge segments, because motion compensation with the coefficients calculated for one segment very rarely predicts the adjacent segment well too.
Yet another method is known as merging by motion field fitting. Here the motion coefficients $c_{ij}$ are calculated by approximation, by evaluating a few motion vectors in each of the segments. Some motion vectors of segments $S_i$ and $S_j$ are depicted in FIG. 4. The motion field for the segment $S_{ij}$ is obtained by fitting a common motion vector field through these vectors using some known fitting method. The disadvantage is that the motion field obtained by fitting is not precise enough and often leads to an unacceptable increase in prediction error.
The following two literatures suggest ways to estimate motion with various models and select the most appropriate one:
H. Nicolas and C. Labit, "Region-based motion estimation using deterministic relaxation schemes for image sequence coding," Proc. 1994 International Conference on Acoustics, Speech and Signal Processing, pp. III265-268;
P. Cicconi and H. Nicolas, "Efficient region-based motion estimation and symmetry oriented segmentation for image sequence coding," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 4, No. 3, June 1994, pp. 357-364.
These methods attempt to adapt the motion model according to the complexity of the motion by performing motion estimation with various models and selecting the most appropriate one. The main drawback of these methods is that they are computationally complex and the amount of different motion field models that can actually be tested is small.
None of the methods described above solves by itself the problem of minimising the number of motion coefficients $c_i$ sent to the decoder while keeping the measure of the prediction error $E_n(x,y)$ as low as possible.
Summary of the Invention
It is an object of the present invention to create a motion field encoder that greatly reduces the amount of motion field vector information created by known motion estimation methods without significantly increasing prediction errors. The complexity of the motion field encoder must be low so that it can be implemented practically with available signal processors or general purpose microprocessors.
According to the invention, the motion field encoder includes three main blocks.
The first main block is called the QR motion analyzer. Its task is to find a new representation of the input motion field produced by the motion field estimator. This new representation is passed to the second main block. The operations in the first main block comprise several steps of matrix operations. In the first step the prediction frame is linearised using some known approximation method so that it becomes linear with respect to the motion vectors. In the second step a matrix $E_i$ and a vector $y_i$ are constructed for minimisation of the square prediction error. In the third step the well-known QR factorisation algorithm is used to decompose $E_i$ into the product of two matrices $Q_i$ and $R_i$, and an auxiliary vector $z_i$ is calculated from the factor matrix $Q_i$ and the vector $y_i$. Part of the matrix $R_i$ and the auxiliary vector $z_i$ are passed to the second main block.
The second main block, called the segment merging block, merges pairs of adjacent segments after determining whether their combined area can be predicted using a common motion field. In terms of matrix operations, a matrix equation is first formed, and the factor matrices are then processed using known matrix computation methods. The result is a matrix equation in which one matrix contains terms from which the square prediction error in the area of the merged segments is easily calculated. If the change in square prediction error is acceptable according to a chosen criterion, the segments are merged.
After all pairs of segments have been considered, the output of the segment merging block is:
i. a new segmentation of the image with a reduced number of segments,
ii. for each new segment, the matrix $R^1_{ij}$ and the vector $z^1_{ij}$,
iii. merging information, which is sent to the decoder and helps it identify the segments that were merged.
The third main block is called the coefficient removal block. It receives as input the new segmentation of the current frame and, for every segment, the matrices $R^1_k$, $z^1_k$ and $c_k$ produced by the segment merging block. The motion vectors of every segment are represented by a number of motion coefficients. For each segment the block determines whether the motion field model can be simplified without an excessive increase in prediction error. Some basis functions are removed from the motion model, so that fewer coefficients are needed to describe the simplified model.
The operations in this third main block are matrix operations: the matrix equation is first modified by removing one column and row of the factor matrices and is then triangularised. Removing one column and row corresponds to removing one basis function from the motion model. The change in the square prediction error for the segment caused by removing one basis function equals the square of one term in the equation.
A coefficient is removed from the set of coefficients if the change in prediction error is acceptable in light of the selected criteria. These matrix operations can be further repeated to reduce more coefficients for the segment. After a sufficient amount of coefficients has been removed, the final motion coefficient for the segment is calculated by solving the resulting linear equation. The equation can be solved using one of the well-known algorithms, for example backsubstitution.
The third main block outputs selection information that informs which basis functions have been removed from the motion field model for all processed segments. The block also outputs new motion coefficients corresponding to the remaining basis functions. Both selection information and motion coefficients are sent to the decoder.
[Brief description of the drawings]
FIG. 1 is a schematic diagram of a known encoder.
FIG. 2 is a schematic diagram of a known decoder.
FIG. 3 shows adjacent segments being merged.
FIG. 4 illustrates merging with motion field approximation.
FIG. 5 is a motion field encoder according to the present invention.
FIG. 6 is a schematic diagram of a QR motion analyzer.
Description of the illustrated embodiment
FIG. 5 shows a motion field encoder according to the invention. This corresponds to block 3 of FIG. 1, but also has a reference frame and a current frame as inputs. The third input to this block is the motion vector field [Δx (•), Δy (•)] created by the motion field estimation block 2 of FIG.
Assume that the output of the video encoder is the compressed frame divided into segments, each accompanied by motion coefficients. Then, for a segment $S_i$ consisting of $P$ pixels with coordinates $(x_i, y_i)$, $i = 1, 2, \ldots, P$, the task of the motion field encoder is to find the motion coefficients $c = (c_1, c_2, \ldots, c_{N+M})$ of a compressed motion vector field $[\tilde{\Delta}x(\cdot),\ \tilde{\Delta}y(\cdot)]$ whose motion vectors are described by a linear motion model of the form:

$$\tilde{\Delta}x(x,y) = \sum_{i=1}^{N} c_i f_i(x,y) \qquad (4a)$$

$$\tilde{\Delta}y(x,y) = \sum_{i=N+1}^{N+M} c_i f_i(x,y) \qquad (4b)$$

such that it minimises the square prediction error:

$$\sum_{i=1}^{P} \Big( I_n(x_i,y_i) - \tilde{I}_{\mathrm{ref}}\big(x_i + \tilde{\Delta}x(x_i,y_i),\ y_i + \tilde{\Delta}y(x_i,y_i)\big) \Big)^2 \qquad (5)$$

To fulfil this task the motion field encoder consists of three main building blocks: the QR motion analysis block, the segment merging block and the motion coefficient removal block. The segment merging block and the motion coefficient removal block reduce the amount of motion information, at the cost of an increase in the square prediction error.
The objective of the QR motion analyzer is to find a new representation of the motion field. This representation is used later in the other two blocks for fast and flexible determination of motion coefficients for merged segments and for coefficient removal. The operation of the QR motion analyzer consists of the following steps.
Step 1 is linearisation of the error. In this step the prediction frame $\tilde{I}_{\mathrm{ref}}(\cdot)$ in equation (5) is approximated, using some known approximation method, so that it becomes linear with respect to $[\tilde{\Delta}x(\cdot),\ \tilde{\Delta}y(\cdot)]$. Each element of the sum in formula (5) then becomes a linear combination of the coefficients $c_i$:

$$\sum_{j=1}^{P} \big( e_{j,1} c_1 + e_{j,2} c_2 + \cdots + e_{j,N+M}\, c_{N+M} - y_j \big)^2 \qquad (6)$$

Step 2 is construction of matrices. It relies on the fact that minimising formula (6) is fully equivalent to minimising the matrix expression

$$(E_i c_i - y_i)^T (E_i c_i - y_i)$$

where $E_i$ and $y_i$ are:

$$E_i = \begin{bmatrix} e_{1,1} & e_{1,2} & \cdots & e_{1,N+M} \\ e_{2,1} & e_{2,2} & \cdots & e_{2,N+M} \\ \vdots & \vdots & & \vdots \\ e_{P,1} & e_{P,2} & \cdots & e_{P,N+M} \end{bmatrix}, \qquad y_i = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_P \end{bmatrix}$$

Step 3 is QR factorisation. The well-known QR factorisation algorithm is described in G. H. Golub and C. Van Loan, "Matrix Computations," 2nd edition, The Johns Hopkins University Press, 1989. This algorithm is used to decompose $E_i$ into a product of two matrices:

$$E_i = Q_i R_i \qquad (8)$$

In this step an auxiliary vector $z_i$ is also calculated:

$$z_i = Q_i^T y_i \qquad (9)$$

In step 4 the output of the QR motion analysis block is calculated. It consists of the matrix $R^1_i$, formed from the first $N+M$ rows of $R_i$, and the vector $z^1_i$, formed from the first $N+M$ elements of $z_i$.
In the segment merge block, adjacent segment S_iAnd S_jA pair of motion coefficients c (see FIG. 4)_ijTheir combined regions S using a common motion field described by_ijIs determined, and a merging operation is performed. The merge operation consists of the following steps.
Step 1 consists of matrix calculation. The present invention uses the previously unknown property that the motion coefficients $c_{ij}$ can be found by solving the following system of linear equations:

$$\begin{bmatrix} R_i^1 \\ R_j^1 \end{bmatrix} c_{ij} = \begin{bmatrix} z_i^1 \\ z_j^1 \end{bmatrix} \qquad (10)$$

where $R_i^1$, $z_i^1$ and $R_j^1$, $z_j^1$ have already been produced by the QR analysis block for segments $S_i$ and $S_j$, respectively.
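Step 2 is triangularization of the matrices obtained in step 1. The matrices $R_i^1$ and $R_j^1$ are upper triangular, so the system of equations (10) has the form:

$$\begin{bmatrix} \times & \times & \cdots & \times \\ 0 & \times & \cdots & \times \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \times \\ \times & \times & \cdots & \times \\ 0 & \times & \cdots & \times \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \times \end{bmatrix} c_{ij} = \begin{bmatrix} \times \\ \times \\ \vdots \\ \times \\ \times \\ \times \\ \vdots \\ \times \end{bmatrix} \qquad (11)$$

Here, the symbol $\times$ denotes a nonzero element. According to the teachings of the above reference, the system is triangularized by applying a series of multiplications of rows by scalars followed by additions of the rows, i.e. it is converted to the form:

$$\begin{bmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,N+M} \\ 0 & r_{2,2} & \cdots & r_{2,N+M} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & r_{N+M,N+M} \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} c_{ij} = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{N+M} \\ q_1 \\ \vdots \\ q_{N+M} \end{bmatrix} \qquad (12)$$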
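In step 3, the merge error is calculated. The change in the square prediction error $\Delta E_{ij}$ in the region $S_{ij}$ caused by merging segments $S_i$ and $S_j$ is calculated according to the teachings of the above reference as follows, where $q_k$ are the last $N + M$ elements of the right-hand side of the triangularized system (12):

$$\Delta E_{ij} = \sum_{k=1}^{N+M} q_k^2 \qquad (13)$$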
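Finally, in step 4, the segments are merged if the change in the square prediction error given by equation (13) is acceptable according to a selected criterion. For the resulting new segment $S_{ij}$, the matrix $R_{ij}^1$ and the vector $z_{ij}^1$ are formed by taking the first $N + M$ rows of the system of equations (12), i.e. they are given by the following formula:

$$R_{ij}^1 = \begin{bmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,N+M} \\ 0 & r_{2,2} & \cdots & r_{2,N+M} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & r_{N+M,N+M} \end{bmatrix}, \quad z_{ij}^1 = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{N+M} \end{bmatrix} \qquad (14)$$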
After all pairs of segments of the frame have been considered, the output of the segment merging block is obtained. The output consists of three kinds of information. First, it gives a new segmentation of the image with a reduced number of segments. Second, for each new segment, the block outputs the matrix $R_{ij}^1$ and the vector $z_{ij}^1$. Third, it gives merging information, which is sent to the decoder to help the decoder identify the merged segments.
The motion coefficients $c_{ij} = (c_1, c_2, \ldots, c_{N+M})$ for segment $S_{ij}$ could now be calculated by solving the system of equations $R_{ij}^1 c_{ij} = z_{ij}^1$, but this calculation is not necessary if the next block (the coefficient removal block) is used.
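The four merging steps can be illustrated as follows. This is a sketch under stated assumptions: NumPy's general QR routine stands in for the triangularization of the stacked system, the names `merge_segments` and `analyze` are hypothetical, and the segment data are random toy values.

```python
import numpy as np

def merge_segments(R1_i, z1_i, R1_j, z1_j):
    """Stack two segments' systems, re-triangularize, return (R1_ij, z1_ij, delta_E)."""
    n = R1_i.shape[1]
    A = np.vstack([R1_i, R1_j])              # left-hand side of the stacked system
    b = np.concatenate([z1_i, z1_j])         # right-hand side of the stacked system
    Q, R = np.linalg.qr(A, mode="complete")  # triangularization (assumed: via QR)
    t = Q.T @ b                              # first n entries: z, last n entries: q
    delta_E = float(np.sum(t[n:] ** 2))      # change in square prediction error
    return R[:n, :], t[:n], delta_E

def analyze(E, y):
    """Per-segment QR analysis in reduced form (R is n x n)."""
    Q, R = np.linalg.qr(E)
    return R, Q.T @ y

# Two hypothetical neighbouring segments with 3 coefficients each.
rng = np.random.default_rng(1)
E_i, y_i = rng.normal(size=(30, 3)), rng.normal(size=30)
E_j, y_j = rng.normal(size=(25, 3)), rng.normal(size=25)
R_ij, z_ij, dE = merge_segments(*analyze(E_i, y_i), *analyze(E_j, y_j))
c_ij = np.linalg.solve(R_ij, z_ij)           # common coefficients for the merged segment
```

In this sketch `dE` equals the increase of the joint least-squares error over the sum of the two separate errors, and it is obtained from two small triangular systems without revisiting any pixel.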
Now consider the operation of the coefficient removal block. This block receives as input the new division of the current frame into segments and, for every segment $S_k$, the matrix $R_k^1$ and vector $z_k^1$ produced earlier by the segment merging block, together with $c_k$. The motion vectors of every segment are represented by $N + M$ motion coefficients.
The motion coefficient removal block determines, for a given segment $S_k$, whether the motion field model can be simplified without significantly increasing the prediction error. A simplified motion field model is obtained when some basis functions are removed from the model of equation (3) described in the background section of this specification. Fewer coefficients are needed to describe such a simplified motion field model.
To determine whether the i-th basis function (and the i-th coefficient) can be removed from the motion field model, the following procedure is performed for each segment.
Step 1 is matrix modification, in which the system of linear equations
$R_k^1 c_k = z_k^1$ (15)
is modified by removing the i-th column from $R_k^1$ and the i-th element from $c_k$.
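Step 2 is matrix triangularization, in which the modified system of equations (15) is triangularized in a known manner by applying a series of multiplications of rows by scalars followed by additions of the rows, i.e. it is converted to the following form:

$$\begin{bmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,N+M-1} \\ 0 & r_{2,2} & \cdots & r_{2,N+M-1} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & r_{N+M-1,N+M-1} \\ 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_{N+M-1} \end{bmatrix} = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{N+M-1} \\ q_i \end{bmatrix} \qquad (16)$$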
Step 3 is error evaluation. The change in the square prediction error for the segment caused by removing the i-th coefficient is simply equal to the term $q_i^2$ in equation (16).
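Step 4 is the removal of coefficients. If the change in the prediction error is acceptable according to a selected criterion, the coefficient $c_i$ is removed from the set of coefficients. The new number of coefficients is now $N + M - 1$. The matrix $R_k^1$ and the vector $z_k^1$ are modified to:

$$R_k^1 = \begin{bmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,N+M-1} \\ 0 & r_{2,2} & \cdots & r_{2,N+M-1} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & r_{N+M-1,N+M-1} \end{bmatrix}, \quad z_k^1 = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{N+M-1} \end{bmatrix} \qquad (17)$$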
By repeating steps 1-4 using matrix (17) in equation (15), the number of coefficients for this segment can be further reduced.
Step 5 is coefficient calculation. This step begins after a sufficient number of coefficients have been removed. In this step, the final motion coefficients for segment $S_k$ are calculated by solving the system of linear equations:
$R_k^1 c_k = z_k^1$ (18)
where the matrix $R_k^1$ and the vector $z_k^1$ are the result of the preceding steps 1-4. This equation can be solved using one of the well-known algorithms, such as back substitution.
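The removal procedure for a single coefficient can be sketched as below. This is an illustration only: NumPy's QR routine is used for the re-triangularization, the function name `remove_coefficient` is hypothetical, and the segment data are random toy values.

```python
import numpy as np

def remove_coefficient(R1, z1, i):
    """Drop basis function i and return (R1_new, z1_new, q_squared)."""
    A = np.delete(R1, i, axis=1)             # step 1: remove the i-th column
    Q, R = np.linalg.qr(A, mode="complete")  # step 2: re-triangularize (assumed: via QR)
    t = Q.T @ z1
    q2 = float(t[-1] ** 2)                   # step 3: increase in square prediction error
    return R[:-1, :], t[:-1], q2             # step 4: drop the bottom row

# Hypothetical toy segment with 4 coefficients.
rng = np.random.default_rng(3)
E = rng.normal(size=(40, 4))
y = rng.normal(size=40)
Q0, R0 = np.linalg.qr(E)                     # reduced QR gives the 4 x 4 system directly
R1, z1 = R0, Q0.T @ y
R_new, z_new, q2 = remove_coefficient(R1, z1, i=2)
c_new = np.linalg.solve(R_new, z_new)        # step 5: remaining 3 coefficients
```

Calling `remove_coefficient` again on `R_new` and `z_new` corresponds to repeating steps 1-4 to remove further coefficients.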
The motion coefficient removal block outputs selection information that informs the decoder which basis functions have been removed from the motion field model for all processed segments. The block also outputs new motion coefficients corresponding to the remaining basis functions. Both this selection information and the motion coefficients are sent to the decoder.
As a result of all the steps in all of these blocks, the motion field encoder according to the present invention produces merging information that tells the decoder which segments have been merged, selection information that tells the decoder which basis functions have been removed, and motion coefficient information.
Compared to the prior art, the main advantage of the present invention is that the amount of motion information can be greatly reduced without significantly increasing the prediction error. Further, the complexity of the entire system is low, and it can be practically realized by an available signal processing device or a general-purpose microprocessor.
The segment merging block has the unique ability to determine the motion vectors of a combined segment from the given motion vectors estimated for the separate segments. It can be shown that the motion vectors produced by this block are in fact optimal in the sense of keeping the square error for the combined segment to a minimum. This is why the block is able to reduce the number of segments dramatically with only a slight increase in the square prediction error.
The motion coefficient removal block is a very powerful means for instantly adapting the motion model to the actual amount and type of motion in the video scene. This block can easily test the prediction results (the value of the squared prediction error for a segment) on a very large number of models, for example any possible combination of motion field basis functions. None of the other known methods have such flexibility. A powerful advantage of this scheme is that it is easy to calculate because it does not require iterating the motion estimation process.
Because QR motion analysis is performed after motion estimation, the motion field encoder can obtain new motion coefficients for any desired combination of image segments, or for any desired model of the motion field in a segment, by solving a very simple system of linear equations.
DESCRIPTION OF PREFERRED EMBODIMENTS
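In the preferred embodiment, the following second-order polynomial motion vector field model with 12 coefficients is used:

$$\Delta x(x, y) = c_1 + c_2 x + c_3 y + c_4 xy + c_5 x^2 + c_6 y^2$$
$$\Delta y(x, y) = c_7 + c_8 x + c_9 y + c_{10} xy + c_{11} x^2 + c_{12} y^2$$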
This model can effectively handle even very complex motions in a video sequence and gives good prediction results.
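As a small illustration of how such a model is evaluated, the sketch below computes the displacement of one pixel from 12 coefficients. The ordering of the basis functions (1, x, y, xy, x², y², first for the horizontal and then for the vertical component) is an assumption made for this example, and the function name `quadratic_motion` is hypothetical.

```python
def quadratic_motion(c, x, y):
    """Evaluate a 12-coefficient second-order polynomial motion model at pixel (x, y)."""
    basis = (1.0, x, y, x * y, x * x, y * y)             # assumed ordering of basis functions
    dx = sum(ci * f for ci, f in zip(c[:6], basis))      # horizontal displacement
    dy = sum(ci * f for ci, f in zip(c[6:], basis))      # vertical displacement
    return dx, dy

# Hypothetical coefficients: constant horizontal shift of 1 pixel,
# vertical displacement growing linearly with y.
coeffs = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
dx, dy = quadratic_motion(coeffs, 3.0, 4.0)
```

Setting a coefficient to zero is equivalent to removing its basis function, which is what the coefficient removal block exploits.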
In the QR motion analysis block, the linearization in step 1 is performed using a Taylor expansion of the reference frame at every pixel $(x_i, y_i)$, $i = 1, 2, \ldots, P$, around the points:
$x'_i = x_i + \Delta x(x_i, y_i)$
$y'_i = y_i + \Delta y(x_i, y_i)$

Using this property, the prediction error is as follows:

The auxiliary values $g_j(x, y)$ are calculated using the following formula:

where the function $f_j(x_i, y_i)$ is a basis function defined by equations (4a) and (4b).
The matrix $E$ and vector $y$ in equation (9) are constructed using the following formulas:

$G_x(x, y)$ and $G_y(x, y)$ are the values of the horizontal and vertical gradients of the reference frame, calculated using the following formula:

FIG. 6 is a schematic diagram of a QR motion analyzer. The row selection block selects only the first N + M rows of the input matrix. In the segment merge block, the following policy is implemented for segment merge:
a. A threshold T corresponding to the increase in the square prediction error allowed for the entire frame is selected.
b. $\Delta E_{ij}$ is calculated using equation (13) for all pairs of adjacent segments.
c. The pair of segments with the smallest $\Delta E_{ij}$ is merged.
d. Points b-c are repeated until the sum of $\Delta E_{ij}$ over all merged pairs of segments becomes greater than T.
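The merging policy in points a-d can be sketched as a greedy loop. This is an illustration under several assumptions: segment identifiers are integers, adjacency is given as a set of two-element frozensets, NumPy's QR routine performs the triangularization, and the loop stops just before the accumulated error would exceed T (one possible reading of point d). The names `merge_cost` and `greedy_merge` are hypothetical.

```python
import numpy as np

def merge_cost(a, b):
    """Return (R1, z1, delta_E) for merging two segments given as (R1, z1) tuples."""
    n = a[0].shape[1]
    Q, R = np.linalg.qr(np.vstack([a[0], b[0]]), mode="complete")
    t = Q.T @ np.concatenate([a[1], b[1]])
    return R[:n], t[:n], float(np.sum(t[n:] ** 2))

def greedy_merge(segments, neighbours, T):
    """segments: {id: (R1, z1)}; neighbours: set of frozenset({id_a, id_b}); T: error budget."""
    segments, neighbours = dict(segments), set(neighbours)
    total, merges = 0.0, []
    while neighbours:
        # points b and c: evaluate every adjacent pair and pick the cheapest
        best = min(neighbours,
                   key=lambda p: merge_cost(*(segments[s] for s in sorted(p)))[2])
        a, b = sorted(best)
        R1, z1, dE = merge_cost(segments[a], segments[b])
        if total + dE > T:                   # point d (assumed: stop before exceeding T)
            break
        total += dE
        merges.append((a, b))
        segments[a] = (R1, z1)               # merged segment keeps identifier a
        del segments[b]
        # redirect b's adjacencies to a and drop the degenerate pair {a}
        neighbours = {frozenset(a if s == b else s for s in p) for p in neighbours}
        neighbours = {p for p in neighbours if len(p) == 2}
    return segments, merges, total

# Hypothetical toy data: segments 0 and 1 share the same motion, segment 2 differs.
rng = np.random.default_rng(2)
def make_segment(c):
    E = rng.normal(size=(12, 2))
    Q, R = np.linalg.qr(E)
    return R, Q.T @ (E @ c)

segs = {0: make_segment(np.array([1.0, 0.0])),
        1: make_segment(np.array([1.0, 0.0])),
        2: make_segment(np.array([-3.0, 2.0]))}
adjacency = {frozenset({0, 1}), frozenset({1, 2})}
merged, merges, total = greedy_merge(segs, adjacency, T=1e-6)
```

The list `merges` plays the role of the merging information that would be sent to the decoder.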
To triangularize the system of equations (11), a series of Givens rotations is used.
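A Givens rotation combines two rows so that one chosen element becomes zero while the sum of squares of the right-hand side is preserved. The following pure-Python sketch triangularizes a small stacked system in this way; the function name `givens_triangularize` and the numbers are hypothetical.

```python
import math

def givens_triangularize(A, b):
    """Bring the system A c = b to upper triangular form in place with Givens rotations."""
    rows, cols = len(A), len(A[0])
    for j in range(cols):
        for i in range(j + 1, rows):
            p, q = A[j][j], A[i][j]
            if q == 0.0:
                continue                      # element is already zero
            r = math.hypot(p, q)
            c, s = p / r, q / r               # cosine and sine of the rotation
            for k in range(j, cols):          # columns left of j are zero in both rows
                A[j][k], A[i][k] = c * A[j][k] + s * A[i][k], -s * A[j][k] + c * A[i][k]
            b[j], b[i] = c * b[j] + s * b[i], -s * b[j] + c * b[i]
    return A, b

# Two hypothetical 2 x 2 upper triangular blocks stacked on top of each other.
A = [[2.0, 1.0],
     [0.0, 3.0],
     [1.0, 4.0],
     [0.0, 5.0]]
b = [1.0, 2.0, 3.0, 4.0]
givens_triangularize(A, b)
delta_E = b[2] ** 2 + b[3] ** 2               # sum of squares of the trailing elements
```

Because each rotation touches only two rows and the stacked blocks are already mostly zero below the diagonal, few rotations are needed.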
The motion coefficient removal block implements the following policy for coefficient removal:
a. A threshold T corresponding to the increase in the square prediction error allowed for the entire frame is selected.
b. $q_i^2$ is calculated using equation (16) for all segments and all basis functions.
c. The basis function with the smallest $q_i^2$ is removed from its segment.
d. Points b-c are repeated until the sum of all $q_i^2$ corresponding to the basis functions removed in the various segments becomes greater than T.
The system of equations (16) is triangularized by a series of Givens rotations.
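The coefficient removal policy can be sketched as another greedy loop. The sketch makes several assumptions: NumPy's QR routine replaces the Givens rotations, at least one coefficient is always kept per segment, and the loop stops just before the accumulated error would exceed T. The names `removal_cost` and `greedy_removal` are hypothetical.

```python
import numpy as np

def removal_cost(R1, z1, i):
    """Return (R1_new, z1_new, q_squared) after dropping basis function i."""
    Q, R = np.linalg.qr(np.delete(R1, i, axis=1), mode="complete")
    t = Q.T @ z1
    return R[:-1], t[:-1], float(t[-1] ** 2)

def greedy_removal(systems, T):
    """systems: {segment_id: (R1, z1)}; returns reduced systems, kept basis indices, error."""
    systems = dict(systems)
    kept = {k: list(range(R.shape[1])) for k, (R, _) in systems.items()}
    total = 0.0
    while True:
        # point b: cost of removing each remaining basis function in each segment
        candidates = [(removal_cost(R, z, i)[2], k, i)
                      for k, (R, z) in systems.items()
                      for i in range(R.shape[1]) if R.shape[1] > 1]
        if not candidates:
            break
        q2, k, i = min(candidates)            # point c: cheapest removal
        if total + q2 > T:                    # point d (assumed: stop before exceeding T)
            break
        R, z, _ = removal_cost(*systems[k], i)
        systems[k] = (R, z)
        del kept[k][i]                        # remember which original basis index was removed
        total += q2
    return systems, kept, total

# Hypothetical segment whose true motion does not use the middle basis function.
rng = np.random.default_rng(4)
E = rng.normal(size=(15, 3))
Q0, R0 = np.linalg.qr(E)
systems = {0: (R0, Q0.T @ (E @ np.array([2.0, 0.0, -1.0])))}
reduced, kept, total = greedy_removal(systems, T=1e-9)
c_final = np.linalg.solve(*reduced[0])
```

The dictionary `kept` corresponds to the selection information: it records which basis functions remain for each segment.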
The final motion coefficients of a segment are calculated by solving equation (18) using the back substitution algorithm.
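Back substitution solves an upper triangular system starting from the last row. A minimal pure-Python sketch, with a hypothetical function name and made-up numbers, is:

```python
def back_substitute(R, z):
    """Solve R c = z for c, where R is a square upper triangular matrix (list of rows)."""
    n = len(z)
    c = [0.0] * n
    for i in range(n - 1, -1, -1):            # last row first
        known = sum(R[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (z[i] - known) / R[i][i]
    return c

R = [[2.0, 1.0, 1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
z = [9.0, 13.0, 8.0]
c = back_substitute(R, z)   # [2.0, 3.0, 2.0]
```

The sketch assumes nonzero diagonal elements; a practical implementation would have to handle a zero or very small diagonal element.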

The pixel values of the reference frame are defined only for integer coordinates x and y. In the many cases where x or y is not an integer, the pixel value is calculated by bilinear interpolation of the nearest pixels with integer coordinates.
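Bilinear interpolation weights the four surrounding integer-coordinate pixels by their distance to the requested position. The sketch below assumes the frame is stored as a list of rows indexed as `frame[y][x]` and that the requested position lies inside the frame; the function name `bilinear` is hypothetical.

```python
import math

def bilinear(frame, x, y):
    """Interpolate frame[y][x] at a possibly non-integer position inside the frame."""
    x0, y0 = math.floor(x), math.floor(y)
    x1 = min(x0 + 1, len(frame[0]) - 1)       # clamp at the right border
    y1 = min(y0 + 1, len(frame) - 1)          # clamp at the bottom border
    fx, fy = x - x0, y - y0                   # fractional offsets in [0, 1)
    top = (1 - fx) * frame[y0][x0] + fx * frame[y0][x1]
    bottom = (1 - fx) * frame[y1][x0] + fx * frame[y1][x1]
    return (1 - fy) * top + fy * bottom

frame = [[0.0, 10.0],
         [20.0, 30.0]]
center = bilinear(frame, 0.5, 0.5)   # 15.0, the average of the four pixels
corner = bilinear(frame, 1, 0)       # 10.0, an exact pixel value
```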
This system can be implemented in various ways without departing from the scope of the present invention. For example, various linear motion models can be used in equation (3). Various methods can be used to linearize the term in equation (5). In addition, various criteria can be used to decide whether to merge two segments. The policy for deciding whether a given basis function should be removed from the model can vary. Triangularization of the matrices in equations (10) and (15) can be performed using various algorithms, and the final coefficients can be calculated by solving equation (18) using any of a number of known algorithms for solving systems of linear equations. Finally, various interpolation methods can be used to obtain the values of the reference frame at non-integer coordinates.

Claims

1. A video codec for coding a current frame $I_n$ of a video sequence, the video codec having a motion field coder for supplying a coded motion vector field, the motion field coder comprising:
a motion analyzer adapted to receive the current frame $I_n$, a reference frame, and a motion vector field,
the motion analyzer being adapted to determine a new representation of the motion vector field, the representation comprising, for each segment $S_i$ of a plurality of image segments of the current frame $I_n$, a linear motion model based on a set of $N + M$ motion coefficients represented by a vector $c_i$ and on a set of basis functions, where $N$ and $M$ are, respectively, the numbers of motion coefficients of the two components of the compressed motion vector field,
the motion analyzer comprising:
prediction error linearization means for approximating, for each segment $S_i$ of the current frame, a prediction error function based on that segment of the current frame and on a segment of the reference frame by the matrix expression $E_i c_i - y_i$, where $E_i$ is a matrix and $y_i$ is a vector,
wherein the prediction error function approximated by the motion analyzer for each segment of the current frame is:

where

and matrix processing means for generating, for each segment of the current frame, a first matrix $R_i^1$ and a first vector $z_i^1$ from the matrix expression, where $R_i^1 c_i = z_i^1$ and the number of rows of the first matrix $R_i^1$ is equal to the number of motion coefficients $N + M$,
wherein the matrix processing means includes QR factorization means for decomposing the matrix $E_i$ into the product of matrices $Q_i$ and $R_i$, for generating an auxiliary vector $z_i$, where $z_i = Q_i^T y_i$, and for forming $R_i^1$ from the first $N + M$ rows of the matrix $R_i$ and $z_i^1$ from the first $N + M$ elements of the vector $z_i$;
the motion field coder having segment merging means, which:
is coupled to the motion analyzer for receiving the first matrices $R_i^1$ and the first vectors $z_i^1$;
determines, for pairs $(S_i, S_j)$ of adjacent segments of the current frame $I_n$, a merged matrix $R_{ij}^1$ and a merged vector $z_{ij}^1$ from the respective pairs of first matrices $R_i^1$, $R_j^1$ and first vectors $z_i^1$, $z_j^1$; and
if the merged matrix $R_{ij}^1$ and the merged vector $z_{ij}^1$ yield a prediction error that is acceptable in light of a selected criterion, replaces the pair of first matrices and first vectors with the merged matrix $R_{ij}^1$ and the merged vector $z_{ij}^1$ so as to merge the corresponding pair of segments;
the motion field coder having coefficient removal means coupled to the segment merging means, which:
receives the merged matrices $R_{ij}^1$ and merged vectors $z_{ij}^1$ resulting from the replacement, together with the remaining first matrices $R_i^1$ and first vectors $z_i^1$;
determines, for each segment or merged segment of the current frame $I_n$, using the corresponding matrix $R_k^1$ and vector $z_k^1$, a reduced set of motion coefficients $c_k$ such that the change in the approximated prediction error function is acceptable in light of a selected criterion; and
is adapted to provide a coded version of the current frame $I_n$ comprising, for each segment or merged segment, the reduced number of motion coefficients and information on which coefficients have been omitted.

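2. The video codec of claim 1, wherein the segment merging means includes means for triangularizing, for each pair $(S_i, S_j)$ of adjacent segments, the matrix equation

$$\begin{bmatrix} R_i^1 \\ R_j^1 \end{bmatrix} c_{ij} = \begin{bmatrix} z_i^1 \\ z_j^1 \end{bmatrix}$$

into the following form:

$$\begin{bmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,N+M} \\ 0 & r_{2,2} & \cdots & r_{2,N+M} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & r_{N+M,N+M} \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} c_{ij} = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{N+M} \\ q_1 \\ \vdots \\ q_{N+M} \end{bmatrix}$$

where $c_{ij}$ is the set of $N + M$ motion coefficients of the merged pair of adjacent segments.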

3. The video codec of claim 2, wherein the segment merging means determines, for each pair $(S_i, S_j)$ of adjacent segments, whether the prediction error is acceptable in light of the selected criterion by calculating the change in the square prediction error $\Delta E_{ij}$ according to:

$$\Delta E_{ij} = \sum_{k=1}^{N+M} q_k^2$$

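4. The video codec of claim 2 or 3, wherein the segment merging means determines the merged matrix $R_{ij}^1$ and the merged vector $z_{ij}^1$ by taking the first $N + M$ rows of the triangularized matrix equation, as follows:

$$R_{ij}^1 = \begin{bmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,N+M} \\ 0 & r_{2,2} & \cdots & r_{2,N+M} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & r_{N+M,N+M} \end{bmatrix}, \quad z_{ij}^1 = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{N+M} \end{bmatrix}$$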

5. The video codec of any one of claims 1 to 4, wherein the coded version of the current frame $I_n$ contains an identifier of the merged segments.

6. The video codec of claim 1, wherein the coefficient removal means determines, for each segment, the reduced number of motion coefficients $c_k$ by selectively removing the i-th column from the matrix $R_k^1$ and the i-th element from the vector $z_k^1$, thereby providing a simplified matrix equation $R_k^1 c_k = z_k^1$.

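7. The video codec of claim 6, wherein the coefficient removal means triangularizes the simplified matrix equation into the following form:

$$\begin{bmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,N+M-1} \\ 0 & r_{2,2} & \cdots & r_{2,N+M-1} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & r_{N+M-1,N+M-1} \\ 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_{N+M-1} \end{bmatrix} = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{N+M-1} \\ q_i \end{bmatrix}$$

and calculates the change in the square prediction error according to:
$\Delta E_k = q_i^2$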

8. The video codec of claim 7, wherein the coefficient removal means is adapted, if the change in the square prediction error is acceptable in light of the selected criterion, to remove the motion coefficient $c_i$ from the coefficient matrix $c_k$ and to remove the bottom row of the triangularized matrix equation.

9. The video codec of any one of claims 1 to 8, wherein the coefficient removal means is adapted to determine the motion coefficients $c_k$ by solving the matrix equation $R_k^1 c_k = z_k^1$ or a simplified form of that equation.