JP4187264B2

JP4187264B2 - Video signal encoding method and apparatus using feature point based motion estimation technique

Info

Publication number: JP4187264B2
Application number: JP52829596A
Authority: JP
Inventors: ヘモックチュン; ミンサブリー
Original assignee: Daewoo Electronics Co Ltd
Current assignee: WiniaDaewoo Co Ltd
Priority date: 1995-03-18
Filing date: 1995-05-06
Publication date: 2008-11-26
Anticipated expiration: 2023-11-26
Also published as: NO974209L; PL177586B1; MX9707106A; BR9510541A; TW257924B; AU711311B2; CN1178057A; PL322302A1; CA2215386C; JPH11506576A; KR0181034B1; EP0815690B1; ATE207279T1; CN1144471C; FI973715A0; KR960036766A; DE69523340D1; US5978030A; NO974209D0; AU2420695A

Abstract

An apparatus for encoding a digital video signal so as to reduce its transmission rate comprises a feature point based motion compensation circuit. The circuit selects a set of feature points from the reconstructed reference frame, detects a set of motion vectors between a current frame and an original reference frame corresponding to the set of feature points by using feature point based motion estimation, and generates a second predicted frame based on the set of motion vectors and the reconstructed reference frame. The feature point based motion estimation employs a convergence process in which the displacement of each feature point is assigned to its motion vector and the six triangles of each hexagon are affine-transformed independently using the displacements of their vertex feature points. If a displacement provides a better PSNR, the motion vector of the subject feature point is sequentially updated. The convergence process is therefore very efficient in the matching process for determining a predicted image that is as close as possible to an original image containing zooming, rotation or scaling of objects.

Description

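[0001]
Technical field of the invention
The present invention relates to a video signal encoding apparatus and, more particularly, to a digital video signal encoding apparatus capable of effectively reducing the transmission rate of a high-quality digital video signal by encoding the signal using an improved feature point-based motion estimation technique.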
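[0002]
Background art
As is well known, transmitting a video signal in digital form can maintain better picture quality than transmitting it in analog form. When a video signal consisting of a sequence of video "frames" is expressed in digital form, a large amount of digital data is needed for transmission, especially in a high-definition television (HDTV) system. However, because the available bandwidth of a conventional transmission channel is limited, the data must be compressed or reduced before a large amount of digital data can be sent over that channel. Among the various video compression techniques, the so-called hybrid coding technique, which combines statistical coding with temporal and spatial compression, is known to be the most efficient.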
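[0003]
Most hybrid coding techniques employ motion-compensated DPCM (differential pulse code modulation), a two-dimensional DCT (discrete cosine transform), quantization of the DCT coefficients, and VLC (variable length coding). Motion-compensated DPCM is a process that estimates the motion of an object between the current frame and its previous or subsequent frame (i.e., the reference frame), predicts the current frame according to that motion, and produces a difference signal representing the difference between the current frame and its prediction. This technique is described, for example, in Staffan Ericsson, "Fixed and Adaptive Predictors for Hybrid Predictive/Transform Coding", IEEE Transactions on Communications, COM-33, No. 12 (December 1985), and in Ninomiya and Ohtsuka, "A Motion-Compensated Interframe Coding Scheme for Television Pictures", IEEE Transactions on Communications, COM-30, No. 1 (January 1982).
[0004]
The two-dimensional DCT, which removes or reduces spatial redundancy in the video data, converts a block of digital video data (e.g., a block of 8 x 8 pixels) into a set of transform coefficient data. This technique is described, for example, in Chen and Pratt, "Scene Adaptive Coder", IEEE Transactions on Communications, COM-32, No. 3 (March 1984). By processing such transform coefficient data with a quantizer, zigzag scanning and VLC, the amount of data to be transmitted can be compressed effectively.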
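[0005]
More specifically, in motion-compensated DPCM the current frame data is predicted from the corresponding reference frame data on the basis of an estimate of the motion between the current frame and its reference frame. The estimated motion is expressed by two-dimensional motion vectors representing the displacement of pixels between the reference frame and the current frame.
[0006]
There are basically two approaches to estimating the displacement of the pixels of an object: block-based estimation and pixel-based estimation.
[0007]
In block-based motion estimation, a block in the current frame is compared with blocks in the reference frame until the best match is found. From this, an inter-frame displacement vector for the whole block (indicating how far the block of pixels has moved between frames) can be estimated for the current frame being transmitted.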
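[0008]
Such a block matching technique can be used to predict the P and B frames contained in a video sequence, as disclosed in ITU Telecommunication Standardization Sector Study Group 15, Working Party 15/1 Experts Group on Very Low BitRate Visual Telephony, "Video Codec Test Model, TMN4 Rev1" (October 25, 1994). Here, a P or predicted frame is predicted from the previous frame (as its reference frame), and a B or bidirectionally predicted frame is predicted from the previous and subsequent frames (as its reference frames). In coding a so-called B frame, bidirectional motion estimation is used to derive forward and backward displacement vectors. The forward displacement vector is obtained by estimating the motion of an object between the B frame and the previous intra (I) or predicted (P) frame (as the reference frame), and the backward displacement vector is obtained from the B frame and the subsequent I or P frame (as the reference frame).
[0009]
In block-based motion estimation, however, blocking effects at block boundaries may arise during motion compensation, and if the pixels in a block do not all move in the same way, accurate motion is hard to estimate, which degrades the overall picture quality.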
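[0010]
With pixel-based motion estimation, on the other hand, a displacement is determined for every pixel. This approach estimates pixel values more accurately and easily handles scale changes (e.g., zooming, which is motion perpendicular to the image plane). But because a motion vector is determined for every pixel, it is virtually impossible to transmit all the motion vector data to a receiver.
[0011]
One way of overcoming the excessive transmission data produced by pixel-based motion estimation is feature point-based motion estimation.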
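[0012]
In feature point-based motion estimation, motion vectors for a set of selected pixels, i.e., feature points, are transmitted to the receiver. Each feature point is defined as a pixel capable of representing its neighboring pixels, so that the receiver can recover or approximate the motion vectors of the non-feature points from those of the feature points. As disclosed in the commonly owned, copending U.S. patent application Ser. No. 08/367,520, entitled "Method and Apparatus for Encoding a Video Signal Using Pixel-by-Pixel Motion Estimation", an encoder that adopts feature point-based motion estimation first selects a number of feature points from all the pixels contained in the previous frame. Motion vectors for the selected feature points are then determined, each representing the spatial displacement between a feature point in the previous frame and the corresponding matching point, i.e., the most similar pixel, in the current frame. Specifically, the matching point for each feature point is searched for in a search region within the current frame using a known block matching algorithm, where a feature point block is defined as a block surrounding the selected feature point and the search region is defined as a region within a predetermined area surrounding the position of the corresponding feature point.
[0013]
In this case it is most desirable to find the single best-matching feature point block over the entire search region corresponding to the selected feature point. In matching feature points, however, several equally good best-matching feature point blocks may be found. It is therefore difficult to detect the motion vector correctly for a feature point that exhibits such a correlation between its feature point block and the corresponding search region. Moreover, if the search region in the current frame is not determined according to the spatial displacement between the feature point in the reference frame and the corresponding matching point (i.e., the most similar pixel) in the current frame, accurate motion is hard to estimate and the overall picture quality deteriorates.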
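[0014]
Disclosure of the invention
It is therefore an object of the present invention to provide a digital video signal encoding method that can efficiently reduce the transmission rate of a high-quality digital video signal by effectively estimating the motion vectors of feature points.
[0015]
Another object of the present invention is to provide a digital video signal encoding apparatus, for use in a video signal encoding system, that effectively estimates motion vectors using feature point-based motion estimation and thereby efficiently reduces the transmission rate of a high-quality digital video signal.
[0016]
A further object of the present invention is to provide a video signal encoding system that can effectively improve the overall picture quality by selectively using feature point-based motion estimation and block-based motion estimation.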
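[0017]
To achieve the above objects, according to one preferred embodiment of the present invention there is provided a motion vector estimation method that uses a feature point-based motion estimation technique to estimate a set of motion vectors between a current frame and a reference frame of a video signal, the reference frame comprising a reconstructed reference frame and an original reference frame, the method comprising: (a) selecting, from the pixels contained in the reconstructed reference frame, a set of feature points that forms a polygonal grid having a plurality of overlapping polygons; (b) determining a set of quasi-feature points at positions on the current frame identical to the positions of the set of feature points; (c) setting all components of an initial motion vector for each quasi-feature point to zero; (d) selecting from the quasi-feature points a subject quasi-feature point having N neighboring quasi-feature points, the N neighboring quasi-feature points forming a subject current polygon with N sides; (e) generating a plurality of candidate motion vectors, each representing a displacement of the subject quasi-feature point to one of the pixels within a predetermined region; (f) determining, for every pixel contained in the subject current polygon, a predicted position on the original reference frame corresponding to each of a plurality of updated initial motion vectors, on the basis of the updated initial motion vectors, which are obtained from the respective candidate motion vectors and the initial motion vector of the subject quasi-feature point, and of the initial motion vectors of the N neighboring quasi-feature points; (g) deriving a predicted value for each pixel contained in the subject current polygon from the pixel value at the corresponding predicted position on the original reference frame, thereby forming as many predicted subject current polygons as there are updated initial motion vectors; (h) calculating the difference between the pixel values of the subject current polygon and those of each predicted subject current polygon, thereby producing as many peak signal-to-noise ratios (PSNRs) as there are updated initial motion vectors; (i) updating the initial motion vector of the subject quasi-feature point with the updated initial motion vector that corresponds to the predicted subject current polygon having the maximum PSNR; (j) selecting a quasi-feature point adjacent to the subject quasi-feature point as a new subject quasi-feature point and repeating steps (d) to (i) until the initial motion vectors of all the quasi-feature points concerned have been updated; and (k) repeating step (j) until the repetition has been carried out a predetermined number of times.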
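[0018]
According to another preferred embodiment of the present invention, there is provided a motion vector estimation apparatus, for use in a video signal encoding system, that uses a feature point-based motion estimation technique to estimate a set of motion vectors between a current frame and a reference frame of a video signal, the reference frame comprising a reconstructed reference frame and an original reference frame, the apparatus comprising: first selection means for selecting, from the pixels contained in the reconstructed reference frame, a set of feature points that forms a polygonal grid having a plurality of overlapping polygons; quasi-feature point determining means for determining a set of quasi-feature points at positions on the current frame identical to the positions of the set of feature points; storage means for storing a set of initial motion vectors for the set of quasi-feature points, each component of which is set to zero; second selection means for selecting from the quasi-feature points a subject quasi-feature point having N neighboring quasi-feature points, the N neighboring quasi-feature points forming a subject current polygon with N sides; adding means for generating a plurality of candidate motion vectors, each representing a displacement of the subject quasi-feature point to one of the pixels within a predetermined region; predicted position determining means for determining, for every pixel contained in the subject current polygon, a predicted position on the original reference frame corresponding to each of a plurality of updated initial motion vectors, on the basis of the updated initial motion vectors, which are obtained from the respective candidate motion vectors and the initial motion vector of the subject quasi-feature point, and of the initial motion vectors of the N neighboring quasi-feature points; predicted pixel generating means for deriving a predicted value for each pixel contained in the subject current polygon from the pixel value at the corresponding predicted position on the original reference frame, thereby forming as many predicted subject current polygons as there are updated initial motion vectors; difference calculating means for calculating the difference between the pixel values of the subject current polygon and those of each predicted subject current polygon, thereby producing as many peak signal-to-noise ratios (PSNRs) as there are updated initial motion vectors; third selection means for updating the initial motion vector of the subject quasi-feature point with the updated initial motion vector that corresponds to the predicted subject current polygon having the maximum PSNR; and motion vector retrieval means for retrieving the set of initial motion vectors from the storage means as the set of motion vectors once all the initial motion vectors have been updated a predetermined number of times.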
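[0019]
Embodiments of the invention
Preferred embodiments of the present invention are described in more detail below with reference to the drawings.
[0020]
FIG. 1 is a block diagram of a video signal encoding system according to the present invention. The encoding system comprises a frame rearrangement unit 101, a subtraction unit 102, a video signal encoder 105, a video signal decoder 113, an addition unit 115, a first frame storage unit 120, a second frame storage unit 130, an entropy encoding unit 107 and a motion compensation unit 150.
[0021]
As shown in FIG. 2, the input digital video signal has two frame (or picture) sequences. The first frame sequence has one intra (I) frame I1, three bidirectionally predicted frames B1, B2, B3 and three predicted frames P1, P2, P3. The second frame sequence has one intra (I) frame I1, three forward predicted frames F1, F2, F3 and three predicted frames P1, P2, P3. Accordingly, the video signal encoding system has two sequence coding modes: a first sequence coding mode and a second sequence coding mode.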
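[0022]
In the first sequence coding mode, line L17 is connected to line L11 by the first switch 103, and the first frame sequence, consisting of I1, B1, P1, B2, P2, B3 and P3, is supplied through the first switch 103 to the frame rearrangement circuit 101. The frame rearrangement circuit 101 rearranges the input sequence into a digital video signal ordered, for example, as I1, P1, B1, P2, B2, P3, B3, so that the bidirectionally predicted frame signals for the B frames can be derived. The rearranged digital video signal is then supplied to the second switch 104a via line L18, to the first frame storage unit 120 via line L12, and to the motion compensation unit 150 via line L1.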
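The rearrangement described above can be illustrated with a short sketch. It assumes that frames are represented by label strings such as "I1" or "B1" and that each B frame is simply deferred until the next I or P frame has been emitted; the function name `reorder_for_bidirectional` is illustrative and does not come from the patent.

```python
def reorder_for_bidirectional(frames):
    """Move every B frame behind the reference frame that follows it.

    frames: list of labels in display order, e.g. ["I1", "B1", "P1"].
    Returns the labels in an order in which both references of a B frame
    precede it.
    """
    out = []
    pending_b = []
    for label in frames:
        if label.startswith("B"):
            pending_b.append(label)   # hold back until the next reference frame
        else:
            out.append(label)         # I, P (or F) frames go out immediately
            out.extend(pending_b)     # then the B frames that were waiting
            pending_b = []
    out.extend(pending_b)             # trailing B frames, if any
    return out
```

Under these assumptions the sequence I1, B1, P1, B2, P2, B3, P3 comes out as I1, P1, B1, P2, B2, P3, B3, while a sequence without B frames passes through unchanged.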
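[0023]
In the second sequence coding mode, line L17 is connected to line L11 by the first switch 103, and the second frame sequence, consisting of I1, F1, P1, F2, P2, F3 and P3, is supplied through the first switch 103 to the first frame storage unit 120 via line L12, to the motion compensation unit 150 via line L1, and to the second switch 104a via line L18. The first switch 103 is driven by a sequence mode control signal CS1 from a conventional system controller, for example a microprocessor (not shown). As can be seen from the above, the first sequence coding mode incurs a delay due to rearrangement, so the second sequence coding mode can be used efficiently as a low-delay mode in applications such as videophone and teleconferencing.
[0024]
As shown in FIG. 1, the video signal encoding system includes a second switch 104a and a third switch 104b, which are used to selectively perform two frame coding modes: an inter-frame coding mode and an intra-frame coding mode. As is well known, the second switch 104a and the third switch 104b are driven simultaneously by a frame mode control signal CS2 from the system controller.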
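[0025]
In the intra-frame coding mode, the intra frame I1 is supplied directly to the video signal encoder 105 via line L14 as the current frame signal. The current frame signal is encoded into a set of quantized transform coefficients using, for example, a discrete cosine transform (DCT) and any of the known quantization methods. The intra frame I1 is also stored as the original reference frame in frame memory 121 of the first frame storage unit 120, which comprises three frame memories 121, 122 and 123 connected to the motion compensation unit 150 via lines L2, L3 and L4, respectively. The quantized transform coefficients are then supplied to the entropy encoding unit 107 and to the video signal decoder 113. In the entropy encoding unit 107, the quantized transform coefficients from the video signal encoder 105 are coded together using, for example, a variable length coding technique, and passed to a transmitter (not shown) for transmission.
[0026]
Meanwhile, the video signal decoder 113 converts the quantized transform coefficients from the video signal encoder 105 back into a reconstructed intra frame signal using inverse quantization and an inverse discrete cosine transform. The reconstructed intra frame signal from the video signal decoder 113 is then stored as the reconstructed reference frame in frame memory 131 of the second frame storage unit 130, which comprises three frame memories 131, 132 and 133 connected to the motion compensation unit 150 via lines L'2, L'3 and L'4, respectively.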
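[0027]
In the inter coding mode, an inter frame (for example, predicted frame P1, a bidirectionally predicted frame, or forward predicted frame F1) is supplied as the current frame signal to the motion compensation unit 150 and the subtraction unit 102, and is stored in frame memory 121 of the first frame storage unit 120. Here, the so-called inter frames comprise the bidirectionally predicted frames B1, B2, B3, the predicted frames P1, P2, P3 and the forward predicted frames F1, F2, F3. The original reference frame previously stored in frame memory 121 is then supplied to the motion compensation unit 150 via line L2 and is shifted to, or stored in, frame memory 122. As described later, the motion compensation unit 150 has a block-based motion compensation channel and a feature point-based motion compensation channel.
[0028]
When the current frame is predicted frame P1, the current frame signal on line L1 and the reconstructed reference frame signal on line L'1 from frame memory 131 of the second frame storage unit 130 are processed through the block-based motion compensation channel, which predicts the current frame so as to generate a predicted current frame signal on line L30 and a set of motion vectors on line L20. When the current frame is forward predicted frame F1 (or bidirectionally predicted frame B1), the current frame signal on line L1, the original reference frame signal from the first frame storage unit 120 on one of lines L2, L3 and L4, and the reconstructed reference frame signal from the second frame storage unit 130 on one of lines L'2, L'3 and L'4 are processed through the feature point-based motion compensation channel, which predicts the current frame so as to generate a predicted current frame signal on line L30 and a set of motion vectors on line L20. The motion compensation unit 150 is described in detail with reference to FIG. 3.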
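[0029]
In the subtraction unit 102, the difference between the predicted current frame signal on line L30 and the current frame signal on line L15 is computed, and the resulting data, i.e., an error signal representing the differential pixel values, is fed to the video signal encoder 105. There the error signal is encoded into a set of quantized transform coefficients using, for example, a DCT and any of the known quantization methods; that is, the error obtained as the difference between the current frame and the predicted current frame is DCT-coded. In this case, the quantizer step size is set to a large value so that only the severely distorted regions caused by incorrectly estimated motion vectors are compensated.
[0030]
The quantized transform coefficients are then supplied to the entropy encoding unit 107 and to the video signal decoder 113. In the entropy encoding unit 107, the quantized transform coefficients from the video signal encoder 105 and the motion vectors transmitted from the motion compensation unit 150 via line L20 are coded together using, for example, a variable length coding technique, and passed to a transmitter (not shown) for transmission.
[0031]
Meanwhile, the video signal decoder 113 converts the quantized transform coefficients from the video signal encoder 105 back into a reconstructed error signal using inverse quantization and an inverse discrete cosine transform.
[0032]
The reconstructed error signal from the video signal decoder 113 and the predicted current frame signal supplied from the motion compensation unit 150 via line L16 are combined in the addition unit 115 by way of the third switch 104b, thereby providing, via line L'1, a reconstructed reference frame signal to be stored as the previous frame in the second frame storage unit 130.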
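[0033]
The second frame storage unit 130 comprises, for example, three frame memories 131, 132 and 133 connected in series, as shown in FIG. 1. That is, a reconstructed frame signal from the addition unit 115 is first stored in, for example, frame memory 131 and supplied to the motion compensation unit 150 via line L'2; when the next reconstructed frame signal from the addition unit 115 is input to frame memory 131, the stored frame is shifted, frame by frame, into frame memory 132. This process is repeated sequentially for as long as the video encoding operation continues.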
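[0034]
FIG. 2 illustrates the first and second frame sequences described above. As shown, when the current frame is predicted frame P1, a set of motion vectors SMV1 is obtained block by block, using the reconstructed intra frame I1 retrieved from the second frame storage unit 130 as the reference frame. Similarly, the sets of motion vectors SMV2 and SMV3 for current frames P2 and P3 are obtained using reference frames P1 and P2, respectively.
[0035]
When the current frame is bidirectionally predicted frame B1, a set of forward motion vectors FMV1 is obtained from the feature points, using the reconstructed reference frame I1 retrieved from the second frame storage unit 130 and the original reference frame I1 retrieved from the first frame storage unit 120. Similarly, a set of backward motion vectors BMV1 for current frame B1 is obtained using the original reference frame P1 and the reconstructed reference frame P1. The video signal encoding system then selects between the set of forward motion vectors FMV1 and the set of backward motion vectors BMV1 and transmits the corresponding motion vectors.
[0036]
When the current frame is forward predicted frame F1, a set of forward motion vectors FMV2 is obtained from the feature points, using the original reference frame I1 retrieved from the first frame storage unit 120 and the reconstructed reference frame F1 retrieved from the second frame storage unit 130.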
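[0037]
As described above, for motion estimation and compensation, the frames in the first and second frame sequences are arranged in the first and second frame storage units 120 and 130 as shown in Table I and Table II below.
[0038]
[Table 1]

[0039]
[Table 2]

[0040]
where I1: frame used for forward motion estimation
P1, P2: frames used for backward motion estimation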
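[0041]
As described above, the predicted frames P1, P2 and P3 are reconstructed using a DCT-based predictive coding technique with block-based motion estimation (the so-called TMN4), while the intervening frames, i.e., the bidirectionally predicted frames B1, B2, B3 or the forward predicted frames F1, F2, F3, are reconstructed using the improved feature point-based motion-compensated discrete cosine transform (MC-DCT) technique of the present invention.
[0042]
FIG. 3 is a detailed block diagram of the motion compensation unit 150 shown in FIG. 1. As shown in FIG. 3, the motion compensation unit 150 comprises three input selectors 154, 155 and 156, a block-based motion compensation unit 151, a first feature point-based motion compensation unit 152, a second feature point-based motion compensation unit 153, and two output selectors 157 and 158.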
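[0043]
The block-based motion compensation unit 151, which uses a conventional block matching algorithm, detects a set of motion vectors for each of the predicted frames P1, P2 and P3 and generates a predicted current frame for the corresponding predicted frame. Thus, as shown in Tables I and II, when predicted frame P1 is supplied to the block-based motion compensation unit 151 as the current frame, the input selector 154 supplies the reconstructed intra frame I1 on line L'2 to the block-based motion compensation unit 151 as the reference frame. In the block-based motion compensation unit 151, the set of motion vectors is estimated and the predicted current frame signal is constructed from that estimate. The set of motion vectors and the predicted current frame signal are then supplied to the entropy encoding unit 107 and the subtraction unit 102 via the output selectors 157 and 158 on lines L20 and L30, respectively.
[0044]
The first feature point-based motion compensation unit 152, which uses an affine transform, detects a set of forward-estimated motion vectors for each of the bidirectionally predicted frames B1, B2, B3 or the forward predicted frames F1, F2, F3, and generates a predicted current frame for the corresponding bidirectionally or forward predicted frame. Thus, when bidirectionally predicted frame B1 on line L1 is supplied to the first feature point-based motion compensation unit 152 as the current frame, the input selector 155 supplies the original intra frame I1 on line L2 to the unit 152 as the original reference frame, as shown in Table I. The input selector 156 supplies the reconstructed intra frame I1 on line L'2 to the unit 152 as the reconstructed reference frame, from which the predicted frame is generated. In the first feature point-based motion compensation unit 152, the set of forward-estimated motion vectors is estimated using the reconstructed reference frame and the original reference frame, and the predicted current frame signal is constructed using the reconstructed reference frame. The set of forward-estimated motion vectors and the predicted current frame signal are then output via the output selectors 157 and 158 on lines L20 and L30, respectively. The output selectors 157 and 158 are controlled by control signals CS5 and CS6 from the system controller (not shown).
[0045]
The second feature point-based motion compensation unit 153 uses the affine transform described later to detect a set of backward-estimated motion vectors for each of the bidirectionally predicted frames B1, B2 and B3, and generates a predicted current frame for the corresponding bidirectionally predicted frame. Thus, when bidirectionally predicted frame B1 is supplied to the second feature point-based motion compensation unit 153 as the current frame, the original predicted frame P1 on line L2 is supplied to the unit 153 as the original reference frame, and the reconstructed predicted frame P1 on line L'2 is supplied to it as the reconstructed reference frame. In the unit 153, the set of backward-estimated motion vectors is obtained using the reconstructed reference frame and the original reference frame, and the predicted current frame signal is obtained using the reconstructed reference frame. The set of backward-estimated motion vectors and the predicted current frame signal are then supplied to the output selectors 157 and 158 on lines L20 and L30, respectively.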
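[0046]
FIG. 4 is a detailed block diagram of the feature point-based motion compensation unit shown in FIG. 3. The reconstructed reference frame on line L'2 from the second frame storage unit 130 is input to a feature point selection unit 210, which generates a set of feature points, and to a motion compensation unit 240.
[0047]
The set of feature points is then supplied to a motion vector search unit 230 and to the motion compensation unit 240. The motion vector search unit 230 receives the original reference frame and the current frame and generates a set of motion vectors for the set of feature points. This set of motion vectors is passed to the motion compensation unit 240, which generates a predicted current frame based on the set of motion vectors and the set of feature points.
[0048]
In the feature point selection unit 210, the set of feature points is selected from the pixels contained in the reconstructed reference frame, each feature point being defined by the position of a pixel. FIGS. 5(A) and 5(B) show examples of the current frame and the reconstructed reference frame.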
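[0049]
FIGS. 6(A) to 6(E) are schematic diagrams illustrating the feature point selection process of the present invention. As shown in FIG. 6(A), edges are detected in the reconstructed reference frame P(x, y) of FIG. 5(B) using the well-known Sobel edge detector (see, e.g., A. K. Jain, "Fundamentals of Digital Image Processing", 1989, Prentice-Hall International). The output of the Sobel operator, $|\nabla P(x, y)|$, is compared with a predetermined threshold Te, which is preferably set to 6 in the present invention. If the output $|\nabla P(x, y)|$ is smaller than Te, it is set to 0; otherwise it is left unchanged. The edge image signal eg(x, y) of FIG. 6(A) is therefore defined as follows:

$$eg(x, y) = \begin{cases} 0, & |\nabla P(x, y)| < Te \\ |\nabla P(x, y)|, & \text{otherwise} \end{cases} \qquad \text{Eq. (1)}$$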
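A minimal sketch of the thresholded edge map follows. It assumes that the Sobel output is taken as the sum of the absolute horizontal and vertical responses (the text does not say which magnitude is used), that the image is a list of rows of gray levels, and that border pixels are left at 0. The function name `sobel_edge_map` is illustrative.

```python
def sobel_edge_map(img, te=6):
    """Return the edge map: Sobel magnitude where it is >= te, else 0.

    img: list of rows of gray levels, indexed as img[y][x].
    te:  threshold (6 is the value the text calls preferable).
    """
    h, w = len(img), len(img[0])
    eg = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical 3x3 Sobel responses.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            mag = abs(gx) + abs(gy)          # assumed magnitude measure
            eg[y][x] = mag if mag >= te else 0
    return eg
```

With this convention a weak gradient whose magnitude stays below 6 produces an all-zero edge map, so flat or slightly noisy areas contribute no edge points.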

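[0050]
In the preferred embodiment of the present invention, the feature points are determined using a grid technique that employs a hexagonal grid having a plurality of overlapping hexagons, as shown in FIG. 6(B). As shown in FIG. 6(C), a hexagon 610 is defined by the line segments connecting seven grid points 611 to 617. The grid point 617 contained in the hexagon 610 is surrounded by more neighboring grid points (611 to 616) than it would be in a square grid, which allows the feature points to be organized more effectively. The hexagon 610 comprises six non-overlapping triangles 621 to 626, and the grid points 611 to 617 are the vertices of those triangles. The resolution of the hexagon 610 is defined by lines HH and HV, which are preferably set to 13 and 10, respectively.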
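[0051]
Referring to FIG. 6(D), a non-overlapping search range (e.g., SR1 to SR4) is set for each grid point (e.g., G1 to G4). An edge point (e.g., E7) located in search range SR1 becomes the feature point for the grid point (e.g., G1) if the sum of the eight pixels surrounding that edge point is the maximum. The feature point Di is therefore given by:

$$D_i = \left\{ (x, y) \;\middle|\; \max \sum_{k=-1}^{1} \sum_{l=-1}^{1} EG(x+k,\, y+l) \right\} \qquad \text{Eq. (2)}$$

where

$$\sum_{x=0}^{I} \sum_{y=0}^{J} Z(x, y) = Z(0,0) + Z(0,1) + \dots + Z(0,J) + Z(1,0) + \dots + Z(1,J) + \dots + Z(I,0) + \dots + Z(I,J)$$

EG: value of an edge point within the search range
i: a positive integer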
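The selection rule of Eq. (2) can be sketched as follows. The sketch sums the 3 x 3 neighborhood exactly as the formula is written (centre included), breaks ties by distance to the grid point, and falls back to the grid point itself when the search range holds no edge point. The rectangular search range, the function name and the argument layout are assumptions made for illustration.

```python
def select_feature_point(eg, grid_pt, search_range):
    """Pick the feature point for one grid point.

    eg:           edge map indexed as eg[y][x] (0 means "not an edge point").
    grid_pt:      (x, y) of the grid point.
    search_range: (x0, y0, x1, y1), inclusive rectangle (an assumed shape).
    """
    h, w = len(eg), len(eg[0])

    def local_sum(x, y):
        # Sum of EG(x+k, y+l) for k, l in {-1, 0, 1}, clipped at the border.
        return sum(eg[y + l][x + k]
                   for k in (-1, 0, 1) for l in (-1, 0, 1)
                   if 0 <= x + k < w and 0 <= y + l < h)

    x0, y0, x1, y1 = search_range
    best_key, best_pt = None, None
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if eg[y][x] == 0:
                continue                       # only edge points are candidates
            dist2 = (x - grid_pt[0]) ** 2 + (y - grid_pt[1]) ** 2
            key = (-local_sum(x, y), dist2)    # max sum first, then nearest
            if best_key is None or key < best_key:
                best_key, best_pt = key, (x, y)
    return grid_pt if best_pt is None else best_pt
```

Encoding the two criteria as one tuple keeps the tie-break explicit: a larger neighborhood sum always wins, and distance to the grid point only matters among equal sums.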
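[0052]
The set of feature points is determined using Eq. (2). It comprises: grid points that coincide with an edge point; edge points that lie within a non-overlapping search range SRi and have the maximum sum of surrounding pixel values; and grid points whose non-overlapping search range contains no edge point.
[0053]
If more than one edge point has the same maximum sum, the edge point nearest to the grid point is selected as the feature point.
[0054]
Once the set of feature points has been determined, the hexagonal grid of FIG. 6(B) is deformed into the hexagonal feature point grid shown in FIG. 6(E). After the hexagonal feature point grid has been determined, the set of feature points is supplied to the motion vector search unit 230 of FIG. 4, which detects the set of motion vectors. According to the present invention, a convergence process using an affine transform is employed to search for the set of motion vectors.
[0055]
FIGS. 7(A) and 7(B) are schematic diagrams illustrating the motion vector search process of the present invention. A set of quasi-feature points is determined in the current frame using the set of feature points: each feature point of the reconstructed reference frame is mapped to the corresponding quasi-feature point in the current frame. The initial motion vector for each quasi-feature point (e.g., D1 to D30) is set to (0, 0).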
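[0056]
Then, when a quasi-feature point (e.g., D7) is assigned as the subject quasi-feature point to be processed for estimating its motion vector, a subject current polygon 700 is used in the convergence process. The subject current polygon 700 is defined by the line segments connecting the subject quasi-feature point D7 with the neighboring quasi-feature points that surround it (e.g., D1 to D6). The subject current polygon 700 comprises six non-overlapping triangles 701 to 706, with the subject quasi-feature point located at their common vertex.
[0057]
A predetermined number of candidate motion vectors are then added, in turn, to the initial motion vector of the quasi-feature point D7. The candidate motion vectors are preferably chosen within a range of 0 to ±7 horizontally and vertically; a candidate motion vector such as D7Y1 is not allowed because it would invert the triangle 701. A candidate motion vector D7X1 is added to the initial motion vector of the subject quasi-feature point D7, without changing the initial motion vectors of its six neighboring feature points D1 to D6, to produce an updated initial motion vector D7D'7. The updated initial motion vector D7D'7 thus represents the displacement between the subject quasi-feature point D7 and a candidate quasi-feature point D'7.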
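One way to detect a candidate that would invert a triangle is to compare the signed area of each triangle before and after the centre vertex is displaced. The sketch below is only an illustration of that idea; the patent does not describe how the check is implemented, and the function names and the ordered-ring representation of the neighbors are assumptions.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle (a, b, c); the sign gives orientation."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def keeps_orientation(center, displaced_center, ring):
    """True if no triangle of the polygon flips when the centre moves.

    center:           current (x, y) of the subject point.
    displaced_center: (x, y) of the subject point after adding a candidate.
    ring:             neighboring points listed in order around the centre.
    """
    n = len(ring)
    for i in range(n):
        a, b = ring[i], ring[(i + 1) % n]
        before = signed_area2(a, b, center)
        after = signed_area2(a, b, displaced_center)
        if before * after <= 0:      # flipped or collapsed triangle
            return False
    return True
```

A candidate that keeps the displaced point inside the ring of neighbors passes the test, while one that pushes it across an outer edge fails it.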
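[0058]
A predicted position on the original reference frame is determined for each pixel contained in the subject current polygon 700, using the updated initial motion vector and the initial motion vectors of the neighboring quasi-feature points.
[0059]
Each pixel position contained in the subject current polygon 700 is then interpolated from the pixel values of the original reference frame at the corresponding predicted position, forming a predicted subject current polygon. In the preferred embodiment of the present invention, this process is carried out by the well-known affine transform on each triangle (e.g., 701), which has three feature points (e.g., D1, D2, D7) as its vertices. The affine transform is defined as follows:
[Equation 1]

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ d & e \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} c \\ f \end{bmatrix}$$

where (x, y): x and y coordinates of a pixel in the predicted subject current polygon
(x', y'): predicted position on the original reference frame
a to f: affine transform coefficients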
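[0060]
The six mapping parameters a, b, c, d, e and f are uniquely determined by the motion vectors of the three quasi-feature points (e.g., D1, D2, D7). Once the affine transform coefficients are known, each of the remaining pixels in the triangle 701 can be mapped onto a position in the original reference frame. Because the predicted position (x', y') in the original reference frame is in most cases not a pair of integers, the interpolated gray level at (x', y') is obtained using the well-known bilinear interpolation technique. The affine mapping process is applied to each of the triangles 701 to 706 independently, yielding the predicted subject current polygon for the candidate motion vector.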
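The two operations described above, solving for the six coefficients from three vertex correspondences and sampling the reference frame at a non-integer position, can be sketched as follows. The sketch assumes the convention x' = a*x + b*y + c, y' = d*x + e*y + f, takes each target vertex as the source vertex plus its motion vector, and clamps sample positions to the frame; the function names are illustrative.

```python
import math


def affine_from_triangle(src, dst):
    """Solve x' = a*x + b*y + c, y' = d*x + e*y + f from three point pairs.

    src: three (x, y) vertices in the current frame.
    dst: the three corresponding (x', y') positions in the reference frame.
    """
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if det == 0:
        raise ValueError("degenerate triangle")

    def solve(u1, u2, u3):
        # Cramer's rule for p*x + q*y + r = u at the three vertices.
        p = (u1 * (y2 - y3) + u2 * (y3 - y1) + u3 * (y1 - y2)) / det
        q = (u1 * (x3 - x2) + u2 * (x1 - x3) + u3 * (x2 - x1)) / det
        r = (u1 * (x2 * y3 - x3 * y2) + u2 * (x3 * y1 - x1 * y3)
             + u3 * (x1 * y2 - x2 * y1)) / det
        return p, q, r

    a, b, c = solve(dst[0][0], dst[1][0], dst[2][0])
    d, e, f = solve(dst[0][1], dst[1][1], dst[2][1])
    return a, b, c, d, e, f


def bilinear(img, x, y):
    """Gray level of img (indexed img[y][x]) at a non-integer position."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)      # clamp to the frame (an assumption)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy
```

When all three vertices share the same motion vector the solved transform reduces to a pure translation, which is a convenient sanity check for the solver.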
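[0061]
The predicted subject current hexagon is then compared with the current hexagon 700 to check whether the peak signal-to-noise ratio (PSNR) between the predicted subject current hexagon and the current hexagon has increased. If so, the initial motion vector (0, 0) of the subject quasi-feature point D7 is replaced by the updated initial motion vector D7D'7.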
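For reference, a PSNR comparison of this kind can be sketched as below. The sketch assumes 8-bit pixels (peak value 255, which the text does not state) and takes the pixels of the current and predicted polygons as two flat lists of equal length; the function name is illustrative.

```python
import math


def psnr(original, predicted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally long pixel lists."""
    if len(original) != len(predicted) or not original:
        raise ValueError("pixel lists must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, predicted)) / len(original)
    if mse == 0:
        return math.inf                 # identical polygons
    return 10.0 * math.log10(peak * peak / mse)
```

A candidate motion vector would then be kept only if the PSNR it yields exceeds the best value seen so far for the same polygon.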
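[0062]
This process is repeated for the remaining candidate motion vectors and, in a first pass, is carried out for all the quasi-feature points contained in the current frame.
[0063]
Referring to FIG. 7(B), assume that the first pass has been completed. The quasi-feature point D7 is set as the subject quasi-feature point, and the updated initial motion vectors of the neighboring quasi-feature points D1 to D6 are D1D'1, D2D'2, D3D'3, D4D'4, D5D'5 and D6D'6. In the same manner, the predetermined candidate motion vectors are added, in turn, to the initial motion vector D7D'7 of the subject quasi-feature point. For example, a candidate motion vector D'7X2 is added to the initial motion vector D7D'7 of the subject quasi-feature point without changing the motion vectors D1D'1, D2D'2, D3D'3, D4D'4, D5D'5 and D6D'6 of its six neighboring feature points, so that the updated initial motion vector becomes D7X2. As before, the candidate motion vectors are preferably chosen within a range of 0 to ±7 horizontally and vertically, but a candidate motion vector such as D7Y2 is not allowed because it would invert the triangle 701.
[0064]
A predicted position on the original reference frame is determined for each pixel contained in the subject current polygon 700, using the updated motion vector D7X2 and the initial motion vectors D1D'1, D2D'2, D3D'3, D4D'4, D5D'5 and D6D'6 of the neighboring quasi-feature points. Each pixel position contained in the subject current polygon 700 is then interpolated from the pixel values of the original reference frame at the corresponding predicted position, forming a predicted subject current polygon 700' (shown by the dotted line in FIG. 7(B)).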
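[0065]
The predicted subject current polygon 700' is then compared with the current hexagon to check whether the PSNR between the predicted subject current hexagon and the current hexagon has increased. If so, the initial motion vector D7D'7 of the subject quasi-feature point is replaced by the updated initial motion vector D7X2.
[0066]
This process is repeated for the remaining candidate motion vectors and, in a second pass, is carried out for all the quasi-feature points contained in the current frame.
[0067]
The above process is carried out for all the feature points until the motion vectors converge; preferably, however, the number of passes is set to five, because in most cases the motion vectors converge before the fifth pass.
[0068]
As described above, in the convergence process the displacement of each feature point is expressed as its motion vector, and the six triangles of each hexagon are affine-transformed independently using the displacements of their vertex feature points. If a displacement yields a better PSNR, the motion vector of the subject quasi-feature point is updated accordingly. The convergence process is therefore very effective in the matching process for determining a predicted image that is as close as possible to an original image containing zooming, rotating or scaling objects.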
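The overall shape of the convergence process can be sketched as a nested loop. This is only an outline under stated assumptions: the quality measure is supplied by the caller as a hypothetical callback `score(point, candidate_vector, all_vectors)`, standing in for the PSNR of the polygon around `point` when its vector is `candidate_vector` and the neighbors keep their current vectors; the inversion check and the affine warping are left to that callback.

```python
def refine_motion_vectors(points, score, search=7, passes=5):
    """Iteratively refine one motion vector per point.

    points: iterable of hashable point identifiers.
    score:  hypothetical callback; larger return values mean a better match.
    search: candidates span -search..+search in both directions.
    passes: number of sweeps over all points.
    """
    points = list(points)
    mvs = {p: (0, 0) for p in points}           # initial motion vectors
    for _ in range(passes):
        for p in points:                        # each point is the subject in turn
            base = mvs[p]
            best_mv, best_score = base, score(p, base, mvs)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    cand = (base[0] + dx, base[1] + dy)
                    s = score(p, cand, mvs)
                    if s > best_score:
                        best_mv, best_score = cand, s
            mvs[p] = best_mv                    # later points see this update
    return mvs
```

Because each pass adds at most ±7 to a vector, displacements larger than the candidate range are reached over successive passes, which is consistent with repeating the sweep several times.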
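[0069]
In the preferred embodiment of the present invention, this process may be carried out in three passes for hardware implementation. As shown in FIG. 7(A), the quasi-feature points denoted D1, D3 and D5, which form non-overlapping subject current polygons, are first processed simultaneously, each using its six neighboring feature points: (D2, D7, D6, D10, D11, D17), (D2, D4, D7, D12, D13, D19) and (D4, D6, D7, D8, D9, D15), respectively.
[0070]
Referring again to FIG. 4, the motion vectors obtained for all the quasi-feature points are then supplied, as the motion vectors for all the feature points, to the motion compensation block 240, which generates the predicted current frame signal using the reconstructed reference frame. That is, the predicted current frame signal is obtained by an affine transform using the reconstructed previous frame and the obtained motion vectors. This mapping is identical to the affine transform used in the motion vector search process, except that the reconstructed reference frame is used in place of the original reference frame, because the decoding system (not shown) has only the reconstructed reference frame.
[0071]
Meanwhile, because the encoding system produces a very accurate image from the motion vectors alone by means of feature point-based motion compensation, the difference or error signal between the current frame and the predicted current frame is not transmitted.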
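[0072]
As described above, the encoding system of the present invention, which uses feature point-based motion compensation, can further improve coding efficiency by obtaining an accurate set of motion vectors.
[0073]
The feature point-based motion compensation algorithm is based on image feature points, and the affine transform is used to compensate for rotation and zooming of objects. In most cases the motion-compensated image has a higher PSNR and good picture quality. If prediction fails in a case of large-scale motion, the error image can be coded with a DCT using a large quantizer step size and transmitted. Specifically, good picture quality can be obtained with the encoding system of the present invention at 24 kbps. Furthermore, because the positions of the feature points change from frame to frame, the encoding system uses as the reference frame a reconstructed frame that exists in both the encoder and the decoder, so that no position information for the feature points needs to be transmitted. In addition, the pixel-wise motion compensation used in the encoding system yields better picture quality than block-based motion compensation, because it can compensate for zooming, rotation and scaling of objects using an affine transform with the motion vectors alone.
[0074]
While specific embodiments of the present invention have been described above, those skilled in the art can of course make various changes without departing from the scope of the claims set forth in this specification.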
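[0075]
[Brief description of the drawings]
[FIG. 1] FIG. 1 is a block diagram of a video signal encoding apparatus incorporating a feature point-based motion compensation unit according to the present invention.
[FIG. 2] FIG. 2 consists of (A) and (B), which are schematic diagrams each illustrating a frame sequence.
[FIG. 3] FIG. 3 is a detailed block diagram of the motion compensation unit in FIG. 1.
[FIG. 4] FIG. 4 is an exemplary block diagram of the motion vector search unit in FIG. 3.
[FIG. 5] FIG. 5 consists of (A) and (B), which are exemplary schematic diagrams of a current frame and a reconstructed predicted frame, respectively.
[FIG. 6] FIG. 6 consists of (A) to (E), which are exemplary schematic diagrams illustrating the feature point selection process according to the present invention.
[FIG. 7] FIG. 7 consists of (A) and (B), which are schematic diagrams illustrating the motion vector search process according to the present invention.
[0001]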
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a video signal encoding apparatus and, more particularly, to a digital video signal encoding apparatus capable of effectively reducing the transmission rate of a high-quality digital video signal by encoding the signal using an improved feature point-based motion estimation technique.
[0002]
Background art
As is well known, transmitting a video signal in digital form can maintain better picture quality than transmitting it in analog form. When a video signal consisting of a sequence of video "frames" is expressed in digital form, a large amount of digital data is needed for transmission, especially in a high-definition television (HDTV) system. However, because the available bandwidth of a conventional transmission channel is limited, the data must be compressed or reduced before a large amount of digital data can be sent over that channel. Among the various video compression techniques, the so-called hybrid coding technique, which combines statistical coding with temporal and spatial compression, is known to be the most efficient.
[0003]
Most hybrid coding techniques employ motion-compensated DPCM (differential pulse code modulation), a two-dimensional DCT (discrete cosine transform), quantization of the DCT coefficients, and VLC (variable length coding). Motion-compensated DPCM is a process that estimates the motion of an object between the current frame and its previous or subsequent frame (i.e., the reference frame), predicts the current frame according to that motion, and produces a difference signal representing the difference between the current frame and its prediction. This technique is described, for example, in Staffan Ericsson, "Fixed and Adaptive Predictors for Hybrid Predictive/Transform Coding", IEEE Transactions on Communications, COM-33, No. 12 (December 1985), and in Ninomiya and Ohtsuka, "A Motion-Compensated Interframe Coding Scheme for Television Pictures", IEEE Transactions on Communications, COM-30, No. 1 (January 1982).
[0004]
A two-dimensional DCT that removes or reduces spatial redundancy between video data converts a block of digital video data (eg, a block of 8 × 8 pixels) into a set of transform coefficient data. This technique is disclosed, for example, in Chen and Pratt's paper “Scene Adaptive Coder”, IEEE Transactions on Communications, COM-32, NO. 3 (March 1984). The amount of data to be transmitted can be effectively compressed by processing such transform coefficient data with a quantizer, zigzag scanning and VLC.
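As an illustration of the block transform mentioned above, the following sketch computes an orthonormal two-dimensional DCT-II of a square block by direct summation. It is written for clarity rather than speed, and the normalization and the function name are choices made here, not details taken from the cited paper.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block indexed as block[y][x]."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for v in range(n):                 # vertical frequency
        for u in range(n):             # horizontal frequency
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[v][u] = scale(u) * scale(v) * s
    return out
```

For a flat 8 x 8 block all the energy ends up in the single DC coefficient, which is what makes the subsequent quantization and variable length coding effective on smooth image areas.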
[0005]
More specifically, in the motion compensation DPCM, based on the estimation of motion between the current frame and its reference frame, the current frame data is estimated from the corresponding reference frame data. In this way, the estimated motion is represented by a two-dimensional motion vector representing the displacement of the pixel between the reference frame and the current frame.
[0006]
There are basically two ways to estimate the displacement of an object pixel. One of them is a block-based estimation method, and the other is a pixel-based estimation method.
[0007]
In the block-based motion estimation method, the current frame block is compared with the reference frame block until the best match is obtained. Thereby, an inter-frame displacement vector for all blocks (which indicates how much a block of pixels has moved between frames) can be estimated for the current frame to be transmitted.
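A minimal sketch of exhaustive block matching is given below. It assumes the sum of absolute differences (SAD) as the matching criterion and a square search window, neither of which is specified in the text; frames are lists of rows, and the function name is illustrative.

```python
def block_match(cur, ref, bx, by, size, search):
    """Find the displacement of one block by exhaustive SAD search.

    cur, ref: current and reference frames indexed as frame[y][x].
    bx, by:   top-left corner of the block in the current frame.
    size:     block width and height in pixels.
    search:   maximum displacement tried in each direction.
    Returns ((dx, dy), sad) for the best match.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + size > w or y0 + size > h:
                continue                        # candidate falls outside the frame
            sad = sum(abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
                      for j in range(size) for i in range(size))
            if best is None or sad < best[0]:
                best = (sad, (dx, dy))
    return best[1], best[0]
```

The single vector returned is applied to every pixel of the block, which is the source of the limitations discussed in the paragraphs that follow.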
[0008]
Such a block matching technique can be used to predict the P and B frames contained in a video sequence, as disclosed in ITU Telecommunication Standardization Sector Study Group 15, Working Party 15/1 Experts Group on Very Low BitRate Visual Telephony, "Video Codec Test Model, TMN4 Rev1" (October 25, 1994). Here, a P or predicted frame is predicted from the previous frame (as its reference frame), and a B or bidirectionally predicted frame is predicted from the previous and subsequent frames (as its reference frames). In coding a so-called B frame, bidirectional motion estimation is used to derive forward and backward displacement vectors. The forward displacement vector is obtained by estimating the motion of an object between the B frame and the previous intra (I) or predicted (P) frame (as the reference frame), and the backward displacement vector is obtained from the B frame and the subsequent I or P frame (as the reference frame).
[0009]
In block-based motion estimation, however, blocking effects at block boundaries may arise during motion compensation, and if the pixels in a block do not all move in the same way, accurate motion is hard to estimate, which degrades the overall picture quality.
[0010]
With pixel-based motion estimation, on the other hand, a displacement is determined for every pixel. This approach estimates pixel values more accurately and easily handles scale changes (e.g., zooming, which is motion perpendicular to the image plane). But because a motion vector is determined for every pixel, it is virtually impossible to transmit all the motion vector data to a receiver.
[0011]
One of the methods for overcoming the problem of excessive transmission data caused by the pixel-based motion estimation method is a feature point-based motion estimation method.
[0012]
In feature point-based motion estimation, motion vectors for a set of selected pixels, i.e., feature points, are transmitted to the receiver. Each feature point is defined as a pixel capable of representing its neighboring pixels, so that the receiver can recover or approximate the motion vectors of the non-feature points from those of the feature points. As disclosed in the commonly owned, copending U.S. patent application Ser. No. 08/367,520, entitled "Method and Apparatus for Encoding a Video Signal Using Pixel-by-Pixel Motion Estimation", an encoder that adopts feature point-based motion estimation first selects a number of feature points from all the pixels contained in the previous frame. Motion vectors for the selected feature points are then determined, each representing the spatial displacement between a feature point in the previous frame and the corresponding matching point, i.e., the most similar pixel, in the current frame. Specifically, the matching point for each feature point is searched for in a search region within the current frame using a known block matching algorithm, where a feature point block is defined as a block surrounding the selected feature point and the search region is defined as a region within a predetermined area surrounding the position of the corresponding feature point.
[0013]
In this case, it is most preferable to obtain only the best one of the matching feature points through the entire search region corresponding to the selected feature point. However, when matching feature points, a plurality of identical optimum matching feature point blocks may be obtained. Therefore, it is difficult to accurately search motion vectors for feature points having such correlation between the feature point block and the corresponding search region. Furthermore, in the current frame, if the search area is not determined by the spatial displacement between the feature point in the reference frame and the corresponding matching point in the current frame (ie the most similar pixel), it is difficult to estimate the exact motion. As a result, the overall image quality is degraded.
[0014]
Disclosure of the invention
Accordingly, an object of the present invention is to provide a digital video signal encoding method capable of effectively reducing the transmission rate of a high-quality digital video signal by effectively estimating a motion vector for a feature point.
[0015]
Another object of the present invention is to provide a digital video signal encoding apparatus, for use in a video signal encoding system, that effectively estimates motion vectors using feature point-based motion estimation and thereby efficiently reduces the transmission rate of a high-quality digital video signal.
[0016]
Still another object of the present invention is to provide a video signal encoding system that can effectively improve the overall picture quality by selectively using feature point-based motion estimation and block-based motion estimation.
[0017]
To achieve the above objects, according to one preferred embodiment of the present invention there is provided a motion vector estimation method that uses a feature point-based motion estimation technique to estimate a set of motion vectors between a current frame and a reference frame of a video signal, the reference frame comprising a reconstructed reference frame and an original reference frame, the method comprising: (a) selecting, from the pixels contained in the reconstructed reference frame, a set of feature points that forms a polygonal grid having a plurality of overlapping polygons; (b) determining a set of quasi-feature points at positions on the current frame identical to the positions of the set of feature points; (c) setting all components of an initial motion vector for each quasi-feature point to zero; (d) selecting from the quasi-feature points a subject quasi-feature point having N neighboring quasi-feature points, the N neighboring quasi-feature points forming a subject current polygon with N sides; (e) generating a plurality of candidate motion vectors, each representing a displacement of the subject quasi-feature point to one of the pixels within a predetermined region; (f) determining, for every pixel contained in the subject current polygon, a predicted position on the original reference frame corresponding to each of a plurality of updated initial motion vectors, on the basis of the updated initial motion vectors, which are obtained from the respective candidate motion vectors and the initial motion vector of the subject quasi-feature point, and of the initial motion vectors of the N neighboring quasi-feature points; (g) deriving a predicted value for each pixel contained in the subject current polygon from the pixel value at the corresponding predicted position on the original reference frame, thereby forming as many predicted subject current polygons as there are updated initial motion vectors; (h) calculating the difference between the pixel values of the subject current polygon and those of each predicted subject current polygon, thereby producing as many peak signal-to-noise ratios (PSNRs) as there are updated initial motion vectors; (i) updating the initial motion vector of the subject quasi-feature point with the updated initial motion vector that corresponds to the predicted subject current polygon having the maximum PSNR; (j) selecting a quasi-feature point adjacent to the subject quasi-feature point as a new subject quasi-feature point and repeating steps (d) to (i) until the initial motion vectors of all the quasi-feature points concerned have been updated; and (k) repeating step (j) until the repetition has been carried out a predetermined number of times.
[0018]
According to another preferred embodiment of the present invention, there is provided a motion vector estimation apparatus, for use in a video signal encoding system, that uses a feature point-based motion estimation technique to estimate a set of motion vectors between a current frame and a reference frame of a video signal, the reference frame comprising a reconstructed reference frame and an original reference frame, the apparatus comprising: first selection means for selecting, from the pixels contained in the reconstructed reference frame, a set of feature points that forms a polygonal grid having a plurality of overlapping polygons; quasi-feature point determining means for determining a set of quasi-feature points at positions on the current frame identical to the positions of the set of feature points; storage means for storing a set of initial motion vectors for the set of quasi-feature points, each component of which is set to zero; second selection means for selecting from the quasi-feature points a subject quasi-feature point having N neighboring quasi-feature points, the N neighboring quasi-feature points forming a subject current polygon with N sides; adding means for generating a plurality of candidate motion vectors, each representing a displacement of the subject quasi-feature point to one of the pixels within a predetermined region; predicted position determining means for determining, for every pixel contained in the subject current polygon, a predicted position on the original reference frame corresponding to each of a plurality of updated initial motion vectors, on the basis of the updated initial motion vectors, which are obtained from the respective candidate motion vectors and the initial motion vector of the subject quasi-feature point, and of the initial motion vectors of the N neighboring quasi-feature points; predicted pixel generating means for deriving a predicted value for each pixel contained in the subject current polygon from the pixel value at the corresponding predicted position on the original reference frame, thereby forming as many predicted subject current polygons as there are updated initial motion vectors; difference calculating means for calculating the difference between the pixel values of the subject current polygon and those of each predicted subject current polygon, thereby producing as many peak signal-to-noise ratios (PSNRs) as there are updated initial motion vectors; third selection means for updating the initial motion vector of the subject quasi-feature point with the updated initial motion vector that corresponds to the predicted subject current polygon having the maximum PSNR; and motion vector retrieval means for retrieving the set of initial motion vectors from the storage means as the set of motion vectors once all the initial motion vectors have been updated a predetermined number of times.
[0019]
Embodiment of the Invention
Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the drawings.
[0020]
FIG. 1 is a block diagram of a video signal encoding system according to the present invention. This encoding system comprises a frame rearrangement unit 101, a subtraction unit 102, a video signal encoder 105, a video signal decoder 113, an addition unit 115, a first frame storage unit 120, a second frame storage unit 130, an entropy encoding unit 107, and a motion compensation unit 150.
[0021]
As shown in FIG. 2, the input digital video signal has two frame (or picture) sequences. The first frame sequence consists of one intra (I) frame I1, three bidirectional prediction frames B1, B2, and B3, and three prediction frames P1, P2, and P3; the second frame sequence consists of one intra (I) frame I1, three forward prediction frames F1, F2, and F3, and three prediction frames P1, P2, and P3. Accordingly, the video signal encoding system has two sequence encoding modes, that is, a first sequence encoding mode and a second sequence encoding mode.
[0022]
In the first sequence coding mode, the line L17 is connected to the line L11 by the first switch 103, and the first frame sequence consisting of I1, B1, P1, B2, P2, B3, and P3 is supplied to the frame rearrangement circuit 101 through the first switch 103. The frame rearrangement circuit 101 rearranges the input sequence into, for example, a digital video signal of I1, P1, B1, P2, B2, P3, and B3 in order to obtain bidirectional prediction frame signals for the B frames. Thereafter, the rearranged digital video signal is supplied to the second switch 104a via the line L18, to the first frame storage unit 120 via the line L12, and to the motion compensation unit 150 via the line L1.
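The rearrangement described above can be illustrated with a minimal sketch. The function name `reorder_frames` and the rule it applies (hold each B frame until the next I or P frame has been emitted) are assumptions chosen to reproduce the example order given in the text; the patent does not specify how the circuit 101 is implemented.

```python
def reorder_frames(display_order):
    """Move each B frame after the next anchor (I or P) frame.

    Hypothetical helper: frames are identified only by name strings.
    """
    coded_order = []
    pending_b = []  # B frames waiting for their backward reference
    for name in display_order:
        if name.startswith("B"):
            pending_b.append(name)
        else:
            # Anchor frame: emit it first, then the B frames that depend on it.
            coded_order.append(name)
            coded_order.extend(pending_b)
            pending_b.clear()
    coded_order.extend(pending_b)  # trailing B frames, if any
    return coded_order


print(reorder_frames(["I1", "B1", "P1", "B2", "P2", "B3", "P3"]))
```

Because each B frame must wait for a later P frame, this mode necessarily introduces a delay of at least one frame.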
[0023]
In the second sequence encoding mode, the line L17 is connected to the line L11 by the first switch 103, and the second frame sequence consisting of I1, F1, P1, F2, P2, F3, and P3 is supplied through the first switch 103 to the first frame storage unit 120 via the line L12, to the motion compensation unit 150 via the line L1, and to the second switch 104a via the line L18. The first switch 103 is driven by a sequence mode control signal CS1 from a conventional system control unit, for example, a microprocessor (not shown). As can be seen from the above, the first sequence coding mode incurs a delay due to the rearrangement, so the second sequence coding mode can be used efficiently as a low-delay mode in application fields such as videophones and electronic conferences.
[0024]
As shown in FIG. 1, the video signal encoding system includes a second switch 104a and a third switch 104b used for selectively performing two frame encoding modes, that is, an interframe encoding mode and an intraframe encoding mode. As is well known in the art, the second switch 104a and the third switch 104b are simultaneously driven by a frame mode control signal CS2 from the system control unit.
[0025]
In the intraframe coding mode, the intra frame I1 is directly supplied to the video signal encoder 105 via the line L14 as the current frame signal. The current frame signal is encoded into a set of quantized transform coefficients using, for example, a discrete cosine transform (DCT) and one of the well-known quantization techniques. The intra frame I1 is also stored as an original reference frame in the frame memory 121 of the first frame storage unit 120. Here, the first frame storage unit 120 includes three frame memories 121, 122, and 123 connected to the motion compensation unit 150 via lines L2, L3, and L4, respectively. Thereafter, the quantized transform coefficients are supplied to the entropy coding unit 107 and the video signal decoder 113, respectively. In the entropy encoding unit 107, the quantized transform coefficients from the video signal encoder 105 are encoded together using, for example, a variable length encoding technique and sent to a transmitter (not shown) for transmission.
[0026]
On the other hand, the video signal decoder 113 reconverts the quantized transform coefficients from the video signal encoder 105 into a restored intra frame signal using inverse quantization and inverse discrete cosine transform techniques. Thereafter, the restored intra frame signal from the video signal decoder 113 is stored as a restored reference frame in the frame memory 131 of the second frame storage unit 130. Here, the second frame storage unit 130 includes three frame memories 131, 132, and 133 that are connected to the motion compensation unit 150 via lines L′2, L′3, and L′4, respectively.
[0027]
In the inter coding mode, an inter frame (for example, a prediction frame P1, a bidirectional prediction frame, or a forward prediction frame F1) is supplied as a current frame signal to the motion compensation unit 150 and the subtraction unit 102, respectively, and is stored in the frame memory 121 of the first frame storage unit 120. Here, the so-called inter frames include the bidirectional prediction frames B1, B2, and B3, the prediction frames P1, P2, and P3, and the forward prediction frames F1, F2, and F3. The original reference frame previously stored in the frame memory 121 is supplied to the motion compensation unit 150 via the line L2 and is shifted to, or stored in, the frame memory 122. As will be described later, the motion compensation unit 150 includes a block-based motion compensation channel and a feature point-based motion compensation channel.
[0028]
When the current frame is the prediction frame P1, the current frame signal on the line L1 and the restored reference frame signal on the line L′1 from the frame memory 131 of the second frame storage unit 130 are processed through the block-based motion compensation channel, which predicts the current frame signal to generate a predicted current frame signal on the line L30 and a set of motion vectors on the line L20. When the current frame is the forward prediction frame F1 (or the bidirectional prediction frame B1), the current frame signal on the line L1, the original reference frame signal from the first frame storage unit 120 on one of the lines L2, L3, and L4, and the restored reference frame signal from the second frame storage unit 130 on one of the lines L′2, L′3, and L′4 are processed through the feature point based motion compensation channel, which predicts the current frame signal to generate a predicted current frame signal on the line L30 and a set of motion vectors on the line L20. The motion compensation unit 150 will be described in detail with reference to FIG. 3.
[0029]
In the subtraction unit 102, the difference between the predicted current frame signal on the line L30 and the current frame signal on the line L15 is obtained, and the resulting data (that is, the error signal indicating the difference pixel values) is input to the video signal encoder 105. Here, the error signal is encoded into a set of quantized transform coefficients using, for example, DCT and one of the known quantization methods. That is, the error obtained as the difference between the current frame and the predicted current frame is DCT encoded. In this case, the quantization width is set to a large value so as to compensate only for severely degraded regions caused by erroneously estimated motion vectors.
[0030]
Subsequently, the quantized transform coefficients are supplied to the entropy coding unit 107 and the video signal decoder 113, respectively. In the entropy encoding unit 107, the quantized transform coefficients from the video signal encoder 105 and the motion vectors transmitted from the motion compensation unit 150 via the line L20 are encoded together using, for example, a variable length encoding technique, and sent to a transmitter (not shown) for transmission.
[0031]
On the other hand, the video signal decoder 113 reconverts the quantized transform coefficient from the video signal encoder 105 into a restored error signal using inverse quantization and inverse discrete cosine transform.
[0032]
The restored error signal from the video signal decoder 113 and the predicted current frame signal input from the motion compensation unit 150 via the line L16 are combined by the addition unit 115 via the third switch 104b, whereby a restored reference frame signal, to be stored as the previous frame in the second frame storage unit 130, is supplied on the line L′1.
[0033]
For example, as shown in FIG. 1, the second frame storage unit 130 includes three frame memories 131, 132, and 133 connected in series. That is, the restored frame signal from the addition unit 115 is first stored in, for example, the frame memory 131 and then supplied to the motion compensation unit 150 via the line L′2; when the next restored frame signal from the addition unit 115 is input to the frame memory 131, the stored frame is shifted to the frame memory 132 in units of frames. This process is repeated in sequence while the video encoding operation is performed.
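The serial shifting of the three frame memories can be sketched as a fixed-length queue. This is only an illustration: the class name `FrameStore` and its methods are hypothetical, and the mapping of index 0, 1, 2 to the frame memories 131, 132, 133 is an assumption.

```python
from collections import deque


class FrameStore:
    """Hypothetical model of three frame memories connected in series."""

    def __init__(self, depth=3):
        # Index 0 plays the role of the first memory (e.g. 131),
        # index 1 the second (e.g. 132), and so on.
        self.memories = deque(maxlen=depth)

    def push(self, frame):
        # A new frame enters the first memory; older frames shift by one,
        # and the oldest frame is dropped once the store is full.
        self.memories.appendleft(frame)

    def read(self, index):
        return self.memories[index]


store = FrameStore()
for name in ["I1", "P1", "P2", "P3"]:
    store.push(name)
print(list(store.memories))
```

After the fourth push the frame I1 has been shifted out, which matches the behavior of a three-stage shift register.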
[0034]
Referring to FIG. 2, an exemplary diagram of the first and second frame sequences described above is shown. As shown in the figure, when the current frame is the prediction frame P1, the motion vector set SMV1 is obtained in units of blocks using, as the reference frame, the restored intra frame I1 extracted from the second frame storage unit 130. Similarly, the motion vector sets SMV2 and SMV3 for the current frames P2 and P3 are determined using the reference frames P1 and P2, respectively.
[0035]
When the current frame is the bidirectional prediction frame B1, the forward motion vector set FMV1 is obtained from the feature points using the restored reference frame I1 extracted from the second frame storage unit 130 and the original reference frame I1 extracted from the first frame storage unit 120. Similarly, the backward motion vector set BMV1 for the current frame B1 is obtained using the original reference frame P1 and the restored reference frame P1. Thereafter, the video signal encoding system selects between the forward motion vector set FMV1 and the backward motion vector set BMV1 and transmits the corresponding motion vectors.
[0036]
When the current frame is the forward prediction frame F1, the forward motion vector set FMV2 is obtained from the feature points using the original reference frame I1 extracted from the first frame storage unit 120 and the restored reference frame I1 extracted from the second frame storage unit 130.
[0037]
As described above, for motion estimation and compensation, the frames included in the first and second frame sequences are arranged in the first and second frame storage units 120 and 130 as shown in [Table I] and [Table II] below.
[0038]
[Table 1]

[0039]
[Table 2]

[0040]
Where I1: Frame used for forward motion estimation
P1, P2: Frames used for backward motion estimation
[0041]
As described above, the prediction frames P1, P2, and P3 are reconstructed through a DCT-based predictive coding (so-called TMN4) technique using block-based motion estimation, while the intervening frames (i.e., the bidirectional prediction frames B1, B2, and B3 or the forward prediction frames F1, F2, and F3) are reconstructed using the improved feature point based motion compensation-discrete cosine transform (MC-DCT) technique according to the present invention.
[0042]
FIG. 3 shows a detailed block diagram of the motion compensation unit 150 shown in FIG. 1. As shown in FIG. 3, the motion compensation unit 150 includes three input selection units 154, 155, and 156, a block-based motion compensation unit 151, a first feature point-based motion compensation unit 152, a second feature point-based motion compensation unit 153, and two output selection units 157 and 158.
[0043]
A block-based motion compensation unit 151 using a conventional block matching algorithm detects a set of motion vectors for each of the prediction frames P1, P2, and P3 and generates a predicted current frame for the corresponding prediction frame. Accordingly, as shown in [Table I] and [Table II], when the prediction frame P1 is supplied as the current frame to the block-based motion compensation unit 151, the input selection unit 154 supplies the restored intra frame I1 on the line L′2 as a reference frame to the block-based motion compensation unit 151. In the block-based motion compensation unit 151, a set of motion vectors is estimated, and a predicted current frame signal is constructed using the estimated motion vectors. Thereafter, the set of motion vectors and the predicted current frame signal are supplied to the entropy encoding unit 107 and the subtraction unit 102 on the lines L20 and L30 via the output selection units 157 and 158, respectively.
[0044]
The first feature point based motion compensation unit 152, which uses affine transformation, detects a set of motion vectors for forward estimation for each of the bidirectional prediction frames B1, B2, and B3 or the forward prediction frames F1, F2, and F3, and generates a predicted current frame for the corresponding bidirectional or forward prediction frame. Accordingly, when the bidirectional prediction frame B1 on the line L1 is supplied as the current frame to the first feature point based motion compensation unit 152, the input selection unit 155 supplies the original intra frame I1 on the line L2 to the first feature point based motion compensation unit 152 as the original reference frame, as shown in [Table I]. The input selection unit 156 supplies the restored intra frame I1 on the line L′2 to the first feature point based motion compensation unit 152 as the restored reference frame used to generate the predicted frame. In the first feature point-based motion compensation unit 152, a set of motion vectors for forward estimation is estimated using the restored reference frame and the original reference frame, and the predicted current frame signal is constructed using the restored reference frame. Subsequently, the set of motion vectors for forward estimation and the predicted current frame signal are supplied on the lines L20 and L30 via the output selection units 157 and 158, respectively. Here, the output selection units 157 and 158 are controlled by control signals CS5 and CS6 from a system control unit (not shown).
[0045]
The second feature point-based motion compensation unit 153 detects a set of motion vectors for backward estimation for each of the bidirectional prediction frames B1, B2, and B3 using affine transformation, which will be described later, and generates a predicted current frame for the corresponding bidirectional prediction frame. Therefore, when the bidirectional prediction frame B1 is supplied to the second feature point based motion compensation unit 153 as the current frame, the original prediction frame P1 on the line L2 is supplied to the second feature point based motion compensation unit 153 as the original reference frame, and the restored prediction frame P1 on the line L′2 is supplied to it as the restored reference frame. In the second feature point based motion compensation unit 153, a set of motion vectors for backward estimation is obtained using the restored reference frame and the original reference frame, and the predicted current frame signal is constructed using the restored reference frame. Thereafter, the set of motion vectors for backward estimation and the predicted current frame signal are supplied on the lines L20 and L30 via the output selection units 157 and 158, respectively.
[0046]
FIG. 4 shows a detailed block diagram of the feature point-based motion compensation unit in FIG. 3. The restored reference frame on the line L′2 from the second frame storage unit 130 is input to the feature point selection unit 210, which generates a set of feature points, and to the motion compensation unit 240.
[0047]
Thereafter, the set of feature points is supplied to the motion vector search unit 230 and the motion compensation unit 240, respectively. The motion vector search unit 230 receives the original reference frame and the current frame, and generates a set of motion vectors for the set of feature points. The set of motion vectors is transmitted to the motion compensation unit 240, which generates a predicted current frame based on the set of motion vectors and the set of feature points.
[0048]
In the feature point selection unit 210, a set of feature points is selected from a plurality of pixels included in the restored reference frame. Here, each feature point is defined by the position of one pixel. FIGS. 5A and 5B show examples of the current frame and the restored reference frame.
[0049]
FIGS. 6A to 6E are schematic views for explaining a feature point selection process according to the present invention. As shown in FIG. 6A, edges are detected in the restored reference frame P(x, y) shown in FIG. 5B using a known Sobel edge detector (see, for example, A. K. Jain, “Fundamentals of Digital Image Processing”, 1989, Prentice-Hall International). The output from the Sobel operator, |∇P(x, y)|, is compared with a predetermined threshold Te. According to the present invention, this predetermined threshold Te is preferably set to 6. If the output value |∇P(x, y)| from the Sobel operator is less than the predetermined threshold Te, the output value is set to 0. Otherwise, the output value is left unchanged. Accordingly, the edge video signal eg(x, y) in FIG. 6A is defined as follows.
eg (x, y) = 0 if |∇P (x, y)| < Te; eg (x, y) = |∇P (x, y)| otherwise Equation (1)
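The thresholded edge map described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `sobel_edge_map`, the use of the Euclidean gradient magnitude, the unnormalized 3×3 Sobel kernels, and the edge-replicating border handling are all assumptions, since the text specifies only that the Sobel output is compared with Te.

```python
import numpy as np


def sobel_edge_map(frame, te=6.0):
    """Return the Sobel gradient magnitude with values below te zeroed.

    Assumptions: unnormalized 3x3 Sobel kernels, Euclidean magnitude,
    and border pixels handled by replicating the nearest edge pixel.
    """
    p = np.pad(np.asarray(frame, dtype=np.float64), 1, mode="edge")
    # Horizontal gradient: right column minus left column, weights 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weights 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    magnitude = np.hypot(gx, gy)
    # Keep the magnitude where it reaches the threshold, otherwise 0.
    return np.where(magnitude < te, 0.0, magnitude)


step = np.zeros((5, 6))
step[:, 3:] = 100.0  # vertical step edge between columns 2 and 3
print(sobel_edge_map(step))
```

Whether a threshold of 6 is appropriate depends on how the Sobel output is scaled, which the text does not state.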

[0050]
In the preferred embodiment of the present invention, the feature points are determined using a grid technique employing a hexagonal grid having a plurality of overlapping hexagons, as shown in FIG. 6B. As shown in FIG. 6C, the hexagon 610 is defined by line segments connecting the seven grid points 611 to 617. The grid point 617 contained in the hexagon 610 has more neighboring grid points (611 to 616) than it would have in a tetragonal grid, so that the feature points can be organized more effectively. The hexagon 610 has six non-overlapping triangles 621 to 626, and the grid points 611 to 617 are the vertices of the triangles 621 to 626. The resolution of the hexagon 610 is determined by the lines HH and HV, which according to the present invention are preferably set to 13 and 10, respectively.
[0051]
Referring to FIG. 6D, a non-overlapping search range (for example, SR1 to SR4) is set for each grid point (for example, G1 to G4). The edge point (for example, E7) located in the search range SR1 is a feature point for the grid point (for example, G1) when the sum value of eight pixels surrounding the edge point (for example, E7) is maximized. Therefore, the feature point Di is obtained as follows.
Di = {(x, y) | Max Σ [k = -1, 1] Σ [l = -1, 1] EG (x + k, y + l)} Equation (2)
Where Σ [x = 0, I] Σ [y = 0, J] Z (x, y)
= Z (0,0) + Z (0,1) + … + Z (0, J) + Z (1,0) + … + Z (1, J) + … + Z (I, 0) + … + Z (I, J)
EG: Edge point value within the search range
i: positive integer
[0052]
A set of feature points is determined using equation (2). Here, the set of feature points includes grid points that coincide with edge points, edge points that are located in a non-overlapping search range SRi and have the maximum sum of the pixel values surrounding them, and grid points whose non-overlapping search range contains no edge point.
[0053]
When more than one edge point has the maximum sum value, the edge point closest to the grid point is selected as the feature point.
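The selection rule of equation (2), together with the two fallback rules above, can be sketched as follows. This is a hedged illustration: the function name `select_feature_point`, the square search range of half-width `radius`, and the zero padding at the frame border are assumptions; the text does not define the shape of the search range SRi.

```python
import numpy as np


def select_feature_point(eg, grid_xy, radius):
    """Pick the feature point for one grid point from an edge map eg.

    Assumptions: eg[y, x] > 0 marks an edge point, the search range is a
    square of half-width `radius` centred on the grid point, and pixels
    outside the frame count as 0 in the 3x3 sum.
    """
    gx, gy = grid_xy
    h, w = eg.shape
    padded = np.pad(eg, 1)  # zero padding so every 3x3 window exists
    best_xy, best_key = grid_xy, None  # fall back to the grid point itself
    for y in range(max(0, gy - radius), min(h, gy + radius + 1)):
        for x in range(max(0, gx - radius), min(w, gx + radius + 1)):
            if eg[y, x] == 0:
                continue  # not an edge point
            total = padded[y:y + 3, x:x + 3].sum()  # sum over k, l in -1..1
            dist2 = (x - gx) ** 2 + (y - gy) ** 2
            key = (total, -dist2)  # larger sum first, then nearer point
            if best_key is None or key > best_key:
                best_xy, best_key = (x, y), key
    return best_xy


eg = np.zeros((9, 9))
eg[2, 5] = 10.0
eg[2, 6] = 10.0
eg[6, 2] = 5.0
print(select_feature_point(eg, (4, 4), 3))
```

In this example the two edge points at (5, 2) and (6, 2) have the same 3×3 sum, so the tie is resolved in favor of the one closer to the grid point.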
[0054]
When the set of feature points has been determined, the hexagonal grid shown in FIG. 6B is transformed into the hexagonal feature point grid shown in FIG. 6E. After the hexagonal feature point grid is determined, the set of feature points is supplied to the motion vector search unit 230 of FIG. 4, which detects a set of motion vectors. According to the present invention, a convergence process using affine transformation is used to search for the set of motion vectors.
[0055]
FIGS. 7A and 7B are schematic diagrams for explaining the motion vector search process according to the present invention. A set of quasi-feature points is determined in the current frame using the set of feature points. Here, each feature point of the restored reference frame is mapped to the corresponding quasi-feature point of the current frame. The initial motion vector for each quasi-feature point (for example, D1 to D30) is set to (0, 0).
[0056]
Thereafter, when a quasi-feature point (e.g., D7) is designated as the main quasi-feature point whose motion vector is to be estimated, the main current polygon 700 is used in the convergence process. The main current polygon 700 is defined by line segments connecting the quasi-feature points (for example, D1 to D6) adjacent to the main quasi-feature point D7. The main current polygon 700 comprises six non-overlapping triangles 701 to 706. Here, the main quasi-feature point is located at the common vertex of the triangles.
[0057]
Thereafter, a predetermined number of candidate motion vectors are sequentially added to the initial motion vector of the quasi-feature point D7. Here, the candidate motion vectors are suitably selected in the range of 0 to ±7 horizontally and vertically; the candidate motion vector D7Y1, however, is not allowed because it would reverse the triangle 701. The candidate motion vector D7X1 is added to the initial motion vector of the main quasi-feature point D7, without changing the initial motion vectors of the six adjacent quasi-feature points D1 to D6, to generate the updated initial motion vector D7D′7. Accordingly, the updated initial motion vector D7D′7 represents the displacement between the main quasi-feature point D7 and the candidate quasi-feature point D′7.
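One way to test whether a candidate displacement reverses a triangle is to compare the sign of each triangle's signed area before and after the main quasi-feature point is moved. The sketch below assumes that "reversed" means a change of vertex orientation (or a collapse to zero area); this interpretation, and the names `signed_area` and `candidate_is_valid`, are assumptions rather than details given in the text.

```python
def signed_area(p, q, r):
    """Signed area of triangle p-q-r (positive if counter-clockwise)."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1])
                  - (r[0] - p[0]) * (q[1] - p[1]))


def candidate_is_valid(center, neighbors, displacement):
    """Reject a displacement of the centre point that flips any triangle.

    `neighbors` lists the adjacent points in order around the polygon;
    each consecutive pair forms one triangle with the centre point.
    """
    moved = (center[0] + displacement[0], center[1] + displacement[1])
    n = len(neighbors)
    for i in range(n):
        a, b = neighbors[i], neighbors[(i + 1) % n]
        before = signed_area(a, b, center)
        after = signed_area(a, b, moved)
        if before * after <= 0:  # orientation flipped or area collapsed
            return False
    return True


hexagon = [(10, 0), (5, 9), (-5, 9), (-10, 0), (-5, -9), (5, -9)]
print(candidate_is_valid((0, 0), hexagon, (2, 1)))   # stays inside
print(candidate_is_valid((0, 0), hexagon, (12, 0)))  # leaves the polygon
```

With a convex polygon this check amounts to requiring that the displaced point remain strictly inside the polygon formed by its neighbors.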
[0058]
The predicted position for each pixel included in the main current polygon 700 is determined on the original reference frame using the updated initial motion vector and the initial vectors of adjacent quasi-feature points.
[0059]
Thereafter, the value of each pixel included in the main current polygon 700 is interpolated from the pixel values on the original reference frame at the corresponding predicted position, forming the predicted main current polygon. According to a preferred embodiment of the present invention, this process is performed by a well-known affine transformation for each triangle (e.g., 701) having three feature points (e.g., D1, D2, D7). The affine transformation is defined as follows.
[Expression 1]
x′ = a x + b y + e, y′ = c x + d y + f Equation (3)
Where (x, y): x and y coordinates of the pixels in the predicted main current polygon
(x′, y′): predicted position on the original reference frame
a to f: Affine transformation coefficients
[0060]
The six mapping parameters a, b, c, d, e, and f are uniquely determined using the motion vectors of the three quasi-feature points (for example, D1, D2, and D7). Once the affine transformation coefficients are determined, each remaining pixel in the triangle 701 can be mapped onto a position in the original reference frame. Since the predicted position (x′, y′) on the original reference frame is in many cases not a pair of integers, the gray level at the predicted position (x′, y′) is obtained using the known bilinear interpolation technique. The affine mapping process is applied separately to each of the triangles 701 to 706. In this way, the predicted main current polygon for the candidate motion vector is determined.
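The two operations described above, solving for the six affine coefficients from three vertex correspondences and sampling the reference frame at a non-integer position, can be sketched as follows. The sketch assumes the affine map has the form x′ = a·x + b·y + e, y′ = c·x + d·y + f; this assignment of the letters a to f, the function names, and the clamping at the frame border are assumptions.

```python
import numpy as np


def affine_from_triangle(vertices, motion_vectors):
    """Solve for (a, b, e) and (c, d, f) from three vertices.

    Each vertex (x, y) is mapped to (x + mvx, y + mvy) on the reference
    frame; three non-collinear vertices determine the six coefficients.
    """
    src = np.array([[x, y, 1.0] for x, y in vertices])
    dst = np.asarray(vertices, dtype=float) + np.asarray(motion_vectors, dtype=float)
    abe = np.linalg.solve(src, dst[:, 0])  # x' = a*x + b*y + e
    cdf = np.linalg.solve(src, dst[:, 1])  # y' = c*x + d*y + f
    return abe, cdf


def bilinear(frame, x, y):
    """Bilinearly interpolate frame at (x, y), clamping to the border."""
    h, w = frame.shape
    x = min(max(float(x), 0.0), w - 1.0)
    y = min(max(float(y), 0.0), h - 1.0)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * frame[y0, x0] + fx * frame[y0, x1]
    bottom = (1 - fx) * frame[y1, x0] + fx * frame[y1, x1]
    return (1 - fy) * top + fy * bottom


# Pure translation by (2, 3): expect a=1, b=0, e=2 and c=0, d=1, f=3.
abe, cdf = affine_from_triangle([(0, 0), (10, 0), (0, 10)],
                                [(2, 3), (2, 3), (2, 3)])
print(abe, cdf)
print(bilinear(np.arange(16, dtype=float).reshape(4, 4), 0.5, 0.5))
```

A full predictor would apply the solved map to every pixel inside the triangle and call the interpolation once per pixel.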
[0061]
The predicted main current hexagon is then compared with the current hexagon 700 to check whether the peak signal-to-noise ratio (PSNR) between the predicted main current hexagon and the current hexagon increases. If it does, the initial motion vector (0, 0) of the main quasi-feature point D7 is updated to the updated initial motion vector D7D′7.
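For reference, the PSNR used as the matching criterion can be computed as below. This is a minimal sketch assuming 8-bit pixel values (peak of 255) and the standard definition based on mean squared error; the text does not state the peak value or whether the comparison is restricted to pixels inside the polygon, so the function simply compares whatever arrays it is given.

```python
import numpy as np


def psnr(current, predicted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two pixel arrays."""
    diff = np.asarray(current, dtype=float) - np.asarray(predicted, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical arrays
    return 10.0 * np.log10(peak ** 2 / mse)


a = np.array([[10, 20], [30, 40]])
b = np.array([[12, 18], [30, 41]])
print(psnr(a, b))
```

A candidate motion vector would be accepted only if it yields a higher value than the best one found so far.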
[0062]
This process is repeated for the remaining candidate motion vectors, and is also performed for all quasi-feature points included in the current frame in the first step.
[0063]
Referring to FIG. 7B, when the first step is completed, the quasi-feature point D7 is again set as the main quasi-feature point, and the updated initial motion vectors for the adjacent quasi-feature points D1 to D6 are D1D′1, D2D′2, D3D′3, D4D′4, D5D′5, and D6D′6. Similarly, the predetermined candidate motion vectors are sequentially added to the initial motion vector D7D′7 of the main quasi-feature point. For example, the candidate motion vector D′7X2 is added to the initial motion vector D7D′7 of the main quasi-feature point, without changing the motion vectors D1D′1, D2D′2, D3D′3, D4D′4, D5D′5, and D6D′6 of the six adjacent quasi-feature points. The updated initial motion vector is therefore D7X2. As described above, the candidate motion vectors are suitably selected in the range of 0 to ±7 horizontally and vertically. However, the candidate motion vector D7Y2 is not allowed because it would reverse the triangle 701.
[0064]
The predicted position for each pixel included in the main current polygon 700 is determined on the original reference frame using the updated motion vector D7X2 and the initial motion vectors D1D′1, D2D′2, D3D′3, D4D′4, D5D′5, and D6D′6 of the adjacent quasi-feature points. Thereafter, the value of each pixel included in the main current polygon 700 is interpolated from the pixel values on the original reference frame at the corresponding predicted positions, forming the predicted main current polygon 700′ (shown by the dotted line in FIG. 7B).
[0065]
Subsequently, the predicted main current polygon 700′ is compared with the current hexagon to check whether the PSNR between the predicted main current hexagon and the current hexagon increases. If it does, the initial motion vector D7D′7 of the main quasi-feature point is updated to the updated initial motion vector D7X2.
[0066]
This process is repeated for the remaining candidate motion vectors, and is also performed for all quasi-feature points included in the current frame in the second step.
[0067]
The above process is performed on all feature points until the motion vectors converge, but the number of steps is preferably limited to five, because in most cases the motion vectors converge before the fifth step.
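The overall convergence process can be summarized by the following sketch. It is not the patent's implementation: the function `refine_motion_vectors` and the callback `score(i, vectors)`, which stands in for the PSNR of the polygon around point i under the given vectors, are hypothetical, and the validity check for reversed triangles is omitted for brevity.

```python
def refine_motion_vectors(num_points, score, search=7, max_steps=5):
    """Iteratively refine one motion vector per quasi-feature point.

    `score(i, vectors)` is a hypothetical callback returning the matching
    quality (e.g. PSNR) for point i; higher is better.
    """
    vectors = [(0, 0)] * num_points  # all initial motion vectors are zero
    for _ in range(max_steps):
        changed = False
        for i in range(num_points):
            base = vectors[i]
            best_vec, best_score = base, score(i, vectors)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    trial = list(vectors)  # neighbours stay unchanged
                    trial[i] = (base[0] + dx, base[1] + dy)
                    s = score(i, trial)
                    if s > best_score:
                        best_vec, best_score = trial[i], s
            if best_vec != base:
                vectors[i] = best_vec
                changed = True
        if not changed:
            break  # converged before reaching max_steps
    return vectors


# Toy score: negative squared distance to a known target displacement.
targets = [(3, -2), (10, 1)]
toy = lambda i, v: -((v[i][0] - targets[i][0]) ** 2 + (v[i][1] - targets[i][1]) ** 2)
print(refine_motion_vectors(2, toy))
```

In the toy example the second target lies outside the ±7 range of a single step, so it is reached only in the second step, which illustrates why several steps may be needed.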
[0068]
As described above, in the process of convergence, the displacement of each feature point is expressed as a motion vector, and the six triangles of each hexagon are independently affine transformed using the displacement of the vertex feature point. If the displacement provides a better PSNR, the motion vectors of the main quasi-feature points are updated in sequence. Thus, the convergence process is very effective in the matching process to determine the predicted video that can be closer to the original video with zooming, rotating or scaling objects.
[0069]
According to a preferred embodiment of the present invention, this process can be performed in three steps for hardware implementation. As shown in FIG. 7A, the quasi-feature points denoted D1, D3, and D5, which form non-overlapping main current polygons, are processed first, each using its six adjacent feature points (D2, D7, D6, D10, D11, D17), (D2, D4, D7, D12, D13, D19), and (D4, D6, D7, D8, D9, D15), respectively.
[0070]
Referring back to FIG. 4, the motion vectors determined for all quasi-feature points are then supplied, as the set of motion vectors for all the feature points, to the motion compensation unit 240, which generates the predicted current frame signal using the restored reference frame. That is, the predicted current frame signal is obtained by affine transformation using the restored previous frame and the obtained motion vectors. As mentioned above, this mapping uses the same affine transformation as the motion vector search process, except that the restored reference frame is used instead of the original reference frame, because the decoding system (not shown) has only the restored reference frame.
[0071]
On the other hand, the encoding system uses feature point based motion compensation to generate a very accurate video with only motion vectors, so no difference or error signal between the current frame and the predicted current frame is transmitted.
[0072]
As described above, the coding system of the present invention using the feature point-based motion compensation can further improve the coding efficiency by accurately obtaining the set of motion vectors.
[0073]
The feature point based motion compensation algorithm is based on the feature points of the video, and affine transformation is used to compensate for rotation and zooming of objects. In most cases, the motion-compensated video has a high PSNR and good image quality. If prediction fails where there is a large amount of motion, the error video can be encoded and transmitted using DCT with a large quantization width. More specifically, good image quality can be obtained using the encoding system of the present invention at 24 kbps. In addition, since the positions of the feature points change from frame to frame, the encoding system of the present invention uses, as the reference frame, the restored frame that exists in both the encoding unit and the decoding unit, so that the position information of the feature points need not be transmitted. Furthermore, the pixel-based motion compensation used in the coding system of the present invention can compensate for zooming, rotation, and scaling of objects by using an affine transformation with only the motion vectors, and therefore yields higher image quality than block-based motion compensation.
[0074]
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that various modifications may be made without departing from the scope of the claims set forth herein.
[0075]
[Brief description of the drawings]
FIG. 1 is a block diagram of a video signal encoding apparatus incorporating a feature point-based motion compensation unit according to the present invention.
FIG. 2 is a schematic diagram for explaining a frame sequence, which includes (A) and (B).
FIG. 3 is a detailed block diagram of a motion compensation unit in FIG. 1;
FIG. 4 is an exemplary block diagram of a motion vector search unit in FIG. 3;
FIG. 5 is an exemplary schematic diagram of a current frame and a reconstructed prediction frame, each including (A) and (B);
FIG. 6 is an exemplary schematic diagram for explaining a feature point selection process according to the present invention, which includes (A) to (E).
FIG. 7 is a schematic diagram for explaining a motion vector search process according to the present invention, comprising (A) and (B).

Claims

1. A motion vector estimation method for estimating, using a feature point-based motion estimation technique, a set of motion vectors between a current frame and a reference frame of a video signal, the reference frame having a restored reference frame and an original reference frame, the method comprising:
a step (a) of selecting, from pixels included in the restored reference frame, a set of feature points that form a polygonal grid having a plurality of overlapping polygons;
a step (b) of determining a set of quasi-feature points at the same positions on the current frame as the positions of the set of feature points;
a step (c) of setting all components of the initial motion vector for each quasi-feature point to zero;
a step (d) of selecting, from the quasi-feature points, a main quasi-feature point having N adjacent quasi-feature points, and forming a main current polygon having N sides using the N adjacent quasi-feature points;
a step (e) of generating a plurality of candidate motion vectors representing positional displacements of the main quasi-feature point for each pixel in a fixed region;
a step (f) of determining, for all pixels included in the main current polygon, a predicted position on the original reference frame corresponding to each of a plurality of updated initial motion vectors, based on the plurality of updated initial motion vectors, which are obtained using each of the plurality of candidate motion vectors for each pixel in the fixed region and the initial motion vector of the main quasi-feature point, and on the initial motion vectors of the N adjacent quasi-feature points;
a step (g) of obtaining a predicted value of each pixel included in the main current polygon from the pixel value at the corresponding predicted position on the original reference frame, and forming the same number of predicted main current polygons as the plurality of updated initial motion vectors;
a step (h) of calculating a difference between the pixel values of the main current polygon and the pixel values of each of the predicted main current polygons, and generating the same number of peak signal-to-noise ratios (PSNRs) as the plurality of updated initial motion vectors;
a step (i) of updating the initial motion vector of the main quasi-feature point with the updated initial motion vector corresponding to the predicted main current polygon having the largest PSNR among the PSNRs;
a step (j) of selecting a quasi-feature point adjacent to the main quasi-feature point as a new main quasi-feature point and repeating steps (d) to (i) so as to update the initial motion vectors of all target quasi-feature points; and
a step (k) of repeating step (j) until the repetition has been performed a predetermined number of times.
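By way of non-limiting illustration, the iterative search recited above can be sketched in Python as follows. The prediction and PSNR computation are abstracted into a caller-supplied function `score`, which is a hypothetical stand-in for forming the predicted polygon and measuring its PSNR; the names `refine_motion_vectors`, `search` and `iterations` are likewise assumptions, and the sketch visits the quasi-feature points in index order rather than by adjacency.

```python
from itertools import product


def refine_motion_vectors(num_points, score, search=7, iterations=5):
    """Sketch of the convergence loop.

    score(i, vectors) is a hypothetical callback returning a quality value
    (for example a PSNR) for the polygon around point i, given the current
    motion vectors of all points. Higher is better.
    """
    vectors = [(0, 0)] * num_points                      # all components zero
    candidates = list(product(range(-search, search + 1), repeat=2))
    for _ in range(iterations):                          # fixed number of passes
        for i in range(num_points):                      # each point in turn
            base = vectors[i]
            best, best_score = base, None
            for dx, dy in candidates:
                # Updated vector = current vector + candidate displacement;
                # the vectors of all other points stay fixed during the trial.
                trial = (base[0] + dx, base[1] + dy)
                vectors[i] = trial
                s = score(i, vectors)
                if best_score is None or s > best_score:
                    best, best_score = trial, s
            vectors[i] = best                            # keep the best trial
    return vectors
```

Because the zero displacement is always among the candidates, a pass can never lower the score of the point being updated; a vector larger than the search range can still be reached over several passes.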

2. The motion vector estimation method according to claim 1, wherein step (a) includes:
a step (a1) of detecting an edge image eg(x, y) in the restored reference frame, expressed as follows:
eg(x, y) = 0 if |∇P(x, y)| < Te, and eg(x, y) = |∇P(x, y)| otherwise,
where P(x, y) is the reference frame, |∇P(x, y)| is the output of a well-known Sobel operator, and Te is a predetermined threshold value;
a step (a2) of determining, on the edge image, a polygonal grid having a large number of grid points for forming a plurality of overlapping polygons;
a step (a3) of determining a non-overlapping search range for each grid point; and
a step (a4) of determining the set of feature points, the set including each grid point that overlaps with an edge point located within its search range and having the maximum sum value of the eight pixels surrounding it, and each grid point that has no edge point within its non-overlapping search range.
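As a non-limiting illustration of the edge image detection, the following Python sketch applies a 3x3 Sobel operator to a frame held as a list of rows and keeps the gradient magnitude only where it reaches the threshold Te. Approximating the magnitude as |gx| + |gy|, leaving the border pixels at zero, treating a magnitude equal to Te as an edge, and the name `sobel_edge_image` are all assumptions made for this sketch.

```python
def sobel_edge_image(frame, te):
    """Return an edge image: Sobel magnitude where it is >= te, else 0.

    frame is a list of rows of pixel values; border pixels are left at 0.
    """
    h, w = len(frame), len(frame[0])
    eg = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            mag = abs(gx) + abs(gy)          # assumed magnitude approximation
            eg[y][x] = mag if mag >= te else 0
    return eg
```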

3. The motion vector estimation method according to claim 2, wherein, when two or more edge points in the search range have the same maximum sum value, the set of feature points includes the edge point closest to the grid point of the polygonal grid.

4. The motion vector estimation method according to claim 3, wherein the polygon is a hexagon and N is six.

5. The motion vector estimation method according to claim 4, wherein the main current hexagon comprises six triangles defined by line segments connecting the main quasi-feature point and the quasi-feature points adjacent thereto.
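As a non-limiting illustration, the division of a hexagon into six triangles can be sketched in Python as follows. The function name `hexagon_triangles` is hypothetical, and the sketch assumes that the six adjacent points are supplied in order around the central point.

```python
def hexagon_triangles(centre, neighbours):
    """Split the hexagon around `centre` into six triangles.

    `neighbours` must list the six adjacent points in order around the
    centre (an assumption of this sketch); each triangle is formed by the
    centre and two consecutive neighbours.
    """
    if len(neighbours) != 6:
        raise ValueError("a hexagon needs exactly six neighbouring points")
    return [(centre, neighbours[i], neighbours[(i + 1) % 6]) for i in range(6)]
```

Each of the six triangles can then be transformed independently using the displacements of its own three vertices.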

6. The motion vector estimation method according to claim 5, wherein the predetermined number of iterations is 5, and the predetermined threshold is 6.

7. The motion vector estimation method according to claim 6, wherein the fixed region has a range of 0 to ±7 in the vertical and horizontal directions.

8. The motion vector estimation method according to claim 7, wherein, when EG is the value of an edge point in the search range and i is a positive integer, the feature point Di is defined as follows:
Di = {(x, y) | Max Σ[k = -1, 1] Σ[l = -1, 1] EG(x + k, y + l)}
where Σ[x = 0, I] Σ[y = 0, J] Z(x, y)
= Z(0, 0) + Z(0, 1) + … + Z(0, J) + Z(1, 0) + … + Z(1, J) + … + Z(I, 0) + … + Z(I, J).
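As a non-limiting illustration of the feature point definition, the following Python sketch scans the search range of one grid point, sums the edge values over k, l = -1 … 1 around every edge point, and returns the edge point with the largest sum. Ties are resolved in favour of the edge point nearest the grid point, and the grid point itself is returned when the range holds no edge point. The square search range, the `radius` parameter and the name `select_feature_point` are assumptions made for this sketch.

```python
def select_feature_point(eg, grid_point, radius):
    """Pick the feature point for one grid point from edge image `eg`.

    eg is indexed as eg[y][x]; points are (x, y) tuples. The search range is
    assumed to be a square of half-width `radius` centred on the grid point.
    """
    gx, gy = grid_point
    h, w = len(eg), len(eg[0])
    best, best_key = grid_point, None      # fall back to the grid point
    for y in range(max(0, gy - radius), min(h, gy + radius + 1)):
        for x in range(max(0, gx - radius), min(w, gx + radius + 1)):
            if eg[y][x] == 0:
                continue                   # not an edge point
            # Sum of EG(x + k, y + l) for k, l in {-1, 0, 1}, clipped to the image.
            total = sum(eg[y + l][x + k]
                        for l in (-1, 0, 1) for k in (-1, 0, 1)
                        if 0 <= y + l < h and 0 <= x + k < w)
            dist = (x - gx) ** 2 + (y - gy) ** 2
            key = (total, -dist)           # larger sum first, then nearer point
            if best_key is None or key > best_key:
                best, best_key = (x, y), key
    return best
```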

9. A motion vector estimation apparatus for estimating, using a feature point-based motion estimation technique, a set of motion vectors between a current frame and a reference frame of a video signal, the reference frame having a restored reference frame and an original reference frame, the apparatus comprising:
first selection means for selecting, from pixels included in the restored reference frame, a set of feature points forming a polygonal grid having a plurality of overlapping polygons;
quasi-feature point determining means for determining a set of quasi-feature points at the same positions on the current frame as the positions of the set of feature points;
storage means for storing a set of initial motion vectors, each component of which is set to zero, for the set of quasi-feature points;
second selection means for selecting, from the quasi-feature points, a main quasi-feature point having N adjacent quasi-feature points, and forming a main current polygon having N sides using the N adjacent quasi-feature points;
adding means for generating a plurality of candidate motion vectors representing positional displacements of the main quasi-feature point for each pixel in a fixed region;
predicted position determining means for determining, for all pixels included in the main current polygon, a predicted position on the original reference frame corresponding to each of a plurality of updated initial motion vectors, based on the plurality of updated initial motion vectors, which are obtained using each of the plurality of candidate motion vectors for each pixel in the fixed region and the initial motion vector of the main quasi-feature point, and on the initial motion vectors of the N adjacent quasi-feature points;
predicted pixel generating means for obtaining a predicted value of each pixel included in the main current polygon from the pixel value at the corresponding predicted position on the original reference frame, and forming the same number of predicted main current polygons as the plurality of updated initial motion vectors;
difference calculating means for calculating a difference between the pixel values of the main current polygon and the pixel values of each of the predicted main current polygons, and generating the same number of peak signal-to-noise ratios (PSNRs) as the plurality of updated initial motion vectors;
third selection means for updating the initial motion vector of the main quasi-feature point with the updated initial motion vector corresponding to the predicted main current polygon having the maximum PSNR among the PSNRs; and
motion vector extracting means for taking out the set of initial motion vectors from the storage means as the set of motion vectors when all of the initial motion vectors have been updated a predetermined number of times.
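As a non-limiting illustration of the difference calculation and the selection that follows it, the Python sketch below computes a PSNR for each predicted polygon and returns the motion vector whose prediction scores highest. The 8-bit peak value of 255, the flat list representation of the polygon's pixels and the names `psnr` and `best_candidate` are assumptions made for this sketch.

```python
import math


def psnr(current, predicted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(current) != len(predicted) or not current:
        raise ValueError("pixel lists must be non-empty and of equal length")
    mse = sum((c - p) ** 2 for c, p in zip(current, predicted)) / len(current)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def best_candidate(current, predictions):
    """Return the key (an updated motion vector) whose predicted pixels give
    the largest PSNR. `predictions` maps motion vectors to pixel lists."""
    return max(predictions, key=lambda mv: psnr(current, predictions[mv]))
```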

10. The motion vector estimation apparatus according to claim 9, wherein the first selection means comprises:
edge image detection means for detecting an edge image eg(x, y) in the restored reference frame, expressed as follows:
eg(x, y) = 0 if |∇P(x, y)| < Te, and eg(x, y) = |∇P(x, y)| otherwise,
where P(x, y) is the reference frame, |∇P(x, y)| is the output of a well-known Sobel operator, and Te is a predetermined threshold value;
polygonal grid determining means for determining, on the edge image, a polygonal grid having a large number of grid points for forming a plurality of overlapping polygons;
non-overlapping search range determining means for determining a non-overlapping search range for each grid point; and
feature point determining means for determining the set of feature points, the set including each grid point that overlaps with an edge point located within its search range and having the maximum sum value of the eight pixels surrounding it, and each grid point that has no edge point within its non-overlapping search range.

11. The motion vector estimation apparatus according to claim 10, wherein, when two or more edge points in the search range have the same maximum sum value, the set of feature points includes the edge point closest to the grid point of the polygonal grid.

12. The motion vector estimation apparatus according to claim 11, wherein the polygon is a hexagon and N is six.

13. The motion vector estimation apparatus according to claim 12, wherein the main current hexagon comprises six triangles defined by line segments connecting the main quasi-feature point and the quasi-feature points adjacent thereto.

14. The motion vector estimation apparatus according to claim 13, wherein the number of the peripheral pixel points is 8, the predetermined number of repetitions is 5, and the predetermined threshold is 6.

15. The motion vector estimation apparatus according to claim 14, wherein the fixed region has a range of 0 to ±7 in the vertical and horizontal directions.

16. A digital video signal encoding apparatus that encodes a digital video signal including a plurality of frames, including a current frame and a reference frame, so as to reduce the transmission rate of the video signal, the apparatus comprising:
first storage means for storing a restored reference frame of the digital video signal;
second storage means for storing an original reference frame of the digital video signal;
first motion compensation means for detecting, using block-based motion estimation, a plurality of motion vectors between the current frame and the restored reference frame, and generating a first predicted current frame based on the plurality of motion vectors and the restored reference frame;
second motion compensation means for selecting, using feature point-based motion estimation, a set of feature points from the restored reference frame, detecting a set of motion vectors between the current frame and the original reference frame corresponding to the set of feature points, and generating a second predicted current frame based on the set of motion vectors and the restored reference frame;
selecting means for selecting the plurality of motion vectors detected by the first motion compensation means and the first predicted current frame when the current frame is a predictive frame, and selecting the set of motion vectors detected by the second motion compensation means and the second predicted current frame when the current frame is a bidirectionally predictive frame or a forward predictive frame;
transform coding means for transform coding an error signal representing the difference between the predicted current frame and the current frame to generate a transform-coded error signal; and
statistical encoding means for statistically encoding the transform-coded error signal and the selected motion vectors to generate an encoded video signal to be transmitted,
wherein the digital video signal encoding apparatus includes a motion vector estimation unit for estimating, using a feature point-based motion estimation technique, a set of motion vectors between a current frame and a reference frame of a video signal, the reference frame having a restored reference frame and an original reference frame, the motion vector estimation unit including:
first selection means for selecting, from pixels included in the restored reference frame, a set of feature points forming a polygonal grid having a plurality of overlapping polygons;
quasi-feature point determining means for determining a set of quasi-feature points at the same positions on the current frame as the positions of the set of feature points;
storage means for storing a set of initial motion vectors, each component of which is set to zero, for the set of quasi-feature points;
second selection means for selecting, from the quasi-feature points, a main quasi-feature point having N adjacent quasi-feature points, and forming a main current polygon having N sides using the N adjacent quasi-feature points;
adding means for generating a plurality of candidate motion vectors representing positional displacements of the main quasi-feature point for each pixel in a fixed region;
predicted position determining means for determining, for all pixels included in the main current polygon, a predicted position on the original reference frame corresponding to each of a plurality of updated initial motion vectors, based on the plurality of updated initial motion vectors, which are obtained using each of the plurality of candidate motion vectors for each pixel in the fixed region and the initial motion vector of the main quasi-feature point, and on the initial motion vectors of the N adjacent quasi-feature points;
predicted pixel generating means for obtaining a predicted value of each pixel included in the main current polygon from the pixel value at the corresponding predicted position on the original reference frame, and forming the same number of predicted main current polygons as the plurality of updated initial motion vectors;
difference calculating means for calculating a difference between the pixel values of the main current polygon and the pixel values of each of the predicted main current polygons, and generating the same number of peak signal-to-noise ratios (PSNRs) as the plurality of updated initial motion vectors;
third selection means for updating the initial motion vector of the main quasi-feature point with the updated initial motion vector corresponding to the predicted main current polygon having the maximum PSNR among the PSNRs; and
motion vector extracting means for taking out the set of initial motion vectors from the storage means as the set of motion vectors when all of the initial motion vectors have been updated a predetermined number of times.
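As a non-limiting illustration of the selecting means, the following Python sketch chooses between the two motion compensation paths according to the type of the current frame. The `FrameType` enumeration, the returned labels and the function name `select_motion_compensation` are hypothetical and serve only to make the decision rule concrete.

```python
from enum import Enum


class FrameType(Enum):
    PREDICTIVE = "P"       # predictive frame
    BIDIRECTIONAL = "B"    # bidirectionally predictive frame
    FORWARD = "F"          # forward predictive frame


def select_motion_compensation(frame_type):
    """Return which motion compensation output is selected for a frame type."""
    if frame_type is FrameType.PREDICTIVE:
        return "block_based"            # first motion compensation means
    if frame_type in (FrameType.BIDIRECTIONAL, FrameType.FORWARD):
        return "feature_point_based"    # second motion compensation means
    raise ValueError(f"unsupported frame type: {frame_type!r}")
```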

17. The digital video signal encoding apparatus according to claim 16, wherein the first selection means comprises:
edge image detection means for detecting an edge image eg(x, y) in the restored reference frame, expressed as follows:
eg(x, y) = 0 if |∇P(x, y)| < Te, and eg(x, y) = |∇P(x, y)| otherwise,
where P(x, y) is the reference frame, |∇P(x, y)| is the output of a well-known Sobel operator, and Te is a predetermined threshold value;
polygonal grid determining means for determining, on the edge image, a polygonal grid having a large number of grid points for forming a plurality of overlapping polygons;
non-overlapping search range determining means for determining a non-overlapping search range for each grid point; and
feature point determining means for determining the set of feature points, the set including each grid point that overlaps with an edge point located within its search range and having the maximum sum value of the eight pixels surrounding it, and each grid point that has no edge point within its non-overlapping search range.

18. The digital video signal encoding apparatus according to claim 17, wherein the polygon is a hexagon, N is 6, the main current hexagon has six triangles defined by line segments connecting the main quasi-feature point and the quasi-feature points adjacent thereto, and the predicted position determining means uses a known affine transformation.

19. The digital video signal encoding apparatus according to claim 18, wherein the number of the peripheral pixel points is 8, the predetermined number of repetitions is 5, and the predetermined threshold is 6.

20. The digital video signal encoding apparatus according to claim 19, wherein the fixed region has a range of 0 to ±7 in the vertical and horizontal directions.