JP4153774B2

JP4153774B2 - Video encoding method, decoding method thereof, and apparatus thereof

Info

Publication number: JP4153774B2
Application number: JP2002320771A
Authority: JP
Inventors: 孝之仲地; 知子澤邉
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2002-11-05
Filing date: 2002-11-05
Publication date: 2008-09-24
Anticipated expiration: 2022-11-05
Also published as: JP2004158946A

Description

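[0001]
[Technical field of the invention]
The present invention relates to encoding and decoding for efficiently transmitting and storing moving images.
[0002]
[Prior art]
The following are well known as representative coding methods for moving images:
(1) methods using motion compensation and the DCT, typified by MPEG;
[0003]
(2) methods using the wavelet transform, typified by Motion JPEG 2000.
[0004]
Many models have been proposed for methods using motion compensation and the DCT, and they achieve high coding efficiency by efficiently removing both inter-frame and intra-frame correlation.
[0005]
Wavelet-based methods such as Motion JPEG 2000, on the other hand, offer useful functions that methods using motion compensation and the DCT lack, such as spatial, temporal, and SNR scalability.
[0006]
However, because Motion JPEG 2000 exploits only intra-frame correlation, its coding efficiency is known to be lower than that of methods using motion compensation and the DCT. Spatio-temporal adaptive prediction is a method that uses wavelets, also removes inter-frame correlation, and thereby improves coding efficiency (see, for example, Non-Patent Document 1).
[0007]
This method was proposed for lossless coding, but it is also applicable to lossy coding and can remove inter-frame correlation efficiently.
[0008]
[Non-Patent Document 1]
Takayuki Nakachi, Tomoko Sawabe, Tatsuya Fujii, Tetsuro Fujii, "A study of lossless video coding with resolution scalability", Proceedings of the 16th Digital Signal Processing Symposium, pp. 439-444, November 2001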
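[0009]
[Problems to be solved by the invention]
Spatio-temporal predictive coding removes inter-frame correlation by applying nonlinear prediction in the wavelet coefficient domain. Because it uses the wavelet transform, it provides spatial scalability, and because it also removes inter-frame correlation, its coding efficiency is high.
[0010]
However, the bands in which spatio-temporal prediction is performed are fixed, so the scheme is not necessarily suited to every kind of image. A model with variable bands has also been proposed (for example, Non-Patent Document 1), but it requires additional side information and is therefore inefficient.
[0011]
An object of the present invention is to provide a moving-image coding method based on spatio-temporal prediction, a corresponding decoding method, and apparatuses for them, in which the bands from which inter-frame correlation is removed are changed adaptively according to the statistical properties of the image, without increasing the amount of side information, so that coding efficiency is improved.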
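[0012]
[Means for solving the problems]
To solve the above problems, the present invention is characterized by the following moving-image coding methods, moving-image decoding methods, and apparatuses.
[0013]
(1) A moving-image coding method in which the original image is divided into bands and each band is processed by spatio-temporal adaptive prediction, which switches between a two-dimensional predictor performing intra-frame processing (two-dimensional prediction) and a three-dimensional predictor performing inter-frame processing (three-dimensional prediction) by calculating the correlation coefficient between the already-decoded signals near the pixel to be coded in the current frame and in the reference frame, and which performs inter-frame processing (three-dimensional prediction) when the correlation coefficient is large and intra-frame processing (two-dimensional prediction) otherwise, characterized in that, exploiting the correlation between bands of the same spatial orientation starting from the lowest frequency band, a band is coded by spatio-temporal adaptive prediction when the inter-frame correlation of the frequency band one level lower than that band (toward the lowest frequency band) is high, and is coded directly when that inter-frame correlation is low.
[0014]
(2) A moving-image decoding method for a signal coded by the coding method of (1) above, characterized in that a band is decoded by spatio-temporal adaptive prediction when the inter-frame correlation of the frequency band one level lower than that band (toward the lowest frequency band) is high, and is decoded directly when that inter-frame correlation is low.
[0015]
(3) The moving-image coding method of (1) above, characterized in that a band is coded by spatio-temporal adaptive prediction when, in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and is coded directly when that proportion is low.
[0016]
(4) A moving-image decoding method for a signal coded by the coding method of (3) above, characterized in that a band is decoded by spatio-temporal adaptive prediction when, in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and is decoded directly when that proportion is low.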
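[0017]
(5) A moving-image coding method in which the original image is divided into bands and each band is processed by spatio-temporal adaptive prediction, which switches between a two-dimensional predictor performing intra-frame processing (two-dimensional prediction) and a three-dimensional predictor performing inter-frame processing (three-dimensional prediction) by calculating the correlation coefficient between the already-decoded signals near the pixel to be coded in the current frame and in the reference frame, and which performs inter-frame processing (three-dimensional prediction) when the correlation coefficient is large and intra-frame processing (two-dimensional prediction) otherwise, characterized in that whether to perform spatio-temporal adaptive prediction or direct coding is decided per small block, a small block of a band being coded by spatio-temporal adaptive prediction when the inter-frame correlation of the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band) is large, and coded directly when that inter-frame correlation is low.
[0018]
(6) A moving-image decoding method for a signal coded by the coding method of (5) above, characterized in that whether to perform spatio-temporal adaptive predictive decoding or direct decoding is decided per small block, a small block of a band being decoded by spatio-temporal adaptive prediction when the inter-frame correlation of the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band) is large, and decoded directly when that inter-frame correlation is low.
[0019]
(7) The moving-image coding method of (5) above, characterized in that a small block of a band is coded by spatio-temporal adaptive prediction when, within the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and is coded directly when that proportion is low.
[0020]
(8) A moving-image decoding method for a signal coded by the coding method of (7) above, characterized in that a small block of a band is decoded by spatio-temporal adaptive prediction when, within the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and is decoded directly when that proportion is low.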
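[0021]
(9) A moving-image coding apparatus in which the original image is divided into bands and each band is processed by spatio-temporal adaptive prediction, which switches between a two-dimensional predictor performing intra-frame processing (two-dimensional prediction) and a three-dimensional predictor performing inter-frame processing (three-dimensional prediction) by calculating the correlation coefficient between the already-decoded signals near the pixel to be coded in the current frame and in the reference frame, and which performs inter-frame processing (three-dimensional prediction) when the correlation coefficient is large and intra-frame processing (two-dimensional prediction) otherwise, characterized by having, exploiting the correlation between bands of the same spatial orientation starting from the lowest frequency band, means for coding a band by spatio-temporal adaptive prediction when the inter-frame correlation of the frequency band one level lower than that band (toward the lowest frequency band) is high, and means for coding the band directly when that inter-frame correlation is low.
[0022]
(10) A moving-image decoding apparatus for a signal coded by the coding apparatus of (9) above, characterized by having means for decoding a band by spatio-temporal adaptive prediction when the inter-frame correlation of the frequency band one level lower than that band (toward the lowest frequency band) is high, and means for decoding the band directly when that inter-frame correlation is low.
[0023]
(11) The moving-image coding apparatus of (9) above, characterized by having means for coding a band by spatio-temporal adaptive prediction when, in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and means for coding the band directly when that proportion is low.
[0024]
(12) A moving-image decoding apparatus for a signal coded by the coding apparatus of (11) above, characterized by having means for decoding a band by spatio-temporal adaptive prediction when, in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and means for decoding the band directly when that proportion is low.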
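[0025]
(13) A moving-image coding apparatus in which the original image is divided into bands and each band is processed by spatio-temporal adaptive prediction, which switches between a two-dimensional predictor performing intra-frame processing (two-dimensional prediction) and a three-dimensional predictor performing inter-frame processing (three-dimensional prediction) by calculating the correlation coefficient between the already-decoded signals near the pixel to be coded in the current frame and in the reference frame, and which performs inter-frame processing (three-dimensional prediction) when the correlation coefficient is large and intra-frame processing (two-dimensional prediction) otherwise, characterized in that whether to perform spatio-temporal adaptive prediction or direct coding is decided per small block, and by having means for coding a small block of a band by spatio-temporal adaptive prediction when the inter-frame correlation of the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band) is large, and means for coding the small block directly when that inter-frame correlation is low.
[0026]
(14) A moving-image decoding apparatus for a signal coded by the coding apparatus of (13) above, characterized in that whether to perform spatio-temporal adaptive predictive decoding or direct decoding is decided per small block, and by having means for decoding a small block of a band by spatio-temporal adaptive prediction when the inter-frame correlation of the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band) is large, and means for decoding the small block directly when that inter-frame correlation is low.
[0027]
(15) The moving-image coding apparatus of (13) above, characterized by having means for coding a small block of a band by spatio-temporal adaptive prediction when, within the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and means for coding the small block directly when that proportion is low.
[0028]
(16) A moving-image decoding apparatus for a signal coded by the coding apparatus of (15) above, characterized by having means for decoding a small block of a band by spatio-temporal adaptive prediction when, within the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and means for decoding the small block directly when that proportion is low.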
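[0029]
[Embodiments of the invention]
(First embodiment)
FIGS. 1 and 2 show the basic configuration for implementing a moving-image coding apparatus and method based on spatio-temporal adaptive prediction. FIG. 1 is for lossy coding and FIG. 2 is for lossless coding; the spatio-temporal adaptive predictive coding method of the present invention is applicable to both.
[0030]
In FIG. 1 or FIG. 2, 10 is a band division unit, 11 to 14 are spatio-temporal adaptive prediction processing units provided for each divided band, and 16 is an entropy coding unit. For lossy coding, the quantization unit 15 quantizes the wavelet coefficients or the difference signal produced by spatio-temporal adaptive prediction. Also for lossy coding, using EBCOT, the entropy coder used in JPEG 2000, provides spatial and SNR scalability. For lossless coding, no quantization is performed, and the wavelet coefficients or the difference signal produced by spatio-temporal adaptive prediction are entropy coded directly.
[0031]
The input original image is divided by the band division unit 10 into bands of several spatial resolutions. This band division uses the octave decomposition shown in FIG. 3. In octave decomposition, the input signal is divided into multiple bands by repeatedly splitting the low band with a one-dimensional two-band filter. This process is applied in both the horizontal and vertical directions.
[0032]
Next, each divided band is coded by spatio-temporal adaptive prediction. First, adaptive predictive coding is applied to the lowest frequency band LL(n), where n is the number of band division levels. The method described in Non-Patent Document 1 can be used for this adaptive predictive coding. In that method, a two-dimensional predictor and a three-dimensional predictor are provided, and the predictor is switched according to the local properties of the image signal. To switch between the two predictors, the correlation coefficient between the already-decoded signals near the pixel to be coded in the current frame and in the reference frame is calculated. When the correlation coefficient is large, that is, when the waveforms of the signal in the current frame and the signal in the reference frame are similar, three-dimensional prediction is expected to improve prediction accuracy and is therefore used; otherwise two-dimensional prediction is used. This adapts the prediction to the local properties of the image and improves prediction accuracy.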
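The predictor switch described above can be sketched as follows. This is a minimal illustration, not the method of Non-Patent Document 1: the function names, the use of the Pearson correlation coefficient, and the threshold value 0.5 are all assumptions, since the text only says that three-dimensional prediction is chosen when the correlation coefficient is "large".

```python
import numpy as np

def correlation_coefficient(a, b):
    """Pearson correlation of two equally sized sample sets (0.0 if either is flat)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0.0:
        return 0.0
    return float((a * b).sum() / denom)

def choose_predictor(cur_neighbors, ref_neighbors, threshold=0.5):
    """Return "3d" (inter-frame) or "2d" (intra-frame).

    cur_neighbors / ref_neighbors: already-decoded samples around the target
    pixel in the current and reference frames. Because only decoded samples
    are used, a decoder can repeat the same decision without side information.
    The threshold is a hypothetical tuning parameter.
    """
    rho = correlation_coefficient(cur_neighbors, ref_neighbors)
    return "3d" if rho >= threshold else "2d"
```

For example, `choose_predictor([1, 2, 3, 4], [2, 4, 6, 8])` selects the three-dimensional predictor because the two neighborhoods have the same waveform, while an anti-correlated or flat neighborhood falls back to two-dimensional prediction.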
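[0033]
Next, once adaptive predictive coding of the lowest frequency band is complete, the inter-frame correlation determination unit 17 calculates the inter-frame correlation between the current frame and the reference frame in the lowest frequency band. Several methods are conceivable for judging whether the inter-frame correlation is large or small; one example is given later as the fifth embodiment.
[0034]
First, whether to perform adaptive predictive coding in the adjacent bands LH(n), HL(n), and HH(n) is decided by the magnitude of the inter-frame correlation in the lowest frequency band.
[0035]
・When the inter-frame correlation is large
Spatio-temporal adaptive predictive coding is applied, and processing moves on to the higher bands.
[0036]
・When the inter-frame correlation is small
The wavelet coefficients are coded directly, all wavelet coefficients of the remaining higher bands are also coded directly, and processing ends.
[0037]
This works because the signal in the lowest frequency band is correlated with the signals in the adjacent bands LH(n), HL(n), and HH(n). That is, when the inter-frame correlation is large in the lowest frequency band, it is expected to be strong in the three adjacent bands as well. In this scheme, whether spatio-temporal adaptive prediction is performed in the adjacent bands LH(n), HL(n), and HH(n) is determined from the lowest frequency band, so information shared by the encoder and the decoder can be used and no new side information is required.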
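[0038]
When the inter-frame correlation is large, processing in the subsequent higher frequency bands proceeds as follows. In the wavelet domain, frequency bands of the same orientation, as seen from the lowest frequency band, are known to be correlated. That is, the bands
LH(n), LH(n-1), ..., LH(1)
are correlated with one another, and the same holds in the HL and HH directions, that is, among
HL(n), HL(n-1), ..., HL(1)
and among
HH(n), HH(n-1), ..., HH(1).
Using this property, whether to perform spatio-temporal adaptive prediction in each band is decided by the magnitude of the inter-frame correlation in the frequency band one level lower than that band. That is,
・When the inter-frame correlation is large
Spatio-temporal adaptive predictive coding is applied, and processing moves on to the higher bands.
[0039]
・When the inter-frame correlation is small
The wavelet coefficients are coded directly, all wavelet coefficients of the remaining higher bands are also coded directly, and processing ends.
[0040]
This processing is carried out toward the higher bands, and processing in the LH direction ends when either the inter-frame correlation is small or processing has been completed up to the highest frequency band LH(1).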
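The band-by-band cascade described above can be sketched as a small planning routine. The function name, the `(orientation, level)` band keys, the mode labels, and the callback that reports whether an already-coded band has high inter-frame correlation are illustrative assumptions; the text does not prescribe a data structure.

```python
def plan_band_modes(levels, interframe_correlation_is_high):
    """Decide, per band, between adaptive prediction and direct coding.

    levels: number of decomposition levels n (level n is the lowest frequency).
    interframe_correlation_is_high: hypothetical callback taking a band key
        such as ("LL", n) or ("LH", k) and returning True/False, computed
        only from that band's already-coded data.
    """
    modes = {("LL", levels): "adaptive"}  # LL(n) always uses adaptive prediction
    for orientation in ("LH", "HL", "HH"):
        parent = ("LL", levels)           # band one level lower than LH/HL/HH(n)
        stopped = False
        for level in range(levels, 0, -1):
            band = (orientation, level)
            if not stopped and interframe_correlation_is_high(parent):
                modes[band] = "adaptive"
            else:
                # Once one band is coded directly, all higher bands of the
                # same orientation are coded directly as well.
                modes[band] = "direct"
                stopped = True
            parent = band
    return modes
```

Because the callback depends only on data that both sides already hold, running the same routine in the encoder and in the decoder yields the same mode map, which is why no mode flags need to be transmitted.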
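[0041]
FIG. 4 illustrates the above processing for the bands LH(n-1), LH(n-2), ..., LH(1): whether to perform the band spatio-temporal adaptive prediction processing 21 to 22 in the LH direction is decided from the magnitude of the results of the inter-frame correlation calculations 23, 24, and so on. The HL and HH directions can be processed in the same way. FIG. 5 illustrates the correlation relationships among the bands; the processing described above proceeds in the direction of the arrows. FIG. 6 shows a flowchart of the band spatio-temporal adaptive prediction processing for a two-level decomposition: when the prediction processing in LL(2) shows that the inter-frame correlation is large, prediction processing is performed for HL(2), LH(2), and HH(2), and the magnitude of their inter-frame correlation then determines whether prediction processing is performed in HL(1), LH(1), and HH(1).
[0042]
In the above method, whether to perform band spatio-temporal adaptive prediction is determined by the magnitude of the inter-frame correlation in the frequency band one level lower, so here too information shared by the encoder and the decoder can be used and no new side information is required.
After the above processing, the entropy coding unit 16 of FIG. 1 or FIG. 2 codes the residual signal output by adaptive predictive coding or the wavelet coefficients, together with the motion estimation vectors, and generates the coded bitstream.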
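[0043]
(Second embodiment)
FIGS. 7 and 8 show the basic configuration for implementing a moving-image decoding apparatus and method that decode data coded according to the first embodiment. FIG. 7 is for lossy decoding and FIG. 8 is for lossless decoding. In FIG. 7 or FIG. 8, 30 is an entropy decoding unit, 31 to 34 are spatio-temporal adaptive predictive decoding units provided for each divided band, 36 is a band synthesis unit, and 37 is an inter-frame correlation determination unit.
[0044]
First, the entropy decoding unit 30 decodes the residual signal or the wavelet coefficients, together with the motion vectors, from the coded bitstream. For lossy coding, the inverse quantization unit 35 then inverse-quantizes the wavelet coefficients or the difference signal produced by spatio-temporal adaptive prediction. The processing of the spatio-temporal adaptive predictive decoding unit 31 for the lowest frequency band LL(n) can use, for example, the method described in Non-Patent Document 1. In that method, as in spatio-temporal adaptive predictive coding, the predictor is selected by calculating the correlation coefficient between the already-decoded signals near the target pixel in the current frame and in the reference frame. Three-dimensional prediction is used when the correlation coefficient is large and two-dimensional prediction when it is small. Adding the inverse-quantized difference signal to the resulting prediction value restores the signal of the lowest frequency band.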
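The reconstruction step described above (prediction value plus inverse-quantized difference signal) can be sketched as below. The uniform quantizer with step size `step` is an assumption made purely for illustration; the text does not specify the quantizer, and the function names are hypothetical.

```python
def encode_residual(value, prediction, step=1):
    """Encoder side: quantize the prediction residual to an integer index.

    step=1 with integer inputs corresponds to the lossless case (no
    quantization loss); step > 1 models lossy coding with an assumed
    uniform quantizer.
    """
    return round((value - prediction) / step)

def decode_value(residual_index, prediction, step=1):
    """Decoder side: inverse-quantize the residual and add the prediction."""
    return prediction + residual_index * step
```

With `step=1` the round trip is exact, matching the lossless configuration; with a larger step the reconstruction differs from the original by at most half a step.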
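[0045]
Next, to decide whether the subsequent higher frequency bands are to be decoded by spatio-temporal predictive decoding or their wavelet coefficients decoded directly, the inter-frame correlation determination unit 37 calculates the inter-frame correlation in the lowest frequency band LL(n). Based on this calculation,
・When the inter-frame correlation is large
Spatio-temporal adaptive predictive decoding is applied, and processing moves on to the higher bands.
[0046]
・When the inter-frame correlation is small
The wavelet coefficients are decoded directly, all wavelet coefficients of the remaining higher bands are also decoded directly, and processing ends.
[0047]
When the inter-frame correlation is large, processing in the subsequent higher frequency bands proceeds as follows.
[0048]
For the bands in each of the LH, HL, and HH directions, proceeding from the low frequency bands toward the high frequency bands, the inter-frame correlation in the frequency band one level lower is calculated. Based on this calculation,
・When the inter-frame correlation is large
Spatio-temporal adaptive predictive decoding is applied, and processing moves on to the higher bands.
[0049]
・When the inter-frame correlation is small
The wavelet coefficients are decoded directly, all wavelet coefficients of the remaining higher bands are also decoded directly, and processing ends.
[0050]
FIG. 9 illustrates the processing for the bands LH(n-1), LH(n-2), ..., LH(1): whether to perform the band spatio-temporal adaptive predictive decoding processing 40 to 42 in the LH direction is decided from the calculations of the inter-frame correlation determination units 43, 44, and so on, and processing in the LH direction ends when either the inter-frame correlation is small or processing of the highest frequency band LH(1) has been completed. The HL and HH directions can be processed in the same way. FIG. 10 shows a flowchart of the band spatio-temporal adaptive predictive decoding processing for a two-level decomposition: when the prediction processing in LL(2) shows that the inter-frame correlation is large, prediction processing is performed for HL(2), LH(2), and HH(2), and the magnitude of their inter-frame correlation then determines whether prediction processing is performed in HL(1), LH(1), and HH(1).
[0051]
After the above processing, the band synthesis unit 36 of FIG. 7 or FIG. 8 synthesizes the outputs of the bands to decode the image.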
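[0052]
(Third embodiment)
In the first embodiment, the choice between adaptive predictive coding and direct coding of the wavelet coefficients was made band by band. In this embodiment, for every band other than the lowest frequency band, that choice is made per small block. Processing per small block allows coding that adapts more closely to the local properties of the image and improves coding efficiency.
[0053]
In the wavelet domain, the frequency components corresponding to the same spatial position are known to be correlated with one another. Using this property, processing is performed on small blocks corresponding to the same spatial position in each frequency band, as shown in FIG. 11.
[0054]
First, as in the first embodiment, spatio-temporal adaptive predictive coding is applied to the whole of the lowest frequency band.
[0055]
Next, the lowest frequency band is divided into small blocks of L x L pixels. The inter-frame correlation of each small block is calculated in raster-scan order, from left to right and top to bottom in the figure. The inter-frame correlation of each small block determines whether adaptive predictive coding is performed in the small blocks at the corresponding spatial position in the adjacent bands LH(n), HL(n), and HH(n). That is,
・When the inter-frame correlation is large
Spatio-temporal adaptive predictive coding is applied within the small block, and processing moves on to the higher bands.
[0056]
・When the inter-frame correlation is small
The wavelet coefficients are coded directly, all wavelet coefficients of the remaining higher bands are also coded directly, and processing ends.
[0057]
When the inter-frame correlation is large, the same processing continues block by block in the subsequent higher frequency bands. This processing is performed for all small blocks in raster-scan order. FIG. 12 shows a flowchart of the band-adaptive processing for a two-level decomposition: when the prediction processing for small block MB(n) of LL(2) shows that the inter-frame correlation is large, prediction processing is performed for HL(2)MB(n), LH(2)MB(n), and HH(2)MB(n), and the magnitude of their inter-frame correlation then determines whether prediction processing is performed in HL(1)MB(n), LH(1)MB(n), and HH(1)MB(n).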
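The block geometry implied above can be sketched as follows. The raster-scan order follows the text; the dyadic scaling used to locate the block "at the corresponding spatial position" in a finer band is an assumption (each level toward higher frequency is taken to be twice as large in each dimension, as in an octave decomposition), and the function names are hypothetical.

```python
def raster_blocks(band_height, band_width, block_size):
    """Yield (top, left) of each L x L block, left to right, top to bottom."""
    for top in range(0, band_height, block_size):
        for left in range(0, band_width, block_size):
            yield top, left

def co_located_block(top, left, block_size, from_level, to_level):
    """Map a block in a band at from_level to the same image area at to_level.

    Levels follow the document's numbering: a smaller level number means a
    higher-frequency (larger) band. Assumes dyadic scaling between levels.
    Returns (top, left, size) in the coordinates of the to_level band.
    """
    if to_level > from_level:
        raise ValueError("to_level must not be a lower-frequency level")
    scale = 2 ** (from_level - to_level)
    return top * scale, left * scale, block_size * scale
```

For a two-level decomposition, a 4 x 4 block at `(4, 0)` in LL(2) maps to the same coordinates in LH(2), HL(2), and HH(2), and to an 8 x 8 block at `(8, 0)` in the level-1 bands under this assumption.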
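[0058]
In entropy coding, the residual signal output by adaptive predictive coding or the wavelet coefficients, together with the motion estimation vectors, are coded to generate the coded bitstream.
[0059]
(Fourth embodiment)
This embodiment describes a method of decoding a signal coded by the small-block processing of the third embodiment. The basic configuration is the same as that of the second embodiment shown in FIG. 7 or FIG. 8, except that processing outside the lowest frequency band is performed per small block.
[0060]
First, the entropy decoding unit 30 decodes the residual signal or the wavelet coefficients, together with the motion vectors, from the coded bitstream. For lossy coding, the inverse quantization unit 35 then inverse-quantizes the wavelet coefficients or the difference signal produced by spatio-temporal adaptive prediction.
[0061]
The processing of the spatio-temporal adaptive predictive decoding unit 31 for the lowest frequency band LL(n) is exactly the same as in the second embodiment, and the method described in Non-Patent Document 1 can be used.
[0062]
Next, the decoded signal of the lowest frequency band is divided into small blocks of L x L pixels. The inter-frame correlation of each small block is calculated in raster-scan order, from left to right and top to bottom in the figure. The magnitude of the inter-frame correlation of each small block determines whether adaptive predictive decoding is performed in the small blocks at the corresponding spatial position in the adjacent bands LH(n), HL(n), and HH(n). That is,
・When the inter-frame correlation is large
Spatio-temporal adaptive predictive decoding is applied within the small block, and processing moves on to the higher bands.
[0063]
・When the inter-frame correlation is small
The wavelet coefficients are decoded directly, all wavelet coefficients of the remaining higher bands are also decoded directly, and processing ends.
[0064]
When the inter-frame correlation is large, the same processing continues per small block in the subsequent higher frequency bands. This processing is performed for all small blocks in raster-scan order. FIG. 13 shows a flowchart of the band spatio-temporal adaptive predictive decoding processing for a two-level decomposition: when the prediction processing for small block MB(n) of LL(2) shows that the inter-frame correlation is large, prediction processing is performed for HL(2)MB(n), LH(2)MB(n), and HH(2)MB(n), and the magnitude of their inter-frame correlation then determines whether prediction processing is performed in HL(1)MB(n), LH(1)MB(n), and HH(1)MB(n).
[0065]
After these decisions, the band synthesis unit 36 synthesizes the outputs of the bands to decode the image.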
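[0066]
(Fifth embodiment)
This embodiment describes a moving-image coding apparatus and method in which, in the first embodiment, the inter-frame correlation coefficient R_inter defined below is used as the measure of inter-frame correlation.
[0067]
R_inter = Inter / (Inter + Intra)
Here, Inter is the number of times inter-frame processing (three-dimensional prediction) was performed, and Intra is the number of times intra-frame processing (two-dimensional prediction) was performed. Whether inter-frame or intra-frame processing is performed is determined by the correlation coefficient of the signal values between two consecutive frames near the pixel to be coded: inter-frame processing is performed when the correlation coefficient is large, and intra-frame processing when it is small.
[0068]
Accordingly, when R_inter is large, that is, when inter-frame processing was performed many times, the inter-frame correlation is judged to be large. The value of R_inter in the lowest frequency band determines whether adaptive predictive coding is performed in the adjacent bands LH(n), HL(n), and HH(n). That is,
・When R_inter >= TH
Spatio-temporal adaptive predictive coding is applied, and processing moves on to the higher bands.
[0069]
・When R_inter < TH
The wavelet coefficients are coded directly, all wavelet coefficients of the remaining higher bands are also coded directly, and processing ends.
[0070]
Here, TH is a threshold that takes a value in the range 0 <= TH <= 1.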
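The R_inter test above is simple enough to state directly in code. The function names are illustrative, and returning 0.0 for a band in which no pixel was predicted at all is an assumption added to avoid division by zero; the text does not discuss that case.

```python
def r_inter(inter_count, intra_count):
    """R_inter = Inter / (Inter + Intra), as defined in the text.

    inter_count: number of pixels coded with inter-frame (3D) prediction.
    intra_count: number of pixels coded with intra-frame (2D) prediction.
    Returns 0.0 when both counts are zero (assumed convention).
    """
    total = inter_count + intra_count
    if total == 0:
        return 0.0
    return inter_count / total

def use_adaptive_prediction(inter_count, intra_count, th):
    """True when the next band should use spatio-temporal adaptive prediction."""
    if not 0.0 <= th <= 1.0:
        raise ValueError("TH must satisfy 0 <= TH <= 1")
    return r_inter(inter_count, intra_count) >= th
```

For instance, a band with 3 inter-frame and 1 intra-frame decisions has R_inter = 0.75, so with TH = 0.75 the next band is still coded adaptively, whereas 1 inter-frame and 3 intra-frame decisions with TH = 0.5 lead to direct coding. Both counters are by-products of coding the lower band, so the decoder can accumulate the same values.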
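[0071]
When R_inter >= TH, processing in the subsequent higher frequency bands proceeds as follows.
[0072]
・When R_inter >= TH
Spatio-temporal adaptive predictive coding is applied, and processing moves on to the higher bands.
[0073]
・When R_inter < TH
The wavelet coefficients are coded directly, all wavelet coefficients of the remaining higher bands are also coded directly, and processing ends.
[0074]
This processing is carried out toward the higher bands, and processing in the LH direction ends when either R_inter < TH is satisfied or processing has been completed up to the highest frequency band LH(1). FIG. 14 illustrates this processing for the bands LH(n-1), LH(n-2), ..., LH(1): whether to perform the band spatio-temporal adaptive prediction processing 21 to 22 in the LH direction is decided from the magnitude of the results calculated by the R_inter-based correlation determination units 25, 26, and so on. The HL and HH directions can be processed in the same way. FIG. 15 shows a flowchart of the band spatio-temporal adaptive prediction processing for a two-level decomposition: when the R_inter value from the prediction processing in LL(2) is at least TH, prediction processing is performed for HL(2), LH(2), and HH(2), and comparing their R_inter values with TH then determines whether prediction processing is performed in HL(1), LH(1), and HH(1).
[0075]
In the above method, whether to perform spatio-temporal adaptive prediction is determined by the R_inter value of the frequency band one level lower, so here too information shared by the encoder and the decoder can be used and no new side information is required.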
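[0076]
(Sixth embodiment)
This embodiment describes a moving-image decoding apparatus and method in which, in the second embodiment, R_inter is used as the measure of inter-frame correlation. The R_inter-based correlation determination units 25 and 26 calculate the R_inter value in the lowest frequency band LL(n). Based on the result,
・When R_inter >= TH
Spatio-temporal adaptive predictive decoding is applied, and processing moves on to the higher bands.
[0077]
・When R_inter < TH
The wavelet coefficients are decoded directly, all wavelet coefficients of the remaining higher bands are also decoded directly, and processing ends.
[0078]
When R_inter >= TH, processing in the subsequent higher frequency bands proceeds as follows.
[0079]
For the bands in each of the LH, HL, and HH directions, proceeding from the low frequency bands toward the high frequency bands, the R_inter value in the frequency band one level lower is calculated, and depending on that value,
・When R_inter >= TH
Spatio-temporal adaptive predictive decoding is applied, and processing moves on to the higher bands.
[0080]
・When R_inter < TH
The wavelet coefficients are decoded directly, all wavelet coefficients of the remaining higher bands are also decoded directly, and processing ends.
[0081]
FIG. 16 shows an example of the processing for the bands LH(n-1), LH(n-2), ..., LH(1). Processing in the LH direction ends when either R_inter < TH is satisfied or processing of the highest frequency band LH(1) has been completed. The HL and HH directions can be processed in the same way. FIG. 17 shows a flowchart of the band spatio-temporal adaptive decoding processing for a two-level decomposition: when the R_inter value from the prediction processing in LL(2) is at least TH, prediction processing is performed for HL(2), LH(2), and HH(2), and the magnitude of their R_inter values then determines whether prediction processing is performed in HL(1), LH(1), and HH(1).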
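[0082]
(Seventh embodiment)
This embodiment describes a moving-image coding apparatus and method in which, in the third embodiment, R_inter is used as the measure of inter-frame correlation. The R_inter value of each small block is calculated in raster-scan order, from left to right and top to bottom in FIG. 11. The R_inter value of each small block determines whether adaptive predictive coding is performed in the small blocks at the corresponding spatial position in the adjacent bands LH(n), HL(n), and HH(n). That is,
・When R_inter >= TH
Spatio-temporal adaptive predictive coding is applied within the small block, and processing moves on to the higher bands.
[0083]
・When R_inter < TH
The wavelet coefficients are coded directly, all wavelet coefficients of the remaining higher bands are also coded directly, and processing ends.
[0084]
When R_inter >= TH, the same processing continues block by block in the subsequent higher frequency bands. This processing is performed for all small blocks in raster-scan order. FIG. 18 shows a flowchart of the band spatio-temporal adaptive coding processing for a two-level decomposition: when the R_inter value from the prediction processing in LL(2)MB(n) is at least TH, prediction processing is performed for HL(2)MB(n), LH(2)MB(n), and HH(2)MB(n), and the magnitude of their R_inter values then determines whether prediction processing is performed in HL(1)MB(n), LH(1)MB(n), and HH(1)MB(n).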
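[0085]
(Eighth embodiment)
This embodiment describes a moving-image decoding apparatus and method in which, in the fourth embodiment, R_inter is used as the measure of inter-frame correlation. The R_inter value of each small block is calculated in raster-scan order, from left to right and top to bottom in FIG. 11. The R_inter value of each small block determines whether adaptive predictive decoding is performed in the small blocks at the corresponding spatial position in the adjacent bands LH(n), HL(n), and HH(n). That is,
・When R_inter >= TH
Spatio-temporal adaptive predictive decoding is applied within the small block, and processing moves on to the higher bands.
[0086]
・When R_inter < TH
The wavelet coefficients are decoded directly, all wavelet coefficients of the remaining higher bands are also decoded directly, and processing ends.
[0087]
When R_inter >= TH, the same processing continues per small block in the subsequent higher frequency bands. This processing is performed for all small blocks in raster-scan order. FIG. 19 shows a flowchart of the band spatio-temporal adaptive decoding processing for a two-level decomposition: when the R_inter value from the prediction processing in LL(2)MB(n) is at least TH, prediction processing is performed for HL(2)MB(n), LH(2)MB(n), and HH(2)MB(n), and the magnitude of their R_inter values then determines whether prediction processing is performed in HL(1)MB(n), LH(1)MB(n), and HH(1)MB(n).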
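[0088]
[Effects of the invention]
As is clear from the above description, the present invention enables efficient lossy and lossless coding of moving images, so that they can be stored using little disk capacity. In addition, because the scheme has spatial resolution scalability, images can be decoded at a spatial resolution suited to the capability and purpose of the display device.
[0089]
Decoding from the lowest band up to an arbitrary band reproduces an image at a lower spatial resolution than the original, and decoding all the data reproduces an image at the same resolution as the original. When an image at a lower spatial resolution than the original is wanted, depending on the capability or purpose of the display device, only the coded data up to the required band needs to be decoded. This takes less processing time than reproducing the image at the original resolution and then converting its resolution, and when the coded bitstream is transmitted, only the required data needs to be sent, so the transmission rate is also lower.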
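[Brief description of the drawings]
[FIG. 1] Basic configuration diagram of the moving-image coding apparatus and method in the first embodiment of the present invention (for lossy coding).
[FIG. 2] Basic configuration diagram of the moving-image coding apparatus and method in the first embodiment of the present invention (for lossless coding).
[FIG. 3] Schematic diagram of band division by octave decomposition.
[FIG. 4] Configuration diagram of the main part of the band spatio-temporal adaptive predictive coding processing.
[FIG. 5] Diagram of the inter-frame correlation relationships among the divided bands.
[FIG. 6] Flowchart of the band spatio-temporal adaptive predictive coding processing based on inter-frame correlation.
[FIG. 7] Basic configuration diagram of the moving-image decoding apparatus and method in the second embodiment of the present invention (for lossy decoding).
[FIG. 8] Basic configuration diagram of the moving-image decoding apparatus and method in the second embodiment of the present invention (for lossless decoding).
[FIG. 9] Configuration diagram of the main part of the band spatio-temporal adaptive predictive decoding processing.
[FIG. 10] Flowchart of the band spatio-temporal adaptive predictive decoding processing based on inter-frame correlation.
[FIG. 11] Diagram of the inter-frame correlation relationships among bands divided into small blocks.
[FIG. 12] Flowchart of the band spatio-temporal adaptive predictive coding processing using small blocks.
[FIG. 13] Flowchart of the band spatio-temporal adaptive predictive decoding processing using small blocks.
[FIG. 14] Configuration diagram of the main part of the band spatio-temporal adaptive predictive coding processing using the correlation coefficient R_inter.
[FIG. 15] Flowchart of the band spatio-temporal adaptive predictive coding processing using the correlation coefficient R_inter.
[FIG. 16] Configuration diagram of the main part of the band spatio-temporal adaptive predictive decoding processing using the correlation coefficient R_inter.
[FIG. 17] Flowchart of the band spatio-temporal adaptive predictive decoding processing using the correlation coefficient R_inter.
[FIG. 18] Flowchart of the band spatio-temporal adaptive predictive coding processing using small blocks and the correlation coefficient R_inter.
[FIG. 19] Flowchart of the band spatio-temporal adaptive predictive decoding processing using small blocks and the correlation coefficient R_inter.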
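[Explanation of reference numerals]
10: band division unit
11 to 14, 20 to 22: band spatio-temporal adaptive prediction units
15: quantization unit
16: entropy coding unit
17: inter-frame correlation determination unit
23, 24: inter-frame correlation determination units
25, 26: R_inter-based correlation determination units
30: entropy decoding unit
31 to 34: band spatio-temporal adaptive predictive decoding units
35: inverse quantization unit
36: band synthesis unit
37: inter-frame correlation determination unit
40 to 42: band spatio-temporal adaptive predictive decoding units
43, 44: inter-frame correlation determination units
[0001]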
BACKGROUND OF THE INVENTION
The present invention relates to encoding and decoding for efficiently transmitting and storing moving images.
[0002]
[Prior art]
The following are well known as representative coding methods for moving images:
(1) methods using motion compensation and the DCT, typified by MPEG;
[0003]
(2) methods using the wavelet transform, typified by Motion JPEG 2000.
[0004]
Many models have been proposed for methods using motion compensation and the DCT, and they achieve high coding efficiency by efficiently removing both inter-frame and intra-frame correlation.
[0005]
Wavelet-based methods such as Motion JPEG 2000, on the other hand, offer useful functions that methods using motion compensation and the DCT lack, such as spatial, temporal, and SNR scalability.
[0006]
However, because Motion JPEG 2000 exploits only intra-frame correlation, its coding efficiency is known to be lower than that of methods using motion compensation and the DCT. Spatio-temporal adaptive prediction is a method that uses wavelets, also removes inter-frame correlation, and thereby improves coding efficiency (see, for example, Non-Patent Document 1).
[0007]
This method was proposed for lossless coding, but it is also applicable to lossy coding and can remove inter-frame correlation efficiently.
[0008]
[Non-Patent Document 1]
Takayuki Nakachi, Tomoko Sawabe, Tatsuya Fujii, Tetsuro Fujii, "A study of lossless video coding with resolution scalability", Proceedings of the 16th Digital Signal Processing Symposium, pp. 439-444, November 2001
[0009]
[Problems to be solved by the invention]
Spatio-temporal predictive coding removes inter-frame correlation by applying nonlinear prediction in the wavelet coefficient domain. Since the wavelet transform is used, the spatial scalability function can be realized and the correlation between frames is also removed, so that the coding efficiency is high.
[0010]
However, since the band for performing spatiotemporal prediction is fixed, it is not always a method suitable for various images. A model with variable bandwidth has also been proposed (for example, Non-Patent Document 1). However, this method requires an additional amount of information and is not efficient.
[0011]
An object of the present invention is to provide a moving-image coding method based on spatio-temporal prediction, a corresponding decoding method, and apparatuses for them, in which the bands from which inter-frame correlation is removed are changed adaptively according to the statistical properties of the image, without increasing the amount of side information, so that coding efficiency is improved.
[0012]
[Means for Solving the Problems]
In order to solve the above problems, the present invention is characterized by the following moving picture coding method, moving picture decoding method, and apparatuses thereof.
[0013]
(1) A moving-image coding method in which the original image is divided into bands and each band is processed by spatio-temporal adaptive prediction, which switches between a two-dimensional predictor performing intra-frame processing (two-dimensional prediction) and a three-dimensional predictor performing inter-frame processing (three-dimensional prediction) by calculating the correlation coefficient between the already-decoded signals near the pixel to be coded in the current frame and in the reference frame, and which performs inter-frame processing (three-dimensional prediction) when the correlation coefficient is large and intra-frame processing (two-dimensional prediction) otherwise, characterized in that, exploiting the correlation between bands of the same spatial orientation starting from the lowest frequency band, a band is coded by spatio-temporal adaptive prediction when the inter-frame correlation of the frequency band one level lower than that band (toward the lowest frequency band) is high, and is coded directly when that inter-frame correlation is low.
[0014]
(2) A moving-image decoding method for a signal coded by the coding method of (1) above, characterized in that a band is decoded by spatio-temporal adaptive prediction when the inter-frame correlation of the frequency band one level lower than that band (toward the lowest frequency band) is high, and is decoded directly when that inter-frame correlation is low.
[0015]
(3) The moving-image coding method of (1) above, characterized in that a band is coded by spatio-temporal adaptive prediction when, in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and is coded directly when that proportion is low.
[0016]
(4) A moving-image decoding method for a signal coded by the coding method of (3) above, characterized in that a band is decoded by spatio-temporal adaptive prediction when, in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and is decoded directly when that proportion is low.
[0017]
(5) A moving-image coding method in which the original image is divided into bands and each band is processed by spatio-temporal adaptive prediction, which switches between a two-dimensional predictor performing intra-frame processing (two-dimensional prediction) and a three-dimensional predictor performing inter-frame processing (three-dimensional prediction) by calculating the correlation coefficient between the already-decoded signals near the pixel to be coded in the current frame and in the reference frame, and which performs inter-frame processing (three-dimensional prediction) when the correlation coefficient is large and intra-frame processing (two-dimensional prediction) otherwise, characterized in that whether to perform spatio-temporal adaptive prediction or direct coding is decided per small block, a small block of a band being coded by spatio-temporal adaptive prediction when the inter-frame correlation of the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band) is large, and coded directly when that inter-frame correlation is low.
[0018]
(6) A moving-image decoding method for a signal coded by the coding method of (5) above, characterized in that whether to perform spatio-temporal adaptive predictive decoding or direct decoding is decided per small block, a small block of a band being decoded by spatio-temporal adaptive prediction when the inter-frame correlation of the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band) is large, and decoded directly when that inter-frame correlation is low.
[0019]
(7) The moving-image coding method of (5) above, characterized in that a small block of a band is coded by spatio-temporal adaptive prediction when, within the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and is coded directly when that proportion is low.
[0020]
(8) A moving-image decoding method for a signal coded by the coding method of (7) above, characterized in that a small block of a band is decoded by spatio-temporal adaptive prediction when, within the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and is decoded directly when that proportion is low.
[0021]
(9) A moving-image coding apparatus in which the original image is divided into bands and each band is processed by spatio-temporal adaptive prediction, which switches between a two-dimensional predictor performing intra-frame processing (two-dimensional prediction) and a three-dimensional predictor performing inter-frame processing (three-dimensional prediction) by calculating the correlation coefficient between the already-decoded signals near the pixel to be coded in the current frame and in the reference frame, and which performs inter-frame processing (three-dimensional prediction) when the correlation coefficient is large and intra-frame processing (two-dimensional prediction) otherwise, characterized by having, exploiting the correlation between bands of the same spatial orientation starting from the lowest frequency band, means for coding a band by spatio-temporal adaptive prediction when the inter-frame correlation of the frequency band one level lower than that band (toward the lowest frequency band) is high, and means for coding the band directly when that inter-frame correlation is low.
[0022]
(10) A moving-image decoding apparatus for a signal coded by the coding apparatus of (9) above, characterized by having means for decoding a band by spatio-temporal adaptive prediction when the inter-frame correlation of the frequency band one level lower than that band (toward the lowest frequency band) is high, and means for decoding the band directly when that inter-frame correlation is low.
[0023]
(11) The moving-image coding apparatus of (9) above, characterized by having means for coding a band by spatio-temporal adaptive prediction when, in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and means for coding the band directly when that proportion is low.
[0024]
(12) A moving-image decoding apparatus for a signal coded by the coding apparatus of (11) above, characterized by having means for decoding a band by spatio-temporal adaptive prediction when, in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and means for decoding the band directly when that proportion is low.
[0025]
(13) A moving-image coding apparatus in which the original image is divided into bands and each band is processed by spatio-temporal adaptive prediction, which switches between a two-dimensional predictor performing intra-frame processing (two-dimensional prediction) and a three-dimensional predictor performing inter-frame processing (three-dimensional prediction) by calculating the correlation coefficient between the already-decoded signals near the pixel to be coded in the current frame and in the reference frame, and which performs inter-frame processing (three-dimensional prediction) when the correlation coefficient is large and intra-frame processing (two-dimensional prediction) otherwise, characterized in that whether to perform spatio-temporal adaptive prediction or direct coding is decided per small block, and by having means for coding a small block of a band by spatio-temporal adaptive prediction when the inter-frame correlation of the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band) is large, and means for coding the small block directly when that inter-frame correlation is low.
[0026]
(14) A moving-image decoding apparatus for a signal coded by the coding apparatus of (13) above, characterized in that whether to perform spatio-temporal adaptive predictive decoding or direct decoding is decided per small block, and by having means for decoding a small block of a band by spatio-temporal adaptive prediction when the inter-frame correlation of the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band) is large, and means for decoding the small block directly when that inter-frame correlation is low.
[0027]
(15) The moving-image coding apparatus of (13) above, characterized by having means for coding a small block of a band by spatio-temporal adaptive prediction when, within the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and means for coding the small block directly when that proportion is low.
[0028]
(16) A moving-image decoding apparatus for a signal coded by the coding apparatus of (15) above, characterized by having means for decoding a small block of a band by spatio-temporal adaptive prediction when, within the small block at the same spatial position in the frequency band one level lower than that band (toward the lowest frequency band), the proportion of inter-frame processing (three-dimensional prediction) relative to intra-frame processing (two-dimensional prediction) is high, and means for decoding the small block directly when that proportion is low.
[0029]
DETAILED DESCRIPTION OF THE INVENTION
(First embodiment)
FIGS. 1 and 2 show the basic configuration for implementing a moving-image coding apparatus and method based on spatio-temporal adaptive prediction. FIG. 1 is for lossy coding and FIG. 2 is for lossless coding; the spatio-temporal adaptive predictive coding method of the present invention is applicable to both.
[0030]
In FIG. 1 or FIG. 2, 10 is a band division unit, 11 to 14 are spatio-temporal adaptive prediction processing units provided for each divided band, and 16 is an entropy coding unit. For lossy coding, the quantization unit 15 quantizes the wavelet coefficients or the difference signal produced by spatio-temporal adaptive prediction. Also for lossy coding, using EBCOT, the entropy coder used in JPEG 2000, provides spatial and SNR scalability. For lossless coding, no quantization is performed, and the wavelet coefficients or the difference signal produced by spatio-temporal adaptive prediction are entropy coded directly.
[0031]
The input original image is divided into a plurality of spatial resolution bands by the band dividing unit 10. For this band division, octave division shown in FIG. 3 is used. In the octave division, the input signal can be divided into a plurality of bands by sequentially dividing in a low band direction using a one-dimensional two-division filter. This process is performed in the horizontal direction and the vertical direction, respectively.
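A minimal sketch of such an octave decomposition follows. The Haar averaging/differencing filter pair, the function names, and the band-naming convention (first letter for the horizontal filter, second for the vertical filter) are assumptions chosen for brevity; the text does not name a specific filter, and the image sides are assumed to be divisible by 2 at every level.

```python
import numpy as np

def haar_split_1d(x, axis):
    """Split an array into low (average) and high (difference) halves along axis."""
    x = np.asarray(x, dtype=float)
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / 2.0, (even - odd) / 2.0

def octave_decompose(image, levels):
    """Return a dict of bands keyed by (name, level); only LL is split again.

    Level 1 holds the highest-frequency bands and level `levels` the lowest,
    matching the LL(n), LH(n), ..., LH(1) numbering used in the text.
    """
    bands = {}
    ll = np.asarray(image, dtype=float)
    for level in range(1, levels + 1):
        lo, hi = haar_split_1d(ll, axis=1)       # horizontal split
        ll_next, lh = haar_split_1d(lo, axis=0)  # vertical split of the low half
        hl, hh = haar_split_1d(hi, axis=0)       # vertical split of the high half
        bands[("LH", level)] = lh
        bands[("HL", level)] = hl
        bands[("HH", level)] = hh
        ll = ll_next                             # recurse on the low band only
    bands[("LL", levels)] = ll
    return bands
```

A two-level decomposition of an 8 x 8 image yields seven bands: three 4 x 4 bands at level 1, three 2 x 2 bands at level 2, and the 2 x 2 band LL(2).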
[0032]
Next, each divided band is coded by spatio-temporal adaptive prediction. First, adaptive predictive coding is applied to the lowest frequency band LL(n), where n is the number of band division levels. The method described in Non-Patent Document 1 can be used for this adaptive predictive coding. In that method, a two-dimensional predictor and a three-dimensional predictor are provided, and the predictor is switched according to the local properties of the image signal. To switch between the two predictors, the correlation coefficient between the already-decoded signals near the pixel to be coded in the current frame and in the reference frame is calculated. When the correlation coefficient is large, that is, when the waveforms of the signal in the current frame and the signal in the reference frame are similar, three-dimensional prediction is expected to improve prediction accuracy and is therefore used; otherwise two-dimensional prediction is used. This adapts the prediction to the local properties of the image and improves prediction accuracy.
[0033]
Next, after the adaptive prediction coding in the lowest frequency band is completed, the interframe correlation determination unit 17 calculates the interframe correlation in the lowest frequency band of the current frame and the reference frame. Several methods can be considered to determine whether the inter-frame correlation is large or small, and an example will be shown later (fifth embodiment).
[0034]
First, it is determined whether to perform adaptive predictive coding in adjacent bands LH (n), HL (n), and HH (n) according to the magnitude of inter-frame correlation in the lowest frequency band.
[0035]
・ When correlation between frames is large
The space-time adaptive predictive encoding process is performed, and the process moves to a higher band.
[0036]
・ When correlation between frames is small
The wavelet coefficients are directly encoded, all the remaining high-band wavelet coefficients are directly encoded, and the process is terminated.
[0037]
The reason for this is that the signals in the lowest frequency band are correlated with the signals in the adjacent bands LH (n), HL (n), and HH (n). That is, when the inter-frame correlation is large in the lowest frequency band, the inter-frame correlation is expected to be strong in the three adjacent bands as well. In this method, whether space-time adaptive prediction is performed in the adjacent bands LH (n), HL (n), and HH (n) is determined from the lowest frequency band. Because this uses only information shared by the encoder and the decoder, no additional side information is required.
[0038]
When the inter-frame correlation is large, the process proceeds as follows in the subsequent high frequency bands. In wavelets, it is known that frequency bands lying in the same direction as seen from the lowest frequency band are correlated with one another. That is, in the LH direction, the bands
LH (n), LH (n-1), ..., LH (1)
are correlated with one another, and similarly, in the HL direction and the HH direction, the bands
HL (n), HL (n-1), ..., HL (1)
HH (n), HH (n-1), ..., HH (1)
are each correlated with one another. Using this property, whether or not the spatio-temporal adaptive prediction is performed in each band is determined based on the magnitude of the inter-frame correlation in the frequency band one level lower than that band. That is,
・ When correlation between frames is large
The space-time adaptive predictive encoding process is performed, and the process moves to a higher band.
[0039]
・ When correlation between frames is small
The wavelet coefficients are directly encoded, and all the remaining high-band wavelet coefficients are directly encoded, and the process ends.
[0040]
The above processing is repeated toward the higher bands, and the processing in the LH direction ends when either of two conditions is satisfied: the inter-frame correlation is small, or processing has been completed up to the highest frequency band LH (1).
[0041]
FIG. 4 illustrates the processing of the LH (n−1), LH (n−2), ..., LH (1) band part: whether or not to perform the band spatio-temporal adaptive prediction processes 21 to 22 in the LH direction is determined according to the magnitude of the calculation results of the inter-frame correlation calculation processes 23, 24, .... Processing in the HL direction and the HH direction can be performed in the same manner. FIG. 5 illustrates the correlation between the respective bands, and the processing described above proceeds in the direction of the arrows. FIG. 6 shows a flowchart of the band spatio-temporal adaptive prediction process when the division level is two. When the inter-frame correlation found in the prediction process for LL (2) is large, prediction processing is performed for HL (2), LH (2), and HH (2), and it is then determined, based on the magnitude of the inter-frame correlation, whether or not to perform prediction processing for HL (1), LH (1), and HH (1).
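The band-by-band decision described above can be sketched as a small planning function. The names `plan_band_modes` and `corr_is_large` are hypothetical, and the measurement of "large" correlation is left to the caller; the sketch only shows how each band's mode follows from the band one level lower in the same direction, and how a direction switches to direct coding for all remaining bands once the correlation is small.

```python
def plan_band_modes(levels, corr_is_large):
    """Decide "predict" or "direct" for every band of an n-level division.

    corr_is_large maps a band name to True when the inter-frame correlation
    measured on that already coded band is large. Entries are only needed
    for bands that were themselves coded with adaptive prediction.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    lowest = f"LL({levels})"
    modes = {lowest: "predict"}  # LL(n) always uses adaptive prediction
    for direction in ("LH", "HL", "HH"):
        parent = lowest
        predicting = True
        for level in range(levels, 0, -1):
            band = f"{direction}({level})"
            # Once the correlation is small, the rest of this direction
            # is coded directly and no further correlation is consulted.
            if predicting and not corr_is_large[parent]:
                predicting = False
            modes[band] = "predict" if predicting else "direct"
            parent = band
    return modes
```

For a two-level division in which only HL (2) shows small inter-frame correlation, HL (1) is coded directly while every other band uses prediction. If LL (2) itself shows small correlation, all six detail bands are coded directly.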
[0042]
In the above method, whether or not to perform the band spatio-temporal adaptive prediction process is determined by the magnitude of the inter-frame correlation in the frequency band one level lower. Because this uses only information shared by the encoder and the decoder, no additional side information is required.
After the above processing is completed, the entropy encoding unit 16 in FIG. 1 or FIG. 2 encodes the residual signal of the adaptive prediction encoding output or the wavelet coefficients, together with the motion estimation vector, to generate an encoded bit stream.
[0043]
(Second Embodiment)
FIGS. 7 and 8 show basic configuration diagrams for realizing the moving picture decoding apparatus and method for decoding the data encoded in the first embodiment. FIG. 7 is for lossy decoding, and FIG. 8 is for lossless decoding. In FIG. 7 and FIG. 8, 30 is an entropy decoding unit, 31 to 34 are space-time adaptive prediction decoding units provided for each divided band, 36 is a band synthesis unit, and 37 is an inter-frame correlation determination unit.
[0044]
First, the entropy decoding unit 30 decodes the residual signal or wavelet coefficients and the motion vector from the encoded bit stream. Subsequently, in the case of lossy coding, the inverse quantization unit 35 inversely quantizes the wavelet coefficients or the difference signal produced by space-time adaptive prediction. For the processing of the spatio-temporal adaptive predictive decoding unit 31 in the lowest frequency band LL (n), for example, the method described in Non-Patent Document 1 can be used. In this method, as in spatio-temporal adaptive predictive coding, the predictor is selected by calculating the correlation coefficient of the decoded signals in the vicinity of the target pixel in the current frame and the reference frame: three-dimensional prediction is performed when the correlation coefficient is large, and two-dimensional prediction is performed when it is small. The signal in the lowest frequency band is restored by adding the inversely quantized difference signal to the predicted value obtained.
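The final reconstruction step, predicted value plus inversely quantized difference signal, can be sketched as follows. The text does not state which quantizer is used, so the uniform quantizer with step size `step` is an assumption made purely for illustration, as are the function names.

```python
def dequantize(index, step):
    """Map a quantization index back to a difference value (uniform step)."""
    return index * step


def reconstruct(predicted, indices, step=1):
    """Restore samples as prediction + dequantized difference.

    With step=1 the indices are the differences themselves, which
    corresponds to the lossless case where no quantization is applied.
    """
    if len(predicted) != len(indices):
        raise ValueError("predicted and indices must have the same length")
    return [p + dequantize(q, step) for p, q in zip(predicted, indices)]
```

For example, predictions `[10, 20]` with indices `[1, -2]` and a step of 4 are restored to `[14, 12]`.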
[0045]
Next, in order to determine whether the subsequent high frequency bands are decoded by spatio-temporal adaptive predictive decoding or whether the wavelet coefficients are decoded directly, the inter-frame correlation determination unit 37 calculates the inter-frame correlation in the lowest frequency band LL (n). Based on this calculation,
・ When correlation between frames is large
The space-time adaptive predictive decoding process is performed, and the process moves to a higher band.
[0046]
・ When correlation between frames is small
The wavelet coefficients are directly decoded, all the remaining high-band wavelet coefficients are also directly decoded, and the process ends.
[0047]
When the inter-frame correlation is large, the process proceeds as follows in the subsequent high frequency band.
[0048]
In each of the LH, HL, and HH directions, proceeding from the low frequency band toward the high frequency band, the inter-frame correlation in the frequency band one level lower is calculated. Based on this calculation,
・ When correlation between frames is large
The space-time adaptive predictive decoding process is performed, and the process moves to a higher band.
[0049]
・ When correlation between frames is small
The wavelet coefficients are directly decoded, and all the remaining high-band wavelet coefficients are also directly decoded, and the process ends.
[0050]
FIG. 9 illustrates the processing of the LH (n−1), LH (n−2), ..., LH (1) band part: whether or not to perform the band spatio-temporal adaptive predictive decoding processes 40 to 42 in the LH direction is determined from the calculation results of the inter-frame correlation determination units 43, 44, .... The processing in the LH direction ends when either of two conditions is satisfied: the inter-frame correlation is small, or the processing of the highest frequency band LH (1) has been completed. Processing in the HL direction and the HH direction can be performed in the same manner. FIG. 10 shows a flowchart of the band spatio-temporal adaptive predictive decoding process when the division level is two. When the inter-frame correlation found in the prediction process for LL (2) is large, prediction processing is performed for HL (2), LH (2), and HH (2), and it is then determined, based on the magnitude of the inter-frame correlation, whether or not to perform prediction processing for HL (1), LH (1), and HH (1).
[0051]
After the above processing is completed, the band synthesis unit 36 in FIG. 7 or FIG. 8 synthesizes the output of each band and decodes the image.
[0052]
(Third embodiment)
In the first embodiment, whether to perform adaptive predictive coding or to directly encode the wavelet coefficients is decided once for each frequency band. In the present embodiment, for bands other than the lowest frequency band, this decision is made in units of small blocks. By processing in units of small blocks, encoding adapted to the local nature of the image can be performed, and encoding efficiency is improved.
[0053]
In wavelets, it is known that the frequency components corresponding to the same spatial position are correlated with one another. Using this property, processing is performed in units of small blocks corresponding to the same spatial position in each frequency band, as shown in FIG. 11.
[0054]
First, all the lowest frequency bands are subjected to space-time adaptive prediction encoding, as in the first embodiment.
[0055]
Next, the lowest frequency band is divided into small block units of L × L pixels. The inter-frame correlation of each small block is calculated in the order of raster scanning from left to right and from top to bottom in the figure. Whether to perform adaptive predictive coding on the small blocks at the corresponding spatial positions in the adjacent bands LH (n), HL (n), and HH (n) is determined based on the inter-frame correlation of each small block. That is,
・ When correlation between frames is large
The space-time adaptive predictive coding process is performed in the small block, and the process moves to a high band.
[0056]
・ When correlation between frames is small
The wavelet coefficients are directly encoded, and all the remaining high-band wavelet coefficients are directly encoded, and the process ends.
[0057]
When the inter-frame correlation is large, the same processing is performed in units of small blocks in the subsequent high frequency bands. The above processing is performed for all small blocks in raster scan order. FIG. 12 shows a flowchart of the band adaptation process when the division level is two. When the inter-frame correlation found in the prediction process for the small block MB (n) of LL (2) is large, prediction processing is performed for HL (2) MB (n), LH (2) MB (n), and HH (2) MB (n), and it is then determined whether or not to perform prediction processing for HL (1) MB (n), LH (1) MB (n), and HH (1) MB (n).
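The raster-scan traversal of small blocks in the lowest frequency band can be sketched as follows. The helper names are hypothetical, the `corr_is_large` callback stands in for whatever correlation measure is used, and clipping partial blocks at the band edge is an assumption, since the text does not say how band sizes that are not multiples of L are handled.

```python
def raster_blocks(height, width, block):
    """Yield (top, left, bottom, right) of block x block tiles, left to
    right and then top to bottom. Edge tiles are clipped (an assumption)."""
    if block < 1:
        raise ValueError("block size must be at least 1")
    for top in range(0, height, block):
        for left in range(0, width, block):
            yield top, left, min(top + block, height), min(left + block, width)


def plan_block_modes(height, width, block, corr_is_large):
    """Decide per LL(n) block whether the co-located blocks in LH(n),
    HL(n), and HH(n) use adaptive prediction or direct coding.

    corr_is_large(top, left, bottom, right) returns True when the
    inter-frame correlation of that LL(n) block is large.
    """
    modes = {}
    for top, left, bottom, right in raster_blocks(height, width, block):
        large = corr_is_large(top, left, bottom, right)
        modes[(top, left)] = "predict" if large else "direct"
    return modes
```

Keying the result by the block's top-left corner keeps the raster order, so the same table can be rebuilt on the decoding side from decoded LL (n) data alone.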
[0058]
In entropy coding, a residual signal or wavelet coefficient of an adaptive predictive coding output and a motion estimation vector are coded to generate a coded bit stream.
[0059]
(Fourth embodiment)
In this embodiment, a method for decoding a signal encoded by the small-block processing of the third embodiment will be described. The basic configuration is the same as that of the second embodiment shown in FIG. 7 or FIG. 8, except that processing in bands other than the lowest frequency band is performed in units of small blocks.
[0060]
First, the entropy decoding unit 30 decodes the residual signal or wavelet coefficients and the motion vector from the encoded bit stream. Subsequently, in the case of lossy coding, the inverse quantization unit 35 inversely quantizes the wavelet coefficients or the difference signal produced by space-time adaptive prediction.
[0061]
The processing of the spatio-temporal adaptive predictive decoding unit 31 in the lowest frequency band LL (n) is exactly the same as in the second embodiment, and the method described in Non-Patent Document 1 can be used.
[0062]
Next, the decoded signal of the lowest frequency band is divided into small blocks of L × L pixels. The inter-frame correlation of each small block is calculated in raster scan order, from left to right and from top to bottom in the figure. Depending on the magnitude of the inter-frame correlation of each small block, it is determined whether or not to perform adaptive predictive decoding in the small block at the corresponding spatial position in the adjacent bands LH (n), HL (n), and HH (n). That is,
・ When correlation between frames is large
Spatio-temporal adaptive predictive decoding processing is performed in the small block, and the process moves to a higher band.
[0063]
・ When correlation between frames is small
The wavelet coefficients are directly decoded, and all the remaining high-band wavelet coefficients are also directly decoded, and the process ends.
[0064]
When the inter-frame correlation is large, the same processing is performed in units of small blocks in the subsequent high frequency bands. These processes are performed for all small blocks in raster scan order. FIG. 13 shows a flowchart of the band spatio-temporal adaptive predictive decoding process when the division level is two. When the inter-frame correlation found in the prediction process for the small block MB (n) of LL (2) is large, prediction processing is performed for HL (2) MB (n), LH (2) MB (n), and HH (2) MB (n), and it is then determined whether or not to perform prediction processing for HL (1) MB (n), LH (1) MB (n), and HH (1) MB (n).
[0065]
After these determination processes, the band synthesizer 36 synthesizes the output of each band and decodes the image.
[0066]
(Fifth embodiment)
This embodiment shows a moving picture encoding apparatus and method in which the inter-frame correlation coefficient R_inter defined below is used as the measure of inter-frame correlation in the first embodiment.
[0067]
R_inter = Inter / (Inter + Intra)
Here, Inter represents the number of times inter-frame processing (three-dimensional prediction) is performed, and Intra represents the number of times intra-frame processing (two-dimensional prediction) is performed. Whether to perform inter-frame processing or intra-frame processing is determined by the correlation coefficient of the signal values between two consecutive frames in the vicinity of the pixel being encoded: if the correlation coefficient is large, inter-frame processing is performed, and if it is small, intra-frame processing is performed.
[0068]
Therefore, when R_inter is large, that is, when the number of times inter-frame processing is performed is large, the inter-frame correlation is judged to be large. Depending on the R_inter value in the lowest frequency band, whether to perform adaptive predictive coding in the adjacent bands LH (n), HL (n), and HH (n) is determined. That is,
・ If R_inter >= TH
The space-time adaptive predictive encoding process is performed, and the process moves to a higher band.
[0069]
・ If R_inter < TH
The wavelet coefficients are directly encoded, and all the remaining high-band wavelet coefficients are directly encoded, and the process ends.
[0070]
Here, TH is a threshold value in the range 0 <= TH <= 1.
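The ratio and the threshold test can be written out directly. The function names are hypothetical, and raising an error when no decisions were counted is an assumption, since the text does not cover that case.

```python
def r_inter(inter_count, intra_count):
    """R_inter = Inter / (Inter + Intra) for one band."""
    total = inter_count + intra_count
    if total == 0:
        # Assumption: an empty band is treated as an error in this sketch.
        raise ValueError("no prediction decisions were counted")
    return inter_count / total


def use_adaptive_prediction(inter_count, intra_count, th):
    """True when the next higher band should use adaptive prediction."""
    if not 0.0 <= th <= 1.0:
        raise ValueError("TH must satisfy 0 <= TH <= 1")
    return r_inter(inter_count, intra_count) >= th
```

With 3 inter-frame and 1 intra-frame decisions, R_inter is 0.75, so a threshold of 0.5 selects adaptive prediction for the next band. Since the comparison is `>=`, a band with R_inter exactly equal to TH also selects prediction.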
[0071]
If R_inter >= TH, the process proceeds as follows in the subsequent high frequency bands.
[0072]
・ If R_inter >= TH
The space-time adaptive predictive encoding process is performed, and the process moves to a higher band.
[0073]
・ If R_inter < TH
The wavelet coefficients are directly encoded, and all the remaining high-band wavelet coefficients are directly encoded, and the process ends.
[0074]
The above processing is repeated toward the higher bands, and the processing in the LH direction ends when either of two conditions is satisfied: R_inter < TH, or processing has been completed up to the highest frequency band LH (1). FIG. 14 illustrates the processing of the LH (n−1), LH (n−2), ..., LH (1) band part: whether or not to perform the band spatio-temporal adaptive prediction processes 21 to 22 in the LH direction is determined according to the magnitude of the calculation results of the R_inter correlation determination units 25, 26, .... Processing in the HL direction and the HH direction can be performed in the same manner. FIG. 15 shows a flowchart of the band spatio-temporal adaptive prediction processing when the division level is two. When the R_inter value found in the prediction processing for LL (2) is TH or greater, prediction processing is performed for HL (2), LH (2), and HH (2), and it is then determined, by comparing the R_inter value with TH, whether or not to perform prediction processing for HL (1), LH (1), and HH (1).
[0075]
In the above method, whether or not space-time adaptive prediction is performed is determined by the R_inter value in the frequency band one level lower. Because this uses only information shared by the encoder and the decoder, no additional side information is required.
[0076]
(Sixth embodiment)
This embodiment shows a moving picture decoding apparatus and method in which R_inter is used as the measure of inter-frame correlation in the second embodiment. The R_inter correlation determination units 25 and 26 calculate the R_inter value in the lowest frequency band LL (n). Based on this result,
・ If R_inter >= TH
The space-time adaptive predictive decoding process is performed, and the process moves to a higher band.
[0077]
・ If R_inter < TH
The wavelet coefficients are directly decoded, all the remaining high-band wavelet coefficients are also directly decoded, and the process ends.
[0078]
If R_inter >= TH, the process proceeds as follows in the subsequent high frequency bands.
[0079]
In each of the LH, HL, and HH directions, proceeding from the low frequency band toward the high frequency band, the R_inter value in the frequency band one level lower is calculated, and depending on that value,
・ If R_inter >= TH
The space-time adaptive predictive decoding process is performed, and the process moves to a higher band.
[0080]
・ If R_inter < TH
The wavelet coefficients are directly decoded, and all the remaining high-band wavelet coefficients are also directly decoded, and the process ends.
[0081]
An example of the processing of the LH (n-1), LH (n-2), ..., LH (1) band part is shown in FIG. 16. The processing in the LH direction ends when either of two conditions is satisfied: R_inter < TH, or the processing of the highest frequency band LH (1) has been completed. Processing in the HL direction and the HH direction can be performed in the same manner. FIG. 17 shows a flowchart of the band spatio-temporal adaptive decoding process when the division level is two. When the R_inter value found in the prediction process for LL (2) is TH or greater, prediction processing is performed for HL (2), LH (2), and HH (2), and it is then determined from the R_inter value whether or not to perform prediction processing for HL (1), LH (1), and HH (1).
[0082]
(Seventh embodiment)
This embodiment shows a moving picture encoding apparatus and method in which R_inter is used as the measure of inter-frame correlation in the third embodiment. The R_inter value of each small block is calculated in raster scan order, from left to right and from top to bottom in the figure. Depending on the R_inter value of each small block, it is determined whether or not to perform adaptive predictive coding in the small block at the corresponding spatial position in the adjacent bands LH (n), HL (n), and HH (n). That is,
・ If R_inter >= TH
The space-time adaptive predictive coding process is performed in the small block, and the process moves to a higher band.
[0083]
・ If R_inter < TH
The wavelet coefficients are directly encoded, and all the remaining high-band wavelet coefficients are directly encoded, and the process ends.
[0084]
If R_inter >= TH, the same processing is performed in units of small blocks in the subsequent high frequency bands. The above processing is performed for all small blocks in raster scan order. FIG. 18 shows a flowchart of the band spatio-temporal adaptive encoding process when the division level is two. When the R_inter value found in the prediction process for LL (2) MB (n) is TH or greater, prediction processing is performed for HL (2) MB (n), LH (2) MB (n), and HH (2) MB (n), and it is then determined from the R_inter value whether or not to perform prediction processing for HL (1) MB (n), LH (1) MB (n), and HH (1) MB (n).
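A per-block R_inter can be computed from a map that records, for every coefficient of an already coded band, whether three-dimensional prediction was chosen. The map representation (`inter_map` as a list of rows of booleans), the function names, and the clipping of edge blocks are assumptions for this sketch.

```python
def block_r_inter(inter_map, block):
    """Return R_inter for each block x block tile, in raster order.

    inter_map[y][x] is True where inter-frame (3D) prediction was used
    and False where intra-frame (2D) prediction was used.
    """
    height = len(inter_map)
    width = len(inter_map[0]) if height else 0
    ratios = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            cells = [inter_map[y][x]
                     for y in range(top, min(top + block, height))
                     for x in range(left, min(left + block, width))]
            # Inter / (Inter + Intra): True counts as 1, False as 0.
            row.append(sum(cells) / len(cells))
        ratios.append(row)
    return ratios


def block_modes_by_threshold(inter_map, block, th):
    """Per-block mode for the co-located blocks of the next bands."""
    return [["predict" if r >= th else "direct" for r in row]
            for row in block_r_inter(inter_map, block)]
```

For a 2 × 4 map whose left 2 × 2 block holds three inter-frame decisions and whose right block holds none, the ratios are 0.75 and 0.0, so a threshold of 0.5 selects prediction for the left block only.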
[0085]
(Eighth embodiment)
This embodiment shows a moving picture decoding apparatus and method in which R_inter is used as the measure of inter-frame correlation in the fourth embodiment. The R_inter value of each small block is calculated in raster scan order, from left to right and from top to bottom in the figure. Depending on the R_inter value of each small block, it is determined whether or not to perform adaptive predictive decoding in the small block at the corresponding spatial position in the adjacent bands LH (n), HL (n), and HH (n). That is,
・ If R_inter >= TH
Spatio-temporal adaptive predictive decoding processing is performed in the small block, and the process moves to a higher band.
[0086]
・ If R_inter < TH
The wavelet coefficients are directly decoded, and all the remaining high-band wavelet coefficients are also directly decoded, and the process ends.
[0087]
If R_inter >= TH, the same processing is performed in units of small blocks in the subsequent high frequency bands. These processes are performed for all small blocks in raster scan order. FIG. 19 shows a flowchart of the band spatio-temporal adaptive decoding process when the division level is two. When the R_inter value found in the prediction process for LL (2) MB (n) is TH or greater, prediction processing is performed for HL (2) MB (n), LH (2) MB (n), and HH (2) MB (n), and it is then determined from the R_inter value whether or not to perform prediction processing for HL (1) MB (n), LH (1) MB (n), and HH (1) MB (n).
[0088]
[Effect of the invention]
As is apparent from the above description, according to the present invention, efficient lossy encoding and lossless encoding of moving images can be performed, and storage with a small disk capacity is possible. Furthermore, since it has spatial resolution scalability, it is possible to decode an image with a spatial resolution in accordance with the performance and application of the image display device.
[0089]
In addition, when decoding from a low band to an arbitrary band, an image having a lower spatial resolution than the original image can be reproduced, and when all data is decoded, an image having the same resolution as the original image is reproduced. If it is desired to reproduce an image having a lower spatial resolution than the original image according to the performance and application of the image display device, it is only necessary to decode the encoded data corresponding to the necessary band. The processing time is shorter than when the resolution conversion is performed by reproducing an image having the same resolution as the original image, and only the necessary data needs to be transmitted when transmitting the encoded bit stream, so the transmission rate is also reduced.
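Which bands must be decoded for a given reduced resolution can be sketched as follows, assuming the octave division described earlier, in which each level halves the resolution in both directions. The function name and the `drop` parameter (the number of halvings kept relative to the original) are hypothetical.

```python
def bands_for_resolution(levels, drop):
    """List the bands needed to reproduce the image at 1/2**drop of the
    original resolution in each direction.

    drop=0 requires every band (full resolution); drop=levels requires
    only the lowest frequency band LL(levels).
    """
    if not 0 <= drop <= levels:
        raise ValueError("drop must satisfy 0 <= drop <= levels")
    bands = [f"LL({levels})"]
    # Detail bands are added from the lowest level upward, in decoding order.
    for level in range(levels, drop, -1):
        bands += [f"LH({level})", f"HL({level})", f"HH({level})"]
    return bands
```

With a two-level division, a half-resolution image needs only LL (2), LH (2), HL (2), and HH (2), so the level-1 bands need be neither transmitted nor decoded.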
[Brief description of the drawings]
FIG. 1 is a basic configuration diagram (for lossy encoding) of a moving image encoding apparatus and method according to a first embodiment of the present invention.
FIG. 2 is a basic configuration diagram (for lossless encoding) of a moving image encoding apparatus and method according to the first embodiment of the present invention.
FIG. 3 is a schematic diagram of band division by octave division.
FIG. 4 is a block diagram of the main part of a band space-time adaptive predictive coding process.
FIG. 5 is a correlation diagram between frames of divided bands.
FIG. 6 is a flowchart of band space-time adaptive predictive encoding processing based on inter-frame correlation.
FIG. 7 is a basic configuration diagram (for lossy decoding) of a moving picture decoding apparatus and method according to a second embodiment of the present invention.
FIG. 8 is a basic configuration diagram (for lossless decoding) of a moving picture decoding apparatus and method according to the second embodiment of the present invention.
FIG. 9 is a block diagram of the main part of a band space-time adaptive predictive decoding process.
FIG. 10 is a flowchart of band space-time adaptive predictive decoding processing based on inter-frame correlation.
FIG. 11 is a correlation diagram between frames in a small block division band.
FIG. 12 is a flowchart of band space-time adaptive predictive encoding processing using small blocks.
FIG. 13 is a flowchart of band space-time adaptive predictive decoding processing using small blocks.
FIG. 14 is a block diagram of the main part of band space-time adaptive predictive encoding processing using the correlation coefficient R_inter.
FIG. 15 is a flowchart of band space-time adaptive predictive encoding processing using the correlation coefficient R_inter.
FIG. 16 is a block diagram of the main part of band space-time adaptive predictive decoding processing using the correlation coefficient R_inter.
FIG. 17 is a flowchart of band space-time adaptive predictive decoding processing using the correlation coefficient R_inter.
FIG. 18 is a flowchart of band space-time adaptive predictive encoding processing using small blocks and the correlation coefficient R_inter.
FIG. 19 is a flowchart of band space-time adaptive predictive decoding processing using small blocks and the correlation coefficient R_inter.
[Explanation of symbols]
10 ... Band division unit
11-14, 20-22 ... Band space-time adaptive prediction unit
15 ... Quantization unit
16 ... Entropy encoding unit
17 ... Inter-frame correlation determination unit
23, 24 ... Inter-frame correlation determination unit
25, 26 ... R_inter correlation determination unit
30 ... Entropy decoding unit
31-34 ... Band space-time adaptive predictive decoding unit
35 ... Inverse quantization unit
36 ... Band synthesis unit
37 ... Inter-frame correlation determination unit
40-42 ... Band space-time adaptive predictive decoding unit
43, 44 ... Inter-frame correlation determination unit

Claims

A moving picture encoding method for encoding moving images using space-time adaptive prediction in which, in order to switch, for each divided band, between a two-dimensional predictor that performs intra-frame processing (two-dimensional prediction) and a three-dimensional predictor that performs inter-frame processing (three-dimensional prediction), the correlation coefficient of the decoded signals near the encoding target pixel in the current frame and the reference frame is calculated, inter-frame processing (three-dimensional prediction) is performed when the correlation coefficient is large, and intra-frame processing (two-dimensional prediction) is performed otherwise, the method being characterized in that, using the fact that bands lying in the same spatial direction from the lowest frequency band are correlated, a band is encoded by space-time adaptive prediction when the inter-frame correlation of the frequency band one level lower, from that band toward the lowest frequency band, is high, and is directly encoded when the inter-frame correlation is low.

A moving picture decoding method for decoding that outputs moving images, characterized in that, for a signal encoded by the moving picture encoding method according to claim 1, a band is decoded by space-time adaptive prediction when the inter-frame correlation of the frequency band one level lower, from that band toward the lowest frequency band, is high, and is directly decoded when the inter-frame correlation is low.

The moving picture encoding method according to claim 1, characterized in that a band is encoded by space-time adaptive prediction when, in the frequency band one level lower from that band toward the lowest frequency band, the ratio of inter-frame processing (three-dimensional prediction) is higher than that of intra-frame processing (two-dimensional prediction), and is directly encoded when the ratio of inter-frame processing (three-dimensional prediction) is lower than that of intra-frame processing (two-dimensional prediction).

A moving picture decoding method for decoding that outputs moving images, characterized in that, for a signal encoded by the moving picture encoding method according to claim 3, a band is decoded by space-time adaptive prediction when, in the frequency band one level lower from that band toward the lowest frequency band, the ratio of inter-frame processing (three-dimensional prediction) is higher than that of intra-frame processing (two-dimensional prediction), and is directly decoded when the ratio of inter-frame processing (three-dimensional prediction) is lower than that of intra-frame processing (two-dimensional prediction).

A moving picture encoding method for encoding moving images using space-time adaptive prediction in which, in order to switch, for each divided band, between a two-dimensional predictor that performs intra-frame processing (two-dimensional prediction) and a three-dimensional predictor that performs inter-frame processing (three-dimensional prediction), the correlation coefficient of the decoded signals near the encoding target pixel in the current frame and the reference frame is calculated, inter-frame processing (three-dimensional prediction) is performed when the correlation coefficient is large, and intra-frame processing (two-dimensional prediction) is performed otherwise, the method being characterized in that whether to perform space-time adaptive prediction or direct encoding is determined in units of small blocks, and a small block of a band is encoded by space-time adaptive prediction when the inter-frame correlation of the small block at the same spatial position in the frequency band one level lower, from that band toward the lowest frequency band, is high, and is directly encoded when the inter-frame correlation is low.

A moving picture decoding method for decoding that outputs moving images, characterized in that, for a signal encoded by the moving picture encoding method according to claim 5, whether to perform space-time adaptive predictive decoding or direct decoding is determined in units of small blocks, and a small block of a band is decoded by space-time adaptive prediction when the inter-frame correlation of the small block at the same spatial position in the frequency band one level lower, from that band toward the lowest frequency band, is high, and is directly decoded when the inter-frame correlation is low.

The moving picture encoding method according to claim 5, characterized in that a small block of a band is encoded by space-time adaptive prediction when, within the small block at the same spatial position in the frequency band one level lower from that band toward the lowest frequency band, the ratio of inter-frame processing (three-dimensional prediction) is higher than that of intra-frame processing (two-dimensional prediction), and is directly encoded when the ratio of inter-frame processing (three-dimensional prediction) is lower than that of intra-frame processing (two-dimensional prediction).

A moving picture decoding method for decoding that outputs moving images, characterized in that, for a signal encoded by the moving picture encoding method according to claim 7, a small block of a band is decoded by space-time adaptive prediction when, within the small block at the same spatial position in the frequency band one level lower from that band toward the lowest frequency band, the ratio of inter-frame processing (three-dimensional prediction) is higher than that of intra-frame processing (two-dimensional prediction), and is directly decoded when the ratio of inter-frame processing (three-dimensional prediction) is lower than that of intra-frame processing (two-dimensional prediction).

In encoding of moving images, in spatio-temporal adaptive prediction in which, in order to switch, for each divided band, between a two-dimensional predictor that performs intra-frame processing (two-dimensional prediction) and a three-dimensional predictor that performs inter-frame processing (three-dimensional prediction), the correlation coefficient between the already-decoded signals near the pixel to be encoded in the current frame and in the reference frame is calculated, inter-frame processing (three-dimensional prediction) is performed when the correlation coefficient is large, and intra-frame processing (two-dimensional prediction) is performed otherwise, a moving picture encoding apparatus characterized by comprising: means for encoding the band by spatio-temporal adaptive prediction when, using the fact that bands in the same spatial direction from the lowest frequency band are correlated with one another, the inter-frame correlation of the frequency band one level lower than the band, toward the lowest frequency band, is high; and means for directly encoding the band when that inter-frame correlation is low.
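As an illustration only, the predictor switch recited in the claim above can be sketched as follows. This is a minimal sketch, not the patented implementation: the function names, the use of the Pearson correlation coefficient, and the threshold value 0.5 are all assumptions introduced here, since the claim says only that inter-frame processing is chosen when the correlation coefficient is "large".

```python
import math


def correlation_coefficient(a, b):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (assumed convention).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def choose_predictor(cur_neighbors, ref_neighbors, threshold=0.5):
    """Pick the predictor for one pixel from already-decoded neighbors.

    cur_neighbors: decoded values near the target pixel in the current frame.
    ref_neighbors: decoded values at the same positions in the reference frame.
    threshold: hypothetical cut-off for a "large" correlation coefficient.
    Returns "3d" (inter-frame) or "2d" (intra-frame).
    """
    rho = correlation_coefficient(cur_neighbors, ref_neighbors)
    return "3d" if rho > threshold else "2d"
```

Because only already-decoded values enter the calculation, a decoder can repeat the same test and reach the same choice without any side information.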

In decoding that outputs a moving image, a moving picture decoding apparatus characterized by comprising: means for decoding the band, for a signal encoded by the moving picture encoding apparatus according to claim 9, by spatio-temporal adaptive prediction when the inter-frame correlation of the frequency band one level lower than the band, toward the lowest frequency band, is high; and means for directly decoding the band when that inter-frame correlation is low.

11. The moving picture encoding apparatus according to claim 9, comprising: means for encoding the band by spatio-temporal adaptive prediction when, in the frequency band one level lower than the band, toward the lowest frequency band, the ratio of inter-frame processing (three-dimensional prediction) is higher than that of intra-frame processing (two-dimensional prediction); and means for directly encoding the band when the ratio of inter-frame processing (three-dimensional prediction) is lower than that of intra-frame processing (two-dimensional prediction).

In decoding that outputs a moving image, a moving picture decoding apparatus characterized by comprising: means for decoding the band, for a signal encoded by the moving picture encoding apparatus according to claim 11, by spatio-temporal adaptive prediction when, in the frequency band one level lower than the band, toward the lowest frequency band, the ratio of inter-frame processing (three-dimensional prediction) is higher than that of intra-frame processing (two-dimensional prediction); and means for directly decoding the band when the ratio of inter-frame processing (three-dimensional prediction) is lower than that of intra-frame processing (two-dimensional prediction).
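The band-level decision in the two claims above could be sketched as below. This is a hedged illustration under stated assumptions: the function name, the `"2d"`/`"3d"` labels, and the return values `"adaptive"` and `"direct"` are hypothetical, and a tie between the two counts is treated here as direct coding because the claims do not specify that case.

```python
def band_mode_from_lower_band(lower_band_choices):
    """Decide how to code a band from the band one level lower.

    lower_band_choices: per-coefficient predictor choices ("2d" or "3d")
        already made in the band one level lower, toward the lowest band.
    Returns "adaptive" (spatio-temporal adaptive prediction) when inter-frame
    processing was used more often than intra-frame processing, otherwise
    "direct". Ties map to "direct" (an assumption, not stated in the claims).
    """
    inter = sum(1 for choice in lower_band_choices if choice == "3d")
    intra = len(lower_band_choices) - inter
    return "adaptive" if inter > intra else "direct"
```

The encoder and the decoder both hold the lower band's choices before they reach the current band, so both can evaluate this rule identically and the mode does not need to be transmitted.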

In encoding of moving images, in spatio-temporal adaptive prediction in which, in order to switch, for each divided band, between a two-dimensional predictor that performs intra-frame processing (two-dimensional prediction) and a three-dimensional predictor that performs inter-frame processing (three-dimensional prediction), the correlation coefficient between the already-decoded signals near the pixel to be encoded in the current frame and in the reference frame is calculated, inter-frame processing (three-dimensional prediction) is performed when the correlation coefficient is large, and intra-frame processing (two-dimensional prediction) is performed otherwise, a moving picture encoding apparatus characterized in that whether to perform spatio-temporal adaptive prediction or direct encoding is determined in units of small blocks, and by comprising: means for encoding a small block of the band by spatio-temporal adaptive prediction when the inter-frame correlation of the small block located at the same spatial position in the frequency band one level lower than the band, toward the lowest frequency band, is high; and means for directly encoding the small block when that inter-frame correlation is low.

In decoding that outputs a moving image, a moving picture decoding apparatus characterized in that, for a signal encoded by the moving picture encoding apparatus according to claim 13, whether to perform spatio-temporal adaptive predictive decoding or direct decoding is determined in units of small blocks, and by comprising: means for decoding a small block of the band by spatio-temporal adaptive prediction when the inter-frame correlation of the small block located at the same spatial position in the frequency band one level lower than the band, toward the lowest frequency band, is high; and means for directly decoding the small block when that inter-frame correlation is low.

15. The moving picture encoding apparatus according to claim 13, comprising: means for encoding a small block of the band by spatio-temporal adaptive prediction when, within the small block located at the same spatial position in the frequency band one level lower than the band, toward the lowest frequency band, the ratio of inter-frame processing (three-dimensional prediction) is higher than that of intra-frame processing (two-dimensional prediction); and means for directly encoding the small block when the ratio of inter-frame processing (three-dimensional prediction) is lower than that of intra-frame processing (two-dimensional prediction).

16. In decoding that outputs a moving image, a moving picture decoding apparatus characterized by comprising: means for decoding a small block of the band, for a signal encoded by the moving picture encoding apparatus according to claim 15, by spatio-temporal adaptive prediction when, within the small block located at the same spatial position in the frequency band one level lower than the band, toward the lowest frequency band, the ratio of inter-frame processing (three-dimensional prediction) is higher than that of intra-frame processing (two-dimensional prediction); and means for directly decoding the small block when the ratio of inter-frame processing (three-dimensional prediction) is lower than that of intra-frame processing (two-dimensional prediction).
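The small-block variant recited above can be sketched in the same spirit. The following is an illustration under assumptions that the claims do not state: the band one level lower is taken to have half the resolution of the current band in each direction (as in a dyadic wavelet decomposition), so the co-located small block is found by halving the block coordinates and size; all function names and mode labels are hypothetical, and ties are again treated as direct coding.

```python
def colocated_block(row, col, size):
    """Map a small block of the current band to the band one level lower.

    Assumes the lower band has half the resolution in each direction.
    """
    return row // 2, col // 2, max(1, size // 2)


def block_modes(lower_choice_map, band_height, band_width, block_size):
    """Decide the coding mode of every small block of the current band.

    lower_choice_map: 2D list of "2d"/"3d" predictor choices made per
        coefficient in the band one level lower.
    Returns a dict mapping the (row, col) origin of each small block to
    "adaptive" or "direct".
    """
    lower_h = len(lower_choice_map)
    lower_w = len(lower_choice_map[0])
    modes = {}
    for row in range(0, band_height, block_size):
        for col in range(0, band_width, block_size):
            r0, c0, s = colocated_block(row, col, block_size)
            # Collect the choices inside the co-located block, clipped to the band.
            cells = [
                lower_choice_map[r][c]
                for r in range(r0, min(r0 + s, lower_h))
                for c in range(c0, min(c0 + s, lower_w))
            ]
            inter = cells.count("3d")
            intra = len(cells) - inter
            modes[(row, col)] = "adaptive" if inter > intra else "direct"
    return modes
```

For example, with a 2x2 lower band and a 4x4 current band split into 2x2 small blocks, each small block inherits its mode from exactly one coefficient of the lower band.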