JP4051799B2 - Image decoding apparatus and image decoding method

Info

Publication number: JP4051799B2
Application number: JP04373599A
Authority: JP
Inventors: 一彦西堀; 修八木; 英輝鍋迫
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1999-02-22
Filing date: 1999-02-22
Publication date: 2008-02-27
Anticipated expiration: 2019-02-22
Also published as: JP2000244917A

Description

【０００１】
【発明の属する技術分野】
本発明は、８×８画素からなる直交変換ブロック単位で直交変換することによる圧縮符号化をした第１の解像度の圧縮画像データを復号する画像復号装置及び画像復号方法に関し、特に、第１の解像度の圧縮画像データを復号して、この第１の解像度よりも低い第２の解像度の動画像データに縮小する画像復号装置及び画像復号方法に関するものである。
【０００２】
【従来の技術】
ＭＰＥＧ２（Moving Picture Experts Group phase2）等の画像圧縮方式を用いたデジタルテレビジョン放送の規格化が進められている。デジタルテレビジョン放送の規格には、標準解像度画像（例えば垂直方向の有効ライン数が５７６本）に対応した規格、高解像度画像（例えば垂直方向の有効ライン数が１１５２本）に対応した規格等がある。そのため、近年、高解像度画像の圧縮画像データを復号するとともにこの圧縮画像データを１／２の解像度に縮小することにより、標準解像度画像の画像データを生成して、この画像データを標準解像度に対応したテレビジョンモニタに表示するダウンデコーダが求められている。
【０００３】
高解像度画像に対して動き予測による予測符号化及び離散コサイン変換による圧縮符号化をしたＭＰＥＧ２等のビットストリームを、復号するとともに標準解像度画像にダウンサンプリングするダウンデコーダが、文献「低域ドリフトのないスケーラブル・デコーダ」（岩橋・神林・貴家：信学技報 CS94-186,DSP94-108,1995-01）に提案されている（以下、この文献を文献１と呼ぶ。）。この文献１には、以下の第１から第３のダウンデコーダが示されている。
【０００４】
第１のダウンデコーダは、図８に示すように、高解像度画像のビットストリームに対して８（水平方向のＤＣ成分から数えた係数の数）×８（垂直方向のＤＣ成分から数えた係数の数）の逆離散コサイン変換をする逆離散コサイン変換装置１０１と、離散コサイン変換がされた高解像度画像と動き補償がされた参照画像とを加算する加算装置１０２と、参照画像を一時記憶するフレームメモリ１０３と、フレームメモリ１０３が記憶した参照画像に１／２画素精度で動き補償をする動き補償装置１０４と、フレームメモリ１０３が記憶した参照画像を標準解像度の画像に変換するダウンサンプリング装置１０５とを備えている。
【０００５】
この第１のダウンデコーダでは、逆離散コサイン変換を行い高解像度画像として復号した出力画像を、ダウンサンプリング装置１０５で縮小して標準解像度の画像データを出力する。
【０００６】
第２のダウンデコーダは、図９に示すように、高解像度画像のビットストリームのＤＣＴ（Discrete Cosine Transform）ブロックの高周波成分の係数を０に置き換えて８×８の逆離散コサイン変換をする逆離散コサイン変換装置１１１と、離散コサイン変換がされた高解像度画像と動き補償がされた参照画像とを加算する加算装置１１２と、参照画像を一時記憶するフレームメモリ１１３と、フレームメモリ１１３が記憶した参照画像に１／２画素精度で動き補償をする動き補償装置１１４と、フレームメモリ１１３が記憶した参照画像を標準解像度の画像に変換するダウンサンプリング装置１１５とを備えている。
【０００７】
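In this second down decoder, the high-frequency coefficients among all coefficients of the DCT block are replaced with 0, an inverse discrete cosine transform is performed, and the output image decoded as a high-resolution image is reduced by the downsampling device 115 to output standard-resolution image data.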
【０００８】
第３のダウンデコーダは、図１０に示すように、高解像度画像のビットストリームのＤＣＴブロックの低周波成分の係数のみを用いて例えば４×４の逆離散コサイン変換をして標準解像度画像に復号する縮小逆離散コサイン変換装置１２１と、縮小逆離散コサイン変換がされた標準解像度画像と動き補償がされた参照画像とを加算する加算装置１２２と、参照画像を一時記憶するフレームメモリ１２３と、フレームメモリ１２３が記憶した参照画像に１／４画素精度で動き補償をする動き補償装置１２４とを備えている。
【０００９】
この第３のダウンデコーダでは、ＤＣＴブロックの全ての係数のうち低周波成分の係数のみを用いて逆離散コサイン変換を行い、高解像度画像から標準解像度画像として復号する。
【００１０】
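Here, the first down decoder decodes a high-resolution image by performing the inverse discrete cosine transform on all coefficients in the DCT block, and therefore requires an inverse discrete cosine transform device 101 with high processing capability and a high-capacity frame memory 103. The second down decoder decodes a high-resolution image by performing the inverse discrete cosine transform with the high-frequency coefficients in the DCT block set to 0, so the processing capability of the inverse discrete cosine transform device 111 may be lower, but a high-capacity frame memory 113 is still required. In contrast to these first and second down decoders, the third down decoder performs the inverse discrete cosine transform using only the low-frequency coefficients among all coefficients in the DCT block, so the processing capability of the reduced inverse discrete cosine transform device 121 may be low, and, because the reference image is decoded as a standard-resolution image, the capacity of the frame memory 123 can also be reduced.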
【００１１】
ところで、テレビジョン放送等の動画像の表示方式には、順次走査方式と飛び越し走査方式とがある。順次走査方式は、フレーム内の全ての画素を同じタイミングでサンプリングした画像を、順次表示する表示方式である。飛び越し走査方式は、フレーム内の画素を水平方向の１ライン毎に異なるタイミングでサンプリングした画像を、交互に表示する表示方式である。
【００１２】
この飛び越し走査方式では、フレーム内の画素を１ライン毎に異なるタイミングでサンプリングした画像のうちの一方を、トップフィールド（第１フィールドともいう。）といい、他方をボトムフィールド（第２のフィールドともいう。）という。フレームの水平方向の先頭ラインが含まれる画像がトップフィールドとなり、フレームの水平方向の２番目のラインが含まれる画像がボトムフィールドとなる。従って、飛び越し走査方式では、１つのフレームが２つのフィールドから構成されることとなる。
【００１３】
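In MPEG2, in order to efficiently compress moving image signals for interlaced scanning, a picture, which is the unit of compression of the screen, can be encoded not only by assigning a frame to it but also by assigning a field to it.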
【００１４】
ＭＰＥＧ２では、ピクチャにフィールドが割り当てられた場合には、そのビットストリームの構造をフィールド構造と呼び、ピクチャにフレームが割り当てられた場合には、そのビットストリームの構造をフレーム構造と呼ぶ。また、フィールド構造では、フィールド内の画素からＤＣＴブロックが形成され、フィールド単位で離散コサイン変換がされる。このフィールド単位で離散コサイン変換を行う処理モードのことをフィールドＤＣＴモードと呼ぶ。また、フレーム構造では、フレーム内の画素からＤＣＴブロックが形成され、フレーム単位で離散コサイン変換がされる。このフレーム単位で離散コサイン変換を行う処理モードのことをフレームＤＣＴモードと呼ぶ。さらに、フィールド構造では、フィールド内の画素からマクロブロックが形成され、フィールド単位で動き予測がされる。このフィールド単位で動き予測を行う処理モードのことをフィールド動き予測モードと呼ぶ。また、フレーム構造では、フレーム内の画素からマクロブロックが形成され、フレーム単位で動き予測がされる。フレーム単位で動き予測を行う処理モードのことをフレーム動き予測モードと呼ぶ。
【００１５】
ところで、上記文献１に示された第３のダウンデコーダを利用して、飛び越し走査方式に対応した圧縮画像データを復号する画像復号装置が、例えば文献「A Compensation Method of Drift Errors in Scalability」（N.OBIKANE,K.TAHARA and J.YONEMITSU,HDTV Work Shop'93）に提案されている（以下、この文献を文献２と呼ぶ）。
【００１６】
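As shown in FIG. 11, the conventional image decoding apparatus described in Document 2 comprises: a bit stream analysis device 131 that is supplied with a bit stream obtained by compressing a high-resolution image with MPEG2 and analyzes this bit stream; a variable-length code decoding device 132 that decodes the bit stream, which has been variable-length coded by assigning code lengths according to the frequency of occurrence of the data; an inverse quantization device 133 that multiplies each coefficient of the DCT block by a quantization step; a reduced inverse discrete cosine transform device 134 that decodes a standard-resolution image by performing, for example, a 4 × 4 inverse discrete cosine transform using only the low-frequency coefficients among all coefficients of the DCT block; an adder 135 that adds the standard-resolution image that has undergone the reduced inverse discrete cosine transform and a motion-compensated reference image; a frame memory 136 that temporarily stores the reference image; and a motion compensation device 137 that performs motion compensation with 1/4-pixel accuracy on the reference image stored in the frame memory 136.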
【００１７】
この文献２に示された従来の画像復号装置の縮小逆離散コサイン変換装置１３４は、ＤＣＴブロック内の全ての係数のうち低周波成分の係数のみを用いて逆離散コサイン変換をするが、フレームＤＣＴモードとフィールドＤＣＴモードとで、逆離散コサイン変換を行う係数の位置が異なっている。
【００１８】
具体的には、縮小逆離散コサイン変換装置１３４は、フィールドＤＣＴモードの場合には、図１２に示すように、ＤＣＴブロック内の８×８個のうち、低域の４×４個の係数のみに逆離散コサイン変換を行う。それに対し、縮小逆離散コサイン変換装置１３４は、フレームＤＣＴモードの場合には、図１３に示すように、ＤＣＴブロック内の８×８個の係数のうち、４×２個＋４×２個の係数のみに逆離散コサイン変換を行う。
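The field DCT mode case described in this paragraph (keeping only the low-frequency 4 × 4 coefficients of an 8 × 8 DCT block and inverse-transforming them) can be sketched as follows. This is a minimal illustration, not the circuit of the cited apparatus: the orthonormal DCT-II convention, the 0.5 rescaling factor that follows from it, and the names `dct_matrix` and `reduced_idct_4x4` are all assumptions introduced here.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis (assumed convention): row k, column i."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def reduced_idct_4x4(coeffs_8x8):
    """Decode a 4x4 pixel block from the low-frequency corner of an 8x8 DCT block."""
    basis = dct_matrix(4)
    # Keep the 4x4 low-frequency coefficients only. The factor 0.5 compensates
    # for going from an 8-point to a 4-point orthonormal transform in both
    # directions (sqrt(4/8) * sqrt(4/8)).
    low = [[coeffs_8x8[v][u] * 0.5 for u in range(4)] for v in range(4)]
    pixels = [[0.0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            pixels[y][x] = sum(basis[v][y] * basis[u][x] * low[v][u]
                               for v in range(4) for u in range(4))
    return pixels
```

With this convention a flat 8 × 8 block of value 100 has a DC coefficient of 800, and the sketch returns a flat 4 × 4 block of value 100, so the average brightness is preserved while the resolution is halved in each direction.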
【００１９】
また、この文献２に示された従来の画像復号装置の動き補償装置１３７は、高解像度画像に対して行われた動き予測の情報（動きベクトル）に基づき、フィールド動き予測モード及びフレーム動き予測モードのそれぞれに対応した１／４画素精度の動き補償を行う。すなわち、通常ＭＰＥＧ２では１／２画素精度で動き補償が行われることが定められているが、高解像度画像から標準解像度画像を復号する場合には、ピクチャ内の画素数が１／２に間引かれるため、動き補償装置１３７では動き補償の画素精度を１／４画素精度として動き補償を行っている。
【００２０】
従って、動き補償装置１３７では、高解像度画像に対応した動き補償を行うため、標準解像度の画像としてフレームメモリ１３６に格納された参照画像の画素に対して線形補間して、１／４画素精度の画素を生成している。
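The relationship between the 1/2-pixel motion vectors of the high-resolution stream and the 1/4-pixel positions on the half-size reference image can be illustrated with a small sketch. It assumes that a vector component arrives as an integer in half-pixel units of the high-resolution grid; the function name `split_quarter_pel` is hypothetical.

```python
def split_quarter_pel(mv_half_pel_high_res):
    """Reinterpret a high-resolution half-pel vector component on the half-size grid.

    One half-pixel step on the full-resolution grid spans a quarter pixel on the
    half-resolution grid, so the integer value is unchanged and only its unit
    changes from 1/2 pixel to 1/4 pixel.
    """
    integer_part = mv_half_pel_high_res >> 2   # whole pixels on the low-res grid (floor)
    phase = mv_half_pel_high_res & 3           # 0..3 -> 0, 1/4, 1/2, 3/4 pixel
    return integer_part, phase / 4.0
```

For example, a component of 5 (2.5 pixels at high resolution) becomes 1 whole pixel plus a phase of 0.25 on the standard-resolution reference image, which is why the reference pixels have to be interpolated to 1/4-pixel positions.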
【００２１】
具体的に、フィールド動き予測モード及びフレーム動き予測モードの場合の垂直方向の画素の線形補間処理を、図１４及び図１５を用いて説明する。なお、図面中には、縦方向に垂直方向の画素の位相を示し、表示画像の各画素が位置する位相を整数で示している。
【００２２】
まず、フィールド動き予測モードで動き予測がされた画像の補間処理について、図１４を用いて説明する。高解像度画像（上位レイヤー）に対しては、図１４（ａ）に示すように、各フィールドそれぞれ独立に、１／２画素精度で動き補償がされる。これに対し、標準解像度画像（下位レイヤー）に対しては、図１４（ｂ）に示すように、整数精度の画素に基づきフィールド内で線形補間をして、垂直方向に１／４画素、１／２画素、３／４画素分の位相がずれた画素を生成し、動き補償がされる。すなわち、標準解像度画像（下位レイヤー）では、トップフィールドの整数精度の各画素に基づきトップフィールドの１／４画素精度の各画素が線形補間により生成され、ボトムフィールドの整数精度の各画素に基づきボトムフィールドの１／４画素精度の各画素が線形補間により生成される。例えば、垂直方向の位相が０の位置にあるトップフィールドの画素の値をａ、垂直方向の位相が１の位置にあるトップフィールドの画素の値をｂとする。この場合、垂直方向の位相が１／４の位置にあるトップフィールドの画素は（３ａ＋ｂ）／４となり、垂直方向の位相が１／２の位置にあるトップフィールドの画素は（ａ＋ｂ）／２となり、垂直方向の位相が３／４の位置にあるトップフィールドの画素は（ａ＋３ｂ）／４となる。
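The three formulas at the end of this paragraph, (3a + b)/4, (a + b)/2 and (a + 3b)/4, are ordinary linear interpolation between two vertically adjacent pixels of the same field. A minimal sketch, with the hypothetical name `field_quarter_pel` and the phase passed as a count of quarter steps:

```python
def field_quarter_pel(a, b, phase_quarters):
    """Linear interpolation between two vertically adjacent pixels of one field.

    a: pixel at vertical phase 0, b: pixel at vertical phase 1 (same field).
    phase_quarters: 0..3, the number of quarter-pixel steps from a toward b.
    """
    w = phase_quarters
    return ((4 - w) * a + w * b) / 4
```

With a = 8 and b = 16 the phases 1/4, 1/2 and 3/4 give 10, 12 and 14, matching the three formulas in the text.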
【００２３】
続いて、フレーム動き予測モードで動き予測がされた画像の補間処理について、図１５を用いて説明する。高解像度画像（上位レイヤー）に対しては、図１５（ａ）に示すように、各フィールド間で補間処理がされ、すなわち、ボトムフィールドとトップフィールドとの間で補間処理がされ、１／２画素精度で動き補償がされる。標準解像度画像（下位レイヤー）に対しては、図１５（ｂ）に示すように、トップフィールド及びボトムフィールドの２つのフィールドの整数精度の各画素に基づき、垂直方向に１／４画素、１／２画素、３／４画素分の位相がずれた画素が線形補間により生成され、動き補償がされる。例えば、垂直方向の位相が−１の位置にあるボトムフィールドの画素の値をａ、垂直方向の位相が０の位置にあるトップフィールドの画素の値をｂ、垂直方向の位相が１の位置にあるボトムフィールドの画素の値をｃ、垂直方向の位相が２の位置にあるトップフィールドの画素の値をｄ、垂直方向の位相が３の位置にあるボトムフィールドの画素の値をｅとする。この場合、垂直方向の位相が０〜２の間にある１／４画素精度の各画素は、以下のように求められる。
【００２４】
垂直方向の位相が１／４の位置にある画素は（ａ＋４ｂ＋３ｃ）／８となる。垂直方向の位相が１／２の位置にある画素は（ａ＋３ｃ）／４となる。垂直方向の位相が３／４の位置にある画素は（ａ＋２ｂ＋３ｃ＋２ｄ）／８となる。垂直方向の位相が５／４の位置にある画素は（２ｂ＋３ｃ＋２ｄ＋ｅ）／８となる。垂直方向の位相が３／２の位置にある画素は（３ｃ＋ｅ）／４となる。垂直方向の位相が７／４の位置にある画素は（３ｃ＋４ｄ＋ｅ）／８となる。
【００２５】
以上のように上記文献２に示された従来の画像復号装置は、飛び越し走査方式に対応した高解像度画像の圧縮画像データを、標準解像度画像データに復号することができる。
【００２６】
しかしながら、上記文献２に示された従来の画像復号装置では、フィールドＤＣＴモードで得られる標準解像度画像の各画素と、フレームＤＣＴモードで得られる標準解像度の各画素との位相がずれる。具体的には、フィールドＤＣＴモードでは、図１６に示すように、下位レイヤーのトップフィールドの各画素の垂直方向の位相が１／２、５／２・・・となり、下位レイヤーのボトムフィールドの各画素の垂直方向の位相が１、３・・・となる。それに対して、フレームＤＣＴモードでは、図１７に示すように、下位レイヤーのトップフィールドの各画素の垂直方向の位相が０、２・・・となり、下位レイヤーのボトムフィールドの各画素の垂直方向の位相が１、３・・・となる。そのため、位相が異なる画像がフレームメモリ１３６に混在し、出力する画像の画質が劣化する。
【００２７】
また、上記文献２に示された従来の画像復号装置では、フィールド動き予測モードとフレーム動き予測モードとで位相ずれの補正がされていない。そのため、出力する画像の画質が劣化する。
【００２８】
【発明が解決しようとする課題】
このような問題を解決するための画像復号装置が、特願平１０−２０８３８５号により提案されている。
【００２９】
つぎに、特願平１０−２０８３８５で提案された画像復号装置について説明する。
【００３０】
図１８に示す特願平１０−２０８３８５号で提案した画像復号装置２００は、垂直方向の有効ライン数が例えば１１５２本の高解像度画像をＭＰＥＧ２で画像圧縮したビットストリームが入力され、この入力されたビットストリームを復号するとともに１／２の解像度に縮小して、垂直方向の有効ライン数が例えば５７６本の標準解像度画像を出力する装置である。
【００３１】
なお、以下、高解像度画像のことを上位レイヤーとも呼び、標準解像度画像のことを下位レイヤーとも呼ぶものとする。また、通常、８×８の離散コサイン係数を有するＤＣＴブロックを逆離散コサイン変換した場合８×８の画素から構成される復号データを得ることができるが、例えば、８×８の離散コサイン係数を復号して４×４の画素から構成される復号データを得るような、逆離散コサイン変換をするとともに解像度を縮小する処理を、縮小逆離散コサイン変換という。
【００３２】
この画像復号装置２００は、圧縮された高解像度画像のビットストリームが供給され、このビットストリームを解析するビットストリーム解析装置２０１と、データの発生頻度に応じた符号長を割り当てる可変長符号化がされた上記ビットストリームを復号する可変長符号復号装置２０２と、ＤＣＴブロックの各係数に量子化ステップを掛ける逆量子化装置２０３と、フィールドＤＣＴモードで離散コサイン変換がされたＤＣＴブロックに対して縮小逆離散コサイン変換をして標準解像度画像を生成するフィールドモード用縮小逆離散コサイン変換装置２０４と、フレームＤＣＴモードで離散コサイン変換がされたＤＣＴブロックに対して縮小逆離散コサイン変換をして標準解像度画像を生成するフレームモード用縮小逆離散コサイン変換装置２０５と、縮小逆離散コサイン変換がされた標準解像度画像と動き補償がされた参照画像とを加算する加算装置２０６と、参照画像を一時記憶するフレームメモリ２０７と、フレームメモリ２０７が記憶した参照画像にフィールド動き予測モードに対応した動き補償をするフィールドモード用動き補償装置２０８と、フレームメモリ２０７が記憶した参照画像にフレーム動き予測モードに対応した動き補償をするフレームモード用動き補償装置２０９と、フレームメモリ２０７が記憶した画像に対してポストフィルタリングをすることにより、画枠変換をするとともに画素の位相ずれを補正してテレビジョンモニタ等に表示するための標準解像度の画像データを出力する画枠変換・位相ずれ補正装置２１０とを備えている。
【００３３】
フィールドモード用縮小逆離散コサイン変換装置２０４は、入力されたビットストリームのマクロブロックが、フィールドＤＣＴモードで離散コサイン変換されている場合に用いられる。フィールドモード用縮小逆離散コサイン変換装置２０４は、フィールドＤＣＴモードで離散コサイン変換がされたマクロブロック内の８×８個の係数が示されたＤＣＴブロックに対して、図１２で示したような、低域の４×４の係数のみに逆離散コサイン変換を行う。すなわち、水平方向及び垂直方向の低域の４点の離散コサイン係数に基づき縮小逆離散コサイン変換を行う。このフィールドモード用縮小逆離散コサイン変換装置２０４では、以上のような縮小逆離散コサイン変換を行うことにより、１つのＤＣＴブロックが４×４の画素から構成される標準解像度画像を復号することができる。この復号された画像データの各画素の位相は、図１９に示すように、トップフィールドの各画素の垂直方向の位相が１／２、５／２・・・となり、ボトムフィールドの各画素の垂直方向の位相が１、３・・・となる。すなわち、復号された下位レイヤーのトップフィールドでは、先頭画素（位相が１／２の画素）の位相が上位レイヤーのトップフィールドの先頭から１番目と２番目の画素（位相が０と２の画素）の中間位相となり、先頭から２番目の画素（位相が５／２の画素）の位相が上位レイヤーのトップフィールドの先頭から３番目と４番目の画素（位相が４と６の画素）の中間位相となる。また、復号された下位レイヤーのボトムフィールドでは、先頭画素（位相が１の画素）の位相が上位レイヤーのボトムフィールドの先頭から１番目と２番目の画素（位相が１と３の画素）の中間位相となり、先頭から２番目の画素（位相が３の画素）の位相が上位レイヤーのボトムフィールドの先頭から３番目と４番目の画素（位相が５と７の画素）の中間位相となる。
【００３４】
フレームモード用縮小逆離散コサイン変換装置２０５は、入力されたビットストリームのマクロブロックが、フレームＤＣＴモードで離散コサイン変換されている場合に用いられる。フレームモード用縮小逆離散コサイン変換装置２０５は、フレームＤＣＴモードで離散コサイン変換がされたマクロブロック内の８×８個の係数が示されたＤＣＴブロックに対して、縮小逆離散コサイン変換を行う。そして、フレームモード用縮小逆離散コサイン変換装置２０５では、１つのＤＣＴブロックが４×４の画素から構成される解像度画像を復号するとともに、フィールドモード用縮小逆離散コサイン変換装置２０４で生成した標準解像度画像の画素の位相と同位相の画像を生成する。すなわち、フレームモード用縮小逆離散コサイン変換装置２０５で復号された画像データの各画素の位相は、図１９に示すように、トップフィールドの各画素の垂直方向の位相が１／２、５／２・・・となり、ボトムフィールドの各画素の垂直方向の位相が１、３・・・となる。
【００３５】
なお、このフレームモード用縮小逆離散コサイン変換装置２０５の処理については、その詳細を後述する。
【００３６】
加算装置２０６は、フィールドモード用縮小逆離散コサイン変換装置２０４又はフレームモード用縮小逆離散コサイン変換装置２０５により縮小逆離散コサイン変換されたマクロブロックがイントラ画像の場合には、そのイントラ画像をそのままフレームメモリ２０７に格納する。また、加算装置２０６は、フィールドモード用縮小逆離散コサイン変換装置２０４又はフレームモード用縮小逆離散コサイン変換装置２０５により縮小逆離散コサイン変換されたマクロブロックがインター画像である場合には、そのインター画像に、フィールドモード用動き補償装置２０８或いはフレームモード用動き補償装置２０９により動き補償がされた参照画像を合成して、フレームメモリ２０７に格納する。
【００３７】
フィールドモード用動き補償装置２０８は、マクロブロックの動き予測モードがフィールド動き予測モードの場合に用いられる。フィールドモード用動き補償装置２０８は、フレームメモリ２０７に記憶されている標準解像度画像の参照画像に対して、トップフィールドとボトムフィールドとの間の位相ずれ成分を考慮した形で１／４画素精度で補間処理を行い、フィールド動き予測モードに対応した動き補償をする。このフィールドモード用動き補償装置２０８により動き補償がされた参照画像は、加算装置２０６に供給され、インター画像に合成される。
【００３８】
フレームモード用動き補償装置２０９は、マクロブロックの動き予測モードがフレーム動き予測モードの場合に用いられる。フレームモード用動き補償装置２０９は、フレームメモリ２０７に記憶されている標準解像度画像の参照画像に対して、トップフィールドとボトムフィールドとの間の位相ずれ成分を考慮した形で１／４画素精度で補間処理を行い、フレーム動き予測モードに対応した動き補償をする。このフレームモード用動き補償装置２０９により動き補償がされた参照画像は、加算装置２０６に供給され、インター画像に合成される。
【００３９】
画枠変換・位相ずれ補正装置２１０は、フレームメモリ２０７が記憶した標準解像度の参照画像或いは加算装置２０６が合成した画像が供給され、この画像をポストフィルタリングにより、トップフィールドとボトムフィールドとの間の位相ずれ成分を補正するとともに画枠を標準解像度のテレビジョンの規格に合致するように変換する。すなわち、画枠変換・位相ずれ補正装置２１０は、トップフィールドの各画素の垂直方向の位相が１／２、５／２・・・となり、ボトムフィールドの各画素の垂直方向の位相が１、３・・・となる標準解像度画像を、例えば、トップフィールドの各画素の垂直方向の位相が０、２、４・・・となり、ボトムフィールドの各画素の垂直方向の位相が１、３、５・・・となるように補正する。また、画枠変換・位相ずれ補正装置２１０は、高解像度のテレビジョン規格の画枠を、１／４に縮小して標準解像度のテレビジョン規格の画枠に変換する。
【００４０】
特願平１０−２０８３８５で提案した画像復号装置２００では、以上のような構成を有することにより、高解像度画像をＭＰＥＧ２で画像圧縮したビットストリームを、復号するとともに解像度を１／２に縮小して、標準解像度画像を出力することができる。
【００４１】
つぎに、上記フレームモード用縮小逆離散コサイン変換装置２０５の処理内容について、さらに詳細に説明する。
【００４２】
フレームモード用縮小逆離散コサイン変換装置２０５には、図２０に示すように、高解像度画像を圧縮符号化したビットストリームが、１つのＤＣＴブロック単位で入力される。
【００４３】
まず、ステップＳ１において、この１つのＤＣＴブロックの離散コサイン係数ｙ（ＤＣＴブロックの全ての離散コサイン係数のうち垂直方向の係数をｙ₁〜ｙ₈として図中に示す。）に対して、８×８の逆離散コサイン変換（ＩＤＣＴ８×８）を行う。逆離散コサイン変換をすることにより、８×８の復号された画素データｘ（ＤＣＴブロックの全ての画素データのうち垂直方向の画素データをｘ₁〜ｘ₈として図中に示す。）を得ることができる。
【００４４】
続いて、ステップＳ２において、この８×８の画素データｘを、垂直方向に１ライン毎交互に取り出して、飛び越し走査に対応した４×４のトップフィールドの画素ブロックと、飛び越し走査に対応した４×４のボトムフィールドの画素ブロックの２つの画素ブロックに分離する。すなわち、垂直方向に１ライン目の画素データｘ₁と、３ライン目の画素データｘ₃と、５ライン目の画素データｘ₅と、７ライン目の画素データｘ₇とを取り出して、トップフィールドに対応した画素ブロックを生成する。また、垂直方向に２ライン目の画素データｘ₂と、４ライン目の画素データｘ₄と、６ライン目の画素データｘ₆と、８ライン目の画素データｘ₈とを取り出して、ボトムフィールドに対応した画素ブロックを生成する。なお、ＤＣＴブロックの各画素を飛び越し走査に対応した２つの画素ブロックに分離する処理を、以下フィールド分離という。
【００４５】
続いて、ステップＳ３において、フィールド分離した２つの画素ブロックそれぞれに対して４×４の離散コサイン変換（ＤＣＴ４×４）をする。
【００４６】
続いて、ステップＳ４において、４×４の離散コサイン変換をして得られたトップフィールドに対応する画素ブロックの離散コサイン係数ｚ（トップフィールドに対応する画素ブロックの全ての係数のうち垂直方向の離散コサイン係数をｚ₁，ｚ₃，ｚ₅，ｚ₇として図中に示す。）の高域成分を間引き、２×２の離散コサイン係数から構成される画素ブロックとする。また、４×４の離散コサイン変換をして得られたボトムフィールドに対応する画素ブロックの離散コサイン係数ｚ（ボトムフィールドに対応する画素ブロックの全ての係数のうち垂直方向の離散コサイン係数をｚ₂，ｚ₄，ｚ₆，ｚ₈として図中に示す。）の高域成分を間引き、２×２の離散コサイン係数から構成される画素ブロックとする。
【００４７】
続いて、ステップＳ５において、高域成分の離散コサイン係数を間引いた画素ブロックに対して、２×２の逆離散コサイン変換（ＩＤＣＴ２×２）を行う。２×２の逆離散コサイン変換をすることにより、２×２の復号された画素データｘ′（トップフィールドの画素ブロックの全ての画素データのうち垂直方向の画素データをｘ′₁，ｘ′₃として図中に示し、また、ボトムフィールドに対応する画素ブロックの全ての画素データのうち垂直方向の画素データをｘ′₂，ｘ′₄として図中に示す。）を得ることができる。
【００４８】
続いて、ステップＳ６において、トップフィールドに対応する画素ブロックの画素データと、ボトムフィールドに対応する画素ブロックの画素データとを、垂直方向に１ラインずつ交互に合成して、４×４の画素データから構成される縮小逆離散コサイン変換をしたＤＣＴブロックを生成する。なお、トップフィールドとボトムフィールドに対応した２つの画素ブロックの各画素を垂直方向に交互に合成する処理を、以下フレーム合成という。
【００４９】
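By performing steps S1 to S6 above, the frame-mode reduced inverse discrete cosine transform device 205 can generate a 4 × 4 DCT block composed of pixels having the same phase as the pixels of the standard-resolution image generated by the field-mode reduced inverse discrete cosine transform device 204, as shown in FIG. 19.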
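Steps S1 to S6 can be sketched for a single vertical column of eight coefficients, which is the direction the text and FIG. 20 focus on. This is only an illustration under stated assumptions: an orthonormal DCT-II is assumed, the 1/√2 rescaling in step S4 follows from that convention rather than from the source, and the names `dct`, `idct` and `frame_mode_reduced_idct_column` are hypothetical.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a short sequence (assumed convention)."""
    n = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n) *
            sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k] *
                math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(n))
            for i in range(n)]

def frame_mode_reduced_idct_column(y):
    """Vertical part of steps S1-S6 for one column of 8 frame-DCT coefficients."""
    x = idct(y)                                    # S1: 8-point IDCT
    top, bottom = x[0::2], x[1::2]                 # S2: field separation
    fields = []
    for field in (top, bottom):
        z = dct(field)                             # S3: 4-point DCT per field
        low = [c / math.sqrt(2.0) for c in z[:2]]  # S4: drop high band, rescale
        fields.append(idct(low))                   # S5: 2-point IDCT
    # S6: frame synthesis, interleaving top and bottom lines again
    return [fields[0][0], fields[1][0], fields[0][1], fields[1][1]]
```

Because each field is reduced on its own, a column whose top-field lines are all 10 and whose bottom-field lines are all 90 comes out as 10, 90, 10, 90: the interlaced structure is kept instead of being averaged away.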
【００５０】
つぎに、フィールドモード用動き補償装置２０８及びフレームモード用動き補償装置２０９について、さらに詳細に説明する。
【００５１】
まず、フィールドモード用動き補償装置２０８が行う補間処理について説明する。このフィールドモード用動き補償装置２０８では、以下に説明するように、高解像度画像の１／２画素精度の動き補償に対応するように、フレームメモリ２０７に記憶されている標準解像度画像の画素を補間して、１／４画素精度の画素を生成する。
【００５２】
水平方向の画素に対しては、整数精度の画素をフレームメモリ２０７からとりだして２つの画素を線形補間し、１／２画素精度の画素、及び、１／４精度の画素を生成する。
【００５３】
垂直方向の画素に対しては、まず、図２１（ａ）に示すように、トップフィールドの各画素の垂直方向の位相が１／２、５／２・・・となり、ボトムフィールドの各画素の垂直方向の位相が１、３・・・となるような、トップフィールドとボトムフィールドとのフィールド間で位相ずれを含む標準解像度画像の整数精度の画素を、フレームメモリ２０７から取り出す。
【００５４】
続いて、垂直方向の画素に対しては、図２１（ｂ）に示すように、線形補間フィルタを用いて、フィールド内で、フレームメモリ２０７から取り出した整数精度の画素から１／２画素精度の画素を生成する。すなわち、トップフィールドの整数精度の画素に基づきトップフィールドの１／２画素精度の画素を生成し、ボトムフィールドの整数精度の画素に基づきボトムフィールドの１／２画素精度の画素を生成する。例えば、この図２１（ｂ）に示すように、垂直方向の位相が７／２の位置にあるトップフィールドの画素は、５／２，９／２の位置にあるトップフィールドの画素から線形補間をされて生成される。また、垂直方向の位相が４の位置にあるボトムフィールドの画素は、３，５の位置にあるボトムフィールドの画素から線形補間をされて生成される。なお、この１／２画素精度の画素の生成は、線形補間フィルタではなく、ハーフバンドフィルタのような２倍補間フィルタを用いても良い。
【００５５】
続いて、垂直方向の画素に対しては、図２１（ｃ）に示すように、線形補間フィルタを用いて、フィールド内で、１／２画素精度の画素から１／４画素精度の画素を生成する。すなわち、トップフィールドの１／２画素精度の画素に基づきトップフィールドの１／４画素精度の画素を生成し、ボトムフィールドの１／２画素精度の画素に基づきボトムフィールドの１／４画素精度の画素を生成する。例えば、この図２１（ｃ）に示すように、垂直方向の位相が９／４の位置にあるトップフィールドの画素は、２，５／２の位置にあるトップフィールドの画素から線形補間をされて生成される。また、垂直方向の位相が１０／４の位置にあるボトムフィールドの画素は、９／４，１１／４の位置にあるボトムフィールドの画素から線形補間をされて生成される。
【００５６】
なお、２段階で線形補間を行わずに、４倍の線形補間フィルタを用いて整数精度の画素から直接１／４精度の画素を生成しても良い。
【００５７】
つぎに、フレームモード用動き補償装置２０９が行う補間処理について説明する。このフレームモード用動き補償装置２０９では、以下に説明するように、高解像度画像の１／２画素精度の動き補償に対応するように、フレームメモリ２０７に記憶されている標準解像度画像の画素を補間して、１／４画素精度の画素を生成する。
【００５８】
水平方向の画素に対しては、上述したフィールドモード用動き補償装置２０８と同様に、整数精度の画素をフレームメモリ２０７からとりだして２つの画素を線形補間し、１／２画素精度の画素、及び、１／４精度の画素を生成する。
【００５９】
垂直方向の画素に対しては、まず、図２２（ａ）に示すように、トップフィールドの各画素の垂直方向の位相が１／２、５／２・・・となり、ボトムフィールドの各画素の垂直方向の位相が１、３・・・となるような、トップフィールドとボトムフィールドとのフィールド間で位相ずれを含む標準解像度画像の整数精度の画素を、フレームメモリ２０７から取り出す。
【００６０】
続いて、垂直方向の画素に対しては、図２２（ｂ）に示すように、線形補間フィルタを用いて、フィールド内で、フレームメモリ２０７から取り出した整数精度の画素から１／２画素精度の画素を生成する。すなわち、トップフィールドの整数精度の画素に基づきトップフィールドの１／２画素精度の画素を生成し、ボトムフィールドの整数精度の画素に基づきボトムフィールドの１／２画素精度の画素を生成する。例えば、この図２２（ｂ）に示すように、垂直方向の位相が７／２の位置にあるトップフィールドの画素は、５／２，９／２の位置にあるトップフィールドの画素から線形補間をされて生成される。また、垂直方向の位相が４の位置にあるボトムフィールドの画素は、３，５の位置にあるボトムフィールドの画素から線形補間をされて生成される。
【００６１】
続いて、垂直方向の画素に対しては、図２２（ｃ）に示すように、線形補間フィルタを用いて、トップフィールドとボトムフィールドの２つのフィールド間で、１／２画素精度の画素から１／４画素精度の画素を生成する。例えば、この図２２（ｃ）に示すように、垂直方向の位相が１／４の位置にある画素は、０の位置にあるトップフィールドの画素と、１／２の位置にあるボトムフィールドの画素から線形補間をされて生成される。また、垂直方向の位相が３／４の位置にある画素は、１／２の位置にあるボトムフィールドの画素と１の位置にあるトップフィールドの画素から線形補間をされて生成される。
【００６２】
以上のような処理を行うフィールドモード用動き補償装置２０８及びフレームモード用動き補償装置２０９のブロック構成を図２３に示す。
【００６３】
フィールドモード用動き補償装置２０８及びフレームモード用動き補償装置２０９は、この図２３に示すように、アドレス生成装置２２２と、入力メモリ２２３と、垂直方向補間処理部２２４と、垂直方向フィルタ係数格納メモリ２２５と、中間メモリ２２６と、水平方向補間処理部２２７と、水平方向フィルタ係数格納メモリ２２８とを備えている。
【００６４】
アドレス生成部２２２には、動きベクトル情報が入力される。アドレス生成部２２２は、この動きベクトル情報に基づき、補間する画素の垂直方向及び水平方向の位置を示すアドレス情報を生成する。アドレス生成部２２２は、生成したアドレス情報に基づき、標準解像度画像の整数精度の画素をフレームメモリ２０７から取り出し、入力メモリ２２３に送る。
【００６５】
また、アドレス生成部２２２は、入力された動きベクトル情報を垂直方向フィルタ係数格納メモリ２２５及び水平方向フィルタ係数格納メモリ２２８に送る。
【００６６】
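The vertical filter coefficient storage memory 225 stores four sets of one-dimensional filter coefficients in the case of the field-mode motion compensation device 208, and four sets of one-dimensional filter coefficients in the case of the frame-mode motion compensation device 209. This is because, in this apparatus, in the field motion prediction mode, pixels with phases 0, 0.25, 0.5, and 0.75 relative to the reference image are generated as shown in FIG. 21(c) and motion compensation is performed with 1/4-pixel accuracy, and, in the frame motion prediction mode, pixels with phases 0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, and 1.75 relative to the reference image are generated as shown in FIG. 22(c) and motion compensation is performed with 1/4-pixel accuracy.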
【００６７】
水平方向フィルタ係数格納メモリ２２８には、フィールドモード用動き補償装置２０８とフレームモード用動き補償装置２０９との違いに拘わらず、４通りの１次元フィルタ係数が格納されている。
【００６８】
垂直方向フィルタ係数格納メモリ２２５及び水平方向フィルタ係数格納メモリ２２８は、送られた動きベクトル情報に応じたフィルタ係数を垂直方向補間処理部２２４及び水平方向補間処理部２２７に送る。
【００６９】
垂直方向補間処理部２２４は、入力メモリ２２３に格納された整数精度の画素データ（参照画像のマクロブロック）に対して、送られたフィルタ係数を用いて、垂直方向の１次元の画素補間を行う。垂直方向の画素補間が行われた参照画像のマクロブロックは、中間メモリ２２６に格納される。
【００７０】
水平方向補間処理部２２７は、中間メモリ２２６に格納された垂直方向の画素補間が行われた画素データに対して、送られたフィルタ係数を用いて、水平方向の１次元の画素補間を行う。水平方向の画素補間が行われた参照画像のマクロブロックは、動き補償がされた参照画像として加算装置２０６に送られ、縮小逆離散コサイン変換がされた圧縮画像データの加算がされる。
【００７１】
以上のような特願平１０−２０８３８５で提案された画像復号装置２００では、水平方向及び垂直方向に対して１／４画素精度で動き補償を行うことにより、トップフィールドとボトムフィールドとの間で位相ずれが生じず、いわゆるフィールド反転やフィールドミックスを防ぐことができ、動き補償に伴う画質の劣化を防止することができる。
【００７２】
ところで、以上のような特願平１０−２０８３８５号で提案した画像復号装置２００では、以下のような問題があった。
【００７３】
この画像復号装置２００では、動き補償を行う場合、垂直方向の画素補間と水平方向の画素補間とを分けて行っている。そのため、この画像復号装置２００では、中間結果をメモリに格納して再び読み出さなくてはならなく、余分なメモリ領域が必要となってしまい、さらに、メモリへのアクセス量が増え処理時間が増加してしまっていた。
【００７４】
本発明は、このような実情を鑑みてなされたものであり、飛び越し走査画像が有するインタレース性を損なうことなくフィールド直交変換モードとフレーム直交変換モードとによる画素の位相ずれをなくすことが可能な、高解像度画像の圧縮画像データから標準解像度の画像データを復号する画像復号装置及び画像復号方法であって、動き補償の際に簡易な構成で処理を簡略化した画像復号装置及び画像復号方法を提供することを目的とする。
【００７５】
【課題を解決するための手段】
本発明にかかる画像復号装置は、所定の画素ブロック（マクロブロック）単位で動き予測をすることによる予測符号化、及び、所定の画素ブロック（直交変換ブロック）単位で直交変換をすることによる圧縮符号化をした第１の解像度の圧縮画像データから、上記第１の解像度より低い第２の解像度の動画像データを復号する画像復号装置であって、飛び越し走査に対応した直交変換方式（フィールド直交変換モード）により直交変換がされた上記圧縮画像データの直交変換ブロックに対して、逆直交変換をする第１の逆直交変換手段と、順次走査に対応した直交変換方式（フレーム直交変換モード）により直交変換がされた上記圧縮画像データの直交変換ブロックに対して、逆直交変換をする第２の逆直交変換手段と、上記第１の逆直交変換手段又は上記第２の逆直交変換手段により逆直交変換がされた圧縮画像データと動き補償がされた参照画像データとを加算して、第２の解像度の動画像データを出力する加算手段と、上記加算手段から出力される動画像データを参照画像データとして記憶する記憶手段と、上記記憶手段が記憶している参照画像データのマクロブロックの垂直方向及び水平方向に対して１／４画素精度の動き補償をする動き補償手段とを備え、上記第１の逆直交変換手段は、上記直交変換ブロックの各係数のうち低周波成分の係数に対して逆直交変換をし、上記第２の逆直交変換手段は、上記直交変換ブロックの全周波数成分の係数に対して逆直交変換をし、逆直交変換をした直交変換ブロックの各画素を飛び越し走査に対応した２つの画素ブロックに分離し、分離した２つの画素ブロックに対してそれぞれ直交変換をし、直交変換をした２つの画素ブロックの各係数のうち低周波成分の係数に対して逆直交変換をし、逆直交変換をした２つの画素ブロックを合成して直交変換ブロックを生成し、上記動き補償手段は、飛び越し走査に対応した動き予測方式（フィールド動き予測モード）により動き予測がされた参照画像データのマクロブロックに対して垂直方向及び水平方向の１／４画素精度の画素補間をするフィールド用２次元フィルタ係数群、及び、順次走査に対応した動き予測方式（フレーム動き予測モード）により動き予測がされた参照画像データのマクロブロックに対して垂直方向及び水平方向の１／４画素精度の画素補間をするフレーム用２次元フィルタ係数群を格納するフィルタ格納部を有し、上記第１の逆直交変換手段又は上記第２の逆直交変換手段により逆直交変換がされた圧縮画像データの動きベクトルに基づき上記フィルタ格納部に格納された所定の２次元フィルタ係数を指定し、指定された２次元フィルタ係数を用いて上記記憶手段が記憶している参照画像データのマクロブロックを補間することを特徴とする。
【００７６】
また、本発明にかかる画像復号装置では、上記フレーム用２次元フィルタ係数群は、上記記憶手段が記憶している参照画像データのマクロブロックの水平方向の各画素に対して、１つのフィールド内で４倍補間を行い、上記記憶手段が記憶している参照画像データのマクロブロックの垂直方向の各画素に対して、１つのフィールド内で２倍補間を行い、１つのフィールド内で２倍補間をした各画素に対してトップフィールドとボトムフィールドとの間で線形補間をする複数のフィルタ係数からなり、上記フィールド用２次元フィルタ係数群は、上記記憶手段が記憶している参照画像データのマクロブロックの水平方向の各画素に対して、１つのフィールド内で４倍補間を行い、上記記憶手段が記憶している参照画像データのマクロブロックの垂直方向の各画素に対して、１つのフィールド内で２倍補間をし、１つのフィールド内で２倍補間をした各画素に対して線形補間をする複数のフィルタ係数からなることを特徴とする。
【００７７】
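For example, identical filter coefficients in the frame two-dimensional filter coefficient group are shared. In addition, the motion compensation means groups the frame two-dimensional filter coefficient group using the symmetry of the vertical coefficients and the zero coefficients, and performs the interpolation processing.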
【００７８】
本発明にかかる画像復号方法は、所定の画素ブロック（マクロブロック）単位で動き予測をすることによる予測符号化、及び、所定の画素ブロック（直交変換ブロック）単位で直交変換をすることによる圧縮符号化をした第１の解像度の圧縮画像データから、上記第１の解像度より低い第２の解像度の動画像データを復号する画像復号方法であって、飛び越し走査に対応した直交変換方式（フィールド直交変換モード）により直交変換がされた上記圧縮画像データの直交変換ブロックに対して、逆直交変換をする第１の逆直交変換工程と、順次走査に対応した直交変換方式（フレーム直交変換モード）により直交変換がされた上記圧縮画像データの直交変換ブロックに対して、逆直交変換をする第２の逆直交変換工程と、上記第１の逆直交変換工程又は上記第２の逆直交変換工程により逆直交変換がされた圧縮画像データと動き補償がされた参照画像データとを加算して、第２の解像度の動画像データを出力する加算工程と、上記加算工程で出力される動画像データを参照画像データとして記憶する記憶工程と、上記記憶工程で記憶している参照画像データのマクロブロックの垂直方向及び水平方向に対して１／４画素精度の動き補償をする動き補償工程とを備え、上記第１の逆直交変換工程では、上記直交変換ブロックの各係数のうち低周波成分の係数に対して逆直交変換をし、上記第２の逆直交変換工程では、上記直交変換ブロックの全周波数成分の係数に対して逆直交変換をし、逆直交変換をした直交変換ブロックの各画素を飛び越し走査に対応した２つの画素ブロックに分離し、分離した２つの画素ブロックに対してそれぞれ直交変換をし、直交変換をした２つの画素ブロックの各係数のうち低周波成分の係数に対して逆直交変換をし、逆直交変換をした２つの画素ブロックを合成して直交変換ブロックを生成し、上記動き補償工程では、飛び越し走査に対応した動き予測方式（フィールド動き予測モード）により動き予測がされた参照画像データのマクロブロックに対して垂直方向及び水平方向の１／４画素精度の画素補間をするフィールド用２次元フィルタ係数群、及び、順次走査に対応した動き予測方式（フレーム動き予測モード）により動き予測がされた参照画像データのマクロブロックに対して垂直方向及び水平方向の１／４画素精度の画素補間をするフレーム用２次元フィルタ係数群を格納したフィルタ格納部の中から、上記第１の逆直交変換工程又は上記第２の逆直交変換工程により逆直交変換がされた圧縮画像データの動きベクトルに基づき格納された所定の２次元フィルタ係数を指定し、指定された２次元フィルタ係数を用いて記憶している参照画像データのマクロブロックを補間することを特徴とする。
【００７９】
本発明にかかる画像復号方法では、上記フレーム用２次元フィルタ係数群は、上記記憶工程で記憶している参照画像データのマクロブロックの水平方向の各画素に対して、１つのフィールド内で４倍補間を行い、上記記憶工程で記憶している参照画像データのマクロブロックの垂直方向の各画素に対して、１つのフィールド内で２倍補間を行い、１つのフィールド内で２倍補間をした各画素に対してトップフィールドとボトムフィールドとの間で線形補間をする複数のフィルタ係数からなり、上記フィールド用２次元フィルタ係数群は、上記記憶工程で記憶している参照画像データのマクロブロックの水平方向の各画素に対して、１つのフィールド内で４倍補間を行い、上記記憶工程で記憶している参照画像データのマクロブロックの垂直方向の各画素に対して、１つのフィールド内で２倍補間をし、１つのフィールド内で２倍補間をした各画素に対して線形補間をする複数のフィルタ係数からなることを特徴とする。
【００８０】
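For example, identical filter coefficients in the frame two-dimensional filter coefficient group are shared. In addition, in the motion compensation step, the frame two-dimensional filter coefficient group is grouped using the symmetry of the vertical coefficients and the zero coefficients, and the interpolation processing is performed.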
【００８１】
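In the present invention described above, when performing pixel interpolation with 1/4-pixel accuracy, the vertical and horizontal pixel interpolation is performed at once by a two-dimensional filter. Among the plurality of two-dimensional filters used in the frame motion prediction mode, filters that have the same matrix are shared. Furthermore, the filters are grouped using the symmetry of the vertical coefficients and the zero coefficients, which simplifies the processing.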
【００８２】
【発明の実施の形態】
以下、本発明の実施の形態について、図面を参照しながら説明する。
【００８３】
図１に本発明の実施の形態の画像復号装置のブロック構成図を示す。
【００８４】
図１に示す画像復号装置１０は、垂直方向の有効ライン数が例えば１１５２本の高解像度画像をＭＰＥＧ２で画像圧縮したビットストリームが入力され、この入力されたビットストリームを復号するとともに１／２の解像度に縮小して、垂直方向の有効ライン数が例えば５７６本の標準解像度画像を出力する装置である。
【００８５】
この画像復号装置１０は、圧縮された高解像度画像のビットストリームが供給され、このビットストリームを解析するビットストリーム解析装置１１と、データの発生頻度に応じた符号長を割り当てる可変長符号化がされた上記ビットストリームを復号する可変長符号復号装置１２と、ＤＣＴブロックの各係数に量子化ステップを掛ける逆量子化装置１３と、フィールドＤＣＴモードで離散コサイン変換がされたＤＣＴブロックに対して縮小逆離散コサイン変換をして標準解像度画像を生成するフィールドモード用縮小逆離散コサイン変換装置１４と、フレームＤＣＴモードで離散コサイン変換がされたＤＣＴブロックに対して縮小逆離散コサイン変換をして標準解像度画像を生成するフレームモード用縮小逆離散コサイン変換装置１５と、縮小逆離散コサイン変換がされた標準解像度画像と動き補償がされた参照画像とを加算する加算装置１６と、参照画像を一時記憶するフレームメモリ１７と、フレームメモリ１７が記憶した参照画像に動き補償をする動き補償装置１８と、フレームメモリ１７が記憶した画像に対してポストフィルタリングをすることにより、画枠変換をするとともに画素の位相ずれを補正してテレビジョンモニタ等に表示するための標準解像度の画像データを出力する画枠変換・位相ずれ補正装置２０とを備えている。
【００８６】
フィールドモード用縮小逆離散コサイン変換装置１４は、入力されたビットストリームのマクロブロックが、フィールドＤＣＴモードで離散コサイン変換されている場合に用いられる。フィールドモード用縮小逆離散コサイン変換装置１４は、フィールドＤＣＴモードで離散コサイン変換がされたマクロブロック内の８×８個の係数が示されたＤＣＴブロックに対して、図１２で示したような、低域の４×４の係数のみに逆離散コサイン変換を行う。すなわち、水平方向及び垂直方向の低域の４点の離散コサイン係数に基づき縮小逆離散コサイン変換を行う。このフィールドモード用縮小逆離散コサイン変換装置１４では、以上のような縮小逆離散コサイン変換を行うことにより、１つのＤＣＴブロックが４×４の画素から構成される標準解像度画像を復号することができる。この復号された画像データの各画素の位相は、図１９に示すように、トップフィールドの各画素の垂直方向の位相が１／２、５／２・・・となり、ボトムフィールドの各画素の垂直方向の位相が１、３・・・となる。すなわち、復号された下位レイヤーのトップフィールドでは、先頭画素（位相が１／２の画素）の位相が上位レイヤーのトップフィールドの先頭から１番目と２番目の画素（位相が０と２の画素）の中間位相となり、先頭から２番目の画素（位相が５／２の画素）の位相が上位レイヤーのトップフィールドの先頭から３番目と４番目の画素（位相が４と６の画素）の中間位相となる。また、復号された下位レイヤーのボトムフィールドでは、先頭画素（位相が１の画素）の位相が上位レイヤーのボトムフィールドの先頭から１番目と２番目の画素（位相が１と３の画素）の中間位相となり、先頭から２番目の画素（位相が３の画素）の位相が上位レイヤーのボトムフィールドの先頭から３番目と４番目の画素（位相が５と７の画素）の中間位相となる。
【００８７】
フレームモード用縮小逆離散コサイン変換装置１５は、入力されたビットストリームのマクロブロックが、フレームＤＣＴモードで離散コサイン変換されている場合に用いられる。フレームモード用縮小逆離散コサイン変換装置１５は、フレームＤＣＴモードで離散コサイン変換がされたマクロブロック内の８×８個の係数が示されたＤＣＴブロックに対して、縮小逆離散コサイン変換を行う。そして、フレームモード用縮小逆離散コサイン変換装置１５では、１つのＤＣＴブロックが４×４の画素から構成される解像度画像を復号するとともに、フィールドモード用縮小逆離散コサイン変換装置１４で生成した標準解像度画像の画素の位相と同位相の画像を生成する。すなわち、フレームモード用縮小逆離散コサイン変換装置１５で復号された画像データの各画素の位相は、図１９に示すように、トップフィールドの各画素の垂直方向の位相が１／２、５／２・・・となり、ボトムフィールドの各画素の垂直方向の位相が１、３・・・となる。
【００８８】
なお、このフレームモード用縮小逆離散コサイン変換装置１５の処理内容は、上述した特願平１０−２０８３８５号で提案した画像復号装置２００のフレームモード用縮小逆離散コサイン変換装置２０５と同一であるので、その詳細は省略する。
【００８９】
加算装置１６は、フィールドモード用縮小逆離散コサイン変換装置１４又はフレームモード用縮小逆離散コサイン変換装置１５により縮小逆離散コサイン変換されたマクロブロックがイントラ画像の場合には、そのイントラ画像をそのままフレームメモリ１７に格納する。また、加算装置１６は、フィールドモード用縮小逆離散コサイン変換装置１４又はフレームモード用縮小逆離散コサイン変換装置１５により縮小逆離散コサイン変換されたマクロブロックがインター画像である場合には、そのインター画像に、動き補償装置１８により動き補償がされた参照画像を合成して、フレームメモリ１７に格納する。
【００９０】
動き補償装置１８は、フレームメモリ１７に記憶されている標準解像度画像の参照画像に対して、トップフィールドとボトムフィールドとの間の位相ずれ成分を考慮した形で１／４画素精度で補間処理を行い、フィールド動き予測モードに対応した動き補償をする。この動き補償装置１８により動き補償がされた参照画像は、加算装置１６に供給され、インター画像に合成される。この動き補償装置１８の処理については、その詳細を後述する。
【００９１】
画枠変換・位相ずれ補正装置２０は、フレームメモリ１７が記憶した標準解像度の参照画像或いは加算装置１６が合成した画像が供給され、この画像をポストフィルタリングにより、トップフィールドとボトムフィールドとの間の位相ずれ成分を補正するとともに画枠を標準解像度のテレビジョンの規格に合致するように変換する。すなわち、画枠変換・位相ずれ補正装置２０は、トップフィールドの各画素の垂直方向の位相が１／２、５／２・・・となり、ボトムフィールドの各画素の垂直方向の位相が１、３・・・となる標準解像度画像を、例えば、トップフィールドの各画素の垂直方向の位相が０、２、４・・・となり、ボトムフィールドの各画素の垂直方向の位相が１、３、５・・・となるように補正する。また、画枠変換・位相ずれ補正装置２０は、高解像度のテレビジョン規格の画枠を、１／４に縮小して標準解像度のテレビジョン規格の画枠に変換する。
【００９２】
画像復号装置１０では、以上のような構成を有することにより、高解像度画像をＭＰＥＧ２で画像圧縮したビットストリームを、復号するとともに解像度を１／２に縮小して、標準解像度画像を出力することができる。
【００９３】
つぎに、動き補償装置１８について、さらに詳細に説明する。
【００９４】
この動き補償装置１８では、以下に説明するように、高解像度画像の１／２画素精度の動き補償に対応するように、フレームメモリ１７に記憶されている標準解像度画像の画素を補間して、１／４画素精度の画素を生成する。
【００９５】
この動き補償装置１８は、垂直方向の画素補間と水平方向の画素補間とを、１つの２次元フィルタ係数を用いて処理を行っている。もっとも、この動き補償装置１８の処理結果生成された画素の位相、すなわち、この動き補償装置１８によりフィルタリングした結果生成された画素の位相は、上述した特願平１０−２０８３８５で提案された画像復号装置２００の動き補償装置で処理した結果と同一となる。
【００９６】
すなわち、この動き補償装置１８では、フィールド動き予測モードの場合には、以下のように処理を行う。
【００９７】
水平方向の画素に対しては、整数精度の２つの画素を線形補間し、１／２画素精度の画素、及び、１／４精度の画素を生成する。
【００９８】
垂直方向の画素に対しては、まず、トップフィールドの各画素の垂直方向の位相が１／２、５／２・・・となり、ボトムフィールドの各画素の垂直方向の位相が１、３・・・となるような、トップフィールドとボトムフィールドとのフィールド間で位相ずれを含む標準解像度画像の整数精度の画素を、フレームメモリ１７から取り出す。
【００９９】
続いて、垂直方向の画素に対して、フィールド内で、フレームメモリ１７から取り出した整数精度の画素から１／２画素精度の画素を生成する。すなわち、トップフィールドの整数精度の画素に基づきトップフィールドの１／２画素精度の画素を生成し、ボトムフィールドの整数精度の画素に基づきボトムフィールドの１／２画素精度の画素を生成する。
【０１００】
続いて、垂直方向の画素に対して、フィールド内で、１／２画素精度の画素から１／４画素精度の画素を生成する。すなわち、トップフィールドの１／２画素精度の画素に基づきトップフィールドの１／４画素精度の画素を生成し、ボトムフィールドの１／２画素精度の画素に基づきボトムフィールドの１／４画素精度の画素を生成する。
【０１０１】
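FIG. 2 shows the 1/4-pixel-accuracy pixel interpolation in the field motion prediction mode described above. In FIG. 2, ● indicates the phase position of an integer-pixel-accuracy pixel of the top field, ▲ indicates the phase position of a 1/2-pixel-accuracy pixel of the top field, and ■ indicates the phase position of a 1/4-pixel-accuracy pixel of the top field. Likewise, ○ indicates the phase position of an integer-pixel-accuracy pixel of the bottom field, △ indicates the phase position of a 1/2-pixel-accuracy pixel of the bottom field, and □ indicates the phase position of a 1/4-pixel-accuracy pixel of the bottom field.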
【０１０２】
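The motion compensation device 18 performs the pixel interpolation processing in the field motion prediction mode described above using a single two-dimensional interpolation filter, generating 1/4-accuracy pixels directly from integer-accuracy pixels.
In the frame motion prediction mode, the motion compensation device 18 performs the processing as follows.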
【０１０３】
水平方向の画素に対しては、整数精度の画素の２つの画素を線形補間し、１／２画素精度の画素、及び、１／４精度の画素を生成する。
【０１０４】
垂直方向の画素に対しては、まず、トップフィールドの各画素の垂直方向の位相が１／２、５／２・・・となり、ボトムフィールドの各画素の垂直方向の位相が１、３・・・となるような、トップフィールドとボトムフィールドとのフィールド間で位相ずれを含む標準解像度画像の整数精度の画素を、フレームメモリ１７から取り出す。
【０１０５】
続いて、垂直方向の画素に対して、フィールド内で、フレームメモリ１７から取り出した整数精度の画素から１／２画素精度の画素を生成する。すなわち、トップフィールドの整数精度の画素に基づきトップフィールドの１／２画素精度の画素を生成し、ボトムフィールドの整数精度の画素に基づきボトムフィールドの１／２画素精度の画素を生成する。
【０１０６】
続いて、垂直方向の画素に対して、トップフィールドとボトムフィールドの２つのフィールド間で、１／２画素精度の画素から１／４画素精度の画素を生成する。例えば、垂直方向の位相が１／４の位置にある画素は、０の位置にあるトップフィールドの画素と、１／２の位置にあるボトムフィールドの画素から線形補間をされて生成される。また、垂直方向の位相が３／４の位置にある画素は、１／２の位置にあるボトムフィールドの画素と１の位置にあるトップフィールドの画素から線形補間をされて生成される。
【０１０７】
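FIG. 3 shows the 1/4-pixel-accuracy pixel interpolation in the frame motion prediction mode described above. In FIG. 3, ● indicates the phase position of an integer-pixel-accuracy pixel of the top field, ▲ indicates the phase position of a 1/2-pixel-accuracy pixel of the top field, and ■ indicates the phase position of a 1/4-pixel-accuracy pixel of the top field. Likewise, ○ indicates the phase position of an integer-pixel-accuracy pixel of the bottom field, △ indicates the phase position of a 1/2-pixel-accuracy pixel of the bottom field, and □ indicates the phase position of a 1/4-pixel-accuracy pixel of the bottom field.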
【０１０８】
動き補償装置１８は、以上のようなフレーム動き予測モードにおける画素補間処理を、１つの２次元補間フィルタを用いて行い、整数精度の画素から直接１／４精度の画素を生成する。
【０１０９】
つぎに、動き補償装置１８のブロック構成を図４に示し、この動き補償装置１８の回路構成及び画素補間の為のフィルタリング処理内容について、具体的に説明する。
【０１１０】
動き補償装置１８は、この図４に示すように、アドレス生成装置２１と、入力メモリ２２と、フィルタ係数格納メモリ２３と、２次元補間処理部２４とを備えている。
【０１１１】
アドレス生成部２１には、動きベクトル情報及びモード情報が入力される。モード情報とは、マクロブロックの動き補償のモードがフィールド動き予測モードであるか、フレーム動き予測モードであるかを示す情報である。
【０１１２】
アドレス生成部２１は、この動きベクトル情報に基づき、補間する画素の垂直方向及び水平方向の位置を示すアドレス情報を生成する。アドレス生成部２１は、生成したアドレス情報に基づき、標準解像度画像の整数精度の画素をマクロブロック単位でフレームメモリ１７から取り出し、入力メモリ２２に送る。
【０１１３】
また、アドレス生成部２１は、入力された動きベクトル情報及びモード情報をフィルタ係数格納メモリ２３に送る。
【０１１４】
フィルタ係数格納メモリ２３は、フィールド動き予測モードに対応した複数の２次元フィルタ係数を格納している。図５に、線形フィルタを用いた場合の１６通りの２次元フィルタ係数を示す。図５に示す各フィルタ係数は、垂直方向（Ｖ）の位相０，０．２５，０．５，０．７５と、水平方向（Ｈ）の位相０，０．２５，０．５，０．７５との組み合わせの数だけ存在する。すなわち、フィールド動き予測モードに対応した、垂直方向４係数×水平方向４係数の合計１６個のマトリクス係数を格納している。
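The 16 field-mode coefficient sets described here (4 vertical phases × 4 horizontal phases) can be illustrated as follows. FIG. 5 is not reproduced in this text, so the sketch assumes plain two-tap linear interpolation in each direction, which makes each two-dimensional filter the outer product of a vertical and a horizontal weight pair; the names `field_mode_kernels` and `interpolate` are hypothetical.

```python
from fractions import Fraction

PHASES = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

def field_mode_kernels():
    """Build one 2x2 coefficient matrix per (vertical phase, horizontal phase) pair."""
    kernels = {}
    for pv in PHASES:
        for ph in PHASES:
            vertical = [1 - pv, pv]      # weights for the two lines of the same field
            horizontal = [1 - ph, ph]    # weights for the two neighbouring columns
            kernels[(pv, ph)] = [[v * h for h in horizontal] for v in vertical]
    return kernels

def interpolate(kernel, window):
    """Inner product of a 2x2 coefficient matrix with a 2x2 pixel window."""
    return sum(kernel[r][c] * window[r][c] for r in range(2) for c in range(2))
```

Because one matrix covers both directions, a quarter-pixel sample is produced from integer-position pixels in a single inner product, without an intermediate vertically interpolated block.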
【０１１５】
また、フィルタ係数格納メモリ２３は、フレーム動き予測モードに対応した複数の２次元フィルタ係数も格納している。ここで、フレーム動き予測モードの場合、垂直方向（Ｖ）の位相０，０．２５，０．５，０．７５，１，１．２５，１．５，１．７５と、水平方向（Ｈ）の位相０，０．２５，０．５，０．７５との組み合わせの数だけ、すなわち、垂直方向８係数×水平方向４係数の合計３２個のフィルタ係数が通常であれば存在することとなる。しかしながら、このフィルタ係数格納メモリ２３には、マトリクスが同一となるフィルタ係数は共通化して用い、メモリ容量の効率化を図っている。
【０１１６】
具体的には、以下のように共通化して、フレーム動き予測モードに対応したフィルタ係数の削減を図っている。
【０１１７】
フレーム動き予測モードに対応したフィルタ係数は、図６に示すように、垂直方向の位相が０のトップフィールド、垂直方向の位相が０のボトムフィールド、垂直方向の位相が０．５のトップフィールド、垂直方向の位相が１．５のボトムフィールドは、全て同一のフィルタ係数となる（グループ１）。垂直方向の位相が０．２５のトップフィールド、垂直方向の位相が１．７５のボトムフィールドは、全て同一のフィルタ係数となる（グループ２）。垂直方向の位相が０．５のボトムフィールド、垂直方向の位相が１のトップフィールド、垂直方向の位相が１のボトムフィールド、垂直方向の位相が１．５のトップフィールドは、全て同一のフィルタ係数となる（グループ３）。垂直方向の位相が０．２５のボトムフィールド、垂直方向の位相が０．７５のトップフィールド、垂直方向の位相が１．２５のボトムフィールド、垂直方向の位相が１．７５のトップフィールドは、全て同一のフィルタ係数となる（グループ４）。垂直方向の位相が０．７５のボトムフィールド、垂直方向の位相が１．２５のトップフィールドは、全て同一のフィルタ係数となる（グループ５）。そして、このような同一のフィルタ係数をグループ化して共通化して画素補間に用いる。
【０１１８】
このようにフィルタ係数を共通化して用いることによって、本来であれば、８×４の３２通りのフィルタ係数となるところを、５×４の２０通りのフィルタ係数に縮小している。
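The grouping listed in paragraph 0117 can be written down as a lookup table, which makes the count of 5 groups × 4 horizontal phases = 20 stored coefficient sets easy to check. The table below only transcribes the (vertical phase, field) pairs named in the text; the names `GROUP_OF`, `HORIZONTAL_PHASES` and `stored_kernel_count` are hypothetical.

```python
# (vertical phase, field) -> group number, transcribed from paragraph 0117
GROUP_OF = {
    (0.0, "top"): 1, (0.0, "bottom"): 1, (0.5, "top"): 1, (1.5, "bottom"): 1,
    (0.25, "top"): 2, (1.75, "bottom"): 2,
    (0.5, "bottom"): 3, (1.0, "top"): 3, (1.0, "bottom"): 3, (1.5, "top"): 3,
    (0.25, "bottom"): 4, (0.75, "top"): 4, (1.25, "bottom"): 4, (1.75, "top"): 4,
    (0.75, "bottom"): 5, (1.25, "top"): 5,
}

HORIZONTAL_PHASES = (0.0, 0.25, 0.5, 0.75)

def stored_kernel_count():
    """Number of coefficient sets to store once identical matrices are shared."""
    return len(set(GROUP_OF.values())) * len(HORIZONTAL_PHASES)
```

A lookup of this kind would let the coefficient memory be addressed by group and horizontal phase instead of by every vertical phase individually.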
【０１１９】
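FIG. 7 shows the two-dimensional filter coefficients for the frame motion prediction mode when a linear filter is used. The filter coefficients shown in FIG. 7 exist in as many variants as there are combinations of the five groups, group 1 to group 5, shown in FIG. 6 and the horizontal (H) phases 0, 0.25, 0.5, and 0.75; that is, 5 groups × 4 horizontal coefficients = 20 variants.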
【０１２０】
フィルタ係数格納メモリ２３は、送られた動きベクトル情報及びモード情報に応じて、図５に示した１６通りのフィルタ係数、又は、図７に示した２０通りのフィルタ係数のうち、１つのフィルタ係数を２次元補間処理部２４に送る。
【０１２１】
２次元補間処理部２４は、フィールド動き予測モードの場合、送られたフィルタ係数を用いて、以下の式１の内積演算を行いマクロブロックに対して画素補間を行う。
【０１２２】
【数１】

【０１２３】
この式１において、ｃは、図５に示したフィルタ係数（２次元マトリクス）である。ｘは、入力されたマクロブロックの画素データである。
【０１２４】
そして、この式１により内積演算がされた結果（ｙ）が、１／４画素精度で動き補償がされた画素データとして、図１に示す加算装置１６に供給される。
【０１２５】
また、２次元補間処理部２４は、フレーム動き予測モードの場合、送られたフィルタ係数を用いて、以下の式２の内積演算を行いマクロブロックに対して画素補間を行う。
【０１２６】
【数２】

【０１２７】
ここで、グループ１に含まれるフィルタ係数は、入力データのサンプル点に一致するデータを出力する演算を行うものである。グループ２は、異なるフィールドの同一ライン間で補間されたデータを出力する演算を行うものである。グループ３は、同一のフィールドの２ライン間で補間されたデータを出力する演算を行うものである。グループ４は、あるフィールドの２ライン間で補間されたデータと他のフィールドの１ラインのデータとの間で補間されたデータを出力する演算を行うものである。グループ５は、あるフィールドの２ライン間で補間されたデータと、他のフィールドの２ライン間で補間されたデータとの間で補間されたデータを出力する演算を行うものである。
【０１２８】
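In the frame motion prediction mode, the inner product could also be computed using Equation 1 as in the field motion prediction mode. However, by computing as shown in Equation 2, the coefficients can be grouped according to the symmetry of the vertical coefficients and the arrangement of zero coefficients, which reduces the number of multiplications. The inner product of Equation 2 first adds the required number of lines in the vertical direction and then multiplies only in the horizontal direction.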
【０１２９】
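As a result, for group 2 and group 3, two lines have zero coefficients, so only the addition of two lines in the vertical direction needs to be performed. Groups 2 and 3 operate on different lines, but the same filter coefficients can be used by having the address generation unit 21 specify the lines to be operated on. In group 4, the coefficients of the first and third lines are 1/2 of those of the second line; using this symmetry, the operation 2 × c × x can be decomposed and computed as c × (x + x). That is, the result of adding the coefficients in the vertical direction is the same for group 4 and group 5. By decomposing in this way, the same filter coefficients can be used for group 4 and group 5.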
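The saving described here rests on a simple identity: when several lines share the same horizontal coefficients, the lines can be added first and multiplied once. Equation 2 itself is not reproduced in this text, so the sketch below only demonstrates that identity on made-up integer values; the function names are hypothetical.

```python
def direct_inner_product(c_h, lines):
    """Multiply every line by the horizontal coefficients, then sum everything."""
    return sum(c * px for line in lines for c, px in zip(c_h, line))

def add_then_multiply(c_h, lines):
    """Add the lines vertically first, then do a single horizontal multiply pass."""
    column_sums = [sum(col) for col in zip(*lines)]
    return sum(c * s for c, s in zip(c_h, column_sums))
```

For two lines of two pixels each, the direct form needs four multiplications and the add-first form needs two, while both return the same value.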
【０１３０】
以上のように本発明の実施の形態の画像復号装置１０では、水平方向及び垂直方向に対して１／４画素精度で動き補償を行うことにより、トップフィールドとボトムフィールドとの間で位相ずれが生じず、いわゆるフィールド反転やフィールドミックスを防ぐことができ、動き補償に伴う画質の劣化を防止することができる。
【０１３１】
さらに、この画像復号装置１０では、１／４画素精度の動き補償の際に２次元のフィルタ演算をするので、中間結果を格納するためのメモリを削減することができる。また、この画像復号装置１０では、１／４画素精度の動き補償の際にメモリへのアクセス量を減らすことができ、処理時間が短縮する。また、この画像復号装置１０では、フレーム動き予測モードの際のフィルタ係数をグループ化することにより、フレーム予測の際のコードサイズを最小限に抑え、キャッシュミス等を防止することができる。
【０１３２】
なお、本発明の実施の形態の画像復号装置１０では、動き補償を２次元の線形補間フィルタを用いて行った例を示したが、例えば、フィルタのタップ数を増やしたハーフバンドフィルタ等の他のフィルタを用いてもよい。
【０１３３】
【発明の効果】
本発明にかかる画像復号装置及び画像復号方法では、フレーム直交変換モードにより直交変換がされた直交変換ブロックの全周波数成分の係数に対して逆直交変換をして飛び越し走査に対応した２つの画素ブロックに分離し、分離した２つの画素ブロックに対してそれぞれ直交変換をして低周波成分の係数に対して逆直交変換をし、逆直交変換をした２つの画素ブロックを合成する。また、本発明では、記憶している参照画像データのマクロブロックの各画素に対して補間をして、１／４画素精度の画素から構成されるマクロブロックを生成する。そして、この画像復号方法では、第１の解像度より低い第２の解像度の動画像データを出力する。
【０１３４】
このことにより、本発明では、復号に必要な演算量及び記憶容量を少なくすることができ、フィールド動き予測モードとフレーム動き予測モードとによる動き補償の際の画素の位相ずれをなくし、動き補償に起因する画質の劣化を防止することができる。
【０１３５】
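Furthermore, in the present invention, when performing pixel interpolation with 1/4-pixel accuracy, the vertical and horizontal pixel interpolation is performed at once by a two-dimensional filter. Among the plurality of two-dimensional filters used in the frame motion prediction mode, filters that have the same matrix are shared. Furthermore, the filters are grouped using the symmetry of the vertical coefficients and the zero coefficients, which simplifies the processing.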
【０１３６】
このことにより、本発明では、１／４画素精度の動き補償の際の中間結果を格納するためのメモリを削減することができる。また、本発明では、１／４画素精度の動き補償の際にメモリへのアクセス量を減らすことができ、処理時間が短縮する。また、本発明では、フレーム予測の際のコードサイズを最小限に抑え、キャッシュミス等を防止することができる。
【図面の簡単な説明】
【図１】本発明の実施の形態の画像復号装置のブロック図である。
【図２】フィールド動き予測モードの場合の１／４画素精度の画素補間を説明するための図である。
[FIG. 3] A diagram for explaining pixel interpolation with 1/4-pixel accuracy in the frame motion prediction mode.
【図４】上記画像復号装置の動き補償装置のブロック図である。
【図５】フィールド動き予測モードに対応した２次元フィルタ係数の一例を示す図である。
【図６】フレーム動き予測モードに対応した２次元フィルタ係数のグループ分けを説明するための図である。
【図７】フレーム動き予測モードに対応した２次元フィルタ係数の一例を示す図である。
【図８】従来の第１のダウンデコーダを示すブロック図である。
【図９】従来の第２のダウンデコーダを示すブロック図である。
【図１０】従来の第３のダウンデコーダを示すブロック図である。
【図１１】従来の画像復号装置のブロック図である。
【図１２】上記従来の画像復号装置のフィールドＤＣＴモードにおける縮小逆離散コサイン変換処理を説明するための図である。
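[FIG. 13] A diagram for explaining the reduced inverse discrete cosine transform processing in the frame DCT mode of the above conventional image decoding apparatus.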
【図１４】上記従来の画像復号装置のフィールド動き予測モードにおける線形補間処理を説明するための図である。
【図１５】上記従来の画像復号装置のフレーム動き予測モードにおける線形補間処理を説明するための図である。
【図１６】上記従来の画像復号装置のフィールドＤＣＴモードの結果得られる画素の位相を説明するための図である。
【図１７】上記従来の画像復号装置のフレームＤＣＴモードの結果得られる画素の位相を説明するための図である。
【図１８】特願平１０−２０８３８５で提案された画像復号装置のブロック図である。
【図１９】上記特願平１０−２０８３８５で提案された画像復号装置のフレームメモリに格納される参照画像の垂直方向の画素の位相を説明するための図である。
【図２０】上記特願平１０−２０８３８５で提案された画像復号装置のフレームモード用縮小逆離散コサイン変換装置の１ブロック処理の内容を説明するための図である。
【図２１】上記特願平１０−２０８３８５で提案された画像復号装置のフィールド動き予測モードの際の１／４画素補間処理を説明するための図である。
【図２２】上記特願平１０−２０８３８５で提案された画像復号装置のフレーム動き予測モードの際の１／４画素補間処理を説明するための図である。
[FIG. 23] A block diagram of the motion compensation device of the image decoding apparatus proposed in Japanese Patent Application No. 10-208385.
【符号の説明】
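10 image decoding apparatus, 14 field-mode reduced inverse discrete cosine transform device, 15 frame-mode reduced inverse discrete cosine transform device, 17 frame memory, 18 motion compensation device, 21 address generation unit, 22 input memory, 23 filter coefficient storage memory, 24 two-dimensional interpolation processing unit
[0001]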
BACKGROUND OF THE INVENTION
The present invention relates to an image decoding apparatus and an image decoding method for decoding compressed image data of a first resolution that has been compression-encoded by orthogonal transform in units of orthogonal transform blocks each consisting of 8 × 8 pixels. In particular, it relates to an image decoding apparatus and an image decoding method that decode the compressed image data of the first resolution and reduce it to moving image data of a second resolution lower than the first resolution.
[0002]
[Prior art]
Standardization of digital television broadcasting using image compression methods such as MPEG2 (Moving Picture Experts Group phase 2) is under way. Standards for digital television broadcasting include standards for standard resolution images (for example, 576 effective lines in the vertical direction) and standards for high resolution images (for example, 1152 effective lines in the vertical direction). For this reason, there has recently been a need for a down decoder that decodes compressed image data of a high resolution image and reduces it to 1/2 resolution, thereby generating image data of a standard resolution image for display on a television monitor that supports the standard resolution.
[0003]
A down decoder that decodes a bit stream, such as an MPEG2 bit stream, obtained by applying predictive coding by motion prediction and compression coding by discrete cosine transform to a high resolution image, and that down-samples the result to a standard resolution image, is proposed in the document "Scalable Decoder without Low-Frequency Drift" (Iwahashi, Kambayashi, Kiya: IEICE Technical Report CS94-186, DSP94-108, 1995-01) (hereinafter referred to as Document 1). Document 1 describes the following first to third down decoders.
[0004]
As shown in FIG. 8, the first down decoder includes: an inverse discrete cosine transform device 101 that performs an 8 (number of coefficients counted from the DC component in the horizontal direction) × 8 (number of coefficients counted from the DC component in the vertical direction) inverse discrete cosine transform on the bit stream of the high resolution image; an adder 102 that adds the high resolution image that has undergone the inverse discrete cosine transform and the motion compensated reference image; a frame memory 103 that temporarily stores the reference image; a motion compensation device 104 that performs motion compensation with 1/2 pixel accuracy on the reference image stored in the frame memory 103; and a downsampling device 105 that converts the reference image stored in the frame memory 103 into an image of standard resolution.
[0005]
In the first down decoder, an output image obtained by performing inverse discrete cosine transform and decoded as a high-resolution image is reduced by the down-sampling device 105 to output standard-resolution image data.
[0006]
As shown in FIG. 9, the second down decoder includes: an inverse discrete cosine transform device 111 that replaces the high frequency coefficients of each DCT (Discrete Cosine Transform) block in the bit stream of the high resolution image with 0 and performs an 8 × 8 inverse discrete cosine transform; an adder 112 that adds the high resolution image that has undergone the inverse discrete cosine transform and the motion compensated reference image; a frame memory 113 that temporarily stores the reference image; a motion compensation device 114 that performs motion compensation with 1/2 pixel accuracy on the reference image stored in the frame memory 113; and a downsampling device 115 that converts the reference image stored in the frame memory 113 into an image of standard resolution.
[0007]
In the second down decoder, the high frequency coefficients among all the coefficients of the DCT block are replaced with 0 before the inverse discrete cosine transform is performed, and the output image decoded as a high resolution image is reduced by the downsampling device 115 to output standard resolution image data.
[0008]
As shown in FIG. 10, the third down decoder includes: a reduced inverse discrete cosine transform device 121 that performs, for example, a 4 × 4 inverse discrete cosine transform using only the low frequency coefficients of each DCT block in the bit stream of the high resolution image and decodes a standard resolution image; an adder 122 that adds the standard resolution image that has undergone the reduced inverse discrete cosine transform and the motion compensated reference image; a frame memory 123 that temporarily stores the reference image; and a motion compensation device 124 that performs motion compensation with 1/4 pixel accuracy on the reference image stored in the frame memory 123.
[0009]
In the third down decoder, inverse discrete cosine transform is performed using only the coefficients of the low frequency component among all the coefficients of the DCT block, and the high resolution image is decoded as the standard resolution image.
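The reduced transform used by the third down decoder can be illustrated with a minimal sketch. The code below assumes an orthonormal DCT convention and keeps the 4 × 4 low-frequency corner of an 8 × 8 coefficient block; the function names `idct_1d` and `reduced_idct_4x4` and the 0.5 rescaling factor are assumptions made for this illustration, not details taken from Document 1.

```python
import math

def idct_1d(coeffs):
    """Orthonormal inverse DCT of a short list of coefficients."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] / math.sqrt(n)
        for k in range(1, n):
            s += (math.sqrt(2.0 / n) * coeffs[k]
                  * math.cos(math.pi * (2 * i + 1) * k / (2 * n)))
        out.append(s)
    return out

def reduced_idct_4x4(block8):
    """Decode 4x4 pixels from the low-frequency 4x4 corner of an 8x8 DCT block."""
    # Keep only the 4 lowest horizontal and 4 lowest vertical frequencies.
    low = [row[:4] for row in block8[:4]]
    # Horizontal 4-point IDCT on each retained row of coefficients.
    rows = [idct_1d(r) for r in low]
    # Vertical 4-point IDCT on each column of the intermediate result.
    cols = [idct_1d([rows[v][h] for v in range(4)]) for h in range(4)]
    # Assumed rescaling: 0.5 maps the 8-point orthonormal gain to the 4-point one.
    return [[0.5 * cols[h][v] for h in range(4)] for v in range(4)]

# A DC-only block: the 8x8 orthonormal DCT of a flat block of value 100 has DC = 800.
dc_block = [[0.0] * 8 for _ in range(8)]
dc_block[0][0] = 800.0
flat = reduced_idct_4x4(dc_block)
```

With this scaling a flat input block is reproduced at its original level, which is the minimum property a reduced decoder needs in order to keep the mean brightness unchanged.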
[0010]
Here, the first down decoder performs the inverse discrete cosine transform on all the coefficients in the DCT block and decodes a high resolution image, so it requires an inverse discrete cosine transform device 101 with high arithmetic processing capability and a high-capacity frame memory 103. The second down decoder sets the high frequency coefficients in the DCT block to 0 before performing the inverse discrete cosine transform and decoding a high resolution image, so the arithmetic processing capability of the inverse discrete cosine transform device 111 may be low, but a high-capacity frame memory 113 is still required. In contrast to the first and second down decoders, the third down decoder performs the inverse discrete cosine transform using only the low frequency coefficients among all the coefficients in the DCT block, so the arithmetic processing capability of the inverse discrete cosine transform device 121 may be low; furthermore, because it decodes a reference image of standard resolution, the capacity of the frame memory 123 can also be reduced.
[0011]
By the way, there are a sequential scanning method and an interlaced scanning method as a moving image display method such as television broadcasting. The sequential scanning method is a display method that sequentially displays images obtained by sampling all pixels in a frame at the same timing. The interlaced scanning method is a display method that alternately displays images obtained by sampling pixels in a frame at different timings for each line in the horizontal direction.
[0012]
In the interlaced scanning method, one of the two images obtained by sampling the pixels of a frame at different timings for each line is called the top field (also called the first field), and the other is called the bottom field (also called the second field). The image that includes the first horizontal line of the frame is the top field, and the image that includes the second horizontal line of the frame is the bottom field. Therefore, in the interlaced scanning method, one frame consists of two fields.
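The relationship between a frame and its two fields can be sketched as follows. This is only an illustration: a frame is modelled as a list of lines, and the helper names `split_fields` and `merge_fields` are hypothetical.

```python
def split_fields(frame):
    """Split a frame (list of lines) into (top_field, bottom_field)."""
    # The top field holds lines 0, 2, 4, ...; the bottom field holds lines 1, 3, 5, ...
    return frame[0::2], frame[1::2]

def merge_fields(top, bottom):
    """Interleave two fields of equal length back into one frame."""
    frame = []
    for t, b in zip(top, bottom):
        frame.append(t)
        frame.append(b)
    return frame

frame = ["line0", "line1", "line2", "line3"]
top, bottom = split_fields(frame)
```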
[0013]
In MPEG2, in order to compress moving image signals for the interlaced scanning method efficiently, encoding can assign not only a frame but also a field to a picture, which is the unit of compression of the screen.
[0014]
In MPEG2, when a field is assigned to a picture, the structure of the bit stream is called a field structure, and when a frame is assigned to a picture, the structure of the bit stream is called a frame structure. In the field structure, a DCT block is formed from pixels in the field, and discrete cosine transform is performed on a field basis. A processing mode in which discrete cosine transform is performed on a field basis is called a field DCT mode. In the frame structure, a DCT block is formed from pixels in the frame, and discrete cosine transform is performed on a frame basis. A processing mode in which discrete cosine transformation is performed in units of frames is called a frame DCT mode. Furthermore, in the field structure, a macroblock is formed from pixels in the field, and motion prediction is performed in field units. A processing mode in which motion prediction is performed in units of fields is referred to as field motion prediction mode. In the frame structure, a macro block is formed from pixels in the frame, and motion prediction is performed on a frame basis. A processing mode in which motion prediction is performed in units of frames is called a frame motion prediction mode.
[0015]
An image decoding apparatus that uses the third down decoder of Document 1 to decode compressed image data for the interlaced scanning method is proposed, for example, in the document "A Compensation Method of Drift Errors in Scalability" (N. OBIKANE, K. TAHARA and J. YONEMITSU, HDTV Work Shop '93) (hereinafter referred to as Document 2).
[0016]
As shown in FIG. 11, the conventional image decoding device described in Document 2 includes: a bit stream analysis device 131 that is supplied with a bit stream obtained by compressing a high resolution image with MPEG2 and analyzes this bit stream; a variable length code decoding device 132 that decodes the bit stream, which has been variable-length encoded by assigning code lengths according to the frequency of occurrence of the data; an inverse quantization device 133 that multiplies each coefficient of the DCT block by the quantization step; a reduced inverse discrete cosine transform device 134 that decodes a standard resolution image by performing, for example, a 4 × 4 inverse discrete cosine transform using only the low frequency coefficients among all the coefficients of the DCT block; an adder 135 that adds the standard resolution image that has undergone the reduced inverse discrete cosine transform and the motion compensated reference image; a frame memory 136 that temporarily stores the reference image; and a motion compensation device 137 that performs motion compensation with 1/4 pixel accuracy on the reference image stored in the frame memory 136.
[0017]
The reduced inverse discrete cosine transform device 134 of the conventional image decoding device described in Document 2 performs the inverse discrete cosine transform using only the low frequency coefficients among all the coefficients in the DCT block, but the positions of the coefficients on which the inverse discrete cosine transform is performed differ between the frame DCT mode and the field DCT mode.
[0018]
Specifically, in the field DCT mode, the reduced inverse discrete cosine transform device 134 performs the inverse discrete cosine transform only on the 4 × 4 low band coefficients among the 8 × 8 coefficients in the DCT block, as shown in FIG. 12. In the frame DCT mode, on the other hand, the reduced inverse discrete cosine transform device 134 performs the inverse discrete cosine transform only on 4 × 2 coefficients + 4 × 2 coefficients among the 8 × 8 coefficients in the DCT block, as shown in FIG. 13.
[0019]
In addition, the motion compensation device 137 of the conventional image decoding device described in Document 2 performs motion compensation with 1/4 pixel accuracy for each of the field motion prediction mode and the frame motion prediction mode, based on the motion prediction information (motion vectors) obtained for the high resolution image. That is, ordinary MPEG2 specifies that motion compensation is performed with 1/2 pixel accuracy, but when a standard resolution image is decoded from a high resolution image, the number of pixels in the picture is thinned to 1/2, so the motion compensation device 137 performs motion compensation with the pixel accuracy set to 1/4 pixel.
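The change from 1/2 pixel to 1/4 pixel accuracy follows from simple arithmetic: a vector component counted in half pixels of the high resolution image covers half as many pixels in the half-size image, so the same integer can be read in quarter pixels of the standard resolution image. The sketch below shows one way to split such a component into an integer pixel offset and a quarter-pixel fraction. The function name `to_lower_layer` is hypothetical, and how the actual device handles field-based vertical vectors is not covered here.

```python
def to_lower_layer(mv_half_pel):
    """Split an upper-layer half-pel vector component into
    (integer pixel offset, quarter-pixel fraction 0..3) in the lower layer."""
    # mv_half_pel / 2 upper-layer pixels == mv_half_pel / 4 lower-layer pixels.
    # Floor division keeps the fraction non-negative for negative vectors.
    return mv_half_pel // 4, mv_half_pel % 4

examples = [to_lower_layer(v) for v in (5, 8, -3)]
```

For instance, a component of 5 half pixels (2.5 upper-layer pixels) becomes 1 full pixel plus 1/4 pixel in the lower layer.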
[0020]
Therefore, in order to perform motion compensation corresponding to the high resolution image, the motion compensation device 137 linearly interpolates the pixels of the reference image stored in the frame memory 136 as a standard resolution image and generates pixels with 1/4 pixel accuracy.
[0021]
Specifically, the linear interpolation processing of pixels in the vertical direction in the field motion prediction mode and the frame motion prediction mode will be described with reference to FIGS. 14 and 15. In the drawing, the phase of pixels in the vertical direction is shown in the vertical direction, and the phase in which each pixel of the display image is located is shown as an integer.
[0022]
First, the interpolation processing for an image that was motion predicted in the field motion prediction mode will be described with reference to FIG. 14. For the high resolution image (upper layer), as shown in FIG. 14(a), motion compensation is performed independently for each field with 1/2 pixel accuracy. For the standard resolution image (lower layer), as shown in FIG. 14(b), pixels whose phases are shifted by 1/4 pixel, 1/2 pixel and 3/4 pixel are generated by linear interpolation within the field from the integer precision pixels, and motion compensation is performed. In other words, in the standard resolution image (lower layer), each 1/4 pixel accuracy pixel of the top field is generated by linear interpolation from the integer precision pixels of the top field, and each 1/4 pixel accuracy pixel of the bottom field is generated by linear interpolation from the integer precision pixels of the bottom field. For example, let a be the value of the top field pixel at vertical phase 0 and b the value of the top field pixel at vertical phase 1. Then the top field pixel at vertical phase 1/4 is (3a + b) / 4, the top field pixel at vertical phase 1/2 is (a + b) / 2, and the top field pixel at vertical phase 3/4 is (a + 3b) / 4.
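The three formulas above are instances of one linear weighting, which can be written as a short sketch. The function name `field_quarter_pel` and the sample values are illustrative only.

```python
def field_quarter_pel(a, b, quarter):
    """Linear interpolation between same-field neighbours.

    a is the pixel at vertical phase 0, b the pixel at phase 1, and
    quarter (0..4) is the target phase in units of 1/4 pixel.
    """
    return ((4 - quarter) * a + quarter * b) / 4

# With a = 8 and b = 16, phases 1/4, 1/2 and 3/4:
samples = [field_quarter_pel(8, 16, q) for q in (1, 2, 3)]
```

Setting `quarter` to 1, 2 and 3 gives (3a + b) / 4, (a + b) / 2 and (a + 3b) / 4 respectively.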
[0023]
Next, the interpolation processing for an image that was motion predicted in the frame motion prediction mode will be described with reference to FIG. 15. For the high resolution image (upper layer), as shown in FIG. 15(a), interpolation is performed between the fields, that is, between the bottom field and the top field, and motion compensation is performed with 1/2 pixel accuracy. For the standard resolution image (lower layer), as shown in FIG. 15(b), pixels whose phases are shifted by 1/4 pixel, 1/2 pixel and 3/4 pixel are generated by linear interpolation from the integer precision pixels of both the top field and the bottom field, and motion compensation is performed. For example, let a be the value of the bottom field pixel at vertical phase -1, b the value of the top field pixel at vertical phase 0, c the value of the bottom field pixel at vertical phase 1, d the value of the top field pixel at vertical phase 2, and e the value of the bottom field pixel at vertical phase 3. Then each 1/4 pixel accuracy pixel whose vertical phase lies between 0 and 2 is obtained as follows.
[0024]
The pixel whose vertical phase is ¼ is (a + 4b + 3c) / 8. A pixel having a vertical phase of 1/2 is (a + 3c) / 4. A pixel whose vertical phase is 3/4 is (a + 2b + 3c + 2d) / 8. A pixel having a vertical phase of 5/4 is (2b + 3c + 2d + e) / 8. A pixel whose vertical phase is 3/2 is (3c + e) / 4. A pixel whose vertical phase is 7/4 is (3c + 4d + e) / 8.
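The six formulas above can be collected into a weight table over the five pixels (a, b, c, d, e). The sketch below only restates the listed coefficients in eighths; the names `FRAME_WEIGHTS` and `frame_quarter_pel` are hypothetical and the test ramp is invented for illustration.

```python
from fractions import Fraction

# Weights in eighths over (a, b, c, d, e), located at vertical phases -1, 0, 1, 2, 3.
FRAME_WEIGHTS = {
    Fraction(1, 4): (1, 4, 3, 0, 0),   # (a + 4b + 3c) / 8
    Fraction(1, 2): (2, 0, 6, 0, 0),   # (a + 3c) / 4
    Fraction(3, 4): (1, 2, 3, 2, 0),   # (a + 2b + 3c + 2d) / 8
    Fraction(5, 4): (0, 2, 3, 2, 1),   # (2b + 3c + 2d + e) / 8
    Fraction(3, 2): (0, 0, 6, 0, 2),   # (3c + e) / 4
    Fraction(7, 4): (0, 0, 3, 4, 1),   # (3c + 4d + e) / 8
}

def frame_quarter_pel(pixels, phase):
    """Apply the listed weights to pixels = (a, b, c, d, e) for the given phase."""
    weights = FRAME_WEIGHTS[Fraction(phase)]
    return sum(w * p for w, p in zip(weights, pixels)) / 8

# A vertical ramp whose value at phase p is 8 * (p + 1).
ramp = (0, 8, 16, 24, 32)
values = {phase: frame_quarter_pel(ramp, phase) for phase in FRAME_WEIGHTS}
```

Every row of the table sums to 8, so flat areas are preserved, and on a linear ramp each formula returns the value the ramp takes at the target phase.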
[0025]
As described above, the conventional image decoding apparatus disclosed in Document 2 can decode compressed image data of a high resolution image corresponding to the interlace scanning method into standard resolution image data.
[0026]
However, in the conventional image decoding device described in Document 2, the pixel phases of the standard resolution image obtained in the field DCT mode are shifted relative to those obtained in the frame DCT mode. Specifically, in the field DCT mode, as shown in FIG. 16, the vertical phase of each pixel in the top field of the lower layer is 1/2, 5/2, ..., and the vertical phase of each pixel in the bottom field of the lower layer is 1, 3, .... In the frame DCT mode, on the other hand, as shown in FIG. 17, the vertical phase of each pixel in the top field of the lower layer is 0, 2, ..., and the vertical phase of each pixel in the bottom field of the lower layer is 1, 3, .... Images with different phases are therefore mixed in the frame memory 136, and the image quality of the output image deteriorates.
[0027]
Further, in the conventional image decoding device disclosed in the above-mentioned document 2, the phase shift is not corrected in the field motion prediction mode and the frame motion prediction mode. Therefore, the image quality of the output image is deteriorated.
[0028]
[Problems to be solved by the invention]
An image decoding apparatus for solving such a problem has been proposed in Japanese Patent Application No. 10-208385.
[0029]
Next, the image decoding apparatus proposed in Japanese Patent Application No. 10-208385 will be described.
[0030]
The image decoding apparatus 200 proposed in Japanese Patent Application No. 10-208385, shown in FIG. 18, is a device that receives a bit stream obtained by compressing, with MPEG2, a high resolution image having, for example, 1152 effective lines in the vertical direction, decodes this bit stream while reducing the resolution to 1/2, and outputs a standard resolution image having, for example, 576 effective lines in the vertical direction.
[0031]
Hereinafter, the high resolution image is also referred to as the upper layer, and the standard resolution image as the lower layer. In general, applying the inverse discrete cosine transform to a DCT block having 8 × 8 discrete cosine coefficients yields decoded data composed of 8 × 8 pixels. Processing that performs the inverse discrete cosine transform and at the same time reduces the resolution, for example decoding 8 × 8 discrete cosine coefficients to obtain decoded data composed of 4 × 4 pixels, is called the reduced inverse discrete cosine transform.
[0032]
The image decoding apparatus 200 includes: a bit stream analysis device 201 that is supplied with the compressed bit stream of the high resolution image and analyzes this bit stream; a variable length code decoding device 202 that decodes the bit stream, which has been variable-length encoded by assigning code lengths according to the frequency of occurrence of the data; an inverse quantization device 203 that multiplies each coefficient of the DCT block by the quantization step; a reduced inverse discrete cosine transform device 204 for field mode that generates a standard resolution image by performing the reduced inverse discrete cosine transform on DCT blocks that were discrete cosine transformed in the field DCT mode; a reduced inverse discrete cosine transform device 205 for frame mode that generates a standard resolution image by performing the reduced inverse discrete cosine transform on DCT blocks that were discrete cosine transformed in the frame DCT mode; an adder 206 that adds the standard resolution image that has undergone the reduced inverse discrete cosine transform and the motion compensated reference image; a frame memory 207 that temporarily stores the reference image; a field mode motion compensation device 208 that performs motion compensation corresponding to the field motion prediction mode on the reference image stored in the frame memory 207; a frame mode motion compensation device 209 that performs motion compensation corresponding to the frame motion prediction mode on the reference image stored in the frame memory 207; and an image frame conversion / phase shift correction device 210 that applies post filtering to the image stored in the frame memory 207, converting the image frame and correcting the phase shift of the pixels, and outputs standard resolution image data for display on a television monitor or the like.
[0033]
The reduced inverse discrete cosine transform device 204 for field mode is used when a macroblock of the input bit stream has been discrete cosine transformed in the field DCT mode. For a DCT block of 8 × 8 coefficients in a macroblock that was discrete cosine transformed in the field DCT mode, the device 204 performs the inverse discrete cosine transform only on the low-frequency 4 × 4 coefficients, as shown in the figure. That is, the reduced inverse discrete cosine transform is performed on the four low band discrete cosine coefficients in each of the horizontal and vertical directions. By performing the reduced inverse discrete cosine transform in this way, the device 204 can decode a standard resolution image in which one DCT block is composed of 4 × 4 pixels. As shown in FIG. 19, in the decoded image data the vertical phase of each pixel in the top field is 1/2, 5/2, ..., and the vertical phase of each pixel in the bottom field is 1, 3, .... That is, in the decoded top field of the lower layer, the phase of the first pixel (the pixel whose phase is 1/2) is midway between the first and second pixels from the top of the top field of the upper layer (the pixels whose phases are 0 and 2), and the phase of the second pixel from the top (the pixel whose phase is 5/2) is midway between the third and fourth pixels from the top of the top field of the upper layer (the pixels whose phases are 4 and 6). In the decoded bottom field of the lower layer, the phase of the first pixel (the pixel whose phase is 1) is midway between the first and second pixels from the top of the bottom field of the upper layer (the pixels whose phases are 1 and 3), and the phase of the second pixel from the top (the pixel whose phase is 3) is midway between the third and fourth pixels from the top of the bottom field of the upper layer (the pixels whose phases are 5 and 7).
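The phase values quoted above can be reproduced with a few lines of arithmetic. The sketch assumes, as the numbers in the text suggest, that upper-layer phases are counted in upper-layer lines and lower-layer phases in lower-layer lines, so a midpoint found on the upper-layer grid is halved to express it on the lower-layer grid. The function name `lower_layer_phases` is hypothetical.

```python
from fractions import Fraction

def lower_layer_phases(upper_field_phases):
    """Average successive pairs of upper-layer line phases within one field,
    then halve the result to express it in lower-layer line units."""
    pairs = zip(upper_field_phases[0::2], upper_field_phases[1::2])
    return [Fraction(p + q, 2) / 2 for p, q in pairs]

top = lower_layer_phases([0, 2, 4, 6])     # top field lines of the upper layer
bottom = lower_layer_phases([1, 3, 5, 7])  # bottom field lines of the upper layer
```

The top field comes out at 1/2 and 5/2 and the bottom field at 1 and 3, so the two fields are no longer evenly interleaved; this is the phase shift that the later stages have to take into account.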
[0034]
The reduced inverse discrete cosine transform device 205 for frame mode is used when a macroblock of the input bit stream has been discrete cosine transformed in the frame DCT mode. The device 205 performs the reduced inverse discrete cosine transform on a DCT block of 8 × 8 coefficients in a macroblock that was discrete cosine transformed in the frame DCT mode. The device 205 decodes a standard resolution image in which one DCT block is composed of 4 × 4 pixels, and generates an image whose pixel phases are the same as those of the standard resolution image generated by the reduced inverse discrete cosine transform device 204 for field mode. That is, as shown in FIG. 19, in the image data decoded by the device 205 the vertical phase of each pixel in the top field is 1/2, 5/2, ..., and the vertical phase of each pixel in the bottom field is 1, 3, ....
[0035]
Details of the processing of the reduced inverse discrete cosine transform device 205 for frame mode will be described later.
[0036]
When the macroblock that has undergone the reduced inverse discrete cosine transform in the reduced inverse discrete cosine transform device 204 for field mode or the reduced inverse discrete cosine transform device 205 for frame mode is an intra image, the adder 206 stores that intra image in the frame memory 207 as it is. When the macroblock that has undergone the reduced inverse discrete cosine transform in the device 204 or the device 205 is an inter image, the adder 206 combines the inter image with the reference image that has been motion compensated by the field mode motion compensation device 208 or the frame mode motion compensation device 209, and stores the result in the frame memory 207.
[0037]
The field mode motion compensation device 208 is used when the motion prediction mode of the macroblock is the field motion prediction mode. The field mode motion compensation device 208 performs interpolation with 1/4 pixel accuracy on the reference image of the standard resolution image stored in the frame memory 207, taking into account the phase shift component between the top field and the bottom field, and performs motion compensation corresponding to the field motion prediction mode. The reference image that has been motion compensated by the field mode motion compensation device 208 is supplied to the adder 206 and combined with the inter image.
[0038]
The frame mode motion compensation device 209 is used when the motion prediction mode of the macroblock is the frame motion prediction mode. The frame mode motion compensation device 209 performs interpolation with 1/4 pixel accuracy on the reference image of the standard resolution image stored in the frame memory 207, taking into account the phase shift component between the top field and the bottom field, and performs motion compensation corresponding to the frame motion prediction mode. The reference image that has been motion compensated by the frame mode motion compensation device 209 is supplied to the adder 206 and combined with the inter image.
[0039]
The image frame conversion / phase shift correction device 210 is supplied with the standard resolution reference image stored in the frame memory 207 or the image synthesized by the adder 206. It applies post filtering to this image to correct the phase shift component between the top field and the bottom field, and converts the image frame to conform to the standard resolution television standard. That is, the device 210 corrects a standard resolution image in which the vertical phase of each pixel in the top field is 1/2, 5/2, ... and the vertical phase of each pixel in the bottom field is 1, 3, ... so that, for example, the vertical phase of each pixel in the top field becomes 0, 2, 4, ... and the vertical phase of each pixel in the bottom field becomes 1, 3, 5, .... The device 210 also reduces the image frame of the high resolution television standard to 1/4 and converts it to the image frame of the standard resolution television standard.
[0040]
The image decoding apparatus 200 proposed in Japanese Patent Application No. 10-208385 has the above-described configuration, thereby decoding a bit stream obtained by compressing a high-resolution image with MPEG2 and reducing the resolution to ½. Standard resolution images can be output.
[0041]
Next, the processing contents of the reduced inverse discrete cosine transform device 205 for frame mode will be described in more detail.
[0042]
As shown in FIG. 20, a bit stream obtained by compressing and encoding a high-resolution image is input to the reduced inverse discrete cosine transform device for frame mode 205 in units of one DCT block.
[0043]
First, in step S1, an 8 × 8 inverse discrete cosine transform (IDCT 8 × 8) is performed on the discrete cosine coefficients y of this one DCT block (in the figure, the vertical coefficients among all the discrete cosine coefficients of the DCT block are shown as y₁ to y₈). The inverse discrete cosine transform yields 8 × 8 decoded pixel data x (in the figure, the vertical pixel data among all the pixel data of the DCT block are shown as x₁ to x₈).
[0044]
Subsequently, in step S2, the 8 × 8 pixel data x are extracted alternately line by line in the vertical direction and separated into two pixel blocks corresponding to interlaced scanning: a 4 × 4 pixel block of the top field and a 4 × 4 pixel block of the bottom field. That is, the pixel data x₁ of the first line in the vertical direction, the pixel data x₃ of the third line, the pixel data x₅ of the fifth line and the pixel data x₇ of the seventh line are extracted to generate the pixel block corresponding to the top field. Likewise, the pixel data x₂ of the second line in the vertical direction, the pixel data x₄ of the fourth line, the pixel data x₆ of the sixth line and the pixel data x₈ of the eighth line are extracted to generate the pixel block corresponding to the bottom field. The process of separating the pixels of a DCT block into two pixel blocks corresponding to interlaced scanning is hereinafter referred to as field separation.
[0045]
Subsequently, in step S3, 4 × 4 discrete cosine transform (DCT4 × 4) is performed on each of the two pixel blocks separated in the field.
[0046]
Subsequently, in step S4, the high frequency coefficients are thinned out from the discrete cosine coefficients z of the pixel block corresponding to the top field obtained by the 4 × 4 discrete cosine transform (in the figure, the vertical discrete cosine coefficients among all the coefficients of the pixel block corresponding to the top field are shown as z₁, z₃, z₅, z₇), giving a pixel block composed of 2 × 2 discrete cosine coefficients. Likewise, the high frequency coefficients are thinned out from the discrete cosine coefficients z of the pixel block corresponding to the bottom field obtained by the 4 × 4 discrete cosine transform (in the figure, the vertical discrete cosine coefficients among all the coefficients of the pixel block corresponding to the bottom field are shown as z₂, z₄, z₆, z₈), giving a pixel block composed of 2 × 2 discrete cosine coefficients.
[0047]
Subsequently, in step S5, a 2 × 2 inverse discrete cosine transform (IDCT 2 × 2) is performed on each pixel block from which the high frequency discrete cosine coefficients have been thinned out. The 2 × 2 inverse discrete cosine transform yields 2 × 2 decoded pixel data x′ (in the figure, the vertical pixel data among all the pixel data of the pixel block of the top field are shown as x′₁, x′₃, and the vertical pixel data among all the pixel data of the pixel block corresponding to the bottom field are shown as x′₂, x′₄).
[0048]
Subsequently, in step S6, the pixel data of the pixel block corresponding to the top field and the pixel data of the pixel block corresponding to the bottom field are combined alternately one line at a time in the vertical direction, generating a DCT block that has undergone the reduced inverse discrete cosine transform and is composed of 4 × 4 pixel data. The process of alternately combining, in the vertical direction, the pixels of the two pixel blocks corresponding to the top field and the bottom field is hereinafter referred to as frame synthesis.
[0049]
By performing steps S1 to S6 above, the reduced inverse discrete cosine transform device 205 for frame mode can generate, as shown in the figure, a 4 × 4 DCT block composed of pixels in the same phase as the pixels of the standard resolution image generated by the reduced inverse discrete cosine transform device 204 for field mode.
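Steps S1 to S6 can be sketched for a single vertical column of eight coefficients. This is a simplified one-dimensional illustration under stated assumptions, not the device's implementation: the transforms are orthonormal, the horizontal direction is ignored, the 1/√2 rescaling after the 2-point inverse transform is an assumption chosen so that flat fields keep their level, and the names `dct_1d`, `idct_1d` and `frame_mode_reduce_column` are hypothetical.

```python
import math

def dct_1d(x):
    """Orthonormal forward DCT of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)))
    return out

def idct_1d(c):
    """Orthonormal inverse DCT of a list of coefficients."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(math.sqrt(2.0 / n) * c[k]
                 * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(1, n))
        out.append(s)
    return out

def frame_mode_reduce_column(y):
    """Reduce one column of 8 vertical DCT coefficients to 4 pixels (S1 to S6)."""
    x = idct_1d(y)                        # S1: 8-point inverse DCT
    top, bottom = x[0::2], x[1::2]        # S2: field separation
    fields = []
    for field in (top, bottom):
        z = dct_1d(field)                 # S3: 4-point DCT per field
        z_low = z[:2]                     # S4: thin out the high-frequency coefficients
        px = idct_1d(z_low)               # S5: 2-point inverse DCT
        fields.append([p / math.sqrt(2) for p in px])  # assumed rescaling
    t, b = fields
    return [t[0], b[0], t[1], b[1]]       # S6: frame synthesis

# A column whose top-field lines are 10 and bottom-field lines are 20.
column = frame_mode_reduce_column(dct_1d([10, 20] * 4))
```

Because each field is reduced separately, a column whose two fields differ keeps that difference after reduction instead of blending the fields together.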
[0050]
Next, the field mode motion compensation device 208 and the frame mode motion compensation device 209 will be described in more detail.
[0051]
First, the interpolation processing performed by the field mode motion compensation device 208 will be described. In the field mode motion compensator 208, as will be described below, the pixels of the standard resolution image stored in the frame memory 207 are interpolated so as to correspond to the motion compensation of 1/2 pixel accuracy of the high resolution image. Thus, a pixel with 1/4 pixel accuracy is generated.
[0052]
For pixels in the horizontal direction, an integer precision pixel is taken out from the frame memory 207, and two pixels are linearly interpolated to generate a ½ pixel precision pixel and a ¼ precision pixel.
[0053]
For the pixels in the vertical direction, as shown in FIG. 21(a), integer precision pixels of the standard resolution image are first extracted from the frame memory 207. These pixels include the phase shift between the top field and the bottom field, such that the vertical phase of each pixel in the top field is 1/2, 5/2, ... and the vertical phase of each pixel in the bottom field is 1, 3, ....
[0054]
Subsequently, for the pixels in the vertical direction, as shown in FIG. 21 (b), a linear interpolation filter is used to generate, within each field, pixels with 1/2 pixel accuracy from the integer-accuracy pixels extracted from the frame memory 207. In other words, top-field pixels with 1/2 pixel accuracy are generated from the top-field integer-accuracy pixels, and bottom-field pixels with 1/2 pixel accuracy are generated from the bottom-field integer-accuracy pixels. For example, as shown in FIG. 21 (b), the top-field pixel whose vertical phase is at position 7/2 is generated by linear interpolation from the top-field pixels at positions 5/2 and 9/2. Likewise, the bottom-field pixel whose vertical phase is at position 4 is generated by linear interpolation from the bottom-field pixels at positions 3 and 5. Note that this generation of pixels with 1/2 pixel accuracy may use a double interpolation filter such as a half-band filter instead of a linear interpolation filter.
[0055]
Subsequently, for the pixels in the vertical direction, as shown in FIG. 21C, a linear interpolation filter is used to generate, within each field, pixels with 1/4 pixel accuracy from the pixels with 1/2 pixel accuracy. That is, top-field pixels with 1/4 pixel accuracy are generated from the top-field pixels with 1/2 pixel accuracy, and bottom-field pixels with 1/4 pixel accuracy are generated from the bottom-field pixels with 1/2 pixel accuracy. For example, as shown in FIG. 21 (c), the top-field pixel whose vertical phase is at position 9/4 is generated by linear interpolation from the top-field pixels at positions 2 and 5/2. Likewise, the bottom-field pixel whose vertical phase is at position 10/4 is generated by linear interpolation from the bottom-field pixels at positions 9/4 and 11/4.
[0056]
Instead of performing linear interpolation in two steps, a 1/4 precision pixel may be generated directly from integer precision pixels using a quadruple linear interpolation filter.
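The equivalence between the two-step procedure and a single quadruple linear interpolation filter can be checked with a short sketch. The function names and the sample values are illustrative assumptions; exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def midpoints(samples):
    """Double the sampling density by inserting the mean of each neighbouring pair."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.extend([a, (a + b) / 2])
    out.append(samples[-1])
    return out


def two_step(samples):
    """Integer precision -> 1/2 precision -> 1/4 precision, each by linear interpolation."""
    return midpoints(midpoints(samples))


def direct_quadruple(samples):
    """Integer precision -> 1/4 precision in one pass, with weights (4-k)/4 and k/4."""
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(4):
            out.append((a * (4 - k) + b * k) / 4)
    out.append(samples[-1])
    return out


# Assumed example: one column of integer-precision pixels from a single field.
field_column = [Fraction(v) for v in (16, 48, 40, 200)]
assert two_step(field_column) == direct_quadruple(field_column)
```

Both routes place three new samples between each pair of integer-precision pixels, weighted 3:1, 1:1 and 1:3.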
[0057]
Next, an interpolation process performed by the frame mode motion compensation apparatus 209 will be described. In the frame mode motion compensation device 209, as described below, the pixels of the standard resolution image stored in the frame memory 207 are interpolated so as to correspond to the motion compensation of 1/2 pixel accuracy of the high resolution image. Thus, a pixel with 1/4 pixel accuracy is generated.
[0058]
For pixels in the horizontal direction, as in the field mode motion compensation device 208 described above, integer-precision pixels are extracted from the frame memory 207, and two adjacent pixels are linearly interpolated to generate pixels with 1/2 pixel precision and pixels with 1/4 pixel precision.
[0059]
For the pixels in the vertical direction, first, as shown in FIG. 22A, integer-precision pixels of the standard resolution image are extracted from the frame memory 207. These pixels include the phase shift between the top field and the bottom field, such that the vertical phase of each pixel in the top field is 1/2, 5/2, ... and the vertical phase of each pixel in the bottom field is 1, 3, ....
[0060]
Subsequently, for the pixels in the vertical direction, as shown in FIG. 22 (b), a linear interpolation filter is used to generate, within each field, pixels with 1/2 pixel accuracy from the integer-accuracy pixels extracted from the frame memory 207. In other words, top-field pixels with 1/2 pixel accuracy are generated from the top-field integer-accuracy pixels, and bottom-field pixels with 1/2 pixel accuracy are generated from the bottom-field integer-accuracy pixels. For example, as shown in FIG. 22 (b), the top-field pixel whose vertical phase is at position 7/2 is generated by linear interpolation from the top-field pixels at positions 5/2 and 9/2. Likewise, the bottom-field pixel whose vertical phase is at position 4 is generated by linear interpolation from the bottom-field pixels at positions 3 and 5.
[0061]
Subsequently, for the pixels in the vertical direction, as shown in FIG. 22 (c), a linear interpolation filter is used to generate pixels with 1/4 pixel accuracy from the pixels with 1/2 pixel accuracy across the two fields, that is, between the top field and the bottom field. For example, as shown in FIG. 22C, a pixel whose vertical phase is 1/4 is generated by linear interpolation from the top-field pixel at position 0 and the bottom-field pixel at position 1/2. Likewise, a pixel whose vertical phase is 3/4 is generated by linear interpolation from the bottom-field pixel at position 1/2 and the top-field pixel at position 1.
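The vertical interpolation for the frame mode described above can be sketched as follows. The sketch assumes the integer-precision phases of FIG. 22A (top field at 1/2, 5/2, 9/2 and bottom field at 1, 3, 5) and represents each line as a hypothetical (phase, value) pair. The function names are illustrative, and the pixel values are chosen as 8 × phase so that the result is easy to check.

```python
from fractions import Fraction as F


def double_within_field(lines):
    """Insert the midpoint between neighbouring lines of one field (1/2 pixel accuracy)."""
    out = []
    for (p0, v0), (p1, v1) in zip(lines, lines[1:]):
        out.append((p0, v0))
        out.append(((p0 + p1) / 2, (v0 + v1) / 2))
    out.append(lines[-1])
    return out


def interpolate_between_fields(top, bottom):
    """Merge both fields by phase, then average each adjacent top/bottom pair."""
    merged = sorted(double_within_field(top) + double_within_field(bottom))
    out = []
    for (p0, v0), (p1, v1) in zip(merged, merged[1:]):
        out.append((p0, v0))
        out.append(((p0 + p1) / 2, (v0 + v1) / 2))
    out.append(merged[-1])
    return out


# Assumed test signal: the pixel value is a linear ramp, value = 8 * phase.
top_field = [(F(1, 2), F(4)), (F(5, 2), F(20)), (F(9, 2), F(36))]
bottom_field = [(F(1), F(8)), (F(3), F(24)), (F(5), F(40))]
quarter = interpolate_between_fields(top_field, bottom_field)
```

After the in-field doubling the merged lines alternate between the two fields at a spacing of 1/2, so the final averaging step always combines one top-field line with one bottom-field line.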
[0062]
FIG. 23 shows a block configuration of the field mode motion compensation device 208 and the frame mode motion compensation device 209 that perform the above processing.
[0063]
As shown in FIG. 23, the field mode motion compensation device 208 and the frame mode motion compensation device 209 include an address generation unit 222, an input memory 223, a vertical direction interpolation processing unit 224, a vertical direction filter coefficient storage memory 225, an intermediate memory 226, a horizontal direction interpolation processing unit 227, and a horizontal direction filter coefficient storage memory 228.
[0064]
The address generation unit 222 receives motion vector information. Based on this motion vector information, the address generation unit 222 generates address information indicating the vertical and horizontal positions of the pixel to be interpolated. Based on the generated address information, the address generation unit 222 extracts integer precision pixels of the standard resolution image from the frame memory 207 and sends them to the input memory 223.
[0065]
Further, the address generation unit 222 sends the input motion vector information to the vertical filter coefficient storage memory 225 and the horizontal filter coefficient storage memory 228.
[0066]
The vertical direction filter coefficient storage memory 225 stores four types of one-dimensional filter coefficients in the case of the field mode motion compensation device 208, and stores eight types of one-dimensional filter coefficients in the case of the frame mode motion compensation device 209. This is because, in the field motion prediction mode, as shown in FIG. 21C, pixels whose phases are 0, 0.25, 0.5, and 0.75 with respect to the reference image are generated, whereas in the frame motion prediction mode, as shown in FIG., pixels whose phases are 0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, and 1.75 with respect to the reference image are generated, and motion compensation with 1/4 pixel accuracy is performed.
[0067]
Regardless of the difference between the field mode motion compensation device 208 and the frame mode motion compensation device 209, four types of one-dimensional filter coefficients are stored in the horizontal filter coefficient storage memory 228.
[0068]
The vertical filter coefficient storage memory 225 and the horizontal filter coefficient storage memory 228 send filter coefficients corresponding to the transmitted motion vector information to the vertical direction interpolation processing unit 224 and the horizontal direction interpolation processing unit 227.
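How a motion vector could select one of the four horizontal filters can be sketched as follows. This is only an illustration under stated assumptions: the motion vector component is taken to be in half-pixel units of the high resolution image (as in MPEG2), so that one unit corresponds to 1/4 pixel of the standard resolution image; the table layout `HORIZONTAL_FILTERS` and the function names are hypothetical and are not taken from this description.

```python
from fractions import Fraction

# Hypothetical table: one two-tap linear filter per phase 0, 0.25, 0.5, 0.75.
HORIZONTAL_FILTERS = {k: (Fraction(4 - k, 4), Fraction(k, 4)) for k in range(4)}


def split_motion_vector(mv_half_pel):
    """Split a high-resolution half-pel component into a standard-resolution
    integer pixel offset and a quarter-pel phase index (0..3)."""
    offset, phase = divmod(mv_half_pel, 4)  # floor division also handles negatives
    return offset, phase


def interpolate_row(row, start, mv_half_pel, count):
    """Fetch `count` interpolated pixels of one line, displaced by the motion vector."""
    offset, phase = split_motion_vector(mv_half_pel)
    w0, w1 = HORIZONTAL_FILTERS[phase]
    base = start + offset
    return [w0 * row[base + i] + w1 * row[base + i + 1] for i in range(count)]


# Assumed reference line whose value is 8 * position, for easy checking.
row = [0, 8, 16, 24, 32, 40]
predicted = interpolate_row(row, start=1, mv_half_pel=3, count=2)
```

With a vector of 3 the displacement is 0.75 pixel of the standard resolution image, so the pixels are read at positions 1.75 and 2.75.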
[0069]
The vertical direction interpolation processing unit 224 performs one-dimensional pixel interpolation in the vertical direction on the integer-precision pixel data (the macroblock of the reference image) stored in the input memory 223, using the filter coefficients sent to it. The macroblock of the reference image that has undergone vertical pixel interpolation is stored in the intermediate memory 226.
[0070]
The horizontal direction interpolation processing unit 227 performs one-dimensional pixel interpolation in the horizontal direction on the vertically interpolated pixel data stored in the intermediate memory 226, using the filter coefficients sent to it. The macroblock of the reference image that has undergone horizontal pixel interpolation is sent to the adder 206 as a motion-compensated reference image, where it is added to the compressed image data subjected to the reduced inverse discrete cosine transform.
[0071]
In the image decoding apparatus 200 proposed in Japanese Patent Application No. 10-208385 as described above, motion compensation is performed with 1/4 pixel accuracy in the horizontal direction and the vertical direction, so that no phase shift occurs between the top field and the bottom field. So-called field inversion and field mix can therefore be prevented, and deterioration of image quality due to motion compensation can be prevented.
[0072]
Incidentally, the image decoding apparatus 200 proposed in Japanese Patent Application No. 10-208385 as described above has the following problems.
[0073]
In the image decoding apparatus 200, vertical pixel interpolation and horizontal pixel interpolation are performed separately when motion compensation is performed. Consequently, in this image decoding apparatus 200, the intermediate result must be stored in memory and read out again, so that an extra memory area is required, the amount of memory access increases, and the processing time increases.
[0074]
The present invention has been made in view of such circumstances. Its object is to provide an image decoding apparatus and an image decoding method that decode standard resolution image data from compressed image data of a high resolution image, that can eliminate the pixel phase shift between the field orthogonal transform mode and the frame orthogonal transform mode without impairing the interlaced property of an interlaced scanning image, and that simplify the processing at the time of motion compensation with a simple configuration.
[0075]
[Means for Solving the Problems]
The image decoding apparatus according to the present invention decodes moving image data of a second resolution, lower than a first resolution, from compressed image data of the first resolution that has been subjected to predictive coding by motion prediction in units of predetermined pixel blocks (macroblocks) and to compression coding by orthogonal transform in units of predetermined pixel blocks (orthogonal transform blocks). The apparatus comprises: first inverse orthogonal transform means for performing an inverse orthogonal transform on an orthogonal transform block of the compressed image data that has been orthogonally transformed by an orthogonal transform method corresponding to interlaced scanning (field orthogonal transform mode); second inverse orthogonal transform means for performing an inverse orthogonal transform on an orthogonal transform block of the compressed image data that has been orthogonally transformed by an orthogonal transform method corresponding to sequential scanning (frame orthogonal transform mode); adding means for adding the compressed image data inversely orthogonally transformed by the first inverse orthogonal transform means or the second inverse orthogonal transform means to motion-compensated reference image data, and outputting moving image data of the second resolution; storage means for storing the moving image data output from the adding means as reference image data; and motion compensation means for performing motion compensation with 1/4 pixel accuracy in the vertical and horizontal directions on a macroblock of the reference image data stored in the storage means. The first inverse orthogonal transform means performs the inverse orthogonal transform on the low-frequency coefficients among the coefficients of the orthogonal transform block. The second inverse orthogonal transform means performs an inverse orthogonal transform on the coefficients of all frequency components of the orthogonal transform block, separates the pixels of the inversely transformed orthogonal transform block into two pixel blocks corresponding to interlaced scanning, orthogonally transforms each of the two separated pixel blocks, performs an inverse orthogonal transform on the low-frequency coefficients among the coefficients of each of the two orthogonally transformed pixel blocks, and synthesizes the two inversely transformed pixel blocks to generate an orthogonal transform block. The motion compensation means has a filter storage unit that stores a field two-dimensional filter coefficient group, which performs pixel interpolation with 1/4 pixel accuracy in the vertical and horizontal directions on a macroblock of reference image data subjected to motion prediction by a motion prediction method corresponding to interlaced scanning (field motion prediction mode), and a frame two-dimensional filter coefficient group, which performs pixel interpolation with 1/4 pixel accuracy in the vertical and horizontal directions on a macroblock of reference image data subjected to motion prediction by a motion prediction method corresponding to sequential scanning (frame motion prediction mode). The motion compensation means designates a predetermined two-dimensional filter coefficient stored in the filter storage unit on the basis of the motion vector of the compressed image data inversely orthogonally transformed by the first inverse orthogonal transform means or the second inverse orthogonal transform means, and interpolates the macroblock of the reference image data stored in the storage means using the designated two-dimensional filter coefficient.
[0076]
In the image decoding apparatus according to the present invention, the frame two-dimensional filter coefficient group comprises a plurality of filter coefficients that perform quadruple interpolation within one field on each pixel in the horizontal direction of the macroblock of the reference image data stored in the storage means, perform double interpolation within one field on each pixel in the vertical direction of that macroblock, and perform linear interpolation between the top field and the bottom field on each pixel subjected to the double interpolation within one field. The field two-dimensional filter coefficient group comprises a plurality of filter coefficients that perform quadruple interpolation within one field on each pixel in the horizontal direction of the macroblock of the reference image data stored in the storage means, perform double interpolation within one field on each pixel in the vertical direction of that macroblock, and perform linear interpolation on each pixel subjected to the double interpolation within one field.
[0077]
For example, within the frame two-dimensional filter coefficient group, identical filter coefficients are shared. The motion compensation means also groups the frame two-dimensional filter coefficient group by exploiting the symmetry of the coefficients in the vertical direction and the zero coefficients, and performs the interpolation processing.
[0078]
The image decoding method according to the present invention decodes moving image data of a second resolution, lower than a first resolution, from compressed image data of the first resolution that has been subjected to predictive coding by motion prediction in units of predetermined pixel blocks (macroblocks) and to compression coding by orthogonal transform in units of predetermined pixel blocks (orthogonal transform blocks). The method comprises: a first inverse orthogonal transform step of performing an inverse orthogonal transform on an orthogonal transform block of the compressed image data that has been orthogonally transformed by an orthogonal transform method corresponding to interlaced scanning (field orthogonal transform mode); a second inverse orthogonal transform step of performing an inverse orthogonal transform on an orthogonal transform block of the compressed image data that has been orthogonally transformed by an orthogonal transform method corresponding to sequential scanning (frame orthogonal transform mode); an adding step of adding the compressed image data inversely orthogonally transformed in the first inverse orthogonal transform step or the second inverse orthogonal transform step to motion-compensated reference image data, and outputting moving image data of the second resolution; a storage step of storing the moving image data output in the adding step as reference image data; and a motion compensation step of performing motion compensation with 1/4 pixel accuracy in the vertical and horizontal directions on a macroblock of the reference image data stored in the storage step. In the first inverse orthogonal transform step, the inverse orthogonal transform is performed on the low-frequency coefficients among the coefficients of the orthogonal transform block. In the second inverse orthogonal transform step, an inverse orthogonal transform is performed on the coefficients of all frequency components of the orthogonal transform block, the pixels of the inversely transformed orthogonal transform block are separated into two pixel blocks corresponding to interlaced scanning, each of the two separated pixel blocks is orthogonally transformed, an inverse orthogonal transform is performed on the low-frequency coefficients among the coefficients of each of the two orthogonally transformed pixel blocks, and the two inversely transformed pixel blocks are synthesized to generate an orthogonal transform block. In the motion compensation step, a field two-dimensional filter coefficient group, which performs pixel interpolation with 1/4 pixel accuracy in the vertical and horizontal directions on a macroblock of reference image data subjected to motion prediction by a motion prediction method corresponding to interlaced scanning (field motion prediction mode), and a frame two-dimensional filter coefficient group, which performs pixel interpolation with 1/4 pixel accuracy in the vertical and horizontal directions on a macroblock of reference image data subjected to motion prediction by a motion prediction method corresponding to sequential scanning (frame motion prediction mode), are stored; a predetermined stored two-dimensional filter coefficient is designated on the basis of the motion vector of the compressed image data inversely orthogonally transformed in the first inverse orthogonal transform step or the second inverse orthogonal transform step; and the macroblock of the stored reference image data is interpolated using the designated two-dimensional filter coefficient.
[0079]
In the image decoding method according to the present invention, the frame two-dimensional filter coefficient group comprises a plurality of filter coefficients that perform quadruple interpolation within one field on each pixel in the horizontal direction of the macroblock of the reference image data stored in the storage step, perform double interpolation within one field on each pixel in the vertical direction of that macroblock, and perform linear interpolation between the top field and the bottom field on each pixel subjected to the double interpolation within one field. The field two-dimensional filter coefficient group comprises a plurality of filter coefficients that perform quadruple interpolation within one field on each pixel in the horizontal direction of the macroblock of the reference image data stored in the storage step, perform double interpolation within one field on each pixel in the vertical direction of that macroblock, and perform linear interpolation on each pixel subjected to the double interpolation within one field.
[0080]
For example, within the frame two-dimensional filter coefficient group, identical filter coefficients are shared. Also, in the motion compensation step, the frame two-dimensional filter coefficient group is grouped by exploiting the symmetry of the coefficients in the vertical direction and the zero coefficients, and the interpolation processing is performed.
[0081]
In the present invention as described above, when pixel interpolation with 1/4 pixel accuracy is performed, the pixel interpolation in the vertical direction and the horizontal direction is performed collectively by a two-dimensional filter. Among the plurality of two-dimensional filters used in the frame motion prediction mode, identical matrices are shared. Further, the processing is simplified by grouping the filters using the symmetry of the coefficients in the vertical direction and the zero coefficients.
[0082]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0083]
FIG. 1 shows a block configuration diagram of an image decoding apparatus according to an embodiment of the present invention.
[0084]
The image decoding apparatus 10 shown in FIG. 1 receives a bitstream obtained by compressing, with MPEG2, a high resolution image having, for example, 1152 effective lines in the vertical direction. It decodes the input bitstream, reduces the resolution to 1/2, and outputs a standard resolution image having, for example, 576 effective lines in the vertical direction.
[0085]
The image decoding apparatus 10 comprises: a bitstream analysis device 11 that is supplied with the bitstream of the compressed high resolution image and analyzes the bitstream; a variable length code decoding device 12 that decodes the bitstream, which has been variable-length coded by assigning code lengths according to the frequency of occurrence of the data; an inverse quantization device 13 that multiplies each coefficient of a DCT block by the quantization step; a reduced inverse discrete cosine transform device 14 for field mode that generates a standard resolution image by performing a reduced inverse discrete cosine transform on a DCT block that has undergone the discrete cosine transform in the field DCT mode; a reduced inverse discrete cosine transform device 15 for frame mode that generates a standard resolution image by performing a reduced inverse discrete cosine transform on a DCT block that has undergone the discrete cosine transform in the frame DCT mode; an adder 16 that adds the standard resolution image subjected to the reduced inverse discrete cosine transform and the motion-compensated reference image; a frame memory 17 that temporarily stores the reference image; a motion compensation device 18 that performs motion compensation on the reference image stored in the frame memory 17; and an image frame conversion / phase shift correction device 20 that performs post-filtering on the image stored in the frame memory 17 to convert the image frame and correct the phase shift of the pixels, and outputs standard resolution image data for display on a television monitor or the like.
[0086]
The reduced inverse discrete cosine transform device 14 for field mode is used when the macroblock of the input bitstream has undergone the discrete cosine transform in the field DCT mode. For a DCT block holding the 8 × 8 coefficients of a macroblock that has undergone the discrete cosine transform in the field DCT mode, the reduced inverse discrete cosine transform device 14 for field mode performs the inverse discrete cosine transform only on the low-frequency 4 × 4 coefficients, as shown in FIG. That is, the reduced inverse discrete cosine transform is performed on the basis of the four low-band discrete cosine coefficients in each of the horizontal and vertical directions. By performing the reduced inverse discrete cosine transform in this way, the reduced inverse discrete cosine transform device 14 for field mode can decode a standard resolution image in which one DCT block is composed of 4 × 4 pixels. As shown in FIG. 19, in the decoded image data the vertical phase of each pixel in the top field is 1/2, 5/2, ..., and the vertical phase of each pixel in the bottom field is 1, 3, .... That is, in the decoded top field of the lower layer, the phase of the first pixel from the top (the pixel whose phase is 1/2) is intermediate between the first and second pixels from the top of the top field of the upper layer (the pixels whose phases are 0 and 2), and the phase of the second pixel from the top (the pixel whose phase is 5/2) is intermediate between the third and fourth pixels from the top of the top field of the upper layer (the pixels whose phases are 4 and 6). In the decoded bottom field of the lower layer, the phase of the first pixel from the top (the pixel whose phase is 1) is intermediate between the first and second pixels from the top of the bottom field of the upper layer (the pixels whose phases are 1 and 3), and the phase of the second pixel from the top (the pixel whose phase is 3) is intermediate between the third and fourth pixels from the top of the bottom field of the upper layer (the pixels whose phases are 5 and 7).
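The reduced inverse discrete cosine transform on the low-frequency 4 × 4 coefficients can be sketched as follows. This is a minimal illustration that assumes an orthonormal DCT-II and an amplitude rescaling of 4/8 to compensate for the smaller transform size; the scaling convention and all function names are assumptions, not details given in this description.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def dct2(block):
    """Forward 2-D DCT of a square block (used here only to build test input)."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def reduced_idct(coeffs8, size=4):
    """Keep the size x size low-frequency corner and apply a size-point 2-D inverse DCT."""
    ratio = size / len(coeffs8)  # 4/8: assumed amplitude rescaling for the smaller transform
    low = [[coeffs8[v][u] * ratio for u in range(size)] for v in range(size)]
    c = dct_matrix(size)
    return matmul(matmul(transpose(c), low), c)


# Assumed example: a flat 8 x 8 block must decode to a flat 4 x 4 block.
flat = [[100.0] * 8 for _ in range(8)]
small = reduced_idct(dct2(flat))
```

Each output pixel of the 4-point transform lies midway between two input pixels of the 8-point block, which is consistent with the intermediate phases described above.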
[0087]
The reduced inverse discrete cosine transform device 15 for frame mode is used when the macroblock of the input bitstream has undergone the discrete cosine transform in the frame DCT mode. The reduced inverse discrete cosine transform device 15 for frame mode performs a reduced inverse discrete cosine transform on a DCT block holding the 8 × 8 coefficients of a macroblock that has undergone the discrete cosine transform in the frame DCT mode. The reduced inverse discrete cosine transform device 15 for frame mode decodes a standard resolution image in which one DCT block is composed of 4 × 4 pixels, and generates an image whose pixel phase is the same as that of the standard resolution image generated by the reduced inverse discrete cosine transform device 14 for field mode. That is, as shown in FIG. 19, in the image data decoded by the reduced inverse discrete cosine transform device 15 for frame mode, the vertical phase of each pixel in the top field is 1/2, 5/2, ..., and the vertical phase of each pixel in the bottom field is 1, 3, ....
[0088]
The processing performed by the reduced inverse discrete cosine transform device 15 for frame mode is the same as that of the reduced inverse discrete cosine transform device 205 for frame mode of the image decoding apparatus 200 proposed in Japanese Patent Application No. 10-208385 described above, so the details are omitted.
[0089]
When the macroblock subjected to the reduced inverse discrete cosine transform by the reduced inverse discrete cosine transform device 14 for field mode or the reduced inverse discrete cosine transform device 15 for frame mode is an intra image, the adder 16 stores the intra image directly in the frame memory 17. When the macroblock subjected to the reduced inverse discrete cosine transform by the reduced inverse discrete cosine transform device 14 for field mode or the reduced inverse discrete cosine transform device 15 for frame mode is an inter image, the adder 16 synthesizes the inter image with the reference image motion-compensated by the motion compensation device 18 and stores the result in the frame memory 17.
[0090]
The motion compensation device 18 interpolates the reference image of the standard resolution image stored in the frame memory 17 with 1/4 pixel accuracy, taking into account the phase shift component between the top field and the bottom field, and performs motion compensation corresponding to the field motion prediction mode. The reference image that has been motion-compensated by the motion compensation device 18 is supplied to the adder 16 and synthesized with the inter image. Details of the processing of the motion compensation device 18 will be described later.
[0091]
The image frame conversion / phase shift correction device 20 is supplied with the standard resolution reference image stored in the frame memory 17 or the image synthesized by the adder 16. By post-filtering this image, it corrects the phase shift component between the top field and the bottom field and converts the image frame so as to conform to the standard-definition television standard. That is, the image frame conversion / phase shift correction device 20 corrects a standard resolution image in which the vertical phase of each pixel in the top field is 1/2, 5/2, ... and the vertical phase of each pixel in the bottom field is 1, 3, ... so that, for example, the vertical phase of each pixel in the top field becomes 0, 2, 4, ... and the vertical phase of each pixel in the bottom field becomes 1, 3, 5, .... Further, the image frame conversion / phase shift correction device 20 reduces the image frame of the high-resolution television standard to 1/4 and converts it to the image frame of the standard-definition television standard.
[0092]
With the above-described configuration, the image decoding apparatus 10 can decode a bitstream obtained by compressing a high resolution image with MPEG2, reduce the resolution to 1/2, and output a standard resolution image.
[0093]
Next, the motion compensation device 18 will be described in more detail.
[0094]
In this motion compensation device 18, as will be described below, the pixels of the standard resolution image stored in the frame memory 17 are interpolated so as to correspond to the motion compensation of 1/2 pixel accuracy of the high resolution image, A pixel with 1/4 pixel accuracy is generated.
[0095]
The motion compensation device 18 performs vertical pixel interpolation and horizontal pixel interpolation together using a single two-dimensional filter coefficient. However, the phase of the pixels generated as a result of the processing by the motion compensation device 18, that is, as a result of its filtering, is the same as that obtained with the motion compensation devices of the image decoding apparatus 200 proposed in the above-mentioned Japanese Patent Application No. 10-208385.
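Why a single two-dimensional filter can reproduce the result of separate vertical and horizontal passes can be illustrated with a simplified sketch. It assumes plain two-tap linear filters in both directions, which is a simplification of the in-field and inter-field interpolation described here; the function names are hypothetical. The two-dimensional kernel is formed as the outer product of the vertical and horizontal taps.

```python
from fractions import Fraction as F


def linear_taps(k):
    """Two-tap linear filter for the quarter-pel phase k/4 (k = 0..3)."""
    return [F(4 - k, 4), F(k, 4)]


def two_pass(block, v_taps, h_taps):
    """Vertical pass into an intermediate buffer, then a horizontal pass."""
    rows, cols = len(block), len(block[0])
    inter = [[sum(t * block[r + i][c] for i, t in enumerate(v_taps))
              for c in range(cols)]
             for r in range(rows - len(v_taps) + 1)]
    return [[sum(t * row[c + j] for j, t in enumerate(h_taps))
             for c in range(cols - len(h_taps) + 1)]
            for row in inter]


def single_pass(block, v_taps, h_taps):
    """One pass with the 2-D kernel built as the outer product of the 1-D taps."""
    kernel = [[v * h for h in h_taps] for v in v_taps]
    rows, cols = len(block), len(block[0])
    kr, kc = len(kernel), len(kernel[0])
    return [[sum(kernel[i][j] * block[r + i][c + j]
                 for i in range(kr) for j in range(kc))
             for c in range(cols - kc + 1)]
            for r in range(rows - kr + 1)]


# Assumed reference pixels; vertical phase 1/4 and horizontal phase 3/4.
reference = [[12, 40, 8], [96, 20, 64], [4, 52, 28]]
a = two_pass(reference, linear_taps(1), linear_taps(3))
b = single_pass(reference, linear_taps(1), linear_taps(3))
```

The single pass needs no intermediate buffer, at the cost of one stored kernel per combination of vertical and horizontal phase.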
[0096]
That is, the motion compensation device 18 performs the following process in the field motion prediction mode.
[0097]
For pixels in the horizontal direction, two pixels with integer precision are linearly interpolated to generate ½ pixel precision pixels and ¼ precision pixels.
[0098]
For the pixels in the vertical direction, first, integer-precision pixels of the standard resolution image are extracted from the frame memory 17. These pixels include the phase shift between the top field and the bottom field, such that the vertical phase of each pixel in the top field is 1/2, 5/2, ... and the vertical phase of each pixel in the bottom field is 1, 3, ....
[0099]
Subsequently, for the pixels in the vertical direction, pixels with 1/2 pixel accuracy are generated within each field from the integer-accuracy pixels extracted from the frame memory 17. In other words, top-field pixels with 1/2 pixel accuracy are generated from the top-field integer-accuracy pixels, and bottom-field pixels with 1/2 pixel accuracy are generated from the bottom-field integer-accuracy pixels.
[0100]
Subsequently, for the pixels in the vertical direction, pixels with 1/4 pixel accuracy are generated within each field from the pixels with 1/2 pixel accuracy. That is, top-field pixels with 1/4 pixel accuracy are generated from the top-field pixels with 1/2 pixel accuracy, and bottom-field pixels with 1/4 pixel accuracy are generated from the bottom-field pixels with 1/2 pixel accuracy.
[0101]
FIG. 2 shows the pixel interpolation with 1/4 pixel accuracy in the field motion prediction mode described above. In FIG. 2, ● indicates the phase position of a top-field pixel with integer pixel accuracy, ▲ indicates the phase position of a top-field pixel with 1/2 pixel accuracy, and ■ indicates the phase position of a top-field pixel with 1/4 pixel accuracy. In addition, ◯ indicates the phase position of a bottom-field pixel with integer pixel accuracy, Δ indicates the phase position of a bottom-field pixel with 1/2 pixel accuracy, and □ indicates the phase position of a bottom-field pixel with 1/4 pixel accuracy.
[0102]
The motion compensation device 18 performs the pixel interpolation processing for the field motion prediction mode described above using a single two-dimensional interpolation filter, and generates the 1/4 pixel precision pixels directly from the integer precision pixels.
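The following sketch illustrates why a single filter can replace the staged generation within one field, assuming linear interpolation at every stage. Generating the 1/2 pixel sample first and the 1/4 pixel samples from it gives the same values as applying one linear filter directly to the integer precision samples. The helper names `two_stage` and `direct` and the pixel values are hypothetical.

```python
from fractions import Fraction

def two_stage(a, b):
    """Half-pel first, then quarter-pel from the integer and half-pel samples.

    `a` and `b` are two vertically adjacent lines of the same field.
    """
    half = Fraction(a + b, 2)
    q1 = (a + half) / 2   # phase 1/4
    q3 = (half + b) / 2   # phase 3/4
    return q1, half, q3

def direct(a, b, phase):
    """Single linear filter applied to the integer precision samples only."""
    p = Fraction(phase)
    return (1 - p) * a + p * b

a, b = 80, 120  # illustrative pixel values
staged = two_stage(a, b)
single = tuple(direct(a, b, Fraction(k, 4)) for k in (1, 2, 3))
print(staged == single)
```

Under this linearity assumption the intermediate 1/2 pixel samples never need to be stored, which is consistent with the memory saving described later in the text.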
In the frame motion prediction mode, the motion compensation device 18 performs the following processing.
[0103]
For pixels in the horizontal direction, two integer precision pixels are linearly interpolated to generate 1/2 pixel precision pixels and 1/4 pixel precision pixels.
[0104]
For pixels in the vertical direction, the integer precision pixels of the standard resolution image are first extracted from the frame memory 17. These pixels include a phase shift between the top field and the bottom field such that the vertical phase of each pixel in the top field is 1/2, 5/2, ... and the vertical phase of each pixel in the bottom field is 1, 3, ....
[0105]
Subsequently, for the pixels in the vertical direction, pixels with 1/2 pixel accuracy are generated within each field from the integer accuracy pixels extracted from the frame memory 17. In other words, the 1/2 pixel accuracy pixels of the top field are generated from the integer accuracy pixels of the top field, and the 1/2 pixel accuracy pixels of the bottom field are generated from the integer accuracy pixels of the bottom field.
[0106]
Subsequently, with respect to the pixels in the vertical direction, a pixel with 1/4 pixel accuracy is generated from a pixel with 1/2 pixel accuracy between the two fields of the top field and the bottom field. For example, a pixel whose phase in the vertical direction is 1/4 is generated by linear interpolation from a top field pixel at 0 position and a bottom field pixel at 1/2 position. Also, a pixel whose vertical phase is 3/4 is generated by linear interpolation from a bottom field pixel at 1/2 position and a top field pixel at 1 position.
[0107]
FIG. 3 shows pixel interpolation with 1/4 pixel accuracy in the frame motion prediction mode described above. In FIG. 3, ● indicates the phase position of an integer pixel accuracy pixel of the top field, ▲ indicates the phase position of a 1/2 pixel accuracy pixel of the top field, and ■ indicates the phase position of a 1/4 pixel accuracy pixel of the top field. Likewise, ○ indicates the phase position of an integer pixel accuracy pixel of the bottom field, △ indicates the phase position of a 1/2 pixel accuracy pixel of the bottom field, and □ indicates the phase position of a 1/4 pixel accuracy pixel of the bottom field.
[0108]
The motion compensation device 18 performs the pixel interpolation processing for the frame motion prediction mode described above using a single two-dimensional interpolation filter, and generates the 1/4 pixel precision pixels directly from the integer precision pixels.
[0109]
Next, the block configuration of the motion compensation device 18 is shown in FIG. 4, and the circuit configuration of the motion compensation device 18 and the filtering processing contents for pixel interpolation will be specifically described.
[0110]
As shown in FIG. 4, the motion compensation device 18 includes an address generation unit 21, an input memory 22, a filter coefficient storage memory 23, and a two-dimensional interpolation processing unit 24.
[0111]
The address generation unit 21 receives motion vector information and mode information. The mode information is information indicating whether the motion compensation mode of the macroblock is the field motion prediction mode or the frame motion prediction mode.
[0112]
Based on the motion vector information, the address generation unit 21 generates address information indicating the vertical and horizontal positions of the pixel to be interpolated. Based on the generated address information, the address generation unit 21 extracts the integer precision pixels of the standard resolution image from the frame memory 17 in units of macroblocks, and sends them to the input memory 22.
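One possible reading of this address generation step is sketched below. It assumes, purely for illustration, that each motion vector component is expressed in 1/4 pixel units of the standard resolution image, so that the integer part selects the pixels to fetch from the frame memory and the remainder selects the interpolation phase. The function name `split_motion_vector` and the sample values are hypothetical and do not appear in the source.

```python
def split_motion_vector(mv_quarter_pel):
    """Split one motion vector component into (integer offset, phase index).

    Assumption: the component is given in 1/4 pixel units. The phase index
    0..3 corresponds to the fractional phases 0, 0.25, 0.5 and 0.75.
    divmod uses floor division, so negative vectors also yield a phase in 0..3.
    """
    integer_part, phase_index = divmod(mv_quarter_pel, 4)
    return integer_part, phase_index

def reference_address(mb_x, mb_y, mv_x, mv_y):
    """Top-left integer pixel of the reference block plus the (H, V) phase indices."""
    dx, phase_h = split_motion_vector(mv_x)
    dy, phase_v = split_motion_vector(mv_y)
    return (mb_x + dx, mb_y + dy), (phase_h, phase_v)

print(reference_address(16, 32, 9, -3))
```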
[0113]
The address generation unit 21 also sends the input motion vector information and mode information to the filter coefficient storage memory 23.
[0114]
The filter coefficient storage memory 23 stores a plurality of two-dimensional filter coefficients corresponding to the field motion prediction mode. FIG. 5 shows the 16 types of two-dimensional filter coefficients when a linear filter is used. The filter coefficients shown in FIG. 5 exist in a number equal to the combinations of the phases 0, 0.25, 0.5, and 0.75 in the vertical direction (V) and the phases 0, 0.25, 0.5, and 0.75 in the horizontal direction (H). That is, a total of 16 matrix coefficients corresponding to the field motion prediction mode, 4 in the vertical direction × 4 in the horizontal direction, are stored.
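The 16 combinations can be sketched as follows. This assumes that each two-dimensional coefficient matrix is the outer product of a two-tap vertical linear filter and a two-tap horizontal linear filter (a bilinear kernel). The actual values of FIG. 5 are not reproduced in this text, so the matrices below are illustrative only, and the names `bilinear_kernel` and `FIELD_KERNELS` are hypothetical.

```python
from fractions import Fraction
from itertools import product

PHASES = [Fraction(k, 4) for k in range(4)]  # 0, 0.25, 0.5, 0.75

def bilinear_kernel(v, h):
    """2x2 kernel: rows weight the upper/lower line, columns the left/right pixel."""
    wv = (1 - v, v)
    wh = (1 - h, h)
    return tuple(tuple(a * b for b in wh) for a in wv)

# One matrix per (vertical phase, horizontal phase) combination.
FIELD_KERNELS = {(v, h): bilinear_kernel(v, h) for v, h in product(PHASES, PHASES)}

print(len(FIELD_KERNELS))
for row in FIELD_KERNELS[(Fraction(1, 4), Fraction(1, 2))]:
    print([str(c) for c in row])
```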
[0115]
The filter coefficient storage memory 23 also stores a plurality of two-dimensional filter coefficients corresponding to the frame motion prediction mode. In the frame motion prediction mode, there would normally be a total of 32 filter coefficients, that is, 8 phases in the vertical direction (V) (0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, and 1.75) × 4 phases in the horizontal direction (H) (0, 0.25, 0.5, and 0.75). In this filter coefficient storage memory 23, however, filter coefficients having the same matrix are shared in order to reduce the required memory capacity.
[0116]
Specifically, the filter coefficients corresponding to the frame motion prediction mode are shared, and thereby reduced in number, as follows.
[0117]
As shown in FIG. 6, among the filter coefficients corresponding to the frame motion prediction mode, the top field with a vertical phase of 0, the bottom field with a vertical phase of 0, the top field with a vertical phase of 0.5, and the bottom field with a vertical phase of 1.5 all have the same filter coefficient (group 1). The top field with a vertical phase of 0.25 and the bottom field with a vertical phase of 1.75 have the same filter coefficient (group 2). The bottom field with a vertical phase of 0.5, the top field with a vertical phase of 1, the bottom field with a vertical phase of 1, and the top field with a vertical phase of 1.5 all have the same filter coefficient (group 3). The bottom field with a vertical phase of 0.25, the top field with a vertical phase of 0.75, the bottom field with a vertical phase of 1.25, and the top field with a vertical phase of 1.75 all have the same filter coefficient (group 4). The bottom field with a vertical phase of 0.75 and the top field with a vertical phase of 1.25 have the same filter coefficient (group 5). Filter coefficients that are the same are thus grouped and shared for pixel interpolation.
[0118]
By sharing the filter coefficients in this way, the original 8 × 4 = 32 filter coefficients are reduced to 5 × 4 = 20 filter coefficients.
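The grouping can be written out as a lookup table to check the counts. The group membership below is transcribed from the description of FIG. 6 above. The table layout, the names `GROUPS`, `GROUP_OF`, and `coefficient_key`, and the idea of indexing the stored matrices by (group, horizontal phase) are assumptions made for illustration.

```python
# (field, vertical phase) pairs belonging to each group, as listed for FIG. 6.
GROUPS = {
    1: [("top", 0.0), ("bottom", 0.0), ("top", 0.5), ("bottom", 1.5)],
    2: [("top", 0.25), ("bottom", 1.75)],
    3: [("bottom", 0.5), ("top", 1.0), ("bottom", 1.0), ("top", 1.5)],
    4: [("bottom", 0.25), ("top", 0.75), ("bottom", 1.25), ("top", 1.75)],
    5: [("bottom", 0.75), ("top", 1.25)],
}
GROUP_OF = {key: group for group, keys in GROUPS.items() for key in keys}

H_PHASES = (0.0, 0.25, 0.5, 0.75)
v_phases = sorted({phase for _, phase in GROUP_OF})

unshared = len(v_phases) * len(H_PHASES)  # 8 vertical phases x 4 horizontal phases
shared = len(GROUPS) * len(H_PHASES)      # 5 groups x 4 horizontal phases

def coefficient_key(field, v_phase, h_phase):
    """Hypothetical key into a table holding one matrix per (group, h_phase)."""
    return GROUP_OF[(field, v_phase)], h_phase

print(unshared, shared, coefficient_key("bottom", 1.75, 0.5))
```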
[0119]
FIG. 7 shows the two-dimensional filter coefficients for the frame motion prediction mode when a linear filter is used. The filter coefficients shown in FIG. 7 exist in a number equal to the combinations of the five groups 1 to 5 shown in FIG. 6 and the horizontal (H) phases 0, 0.25, 0.5, and 0.75, that is, 5 groups × 4 in the horizontal direction = 20.
[0120]
In accordance with the motion vector information and mode information sent to it, the filter coefficient storage memory 23 selects one filter coefficient from the 16 filter coefficients shown in FIG. 5 or the 20 filter coefficients shown in FIG. 7, and sends it to the two-dimensional interpolation processing unit 24.
[0121]
In the case of the field motion prediction mode, the two-dimensional interpolation processing unit 24 performs an inner product operation of the following expression 1 using the transmitted filter coefficient, and performs pixel interpolation on the macroblock.
[0122]
[Expression 1]

[0123]
In Equation 1, c is the filter coefficient (two-dimensional matrix) shown in FIG. x is the pixel data of the input macroblock.
[0124]
Then, the result (y) obtained by performing the inner product calculation according to Equation 1 is supplied to the adder 16 shown in FIG. 1 as pixel data that has been subjected to motion compensation with ¼ pixel accuracy.
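Expression 1 is not reproduced in this text. As a rough sketch only, assuming it is the usual two-dimensional inner product in which each output pixel y is the sum of coefficient c times pixel x over the filter window, the calculation could look like the following. The function name `interpolate_block`, the 2 × 2 window size, and all values are hypothetical.

```python
def interpolate_block(x, c):
    """Slide the 2-D kernel c over the pixel block x.

    Each output sample is the inner product of c with the window of x
    whose top-left corner is at that position.
    """
    kh, kw = len(c), len(c[0])
    out = []
    for r in range(len(x) - kh + 1):
        row = []
        for col in range(len(x[0]) - kw + 1):
            acc = sum(c[i][j] * x[r + i][col + j]
                      for i in range(kh) for j in range(kw))
            row.append(acc)
        out.append(row)
    return out

# Illustrative input: a plane with value 4*column + 8*row.
x = [[0, 4, 8],
     [8, 12, 16],
     [16, 20, 24]]
# Assumed bilinear kernel for vertical phase 0.25 and horizontal phase 0.5.
c = [[0.375, 0.375],
     [0.125, 0.125]]
y = interpolate_block(x, c)
print(y)
```

Because the input is a plane, each output equals the plane evaluated 0.25 lines down and 0.5 pixels to the right of the window origin, which gives a quick sanity check on the kernel.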
[0125]
Further, in the case of the frame motion prediction mode, the two-dimensional interpolation processing unit 24 performs inner product calculation of the following Expression 2 using the sent filter coefficient, and performs pixel interpolation on the macroblock.
[0126]
[Expression 2]

[0127]
Here, the filter coefficient included in the group 1 performs an operation of outputting data that matches the sample point of the input data. Group 2 performs an operation of outputting data interpolated between the same lines in different fields. Group 3 performs an operation for outputting data interpolated between two lines in the same field. Group 4 performs an operation for outputting data interpolated between two lines of a certain field and data of one line of another field. Group 5 performs an operation of outputting data interpolated between data interpolated between two lines of a certain field and data interpolated between two lines of another field.
[0128]
In the frame motion prediction mode, the inner product could also be computed using Equation 1, as in the field motion prediction mode. However, by performing the operation shown in Equation 2 above, the number of multiplications can be reduced through grouping according to the properties of the coefficients and the arrangement of zero coefficients. In the inner product calculation shown in Equation 2, the lines required in the vertical direction are added together in advance, and multiplication is performed only in the horizontal direction.
[0129]
As a result, since two lines have zero coefficients in group 2 and group 3, only the addition of two lines in the vertical direction needs to be performed. Groups 2 and 3 differ in the lines to be calculated, but the same filter coefficient can be used for both by having the address generation unit 21 designate the lines to be calculated. In group 4, the coefficients of the first line and the third line are half those of the second line; using this symmetry, an operation of 2 × c × x can be decomposed and calculated as c × (x + x). In other words, the result of adding the coefficients in the vertical direction is the same in group 4 and group 5. By decomposing the operation in this way, the same filter coefficient can be used in group 4 and group 5.
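The saving from adding lines before multiplying can be illustrated with a small sketch. It assumes a three-line kernel whose rows are c, 2c, and c, as described for group 4, and compares a direct inner product with a version that first adds the lines vertically (adding the middle line twice) and then multiplies only along the horizontal direction. The coefficient values and function names are hypothetical and are not the values of FIG. 7.

```python
def weighted_sum_direct(lines, kernel):
    """Reference: multiply every pixel by its own 2-D coefficient."""
    acc, mults = 0.0, 0
    for line, coeff_row in zip(lines, kernel):
        for pixel, coeff in zip(line, coeff_row):
            acc += coeff * pixel
            mults += 1
    return acc, mults

def weighted_sum_preadded(lines, repeats, base_row):
    """Add the lines vertically first, then multiply the column sums by one
    shared horizontal coefficient row. A line with repeat 2 is added twice,
    which realises 2*c*x as c*(x + x)."""
    col_sums = [0] * len(base_row)
    for line, n in zip(lines, repeats):
        for _ in range(n):
            col_sums = [s + p for s, p in zip(col_sums, line)]
    acc, mults = 0.0, 0
    for s, coeff in zip(col_sums, base_row):
        acc += coeff * s
        mults += 1
    return acc, mults

base = [0.125, 0.125]                         # shared horizontal row c
kernel = [base, [2 * c for c in base], base]  # rows c, 2c, c
lines = [[8, 16], [24, 32], [40, 48]]         # illustrative pixel lines

direct = weighted_sum_direct(lines, kernel)
fast = weighted_sum_preadded(lines, [1, 2, 1], base)
print(direct, fast)
```

Both versions return the same value, while the pre-added version performs one multiplication per column instead of one per pixel.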
[0130]
As described above, the image decoding apparatus 10 according to the embodiment of the present invention performs motion compensation with 1/4 pixel accuracy in the horizontal and vertical directions, so that no phase shift occurs between the top field and the bottom field. So-called field inversion and field mix are thereby prevented, and image quality deterioration due to motion compensation can be prevented.
[0131]
Furthermore, since the image decoding apparatus 10 performs a two-dimensional filter operation at the time of motion compensation with 1/4 pixel accuracy, it is possible to reduce the memory for storing intermediate results. Further, in the image decoding apparatus 10, the access amount to the memory can be reduced at the time of motion compensation with 1/4 pixel accuracy, and the processing time is shortened. Further, in this image decoding apparatus 10, by grouping the filter coefficients in the frame motion prediction mode, it is possible to minimize the code size at the time of frame prediction and prevent a cache miss or the like.
[0132]
In the image decoding apparatus 10 according to the embodiment of the present invention, an example in which motion compensation is performed using a two-dimensional linear interpolation filter has been described. However, other filters, for example a half-band filter with an increased number of filter taps, may also be used.
[0133]
[Effects of the invention]
In the image decoding apparatus and the image decoding method according to the present invention, an inverse orthogonal transform is performed on the coefficients of all frequency components of an orthogonal transform block that has been orthogonally transformed in the frame orthogonal transform mode, and the resulting pixels are separated into two pixel blocks corresponding to interlaced scanning. The two separated pixel blocks are each orthogonally transformed, an inverse orthogonal transform is performed on their low frequency component coefficients, and the two inverse orthogonally transformed pixel blocks are combined. In addition, in the present invention, each pixel of a macroblock of the stored reference image data is interpolated to generate a macroblock composed of pixels with 1/4 pixel accuracy. The image decoding apparatus and the image decoding method then output moving image data having a second resolution lower than the first resolution.
[0134]
As a result, the present invention can reduce the amount of computation and the storage capacity required for decoding, eliminate the phase shift of pixels during motion compensation in both the field motion prediction mode and the frame motion prediction mode, and prevent the deterioration of image quality caused by motion compensation.
[0135]
Further, in the present invention, when pixel interpolation with 1/4 pixel accuracy is performed, the interpolation in the vertical and horizontal directions is performed collectively by a two-dimensional filter. Among the plurality of two-dimensional filters used in the frame motion prediction mode, filters having the same matrix are shared. In addition, the processing is simplified by grouping that exploits the symmetry of the coefficients in the vertical direction and the zero coefficients.
[0136]
As a result, according to the present invention, it is possible to reduce the memory for storing the intermediate result in the motion compensation with ¼ pixel accuracy. Further, according to the present invention, the amount of access to the memory can be reduced at the time of motion compensation with 1/4 pixel accuracy, and the processing time is shortened. Further, according to the present invention, the code size at the time of frame prediction can be minimized and a cache miss or the like can be prevented.
[Brief description of the drawings]
FIG. 1 is a block diagram of an image decoding apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining pixel interpolation with ¼ pixel accuracy in the case of a field motion prediction mode.
FIG. 3 is a diagram for explaining pixel interpolation with ¼ pixel accuracy in the case of a frame motion prediction mode.
FIG. 4 is a block diagram of a motion compensation device of the image decoding device.
FIG. 5 is a diagram illustrating an example of a two-dimensional filter coefficient corresponding to a field motion prediction mode.
FIG. 6 is a diagram for explaining grouping of two-dimensional filter coefficients corresponding to a frame motion prediction mode.
FIG. 7 is a diagram illustrating an example of a two-dimensional filter coefficient corresponding to a frame motion prediction mode.
FIG. 8 is a block diagram showing a conventional first down decoder.
FIG. 9 is a block diagram showing a conventional second down decoder.
FIG. 10 is a block diagram showing a conventional third down decoder.
FIG. 11 is a block diagram of a conventional image decoding device.
FIG. 12 is a diagram for explaining a reduced inverse discrete cosine transform process in the field DCT mode of the conventional image decoding apparatus.
FIG. 13 is a diagram for explaining a reduced inverse discrete cosine transform process in a frame DCT mode of the conventional image decoding apparatus.
FIG. 14 is a diagram for describing linear interpolation processing in a field motion prediction mode of the conventional image decoding apparatus.
FIG. 15 is a diagram for describing linear interpolation processing in a frame motion prediction mode of the conventional image decoding apparatus.
FIG. 16 is a diagram for explaining a phase of a pixel obtained as a result of a field DCT mode of the conventional image decoding device.
FIG. 17 is a diagram for explaining a phase of a pixel obtained as a result of a frame DCT mode of the conventional image decoding device.
FIG. 18 is a block diagram of an image decoding apparatus proposed in Japanese Patent Application No. 10-208385.
FIG. 19 is a diagram for explaining the phase of a pixel in the vertical direction of a reference image stored in the frame memory of the image decoding apparatus proposed in Japanese Patent Application No. 10-208385.
FIG. 20 is a diagram for explaining the contents of one block processing of the frame mode reduced inverse discrete cosine transform device of the image decoding device proposed in Japanese Patent Application No. 10-208385.
FIG. 21 is a diagram for explaining a ¼ pixel interpolation process in the field motion prediction mode of the image decoding apparatus proposed in Japanese Patent Application No. 10-208385.
FIG. 22 is a diagram for explaining a ¼ pixel interpolation process in the frame motion prediction mode of the image decoding apparatus proposed in Japanese Patent Application No. 10-208385.
FIG. 23 is a partial look diagram of the motion compensation device of the image decoding device proposed in Japanese Patent Application No. 10-208385.
[Explanation of symbols]
10 image decoding device, 14 reduced inverse discrete cosine transform device, 15 frame mode reduced inverse discrete cosine transform device, 17 frame memory, 18 motion compensation device, 21 address generation unit, 22 input memory, 23 filter coefficient storage memory, 24 two-dimensional interpolation processing unit

Claims

In an image decoding apparatus for decoding moving image data having a second resolution lower than a first resolution from compressed image data of the first resolution that has been predictively coded by performing motion prediction in units of predetermined pixel blocks (macroblocks) and compression coded by performing orthogonal transform in units of predetermined pixel blocks (orthogonal transform blocks),
First inverse orthogonal transform means for performing inverse orthogonal transform on the orthogonal transform block of the compressed image data that has been orthogonally transformed by an orthogonal transform method (field orthogonal transform mode) corresponding to interlaced scanning;
A second inverse orthogonal transform unit that performs inverse orthogonal transform on the orthogonal transform block of the compressed image data that has been orthogonally transformed by the orthogonal transform method (frame orthogonal transform mode) corresponding to progressive scanning;
Adding means for adding the compressed image data subjected to inverse orthogonal transformation by the first inverse orthogonal transform means or the second inverse orthogonal transform means and the reference image data subjected to motion compensation, and outputting moving image data having the second resolution;
Storage means for storing moving image data output from the adding means as reference image data;
Motion compensation means for performing motion compensation with 1/4 pixel accuracy in the vertical and horizontal directions of the macroblock of the reference image data stored in the storage means,
The first inverse orthogonal transform means performs an inverse orthogonal transform on a coefficient of a low frequency component among the coefficients of the orthogonal transform block,
The second inverse orthogonal transform means performs an inverse orthogonal transform on the coefficients of all frequency components of the orthogonal transform block, separates the pixels of the orthogonal transform block subjected to the inverse orthogonal transform into two pixel blocks corresponding to interlaced scanning, orthogonally transforms each of the two separated pixel blocks, performs an inverse orthogonal transform on the low frequency component coefficients among the coefficients of the two orthogonally transformed pixel blocks, and combines the two pixel blocks subjected to the inverse orthogonal transform to generate an orthogonal transform block,
The motion compensation means has a filter storage unit that stores a two-dimensional filter coefficient group for fields, used to perform pixel interpolation with 1/4 pixel accuracy in the vertical direction and the horizontal direction on a macroblock of the reference image data subjected to motion prediction by a motion prediction method corresponding to interlaced scanning (field motion prediction mode), and a two-dimensional filter coefficient group for frames, used to perform pixel interpolation with 1/4 pixel accuracy in the vertical direction and the horizontal direction on a macroblock of the reference image data subjected to motion prediction by a motion prediction method corresponding to progressive scanning (frame motion prediction mode); for the compressed image data subjected to inverse orthogonal transformation by the first inverse orthogonal transform means, the motion compensation means specifies, based on the motion vector, a two-dimensional filter coefficient group for fields stored in the filter storage unit, and interpolates the macroblock of the reference image data stored in the storage means using the specified two-dimensional filter coefficient group for fields; and for the compressed image data subjected to inverse orthogonal transformation by the second inverse orthogonal transform means, the motion compensation means specifies, based on the motion vector, a two-dimensional filter coefficient group for frames stored in the filter storage unit, and interpolates the macroblock of the reference image data stored in the storage means using the specified two-dimensional filter coefficient group for frames. An image decoding apparatus characterized by the above.

In an image decoding method for decoding moving image data having a second resolution lower than a first resolution from compressed image data of the first resolution that has been predictively coded by performing motion prediction in units of predetermined pixel blocks (macroblocks) and compression coded by performing orthogonal transform in units of predetermined pixel blocks (orthogonal transform blocks),
A first inverse orthogonal transform step for performing an inverse orthogonal transform on the orthogonal transform block of the compressed image data subjected to the orthogonal transform by the orthogonal transform method (field orthogonal transform mode) corresponding to interlaced scanning;
A second inverse orthogonal transform step for performing an inverse orthogonal transform on the orthogonal transform block of the compressed image data subjected to the orthogonal transform by the orthogonal transform method (frame orthogonal transform mode) corresponding to the progressive scanning;
An adding step of adding the compressed image data subjected to inverse orthogonal transformation in the first inverse orthogonal transform step or the second inverse orthogonal transform step and the reference image data subjected to motion compensation, and outputting moving image data having the second resolution;
A storage step of storing the moving image data output in the addition step as reference image data;
A motion compensation step of performing motion compensation with 1/4 pixel accuracy in the vertical and horizontal directions of the macroblock of the reference image data stored in the storage step,
In the first inverse orthogonal transform step, an inverse orthogonal transform is performed on a coefficient of a low frequency component among the coefficients of the orthogonal transform block,
In the second inverse orthogonal transform step, an inverse orthogonal transform is performed on the coefficients of all frequency components of the orthogonal transform block, the pixels of the orthogonal transform block subjected to the inverse orthogonal transform are separated into two pixel blocks corresponding to interlaced scanning, each of the two separated pixel blocks is orthogonally transformed, an inverse orthogonal transform is performed on the low frequency component coefficients among the coefficients of the two orthogonally transformed pixel blocks, and the two pixel blocks subjected to the inverse orthogonal transform are combined to generate an orthogonal transform block,
In the motion compensation step, a filter storage unit is used that stores a two-dimensional filter coefficient group for fields, used to perform pixel interpolation with 1/4 pixel accuracy in the vertical direction and the horizontal direction on a macroblock of the reference image data subjected to motion prediction by a motion prediction method corresponding to interlaced scanning (field motion prediction mode), and a two-dimensional filter coefficient group for frames, used to perform pixel interpolation with 1/4 pixel accuracy in the vertical direction and the horizontal direction on a macroblock of the reference image data subjected to motion prediction by a motion prediction method corresponding to progressive scanning (frame motion prediction mode); for the compressed image data subjected to inverse orthogonal transformation in the first inverse orthogonal transform step, a two-dimensional filter coefficient group for fields stored in the filter storage unit is specified based on the motion vector, and the macroblock of the stored reference image data is interpolated using the specified two-dimensional filter coefficient group for fields; and for the compressed image data subjected to inverse orthogonal transformation in the second inverse orthogonal transform step, a two-dimensional filter coefficient group for frames stored in the filter storage unit is specified based on the motion vector, and the macroblock of the stored reference image data is interpolated using the specified two-dimensional filter coefficient group for frames. An image decoding method characterized by the above.