JP3711572B2 - Image coding apparatus and method

Info

Publication number: JP3711572B2
Application number: JP23811194A
Authority: JP
Inventors: 聡三橋
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1994-09-30
Filing date: 1994-09-30
Publication date: 2005-11-02
Anticipated expiration: 2020-11-02
Also published as: JPH08102951A

Description

【０００１】
【産業上の利用分野】
本発明は、例えば画像を圧縮符号化する場合に用いて好適な画像符号化装置及び方法に関する。
【０００２】
【従来の技術】
従来の例えば画像を圧縮符号化する場合に用いて好適な画像符号化装置の構成例を図７に示す。
この図７の画像符号化装置において、入力端子１には、図８に示すように、
輝度成分(Y) 352(H)×240(V)×30フレーム
クロマ成分(Cb) 174(H)×120(V)×30フレーム
クロマ成分(Cr) 174(H)×120(V)×30フレーム
のピクセル数にディジタル化された画像データが供給される。
【０００３】
上記入力端子１に供給された入力画像データは、当該入力画像データを一時的に蓄えて然るべき順番に入れ替えるためのフレームメモリ１０を介して、動き検出器２０とブロック分割器１１に送られる。
当該ブロック分割器１１は、フレームメモリ１０から供給されたそれぞれのフレームを、図９に示すように、輝度成分(Y) ，クロマ成分(Cr),(Cb) それぞれを８×８ピクセルのブロックに分割する。なお、輝度成分(Y) の４つのブロック(Y0,Y1,Y2,Y3）と１つのクロマ成分(Cb)のブロックと、１つのクロマ成分(Cr)のブロックからなる合計６つのブロック(Y0,Y1,Y2,Y3,Cb,Cr) は、マクロブロック(MB)と呼ばれている。
【０００４】
このブロック分割器１１からのマクロブロック単位のデータは差分器１２に送られる。
この差分器１２では、ブロック分割器１１からのデータと後述するフレーム間予測画像データとの差分をとり、その出力を後述するフレーム間予測符号化がなされるフレームのデータとして切換スイッチ１３の被切換端子ｂに送る。また、当該切換スイッチ１３の被切換端子ａには、上記ブロック分割器１１からのデータが後述するフレーム内符号化がなされるフレームのデータとして供給される。
【０００５】
上記切換スイッチ１３を介したブロック単位のデータはＤＣＴ回路１４によって離散コサイン変換（ＤＣＴ）処理され、そのＤＣＴ係数が量子化器１５に送られる。当該量子化器１５では、所定の量子化ステップ幅で上記ＤＣＴ出力を量子化し、この量子化した係数がジグザグスキャン回路１６に送られる。
当該ジグザグスキャン回路１６では、上記量子化係数を図１０に示すようにいわゆるジグザグスキャンによって並べ換え、その出力を可変長符号化回路１７に送る。この可変長符号化回路１７では、上記ジグザグスキャン回路１６の出力データを可変長符号化（ＶＬＣ）し、その出力を出力バッファ１８に送ると共に、当該可変長符号化処理により発生した符号量を示す情報を、量子化ステップ制御器１９に送る。量子化ステップ制御器１９は、可変長符号化回路１７からの符号量を示す情報に基づいて量子化器１５の量子化ステップ幅を制御する。また、上記出力バッファ１８から出力されたデータは圧縮符号化がなされた符号化出力として出力端子２から出力される。
【０００６】
また、上記量子化器１５からの出力は、逆量子化器２７によって逆量子化され、さらに逆ＤＣＴ回路２６によって逆ＤＣＴ処理される。当該逆ＤＣＴ回路２６の出力は、加算器２５に送られる。
この加算器２５には、フレーム間予測符号化のフレームのときにオンとなる切換スイッチ２４を介した動き補償器２１からのフレーム間予測画像データも供給され、当該データと上記逆ＤＣＴ回路２６の出力データとの加算が行われる。この加算器２５の出力データは、フレームメモリ２２に一時的に蓄えられた後、動き補償器２１に送られる。
【０００７】
当該動き補償器２１は、上記動き検出器２０によって検出された動きベクトルに基づいて動き補償を行い、これによって得たフレーム間予測画像データを出力する。
以下、上記図７の従来の画像符号化装置の具体的な動作について詳細に説明する。ここで、説明のために以下のように各フレームの呼び名を定義する。
【０００８】
先ず、表示順にフレームを並べたとき、それぞれを
Ｉ０，Ｂ１，Ｂ２，Ｐ３，Ｂ４，Ｂ５，Ｐ６，Ｂ７，Ｂ８，Ｉ９，Ｂ１０，Ｂ１１，Ｂ１２，・・・・・
と呼ぶこととする。これらのフレームのうち、Ｉ，Ｐ，Ｂは、後に説明するが、圧縮方法の種類を示し、これらＩ，Ｐ，Ｂの次の数字は、単純に表示順を示している。
【０００９】
カラー動画像符号化方式の国際標準化作業グループであるいわゆるＭＰＥＧ（Moving Picture Expert Group)のうちＭＰＥＧ１では、この様な画像を圧縮するために、以下のようにすることが規定されている。
先ず、Ｉ０の画像を圧縮する。
次に、Ｐ３の画像を圧縮するのだが、Ｐ３そのものを圧縮するのではなく、Ｐ３とＩ０の画像との差分データを圧縮する。
【００１０】
その次に、Ｂ１の画像を圧縮するのだが、Ｂ１そのものを圧縮するのではなく、Ｂ１とＩ０或いは、Ｂ１とＰ３との差分データ或いはＩ０とＰ３の平均値との差分（いずれか情報の少ない方）を圧縮する。
その次に、Ｂ２の画像を圧縮するのだが、Ｂ２そのものを圧縮するのではなく、Ｂ２とＩ０或いは、Ｂ２とＰ３との差分データ或いはＩ０とＰ３の平均値との差分（どちらか情報の少ない方を選んで）を圧縮する。
【００１１】
次に、Ｐ６の画像を圧縮するのだが、Ｐ６そのものを圧縮するのではなく、Ｐ６とＰ３の画像との差分データを圧縮する。
上述したような処理を順番に並べて表すと、

となる。このようにエンコード順は、
Ｉ０，Ｐ３，Ｂ１，Ｂ２，Ｐ６，Ｂ４，Ｂ５，Ｐ９，Ｂ７，Ｂ８，Ｉ９，Ｐ１２，Ｂ１０，Ｂ１１，・・・・
のように、表示順とは順番が入れ替わる。圧縮後のデータ（符号化データ）はこの順番に並ぶことになる。
【００１２】
以下、上述したことを図７の構成の動作と共にさらに詳しく述べる。
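To encode the first picture (I0), the data of the first picture to be compressed is read from the frame memory 10 and divided into blocks by the block divider 11. The block divider 11 outputs the data block by block in the order Y0, Y1, Y2, Y3, Cb, Cr, and the data is sent to the DCT circuit 14 via the changeover switch 13, which is set to the switched terminal a. The DCT circuit 14 applies a two-dimensional (vertical and horizontal) discrete cosine transform to each block, converting the data from the spatial domain to the frequency domain.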
【００１３】
このＤＣＴ回路１４からのＤＣＴ係数は、量子化器１５に送られ、当該量子化器１５で所定の量子化ステップ幅で量子化される。その後、ジグザグスキャン回路１６によって図１０のようにジグザグ順に並べ変えられる。このようにジグザグ順に並べると、後ろへ行くほど、その係数は周波数成分の高い係数となるから、一般的に係数の値は後ろの方が小さくなる傾向にある。したがって、ある値Ｓで量子化すると、後ろへ行くほど、その結果は０になる頻度が増し、結果的に高域の成分が切り落とされることになる。
【００１４】
その後、この量子化後の係数は、可変長符号化（ＶＬＣ）回路１７へ送られ、ここでいわゆるハフマンコーディングが施される。この結果得られる圧縮されたビットストリームは、出力バッファ１８に一旦蓄えられた後、一定のビットレートで送出される。当該出力バッファ１８は、不規則に発生するビットストリームを一定のビットレートで送出できるようにするための緩衝のためのメモリである。
【００１５】
以上の様に１枚の画像だけ単独で圧縮することをフレーム内（イントラ：Intra ）符号化と言い、この画像をＩピクチャと呼ぶ。
したがって、デコーダが上記のＩピクチャのビットストリームを受信した場合は、以上に述べたことを逆にたどり、１枚目の画像を完成させる。
次に、２枚目の画像（すなわちＰ３）のエンコードでは、以下のようになされる。
【００１６】
すなわち、この２枚目以降もＩピクチャとして圧縮してビットストリームを作っても良いが圧縮率を上げるには、連続する画像の内容には相関があることを利用して、以下の様な方法で圧縮する。
先ず、動き検出器２０では、２枚目の画像を構成するマクロブロック毎に、１枚目の画像（Ｉ０）の中からそれに良く似たパターンを捜し出し、それを動きベクトルという（ｘ，ｙ）の相対位置の座標として表現する。
【００１７】
また、２枚目の画像ではそれぞれのブロックを、上記Ｉピクチャの場合のようにそのままＤＣＴ回路１４に送るのではなく、そのブロック毎の動きベクトルに従って一枚目の画像から引っ張ってきたブロックとの差分のデータ（差分器１２による差分データ）を、ＤＣＴ回路１４へ送るようにする。なお、動きベクトルの検出方法としては、ＩＳＯ／ＩＥＣ 11172-2 annex D.6.2 に詳細に述べられているためここでは省略する。
【００１８】
ここで、例えば上記動きベクトルによって示された一枚目の画像のパターンと、これから圧縮しようとするブロックのパターンとの間で、相関が非常に強くなっていれば、その差分データは非常に小さくなり、したがって、上記フレーム内（イントラ）符号化で圧縮するよりも、上記動きベクトルと上記差分データとを符号化した方が、圧縮後のデータ量は小さくなる。
【００１９】
このような圧縮方法を、フレーム間（インター：Inter)予測符号化と呼んでいる。ただし、常に差分データが少なくなるわけではなく、絵柄（画像内容）によっては、差分を取るよりも、上記フレーム内符号化で圧縮した方が、圧縮率が上がる場合がある。このような場合は、上記フレーム内符号化で圧縮する。フレーム間予測符号化にするか、フレーム内符号化にするかは、マクロブロック毎に異なる。
【００２０】
以上のことを図７の画像符号化装置（エンコーダ）に即して説明すると、先ず、フレーム間予測符号化を行うためには、エンコーダ側でたえずデコーダ側で作られる画像と同じ画像を作って置く必要がある。
そのためにエンコーダ内には、デコーダと同じ回路が存在する。その回路をローカルデコーダ（局部復号器）と呼ぶ。図７の逆量子化器２７と逆ＤＣＴ回路２６と加算器２５とフレームメモリ２２と動き補償器２１が当該ローカルデコーダに対応し、フレームメモリ２２内に記憶される画像のことをローカルデコーデッドピクチャ（Local decoded picture)又はローカルデコーデッドデータ(Local decoded data)と呼ぶ。これに対して、圧縮前の画像のデータは、オリジナルピクチャ(Original picture)又はオリジナルデータ(Original data) と呼ぶ。
【００２１】
なお、前述した１枚目のＩピクチャの圧縮時にも、上記ローカルデコーダを通して復号化された１枚目の画像が、上記フレームメモリ２２内に格納される。ここで、注意すべきことは、このローカルデコーダによって得られる画像は、圧縮前の画像ではなく、圧縮後復元した画像であり、圧縮による画質劣化のある、デコーダが復号化する画像とまったく同じ画像であるということである。
【００２２】
このような状態のエンコーダに２枚目の画像（Ｐ３）のデータ(Original data）が入ってくるわけだが（この段階ですでに、動きベクトルは検出済でなければならない）、データはブロック毎に動きベクトルを持ち、このベクトルが動き補償器（MC:Motion Compensation）２１に与えられる。当該動き補償回路２１は、その動きベクトルの示すローカルデコーデッドピクチャ上のデータ（動き補償データ：MC data:１マクロブロック）を上記フレーム間予測画像データとして出力する。
【００２３】
上記２枚目のオリジナルデータとこの動き補償データ（フレーム間予測画像データ）のピクセル毎の、差分器１２による差分データが、上記ＤＣＴ回路１４に入力される。それからの後の圧縮方法は、基本的にＩピクチャと同じである。上述のような圧縮方法によって圧縮する画像をＰピクチャ（Predicted picture)と呼ぶ。
【００２４】
さらに詳しく説明すると、Ｐピクチャにおいてすべてのマクロブロックがフレーム間予測符号化で圧縮するとは限らず、フレーム内符号化で圧縮する方が効率が良いと判断されるときは、そのマクロブロックは当該フレーム内符号化で符号化を行う。
すなわち、Ｐピクチャにおいても、マクロブロック毎に、フレーム内符号化によるか（このマクロブロックをイントラマクロブロックと呼ぶ）、又はフレーム間予測符号化によるか（このマクロブロックをインターマクロブロックと呼ぶ）のどちらかを選択して圧縮を行う。
【００２５】
上述のように、上記ローカルデコーダでは、量子化器１５の出力が、逆量子化器２７で逆量子化され、さらに逆ＤＣＴ回路２６で逆ＤＣＴ処理された後、エンコード時に動き補償データ（MC data ）と足され最終的なローカルデコーデッドピクチャとなる。
次に、３枚目の画像（すなわちＢ１）のエンコードでは、以下のようになされる。
【００２６】
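To encode the third picture (B1), motion vectors are searched for against each of the two pictures I0 and P3. The motion vector relative to I0 is called the forward vector MVf(x,y), and the motion vector relative to P3 is called the backward vector MVb(x,y).
Difference data is compressed for this third picture as well; the question is which difference to compress. Here too, the difference that yields the least information should be taken. The available compression methods are the following four:
(1) the difference from the data on I0 indicated by the forward vector MVf(x,y);
(2) the difference from the data on P3 indicated by the backward vector MVb(x,y);
(3) the difference from the average of the data on I0 indicated by the forward vector MVf(x,y) and the data on P3 indicated by the backward vector MVb(x,y);
(4) no difference data is used (intra-frame coding).
One of these four methods is selected for each macroblock. For options (1), (2), and (3), the corresponding motion vectors are also sent to the motion compensator 21, the differencer 12 takes the difference from the resulting motion-compensated data, and that difference is sent to the DCT circuit 14. For option (4), the data is sent to the DCT circuit 14 unchanged.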
【００２７】
上述した１枚目、２枚目のエンコードの処理の結果、ローカルデコーデッドピクチャを格納するフレームメモリ２２には、Ｉ０，Ｐ３の２枚のピクチャが、復元されているのでこのようなことが可能である。
次に、４枚目の画像（すなわちＢ２）のエンコードでは、以下のようになされる。
【００２８】
上記４枚目の画像（Ｂ２）のエンコードでは、上述した３枚目（Ｂ１）のエンコード方法のところの説明文で、Ｂ１をＢ２に置き換えたこと以外は、上記３枚目のエンコードと同じ方法で圧縮する。
次に、５枚目の画像（すなわちＰ６）のエンコードでは、以下のようになされる。
【００２９】
上記５枚目の画像（Ｐ６）のエンコードでは、上述した２枚目（Ｐ３）のエンコード方法のところの説明文で、Ｐ３をＰ６に、Ｉ０をＰ３に置き換えただけで、他は同じ説明となる。
６枚目以降は、上述の繰り返しとなるので説明は省略する。
また、ＭＰＥＧにおいては、ＧＯＰ（Group Of Picture）と呼ばれるものが規定されている。
【００３０】
すなわち、何枚かのピクチャの集まりがグループオブピクチャ（ＧＯＰ）と呼ばれており、当該ＧＯＰは符号化データ（圧縮後のデータ）上で見て連続した画像の集まりでなくてはならないものである。また、ＧＯＰはランダムアクセスを考慮したもので、そのためには符号化データ上で見てＧＯＰの最初に来るピクチャは上記Ｉピクチャである必要がある。さらに、表示順（ディスプレイ順）でＧＯＰの最後は、Ｉ又はＰピクチャでなくてはならない。
【００３１】
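FIG. 11 shows an example in which the first GOP consists of four pictures and each subsequent GOP consists of six pictures. Part A of FIG. 11 shows the display order, and part B shows the order of the coded data.
Looking at GOP2 in FIG. 11, B4 and B5 are formed from P3 and I6, so if I6 is reached by random access, for example, B4 and B5 cannot be decoded correctly because P3 is unavailable. A GOP that cannot be decoded correctly from its own contents alone is said not to be a closed GOP.
【００３２】
In contrast, if B4 and B5 referred only to I6, then P3 would not be needed even when I6 is reached by random access, and B4 and B5 could be decoded correctly. A GOP that can be decoded completely from information within the GOP alone is called a closed GOP.
Compression is performed with the most efficient of the methods described above, but the amount of coded data that results depends on the input picture and cannot be known until the picture is actually compressed.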
【００３３】
しかし、圧縮後のデータのビットレートを一定にするためにコントロールすることも必要である。当該コントロールを行うためのパラメータは、量子化器１５に与える前記符号量を表す情報としての量子化ステップ（又は量子化スケール：Q-scale ）である。同じ圧縮方法でも、上記量子化ステップを大きくすれば発生ビット量は減り、小さくすれば増える。
【００３４】
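The value of the quantization step is controlled as follows.
To deliver the compressed data at a constant bit rate, the encoder has a smoothing buffer (the output buffer 18) at its output, which absorbs a certain amount of variation in the data generated from picture to picture.
However, if data continues to be generated faster than the specified bit rate, the occupancy of the output buffer 18 rises until the buffer finally overflows. Conversely, if data continues to be generated more slowly than the bit rate, the occupancy of the output buffer 18 falls until the buffer finally underflows.
【００３５】
The encoder therefore feeds back the occupancy of the output buffer 18 so that the quantization step controller 19 controls the quantization step of the quantizer 15: when the occupancy of the output buffer 18 becomes low, the quantization step is made smaller so that less compression is applied, and when the occupancy becomes high, the quantization step is made larger to raise the compression ratio.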
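The conventional buffer-feedback control described above can be sketched as follows. This is only an illustration: the linear mapping from buffer occupancy to quantization step, the function name `feedback_q_step`, and the step range 1 to 31 are assumptions, not details given in the text.

```python
def feedback_q_step(occupancy_bits, buffer_size_bits, q_min=1, q_max=31):
    """Map output-buffer occupancy to a quantization step.

    Assumption: a simple linear mapping. A fuller buffer gives a larger
    step (more compression); an emptier buffer gives a smaller step.
    """
    fullness = occupancy_bits / buffer_size_bits
    fullness = min(max(fullness, 0.0), 1.0)  # guard against out-of-range input
    return round(q_min + fullness * (q_max - q_min))
```

Because the step depends only on what is already in the buffer, this kind of control cannot anticipate a change in the information content of upcoming pictures, which is the weakness the description goes on to discuss.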
【００３６】
また、前述した圧縮方法（前記フレーム内符号化やフレーム間予測符号化）によって発生する符号化データ量の範囲には、大きな差がある。
特にフレーム内符号化方式で圧縮をすると大量のデータが発生するため、出力バッファ１８の空き容量が小さい場合には量子化ステップサイズを大きくしなければならず、場合によっては量子化ステップサイズを最大にしてもバッファ１８のオーバーフローを招くかもしれない。よしんばバッファ１８に収まったとしても量子化ステップが大きければフレーム内符号化の画像は後のフレーム間予測符号化の画質に影響するので、フレーム内符号化での圧縮を行う前には出力バッファ１８に十分な空き容量が必要である。
【００３７】
したがって、予め定められた順序の圧縮方法を決めておき、フレーム内符号化の前には十分な出力バッファ１８の空き容量を確保するように、量子化ステップ制御器１９は量子化ステップサイズのフィードバックコントロールを行うようにしている。
以上のようにして一定レートの符号化データに抑えることが可能となる。
【００３８】
【発明が解決しようとする課題】
上述した従来の方法では、以下の理由により高画質を得られないことが欠点となっている。
すなわち、時々刻々情報量の変化する入力画像を一定のビットレートで平均的に高画質に圧縮するためには、出力バッファによって低ビットレートを維持できる範囲でかつ画質が均質になるように、情報量の多い画像（絵）には多めの圧縮データを許し、情報量の少ない画像には少なめの圧縮データにすることが必要だが、次のような場合に従来の方法ではそれができない。
【００３９】
例えば、情報量の少ない画像が連続し、そのあとで急に情報量の多い画像が入ってくる場合を考えると、先に供給される情報量の少ない画像に対しては量子化ステップをあまり小さくし過ぎず、その後に続く情報量が多い画像が符号化されるまで出力バッファの残量を低く保つべきであるのに、前述した出力バッファ残量をフィードバックする方式では、上記情報量が少ない画像が連続するうちに出力バッファの残量を増加させてしまうようになる。
【００４０】
逆に、情報量が多い画像の後に情報量の少ない画像が続く場合では、先に供給される情報量の多い画像を大きな量子化ステップで圧縮して出力バッファの残量を減らさなくても、その後に続くのは情報量の少ない画像なのでオーバーフローし難いはずであるが、上記出力バッファ残量フィードバック方式では、続く画像の情報量がわからないためバッファの残量を減らす方向、すなわち量子化ステップを大きくする方向に制御し、画質を低下させてしまう。
【００４１】
このようなことから、例えば、入力画像の情報量を評価し、この評価値に基づいて量子化ステップを制御するような構成も考えられる。
ところが、上記入力画像の情報量の評価値を求めるような機構を備えた画像符号化装置において、例えば１枚の入力画像を圧縮する際には、当該入力情報を圧縮した後に得られることになるデータに対して使用可能な割当量を、当該入力画像の情報量（難易度）に応じて配当することになるが、そのときの量子化器の量子化ステップを、当該割当量に応じて精度良く予測する必要がある。
【００４２】
ここで、もしも、上記予測した量子化ステップが適当でない場合には、上記圧縮後のデータに対して使用可能な割当量を大幅に割り込んだり、逆にオーバーしたりしてしまうことになる。このように割当量を大幅に割り込んだり、オーバーしたりすると、他のピクチャの圧縮の際の割当量に影響を与えてしまうことになる。
【００４３】
すなわち例えば、割当量が少なくなったフレームでは、量子化ステップが大きくなり、したがって画質が低下するようになる。このため、例えば連続的に見て均等な画質のフレームが続かなくなり、全体的に見ても画質が悪い印象になってしまう。また、上記予測が大幅にずれると、最悪の場合、バッファのアンダーフローやオーバーフローを招くことになる。
【００４４】
ここで、そのようにならないようにするために、例えば、１画面内で圧縮後の情報量と予定割当量と圧縮の進捗の画面内での割合で量子化ステップを制御していたとしても、基本の量子化ステップの予測が外れると画面内での量子化ステップの大きな変動が起こるようになる。このように画面内での量子化ステップの大きな変動が起こると、上記圧縮はラスタースキャン順になされるものなので、画面上で帯状に画質の不均等な部分が認識され、画質低下をもたらすようになる。
【００４５】
そこで、本発明は、上述のような実情に鑑みて提案されたものであり、効率の良い画像圧縮が可能で、全体的に画質を向上させることができる画像符号化装置及び方法を提供することを目的とするものである。
【００４６】
【課題を解決するための手段】
本発明の画像符号化装置は、上述した目的を達成するために提案されたものであり、入力画像データを複数枚蓄える画像データ蓄積手段と、上記画像データ蓄積手段に蓄積された複数枚の画像データから、当該入力画像データの情報量を評価するための画像自身の情報量を示す第１のパラメータ、画像の差分情報量を示す第２のパラメータ及び画像カウントのための画像情報を出力する画像情報評価手段と、上記画像データ蓄積手段に蓄積された複数枚の画像データの画像間の相関情報として、上記画像情報評価手段からの上記第２のパラメータを用いてシーンチェンジを検出する画像間相関検出手段と、画像データに直交変換処理を施し、直交変換係数を生成する直交変換手段と、上記直交変換手段により生成された直交変換係数を、所定の量子化ステップで量子化する量子化手段と、上記画像情報評価手段によって得られた上記画像情報と上記画像間相関検出手段からの画像間の相関情報であるシーンチェンジの検出出力とに基づいて、上記画像情報のカウント値によりフレーム内符号化を定期的に選択すると共に上記シーンチェンジの検出時にもフレーム内符号化を選択し、それら以外ではフレーム間予測符号化を選択する圧縮方法選択手段と、上記圧縮方法選択手段が選択した圧縮方法で１画面分の画像データを圧縮することにより得られる予定圧縮データ量とマクロブロックタイプに応じて上記第１、第２のパラメータのどちらかを加算し１画面分合計することにより求められた難易度とから、上記量子化手段における量子化の際の基本量子化ステップを予測する量子化ステップ制御手段とを有し、上記量子化ステップ制御手段は、上記１画面分の画像データの予定圧縮データ量をallocated_bit とし、上記難易度をdifficultyとし、上記基本量子化ステップをQ_scale とするとき、予め定められたパラメータＡ、Ｂを用いて、
Q_scale ＝ exp((log(allocated_bit/difficulty)-B)/A)
の式により基本量子化ステップQ_scale を求めることを特徴とするものである。
また、本発明の画像符号化方法は、入力画像データを複数枚蓄える画像データ蓄積手段に蓄積された複数枚の画像データから、当該入力画像データの情報量を評価するための画像自身の情報量を示す第１のパラメータ、画像の差分情報量を示す第２のパラメータ及び画像カウントのための画像情報を出力する画像情報評価工程と、上記画像データ蓄積手段に蓄積された複数枚の画像データの画像間の相関情報として、上記画像情報評価手段からの上記第２のパラメータを用いてシーンチェンジを検出する画像間相関検出工程と、上記画像情報評価工程にて得られた画像情報と上記画像間相関検出工程にて得られた画像間の相関情報であるシーンチェンジの検出出力とに基づいて、上記画像情報のカウント値によりフレーム内符号化を定期的に選択すると共に上記シーンチェンジの検出時にもフレーム内符号化を選択し、それら以外ではフレーム間予測符号化を選択する圧縮方法選択工程と、画像データに直交変換処理を施し、直交変換係数を生成する直交変換工程と、上記圧縮方法選択工程にて選択された圧縮方法で１画面分の画像データを圧縮することにより得られる予定圧縮データ量とマクロブロックタイプに応じて上記第１、第２のパラメータのどちらかを加算し１画面分合計することにより求められた難易度とから、量子化の際の基本量子化ステップを予測する量子化ステップ制御工程と、上記直交変換工程にて生成された直交変換係数を、上記所定の量子化ステップで量子化する量子化工程とを有し、上記量子化ステップ制御工程では、上記１画面分の画像データの予定圧縮データ量をallocated_bit とし、上記難易度をdifficultyとし、上記基本量子化ステップをQ_scale とするとき、予め定められたパラメータＡ、Ｂを用いて、
Q_scale ＝ exp((log(allocated_bit/difficulty)-B)/A)
の式により基本量子化ステップQ_scale を求めることを特徴とするものである。
【００４７】
ここで、上記量子化ステップ制御手段は、実際に圧縮に使用した量子化ステップと圧縮後のデータ量と上記評価値の関係を学習し、当該学習結果に応じて上記基本量子化ステップの予測を行う。また、上記量子化ステップ制御手段は、画像データを複数に分割したマクロブロック毎の上記評価値を１画面分合計して合計評価値を求め、当該合計評価値を上記基本量子化ステップの予測に使用する。このとき、上記画像情報評価手段は、動き検出による動きベクトルに応じた参照画像のマクロブロックの画素データと入力画像のマクロブロックの画素データとの差分のマクロブロック毎の絶対値和を、上記評価値とする。
【００４９】
【作用】
本発明によれば、蓄積した複数枚の画像データから情報量を評価し、さらに画像間の相関を検出し、情報量の評価値と画像間の相関情報とに基づいて適応的に画像データの圧縮方法を選択し、選択した圧縮方法で１画面分の画像データを圧縮することにより得られる予定圧縮データ量と評価値とから基本量子化ステップを予測することで、基本量子化ステップの予測精度を高めている。
【００５０】
また、本発明によれば、実際に圧縮に使用した量子化ステップと圧縮後のデータ量と評価値の関係を学習し、当該学習結果に応じて基本量子化ステップの予測を行うことで、入力画像の変動に追従させるようにしている。
【００５２】
【実施例】
以下、図面を参照し、本発明の実施例について詳述する。
図１には本発明実施例の画像符号化装置の概略構成を示す。なお、この図１において、前述した図７と同じ構成については同一の指示符号を付してその説明については省略する。
【００５３】
なお、この図１の構成において、前記図７の構成に追加された構成要素は画像情報評価回路５０とシーンチェンジ検出回路３１と圧縮方法選択回路３２と動きベクトル発生回路３３であり、また、動き検出器３８と量子化ステップ制御器３９とフレームメモリ４０とが変更されている。
すなわち、本発明実施例の画像符号化装置は、入力画像データを複数枚蓄える画像データ蓄積手段としてのフレームメモリ４０と、上記フレームメモリ４０に蓄積された複数枚の画像データから、当該入力画像データの情報量を評価する画像情報評価回路５０と、上記フレームメモリ４０に蓄積された複数枚の画像データから画像間の相関を検出する画像間相関検出手段としてのシーンチェンジ検出回路３１と、画像データに直交変換処理（ＤＣＴ処理）を施してそのＤＣＴ係数を生成するＤＣＴ回路１４と、上記ＤＣＴ回路１４によって生成されたＤＣＴ係数を所定の量子化ステップで量子化する量子化器１５と、上記画像情報評価回路５０によって得られた情報量の評価値と上記シーンチェンジ検出回路３１からの画像間の相関情報（シーンチェンジ検出出力）とに基づいて適応的に画像データの圧縮方法（ピクチャタイプ，マクロブロックタイプ，ＧＯＰ長）を選択する圧縮方法選択回路３２と、上記圧縮方法選択回路３２が選択した圧縮方法で１画面分の画像データを圧縮することにより得られる予定圧縮データ量と上記評価値とから、上記量子化器１５における量子化の際の基本量子化ステップを予測する量子化ステップ制御器３９とを有することを特徴とする。
【００５４】
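In FIG. 1, the input image data supplied to the input terminal 1 is first stored in the frame memory 40. Unlike the frame memory 10 of FIG. 7, the frame memory 40 can hold a predetermined number of frames. Too large a number is undesirable because the frame memory 40 becomes large. An efficient length (number of frames) depends strongly on the bit rate, the capacity of the output buffer 18, and the interval between intra-coded pictures (which in most cases can be regarded as the GOP length). This is because the range over which the output buffer 18 can absorb the unevenness in compressed data size caused by differences in compression method and compression ratio, and so deliver a constant bit rate, is constrained by the bit rate, the output buffer capacity, and the interval between intra-coded pictures.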
【００５５】
ところで、一般的にフレーム内符号化方式で圧縮することは定期的に行われる（これがＧＯＰの区切りになることが多い）ものであり、このフレーム内符号化の圧縮方式は当該圧縮後のデータ量が他の方式（フレーム間予測符号化）に比べてかなり大きいものである。このため、当該フレーム内符号化による圧縮画像同士（或いはＧＯＰ）の間隔で情報量を調べ、データ量の配分をするのは、一つの合理的な方法である。
【００５６】
しかし、本実施例の方式では、後述するようにシーンチェンジ等によって前後の画像の相関が著しく低くなった場合にもフレーム内符号化方式で圧縮するようにしており、このようにシーンチェンジ部分でフレーム内符号化を行うようにすると、例えば、当該シーンチェンジに基づくフレーム内符号化画像の近傍に前記定期的なフレーム内符号化がきた場合、当該定期的に行われるフレーム内符号化の画像に対しては、定ビットレート或いは均質な画質の維持が困難になるため、フレーム内符号化による圧縮である必然性を失い、当該フレーム内符号化で圧縮することを取り止める必要がでてくる。
【００５７】
したがって、上記フレームメモリ４０の記憶可能な容量（上記所定数）は、上述のようにシーンチェンジが上記定期的に行われるはずであるフレーム内符号化の画像の近傍にくる場合があることを考慮して、当該定期的にフレーム内符号化で圧縮を行う周期の２倍程度とすることが適当である。
もちろん、上記所定数は一例であり、これに限定されることはなく様々な条件に合わせて変更することは可能である。
【００５８】
上記フレームメモリ４０に蓄積された画像データは、適宜、画像情報評価回路５０に送られる。
ここで、当該画像情報評価回路５０は、大別して２通りのパラメータを算出するものである。
第１のパラメータは、フレーム内符号化で圧縮を行った場合の圧縮後のデータ量を予測することが可能なように、その画像自身の情報量を示すものである。この第１のパラメータとしては、例えば、フレームメモリ４０から供給された画像データに対して、ＤＣＴ処理をブロック毎に行い、そのＤＣＴ係数の和や統計をとったものとしたり、また、それでは規模が大きくなる場合には、平均自乗誤差のブロック毎の和を求めたものとする。いずれにしても、当該画像情報評価回路５０では、画像の情報量を表し、圧縮後のデーター量を類推するに足るパラメータを算出する。
【００５９】
第２のパラメータは、フレーム間予測符号化で圧縮を行った場合の圧縮後のデータ量を予測することが可能な、画像の差分情報量を示すものである。この場合のパラメータとしては、例えば、フレームメモリ４０に格納された画像と動き補償後の画像との差分値のブロック内の和を用いる。このパラメータ算出の際には、一般的な動きベクトル検出回路（動き検出器３８及び動きベクトル発生回路３３）で得られる動きベクトルが検出された最小誤差を利用することができる。
【００６０】
このとき、フレーム間予測符号化による圧縮後のデータ量の類推（予測）のためのパラメータとしては、一般的な輝度情報だけの動きベクトル検出回路（動き検出器３８及び動きベクトル発生回路３３）で得られる動きベクトル及びその動きベクトルが検出された輝度情報だけで求めた最小誤差に加えて、本実施例ではその動きベクトルが検出された色差情報だけで求めた最小誤差を新たに用いるようにする。
【００６１】
本実施例装置では、このようにして求めた輝度情報からの最小誤差と色差情報からの最小誤差を用いて、そのマクロブロックの誤差とし、当該マクロブロックの誤差を用いて後述する圧縮方法選択回路３２で圧縮方法の判定を行なう。
上記画像情報評価回路５０によって、上述したようにして算出された画像情報の評価値（パラメータ）は、次に説明するシーンチェンジ検出回路３１と、圧縮方法選択回路３２と、量子化ステップ制御器３９とに送られる。
【００６２】
また、画像情報評価回路５０からは、後述する圧縮方法選択回路３２においてＧＯＰの長さを決定する際に画像のカウントを行うため、その圧縮方法選択回路３２に対して画像情報も送られる。
次に、シーンチェンジ検出回路３１は、上記画像情報評価回路５０の出力（例えば第２のパラメータ）を用いてシーンチェンジを検出するものである。
【００６３】
ここで、当該シーンチェンジ検出回路３１においてシーンチェンジを検出する目的は、フレーム間予測符号化かフレーム内符号化のいずれかの圧縮方式を決定するための判断材料にすることが主である。それは、シーンチェンジ部分のように前後で相関の極めて低い画像では、フレーム間予測符号化で圧縮するよりもフレーム内符号化で圧縮する方が効率良く圧縮できるからである。また、シーンチェンジ部分では、圧縮後のデータも大きなものとなるため、データ量配分や出力バッファマネジメントの観点からも当該シーンチェンジを把握することは重要である。
【００６４】
上述のようなシーンチェンジは前後の画像で相関が著しく損なわれる所に存在するものであるため、当該シーンチェンジ部分は、例えば、前後の画像についてそれぞれ例えば動きベクトル補償後の画像との差分値を求め、それぞれこの差分値の画像全体での総和を求めて、さらに当該前後の画像での上記総和の比を求めるなどして検出できる。
【００６５】
このようなことから、本実施例のシーンチェンジ検出回路３１では、上記画像情報評価回路５０の出力を用いてシーンチェンジを検出するようにしている。すなわち、上記画像情報評価回路５０は、前述のように動き補償後の画像の差分値のブロック内の和を第２のパラメータとして出力するため、当該シーンチェンジ検出回路３１では、当該差分値のブロック内の和を用いて、上述のシーンチェンジ検出のための演算を行うことができる。
【００６６】
次に、圧縮方法選択回路３２について説明する。
当該圧縮方法選択回路３２は、上記シーンチェンジ検出回路３１からのシーンチェンジ検出出力と、画像情報評価回路５０からの画像情報をカウントしたカウント値と、前記輝度情報と色差情報からそれぞれ求めた第２のパラメータ（最小誤差）とに基づいて、フレーム内符号化／フレーム間予測符号化（Ｐ，Ｂピクチャ）のいずれの圧縮方式で圧縮を行うのかを選択する回路である。
【００６７】
すなわち当該圧縮方法選択回路３２では、上記画像情報評価回路５０によって得られるマクロブロック毎の輝度情報及び色差情報から求めたパラメータに基づいて、マクロブロック毎のフレーム内符号化／フレーム間予測符号化の各圧縮方法によって発生するデータの予想量（発生予想量）を比較し、より発生予想量の少なくなる圧縮方法を選択するようにしている。
【００６８】
また、フレーム内符号化方式による圧縮画像は少なくともＧＯＰの最初になければならない。さらに、ＧＯＰはランダムアクセスを考慮してある程度の間隔となされているので、必然的にＩピクチャは当該間隔で定期的に発生するものであり、また、本実施例ではシーンチェンジ等によっても発生するものである。
このようなことから、当該圧縮方法選択回路３２では、上記画像情報評価回路５０からの画像情報のカウントを行うと共に、上記シーンチェンジ検出回路３１からのシーンチェンジ検出出力が当該圧縮方法選択回路３２に加えられる。これにより当該圧縮方法選択回路３２では、上記画像のカウント値から定期的なフレーム内符号化を選択すると共にシーンチェンジ検出時にもフレーム内符号化を選択（すなわちＧＯＰの間隔を決定する）し、それら以外ではフレーム間予測符号化を選択するようにしている。
【００６９】
この圧縮方法選択回路３２は、上記圧縮方法の選択に応じて前記切換スイッチ１３と２４の切換制御を行うと共に、その選択結果を示す情報を量子化ステップ制御器３９に送る。
量子化ステップ制御器３９は、可変長符号化回路１７からの符号量を示す情報に基づいて量子化器１５の量子化ステップ幅を制御する。また、上記出力バッファ１８から出力されたデータは圧縮符号化がなされた符号化出力として出力端子２から出力される。
【００７０】
また、上記量子化器１５からの出力は、逆量子化器２７によって逆量子化され、さらに逆ＤＣＴ回路２６によって逆ＤＣＴ処理される。当該逆ＤＣＴ回路２６の出力は、加算器２５に送られる。
この加算器２５には、フレーム間予測符号化のフレームのときにオンとなる切換スイッチ２４を介した動き補償器２１からのフレーム間予測画像データも供給され、当該データと上記逆ＤＣＴ回路２６の出力データとの加算が行われる。この加算器２５の出力データは、フレームメモリ２２に一時的に蓄えられた後、動き補償器２１に送られる。
【００７１】
当該動き補償器２１は、上記動き検出器３８によって検出され動きベクトル発生回路３３によって発生された動きベクトルに基づいて動き補償を行い、これによって得たフレーム間予測画像データを出力する。
また、上記量子化ステップ制御器３９は、前記画像情報評価回路５０からの評価値（パラメータ）から画像の情報量、さらにはシーンチェンジのように前後の画像の相関が極めて低くなる所を知ると共に、圧縮方法選択回路３２からの選択結果を示す情報からフレーム内符号化かフレーム間予測符号化のいずれが選択された画像であるかも知ることができる。
【００７２】
したがって、当該量子化ステップ制御器３９においては、出力バッファ１８の残量のみをフィードバックする従来の量子化ステップ制御に比べて、入力画像の急激な情報量変化に追随できることになり、また、画像の情報量の変化に応じて適切な量子化ステップ制御が可能で、さらに、フレーム内符号化／フレーム間予測符号化の圧縮方法に応じて適切な量子化ステップ制御も可能となる。
【００７３】
次に本実施例の構成における処理の流れを、図２のフローチャートに沿って説明する。
先ず、ステップＳ８１では、入力端子１に入力された画像データが順次フレームメモリ４０へ格納される。
ここで、先に述べたようにＩピクチャの頻度や間隔の決定が画質に影響を及ぼすので、これに関係して符号化に先だってＧＯＰを決めておく必要があり、また、レートコントロール（量子化ステップ制御によるビットレートのコントロール）をするために符号化に先だって１ＧＯＰ分の画像についての情報を収集しなければならない。このように、次々と入力されてくる画像データに対してその間に分析を行い、符号化するまでの十分な遅延時間を稼ぐため、大量のフレームメモリ４０を用いる。
【００７４】
次に、ステップＳ８２では、動き検出器３８及び動きベクトル発生回路３３によって、フレーム間予測符号化で圧縮するために必要な動きベクトルを検出及び発生させる。すなわち、このステップＳ８２では、予め定められたスケジュールでフレームメモリ４０中の各画像データをＰピクチャ或いはＢピクチャとして圧縮符号化できるように、動き検出（モーションエスティメーション）を行う。
【００７５】
ここで、動き検出を行う画像については、Ｉピクチャを規定しない。それはどの画像データがＩピクチャになるのかこの時点では確定していないからであり、またＩピクチャは動き補償を必要としないため、後にどの画像データでもＩピクチャにすることが可能だからである。
上記画像情報評価回路５０は、上記動き検出をする際に用いられる最小歪み（Minimum Distortion）或いは誤差の絶対値和（ＡＤ：Absolute Difference ）と呼ばれるものを符号化に用いるパラメータの一つ（第２のパラメータ）として読み出し格納する。
【００７６】
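The sum of absolute differences (AD) is obtained by dividing the reference-side picture into blocks of 8 × 8 pixels and, for each macroblock (MB) consisting of 8 × 8 × 4 pixels of luminance data and 8 × 8 × 2 pixels of chrominance data, summing the absolute pixel-by-pixel differences from the search-side macroblock cut out by the motion vector found during motion detection. For each block it is given by equation (1).
【００７７】
【数１】
\[ \mathrm{AD} = \sum_{i=1}^{8} \sum_{j=1}^{8} \left| \mathrm{ref}[i,j] - \mathrm{search}[i,j] \right| \tag{1} \]
where ref[i,j] is the pixel at position (i,j) of the reference-side block and search[i,j] is the pixel at position (i,j) of the search-side block.
【００７８】
The per-block values are then totaled over the blocks in the macroblock to give the macroblock's sum of absolute differences (AD).
This parameter is used to detect scene changes and, when compressing with inter-frame predictive coding, to estimate the amount of information with inter-picture correlation taken into account.
It is also used to decide the macroblock type, as described later.
【００７９】
The parameter SAD, which estimates the amount of information in a picture, is the total of the AD values within one picture, as in equation (2).
SAD = ΣAD (2)
Of course, the minimum distortion may be used instead of the sum of absolute differences (AD).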
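As a rough illustration of the AD and SAD parameters, the sketch below computes the absolute pixel difference sum for one 8 × 8 block, totals it over the blocks of a macroblock, and totals the macroblock values over a picture. The function names and the list-of-lists block layout are assumptions made for the example.

```python
def block_ad(ref_block, search_block):
    """Sum of absolute pixel differences between two equally sized blocks."""
    return sum(
        abs(r - s)
        for ref_row, search_row in zip(ref_block, search_block)
        for r, s in zip(ref_row, search_row)
    )


def macroblock_ad(ref_blocks, search_blocks):
    """AD of a macroblock: per-block values totaled over its blocks."""
    return sum(block_ad(r, s) for r, s in zip(ref_blocks, search_blocks))


def picture_sad(macroblock_ads):
    """SAD of a picture: the total of all macroblock AD values, as in equation (2)."""
    return sum(macroblock_ads)
```

In an encoder these values would normally fall out of the motion search itself, since the search already evaluates this error for the best-matching vector.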
【００８０】
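Next, in step S83, the image information evaluation circuit 50 evaluates, for each picture, the mean absolute difference (MAD) and the activity, in addition to the parameters obtained by the motion detection.
The mean absolute difference (MAD) is a parameter for estimating the amount of information in an I picture. It is obtained for each block of 8 × 8 pixels by equation (3) below and totaled over a macroblock or a picture as required. This parameter is also used to decide the macroblock type.
【００８１】
【数２】
\[ \mathrm{blockMAD} = \sum_{i=1}^{8} \sum_{j=1}^{8} \left| X(i,j) - \bar{X} \right|, \qquad \bar{X} = \frac{1}{64} \sum_{i=1}^{8} \sum_{j=1}^{8} X(i,j) \tag{3} \]
where X(i,j) is the pixel at position (i,j) of the block and \(\bar{X}\) is the mean of the pixels in the block.
【００８２】
These values are then totaled over the blocks in the macroblock, as in equation (4), and the total is used for the macroblock decision.
MAD = Σ blockMAD (4)
Furthermore, as in equation (5), the macroblock values are totaled within one picture, and the total is taken as the parameter SMAD, which represents the amount of information in that picture (as an I picture).
【００８３】
SMAD = ΣMAD (5)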
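The MAD and SMAD totals can be sketched as below. The per-block formula is an assumption: it takes the block value to be the sum of absolute deviations of each pixel from the block mean, which is one common reading of "mean absolute difference". The function names are likewise hypothetical.

```python
def block_mad(block):
    """Assumed per-block measure: sum of |pixel - block mean| over the block."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels)


def macroblock_mad(blocks):
    """MAD of a macroblock: per-block values totaled, as in equation (4)."""
    return sum(block_mad(b) for b in blocks)


def picture_smad(macroblock_mads):
    """SMAD of a picture: macroblock MAD values totaled, as in equation (5)."""
    return sum(macroblock_mads)
```

A flat block scores zero and a busy block scores high, which is why the value serves as a proxy for the size of an intra-coded block.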
また、上記アクティビティは、一つの画面の中でそのマクロブロックの画像の状態に応じてよりきめ細かに量子化ステップを制御することによって画質を維持しながら、より圧縮効率を高めるために、その画像の状態を定量化するためのパラメータである。
【００８４】
例えば一つのブロック内で画像が画素のレベル変化の少ない平坦な部分（フラットな部分）では量子化による歪みが目立ち易く、量子化ステップを小さくしてやるべきで、逆にレベル変化が多い複雑なパターンのブロックでは量子化歪みは目立ち難く、情報量も多いので量子化ステップを大きくするべきである。
そこで、例えばブロックの平坦度（フラットネス）を表すパラメータをこのアクティビティとして用いる。
【００８５】
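Next, in step S84, the scene change detection circuit 31 detects scene changes. It does so using the parameter AD obtained by the image information evaluation circuit 50. Specifically, it uses the parameter SAD, the total of AD over one picture, and detects a scene change from the rate at which SAD changes.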
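A minimal sketch of scene-change detection from the rate of change of SAD follows. The text does not give a decision rule, so the ratio test against the previous picture and the threshold value of 3.0 are assumptions chosen for illustration.

```python
def detect_scene_changes(sad_per_picture, ratio_threshold=3.0):
    """Return the indices of pictures whose SAD jumps relative to the previous one.

    Assumption: a scene change is flagged when SAD[n] / SAD[n-1] reaches
    ratio_threshold. Pictures following a zero SAD are skipped.
    """
    changes = []
    for n in range(1, len(sad_per_picture)):
        previous, current = sad_per_picture[n - 1], sad_per_picture[n]
        if previous > 0 and current / previous >= ratio_threshold:
            changes.append(n)
    return changes
```

For example, `detect_scene_changes([100, 110, 105, 900, 880])` returns `[3]`: the prediction error jumps at picture 3 and stays high afterwards, so only the first picture of the new scene is flagged.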
【００８６】
次に、圧縮方法選択回路３２においては、ステップＳ８５でＧＯＰ長の決定を行い、ステップＳ８６で圧縮方法の選択（ピクチャタイプの決定）を行う。
ここでは、すでに述べたように、符号化に際してランダムアクセス性を考慮して適当なフレーム数毎にＧＯＰを区切る。このとき少なくともＧＯＰの符号順で最初のピクチャはＩピクチャでなければならないから、ピクチャの数をカウントし定期的にピクチャタイプをＩピクチャにする。
【００８７】
一方、上記シーンチェンジによって前後のピクチャで相関が低くなった場合、これも先に述べたようにＩピクチャで圧縮符号化すると効率が良い。しかしながら、Ｉピクチャは圧縮率が低いため、低ビットレートにおいては頻繁に現れると画質の低下を招く。したがって、シーンチェンジ検出回路３１によってシーンチェンジが検出された場合、圧縮方法選択回路３２は、Ｉピクチャ同士の間隔を適度に保つよう適応的にＧＯＰの長さを決める。
【００８８】
次のステップＳ８７では、圧縮方法選択回路３２において、後述する図３のフローチャートに示すような合計評価値（難易度：difficulty) の集計を行い、これに基づいてマクロブロックタイプの判定を行う。すなわち、圧縮方法選択回路３２では、ステップＳ８７においてマクロブロック毎の圧縮方法とマクロブロックタイプとを決める。
【００８９】
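As noted above, the mean absolute difference (MAD) and the sum of absolute differences (AD) already obtained are related to the amount of data after compression by intra-frame coding and by inter-frame predictive coding, respectively. Comparing the two parameters therefore shows which macroblock type, intra-frame or inter-frame predictive, will produce less data.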
【００９０】
次のステップＳ８８では、量子化ステップ制御器３９において、レートコントロールのためのビット配分を行う。すなわちこのステップＳ８８では、上記ステップＳ８７で求めた難易度（difficulty) に応じて１画面毎の割り当て量をビット配分する。
各ピクチャ毎の圧縮符号化された後のデータサイズは、その符号化方式や元々の画像データが持つ情報量、前後の相関などによって大きく変動する。平均的な画質を保つようにするならばことさらである。
【００９１】
各ピクチャ毎の圧縮符号化された後のデータサイズのむらは出力バッファ１８によってある程度吸収されるが、平均的には一定のビットレートにしなければならない。したがって、ある区間を定めればその間のピクチャのトータルの圧縮後のデータ量が決まる。そこで、既に決定しているピクチャタイプと、予め調べておいた画像の情報量パラメータとを用いて各ピクチャ毎に圧縮後のデータ量、すなわち各ピクチャが使って良いビットの量を決める。
【００９２】
このとき、例えば情報量の少ない画像やＢピクチャには少なく、情報量の多い画像やＩピクチャには多くする。これをビット配分と呼ぶ。これによって画質のばらつきを抑え、なおかつ一定レートに保つことが容易になる。
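In this embodiment, for example, the GOP is taken as that interval, and bits are allocated as in equations (6) and (7).
Total Bit Count = (Bit Rate [bit/s] × Number Of Pictures In GOP [picture]) / (Picture Rate [picture/s]) [bit] (6)
Available Bits = (Total Bit Count × information-amount parameter of the target picture) / (GOP total of the information-amount parameter) [bit] (7)
The information-amount parameter used in equation (7) is the parameter SMAD or SAD described above, multiplied by a multiplier specific to the picture type being compressed. The multiplier adjusts the relationship between the parameter and picture quality across picture types.
【００９３】
The GOP total of the information-amount parameter in equation (7) is obtained as shown in equation (8).
GOP total of the information-amount parameter = Ki×ΣDifi + Kp×ΣDifp + Kb×ΣDifb (8)
Difi: difficulty of an I picture
Difp: difficulty of a P picture
Difb: difficulty of a B picture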
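The bit allocation of equations (6) to (8) can be sketched as follows. The function name, the `(picture_type, difficulty)` tuple layout, and the multiplier values used in the usage note are assumptions; the text does not state concrete values for Ki, Kp, and Kb.

```python
def allocate_bits(bit_rate, picture_rate, pictures, multipliers):
    """Split a GOP's bit budget across its pictures in proportion to weighted difficulty.

    pictures:    list of (picture_type, difficulty), picture_type in {"I", "P", "B"}
    multipliers: dict mapping picture type to its weighting factor (Ki, Kp, Kb)
    """
    # Equation (6): total bits available to the GOP at a constant bit rate.
    total_bit_count = bit_rate * len(pictures) / picture_rate
    # Weighted information-amount parameter of each picture.
    weighted = [multipliers[ptype] * difficulty for ptype, difficulty in pictures]
    # Equation (8): GOP total of the weighted parameter.
    gop_total = sum(weighted)
    # Equation (7): each picture's share of the GOP budget.
    return [total_bit_count * w / gop_total for w in weighted]
```

With a bit rate of 1,200,000 bit/s, 30 pictures per second, three pictures of difficulty 300, 200, and 100, and all multipliers set to 1.0, the GOP budget is 120,000 bits and the pictures receive 60,000, 40,000, and 20,000 bits.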
上記量子化ステップ制御器３９においては、次のステップＳ８９において、後述する図４のフローチャートのような学習パラメータＡ，Ｂによる回帰予測の処理に基づいて基本量子化ステップの決定を行う。すなわち、このステップＳ８９では、上述した１画面のビット割当量と難易度から回帰予測による基本量子化ステップの決定（予測）を行う。
【００９４】
上述のようにしてピクチャタイプが決まり、マクロブロックタイプが決まれば、マクロブロックタイプに応じて１画面分の上記誤差の平均絶対値和（ＭＡＤ）、誤差の絶対値和（ＡＤ）を集計することで、１画面の情報量パラメータ（すなわち難易度）が測定できる。したがって、過去の実績から、情報量パラメータと量子化後のデータ量が決まれば量子化ステップを推定することができる。
【００９５】
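The present invention concerns this mechanism for determining the basic quantization step. The quantization step controller 39 of this embodiment determines the basic quantization step of one picture (quantization scale: Q_scale) from the number of bits allocated to the picture (allocated_bit) and the difficulty (difficulty) by the following method.
First, the relationship shown in equation (10) is assumed to hold:
log(allocated_bit/difficulty) = A*log(Q_scale)+B (10)
and A and B in equation (10) are obtained in advance by learning (experiment). The quantization scale (Q_scale) is then obtained from equation (11), a rearrangement of equation (10).
【００９６】
Q_scale = exp((log(allocated_bit/difficulty)-B)/A) (11)
The quantization step obtained in this way is used as the basic quantization step of the picture.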
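Equation (11) translates directly into code. The sketch below is a plain transcription; the function name and the values of A and B in the usage note are made up for illustration, since the text obtains them by learning.

```python
import math


def predict_q_scale(allocated_bit, difficulty, a, b):
    """Basic quantization scale from equation (11).

    Solves log(allocated_bit / difficulty) = a * log(q_scale) + b for q_scale.
    """
    return math.exp((math.log(allocated_bit / difficulty) - b) / a)
```

For instance, with the hypothetical values A = -1 and B = log(100), the model says bits per unit of difficulty equal 100 / Q_scale, so a picture allocated 10 bits per unit of difficulty gets Q_scale = 10. A negative A is what one would expect, since a larger quantization step produces fewer bits.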
次に、量子化ステップ制御器３９は、ステップＳ９０のように、画面内の量子化ステップの制御を行う。
【００９７】
すなわち、当該量子化ステップ制御器３９は、先に述べたように画面内の量子化ステップを各ブロック毎に、なるべく画質を高く、しかも圧縮効率を高くするように制御する。具体的には、上記アクティビティやマクロブロックタイプなどの情報をもとに、基本量子化ステップからマクロブロック毎の量子化ステップを加減することで、量子化器１５に対する量子化ステップの制御を行う。
【００９８】
次のステップＳ９１では前記可変長符号化回路１７において符号化を行う。上述のようにして圧縮符号化の全てのパラメータが決まっているのでその後は、ＭＰＥＧの規則にしたがって圧縮符号化する。
次のステップＳ９２ではマクロブロック毎のビット発生量と、量子化スケール(Q＿scale)の集計を行う。
【００９９】
最後に、ステップＳ９３では、前述した各パラメータの更新を行う。すなわち、後述する図５のフローチャートに示すように、マクロブロック毎の量子化ステップの平均と、マクロブロック毎の発生量の合計値と、難易度（difficulty) とによる予測標本の更新を行う。
ここで、画像情報量と基本量子化ステップ、圧縮後のデータ量の関係は、圧縮する画像に依存する。したがって、ここでは、その関係を表す式に用いるパラメータ、予測パラメータを、圧縮後の実際のデータ量をフィードバックすることにより学習させ、予測の精度を向上させている。
【０１００】
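In this case, the learning parameters A and B are first learned and corrected for each picture type by the following method.
For example, let the average of the quantization scale (Q_scale) over the macroblocks be average_Q and the number of bits generated by compressing one picture be generated_bit. Then, with x and y defined as in equation (12),
x = log(average_Q), y = log(generated_bit/difficulty) (12)
the parameters A and B can be obtained by the least-squares method as in equations (13) and (14), where n is the number of samples.
\[ A = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}} \tag{13} \]
\[ B = \frac{\sum y - A\sum x}{n} \tag{14} \]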
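The least-squares fit of A and B can be sketched as below, using the standard closed-form solution for a straight line y = A·x + B. The function name and the error handling for a degenerate sample set are assumptions.

```python
def fit_line(samples):
    """Ordinary least-squares fit of y = a * x + b to a list of (x, y) samples."""
    n = len(samples)
    sum_x = sum(x for x, _ in samples)
    sum_y = sum(y for _, y in samples)
    sum_xx = sum(x * x for x, _ in samples)
    sum_xy = sum(x * y for x, y in samples)
    denominator = n * sum_xx - sum_x * sum_x
    if n < 2 or denominator == 0:
        raise ValueError("need at least two samples with distinct x values")
    a = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y - a * sum_x) / n
    return a, b
```

Here each sample would be one coded picture, with x = log(average_Q) and y = log(generated_bit/difficulty) as in equation (12).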

次に図２のステップＳ８７における難易度（difficulty) の集計のフローチャートについて図３を用いて説明する。
【０１０１】
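When the processing reaches the difficulty aggregation in step S87 of FIG. 2, it moves to step S100 and the following steps of FIG. 3.
In FIG. 3, step S101 first initializes difficulty = 0, and the next step S102 determines whether the macroblock type is an intra-frame coded macroblock (intra MB). If the macroblock is judged to be an intra macroblock, step S106 adds the macroblock's mean absolute difference (MAD) to the difficulty, and processing proceeds to step S110. If step S102 determines that the macroblock type is not an intra macroblock, processing proceeds to step S103.
【０１０２】
Step S103 determines whether the macroblock type is a forward-predicted macroblock (forward MB) among the inter-frame predictive coded macroblocks (inter MB). If so, step S107 adds the sum of absolute differences of the forward-predicted macroblock (AD_for) to the difficulty, and processing proceeds to step S110. If not, processing proceeds to step S104.
【０１０３】
Step S104 determines whether the macroblock type is a backward-predicted macroblock (backward MB) among the inter macroblocks. If so, step S108 adds the sum of absolute differences of the backward-predicted macroblock (AD_bac) to the difficulty, and processing proceeds to step S110. If not, processing proceeds to step S105.
【０１０４】
Step S105 determines whether the macroblock type is a bidirectionally predicted macroblock (bidirectional MB) among the inter macroblocks. If so, step S109 adds the sum of absolute differences of the bidirectionally predicted macroblock (AD_bid) to the difficulty, and processing proceeds to step S110. If not, processing proceeds directly to step S110.
【０１０５】
Step S110 determines whether the difficulty has been aggregated for all macroblocks. If not, processing returns to step S102; if so, the aggregation ends at step S111 and processing returns to step S87.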
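The aggregation loop of FIG. 3 can be sketched as follows, on the reading that each macroblock contributes one value to a running total. The dictionary layout and the key names (`mad`, `ad_for`, `ad_bac`, `ad_bid`) are assumptions modeled on the labels used in the text.

```python
# Which per-macroblock measure each macroblock type contributes (steps S106 to S109).
MEASURE_FOR_TYPE = {
    "intra": "mad",
    "forward": "ad_for",
    "backward": "ad_bac",
    "bidirectional": "ad_bid",
}


def picture_difficulty(macroblocks):
    """Total the difficulty of one picture over all of its macroblocks."""
    difficulty = 0  # step S101
    for mb in macroblocks:  # loop closed by step S110
        measure = MEASURE_FOR_TYPE.get(mb["type"])
        if measure is not None:  # unknown types add nothing, as in the flowchart
            difficulty += mb[measure]
    return difficulty
```

A picture with one intra macroblock (MAD 40) and one forward-predicted macroblock (AD_for 15) therefore has a difficulty of 55.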
次に図２のステップＳ８９における学習パラメータＡ，Ｂに因る回帰予測のフローチャートについて図４を用いて説明する。
【０１０６】
図２のステップＳ８９で学習パラメータＡ，Ｂに因る回帰予測の処理に進むと、図４のステップＳ１２０以降の処理に移る。
この図４において、ステップＳ１２１では、前記式(11)の演算を行い、次のステップＳ１２２では、この式(11)の演算により得られた学習パラメータＡ，Ｂによる回帰予測処理を終了し、図２のステップＳ８９に戻る。
【０１０７】
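Next, the flowchart for correcting and updating the learning parameters A and B in step S93 of FIG. 2 is described with reference to FIG. 5. In the flowchart of FIG. 5, a new sample is added to the sample set, the parameters A and B are obtained by the least-squares method, and an old sample is removed from the sample set, so that the learning parameters A and B for the regression prediction are updated and corrected.
【０１０８】
When the processing reaches the correction and update of the learning parameters A and B in step S93 of FIG. 2, it moves to step S130 and the following steps of FIG. 5.
In FIG. 5, step S131 corrects and updates the learning parameters A and B by the least-squares method. Here the data for one picture is totaled, and x and y are set as shown in equations (15) and (16):
x = log(average_Q) (15)
y = log(generated_bit/difficulty) (16)
【０１０９】
The next step S132 adds the new x, y data to the regression sample set, and step S133 calculates the learning parameters A and B by the least-squares method. Step S134 then clips the learning parameters A and B to their maximum and minimum values, and step S135 removes the old x, y data from the regression sample set. Step S136 then ends the correction and update of the learning parameters A and B, and processing returns to step S93 of FIG. 2.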
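The update cycle of FIG. 5 can be sketched as a small class that adds a sample, refits, clips, and then drops the oldest sample. The class name, the fixed-size window, the clipping ranges, and the choice to keep the previous A and B when the fit is undetermined are all assumptions made for this illustration.

```python
import math


class QScaleLearner:
    """Keeps a sliding set of (x, y) samples and refits y = A*x + B after each picture."""

    def __init__(self, a, b, window, a_range, b_range):
        self.a, self.b = a, b  # starting values, e.g. from offline experiments
        self.window = window  # hypothetical cap on the number of samples kept
        self.a_range, self.b_range = a_range, b_range
        self.samples = []

    def update(self, average_q, generated_bit, difficulty):
        # S131 and S132: form the new sample and add it to the sample set.
        self.samples.append((math.log(average_q), math.log(generated_bit / difficulty)))
        # S133: least-squares fit, skipped while the fit is undetermined.
        n = len(self.samples)
        sx = sum(x for x, _ in self.samples)
        sy = sum(y for _, y in self.samples)
        sxx = sum(x * x for x, _ in self.samples)
        sxy = sum(x * y for x, y in self.samples)
        denominator = n * sxx - sx * sx
        if n >= 2 and abs(denominator) > 1e-12:
            a = (n * sxy - sx * sy) / denominator
            b = (sy - a * sx) / n
            # S134: clip both parameters to their permitted ranges.
            self.a = min(max(a, self.a_range[0]), self.a_range[1])
            self.b = min(max(b, self.b_range[0]), self.b_range[1])
        # S135: drop the oldest sample once the set exceeds the window.
        if len(self.samples) > self.window:
            self.samples.pop(0)
        return self.a, self.b
```

One learner per picture type would match the statement that A and B are learned separately for each picture type. Clipping after the fit keeps a run of atypical pictures from pushing the prediction line to an unusable slope.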
【０１１０】
上述のようにして求めた発生ビット量と量子化スケール（Q ＿scale)の関係は、図６に示すようになる。
上述した本発明実施例の画像符号化装置についてまとめると、本実施例の画像符号化装置においては、基本量子化ステップの制御の際に、入力画像データの情報量を見積るパラメータ（評価値）と量子化後の予定圧縮データ量とから、基本量子化ステップを精度良く予測するようにしている。
【０１１１】
ここで、当該基本量子化ステップを精度良く予測するために、入力画像データの情報量を見積るパラメータと実際に圧縮に使用した量子化ステップと圧縮後のデータ量の関係を学習するようにしている。また、基本量子化ステップを予測する際には、入力画像データの情報量を見積る方法として、１画面毎に難しさ（難易度）を、マクロブロックタイプ決定後のマクロブロックタイプに応じて誤差の絶対値和（ＡＤ），誤差の平均絶対値和（ＭＡＤ）のどちらかを加算し、１画面分合計し、これをその画面の難易度(difficulty)としている。さらに、本実施例では、１画面毎の割当ビット量(allocated＿bit)を、難易度(difficulty)を１ＧＯＰ分集計し、この１ＧＯＰ分の難易度に応じてビット配分を行うようにしている。
【０１１２】
また、本実施例装置では、１画面毎の割当ビット量(allocated＿bit)と難易度(difficulty)から、ピクチャタイプ毎に前記式(10)の関係があると仮定し、予め式中のＡ，Ｂを学習により求めておき、さらに式(10)を変形した式(11)から基本量子化スケール(Q＿scale)を求めるようにしている。このとき、マクロブロック毎の学習パラメータＡ，Ｂは、前記式(12), 式(13), 式(14)のように、マクロブロック毎の量子化ステップの平均値と１画面圧縮した後の発生量とを用い、最小２乗誤差法で求めることができる。
【０１１３】
またさらに、学習パラメータＡ，Ｂを学習し、修正する際には、ピクチャータイプ毎に各々最近のｎ秒間のデータから求める（すなわち、最近の過去ｎ秒間以上過去のデータは使わない）ようにしている。このとき、過去のデータを図６のようにグラフにプロットし、直線近似ができそうな部分の基本量子化スケール(Q＿scale)の最大値、最小値の平均データを標本集合中に入れておき、予測直線を安定させるようにし、さらには学習パラメータＡ，Ｂには上限下限を設定し、特異なデータが多く入力されても安定した基本量子化ステップ予測を行えるようにしている。
【０１１４】
上述したようなことから、本実施例の画像符号化装置によれば、基本量子化ステップを精度良く予測できるため、画面内で量子化ステップの制御を特に行わなくても一画面に割り当てた圧縮後の予想ビット量に近くなり、したがって、画面毎にビットの使い込みや余りが起きなくなるので、平均した画質を維持できるようになる。例えば、画面内で量子化ステップの制御を上手に行ったとしても、基本量子化ステップが大きく外れていたならば、画面内で量子化ステップの変動がおき、画質の不均質を検出できる圧縮画ができてしまうが、本実施例では、圧縮に際しての難易度に応じてビット配分を行うため、基本量子化ステップを精度良く予測でき、したがって、無駄なくビットが使われ、画面内や画面毎の不均質が検出し難い圧縮画ができる。
【０１１５】
また、本実施例では、基本量子化ステップを精度良く予測する機構が、変動する入力画によって修正、学習を行い、入力画に追随するため、基本量子化ステップを精度良く予測する機構が維持できる。
さらに、本実施例装置では、基本量子化ステップを精度良く予測する機構が、過去の学習結果に引きずられることなく、最近のある期間の入力画の影響を学習に使用するようにしているため、入力画に素早く追随した基本量子化ステップの予測が行なえる。
【０１１６】
また、本実施例装置では、基本量子化ステップを精度良く予測する機構が、実験で求めた多くの学習データからの予測直線を最近の入力画の学習データで更新し、なおかつ実験で求めた予測直線データは、ｘ，ｙ共に最大値、最小値に近いデータが入力されている。したがって、最小二乗誤差法では、実験の影響が大きくなるため、もし特異な入力画の学習データが入ったとしても、それに引きずられずに基本量子化ステップの予測が行なえる。
【０１１７】
またさらに、本実施例の画像符号化装置によれば、上記の実験で求めた最大値，最小値の付近のｘ，ｙから求められる学習パラメータＡ，Ｂでの変動抑制でも防げないような予測直線となった場合でも、最終的にパラメータＡ，Ｂをクリップすることにより、異常な学習結果を保存しないので、基本量子化ステップを精度良く予測する機構を維持できる。
【０１１８】
【発明の効果】
本発明の画像符号化装置においては、蓄積した複数枚の画像データから情報量を評価し、さらに画像間の相関を検出し、情報量の評価値と画像間の相関情報とに基づいて適応的に画像データの圧縮方法を選択し、選択した圧縮方法で１画面分の画像データを圧縮することにより得られる予定圧縮データ量と評価値とから基本量子化ステップを予測することにより、基本量子化ステップが精度良く予測でき、画面内で量子化ステップの制御を特に行わなくても一画面に割り当てた圧縮後の予想ビット量に近くなり、したがって、画面毎にビットの使い込みや余りが起きなくなるので、平均した画質を維持できるようになる。このため、効率の良い画像圧縮が可能で、全体的に画質を向上させることが可能となる。
【０１１９】
また、本発明の画像符号化装置においては、実際に圧縮に使用した量子化ステップと圧縮後のデータ量と評価値の関係を学習し、当該学習結果に応じて基本量子化ステップの予測を行うようにしているため、変動する入力画像でもその入力画像に追従でき、基本量子化ステップを精度良く予測することが可能となっている。
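[Brief description of the drawings]
[FIG. 1] A block circuit diagram showing the schematic configuration of an image encoding apparatus according to an embodiment of the present invention.
[FIG. 2] A flowchart for explaining the operation of the apparatus of the embodiment.
[FIG. 3] A flowchart of the difficulty aggregation.
[FIG. 4] A flowchart of the regression prediction using the learning parameters A and B.
[FIG. 5] A flowchart of the correction and update of the learning parameters A and B.
[FIG. 6] A diagram showing the relationship between the amount of generated bits and the basic quantization scale.
[FIG. 7] A block circuit diagram showing the schematic configuration of a conventional image encoding apparatus.
[FIG. 8] A diagram for explaining picture resolution and structure.
[FIG. 9] A diagram for explaining macroblocks and blocks.
[FIG. 10] A diagram for explaining the zigzag scan.
[FIG. 11] A diagram for explaining an example of a GOP.
[Explanation of reference signs]
22, 40 Frame memory
11 Block divider
12 Differencer
13, 24 Switch
14 DCT circuit
15 Quantizer
16 Zigzag scan circuit
17 Variable length coding circuit
18 Output buffer
19, 39 Quantization step controller
20 Motion detector
21 Motion compensator
25 Adder
26 Inverse DCT circuit
27 Inverse quantizer
31 Scene change detection circuit
32 Compression method selection circuit
33 Motion vector generation circuit
50 Image information evaluation circuit
[0001]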
[Industrial application fields]
The present invention relates to an image coding apparatus and method suitable for use in, for example, compression coding of an image.
[0002]
[Prior art]
FIG. 7 shows an example of the configuration of a conventional image encoding apparatus suitable for use in, for example, compression encoding of an image.
In the image encoding apparatus of FIG. 7, the input terminal 1 is supplied with image data digitized to the following numbers of pixels, as shown in FIG. 8:
Luminance component (Y): 352 (H) × 240 (V) × 30 frames
Chroma component (Cb): 174 (H) × 120 (V) × 30 frames
Chroma component (Cr): 174 (H) × 120 (V) × 30 frames
[0003]
The input image data supplied to the input terminal 1 is sent to the motion detector 20 and the block divider 11 via the frame memory 10, which temporarily stores the input image data and rearranges it into the appropriate order.
The block divider 11 divides each frame supplied from the frame memory 10 into blocks of 8 × 8 pixels for each of the luminance component (Y) and the chroma components (Cr) and (Cb), as shown in FIG. 9. The six blocks in total (Y0, Y1, Y2, Y3, Cb, Cr), made up of four luminance (Y) blocks (Y0, Y1, Y2, Y3), one chroma (Cb) block, and one chroma (Cr) block, are called a macroblock (MB).
[0004]
Data in units of macroblocks from the block divider 11 is sent to the differencer 12.
The differencer 12 takes the difference between the data from the block divider 11 and the inter-frame prediction image data described later, and sends its output to the switched terminal b of the changeover switch 13 as data of a frame to be subjected to inter-frame predictive coding, described later. The data from the block divider 11 is also supplied to the switched terminal a of the changeover switch 13 as data of a frame to be subjected to intra-frame coding, described later.
[0005]
Data in block units passed through the changeover switch 13 is subjected to discrete cosine transform (DCT) processing by the DCT circuit 14, and the DCT coefficients are sent to the quantizer 15. The quantizer 15 quantizes the DCT output with a predetermined quantization step width and sends the quantized coefficients to the zigzag scan circuit 16.
The zigzag scan circuit 16 rearranges the quantized coefficients by a so-called zigzag scan, as shown in FIG. 10, and sends its output to the variable length coding circuit 17. The variable length coding circuit 17 applies variable length coding (VLC) to the output data of the zigzag scan circuit 16, sends its output to the output buffer 18, and sends information indicating the amount of code generated by the variable length coding to the quantization step controller 19. The quantization step controller 19 controls the quantization step width of the quantizer 15 based on the information indicating the code amount from the variable length coding circuit 17. The data output from the output buffer 18 is output from the output terminal 2 as the compression-coded output.
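The zigzag reordering can be sketched as below. FIG. 10 is not reproduced here, so the sketch assumes the conventional zigzag pattern used by JPEG and MPEG-1, which walks the anti-diagonals of the block starting from the DC coefficient in the top-left corner. The function names are illustrative.

```python
def zigzag_order(n=8):
    """Return the (row, column) visiting order of a zigzag scan over an n x n block."""
    order = []
    for s in range(2 * n - 1):  # s = row + column indexes one anti-diagonal
        diagonal = [(row, s - row) for row in range(n) if 0 <= s - row < n]
        # Even diagonals are walked upward (row decreasing), odd ones downward.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def zigzag_scan(block):
    """Flatten a square block of coefficients into zigzag order."""
    return [block[row][col] for row, col in zigzag_order(len(block))]
```

Because low-frequency coefficients come first and high-frequency ones last, the tail of the scanned sequence tends to be long runs of zeros after quantization, which suits the variable length coding that follows.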
[0006]
The output from the quantizer 15 is inversely quantized by an inverse quantizer 27 and further subjected to inverse DCT processing by an inverse DCT circuit 26. The output of the inverse DCT circuit 26 is sent to the adder 25.
The adder 25 is also supplied with the inter-frame prediction image data from the motion compensator 21 via the changeover switch 24, which is turned on for frames subjected to inter-frame predictive coding, and adds that data to the output data of the inverse DCT circuit 26. The output data of the adder 25 is temporarily stored in the frame memory 22 and then sent to the motion compensator 21.
[0007]
The motion compensator 21 performs motion compensation based on the motion vector detected by the motion detector 20, and outputs inter-frame prediction image data obtained thereby.
The specific operation of the conventional image encoding apparatus shown in FIG. 7 will be described in detail below. Here, for the sake of explanation, the name of each frame is defined as follows.
[0008]
First, the frames arranged in display order are called
I0, B1, B2, P3, B4, B5, P6, B7, B8, I9, B10, B11, B12, ...
Among these frames, I, P, and B indicate the type of compression method, as described later, and the number after I, P, or B simply indicates the display order.
[0009]
Among the standards of the so-called MPEG (Moving Picture Expert Group), an international standardization working group for color moving picture coding systems, MPEG1 stipulates the following for compressing such images.
First, the I0 image is compressed.
Next, the P3 image is compressed; what is compressed is not P3 itself but the difference data between P3 and the I0 image.
[0010]
Next, the B1 image is compressed; what is compressed is not B1 itself but the difference data between B1 and I0, between B1 and P3, or between B1 and the average of I0 and P3 (whichever yields the least information is chosen).
Next, the B2 image is compressed; what is compressed is not B2 itself but the difference data between B2 and I0, between B2 and P3, or between B2 and the average of I0 and P3 (whichever yields the least information is chosen).
[0011]
Next, the P6 image is compressed; what is compressed is not P6 itself but the difference data between P6 and the P3 image.
Arranging the processing described above in order gives the encoding order
I0, P3, B1, B2, P6, B4, B5, P9, B7, B8, I9, P12, B10, B11, ...
which differs from the display order. The compressed data (encoded data) is arranged in this encoding order.
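The reordering from display order to encoding order can be illustrated with a short Python sketch. It is not taken from the specification: the function name and the use of string labels such as "I0" are assumptions made for illustration, and the rule it applies is simply that each I or P picture is emitted ahead of the B pictures that precede it in display order.

```python
def display_to_coding_order(display):
    """Reorder picture labels from display order to coding order.

    Each I or P picture is emitted before the B pictures that precede it
    in display order, because those B pictures reference it.
    """
    coded, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)      # wait for the next reference picture
        else:
            coded.append(pic)          # I or P: emit the reference first
            coded.extend(pending_b)    # then the B pictures it unblocks
            pending_b = []
    return coded + pending_b           # trailing B pictures, if any


print(display_to_coding_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```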
[0012]
The above will be described in more detail together with the operation of the configuration of FIG.
In encoding the first image (that is, I0), the data of the image to be compressed is first read from the frame memory 10 and divided into blocks by the block divider 11. The block divider 11 outputs the data block by block in the order Y0, Y1, Y2, Y3, Cb, Cr, and the data is sent to the DCT circuit 14 via the changeover switch 13, which is switched to the switched terminal a side. The DCT circuit 14 performs two-dimensional (vertical and horizontal) discrete cosine transform processing on each block. The data, which was on the time axis, is thereby converted onto the frequency axis.
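As a rough illustration of the two-dimensional DCT applied to each 8×8 block, the following Python sketch evaluates the orthonormal DCT-II directly from its definition. The function name and the normalization are assumptions for illustration; an actual DCT circuit would use a fast algorithm rather than this quadruple loop.

```python
import math

N = 8  # block size used for both luminance and chroma blocks


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            acc = 0.0
            for y in range(N):
                for x in range(N):
                    acc += (block[y][x]
                            * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out


# A flat block has energy only in the DC coefficient out[0][0].
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

For a flat block, every coefficient other than the DC term comes out as zero up to rounding error, which is why smooth image areas compress well after the transform.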
[0013]
The DCT coefficients from the DCT circuit 14 are sent to the quantizer 15, where they are quantized with a predetermined quantization step width. The zigzag scan circuit 16 then rearranges them in the zigzag order shown in FIG. 10. In this zigzag order, coefficients further back correspond to higher frequency components, so coefficient values generally tend to be smaller toward the rear. Therefore, when quantization is performed with a certain value S, results of 0 become more frequent toward the rear, and as a result the high frequency components are cut off.
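The effect described above can be sketched in Python. The helper names and the uniform quantizer (division by a single step S with rounding) are assumptions for illustration; MPEG1 additionally applies a per-coefficient weighting matrix, which is left out here.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):                  # s = row + col (anti-diagonal)
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals run from bottom-left to top-right, odd ones the reverse.
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append((r, s - r))
    return order


def quantize_and_scan(coeffs, step):
    """Quantize with a uniform step S and return the levels in zigzag order."""
    return [int(round(coeffs[r][c] / step)) for r, c in zigzag_order(len(coeffs))]


# Synthetic coefficients that shrink toward high frequencies.
coeffs = [[64.0 / (1 + r + c) for c in range(8)] for r in range(8)]
levels = quantize_and_scan(coeffs, step=16)
```

With these synthetic coefficients, the tail of `levels` consists only of zeros, which is the run that the variable length coder can then represent compactly.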
[0014]
Thereafter, the quantized coefficients are sent to a variable length coding (VLC) circuit 17 where so-called Huffman coding is performed. The compressed bit stream obtained as a result is temporarily stored in the output buffer 18 and then transmitted at a constant bit rate. The output buffer 18 is a memory for buffering so that an irregularly generated bit stream can be transmitted at a constant bit rate.
[0015]
As described above, compression that uses only the single image itself is called intra-frame (Intra) coding, and such an image is called an I picture.
Accordingly, when the decoder receives the bit stream of the above I picture, the above process is reversed to complete the first image.
Next, the encoding of the second image (that is, P3) is performed as follows.
[0016]
That is, the second and subsequent images could also be compressed as I pictures to create a bit stream. However, to increase the compression rate, they are compressed by the following method, which exploits the correlation between the contents of successive images.
First, for each macroblock constituting the second image, the motion detector 20 searches the first image (I0) for a pattern closely resembling it, and expresses the relative position of that pattern as coordinates (x, y), called a motion vector.
[0017]
Also, in the second image, each block is not sent as it is to the DCT circuit 14 as in the case of the I picture. Instead, the difference data between the block and the block taken from the first image according to the motion vector of that block (the difference data produced by the differencer 12) is sent to the DCT circuit 14. The method for detecting a motion vector is described in detail in ISO/IEC 11172-2 annex D.6.2, and is therefore omitted here.
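Although the detection method itself is left to the cited annex, the idea of searching the reference image for the best-matching block can be sketched as an exhaustive block-matching search. The following Python is an assumption for illustration only: the function names, the sum-of-absolute-differences cost, and the search range are not taken from the specification.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))


def full_search(cur, ref, bx, by, size=16, search=7):
    """Return ((dx, dy), min_sad) over a +/-search window (exhaustive search)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference picture.
            if not (0 <= bx + dx and bx + dx + size <= w and
                    0 <= by + dy and by + dy + size <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# Reference picture with a non-repeating pattern; the current picture is the
# reference shifted by 2 pixels horizontally and 1 pixel vertically.
ref = [[x + 100 * y for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
vector, error = full_search(cur, ref, bx=4, by=4, size=4, search=3)
# vector == (2, 1), error == 0
```

The minimum error returned alongside the vector is the quantity that determines how small the difference data will be after motion compensation.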
[0018]
For example, if the correlation between the pattern of the first image indicated by the motion vector and the pattern of the block to be compressed is very strong, the difference data is very small. Therefore, the amount of data after compression is smaller when the motion vector and the difference data are encoded than when the compression is performed by intra-frame (intra) encoding.
[0019]
Such a compression method is called inter-frame (Inter) predictive coding. However, the difference data does not always become small; depending on the picture pattern (image content), compression by the intra-frame coding can yield a higher compression rate than taking the difference. In such a case, compression is performed by the intra-frame coding. Whether inter-frame predictive coding or intra-frame coding is used is decided for each macroblock.
[0020]
The above will now be described with reference to the image coding apparatus (encoder) of FIG. 7. To perform inter-frame predictive coding, the encoder must first create and hold the same image as the one reconstructed on the decoder side.
For this purpose, the encoder contains the same circuit as the decoder. This circuit is called a local decoder. The inverse quantizer 27, the inverse DCT circuit 26, the adder 25, the frame memory 22, and the motion compensator 21 of FIG. 7 correspond to the local decoder, and the image stored in the frame memory 22 is called a local decoded picture or local decoded data. In contrast, the image data before compression is called the original picture or original data.
[0021]
Even when the first I picture is compressed, the first image decoded through the local decoder is stored in the frame memory 22. It should be noted that the image obtained by the local decoder is not the image before compression but the image restored after compression; it carries the image quality degradation caused by compression and is exactly the same as the image decoded by the decoder.
[0022]
In this state, the data (original data) of the second image (P3) enters the encoder (the motion vectors must already have been detected at this stage). The data has a motion vector for each block, and this vector is given to the motion compensator (MC) 21. The motion compensator 21 outputs the data on the local decoded picture indicated by the motion vector (motion compensation data, or MC data, for one macroblock) as the inter-frame prediction image data.
[0023]
The per-pixel difference data, produced by the differencer 12, between the second original data and the motion compensation data (inter-frame prediction image data) is input to the DCT circuit 14. The subsequent compression method is basically the same as for the I picture. An image compressed by the above method is called a P picture (Predicted picture).
[0024]
More specifically, not all macroblocks in a P picture are compressed by inter-frame predictive coding; when compression by intra-frame coding is judged to be more efficient, the macroblock is encoded by intra-frame coding.
That is, even in a P picture, each macroblock is compressed by selecting either intra-frame coding (such a macroblock is called an intra macroblock) or inter-frame predictive coding (such a macroblock is called an inter macroblock).
[0025]
As described above, in the local decoder, the output of the quantizer 15 is inversely quantized by the inverse quantizer 27 and subjected to inverse DCT processing by the inverse DCT circuit 26, and is then added to the motion compensation data (MC data) to become the final local decoded picture.
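The reconstruction performed by the local decoder can be sketched as follows. This Python is a simplified assumption for illustration: the transform stage is left out so that the quantized levels are treated as pixel-domain values, and the function name and the clipping to an 8-bit range are not taken from the specification.

```python
def local_decode(quantized, step, prediction=None):
    """Rebuild pixel values the way a decoder would.

    quantized : quantized residual (or pixel) levels for one block
    step      : quantization step used by the encoder
    prediction: motion-compensated data, or None for an intra block
    The DCT / inverse DCT pair is omitted, so levels are pixel-domain values.
    """
    residual = [q * step for q in quantized]           # inverse quantization
    if prediction is None:                             # intra: nothing is added
        recon = residual
    else:                                              # inter: add the MC data
        recon = [p + r for p, r in zip(prediction, residual)]
    return [max(0, min(255, v)) for v in recon]        # clip to the 8-bit range


original = [100, 103, 98]
prediction = [96, 100, 100]
step = 4
levels = [round((o - p) / step) for o, p in zip(original, prediction)]
recon = local_decode(levels, step, prediction)
# recon == [100, 104, 100], which is close to but not identical to `original`
```

The mismatch between `recon` and `original` is the quantization loss; because the encoder predicts from `recon` rather than from `original`, the encoder and the decoder stay in step.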
Next, the encoding of the third image (that is, B1) is performed as follows.
[0026]
In the encoding of the third image (B1), a motion vector for each of the two images I0 and P3 is searched. Here, the motion vector for I0 is called a forward vector (forward vector) MVf (x, y), and the motion vector for P3 is called a backward vector (Backward Vector) MVb (x, y).
The difference data is also compressed for the third image, but the problem is which data to compress. In this case as well, a difference from the one with the least amount of information may be taken. In this case, compression options are:
(1) Difference from data on I0 indicated by the forward vector MVf (x, y)
(2) Difference from data on P3 indicated by backward vector MVb (x, y)
(3) The difference between the data on I0 indicated by the forward vector MVf (x, y) and the average value of the data on P3 indicated by the backward vector MVb (x, y)
(4) Do not use differential data (intraframe coding)
One of these four compression methods is selected for each macroblock. For options (1), (2), and (3), the respective motion vectors are also sent to the motion compensator 21, and the differencer 12 takes the difference from the motion compensation data and sends it to the DCT circuit 14. For option (4), the data is sent to the DCT circuit 14 as it is.
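The per-macroblock choice among the four options can be sketched in Python. The cost measure here is an assumption for illustration: the residual is scored by its sum of absolute values, and the intra option is scored by the deviation of the pixels from their mean. The specification only states that the option with the least information is chosen.

```python
def select_b_mode(block, fwd_pred, bwd_pred):
    """Pick the candidate with the smallest cost for one macroblock.

    block, fwd_pred, bwd_pred: flat lists of pixel values of equal length.
    The cost measure (sum of absolute values) is an illustrative assumption.
    """
    avg_pred = [(f + b) / 2 for f, b in zip(fwd_pred, bwd_pred)]
    mean = sum(block) / len(block)
    costs = {
        "forward": sum(abs(p - q) for p, q in zip(block, fwd_pred)),       # (1)
        "backward": sum(abs(p - q) for p, q in zip(block, bwd_pred)),      # (2)
        "interpolated": sum(abs(p - q) for p, q in zip(block, avg_pred)),  # (3)
        "intra": sum(abs(p - mean) for p in block),                        # (4)
    }
    mode = min(costs, key=costs.get)
    return mode, costs


mode, costs = select_b_mode([10, 20, 30, 40], [8, 18, 28, 38], [12, 22, 32, 42])
# mode == "interpolated": the average of the two predictions matches exactly
```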
[0027]
This is possible because, as a result of the first and second encoding processes described above, the two pictures I0 and P3 have been restored in the frame memory 22, which stores local decoded pictures.
Next, the encoding of the fourth image (that is, B2) is performed as follows.
[0028]
The fourth image (B2) is compressed in the same way as the third image; the description of the encoding method for the third image (B1) applies with B1 replaced by B2.
Next, the encoding of the fifth image (that is, P6) is performed as follows.
[0029]
The encoding of the fifth image (P6) follows the description of the encoding method for the second image (P3), with P3 replaced by P6 and I0 replaced by P3.
The sixth and subsequent images repeat the above, so their description is omitted.
In MPEG, a so-called GOP (Group Of Picture) is defined.
[0030]
That is, a set of several pictures is called a group of pictures (GOP), and a GOP must be a set of images that are contiguous in the encoded data (compressed data). The GOP also takes random access into account; for this purpose, the first picture of a GOP in the encoded data must be an I picture. Further, the last picture of a GOP in display order must be an I or P picture.
[0031]
FIG. 11 shows an example in which the first GOP consists of 4 pictures and the subsequent GOPs consist of 6 pictures. FIG. 11A shows the display order, and FIG. 11B shows the order of the encoded data.
In FIG. 11, paying attention to GOP2, since B4 and B5 are formed from P3 and I6, for example, when I6 is accessed by random access, B4 and B5 cannot be correctly decoded because there is no P3. Thus, a GOP that cannot be correctly decoded only within the GOP is not a closed GOP.
[0032]
On the other hand, if B4 and B5 refer only to I6, for example, even if I6 is accessed by random access, P3 is not necessary, so that B4 and B5 can be correctly decoded. A GOP that can be completely decoded with only information in the GOP is called a closed GOP.
Compression is performed by the most efficient of the compression methods described above, but the amount of coded data generated as a result depends on the input image and cannot be known until compression is actually performed.
[0033]
However, it is also necessary to control the bit rate of the compressed data to be constant. The parameter for performing the control is a quantization step (or quantization scale: Q-scale) as information representing the code amount given to the quantizer 15. Even with the same compression method, the amount of generated bits decreases if the quantization step is increased, and increases if it is decreased.
[0034]
The value of this quantization step is controlled as follows.
To output the compressed data at a constant bit rate, the encoder is provided with a buffer (output buffer 18) at its output, which absorbs a certain amount of variation in the amount of data generated for each image.
However, if the generation of data that exceeds the predetermined bit rate continues, the remaining amount of the output buffer 18 increases and eventually overflows. On the contrary, if the data below the bit rate continues to be generated, the remaining amount of the output buffer 18 will decrease, and finally an underflow will be caused.
[0035]
Therefore, the encoder feeds back the remaining amount of the output buffer 18, and the quantization step controller 19 controls the quantization step of the quantizer 15 accordingly. When the remaining amount of the output buffer 18 decreases, the quantization step is reduced so that compression is less severe; when the remaining amount increases, the quantization step is increased to raise the compression rate.
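This conventional feedback can be sketched in Python as a mapping from the amount of data remaining in the output buffer to a quantization step. The linear mapping, the function names, and the step range of 1 to 31 are assumptions for illustration; the specification does not state the control law used by the quantization step controller 19.

```python
def next_q_step(buffer_fullness, buffer_size, q_min=1, q_max=31):
    """Map output-buffer occupancy linearly onto a quantization step.

    An emptier buffer gives a smaller step (finer quantization, more bits);
    a fuller buffer gives a larger step (coarser quantization, fewer bits).
    """
    ratio = min(max(buffer_fullness / buffer_size, 0.0), 1.0)
    return round(q_min + ratio * (q_max - q_min))


def update_buffer(fullness, generated_bits, drained_bits):
    """Occupancy after coding one unit and draining at the channel rate."""
    return max(0, fullness + generated_bits - drained_bits)


fullness = update_buffer(400_000, generated_bits=250_000, drained_bits=150_000)
q = next_q_step(fullness, buffer_size=1_000_000)
```

Because the step depends only on the current occupancy, this kind of control has no knowledge of how much information the following images contain.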
[0036]
In addition, there is a large difference in the range of the encoded data amount generated by the above-described compression method (the intra-frame encoding or inter-frame predictive encoding).
In particular, compression by the intra-frame coding method generates a large amount of data. Therefore, when the free capacity of the output buffer 18 is small, the quantization step size must be increased, and in some cases the buffer 18 may overflow even at the maximum quantization step size. Even if the data fits in the buffer 18, a large quantization step degrades the intra-frame coded image, which in turn affects the image quality of the subsequent inter-frame predictive coded images. Sufficient free space in the output buffer 18 is therefore required before compression by intra-frame coding is performed.
[0037]
Therefore, the compression methods are applied in a predetermined order, and the quantization step controller 19 performs feedback control of the quantization step size so as to secure sufficient free space in the output buffer 18 before intra-frame coding.
As described above, it is possible to suppress the encoded data at a constant rate.
[0038]
[Problems to be solved by the invention]
The conventional method described above has a drawback in that high image quality cannot be obtained for the following reason.
In other words, to compress an input image whose information amount changes from moment to moment at a constant bit rate with high average image quality, it is necessary, within the range in which the output buffer can maintain the bit rate, to allow a large amount of compressed data for an image (picture) with a large amount of information and a small amount of compressed data for an image with a small amount of information, so that the image quality is uniform. However, the conventional method cannot do this in the following cases.
[0039]
For example, if images with a small amount of information continue and an image with a large amount of information then suddenly arrives, the quantization step should not be made too small for the earlier images with little information, and the remaining amount of the output buffer should be kept low until the following image with a large amount of information is encoded. However, with the method of feeding back the remaining amount of the output buffer, the remaining amount of the output buffer keeps increasing while the images with little information continue.
[0040]
Conversely, if an image with a large amount of information is followed by images with a small amount of information, there is no need to compress the earlier image with a large quantization step in order to reduce the remaining amount of the output buffer, because the subsequent images with little information make overflow unlikely. However, in the output buffer remaining amount feedback method, the information amount of the following images is unknown, so control acts in the direction of reducing the remaining amount of the buffer, that is, in the direction of increasing the quantization step, which lowers the image quality.
[0041]
For this reason, for example, a configuration in which the information amount of the input image is evaluated and the quantization step is controlled based on the evaluation value is also conceivable.
However, an image encoding apparatus having a mechanism for obtaining an evaluation value of the information amount of the input image must, when compressing one input image, allocate the amount of data usable for the compressed image according to the information amount (difficulty level) of that input image, and must accurately predict the quantization step of the quantizer that matches this allocation amount.
[0042]
If the predicted quantization step is not appropriate, the compressed data falls far short of, or conversely far exceeds, the usable allocation amount. If the allocation amount is missed by a large margin in either direction, the allocation amounts for compressing other pictures are affected.
[0043]
That is, for example, in a frame whose allocation amount has become small, the quantization step becomes large and the image quality is therefore lowered. As a result, frames of uniform image quality do not continue, and the overall impression when viewed continuously is of poor image quality. Further, if the prediction is far off, buffer underflow or overflow may be caused in the worst case.
[0044]
To prevent this, the quantization step may, for example, be controlled within a picture according to the amount of data generated so far in that picture, the scheduled allocation amount, and the progress of compression within the picture. Even so, if the basic quantization step is not predicted accurately, the quantization step fluctuates greatly within the picture. Because compression proceeds in raster scan order, such fluctuation makes portions of non-uniform image quality visible as strips on the screen, which lowers the image quality.
[0045]
The present invention has been proposed in view of the circumstances described above, and its object is to provide an image encoding apparatus and method capable of efficient image compression and of improving overall image quality.
[0046]
[Means for Solving the Problems]
An image encoding device of the present invention, proposed to achieve the above-described object, comprises: image data storage means for storing a plurality of input image data; image information evaluation means for outputting, from the plurality of image data stored in the image data storage means, a first parameter indicating the information amount of the image itself and a second parameter indicating the difference information amount of the image, both for evaluating the information amount of the input image data, together with image information for image counting; inter-image correlation detection means for detecting a scene change, as correlation information between the plurality of image data stored in the image data storage means, using the second parameter from the image information evaluation means; orthogonal transformation means for performing orthogonal transformation processing on image data to generate orthogonal transformation coefficients; quantization means for quantizing the orthogonal transformation coefficients generated by the orthogonal transformation means in a predetermined quantization step; compression method selection means for, based on the image information obtained by the image information evaluation means and the scene change detection output that is the correlation information between images from the inter-image correlation detection means, selecting intra-frame coding periodically according to the count value of the image information, selecting intra-frame coding when the scene change is detected, and selecting inter-frame predictive coding otherwise; and quantization step control means for predicting the basic quantization step used for quantization in the quantization means from the scheduled compressed data amount obtained by compressing the image data for one screen by the compression method selected by the compression method selection means, and from a difficulty level obtained by adding one of the first and second parameters according to the macroblock type and totaling over one screen. When the scheduled compressed data amount of the image data for one screen is allocated_bit, the difficulty level is difficulty, and the basic quantization step is Q_scale, the quantization step control means obtains the basic quantization step Q_scale, using predetermined parameters A and B, by the equation
Q_scale = exp ((log (allocated_bit / difficulty) -B) / A)
The image coding method according to the present invention comprises: an image information evaluation step of outputting, from a plurality of image data stored in image data storage means that stores a plurality of input image data, a first parameter indicating the information amount of the image itself and a second parameter indicating the difference information amount of the image, both for evaluating the information amount of the input image data, together with image information for image counting; an inter-image correlation detection step of detecting a scene change, as correlation information between the plurality of image data stored in the image data storage means, using the second parameter obtained in the image information evaluation step; a compression method selection step of, based on the image information obtained in the image information evaluation step and the scene change detection output that is the correlation information between images obtained in the inter-image correlation detection step, selecting intra-frame coding periodically according to the count value of the image information, selecting intra-frame coding when the scene change is detected, and selecting inter-frame predictive coding otherwise; an orthogonal transformation step of performing orthogonal transformation processing on the image data to generate orthogonal transformation coefficients; a quantization step control step of predicting the basic quantization step used for quantization from the scheduled compressed data amount obtained by compressing the image data for one screen by the compression method selected in the compression method selection step, and from a difficulty level obtained by adding one of the first and second parameters according to the macroblock type and totaling over one screen; and a quantization step of quantizing the orthogonal transformation coefficients generated in the orthogonal transformation step in the predetermined quantization step. In the quantization step control step, when the scheduled compressed data amount of the image data for one screen is allocated_bit, the difficulty level is difficulty, and the basic quantization step is Q_scale, the basic quantization step Q_scale is obtained, using predetermined parameters A and B, by the equation
Q_scale = exp ((log (allocated_bit / difficulty) -B) / A)
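The equation for Q_scale can be evaluated directly, as in the following Python sketch. The function name and the numerical values of A and B are assumptions for illustration; the specification does not give concrete values. Rearranging the equation gives log(allocated_bit / difficulty) = A · log(Q_scale) + B, so A would presumably be negative, since a larger quantization step yields fewer bits.

```python
import math


def predict_q_scale(allocated_bit, difficulty, a, b):
    """Basic quantization step from
    Q_scale = exp((log(allocated_bit / difficulty) - B) / A)."""
    return math.exp((math.log(allocated_bit / difficulty) - b) / a)


# With A = -1 and B = 0 the formula reduces to difficulty / allocated_bit.
q = predict_q_scale(allocated_bit=1000, difficulty=8000, a=-1.0, b=0.0)
```

With a negative A, a picture that is harder relative to its bit allocation receives a larger basic quantization step, and an easier one receives a smaller step.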
[0047]
Here, the quantization step control means learns the relationship among the quantization step actually used for compression, the amount of data after compression, and the evaluation value, and predicts the basic quantization step according to the learning result. The quantization step control means obtains a total evaluation value by summing, over one screen, the evaluation values of the macroblocks into which the image data is divided, and uses this total evaluation value to predict the basic quantization step. The image information evaluation means uses, as the evaluation value, the sum of absolute values, for each macroblock, of the differences between the pixel data of the macroblock of the reference image indicated by the motion vector obtained by motion detection and the pixel data of the macroblock of the input image.
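One way to picture the totaling and the learning is the following Python sketch. It rests on assumptions that are not in the specification: the function names are hypothetical, each macroblock is represented as a tuple of its type and its two parameters, and the learning is done by an ordinary least-squares fit of log(generated bits / difficulty) against log(Q_scale), which is only one possible way of learning A and B.

```python
import math


def picture_difficulty(macroblocks):
    """Total evaluation value for one screen.

    macroblocks: iterable of (mb_type, intra_param, inter_param); the first
    parameter is added for intra macroblocks, the second for all others.
    """
    return sum(intra if mb_type == "intra" else inter
               for mb_type, intra, inter in macroblocks)


def fit_a_b(history):
    """Least-squares fit of log(bits / difficulty) = A * log(q_scale) + B.

    history: list of (q_scale, generated_bits, difficulty) for coded pictures.
    """
    xs = [math.log(q) for q, _, _ in history]
    ys = [math.log(bits / diff) for _, bits, diff in history]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


total = picture_difficulty([("intra", 900, 400), ("inter", 700, 120), ("inter", 650, 80)])
history = [(q, 5000 * math.exp(-1.2 * math.log(q) + 0.5), 5000) for q in (2, 4, 8, 16)]
a, b = fit_a_b(history)   # recovers A of about -1.2 and B of about 0.5
```

Refitting A and B as each picture is coded would let the prediction track changes in image content.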
[0049]
[Action]
According to the present invention, the amount of information is evaluated from a plurality of stored image data, the correlation between images is detected, and the compression method of the image data is adaptively selected based on the evaluation value of the information amount and the correlation information between images. The basic quantization step is then predicted from the evaluation value and the scheduled compressed data amount for compressing one screen of image data by the selected compression method, which raises the prediction accuracy of the basic quantization step.
[0050]
In addition, according to the present invention, the relationship among the quantization step actually used for compression, the amount of data after compression, and the evaluation value is learned, and the basic quantization step is predicted according to the learning result, so that the prediction follows fluctuations in the image.
[0052]
[Example]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 1 shows a schematic configuration of an image encoding apparatus according to an embodiment of the present invention. In FIG. 1, the same components as those in FIG. 7 described above are denoted by the same reference numerals, and the description thereof is omitted.
[0053]
In the configuration of FIG. 1, the components added to the configuration of FIG. 7 are an image information evaluation circuit 50, a scene change detection circuit 31, a compression method selection circuit 32, and a motion vector generation circuit 33, and the components changed are the motion detector 38, the quantization step controller 39, and the frame memory 40.
That is, the image coding apparatus according to the embodiment of the present invention comprises: a frame memory 40 as image data storage means for storing a plurality of input image data; an image information evaluation circuit 50 for evaluating the information amount of the input image data from the plurality of image data stored in the frame memory 40; a scene change detection circuit 31 as inter-image correlation detection means for detecting the correlation between images from the plurality of image data stored in the frame memory 40; a DCT circuit 14 that performs orthogonal transformation processing (DCT processing) on the image data to generate DCT coefficients; a quantizer 15 that quantizes the DCT coefficients generated by the DCT circuit 14 in a predetermined quantization step; a compression method selection circuit 32 that adaptively selects the compression method of the image data (picture type, macroblock type, GOP length) based on the evaluation value of the information amount obtained by the image information evaluation circuit 50 and the correlation information between images (scene change detection output) from the scene change detection circuit 31; and a quantization step controller 39 that predicts the basic quantization step for quantization in the quantizer 15 from the evaluation value and the scheduled compressed data amount for compressing one screen of image data by the compression method selected by the compression method selection circuit 32.
[0054]
In FIG. 1, input image data input from the input terminal 1 is first stored in the frame memory 40. Unlike the frame memory 10 of FIG. 3, the frame memory 40 can store a predetermined number of frames. Too large a predetermined number is undesirable because the frame memory 40 becomes large. The effective length (number of frames) for the predetermined number depends heavily on the bit rate, the capacity of the output buffer 18, and the interval between images compressed by intra-frame coding (in most cases, the GOP length). This is because the range over which the output buffer 18 can absorb the unevenness in compressed data size caused by differences in compression method and compression rate, and thus keep the bit rate constant, is limited by conditions such as the bit rate, the output buffer capacity, and the interval between intra-frame coded images.
[0055]
In general, compression by the intra-frame coding method is performed periodically (the period often coincides with GOP boundaries), and the amount of data after compression by intra-frame coding is considerably larger than with the other method (inter-frame predictive coding). For this reason, one reasonable method is to examine the information amount over the interval between intra-frame coded images (or over a GOP) and distribute the data amount accordingly.
[0056]
However, in the system of the present embodiment, as will be described later, compression by intra-frame coding is also performed when the correlation between the preceding and subsequent images becomes extremely low because of a scene change or the like. If an image scheduled for periodic intra-frame coding lies in the vicinity of an image that is intra-frame coded because of a scene change, intra-frame coding the periodic image as well makes it difficult to maintain a constant bit rate or uniform image quality, and the need for that periodic intra-frame coding is lost; the periodic compression by intra-frame coding must therefore be cancelled.
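The interaction between periodic intra-frame coding and scene-change intra-frame coding can be sketched in Python. The rule used here is an assumption for illustration, not the method of the compression method selection circuit 32: the picture counter is simply reset whenever a scene change forces intra-frame coding, so that a periodic intra picture never falls right next to a scene-change intra picture. The function name and the period of 15 are likewise hypothetical.

```python
def select_intra_pictures(num_pictures, scene_changes, period=15):
    """Return the indices of pictures to be intra-frame coded.

    A picture is intra coded when a scene change is detected on it, or when
    `period` pictures have been counted since the last intra picture.
    All other pictures are left to inter-frame predictive coding.
    """
    intra, count = [], 0
    for n in range(num_pictures):
        if n == 0 or n in scene_changes or count >= period:
            intra.append(n)
            count = 0            # restart the count at every intra picture
        count += 1
    return intra


print(select_intra_pictures(45, scene_changes=set()))
# [0, 15, 30]
print(select_intra_pictures(45, scene_changes={13}))
# [0, 13, 28, 43]
```

In the second call, the periodic intra picture that would have fallen at index 15 is dropped because the scene change at index 13 already produced one.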
[0057]
Therefore, considering that a scene change may occur in the vicinity of an image that is to be periodically intra-frame coded as described above, it is appropriate for the storage capacity of the frame memory 40 (the predetermined number) to be about twice the period at which compression by intra-frame coding is periodically performed.
Of course, the predetermined number is merely an example, and is not limited to this, and can be changed according to various conditions.
[0058]
The image data stored in the frame memory 40 is sent to the image information evaluation circuit 50 as appropriate.
Here, the image information evaluation circuit 50 roughly calculates two kinds of parameters.
The first parameter indicates the amount of information of the image itself, so that the amount of data after compression by intra-frame coding can be predicted. As the first parameter, for example, DCT processing is performed for each block on the image data supplied from the frame memory 40, and the sum and statistics of the DCT coefficients are taken; or, if that becomes too large in scale, the sum of the mean square errors for each block is obtained. In any case, the image information evaluation circuit 50 calculates a parameter that represents the information amount of the image and from which the amount of data after compression can be estimated.
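A possible form of the first parameter is sketched below in Python. The interpretation is an assumption: the mean square error of each block is taken relative to the block's own mean value, and the per-block results are summed. The function name is hypothetical.

```python
def intra_activity(blocks):
    """Sum over blocks of the mean squared deviation from the block mean.

    blocks: list of flat pixel lists, one per block. Treating the "mean
    square error" as deviation from the block mean is an assumption.
    """
    total = 0.0
    for block in blocks:
        mean = sum(block) / len(block)
        total += sum((p - mean) ** 2 for p in block) / len(block)
    return total


# A flat block contributes nothing; a block alternating 0 and 10 contributes 25.
value = intra_activity([[10, 10, 10, 10], [0, 10, 0, 10]])
```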
[0059]
The second parameter indicates the amount of difference information of an image, so that the amount of data after compression by inter-frame predictive coding can be predicted. As the parameter in this case, for example, the sum within the block of the difference values between the image stored in the frame memory 40 and the image after motion compensation is used. In calculating this parameter, it is possible to use the minimum error at which the motion vector is detected by a general motion vector detection circuit (motion detector 38 and motion vector generation circuit 33).
[0060]
At this time, as parameters for estimating (predicting) the amount of data after compression by inter-frame predictive coding, a general motion vector detection circuit (motion detector 38 and motion vector generation circuit 33) using only luminance information provides the motion vector and the minimum error, obtained only from the luminance information, at which the motion vector is detected. In this embodiment, in addition to these, the minimum error obtained only from the color difference information at which the motion vector is detected is newly used.
[0061]
In the apparatus of the present embodiment, the minimum error from the luminance information and the minimum error from the color difference information obtained in this way are used as the error of the macroblock, and the compression method selection circuit 32 described later determines the compression method using this macroblock error.
The evaluation values (parameters) of the image information calculated as described above by the image information evaluation circuit 50 are sent to the scene change detection circuit 31, the compression method selection circuit 32, and the quantization step controller 39 described below.
[0062]
The image information evaluation circuit 50 also sends image information to the compression method selection circuit 32 in order to count images when the GOP length is determined in the compression method selection circuit 32 described later.
Next, the scene change detection circuit 31 detects a scene change using the output (for example, the second parameter) of the image information evaluation circuit 50.
[0063]
Here, the main purpose of detecting a scene change in the scene change detection circuit 31 is to provide material for deciding between the two compression methods, inter-frame predictive coding and intra-frame coding. This is because an image with extremely low correlation to the preceding and following images, such as at a scene change, can be compressed more efficiently by intra-frame coding than by inter-frame predictive coding. In addition, since the data after compression becomes large at a scene change, it is important to grasp the scene change from the viewpoint of data amount distribution and output buffer management.
[0064]
Since a scene change as described above exists where the correlation between the preceding and following images is significantly impaired, it can be detected, for example, by obtaining the sum over the entire image of the difference values between the preceding and following images (for example, the difference values of the image after motion vector compensation), and then obtaining the ratio of this sum between the preceding and following images.
[0065]
For this reason, the scene change detection circuit 31 of this embodiment detects a scene change using the output of the image information evaluation circuit 50. That is, since the image information evaluation circuit 50 outputs the sum within the block of the difference values of the image after motion compensation as the second parameter, as described above, the scene change detection circuit 31 can perform the above-described calculation for scene change detection using this per-block sum of difference values.
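The ratio test described above can be sketched as follows. This is only an illustration under stated assumptions: the function name detect_scene_changes and the threshold value 3.0 are hypothetical, since the description gives neither a threshold nor the exact form of the ratio.

```python
def detect_scene_changes(sad_per_image, ratio_threshold=3.0):
    """Return indices of images whose SAD jumps relative to the previous image.

    sad_per_image: per-image sum of motion-compensated difference values.
    ratio_threshold: hypothetical value; the source does not specify one.
    """
    changes = []
    for i in range(1, len(sad_per_image)):
        prev, cur = sad_per_image[i - 1], sad_per_image[i]
        # Guard against division by zero for a perfectly predicted image.
        ratio = cur / max(prev, 1)
        if ratio >= ratio_threshold:
            changes.append(i)
    return changes


# The fourth image (index 3) has a sum roughly nine times its predecessor.
scene_changes = detect_scene_changes([1000, 1100, 950, 9000, 1200, 1000])
```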
[0066]
Next, the compression method selection circuit 32 will be described.
The compression method selection circuit 32 is a circuit that selects which compression method to use, intra-frame coding or inter-frame predictive coding (P, B picture), based on the scene change detection output from the scene change detection circuit 31, the count value obtained by counting the image information from the image information evaluation circuit 50, and the second parameter (minimum error) obtained from the luminance information and the color difference information.
[0067]
That is, based on the parameters obtained from the luminance information and color difference information for each macroblock by the image information evaluation circuit 50, the compression method selection circuit 32 compares, for each macroblock, the expected amount of data generated (predicted generation amount) by intra-frame coding and by inter-frame predictive coding, and selects the compression method with the smaller predicted generation amount.
[0068]
Also, an image compressed by the intra-frame coding method must be present at least at the beginning of the GOP. Further, since the GOP is set at a certain interval in consideration of random access, I pictures are inevitably generated periodically at that interval; in this embodiment, they are also generated by a scene change or the like.
For this reason, the compression method selection circuit 32 counts the image information from the image information evaluation circuit 50, and the scene change detection output from the scene change detection circuit 31 is also supplied to the compression method selection circuit 32. As a result, the compression method selection circuit 32 selects the regular intra-frame coding from the count value of the images and also selects intra-frame coding when a scene change is detected (that is, it determines the GOP interval). In other cases, inter-frame predictive coding is selected.
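A minimal sketch of this selection, assuming a simple counter that is reset whenever an I picture is produced, is given below. The function name select_picture_coding and the default interval of 15 images are hypothetical; the adaptive adjustment of the GOP length described later is not modeled.

```python
def select_picture_coding(num_images, scene_changes, gop_length=15):
    """Label each image 'I' (intra-frame) or 'inter' (inter-frame predictive).

    scene_changes: indices reported by scene change detection.
    gop_length: hypothetical periodic I-picture interval, in images.
    """
    scene_changes = set(scene_changes)
    labels = []
    count = 0  # number of images since (and including) the last I picture
    for i in range(num_images):
        if i == 0 or i in scene_changes or count >= gop_length:
            labels.append("I")
            count = 1
        else:
            labels.append("inter")
            count += 1
    return labels


labels = select_picture_coding(10, scene_changes=[6], gop_length=4)
i_positions = [i for i, label in enumerate(labels) if label == "I"]
```

In this sketch the scene change at image 6 restarts the count, so the next periodic I picture is pushed back rather than following immediately.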
[0069]
The compression method selection circuit 32 performs switching control of the changeover switches 13 and 24 according to the selection of the compression method, and sends information indicating the selection result to the quantization step controller 39.
The quantization step controller 39 controls the quantization step width of the quantizer 15 based on the information indicating the code amount from the variable length coding circuit 17. The data output from the output buffer 18 is output from the output terminal 2 as an encoded output subjected to compression encoding.
[0070]
The output from the quantizer 15 is inversely quantized by an inverse quantizer 27 and further subjected to inverse DCT processing by an inverse DCT circuit 26. The output of the inverse DCT circuit 26 is sent to the adder 25.
The adder 25 is also supplied with the inter-frame prediction image data from the motion compensator 21 via the changeover switch 24 that is turned on in the case of the inter-frame prediction encoding frame, and the data and the inverse DCT circuit 26 Addition with output data is performed. The output data of the adder 25 is temporarily stored in the frame memory 22 and then sent to the motion compensator 21.
[0071]
The motion compensator 21 performs motion compensation based on the motion vector detected by the motion detector 38 and generated by the motion vector generation circuit 33, and outputs inter-frame prediction image data obtained thereby.
From the evaluation values (parameters) supplied by the image information evaluation circuit 50, the quantization step controller 39 can know the amount of information of each image and, further, where the correlation between the preceding and following images becomes extremely low, as at a scene change. From the information indicating the selection result from the compression method selection circuit 32, it can also know whether each image is to be compressed by intra-frame coding or inter-frame predictive coding.
[0072]
Therefore, in the quantization step controller 39, compared to the conventional quantization step control that feeds back only the remaining amount of the output buffer 18, it is possible to follow an abrupt change in the information amount of the input image. Appropriate quantization step control can be performed according to the change in the information amount, and further appropriate quantization step control can be performed according to the compression method of intraframe coding / interframe prediction coding.
[0073]
Next, the flow of processing in the configuration of this embodiment will be described with reference to the flowchart of FIG.
First, in step S81, image data input to the input terminal 1 is sequentially stored in the frame memory 40.
Here, since the frequency and interval of I pictures affect the image quality as described above, the GOP must be determined prior to encoding. In addition, in order to control the bit rate by rate control (quantization step control), information about the images of 1 GOP must be collected prior to encoding. A large amount of frame memory 40 is therefore used to obtain a delay long enough for the successively input image data to be analyzed before it is encoded.
[0074]
Next, in step S82, the motion detector 38 and the motion vector generation circuit 33 detect and generate a motion vector necessary for compression by interframe predictive coding. That is, in step S82, motion detection (motion estimation) is performed so that each image data in the frame memory 40 can be compression-coded as a P picture or a B picture according to a predetermined schedule.
[0075]
Here, an I picture is not defined for an image subjected to motion detection. This is because it is not determined at this point which image data is to be an I picture, and since I pictures do not require motion compensation, any image data can later become an I picture.
The image information evaluation circuit 50 uses the so-called minimum distortion, or the absolute value sum (AD) of the errors, used for the motion detection as one of the parameters used for encoding (the second parameter).
[0076]
Note that the absolute value sum (AD) of the errors is obtained as follows. The reference-side image is divided into 8 × 8 pixel blocks, and for each macroblock (MB) of luminance data 8 × 8 × 4 pixels and color difference data 8 × 8 × 2 pixels, the absolute values of the differences between each pixel and the corresponding pixel of the search-side macroblock cut out by the motion vector obtained by motion detection are summed. It can be obtained by the following equation (1).
[0077]
[Expression 1]
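The image of equation (1) is not reproduced in this text. A form consistent with the description in paragraph [0076] would be the following, where ref[i, j] and search[i, j] are assumed names for the pixels of the reference-side block and of the search-side block cut out by the motion vector.

```latex
AD = \sum_{i=1}^{8} \sum_{j=1}^{8} \left| \mathrm{ref}[i,j] - \mathrm{search}[i,j] \right| \qquad (1)
```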

[0078]
The total of this value over the blocks in the macroblock is then used as the absolute value sum (AD) of the errors of the macroblock.
This parameter is used to estimate the amount of information in consideration of scene change determination and image correlation when compression is performed by inter-frame predictive coding.
This parameter is also used to determine the macroblock type as described below.
[0079]
The parameter SAD for estimating the information amount of the image is the sum of the absolute value sums (AD) of the errors in one image, as shown in Equation (2).
SAD = ΣAD (2)
Of course, in addition to the absolute value sum (AD) of the errors, a minimum distortion may be used.
[0080]
In step S83, the image information evaluation circuit 50 evaluates, for each image, the average absolute value sum of errors (MAD) and the activity (Activity), in addition to the parameters obtained by the motion detection.
The average absolute value sum of errors (MAD) is a parameter for estimating the amount of information of an I picture; it is obtained for each 8 × 8 pixel block by the following equation (3) and then aggregated. This parameter is also used to determine the macroblock type.
[0081]
[Expression 2]
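The image of equation (3) is not reproduced in this text. Assuming that the error is taken against the mean pixel value of the block, a form consistent with the surrounding description would be the following, where X(i, j) is the pixel value and \bar{X} the block mean (both assumed symbols), and blockMAD is the per-block quantity summed in equation (4).

```latex
\mathrm{blockMAD} = \sum_{i=1}^{8} \sum_{j=1}^{8} \left| X(i,j) - \bar{X} \right|,
\qquad
\bar{X} = \frac{1}{64} \sum_{i=1}^{8} \sum_{j=1}^{8} X(i,j) \qquad (3)
```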

[0082]
Further, the sum of this value over the blocks in the macroblock, as shown in equation (4), is used to determine the macroblock type.
MAD = Σ blockMAD (4)
Further, as shown in equation (5), the values of the macroblocks are summed in one image, and the value is set as a parameter SMAD representing the amount of information (as an I picture) of the image.
[0083]
SMAD = ΣMAD (5)
The activity described above is a parameter for quantifying the state of the image, so that the compression efficiency can be improved while maintaining the image quality by controlling the quantization step more finely according to the state of the image of each macroblock in one screen.
[0084]
For example, in a part of a block where the pixel level changes little (a flat part), distortion due to quantization tends to be conspicuous, so the quantization step should be reduced. Conversely, in a block where the level changes greatly, quantization distortion is not noticeable and the amount of information is large, so the quantization step should be increased.
Therefore, for example, a parameter representing the flatness of the block is used as this activity.
[0085]
Next, in step S84, the scene change detection circuit 31 detects a scene change. The scene change detection by the scene change detection circuit 31 is performed using the parameter AD obtained by the image information evaluation circuit 50. Specifically, the scene change is detected from the rate of change of the parameter SAD, which is obtained by adding up the parameter AD over one screen.
[0086]
Next, in the compression method selection circuit 32, the GOP length is determined in step S85, and the compression method is selected (picture type is determined) in step S86.
Here, as described above, the GOP is divided for each appropriate number of frames in consideration of random accessibility during encoding. At this time, since at least the first picture in the GOP code order must be an I picture, the number of pictures is counted and the picture type is changed to an I picture periodically.
[0087]
On the other hand, when the correlation changes in the preceding and succeeding pictures due to the scene change, it is also efficient to perform compression coding with the I picture as described above. However, since the I picture has a low compression rate, if it appears frequently at a low bit rate, the image quality is degraded. Therefore, when a scene change is detected by the scene change detection circuit 31, the compression method selection circuit 32 adaptively determines the GOP length so as to keep the interval between I pictures moderate.
[0088]
In the next step S87, the total evaluation value (difficulty level: difficulty) as shown in the flowchart of FIG. 3 to be described later is totalized in the compression method selection circuit 32, and the macroblock type is determined based on the total evaluation value. That is, the compression method selection circuit 32 determines the compression method and macroblock type for each macroblock in step S87.
[0089]
As described above, the average absolute value sum (MAD) of errors and the sum of absolute values of errors (AD) already obtained are related to the data amounts after compression by intra-frame coding and by inter-frame predictive coding, respectively. Therefore, by comparing these two parameters, it can be determined which macroblock type, intra-frame coding or inter-frame predictive coding, yields the smaller data amount.
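This comparison can be sketched as follows. Comparing MAD and AD directly, without any weighting between them, is an assumption made for illustration, and the function name choose_macroblock_type and the mode names are hypothetical.

```python
def choose_macroblock_type(mad, ad_by_mode):
    """Pick the macroblock type with the smallest predicted data amount.

    mad: intra-coding estimate (MAD) of the macroblock.
    ad_by_mode: dict mapping an inter mode name (e.g. 'forward') to its AD.
    An unweighted comparison is assumed; ties favor intra coding.
    """
    best_mode, best_cost = "intra", mad
    for mode, ad in ad_by_mode.items():
        if ad < best_cost:
            best_mode, best_cost = mode, ad
    return best_mode, best_cost


decision = choose_macroblock_type(500, {"forward": 300, "backward": 450})
```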
[0090]
In the next step S88, the quantization step controller 39 performs bit allocation for rate control. That is, in this step S88, the allocation amount for each screen is bit-distributed according to the difficulty level (difficulty) obtained in step S87.
The data size after compression encoding for each picture varies greatly depending on the encoding method, the amount of information of the original image data, the correlation before and after, and the like. This is especially true if average image quality is maintained.
[0091]
The nonuniformity of the data size after compression coding for each picture is absorbed to some extent by the output buffer 18, but on average it must be a constant bit rate. Therefore, if a certain section is determined, the total compressed data amount of the picture during that period is determined. Therefore, the amount of data after compression, that is, the amount of bits that can be used by each picture, is determined for each picture using the picture type that has already been determined and the information amount parameter of the image that has been examined in advance.
[0092]
At this time, for example, the allocation is made small for images with a small amount of information and for B pictures, and large for images with a large amount of information and for I pictures. This is called bit allocation. This makes it easy to suppress variations in image quality and maintain a constant rate.
For example, in this embodiment, GOP is used as the section, and bit allocation is performed as in the following formulas (6) and (7).
Total Bit Count = (Bit Rate [bit / s] x Number Of Picture In GOP [picture]) / (Picture Rate [picture / s]) [bit] (6)
Available Bits = (Total Bit Count x Target image information parameter) / (GOP total value of image information parameter) [bit] (7)
The information amount parameter used in the equation (7) is obtained by multiplying the parameters SMAD and SAD described above by the multiplier for each picture type to be compressed. The multiplier adjusts the relationship between the parameter and the image quality between the picture types.
[0093]
Note that the GOP total value of the image information amount parameter in the equation (7) is obtained as shown in the equation (8).
GOP total value of image information amount parameter = Ki x ΣDifi + Kp x ΣDifp + Kb x ΣDifb (8)
Difi: I picture difficulty
Difp: P picture difficulty
Difb: B picture difficulty
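Equations (6) to (8) can be combined into a short sketch such as the following. The function name allocate_bits, the data layout, and the numerical values in the usage lines are illustrative assumptions; in particular, no values for the multipliers Ki, Kp, and Kb are given in the description.

```python
def allocate_bits(bit_rate, picture_rate, pictures, multipliers):
    """Distribute the bit budget of one GOP in proportion to weighted difficulty.

    pictures: list of (picture_type, difficulty) pairs for one GOP.
    multipliers: per-type weights, e.g. {'I': Ki, 'P': Kp, 'B': Kb}.
    """
    # Equation (6): total bits available to the GOP.
    total_bit_count = bit_rate * len(pictures) / picture_rate
    # Weighted information amount parameter of each picture.
    weighted = [multipliers[ptype] * dif for ptype, dif in pictures]
    # Equation (8): GOP total of the weighted parameters.
    gop_total = sum(weighted)
    # Equation (7): share of each picture.
    return [total_bit_count * w / gop_total for w in weighted]


gop = [("I", 400), ("B", 100), ("B", 100), ("P", 200)]
bits = allocate_bits(1_200_000, 30, gop, {"I": 1.0, "P": 1.0, "B": 1.0})
```

With these illustrative numbers the GOP budget is 160000 bits, of which the I picture receives half because it carries half of the total difficulty.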
In the quantization step controller 39, in the next step S89, the basic quantization step is determined based on the regression prediction process using the learning parameters A and B as shown in the flowchart of FIG. That is, in this step S89, the basic quantization step is determined (predicted) by regression prediction from the bit allocation amount and difficulty of one screen described above.
[0094]
When the picture type is determined as described above and the macroblock type is determined, the average absolute value sum (MAD) of the errors and the sum of absolute values of errors (AD) for one screen are totaled according to the macroblock type. Thus, the information amount parameter (that is, the difficulty level) of one screen can be measured. Therefore, if the information amount parameter and the amount of data after quantization are determined from past results, the quantization step can be estimated.
[0095]
The present invention relates to this basic quantization step determination mechanism, and the quantization step controller 39 of this embodiment uses the allocated bit amount (allocated_bit) for each screen and the difficulty level (difficulty) as follows. By this method, the basic quantization step (quantization scale: Q_scale) of one screen is determined.
First, the relationship shown in equation (10) is assumed.
log (allocated_bit / difficulty) = A * log (Q_scale) + B (10)
In the above equation (10), A and B are obtained in advance by learning (experiment). Further, the quantization scale (Q_scale) is obtained from Expression (11) obtained by modifying Expression (10).
[0096]
Q_scale = exp ((log (allocated_bit / difficulty) -B) / A) (11)
The quantization step obtained in this way is set as the basic quantization step for the picture.
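As a minimal sketch of equation (11), assuming natural logarithms, the basic quantization scale could be computed as follows. The function name predict_q_scale is hypothetical, and the values of A and B in the usage line are purely illustrative, since the description gives no numerical values for the learning parameters.

```python
import math


def predict_q_scale(allocated_bit, difficulty, a, b):
    """Equation (11): solve log(allocated_bit / difficulty) = A*log(Q_scale) + B."""
    return math.exp((math.log(allocated_bit / difficulty) - b) / a)


# Illustrative parameters only: A = -1 and B = log(800).
q_scale = predict_q_scale(80000, 400, a=-1.0, b=math.log(800.0))
```

Substituting the result back into equation (10) reproduces log(allocated_bit / difficulty), which is a convenient check on the inversion.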
Next, the quantization step controller 39 controls the quantization step in the screen as in step S90.
[0097]
That is, as described above, the quantization step controller 39 controls the quantization step in the screen for each block so that the image quality is as high as possible and the compression efficiency is as high as possible. Specifically, the quantization step for the quantizer 15 is controlled by adding or subtracting the quantization step for each macroblock from the basic quantization step based on the information such as the activity and the macroblock type.
[0098]
In the next step S91, the variable length encoding circuit 17 performs encoding. Since all the parameters of compression encoding are determined as described above, after that, compression encoding is performed according to the MPEG rules.
In the next step S92, the bit generation amount for each macroblock and the quantization scale (Q_scale) are tabulated.
[0099]
Finally, in step S93, each parameter described above is updated. That is, as shown in the flowchart of FIG. 5 to be described later, the prediction sample is updated based on the average quantization step for each macroblock, the total value of the generation amount for each macroblock, and the degree of difficulty.
Here, the relationship between the amount of image information, the basic quantization step, and the amount of data after compression depends on the image to be compressed. Therefore, here, the parameters and prediction parameters used in the expression representing the relationship are learned by feeding back the actual data amount after compression, thereby improving the prediction accuracy.
[0100]
In this case, first, learning parameters A and B are learned and corrected for each picture type by the following method.
For example, letting the average value of the quantization scale (Q_scale) over the macroblocks be (average_Q) and the generated amount after compression of one screen be (generated_bit), x and y are defined as shown in equation (12).
x = log (average_Q), y = log (generated_bit / difficulty) (12)
Thus, the parameters A and B can be obtained by the least square error method as shown in the following equations (13) and (14). In the formula, n is the number of samples.
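The images of equations (13) and (14) are not reproduced in this text. The standard least-squares solution for fitting the line y = A x + B to n samples, which is what the description calls for, is given below; assigning (13) to A and (14) to B is an assumption.

```latex
A = \frac{n \sum x y - \sum x \sum y}{n \sum x^{2} - \left( \sum x \right)^{2}} \qquad (13)

B = \frac{\sum y - A \sum x}{n} \qquad (14)
```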

Next, a flowchart for calculating the difficulty level (difficulty) in step S87 in FIG. 2 will be described with reference to FIG.
[0101]
When the process proceeds to the calculation of the difficulty level (difficulty) in step S87 in FIG. 2, the process proceeds to step S100 and subsequent steps in FIG.
In FIG. 3, in step S101, initialization to 0 is first performed, and in the next step S102, it is determined whether or not the macroblock type is an intra-frame encoded macroblock (intra macroblock: intra MB). If it is determined that the block is an intra macroblock, the difficulty level (difficulty) is set to the mean absolute value sum (MAD) of errors for one screen in step S106, and the process proceeds to step S110. If it is determined in step S102 that the macroblock type is not an intra macroblock, the process proceeds to step S103.
[0102]
In this step S103, it is determined whether or not the macroblock type is a forward prediction macroblock (forward MB) of interframe predictive coding macroblocks (inter macroblock: inter MB). If it is determined in step S103 that the block is a forward prediction macroblock, the difficulty (difficulty) is set to the absolute value sum (AD_for) of the forward prediction macroblock error in step S107, and the process proceeds to step S110. If it is determined in step S103 that the macroblock type is not a forward prediction macroblock, the process proceeds to step S104.
[0103]
In step S104, it is determined whether or not the macroblock type is a backward prediction macroblock (backward MB) of inter macroblocks. If it is determined in step S104 that the block is a backward prediction macroblock, the difficulty (difficulty) is set to the absolute value sum (AD_bac) of the error of the backward prediction macroblock in step S108, and the process proceeds to step S110. If it is determined in step S104 that the macroblock type is not a backward prediction macroblock, the process proceeds to step S105.
[0104]
In step S105, it is determined whether or not the macroblock type is a bidirectionally predicted macroblock (bidirectional MB) of inter macroblocks. If it is determined in step S105 that it is a bidirectional prediction macroblock, the difficulty level (difficulty) is set to the absolute value sum (AD_bid) of the bidirectional prediction macroblock error in step S109, and the process proceeds to step S110. If it is determined in step S105 that the macroblock type is not a bidirectionally predicted macroblock, the process proceeds to step S110.
[0105]
In step S110, it is determined whether or not the aggregation of the difficulty levels for all macroblocks has been completed. If it is determined that the aggregation has not been completed, the process returns to step S102. If it is determined that the aggregation has been completed, the aggregation of the difficulty level is ended in step S111, and the process returns to step S87.
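The flow of steps S101 to S111 can be sketched as follows, reading "set to" in each branch as adding the selected value to a running total, which is an interpretation consistent with the summary in paragraph [0111]. The function name total_difficulty and the dictionary field names are hypothetical.

```python
def total_difficulty(macroblocks):
    """Sum the evaluation value selected by macroblock type over one screen.

    macroblocks: list of dicts with a 'type' key and the matching value
    under 'MAD', 'AD_for', 'AD_bac' or 'AD_bid' (hypothetical field names).
    """
    value_for_type = {
        "intra": "MAD",             # steps S102 and S106
        "forward": "AD_for",        # steps S103 and S107
        "backward": "AD_bac",       # steps S104 and S108
        "bidirectional": "AD_bid",  # steps S105 and S109
    }
    difficulty = 0                  # step S101
    for mb in macroblocks:          # loop closed by step S110
        key = value_for_type.get(mb["type"])
        if key is not None:         # other types contribute nothing
            difficulty += mb[key]
    return difficulty               # step S111


screen = [
    {"type": "intra", "MAD": 50},
    {"type": "forward", "AD_for": 20},
    {"type": "bidirectional", "AD_bid": 5},
    {"type": "skipped"},
]
difficulty = total_difficulty(screen)
```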
Next, a flowchart of regression prediction based on the learning parameters A and B in step S89 in FIG. 2 will be described with reference to FIG.
[0106]
When the process proceeds to the regression prediction process based on the learning parameters A and B in step S89 in FIG. 2, the process proceeds to the process after step S120 in FIG.
In FIG. 4, in step S121, the calculation of equation (11) is performed. In the next step S122, the regression prediction process using the learning parameters A and B is terminated with the result obtained by the calculation of equation (11), and the process returns to step S89 of FIG. 2.
[0107]
Next, a flowchart for correcting and updating the learning parameters A and B in step S93 in FIG. 2 will be described with reference to FIG. That is, in the flowchart of FIG. 5, learning parameters A and B for regression prediction are obtained by adding new samples to the sample set, obtaining parameters A and B by the least square error method, and removing old samples from the sample set. Update and correct.
[0108]
When the process proceeds to the process for correcting the learning parameters A and B in step S93 in FIG. 2, the process proceeds to step S130 and subsequent steps in FIG.
In FIG. 5, in step S131, the learning parameters A and B are corrected and updated by the least square error method. Here, the data of one screen is totaled, and x and y are set as shown in the following formulas (15) and (16).
x = log (average_Q) (15)
y = log (generated_bit / difficulty) (16)
[0109]
In the next step S132, new x and y data are added to the regression analysis sample set, and in step S133, the learning parameters A and B are calculated by the least square error method. In the next step S134, the learning parameters A and B are clipped with the maximum value and the minimum value, and in step S135, the old x and y data are removed from the sample set for regression analysis. Thereafter, in step S136, the learning parameters A and B are corrected and updated, and the process returns to step S93 in FIG.
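Steps S131 to S136 can be sketched as follows, under stated assumptions: natural logarithms are used, the sample set is a plain list limited to max_samples entries, and the clipping limits are supplied by the caller. The function names and all numerical values are hypothetical. The fit requires at least two samples with distinct x values.

```python
import math


def fit_line(samples):
    """Least-squares fit of y = a*x + b over (x, y) samples."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b


def clip(value, low, high):
    return max(low, min(high, value))


def update_parameters(samples, average_q, generated_bit, difficulty,
                      max_samples, a_limits, b_limits):
    """One pass of steps S131 to S136; mutates samples and returns (A, B)."""
    x = math.log(average_q)                    # step S131
    y = math.log(generated_bit / difficulty)
    samples.append((x, y))                     # step S132
    a, b = fit_line(samples)                   # step S133
    a = clip(a, *a_limits)                     # step S134
    b = clip(b, *b_limits)
    while len(samples) > max_samples:          # step S135
        samples.pop(0)
    return a, b                                # step S136


history = [(0.0, 5.0), (1.0, 4.0)]
a, b = update_parameters(history, math.exp(2), math.exp(3), 1.0,
                         max_samples=2, a_limits=(-2.0, -0.5),
                         b_limits=(0.0, 10.0))
```

Keeping the clipping step after the fit means that an outlying sample can still enter the sample set, but it cannot push A and B outside the permitted range.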
[0110]
The relationship between the generated bit amount and the quantization scale (Q_scale) obtained as described above is as shown in FIG.
To summarize the image coding apparatus according to the embodiment of the present invention described above: in controlling the basic quantization step, the apparatus accurately predicts the basic quantization step from a parameter (evaluation value) for estimating the amount of information of the input image data and from the planned amount of compressed data after quantization.
[0111]
Here, in order to accurately predict the basic quantization step, the relationship between the parameter for estimating the information amount of the input image data, the quantization step actually used for compression, and the data amount after compression is learned. Also, when predicting the basic quantization step, the amount of information of the input image data is estimated by the difficulty (difficulty) of each screen: after the macroblock type is determined, either the absolute value sum (AD) of errors or the average absolute value sum (MAD) of errors is selected according to the macroblock type and added up over one screen, and the result is used as the difficulty of the screen. Furthermore, in this embodiment, to obtain the allocated bit amount (allocated_bit) for each screen, the difficulty (difficulty) is totaled over 1 GOP, and bit allocation is performed according to the difficulty within this 1 GOP.
[0112]
Further, in the apparatus of this embodiment, it is assumed that the relationship of the above formula (10) holds for each picture type between the allocated bit amount (allocated_bit) and the difficulty level (difficulty) of each screen. A and B in the formula are obtained in advance by learning, and the basic quantization scale (Q_scale) is obtained from Expression (11), obtained by modifying Expression (10). At this time, the learning parameters A and B can be obtained by the least square error method, as shown in equations (12), (13), and (14), using the average value of the quantization step over the macroblocks and the generated amount after compression of one screen.
[0113]
Further, when the learning parameters A and B are learned and corrected, they are obtained for each picture type from the data of the most recent n seconds (that is, data older than the past n seconds is not used). At this time, past data is plotted on a graph as shown in FIG. 6, and average values at the maximum and at the minimum of the basic quantization scale (Q_scale), within the range that can be linearly approximated, are put in the sample set to stabilize the prediction line. Furthermore, upper and lower limits are set for the learning parameters A and B, so that stable basic quantization step prediction can be performed even if a large amount of peculiar data is input.
[0114]
As described above, according to the image coding apparatus of the present embodiment, the basic quantization step can be predicted with high accuracy, so that even without any particular control of the quantization step within the screen, the generated amount comes close to the expected post-compression bit amount allocated to one screen. Bits are therefore neither overspent nor left over from screen to screen, and the average image quality can be maintained. Even if the quantization step is well controlled within the screen, a basic quantization step that is far off causes the quantization step to vary within the screen, and image quality inhomogeneity can be detected in the compressed image. In this embodiment, however, bit allocation is performed according to the difficulty of compression and the basic quantization step can be predicted with high accuracy, so a compressed image in which inhomogeneity is difficult to detect can be produced.
[0115]
Further, in this embodiment, the mechanism for accurately predicting the basic quantization step is corrected and learned by the fluctuating input image and follows the input image, so that the mechanism for accurately predicting the basic quantization step can be maintained.
Furthermore, in the present embodiment apparatus, the mechanism for accurately predicting the basic quantization step uses the influence of the input image in a recent period for learning without being dragged by the past learning result. The basic quantization step that quickly follows the input image can be predicted.
[0116]
Further, in the apparatus of this embodiment, the mechanism for accurately predicting the basic quantization step updates the prediction line, obtained from a large amount of learning data in experiments, with the learning data of the latest input image. As the data of the prediction line obtained by experiment, data close to the maximum value and the minimum value are entered for both x and y. Therefore, in the least square error method, the influence of the experimental data becomes large, so that even if learning data of a peculiar input image is entered, the basic quantization step can be predicted without being dragged by it.
[0117]
Furthermore, according to the image coding apparatus of the present embodiment, even when an abnormal prediction line arises that cannot be prevented by suppressing fluctuations in the learning parameters A and B with the x and y values near the maximum and minimum obtained in the above experiment, the parameters A and B are finally clipped so that the abnormal learning result is not saved. A mechanism for accurately predicting the basic quantization step can thus be maintained.
[0118]
[Effect of the invention]
In the image coding apparatus of the present invention, the amount of information is evaluated from a plurality of accumulated image data, the correlation between images is detected, and a compression method for the image data is selected adaptively based on the evaluation value of the information amount and the correlation information between images. The basic quantization step is then predicted from the evaluation value and from the planned compressed data amount into which the image data for one screen is to be compressed by the selected compression method. The basic quantization step can thus be predicted with high accuracy, and even if the quantization step is not controlled within a screen, the generated bit amount comes close to the post-compression bit amount allocated to that screen, so a uniform image quality can be maintained. Efficient image compression is therefore possible, and the overall image quality can be improved.
[0119]
In the image coding apparatus according to the present invention, the relationship among the quantization step actually used for compression, the amount of data after compression, and the evaluation value is learned, and the basic quantization step is predicted according to the learning result. As a result, even when the input image changes, the prediction follows the input image and the basic quantization step can be predicted accurately.
[Brief description of the drawings]
FIG. 1 is a block circuit diagram showing a schematic configuration of an image encoding apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining the operation of the apparatus according to the embodiment.
FIG. 3 is a flowchart of difficulty level tabulation.
FIG. 4 is a flowchart of regression prediction using learning parameters A and B.
FIG. 5 is a flowchart for correcting and updating learning parameters A and B.
FIG. 6 is a diagram showing the relationship between the amount of generated bits and the basic quantization scale.
FIG. 7 is a block circuit diagram showing a schematic configuration of a conventional image encoding device.
FIG. 8 is a diagram for explaining the resolution and configuration of an image.
FIG. 9 is a diagram for explaining macroblocks and blocks.
FIG. 10 is a diagram for explaining zigzag scanning.
FIG. 11 is a diagram for explaining an example of a GOP.
[Explanation of symbols]
22, 40 Frame memory
11 Block divider
12 Subtractor
13, 24 Switch
14 DCT circuit
15 Quantizer
16 Zigzag scan circuit
17 Variable length coding circuit
18 Output buffer
19, 39 Quantization step controller
20 Motion detector
21 Motion compensator
25 Adder
26 Inverse DCT circuit
27 Inverse Quantizer
31 Scene change detection circuit
32 Compression method selection circuit
33 Motion vector generation circuit
50 Image information evaluation circuit

Claims

Image data storage means for storing a plurality of input image data;
Image information evaluation means for evaluating the information amount of the input image data from the plurality of image data stored in the image data storage means, and for outputting a first parameter indicating the information amount of the image itself, a second parameter indicating the difference information amount of the image, and image information for image counting;
Inter-image correlation detection means for detecting a scene change, as correlation information between images of the plurality of image data stored in the image data storage means, using the second parameter from the image information evaluation means;
Orthogonal transform means for performing orthogonal transform processing on image data and generating orthogonal transform coefficients;
Quantization means for quantizing the orthogonal transformation coefficient generated by the orthogonal transformation means in a predetermined quantization step;
Compression method selection means which, based on the image information obtained by the image information evaluation means and the scene change detection output that is the correlation information between images from the inter-image correlation detection means, selects intra-frame coding periodically using the count value of the image information, also selects intra-frame coding when the scene change is detected, and selects inter-frame predictive coding otherwise;
Quantization step control means for predicting the basic quantization step used for quantization in the quantization means from the planned compressed data amount into which the image data for one screen is to be compressed by the compression method selected by the compression method selection means, and from a difficulty level obtained by adding one of the first and second parameters according to the macroblock type and summing over one screen,
wherein the quantization step control means, where the planned compressed data amount of the image data for one screen is allocated_bit, the difficulty level is difficulty, and the basic quantization step is Q_scale, uses predetermined parameters A and B in the equation
Q_scale = exp((log(allocated_bit / difficulty) - B) / A)
An image encoding apparatus characterized in that the basic quantization step Q_scale is obtained by the above equation.

The image encoding apparatus according to claim 1, wherein the parameters A and B are obtained by correcting and updating them by the least square error method based on the relationship among the quantization step Q_scale actually used for compression, the compressed data amount allocated_bit, and the difficulty level.

An image information evaluation step of evaluating the information amount of the input image data from the plurality of image data stored in image data storage means for storing a plurality of input image data, and of outputting a first parameter indicating the information amount of the image itself, a second parameter indicating the difference information amount of the image, and image information for image counting;
An inter-image correlation detection step of detecting a scene change, as correlation information between images of the plurality of image data stored in the image data storage means, using the second parameter from the image information evaluation step;
A compression method selection step of, based on the image information obtained in the image information evaluation step and the scene change detection output that is the correlation information between images obtained in the inter-image correlation detection step, selecting intra-frame coding periodically using the count value of the image information, also selecting intra-frame coding when the scene change is detected, and selecting inter-frame predictive coding otherwise;
An orthogonal transform step of performing orthogonal transform processing on image data and generating orthogonal transform coefficients;
A quantization step control step of predicting the basic quantization step used for quantization from the planned compressed data amount into which the image data for one screen is to be compressed by the compression method selected in the compression method selection step, and from a difficulty level obtained by adding one of the first and second parameters according to the macroblock type and summing over one screen;
A quantization step of quantizing the orthogonal transform coefficients generated in the orthogonal transform step with a predetermined quantization step,
wherein in the quantization step control step, where the planned compressed data amount of the image data for one screen is allocated_bit, the difficulty level is difficulty, and the basic quantization step is Q_scale, predetermined parameters A and B are used in the equation
Q_scale = exp((log(allocated_bit / difficulty) - B) / A)
An image encoding method characterized in that the basic quantization step Q_scale is obtained by the above equation.