JP4273386B2

JP4273386B2 - Encoding apparatus, encoding method, program, and recording medium

Info

Publication number: JP4273386B2
Application number: JP2002104315A
Authority: JP
Inventors: 弘道上野
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2002-04-05
Filing date: 2002-04-05
Publication date: 2009-06-03
Anticipated expiration: 2022-04-05
Also published as: JP2003299081A

Description

【０００１】
【発明の属する技術分野】
本発明は、符号化装置および符号化方法、プログラム、並びに記録媒体に関し、特に、フィードバック型レート制御において、ビット補給レート制御を行う場合に用いて好適な、符号化装置および符号化方法、プログラム、並びに記録媒体に関する。
【０００２】
【従来の技術】
近年、映像データおよび音声データを圧縮して情報量を減らす方法として、種々の圧縮符号化方法が提案されており、その代表的なものにＭＰＥＧ２（Moving
Picture Experts Group Phase 2）がある。
【０００３】
このような画像圧縮方式において、良好なエンコード画質を得る方法として、ＴＭ５（Test Model 5）がある。ＴＭ５のステップ１においては、ピクチャ単位に与えるターゲットビットの算出を行う。ターゲットビットの算出においては、ピクチャタイプ別のＧＣ（Global Complexity）のそれぞれの比率に応じて、そのＧＯＰ（Group of Picture）内の残りのピクチャに割り当てることができるビット量Ｒを比例配分して、各ピクチャに割り当てるビット量を算出する。
【０００４】
ＴＭ５は、ＧＯＰあたりの発生ビット量をほぼ一定にするために優れた方法であるが、固定レート符号化を行う場合には、必ずしも、ＧＯＰの発生ビット量を一定にする必要はない。固定レート符号化においては、ＶＢＶ（Video Buffering Verifier）バッファの占有量が、規定値をオーバーフロー、あるいはアンダーフローしないようにしなければならない。
【０００５】
ＴＭ５においては、ＧＯＰあたりの発生ビット量がほぼ一定であるから、ＶＢＶバッファがオーバーフローあるいはアンダーフローすることはない。しかしながら、ＴＭ５においては、低いビットレートで符号化した場合に、バッファ容量を有効利用することができない。例えば、ＭＰＥＧのＭＰ＠ＰＬにおいて、ＴＭ５を適用した場合、ＶＢＶバッファ容量は約１．８Ｍｂｉｔであるのに対して、バッファから引き抜かれる１枚あたりのピクチャのビット量が少ないため、約１．８Ｍｂｉｔを有効に利用することができない。
【０００６】
このように、入力される絵柄に関わらず、一定量のビット量を割り当ててしまうことにより、符号化難易度が高い絵柄については、符号化歪みが顕著に発生してしまい、一方、符号化難易度が低い絵柄は、符号化歪みが少ないため、全体として、むらの多い不安定な画像になってしまう。
【０００７】
このような問題を解決するために、符号化難易度が高い絵柄には、バッファがアンダーフローしない範囲で、より多くのビット量を配分し、一方、符号化難易度が低い絵柄には、バッファがオーバーフローしない範囲で、絵柄に適した少ないビット量を配分する必要がある。
【０００８】
そこで、本出願人は、特開平１０−７５４４３において、映像データの部分毎の絵柄の複雑さに応じて発生ビット量を調節し、全体として、圧縮後の映像の品質を向上させることができるようにした、映像データ圧縮装置およびその方法について開示している。
【０００９】
ＴＭ５において、ＧＯＰの残りのピクチャに割り当てることができる使用可能ビット量Ｒは、レートコントロールで重要なパラメータである。例えば、ＧＯＰの前半において、複雑な絵柄の画像が続いたために、たくさんのビット量を割り当ててしまうと、ＧＯＰの後半で、ビット量Ｒが、極端に少なくなってしまったり、あるいは、負の数になってしまう。
【００１０】
これに対して、本出願人が特開平１０−７５４４３において開示したビット補給レート制御とは、これからエンコードしようとする複数枚のピクチャに対して割り当てられている使用可能ビット量Ｒに、そのエンコード対象の画像難易度やＶＢＶバッファ占有量に応じて、ビット量を加える、あるいは減じる（以下、加えられる、あるいは減じられるビットをsupplementと称する）ことを特徴とするレート制御方式である。
【００１１】
【発明が解決しようとする課題】
以前提案されたビット補給レート制御は、これからエンコードしようとする複数枚のピクチャ画像難易度等の情報が全て既知である場合、すなわちエンコード情報を先読みしたフィードフォワード（Feed Forward）型レート制御に適用されていたもので、例えば、ＧＯＰの１５枚のデータを蓄積した後、その画像符号化難易度を判断していたので、その情報蓄積に一定の遅延を生じてしまうものである。
【００１２】
しかしながら、先読み情報を得ることができないフィードバック（Feed Back）型レート制御では、未来のＶＢＶ余裕度を正確に見積もることができないため、sum_supplement（以下、sum_supと称する）の最大値および最小値をビットレートや使用可能ＶＢＶサイズによって決定した固定値を用いざるを得なかった。しかしながら、特に、sum_supの最大値が固定値の場合、ピクチャの発生量次第によってはＶＢＶアンダーフローを起こしやすくなるなどの問題があり、ＶＢＶアンダーフローを起こさないようにするためには、ＶＢＶ余裕度に応じて、sum_supの最大値を決定する必要があった。
【００１３】
本発明はこのような状況に鑑みてなされたものであり、フィードバック型レート制御において、ビット補給レート制御を行うことができるようにするものである。
【００１４】
【課題を解決するための手段】
本発明の符号化装置は、非圧縮データの符号化難易度を検出する第１の検出手段と、非圧縮データを、ＧＯＰを基準として圧縮符号化する符号化手段と、符号化手段により過去に符号化された直前のＧＯＰに含まれる非圧縮データのうちの、フレーム内符号化画像のビット発生量を検出する第２の検出手段と、非圧縮データが符号化された符号化ストリームをデコードするデコーダの入力バッファに対応する仮想バッファのバッファ容量から、第２の検出手段により検出されたフレーム内符号化画像のビット発生量を減算した値を算出し、符号化中のＧＯＰに含まれる非圧縮データに対して割り当てられる仮想バッファのバッファ容量のうちＧＯＰ内でまだ符号化されていない残りのピクチャに割り当てられる使用可能ビット量に加えられるＧＯＰ毎のビット補給量の合計値の最大値として設定する設定手段と、これから符号化されるＧＯＰのビット補給量を、設定手段により設定された合計値の最大値を満たし、かつ、第１の検出手段により検出された、符号化手段により過去に符号化されたＧＯＰに含まれる非圧縮データの符号化難易度が第１の値より高い場合、正の値となるように、符号化難易度が第１の値よりも低い第２の値より低い場合、負の値となるように、符号化難易度が第１の値と第２の値との間となる場合、０となるように算出する算出手段とを備えることを特徴とする。
【００１５】
シーンチェンジのＩピクチャと１つ前のＩピクチャとの、符号化難易度の差を検出する第３の検出手段と、第３の検出手段により、符号化難易度の差が検出された場合、第１の検出手段により検出されたひとつ前のＩピクチャの符号化難易度がシーンチェンジのＩピクチャの符号化難易度よりも低いとき、設定手段により設定されたビット補給量の合計値の最大値が多くなるように、第１の検出手段により検出されたひとつ前のＩピクチャの符号化難易度がシーンチェンジのＩピクチャの符号化難易度よりも高いとき、設定手段により設定されたビット補給量の合計値の最大値が少なくなるように、ビット補給量の合計値の最大値を再設定する再設定手段とを更に備えさせるようにすることができる。
【００１６】
本発明の符号化方法は、非圧縮データの符号化難易度を検出する第１の検出ステップと、非圧縮データを、ＧＯＰを基準として圧縮符号化する符号化ステップと、符号化ステップの処理により過去に符号化された直前のＧＯＰに含まれる非圧縮データのうちの、フレーム内符号化画像のビット発生量を検出する第２の検出ステップと、非圧縮データが符号化された符号化ストリームをデコードするデコーダの入力バッファに対応する仮想バッファのバッファ容量から、第２の検出ステップの処理により検出されたフレーム内符号化画像のビット発生量を減算した値を算出し、符号化中のＧＯＰに含まれる非圧縮データに対して割り当てられる仮想バッファのバッファ容量のうちＧＯＰ内でまだ符号化されていない残りのピクチャに割り当てられる使用可能ビット量に加えられるＧＯＰ毎のビット補給量の合計値の最大値に設定する設定ステップと、これから符号化されるＧＯＰのビット補給量を、設定ステップの処理により設定された合計値の最大値を満たし、かつ、第１の検出ステップの処理により検出された、符号化ステップの処理により過去に符号化されたＧＯＰに含まれる非圧縮データの符号化難易度が第１の値より高い場合、正の値となるように、符号化難易度が第１の値よりも低い第２の値より低い場合、負の値となるように、符号化難易度が第１の値と第２の値との間となる場合、０となるように算出する算出ステップとを含むことを特徴とする。
【００１７】
本発明の記録媒体に記録されているプログラムは、非圧縮データの符号化難易度を検出する第１の検出ステップと、非圧縮データを、ＧＯＰを基準として圧縮符号化する符号化ステップと、符号化ステップの処理により過去に符号化された直前のＧＯＰに含まれる非圧縮データのうちの、フレーム内符号化画像のビット発生量を検出する第２の検出ステップと、非圧縮データが符号化された符号化ストリームをデコードするデコーダの入力バッファに対応する仮想バッファのバッファ容量から、第２の検出ステップの処理により検出されたフレーム内符号化画像のビット発生量を減算した値を算出し、符号化中のＧＯＰに含まれる非圧縮データに対して割り当てられる仮想バッファのバッファ容量のうちＧＯＰ内でまだ符号化されていない残りのピクチャに割り当てられる使用可能ビット量に加えられるＧＯＰ毎のビット補給量の合計値の最大値に設定する設定ステップと、これから符号化されるＧＯＰのビット補給量を、設定ステップの処理により設定された合計値の最大値を満たし、かつ、第１の検出ステップの処理により検出された、符号化ステップの処理により過去に符号化されたＧＯＰに含まれる非圧縮データの符号化難易度が第１の値より高い場合、正の値となるように、符号化難易度が第１の値よりも低い第２の値より低い場合、負の値となるように、符号化難易度が第１の値と第２の値との間となる場合、０となるように算出する算出ステップとを含むことを特徴とする。
【００１８】
本発明のプログラムは、非圧縮データの符号化難易度を検出する第１の検出ステップと、非圧縮データを、ＧＯＰを基準として圧縮符号化する符号化ステップと、符号化ステップの処理により過去に符号化された直前のＧＯＰに含まれる非圧縮データのうちの、フレーム内符号化画像のビット発生量を検出する第２の検出ステップと、非圧縮データが符号化された符号化ストリームをデコードするデコーダの入力バッファに対応する仮想バッファのバッファ容量から、第２の検出ステップの処理により検出されたフレーム内符号化画像のビット発生量を減算した値を算出し、符号化中のＧＯＰに含まれる非圧縮データに対して割り当てられる仮想バッファのバッファ容量のうちＧＯＰ内でまだ符号化されていない残りのピクチャに割り当てられる使用可能ビット量に加えられるＧＯＰ毎のビット補給量の合計値の最大値に設定する設定ステップと、これから符号化されるＧＯＰのビット補給量を、設定ステップの処理により設定された合計値の最大値を満たし、かつ、第１の検出ステップの処理により検出された、符号化ステップの処理により過去に符号化されたＧＯＰに含まれる非圧縮データの符号化難易度が第１の値より高い場合、正の値となるように、符号化難易度が第１の値よりも低い第２の値より低い場合、負の値となるように、符号化難易度が第１の値と第２の値との間となる場合、０となるように算出する算出ステップとを含むことを特徴とする。
【００１９】
本発明の符号化装置および符号化方法、並びにプログラムにおいては、非圧縮データの符号化難易度が検出され、非圧縮データがＧＯＰを基準として圧縮符号化され、過去に符号化された直前のＧＯＰに含まれる非圧縮データのうちの、フレーム内符号化画像のビット発生量が検出され、非圧縮データが符号化された符号化ストリームをデコードするデコーダの入力バッファに対応する仮想バッファのバッファ容量から、検出されたフレーム内符号化画像のビット発生量を減算した値が算出されて、符号化中のＧＯＰに含まれる非圧縮データに対して割り当てられる仮想バッファのバッファ容量のうちＧＯＰ内でまだ符号化されていない残りのピクチャに割り当てられる使用可能ビット量に加えられるＧＯＰ毎のビット補給量の合計値の最大値に設定され、これから符号化されるＧＯＰのビット補給量が、設定された合計値の最大値を満たし、かつ、過去に符号化されたＧＯＰに含まれる非圧縮データの符号化難易度が第１の値より高い場合、正の値となるように、符号化難易度が第１の値よりも低い第２の値より低い場合、負の値となるように、符号化難易度が第１の値と第２の値との間となる場合、０となるように算出される。
【００２０】
【発明の実施の形態】
以下、図を参照して、本発明の実施の形態について説明する。
【００２１】
図１は、本発明を適応したエンコーダ１の構成を示すブロック図である。
【００２２】
画像並び替え部１２は、入力された非圧縮映像データを符号化順に並べ替える。走査変換・マクロブロック化部１３は、ピクチャ・フィールド変換を行い、例えば、非圧縮映像データが映画の映像データである場合、３：２プルダウン処理等を行う。イントラＡＣ算出部１４は、画像並び替え部１２および走査変換・マクロブロック化部１３により処理され、Iピクチャに圧縮符号化されるピクチャから、イントラＡＣ（intra ＡＣ）を算出する。
【００２３】
Iピクチャについては、他のピクチャの参照なしに圧縮符号化されるため、後述するＭＥ残差を求めることができない。従って、Iピクチャの符号化難易度を求めるために、ＭＥ残差に代わるパラメータとして、イントラＡＣが用いられる。イントラＡＣは、ＭＰＥＧ方式におけるＤＣＴ処理単位のＤＣＴブロックごとの映像データとの分散値の総和として定義されるパラメータであって、映像の複雑さを指標し、映像の絵柄の難しさおよび圧縮後のデータ量と相関性を有する。すなわち、イントラＡＣとは、ＤＣＴブロック単位で、それぞれの画素の画素値から、ブロック毎の画素値の平均値を引いたものの絶対値和の、画面内における総和である。イントラＡＣは、次の式（１）で示される。
【００２４】
【数１】

・・・（１）
【００２５】
また、式（1）において、式（２）が成り立つ。
【数２】

・・・・（２）
【００２６】
イントラＡＣ算出部１４は、算出されたイントラＡＣの値を、レートコントロール部１５の難易度算出部３２に出力する。
【００２７】
演算処理部１６は、動き補償部２５から供給される動き補償情報を基に、供給された映像データに対して動き補償を行い、ＤＣＴ部１８に対して出力する。ＤＣＴ部１８は、演算処理部１６から入力された映像データに対して、例えば、１６画素×１６画素のマクロブロック単位に離散コサイン変換（ＤＣＴ）処理を施し、時間領域のデータから周波数領域のデータに変換して、量子化部１９に対して出力する。
【００２８】
量子化部１９は、ＤＣＴ部１８から入力された周波数領域のデータを、レートコントロール部１５の量子化インデックス決定部３５から供給される量子化インデックスＱで量子化し、量子化データとしてＶＬＣ（Variable Length Code；可変長符号化）部２０および逆量子化部２２に対して出力する。
【００２９】
ＶＬＣ部２０は、量子化部１９から入力された量子化データに対し、所定の変換テーブルに基づく可変長符号化処理を行い、その結果得られる可変長符号化データをバッファ２１に出力する。
【００３０】
バッファ２１は、入力された符号化データをバッファリングし、符号化ビットストリームとして、順次、出力する。
【００３１】
逆量子化部２２は、量子化部１９から入力された量子化データを、量子化部１９が実行した量子化の量子化ステップで逆量子化し、逆量子化データとして逆ＤＣＴ部２３に対して出力する。
【００３２】
逆ＤＣＴ部２３は、逆量子化部２２から入力される逆量子化データに対して逆ＤＣＴ処理を行い、演算処理部２４に対して出力する。
【００３３】
演算処理部２４は、動き補償部２５の出力データ、および逆ＤＣＴ部２３の出力データを加算し、動き補償部２５に対して出力する。動き検出部１７は、圧縮対象となるピクチャ（入力ピクチャ）の注目マクロブロックと、参照されるピクチャ（参照ピクチャ）との間の差分値の絶対値和あるいは自乗値和が最小となるようなマクロブロックを探し、動きベクトルを求めて、動き補償部２５に出力する。動き補償部２５は、演算処理部２４の出力データに対して、動き検出部１７から入力される動きベクトルに基づいて動き補償処理を行い、演算処理部２４、および演算処理部１６に対して出力する。
【００３４】
レートコントロール部１５は、ＭＥ残差算出部３１、難易度算出部３２、genbit検出部３３、ターゲットビット決定部３４、および量子化インデックス決定部３５で構成され、ターゲットビットおよび量子化インデックスを決定する。
【００３５】
ＭＥ残差算出部３１は、画像の符号化難易度と強い相関があるパラメータであるＭＥ残差を算出する。動き予測によって、参照フレームから入力フレームへの差分値の絶対値和などが少なくなるような動きベクトルを求めることができるが、その場合における差分値の絶対値和、あるいは自乗和などで求められる誤差成分のパワーがＭＥ残差である。Ｐピクチャ、およびＢピクチャにおいては、ＭＥ残差と画像の符号化難易度とは、ほぼ単純な比例関係を有している。
【００３６】
難易度算出部３２は、ＭＥ残差算出部３１から入力されるＭＥ残差による近似により、式（３）、および、式（４）を用いて、ＰピクチャおよびＢピクチャの符号化難易度Ｄjを算出する。
【数３】

・・・（３）
【数４】

・・・（４）
【００３７】
ここで、ＭＥｊは、ｊ番目のピクチャにおけるＭＥ残差であり、ａ_P、ａ_B、ｂ_P、ｂ_Bは、それぞれ、１次式で近似した場合の傾きと補正値である。
【００３８】
また、難易度算出部３２はイントラＡＣ算出部１４から入力されるイントラＡＣによる近似により、同様にIピクチャの符号化難易度Ｄjを算出し、ターゲットビット決定部３４に出力する。
【００３９】
そして、難易度算出部３２は、それそれのピクチャで算出された符号化難易度Ｄjから、ＧＯＰ毎の難易度平均avgDを算出する。
【００４０】
genbit検出部３３は、バッファ２１にバッファリングされている符号化データから、直近に符号化されたIピクチャの発生ビット量genbitを検出し、その値を、ターゲットビット決定部３４に出力する。
【００４１】
ターゲットビット決定部３４は、難易度算出部３２から入力された符号化難易度Ｄj、および、genbit検出部３３から入力されたIピクチャの発生ビット量genbitに基づいて、各ピクチャタイプのピクチャそれぞれのターゲットビットを算出して、レート制御を行う。
【００４２】
すなわち、ターゲットビット決定部３４は、後述する処理により、エンコードを終了した過去の画像における難易度などを基に、これからエンコードしようとする複数枚のピクチャに対して割り当てられている使用可能ビット量Ｒに加えられるsupplementの値（supplementは、正の値である場合、負の値である場合、０である場合がある）を決定する。ターゲットビット決定部３４は、この使用可能ビット量Ｒ＋supplementを基に、ターゲットビットの値を求め、量子化インデックス決定部３５に出力する。
【００４３】
量子化インデックス決定部３５は、ターゲットビット決定部３４から入力されたターゲットビットの値に基づいて、量子化インデックスＱを生成し、量子化部１９に対して出力する。
【００４４】
次に、図２のフローチャートを参照して、エンコードを終了した過去の画像における難易度を基にＲに加えるsupplementを決定する、ビット補給レート制御処理について説明する。
【００４５】
ステップＳ１において、ターゲットビット決定部３４は、現在処理中のピクチャは、ＧＯＰの先頭であるか否かを判断する。ステップＳ１において、ＧＯＰの先頭ではないと判断された場合、ＧＯＰの先頭であると判断されるまで、ステップＳ１の処理が繰り返される。
【００４６】
ステップＳ１において、ＧＯＰの先頭であると判断された場合、ステップＳ２において、ターゲットビット決定部３４は、難易度算出部３２より、前のＧＯＰにおける難易度平均avgDを取得する。
【００４７】
ステップＳ３において、図３、もしくは図６を用いて後述するmax_sum_sup算出処理が実行される。
【００４８】
ステップＳ４において、ターゲットビット決定部３４は、avgD > 0x2000かつsum_sup < max_sum_supであるか否かを判断する。ここで、難易度平均avgDと比較されている0x2000は、予め定められた閾値であり、画質を検討しながら要求される画質を得るために設定可能な値である。
【００４９】
ステップＳ４において、avgD > 0x2000かつsum_sup < max_sum_supであると判断された場合、ステップＳ５において、ターゲットビット決定部３４は、使用可能ビット量Ｒに対して、正の値のsupplementを加える。すなわち、ターゲットビット決定部３４は、前のＧＯＰは、ある一定以上の難易度を有していたため、これからエンコードするＧＯＰの難易度を、前のＧＯＰと同程度であると予測して、使用可能ビット量Ｒに対して、正の値のsupplementを加える。
【００５０】
ステップＳ４において、avgD > 0x2000かつsum_sup < max_sum_supではないと判断された場合、ステップＳ６において、ターゲットビット決定部３４は、avgD < 0x1000、かつsum_sup > min_sum_supであるか否かを判断する。ここで、難易度平均avgDと比較されている0x１000は、予め定められた閾値であり、上述した 0x2000より小さな値（画像難易度が低いことを示す値）であり、画質を検討しながら要求される画質を得るために設定可能な値である。
【００５１】
ステップＳ６において、avgD < 0x1000、かつsum_sup > min_sum_supであると判断された場合、ステップＳ７において、ターゲットビット決定部３４は、使用可能ビット量Ｒに対して、負の値のsupplementを加える。すなわち、ターゲットビット決定部３４は、前のＧＯＰは、ある一定以下の難易度であった（すなわち、簡単な画像であった）ため、これからエンコードするＧＯＰの難易度を、前のＧＯＰと同程度であると予測して、使用可能ビット量Ｒに対して、負の値のsupplementを加える。
【００５２】
ステップＳ６において、avgD < 0x1000、かつsum_sup > min_sum_supではなかったと判断された場合、ステップＳ８において、ターゲットビット決定部３４は、supplement ＝ 0とする。すなわち、ターゲットビット決定部３４は、使用可能ビット量Ｒに対して、supplementの増減を行わない。
【００５３】
ステップＳ５、ステップＳ７、もしくはステップＳ８の処理の終了後、ステップＳ９において、ターゲットビット決定部３４は、ステップＳ５、ステップＳ７、もしくはステップＳ８の処理において用いられたsupplementの値を用いて、sum_sup = sum_sup + supplementとし、処理は、ステップＳ１に戻り、それ以降の処理が繰り返される。
【００５４】
図２を用いて説明した処理により、エンコードを終了した過去の画像における難易度を基に、使用可能ビット量Ｒに加える、あるいは、減少されるsupplementの値が決定される。例えば、ＧＯＰ単位で、Ｒ＋supplemet（supplementは、正の値であるか、負の値であるか、もしくは０である）が決定される場合、前のＧＯＰの画像難易度（イントラＡＣ、あるいは、ＭＥ残差等）の平均値を基に、これからエンコードするＧＯＰの難易度が前のＧＯＰの難易度と同程度であると予測して、使用可能ビット量Ｒに対して、その難易度に応じたsupplementが加えられる。
【００５５】
ここでは、画像難易度をイントラＡＣ、あるいは、ＭＥ残差を用いて算出するものとして説明したが、画像難易度は、それ以外のパラメータを用いて算出するようにしても良い。
【００５６】
また、supplementの具体的な値の算出方法は、例えば、特開平１０−７５４４３に開示されている方法でも良いし、それ以外の方法で、要求される画質を得ることができるsupplementの値を用いるようにしても良い。
【００５７】
また、ここでは、前の１ＧＯＰにおける難易度平均avgＤを用いるものとして説明したが、難易度算出部３２は、１ＧＯＰにおける難易度平均avgＤに代わって、例えば、複数のＧＯＰ、もしくは、ＧＯＰの一部における難易度平均を求めるようにしても良いし、更に、単純な難易度平均ではなく、必要に応じて、重み付け和や重み付け平均を算出するようにしても良い。
【００５８】
次に、図２のステップＳ３において実行されるmax_sum_sup算出処理について説明する。
【００５９】
LongGOPにおいては、Iピクチャの発生量が大きくなる傾向がある。従って、ピクチャの発生量によってＶＢＶアンダーフローを起こすことを防ぐためには、ＶＢＶバッファサイズからエンコードを終了した直近のIピクチャのビット発生量を引いたものをsum_supの最大値（max_sum_sup）とすればよい。
【００６０】
図３のフローチャートを参照して、図２のステップＳ３において実行されるmax_sum_sup算出処理１について説明する。
【００６１】
ステップＳ２１において、genbit検出部３３は、直近のIピクチャの発生符号量genbitを検出する。ターゲットビット決定部３４は、genbit検出部３３から、genbitの値の入力を受ける。
【００６２】
ステップＳ２２において、ターゲットビット決定部３４は、sum_supの最大値であるmax_sum_supの値を、max_sum_sup＝ＶＢＶバッファサイズ−Iピクチャ発生量とし、処理は、図２のステップＳ４に戻る。
【００６３】
図３を用いて説明した処理により、図４に示すように、ＶＢＶサイズからIピクチャ発生量を引いた、実線矢印の合計量が、ＶＢＶ余裕度として、次のＧＯＰのsum_supの最大値とされる。これにより、アンダーフローしやすい絵柄ではsupplementが与えられにくくなり、アンダーフローに対する余裕がある絵柄に対してはsupplementが与えられやすくなる。すなわち、Iピクチャのビット発生量が多いために発生するＶＢＶアンダーフローを防ぐことができる。
【００６４】
しかしながら、図３を用いて説明した処理では、シーンチェンジが起きた場合に不具合が発生してしまう。例えば、難しい絵柄から簡単な絵柄へのシーンチェンジが起きた場合、図５に示されるように、前のＧＯＰが難しい絵柄のため、次のＧＯＰのmax_sum_sup（実線矢印の合計）が大きくなり、絵柄が簡単なＧＯＰに、大きくなったmax_sum_supを適用してしまうので、ＶＢＶの余裕が無いものに対してsum_supの最大値を大きくしてしまう。また、同様に、簡単な絵柄から難しい絵柄へのシーンチェンジにおいても、逆の不具合が発生してしまう。
【００６５】
これを防ぐために、シーンチェンジのIピクチャを含むＧＯＰをエンコードする場合には、前のIピクチャ発生量により求められたsum_supの最大値を、シーンチェンジのIピクチャの難易度により増減させるようにすることができる。
【００６６】
図６のフローチャートを参照して、図２のステップＳ３において実行されるmax_sum_sup算出処理２について説明する。
【００６７】
ステップＳ３１において、genbit検出部３３は、直近のIピクチャの発生符号量genbitを検出する。ターゲットビット決定部３４は、genbit検出部３３から、genbitの値の入力を受ける。
【００６８】
ステップＳ３２において、ターゲットビット決定部３４は、sum_supの最大値であるmax_sum_supの値を、max_sum_sup＝ＶＢＶバッファサイズ−Iピクチャ発生量とする。
【００６９】
ステップＳ３３において、ターゲットビット決定部３４は、シーンチェンジであるか否かを判断する。シーンチェンジであるか否かの判断は、例えば、ＭＥ残差算出部３１により算出されるＭＥ残差の値を基にして判断するようにしても良いし、それ以外のいかなる方法によって判断するようにしても良い。
【００７０】
ステップＳ３３において、シーンチェンジではないと判断された場合、処理は、図２のステップＳ４に戻る。
【００７１】
ステップＳ３３において、シーンチェンジであると判断された場合、ステップＳ３４において、ターゲットビット決定部３４は、難易度算出部３２より、シーンチェンジのIピクチャ、および１つ前のIピクチャの符号化難易度を取得する。
【００７２】
ステップＳ３５において、ターゲットビット決定部３４は、２つのIピクチャの符号化難易度の差を算出し、ステップＳ３２において算出されたmax_sum_supの値を、符号化難易度の差、すなわち、難しい絵柄から簡単な絵柄へのシーンチェンジであるか、簡単な絵柄から難しい絵柄へのシーンチェンジであるかを基に増減して、処理は、図２のステップＳ４に戻る。
【００７３】
具体的には、シーンチェンジ後の符号化難易度が低い場合は、max_sum_supの値を少なくし、シーンチェンジ後の符号化難易度が高い場合は、max_sum_supの値を多くする。
【００７４】
図６を用いて説明した処理により、シーンチェンジのIピクチャを含むＧＯＰをエンコードする場合には、前のIピクチャ発生量により求まったsum_supの最大値をシーンチェンジのIピクチャの難易度により増減させることにより、例えば、次のＧＯＰのIピクチャ発生量が大きく、ＶＢＶに余裕が無いにもかかわらず、大きなsum_sup最大値となってしまうようなことをふせぐようにすることができる。
【００７５】
また、本発明は、図２を用いて説明したビット補給レート制御処理以外でも、ビット補給レート制御を行う場合、すなわち、ビット補給量supplementの積算値sum_supの最大値である、max_sum_supを用いる処理の全てに適用可能である。
【００７６】
上述した一連の処理は、ハードウエアにより実行させることもできるが、ソフトウエアにより実行させることもできる。この場合、例えば、エンコーダ１は、図７に示されるようなパーソナルコンピュータ１０１により構成される。
【００７７】
図７において、CPU１１１は、ROM１１２に記憶されているプログラム、または記憶部１１８からRAM１１３にロードされたプログラムに従って、各種の処理を実行する。RAM１１３にはまた、CPU１１１が各種の処理を実行する上において必要なデータなども適宜記憶される。
【００７８】
CPU１１１、ROM１１２、およびRAM１１３は、バス１１４を介して相互に接続されている。このバス１１４にはまた、入出力インタフェース１１５も接続されている。
【００７９】
入出力インタフェース１１５には、キーボード、マウスなどよりなる入力部１１６、ディスプレイやスピーカなどよりなる出力部１１７、ハードディスクなどより構成される記憶部１１８、モデム、ターミナルアダプタなどより構成される通信部１１９が接続されている。通信部１１９は、インターネットを含むネットワークを介しての通信処理を行う。
【００８０】
入出力インタフェース１１５にはまた、必要に応じてドライブ１２０が接続され、磁気ディスク１３１、光ディスク１３２、光磁気ディスク１３３、あるいは、半導体メモリ１３４などが適宜装着され、それらから読み出されたコンピュータプログラムが、必要に応じて記憶部１１８にインストールされる。
【００８１】
一連の処理をソフトウエアにより実行させる場合には、そのソフトウエアを構成するプログラムが、専用のハードウエアに組み込まれているコンピュータ、または、各種のプログラムをインストールすることで、各種の機能を実行することが可能な、例えば汎用のパーソナルコンピュータなどに、ネットワークや記録媒体からインストールされる。
【００８２】
この記録媒体は、図７に示されるように、装置本体とは別に、ユーザにプログラムを供給するために配布される、プログラムが記憶されている磁気ディスク１３１（フロッピディスクを含む）、光ディスク１３２（ＣＤ-ＲＯＭ（Compact Disk-Read Only Memory），ＤＶＤ（Digital Versatile Disk）を含む）、光磁気ディスク１３３（ＭＤ（Mini-Disk）（商標）を含む）、もしくは半導体メモリ１３４などよりなるパッケージメディアにより構成されるだけでなく、装置本体に予め組み込まれた状態でユーザに供給される、プログラムが記憶されているROM１１２や、記憶部１１８に含まれるハードディスクなどで構成される。
【００８３】
なお、本明細書において、記録媒体に記憶されるプログラムを記述するステップは、含む順序に沿って時系列的に行われる処理はもちろん、必ずしも時系列的に処理されなくとも、並列的あるいは個別に実行される処理をも含むものである。
【００８４】
【発明の効果】
本発明によれば、画像データをエンコードすることができる。
また、本発明によれば、エンコードを終了した過去のＧＯＰの画像における難易度を基に使用可能ビット量Ｒに加えるsupplementを決定する場合の、supplementの合計値の最大値を設定することができるので、フィードバック型レート制御にビット補給レート制御を適用する場合に仮想バッファのアンダーフローを防ぐことができる。
【００８５】
また、シーンチェンジが起きたＧＯＰをエンコードする際には、シーンチェンジ前後のフレーム内符号化画像の画像難易度を比較した値を用いて、ひとつ前のＧＯＰのフレーム内符号化画像の符号化難易度が符号化されるＧＯＰのフレーム内符号化画像の符号化難易度よりも低いとき、 supplement の合計値の最大値が少なくなるように、ひとつ前のＧＯＰのフレーム内符号化画像の符号化難易度が符号化されるＧＯＰのフレーム内符号化画像の符号化難易度よりも高いとき、 supplement の合計値の最大値が多くなるように、supplementの合計値の最大値を再設定することができるので、フィードバック型レート制御にビット補給レート制御を適用する場合に仮想バッファのアンダーフローを防ぐことができる。
【図面の簡単な説明】
【図１】本発明を適用したエンコーダの構成を示すブロック図である。
【図２】ビット補給レート制御処理について説明するフローチャートである。
【図３】 max_sum_sup算出処理１について説明するフローチャートである。
【図４】ＶＢＶバッファと、sum_supの最大値とについて説明するための図である。
【図５】ＶＢＶバッファと、sum_supの最大値とについて説明するための図である。
【図６】 max_sum_sup算出処理２について説明するフローチャートである。
【図７】パーソナルコンピュータの構成について説明する図である。
【符号の説明】
１エンコーダ，１２画像並び替え部，１３走査変換・マクロブロック化部，１４イントラＡＣ算出部，１５レートコントロール部，１６演算処理部，１７動き検出部，１８ＤＣＴ処理部，１９量子化部，２０ＶＬＣ部，２１バッファ，２２逆量子化部，２３逆ＤＣＴ処理部，２４演算処理部，２５動き補償部，３１ＭＥ残差算出部，３２難易度算出部，３３ genbit検出部，３４ターゲットビット決定部，３５量子化インデックス決定部
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an encoding apparatus, an encoding method, a program, and a recording medium, and more particularly to an encoding apparatus, an encoding method, a program, and a recording medium suitable for performing bit replenishment rate control in feedback-type rate control.
[0002]
[Prior art]
In recent years, various compression encoding methods have been proposed as methods for reducing the amount of information by compressing video data and audio data, and a representative example is MPEG2 (Moving Picture Experts Group Phase 2).
[0003]
In such an image compression method, TM5 (Test Model 5) is available as a method for obtaining good encoded image quality. In step 1 of TM5, the target bits to be given to each picture are calculated. In this calculation, the bit amount R that can be allocated to the remaining pictures in the GOP (Group of Pictures) is distributed in proportion to the GC (Global Complexity) of each picture type to obtain the bit amount allocated to each picture.
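The proportional distribution in TM5 step 1 can be sketched as follows. This is a minimal sketch based on the commonly published form of the TM5 step 1 formulas, not on anything stated in this specification: the function name, the argument names, and the constants K_P = 1.0 and K_B = 1.4 are assumptions taken from that general description. Here r is the bit amount R left for the GOP, n_p and n_b are the numbers of P and B pictures still to be encoded, and x_i, x_p, x_b are the global complexities per picture type.

```python
K_P = 1.0  # assumed TM5 constant for P pictures
K_B = 1.4  # assumed TM5 constant for B pictures


def tm5_target_bits(picture_type, r, n_p, n_b, x_i, x_p, x_b,
                    bit_rate, picture_rate):
    """Return the target bits for the next picture of the given type."""
    # Lower bound that keeps every picture from being starved of bits.
    floor = bit_rate / (8.0 * picture_rate)
    if picture_type == "I":
        share = 1.0 + n_p * x_p / (x_i * K_P) + n_b * x_b / (x_i * K_B)
    elif picture_type == "P":
        share = n_p + n_b * K_P * x_b / (K_B * x_p)
    elif picture_type == "B":
        share = n_b + n_p * K_B * x_p / (K_P * x_b)
    else:
        raise ValueError("picture_type must be 'I', 'P' or 'B'")
    # R is divided by the complexity-weighted count of remaining pictures.
    return max(r / share, floor)


if __name__ == "__main__":
    target = tm5_target_bits("I", 5.5e6, 4, 10, 4.0, 2.0, 1.4, 4e6, 30)
    print(round(target))
```

Because R is shared only among the remaining pictures of one GOP, the bits generated per GOP stay roughly constant, which is the property the following paragraphs discuss.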
[0004]
TM5 is an excellent method for making the amount of generated bits per GOP substantially constant, but when performing fixed-rate encoding, it is not always necessary to make the amount of generated bits of GOP constant. In fixed-rate encoding, it is necessary to prevent the VBV (Video Buffering Verifier) buffer occupancy from overflowing or underflowing a specified value.
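The VBV constraint can be illustrated with a simplified occupancy model. This is only a sketch under stated assumptions: bits arrive at a constant rate, each picture is removed from the buffer in one instant, and the function name, the event labels, and the clamping on overflow and underflow are hypothetical choices made for illustration.

```python
def simulate_vbv(picture_bits, bit_rate, picture_rate, vbv_size,
                 initial_occupancy):
    """Return a list of (picture index, "underflow" | "overflow") events."""
    occupancy = float(initial_occupancy)
    per_picture_input = bit_rate / picture_rate  # bits arriving per picture period
    events = []
    for index, bits in enumerate(picture_bits):
        # The decoder removes the whole picture at its decode time.
        if bits > occupancy:
            events.append((index, "underflow"))
        occupancy = max(occupancy - bits, 0.0)
        # The channel refills the buffer until the next decode time.
        occupancy += per_picture_input
        if occupancy > vbv_size:
            events.append((index, "overflow"))
            occupancy = float(vbv_size)
    return events


if __name__ == "__main__":
    # One oversized picture drains more than the buffer holds.
    print(simulate_vbv([1_000_000], 3_000_000, 30, 1_835_008, 900_000))
```

In this model, pictures that are too large cause underflow and pictures that are too small cause overflow, which is why the bit allocation has to stay between both limits.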
[0005]
In TM5, the amount of generated bits per GOP is almost constant, so the VBV buffer does not overflow or underflow. However, in TM5, when encoding is performed at a low bit rate, the buffer capacity cannot be used effectively. For example, when TM5 is applied to MPEG MP@PL, the VBV buffer capacity is about 1.8 Mbit, but because the bit amount of each picture extracted from the buffer is small, that capacity of about 1.8 Mbit cannot be used effectively.
[0006]
In this way, because a fixed amount of bits is allocated regardless of the input pattern, coding distortion is conspicuous for patterns with high encoding difficulty, whereas patterns with low encoding difficulty show little coding distortion, so the image as a whole becomes uneven and unstable.
[0007]
To solve this problem, a larger amount of bits must be allocated to patterns with high encoding difficulty, within a range in which the buffer does not underflow, while a small bit amount suited to the pattern must be allocated to patterns with low encoding difficulty, within a range in which the buffer does not overflow.
[0008]
Therefore, in Japanese Patent Laid-Open No. 10-75443, the present applicant has disclosed a video data compression apparatus and method that adjust the amount of generated bits according to the complexity of the pattern in each part of the video data, so that the quality of the compressed video is improved as a whole.
[0009]
In TM5, the usable bit amount R that can be allocated to the remaining pictures of the GOP is an important parameter in rate control. For example, if a large amount of bits is allocated in the first half of a GOP because images with complex patterns continue, the bit amount R becomes extremely small, or even negative, in the second half of the GOP.
[0010]
In contrast, the bit replenishment rate control disclosed by the present applicant in Japanese Patent Laid-Open No. 10-75443 is a rate control method characterized in that bits are added to or subtracted from the usable bit amount R assigned to the plurality of pictures about to be encoded, according to the image difficulty of the encoding target and the VBV buffer occupancy (hereinafter, the added or subtracted bits are referred to as supplement).
[0011]
[Problems to be solved by the invention]
The previously proposed bit replenishment rate control was applied to feed-forward (Feed Forward) rate control, in which the encoding information is read ahead, that is, to the case where information such as the image difficulty of the plurality of pictures about to be encoded is all known in advance. For example, the image encoding difficulty was judged after the data of the 15 pictures of a GOP had been accumulated, so accumulating that information introduced a certain delay.
[0012]
However, in feedback (Feed Back) rate control, in which read-ahead information cannot be obtained, the future VBV margin cannot be accurately estimated, so fixed values determined from the bit rate and the usable VBV size had to be used as the maximum and minimum values of sum_supplement (hereinafter referred to as sum_sup). In particular, when the maximum value of sum_sup is a fixed value, VBV underflow is likely to occur depending on the amount of bits generated by the pictures. To prevent VBV underflow, the maximum value of sum_sup needs to be determined according to the VBV margin.
[0013]
The present invention has been made in view of such a situation, and enables bit replenishment rate control to be performed in feedback-type rate control.
[0014]
[Means for Solving the Problems]
The encoding apparatus of the present invention comprises: first detection means for detecting the encoding difficulty of uncompressed data; encoding means for compression-encoding the uncompressed data on a GOP basis; second detection means for detecting the bit generation amount of the intra-frame encoded image among the uncompressed data included in the immediately preceding GOP previously encoded by the encoding means; setting means for calculating a value obtained by subtracting the bit generation amount of the intra-frame encoded image detected by the second detection means from the buffer capacity of a virtual buffer corresponding to the input buffer of a decoder that decodes the encoded stream in which the uncompressed data is encoded, and for setting that value as the maximum value of the total of the per-GOP bit replenishment amounts that are added to the usable bit amount, that is, the part of the virtual buffer capacity allocated to the uncompressed data included in the GOP being encoded that is allocated to the remaining pictures not yet encoded in the GOP; and calculation means for calculating the bit replenishment amount of the GOP to be encoded next so that it satisfies the maximum value of the total set by the setting means and so that it is a positive value when the encoding difficulty, detected by the first detection means, of the uncompressed data included in a GOP previously encoded by the encoding means is higher than a first value, a negative value when that encoding difficulty is lower than a second value that is lower than the first value, and 0 when that encoding difficulty is between the first value and the second value.
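The setting means and the calculation means described above can be sketched as two small functions. This is a hedged sketch, not the claimed implementation: the thresholds 0x2000 and 0x1000 are the example values given in the Japanese description of step S4 and step S6, while step_bits (the size of one supplement), min_sum_sup, and the clamping that keeps the running total inside its limits are assumptions, since the specification leaves the concrete supplement value open.

```python
HIGH_DIFFICULTY = 0x2000  # "first value" (example threshold from the description)
LOW_DIFFICULTY = 0x1000   # "second value", lower than the first


def max_sum_sup_from_vbv(vbv_size_bits, last_i_picture_bits):
    """Maximum total supplement: VBV capacity minus the last I picture's bits."""
    return vbv_size_bits - last_i_picture_bits


def choose_supplement(avg_difficulty, sum_sup, max_sum_sup, min_sum_sup,
                      step_bits):
    """Supplement for the next GOP from the previous GOP's average difficulty."""
    if avg_difficulty > HIGH_DIFFICULTY and sum_sup < max_sum_sup:
        # Difficult material: add bits, but never exceed the total limit.
        return min(step_bits, max_sum_sup - sum_sup)
    if avg_difficulty < LOW_DIFFICULTY and sum_sup > min_sum_sup:
        # Easy material: take bits back, but never go below the lower limit.
        return -min(step_bits, sum_sup - min_sum_sup)
    return 0


if __name__ == "__main__":
    limit = max_sum_sup_from_vbv(1_835_008, 600_000)
    sum_sup = 0
    for avg_d in (0x3000, 0x3000, 0x1800, 0x0800):
        supplement = choose_supplement(avg_d, sum_sup, limit, -500_000, 100_000)
        sum_sup += supplement  # running total, as in sum_sup = sum_sup + supplement
        print(hex(avg_d), supplement, sum_sup)
```

Because the limit shrinks when the last I picture was large, a GOP that is close to underflow receives little or no supplement.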
[0015]
The encoding apparatus may further comprise: third detection means for detecting a difference in encoding difficulty between the I picture at a scene change and the immediately preceding I picture; and resetting means for resetting, when the third detection means detects a difference in encoding difficulty, the maximum value of the total bit replenishment amount so that the maximum value set by the setting means increases when the encoding difficulty of the immediately preceding I picture detected by the first detection means is lower than that of the scene-change I picture, and decreases when the encoding difficulty of the immediately preceding I picture detected by the first detection means is higher than that of the scene-change I picture.
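The resetting at a scene change can be sketched as follows. Only the direction of the adjustment is taken from the paragraph above; the linear scaling, the parameter bits_per_difficulty_unit, the function name, and the clamp at zero are hypothetical, because the specification does not state how large the adjustment should be.

```python
def readjust_max_sum_sup(max_sum_sup, prev_i_difficulty,
                         scene_change_i_difficulty, bits_per_difficulty_unit):
    """Raise or lower the total-supplement limit at a scene change."""
    # Positive when the new scene is harder than the previous I picture.
    difference = scene_change_i_difficulty - prev_i_difficulty
    adjusted = max_sum_sup + difference * bits_per_difficulty_unit
    # Assumed safeguard: the limit is not allowed to become negative.
    return max(adjusted, 0)


if __name__ == "__main__":
    print(readjust_max_sum_sup(1_000_000, 0x1000, 0x3000, 10))
    print(readjust_max_sum_sup(1_000_000, 0x3000, 0x1000, 10))
```

When no scene change is detected, the limit computed from the last I picture would simply be used unchanged.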
[0016]
The encoding method of the present invention includes: a first detection step of detecting the encoding difficulty of uncompressed data; an encoding step of compression-encoding the uncompressed data on a GOP basis; a second detection step of detecting the bit generation amount of the intra-frame encoded image among the uncompressed data included in the immediately preceding GOP previously encoded by the processing of the encoding step; a setting step of calculating a value obtained by subtracting the bit generation amount of the intra-frame encoded image detected by the processing of the second detection step from the buffer capacity of a virtual buffer corresponding to the input buffer of a decoder that decodes the encoded stream in which the uncompressed data is encoded, and of setting that value as the maximum value of the total of the per-GOP bit replenishment amounts that are added to the usable bit amount, that is, the part of the virtual buffer capacity allocated to the uncompressed data included in the GOP being encoded that is allocated to the remaining pictures not yet encoded in the GOP; and a calculation step of calculating the bit replenishment amount of the GOP to be encoded next so that it satisfies the maximum value of the total set by the processing of the setting step and so that it is a positive value when the encoding difficulty, detected by the processing of the first detection step, of the uncompressed data included in a GOP previously encoded by the processing of the encoding step is higher than a first value, a negative value when that encoding difficulty is lower than a second value that is lower than the first value, and 0 when that encoding difficulty is between the first value and the second value.
[0017]
The program recorded on the recording medium of the present invention includes: a first detection step of detecting the encoding difficulty of uncompressed data; an encoding step of compression-encoding the uncompressed data on a GOP basis; a second detection step of detecting the bit generation amount of the intra-frame encoded image among the uncompressed data included in the immediately preceding GOP previously encoded by the processing of the encoding step; a setting step of calculating a value obtained by subtracting the bit generation amount of the intra-frame encoded image detected by the processing of the second detection step from the buffer capacity of a virtual buffer corresponding to the input buffer of a decoder that decodes the encoded stream in which the uncompressed data is encoded, and of setting that value as the maximum value of the total of the per-GOP bit replenishment amounts that are added to the usable bit amount, that is, the part of the virtual buffer capacity allocated to the uncompressed data included in the GOP being encoded that is allocated to the remaining pictures not yet encoded in the GOP; and a calculation step of calculating the bit replenishment amount of the GOP to be encoded next so that it satisfies the maximum value of the total set by the processing of the setting step and so that it is a positive value when the encoding difficulty, detected by the processing of the first detection step, of the uncompressed data included in a GOP previously encoded by the processing of the encoding step is higher than a first value, a negative value when that encoding difficulty is lower than a second value that is lower than the first value, and 0 when that encoding difficulty is between the first value and the second value.
[0018]
The program of the present invention includes: a first detection step of detecting the encoding difficulty of uncompressed data; an encoding step of compression-encoding the uncompressed data on a GOP basis; a second detection step of detecting the bit generation amount of the intra-frame encoded image among the uncompressed data included in the immediately preceding GOP previously encoded by the processing of the encoding step; a setting step of calculating a value obtained by subtracting the bit generation amount of the intra-frame encoded image detected by the processing of the second detection step from the buffer capacity of a virtual buffer corresponding to the input buffer of a decoder that decodes the encoded stream in which the uncompressed data is encoded, and of setting that value as the maximum value of the total of the per-GOP bit replenishment amounts that are added to the usable bit amount, that is, the part of the virtual buffer capacity allocated to the uncompressed data included in the GOP being encoded that is allocated to the remaining pictures not yet encoded in the GOP; and a calculation step of calculating the bit replenishment amount of the GOP to be encoded next so that it satisfies the maximum value of the total set by the processing of the setting step and so that it is a positive value when the encoding difficulty, detected by the processing of the first detection step, of the uncompressed data included in a GOP previously encoded by the processing of the encoding step is higher than a first value, a negative value when that encoding difficulty is lower than a second value that is lower than the first value, and 0 when that encoding difficulty is between the first value and the second value.
[0019]
In the encoding apparatus, the encoding method, and the program of the present invention, the encoding difficulty of uncompressed data is detected; the uncompressed data is compression-encoded on a GOP basis; the bit generation amount of the intra-frame encoded image among the uncompressed data included in the immediately preceding, previously encoded GOP is detected; a value is calculated by subtracting the detected bit generation amount of the intra-frame encoded image from the buffer capacity of a virtual buffer corresponding to the input buffer of a decoder that decodes the encoded stream in which the uncompressed data is encoded, and that value is set as the maximum value of the total of the per-GOP bit replenishment amounts that are added to the usable bit amount, that is, the part of the virtual buffer capacity allocated to the uncompressed data included in the GOP being encoded that is allocated to the remaining pictures not yet encoded in the GOP; and the bit replenishment amount of the GOP to be encoded next is calculated so that it satisfies the set maximum value of the total and so that it is a positive value when the encoding difficulty of the uncompressed data included in a previously encoded GOP is higher than a first value, a negative value when that encoding difficulty is lower than a second value that is lower than the first value, and 0 when that encoding difficulty is between the first value and the second value.
[0020]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0021]
FIG. 1 is a block diagram showing a configuration of an encoder 1 to which the present invention is applied.
[0022]
The image rearrangement unit 12 rearranges the input uncompressed video data into encoding order. The scan conversion / macroblocking unit 13 performs picture/field conversion and, for example, performs 3:2 pull-down processing and the like when the uncompressed video data is video data of a movie. The intra AC calculation unit 14 calculates the intra AC from a picture that has been processed by the image rearrangement unit 12 and the scan conversion / macroblocking unit 13 and is to be compression-encoded as an I picture.
[0023]
Since an I picture is compression-encoded without reference to other pictures, the ME residual described later cannot be obtained for it. Therefore, intra AC is used as a parameter in place of the ME residual to obtain the encoding difficulty of an I picture. Intra AC is a parameter defined as the sum of the variance values of the video data for each DCT block, the DCT processing unit in the MPEG system; it indicates the complexity of the video and correlates with the difficulty of the video pattern and with the amount of data after compression. That is, intra AC is the sum over the screen of, for each DCT block, the sum of the absolute values obtained by subtracting the block's average pixel value from the pixel value of each pixel. Intra AC is given by the following formula (1).
[0024]
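[Expression 1]
$$\mathrm{Intra\_AC} = \sum_{\mathrm{blocks}} \sum_{(x,y)} \left| f(x,y) - \bar{f} \right|$$
... (1)
[0025]
Further, in the formula (1), the formula (2) is established. Here the notation follows the verbal definition above: f(x, y) is the pixel value at position (x, y) in a DCT block, \bar{f} is the average pixel value of that block, and N is the number of pixels in the block.
[Expression 2]
$$\bar{f} = \frac{1}{N} \sum_{(x,y)} f(x,y)$$
... (2)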
[0026]
The intra AC calculation unit 14 outputs the calculated intra AC value to the difficulty level calculation unit 32 of the rate control unit 15.
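As an illustration of the verbal definition of the intra AC given above, the following is a minimal sketch in Python. It follows the absolute-deviation wording (sum over blocks of the sum of |pixel − block mean|). The function name `intra_ac`, the list-of-rows picture layout, and the default 8 × 8 block size are assumptions made for this example and do not appear in the specification.

```python
def intra_ac(picture, block=8):
    """Sum, over all blocks, of sum(|pixel - block mean|).

    picture: list of rows of pixel values (hypothetical layout).
    block:   block edge length in pixels (8 is an assumption).
    """
    height = len(picture)
    width = len(picture[0])
    total = 0.0
    for by in range(0, height, block):
        for bx in range(0, width, block):
            # Collect the pixels of one block (edge blocks may be smaller).
            pixels = [picture[y][x]
                      for y in range(by, min(by + block, height))
                      for x in range(bx, min(bx + block, width))]
            mean = sum(pixels) / len(pixels)
            total += sum(abs(p - mean) for p in pixels)
    return total
```

A completely flat picture gives an intra AC of 0, while a picture with strong variation inside each block gives a large value, which is why the parameter can stand in for the encoding difficulty of an I picture.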
[0027]
The arithmetic processing unit 16 performs motion compensation on the supplied video data based on the motion compensation information supplied from the motion compensation unit 25, and outputs the result to the DCT unit 18. The DCT unit 18 performs discrete cosine transform (DCT) processing on the video data input from the arithmetic processing unit 16, for example in units of macroblocks of 16 pixels × 16 pixels, converts the time domain data into frequency domain data, and outputs it to the quantization unit 19.
[0028]
The quantization unit 19 quantizes the frequency domain data input from the DCT unit 18 with the quantization index Q supplied from the quantization index determination unit 35 of the rate control unit 15, and outputs the result as quantized data to the VLC (Variable Length Code; variable length coding) unit 20 and the inverse quantization unit 22.
[0029]
The VLC unit 20 performs variable length coding processing based on a predetermined conversion table for the quantized data input from the quantization unit 19, and outputs variable length coded data obtained as a result to the buffer 21.
[0030]
The buffer 21 buffers the input encoded data and sequentially outputs it as an encoded bit stream.
[0031]
The inverse quantization unit 22 inversely quantizes the quantized data input from the quantization unit 19, using the quantization step used by the quantization unit 19, and outputs the inversely quantized data to the inverse DCT unit 23.
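The relationship between the quantization unit 19 and the inverse quantization unit 22 can be sketched as follows. This is only an illustration under a strong simplifying assumption: a single uniform step `q_step` is applied to every coefficient, whereas the specification does not state how the quantization index Q maps to a step size. The function names are hypothetical.

```python
def quantize(coefficients, q_step):
    """Map each coefficient to the nearest integer multiple of q_step.

    A uniform quantizer is an assumption of this sketch.
    """
    levels = []
    for c in coefficients:
        magnitude = int(abs(c) / q_step + 0.5)  # round half away from zero
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels


def dequantize(levels, q_step):
    """Reconstruct approximate coefficients using the same step."""
    return [level * q_step for level in levels]


coefficients = [100.0, -37.0, 12.0, 3.0]
levels = quantize(coefficients, 8)
reconstructed = dequantize(levels, 8)
```

Because the local decoding path uses the same step as the forward path, each reconstructed coefficient differs from the original by at most half a step in this sketch, and a larger step yields fewer bits at the cost of a larger error.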
[0032]
The inverse DCT unit 23 performs inverse DCT processing on the inversely quantized data input from the inverse quantization unit 22 and outputs the result to the arithmetic processing unit 24.
[0033]
The arithmetic processing unit 24 adds the output data of the motion compensation unit 25 and the output data of the inverse DCT unit 23 and outputs the result to the motion compensation unit 25. The motion detection unit 17 searches for the macroblock that minimizes the sum of absolute values or the sum of squares of the difference values between a target macroblock of the picture to be compressed (input picture) and the picture to be referenced (reference picture), obtains a motion vector, and outputs it to the motion compensation unit 25. The motion compensation unit 25 performs motion compensation processing on the output data of the arithmetic processing unit 24 based on the motion vector input from the motion detection unit 17, and outputs the result to the arithmetic processing unit 24 and the arithmetic processing unit 16.
[0034]
The rate control unit 15 includes an ME residual calculation unit 31, a difficulty calculation unit 32, a genbit detection unit 33, a target bit determination unit 34, and a quantization index determination unit 35, and determines the target bits and the quantization index.
[0035]
The ME residual calculation unit 31 calculates the ME residual, a parameter that correlates strongly with the difficulty of encoding an image. Motion prediction yields a motion vector that minimizes the sum of absolute values of the difference values between the reference frame and the input frame. The power of the error component at that point, obtained as the sum of absolute values or the sum of squares of the difference values, is the ME residual. For P pictures and B pictures, the ME residual and the image encoding difficulty are approximately in simple proportion.
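The ME residual described above can be sketched with a full-search block matching that uses the sum of absolute differences (SAD). The function names, the list-of-rows picture layout, the default block size, and the search range are assumptions of this example; the specification does not fix a search strategy.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def best_match(cur, ref, cx, cy, size=16, search=4):
    """Return ((dx, dy), residual) for the block whose top-left is (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference picture.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(cur, ref, cx, cy, rx, ry, size)
                if best is None or cost < best[1]:
                    best = ((dx, dy), cost)
    return best


def me_residual(cur, ref, size=16, search=4):
    """Sum of the minimum SAD over all whole blocks of the input picture."""
    height, width = len(cur), len(cur[0])
    return sum(best_match(cur, ref, x, y, size, search)[1]
               for y in range(0, height - size + 1, size)
               for x in range(0, width - size + 1, size))
```

When the input picture is well predicted from the reference picture, the residual is small; a picture with motion that the search cannot follow leaves a large residual, which is the property used to estimate difficulty for P and B pictures.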
[0036]
The difficulty level calculation unit 32 calculates the encoding difficulty level Dj of the P picture and the B picture by linear approximation, using the ME residual input from the ME residual calculation unit 31, according to the equations (3) and (4).
[Expression 3]
$$D_j = a_P \times ME_j + b_P$$
... (3)
[Expression 4]
$$D_j = a_B \times ME_j + b_B$$
... (4)
[0037]
Here, ME_j is the ME residual of the j-th picture, and a_P, a_B, b_P, and b_B are the slopes and the correction values when approximated by a linear expression.
[0038]
Also, the difficulty level calculation unit 32 similarly calculates the I picture coding difficulty level Dj by approximation using the intra AC input from the intra AC calculation unit 14, and outputs it to the target bit determination unit 34.
[0039]
Then, the difficulty level calculation unit 32 calculates the average difficulty level avgD for each GOP from the encoding difficulty level Dj calculated for each picture.
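The per-picture difficulty and its GOP average can be sketched as follows. The linear form (slope × measure + correction value) follows the description of the approximation above; the coefficient values, the dictionary layout, and the function names are hypothetical and chosen only so that the example runs.

```python
# Hypothetical (slope, correction value) pairs per picture type.
# For I pictures the measure is the intra AC; for P and B pictures
# it is the ME residual.
COEFFS = {"I": (1.0, 0.0), "P": (2.0, 100.0), "B": (1.5, 50.0)}


def coding_difficulty(picture_type, measure, coeffs=COEFFS):
    """Linear approximation of the encoding difficulty Dj of one picture."""
    slope, offset = coeffs[picture_type]
    return slope * measure + offset


def gop_average_difficulty(pictures, coeffs=COEFFS):
    """Average difficulty avgD over one GOP.

    pictures: list of (picture_type, measure) tuples.
    """
    difficulties = [coding_difficulty(t, m, coeffs) for t, m in pictures]
    return sum(difficulties) / len(difficulties)


gop = [("I", 9000), ("B", 2000), ("B", 2000), ("P", 3000)]
avg_d = gop_average_difficulty(gop)
```

The resulting avgD is the value that is later compared with the thresholds at the head of the next GOP.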
[0040]
The genbit detection unit 33 detects the generated bit amount genbit of the most recently encoded I picture from the encoded data buffered in the buffer 21, and outputs the value to the target bit determination unit 34.
[0041]
Based on the encoding difficulty Dj input from the difficulty calculation unit 32 and the generated bit amount genbit of the I picture input from the genbit detection unit 33, the target bit determination unit 34 calculates the target bits for each picture type and performs rate control.
[0042]
That is, by the processing described later, the target bit determination unit 34 determines, based on the difficulty level of past images whose encoding has finished, the value of the supplement to be added to the usable bit amount R allocated to the plurality of pictures to be encoded from now on (the supplement may be a positive value, a negative value, or 0). The target bit determination unit 34 obtains the value of the target bits based on the usable bit amount R + supplement and outputs it to the quantization index determination unit 35.
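How a target could be derived from R + supplement is sketched below, assuming that the proportional distribution of TM5 step 1 is applied to the enlarged or reduced budget. The specification only states that the target bits are obtained based on R + supplement, so the use of the TM5 formulas here is an assumption, as are the function name and the argument layout. The constants K_P = 1.0 and K_B = 1.4 are the values used in TM5.

```python
K_P, K_B = 1.0, 1.4  # constants used in TM5 step 1


def target_bits(picture_type, r_bits, supplement, x, n_p, n_b,
                bit_rate, picture_rate):
    """TM5-style target for the next picture, using R + supplement.

    x:        global complexity per picture type, e.g. {"I": .., "P": .., "B": ..}
    n_p, n_b: number of P and B pictures remaining in the GOP
    """
    budget = r_bits + supplement
    x_i, x_p, x_b = x["I"], x["P"], x["B"]
    if picture_type == "I":
        t = budget / (1 + n_p * x_p / (x_i * K_P) + n_b * x_b / (x_i * K_B))
    elif picture_type == "P":
        t = budget / (n_p + n_b * K_P * x_b / (K_B * x_p))
    else:
        t = budget / (n_b + n_p * K_B * x_p / (K_P * x_b))
    # TM5 also applies a lower bound to every target.
    return max(t, bit_rate / (8 * picture_rate))


complexity = {"I": 160, "P": 60, "B": 42}
plain = target_bits("I", 2_000_000, 0, complexity, 4, 10, 4_000_000, 30)
boosted = target_bits("I", 2_000_000, 187_500, complexity, 4, 10, 4_000_000, 30)
```

With a positive supplement the same complexity ratios are applied to a larger budget, so every picture type in the GOP receives proportionally more bits.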
[0043]
The quantization index determination unit 35 generates a quantization index Q based on the value of the target bit input from the target bit determination unit 34 and outputs the quantization index Q to the quantization unit 19.
[0044]
Next, a bit supply rate control process for determining a supplement to be added to R based on the difficulty level of past images that have been encoded will be described with reference to the flowchart of FIG.
[0045]
In step S1, the target bit determination unit 34 determines whether or not the picture currently being processed is the head of a GOP. If it is determined in step S1 that the picture is not the head of a GOP, the process of step S1 is repeated until the head of a GOP is detected.
[0046]
If it is determined in step S1 that it is the head of the GOP, in step S2, the target bit determination unit 34 acquires the difficulty average avgD in the previous GOP from the difficulty calculation unit 32.
[0047]
In step S3, a max_sum_sup calculation process described later with reference to FIG. 3 or FIG. 6 is executed.
[0048]
In step S4, the target bit determination unit 34 determines whether or not avgD > 0x2000 and sum_sup < max_sum_sup. Here, 0x2000, with which the average difficulty level avgD is compared, is a predetermined threshold; it can be set, while examining the image quality, so that the required image quality is obtained.
[0049]
If it is determined in step S4 that avgD > 0x2000 and sum_sup < max_sum_sup, the target bit determination unit 34 adds a positive supplement to the usable bit amount R in step S5. That is, since the previous GOP had at least a certain level of difficulty, the target bit determination unit 34 predicts that the GOP to be encoded will be about as difficult as the previous GOP, and adds a positive supplement to the usable bit amount R.
[0050]
If it is determined in step S4 that the condition avgD > 0x2000 and sum_sup < max_sum_sup is not satisfied, then in step S6 the target bit determination unit 34 determines whether or not avgD < 0x1000 and sum_sup > min_sum_sup. Here, 0x1000, with which the average difficulty level avgD is compared, is a predetermined threshold smaller than the above-mentioned 0x2000 (that is, a value indicating that the image difficulty level is low); it too can be set, while examining the image quality, so that the required image quality is obtained.
[0051]
If it is determined in step S6 that avgD < 0x1000 and sum_sup > min_sum_sup, the target bit determination unit 34 adds a negative supplement to the usable bit amount R in step S7. That is, since the previous GOP had at most a certain level of difficulty (that is, it was a simple image), the target bit determination unit 34 predicts that the GOP to be encoded will be about as difficult as the previous GOP, and adds a negative supplement to the usable bit amount R.
[0052]
If it is determined in step S6 that the condition avgD < 0x1000 and sum_sup > min_sum_sup is not satisfied, the target bit determination unit 34 sets supplement = 0 in step S8. That is, the target bit determination unit 34 neither increases nor decreases the usable bit amount R.
[0053]
After the process of step S5, step S7, or step S8 is completed, in step S9 the target bit determination unit 34 sets sum_sup = sum_sup + supplement, using the supplement value from step S5, step S7, or step S8. The process then returns to step S1, and the subsequent processes are repeated.
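The decision made at the head of each GOP in steps S4 to S9 can be sketched as follows. The thresholds 0x2000 and 0x1000 and the variable names sum_sup, max_sum_sup, and min_sum_sup are taken from the description; the fixed magnitude `step` of the supplement and the function names are assumptions of this sketch, since the specific value of the supplement is left to other methods.

```python
HIGH_THRESHOLD = 0x2000  # threshold compared with avgD in step S4
LOW_THRESHOLD = 0x1000   # threshold compared with avgD in step S6


def choose_supplement(avg_d, sum_sup, max_sum_sup, min_sum_sup, step):
    """Return the supplement for the GOP about to be encoded.

    step is a hypothetical fixed magnitude for the supplement.
    """
    if avg_d > HIGH_THRESHOLD and sum_sup < max_sum_sup:   # S4 -> S5
        return step
    if avg_d < LOW_THRESHOLD and sum_sup > min_sum_sup:    # S6 -> S7
        return -step
    return 0                                               # S8


def run_gops(avg_difficulties, max_sum_sup, min_sum_sup, step):
    """Apply the decision GOP by GOP and track sum_sup (step S9).

    avg_difficulties[k] is the avgD of the GOP preceding the k-th decision.
    """
    sum_sup = 0
    history = []
    for avg_d in avg_difficulties:
        supplement = choose_supplement(avg_d, sum_sup, max_sum_sup,
                                       min_sum_sup, step)
        sum_sup += supplement
        history.append((supplement, sum_sup))
    return history


history = run_gops([0x3000, 0x3000, 0x3000, 0x1800, 0x0800],
                   max_sum_sup=200, min_sum_sup=-100, step=100)
```

In this run the third difficult GOP receives no further supplement because sum_sup has already reached max_sum_sup, which is the behaviour the upper limit is meant to enforce.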
[0054]
With the processing described with reference to FIG. 2, the value of the supplement to be added to or subtracted from the usable bit amount R is determined based on the difficulty level of past images whose encoding has finished. For example, when R + supplement (where the supplement is a positive value, a negative value, or 0) is determined in GOP units, the difficulty level of the GOP to be encoded is predicted, based on the image difficulty level of the previous GOP (for example, the average value of the intra AC or the ME residual), to be about the same as that of the previous GOP, and a supplement corresponding to that difficulty level is added to the usable bit amount R.
[0055]
Here, the image difficulty level is described as being calculated using the intra AC or ME residual, but the image difficulty level may be calculated using other parameters.
[0056]
In addition, the specific value of the supplement may be calculated by, for example, the method disclosed in Japanese Patent Laid-Open No. 10-75443, or a supplement value that yields the required image quality may be obtained by another method.
[0057]
Further, although the difficulty level average avgD of the immediately preceding GOP is described here as being used, the difficulty level calculation unit 32 may instead obtain the difficulty level average over, for example, a plurality of GOPs or a part of a GOP, or may calculate a weighted sum or a weighted average as needed instead of a simple difficulty level average.
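One possible form of the weighted average over a plurality of GOPs mentioned above is sketched below. The weighting scheme (oldest to newest, with the most recent GOP weighted most heavily in the example) and the function name are assumptions; the specification does not prescribe particular weights.

```python
def weighted_average_difficulty(gop_averages, weights):
    """Weighted average of the most recent per-GOP difficulty averages.

    gop_averages: per-GOP avgD values, oldest first.
    weights:      weights, oldest first; only the last len(weights)
                  GOPs are used. Both layouts are assumptions.
    """
    if not gop_averages:
        raise ValueError("at least one encoded GOP is required")
    recent = gop_averages[-len(weights):]
    # When fewer GOPs than weights exist, keep the newest weights.
    used = weights[-len(recent):]
    return sum(w * d for w, d in zip(used, recent)) / sum(used)


smoothed = weighted_average_difficulty([1000, 2000, 4000], [1, 1, 2])
```

Smoothing over several GOPs makes the prediction less sensitive to a single unusually easy or difficult GOP, at the cost of reacting more slowly to a real change in the material.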
[0058]
Next, the max_sum_sup calculation process executed in step S3 in FIG. 2 will be described.
[0059]
In a long GOP, the bit generation amount of the I picture tends to be large. Therefore, to prevent VBV underflow caused by the bit generation amount of the I picture, the maximum value of sum_sup (max_sum_sup) may be obtained by subtracting the bit generation amount of the most recently encoded I picture from the VBV buffer size.
[0060]
The max_sum_sup calculation process 1 executed in step S3 of FIG. 2 will be described with reference to the flowchart of FIG.
[0061]
In step S21, the genbit detector 33 detects the generated code amount genbit of the latest I picture. The target bit determination unit 34 receives the genbit value input from the genbit detection unit 33.
[0062]
In step S22, the target bit determination unit 34 sets max_sum_sup, which is the maximum value of sum_sup, as max_sum_sup = VBV buffer size−I picture generation amount, and the processing returns to step S4 in FIG.
[0063]
As a result of the processing described with reference to FIG. 3, as shown in FIG. 4, the value obtained by subtracting the I picture generation amount from the VBV size (the total of the solid arrows) is set, as the VBV margin, as the maximum value of sum_sup for the next GOP. Consequently, a supplement is less likely to be given to a picture that tends to underflow, and more likely to be given to a picture that has a margin against underflow. That is, it is possible to prevent the VBV underflow that occurs when the bit generation amount of the I picture is large.
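The max_sum_sup calculation process 1 (steps S21 and S22) reduces to a single subtraction, sketched below. The function name is hypothetical, and the constant 1,835,008 bits is the MP@ML VBV buffer size of MPEG-2, used here as an example value for the buffer capacity of about 1.8 Mbit.

```python
MP_ML_VBV_BITS = 1_835_008  # MPEG-2 MP@ML VBV buffer size (example value)


def max_sum_sup_basic(vbv_buffer_size, last_i_picture_bits):
    """Step S22: max_sum_sup = VBV buffer size - I picture generation amount."""
    return vbv_buffer_size - last_i_picture_bits


# A small I picture leaves a large margin; a large one leaves little.
margin_small_i = max_sum_sup_basic(MP_ML_VBV_BITS, 600_000)
margin_large_i = max_sum_sup_basic(MP_ML_VBV_BITS, 1_200_000)
```

The larger the most recent I picture, the smaller the limit on the accumulated supplement, so material whose I pictures already consume much of the buffer receives less additional budget.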
[0064]
However, the process described with reference to FIG. 3 causes a problem when a scene change occurs. For example, when a scene change from a difficult picture to a simple picture occurs, as shown in FIG. 5, max_sum_sup of the next GOP (the total of the solid arrows) becomes large because the previous GOP was difficult. Because this enlarged max_sum_sup is then applied to the simple GOP, the maximum value of sum_sup is increased for a GOP that has no VBV margin. Similarly, a scene change from a simple picture to a difficult picture causes the opposite problem.
[0065]
To prevent this, when a GOP including the I picture of a scene change is encoded, the maximum value of sum_sup obtained from the previous I picture generation amount can be increased or decreased according to the difficulty level of the I picture of the scene change.
[0066]
The max_sum_sup calculation process 2 executed in step S3 of FIG. 2 will be described with reference to the flowchart of FIG.
[0067]
In step S31, the genbit detector 33 detects the generated code amount genbit of the latest I picture. The target bit determination unit 34 receives the genbit value input from the genbit detection unit 33.
[0068]
In step S32, the target bit determination unit 34 sets the value of max_sum_sup, which is the maximum value of sum_sup, as max_sum_sup = VBV buffer size−I picture generation amount.
[0069]
In step S33, the target bit determination unit 34 determines whether or not a scene change has occurred. This determination may be made based on, for example, the value of the ME residual calculated by the ME residual calculation unit 31, or by any other method.
[0070]
If it is determined in step S33 that the scene change has not occurred, the process returns to step S4 in FIG.
[0071]
If it is determined in step S33 that a scene change has occurred, then in step S34 the target bit determination unit 34 acquires, from the difficulty level calculation unit 32, the encoding difficulty levels of the I picture of the scene change and of the previous I picture.
[0072]
In step S35, the target bit determination unit 34 calculates the difference in encoding difficulty between the two I pictures and increases or decreases the value of max_sum_sup calculated in step S32 according to that difference, that is, according to whether the scene change is from a difficult picture to a simple picture or from a simple picture to a difficult picture. The process then returns to step S4 in FIG. 2.
[0073]
Specifically, when the encoding difficulty after the scene change is low, the value of max_sum_sup is decreased, and when the encoding difficulty after the scene change is high, the value of max_sum_sup is increased.
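The max_sum_sup calculation process 2 (steps S31 to S35) can be sketched as follows. The direction of the adjustment follows the preceding paragraph: the limit is decreased when the difficulty after the scene change is lower and increased when it is higher. The proportional factor `gain`, the clamping of the result to the range from 0 to the VBV buffer size, and the function name are assumptions of this sketch; the specification does not state the amount of the increase or decrease.

```python
def max_sum_sup_with_scene_change(vbv_buffer_size, last_i_bits, scene_change,
                                  prev_i_difficulty=0, new_i_difficulty=0,
                                  gain=1):
    """Compute max_sum_sup, adjusting it when a scene change is detected.

    gain is a hypothetical factor converting a difficulty difference
    into bits.
    """
    limit = vbv_buffer_size - last_i_bits            # step S32
    if not scene_change:                             # step S33
        return limit
    difference = new_i_difficulty - prev_i_difficulty  # steps S34 and S35
    limit += gain * difference
    # Keeping the limit inside the buffer range is an assumption.
    return min(max(limit, 0), vbv_buffer_size)


unchanged = max_sum_sup_with_scene_change(1_835_008, 600_000, False)
to_difficult = max_sum_sup_with_scene_change(1_835_008, 600_000, True,
                                             prev_i_difficulty=5000,
                                             new_i_difficulty=9000, gain=10)
to_simple = max_sum_sup_with_scene_change(1_835_008, 600_000, True,
                                          prev_i_difficulty=9000,
                                          new_i_difficulty=5000, gain=10)
```

Without a scene change the function reduces to the subtraction of process 1, so the adjustment only takes effect for the GOP that contains the I picture of the scene change.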
[0074]
By the processing described with reference to FIG. 6, when a GOP including the I picture of a scene change is encoded, the maximum value of sum_sup obtained from the previous I picture generation amount is increased or decreased according to the difficulty level of the I picture of the scene change. Thus, for example, it is possible to prevent a large maximum value of sum_sup from being set even though the I picture generation amount of the next GOP is large and the VBV has no margin.
[0075]
Further, the present invention is not limited to the bit replenishment rate control process described with reference to FIG. 2; it is applicable to any case in which bit replenishment rate control is performed, that is, to any process that uses max_sum_sup, the maximum value of the total sum_sup of the bit replenishment amounts (supplements).
[0076]
The series of processes described above can be executed by hardware, but can also be executed by software. In this case, for example, the encoder 1 includes a personal computer 101 as shown in FIG.
[0077]
In FIG. 7, the CPU 111 executes various processes according to a program stored in the ROM 112 or a program loaded from the storage unit 118 to the RAM 113. The RAM 113 also appropriately stores data necessary for the CPU 111 to execute various processes.
[0078]
The CPU 111, the ROM 112, and the RAM 113 are connected to each other via the bus 114. An input / output interface 115 is also connected to the bus 114.
[0079]
Connected to the input / output interface 115 are an input unit 116 including a keyboard and a mouse, an output unit 117 including a display and a speaker, a storage unit 118 including a hard disk, and a communication unit 119 including a modem and a terminal adapter. The communication unit 119 performs communication processing via a network including the Internet.
[0080]
A drive 120 is connected to the input / output interface 115 as necessary; a magnetic disk 131, an optical disk 132, a magneto-optical disk 133, a semiconductor memory 134, or the like is mounted on it as appropriate, and a computer program read from them is installed in the storage unit 118 as necessary.
[0081]
When the series of processes is executed by software, the program constituting the software is installed, from a network or a recording medium, onto a computer incorporated in dedicated hardware or onto, for example, a general-purpose personal computer that can execute various functions by installing various programs.
[0082]
As shown in FIG. 7, this recording medium is configured not only by package media on which the program is recorded and which are distributed separately from the apparatus main body in order to supply the program to the user, such as the magnetic disk 131 (including a floppy disk), the optical disk 132 (including a CD-ROM (compact disk-read only memory) and a DVD (digital versatile disk)), the magneto-optical disk 133 (including an MD (mini-disk) (trademark)), or the semiconductor memory 134, but also by the ROM 112 storing the program and the hard disk included in the storage unit 118, which are supplied to the user in a state of being incorporated in the apparatus main body in advance.
[0083]
In the present specification, the steps describing the program stored in the recording medium include not only processing performed in time series in the described order, but also processing that is not necessarily performed in time series and is executed in parallel or individually.
[0084]
[Effect of the Invention]
  According to the present invention, image data can be encoded.
Moreover, according to the present invention, when the supplement to be added to the usable bit amount R is determined based on the difficulty level of the images of past GOPs whose encoding has finished, the maximum value of the supplement can be set, so that underflow of the virtual buffer can be prevented when bit replenishment rate control is applied to feedback type rate control.
[0085]
In addition, when a GOP in which a scene change has occurred is encoded, the maximum value of the supplement can be reset using a value obtained by comparing the image difficulty levels of the intra-frame encoded images before and after the scene change. The maximum value of the total value of the supplement is increased when the encoding difficulty of the intra-frame encoded image of the previous GOP is lower than the encoding difficulty of the intra-frame encoded image of the GOP to be encoded, and is decreased when it is higher. Underflow of the virtual buffer can therefore be prevented when bit replenishment rate control is applied to feedback type rate control.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an encoder to which the present invention is applied.
FIG. 2 is a flowchart illustrating a bit replenishment rate control process.
FIG. 3 is a flowchart illustrating max_sum_sup calculation processing 1;
FIG. 4 is a diagram for describing a VBV buffer and a maximum value of sum_sup.
FIG. 5 is a diagram for explaining a VBV buffer and a maximum value of sum_sup.
FIG. 6 is a flowchart illustrating max_sum_sup calculation processing 2;
FIG. 7 is a diagram illustrating a configuration of a personal computer.
[Explanation of symbols]
1 encoder, 12 image rearrangement unit, 13 scan conversion / macroblocking unit, 14 intra AC calculation unit, 15 rate control unit, 16 arithmetic processing unit, 17 motion detection unit, 18 DCT unit, 19 quantization unit, 20 VLC unit, 21 buffer, 22 inverse quantization unit, 23 inverse DCT unit, 24 arithmetic processing unit, 25 motion compensation unit, 31 ME residual calculation unit, 32 difficulty level calculation unit, 33 genbit detection unit, 34 target bit determination unit, 35 quantization index determination unit

Claims

In an encoding device that encodes uncompressed data,
First detection means for detecting the encoding difficulty level of the uncompressed data;
Encoding means for compressing and encoding the uncompressed data on the basis of GOP;
Second detection means for detecting a bit generation amount of an intra-frame encoded image of the uncompressed data included in the GOP immediately before encoded by the encoding means;
Setting means for calculating a value obtained by subtracting the bit generation amount of the intra-frame encoded image detected by the second detection means from the buffer capacity of a virtual buffer corresponding to the input buffer of a decoder that decodes the encoded stream in which the uncompressed data is encoded, and for setting the calculated value as the maximum value of the total value of the per-GOP bit replenishment amounts that are added to the usable bit amount allocated, out of the buffer capacity of the virtual buffer allocated to the uncompressed data included in the GOP being encoded, to the remaining pictures not yet encoded in that GOP;
Calculating means for calculating the bit replenishment amount of the GOP to be encoded so that it satisfies the maximum value of the total value set by the setting means and so that it is a positive value when the encoding difficulty level, detected by the first detection means, of the uncompressed data included in a GOP encoded in the past by the encoding means is higher than a first value, a negative value when the encoding difficulty level is lower than a second value that is lower than the first value, and 0 when the encoding difficulty level is between the first value and the second value. An encoding device characterized by comprising the above means.

Third detection means for detecting a difference in encoding difficulty level between the I picture of a scene change and the previous I picture;
Resetting means for resetting, when the difference in the encoding difficulty level is detected by the third detection means, the maximum value of the total value of the bit replenishment amounts set by the setting means so that the maximum value increases when the encoding difficulty level of the previous I picture detected by the first detection means is lower than the encoding difficulty level of the I picture of the scene change, and so that the maximum value decreases when the encoding difficulty level of the previous I picture detected by the first detection means is higher than the encoding difficulty level of the I picture of the scene change. The encoding device according to claim 1, further comprising the above means.

In an encoding method of an encoding device that encodes uncompressed data,
A first detection step of detecting the encoding difficulty level of the uncompressed data;
An encoding step of compressing and encoding the uncompressed data on the basis of GOP;
A second detection step of detecting a bit generation amount of an intra-frame encoded image of the uncompressed data included in the immediately preceding GOP encoded in the past by the processing of the encoding step;
A setting step of calculating a value obtained by subtracting the bit generation amount of the intra-frame encoded image detected by the processing of the second detection step from the buffer capacity of a virtual buffer corresponding to the input buffer of a decoder that decodes the encoded stream in which the uncompressed data is encoded, and of setting the calculated value as the maximum value of the total value of the per-GOP bit replenishment amounts that are added to the usable bit amount allocated, out of the buffer capacity of the virtual buffer allocated to the uncompressed data included in the GOP being encoded, to the remaining pictures not yet encoded in that GOP;
A calculation step of calculating the bit replenishment amount of the GOP to be encoded so that it satisfies the maximum value of the total value set by the processing of the setting step and so that it is a positive value when the encoding difficulty level, detected by the processing of the first detection step, of the uncompressed data included in a GOP encoded in the past by the processing of the encoding step is higher than a first value, a negative value when the encoding difficulty level is lower than a second value that is lower than the first value, and 0 when the encoding difficulty level is between the first value and the second value. An encoding method characterized by including the above steps.

A program for causing a computer to execute processing for encoding uncompressed data,
A first detection step of detecting the encoding difficulty level of the uncompressed data;
An encoding step of compressing and encoding the uncompressed data on the basis of GOP;
A second detection step of detecting a bit generation amount of an intra-frame encoded image of the uncompressed data included in the immediately preceding GOP encoded in the past by the processing of the encoding step;
A setting step of calculating a value obtained by subtracting the bit generation amount of the intra-frame encoded image detected by the processing of the second detection step from the buffer capacity of a virtual buffer corresponding to the input buffer of a decoder that decodes the encoded stream in which the uncompressed data is encoded, and of setting the calculated value as the maximum value of the total value of the per-GOP bit replenishment amounts that are added to the usable bit amount allocated, out of the buffer capacity of the virtual buffer allocated to the uncompressed data included in the GOP being encoded, to the remaining pictures not yet encoded in that GOP;
A calculation step of calculating the bit replenishment amount of the GOP to be encoded so that it satisfies the maximum value of the total value set by the processing of the setting step and so that it is a positive value when the encoding difficulty level, detected by the processing of the first detection step, of the uncompressed data included in a GOP encoded in the past by the processing of the encoding step is higher than a first value, a negative value when the encoding difficulty level is lower than a second value that is lower than the first value, and 0 when the encoding difficulty level is between the first value and the second value. A recording medium on which is recorded a program that causes a computer to execute a process characterized by including the above steps.

A program for causing a computer to execute processing for encoding uncompressed data,
A first detection step of detecting the encoding difficulty level of the uncompressed data;
An encoding step of compressing and encoding the uncompressed data on the basis of GOP;
A second detection step of detecting a bit generation amount of an intra-frame encoded image of the uncompressed data included in the immediately preceding GOP encoded in the past by the processing of the encoding step;
A setting step of calculating a value obtained by subtracting the bit generation amount of the intra-frame encoded image detected by the processing of the second detection step from the buffer capacity of a virtual buffer corresponding to the input buffer of a decoder that decodes the encoded stream in which the uncompressed data is encoded, and of setting the calculated value as the maximum value of the total value of the per-GOP bit replenishment amounts that are added to the usable bit amount allocated, out of the buffer capacity of the virtual buffer allocated to the uncompressed data included in the GOP being encoded, to the remaining pictures not yet encoded in that GOP;
A calculation step of calculating the bit replenishment amount of the GOP to be encoded so that it satisfies the maximum value of the total value set by the processing of the setting step and so that it is a positive value when the encoding difficulty level, detected by the processing of the first detection step, of the uncompressed data included in a GOP encoded in the past by the processing of the encoding step is higher than a first value, a negative value when the encoding difficulty level is lower than a second value that is lower than the first value, and 0 when the encoding difficulty level is between the first value and the second value. A program that causes a computer to execute a process characterized by including the above steps.