JP3734286B2 - Video encoding device and video transmission device

Info

Publication number: JP3734286B2
Application number: JP32643594A
Authority: JP
Inventors: 晋一郎古藤; 敏則尾高; 朋夫山影; 知也児玉
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1994-12-27
Filing date: 1994-12-27
Publication date: 2006-01-11
Anticipated expiration: 2021-01-11
Also published as: JPH08186821A

Description

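[0001]
[Industrial Field of Application]
The present invention relates to a moving image encoding apparatus and a moving image transmission apparatus that achieve high-quality encoding at a high compression ratio, particularly in moving image systems, such as storage media applications, that do not require real-time operation during encoding.
[0002]
[Prior Art]
Known international standards for moving image compression coding include MPEG-1 (ISO/IEC 11172), which was also adopted for the Video CD standard, and MPEG-2 (ISO/IEC 13818), which is expected to be applied to next-generation digital broadcasting, VOD (Video On Demand), DVD (Digital Video Disk), and the like.
[0003]
The MPEG coding scheme uses entropy coding based on motion-compensated prediction and the DCT (discrete cosine transform). Consequently, when encoding at uniform image quality, the generated code amount varies greatly with properties of the input moving image signal such as prediction efficiency and resolution. Because of constraints imposed by the transmission path and the medium, the temporal variation of the generated code amount must be kept within a predetermined range. Normally the generated code amount is controlled to satisfy this condition by controlling the quantization precision (quantization step width). With this method, however, when prediction efficiency is low because motion in the input images is intense, or when resolution is high and the amount of information within a frame is large, the quantization step width becomes large, so coding distortion due to quantization increases and image quality degradation becomes conspicuous.
[0004]
When fixed-rate transmission is used for the signal encoded in this way, the temporal variation of the generated code amount is absorbed solely by a smoothing buffer that temporarily holds the encoded bit stream before transmission. The variation is therefore confined within the specified smoothing buffer size, which leaves little freedom, and image quality degradation may become conspicuous depending on the properties of the input images.
[0005]
Variable-rate transmission, in which the transmission rate varies over time, is also known as a way to transmit an encoded bit stream. It can be realized with transmission schemes based on burst transfer, such as ATM packet transmission or HDDs. Within the range limited by the maximum rate of the transmission path and the size of the smoothing buffer, variable-rate transmission can tolerate far larger temporal variations in generated code amount than fixed-rate transmission. More efficient encoding matched to the properties of the input images therefore becomes possible.
[0006]
When the encoded bit stream is stored on a digital storage medium (DSM) such as an optical disc, variable-rate transmission limited by the maximum capacity and maximum transfer rate of the DSM is possible. In such storage applications, real-time encoding is not necessarily required; what matters is higher image quality and longer recording time through efficient encoding. Optimal variable-rate control based on the properties of the entire input moving image signal sequence is considered effective for this purpose, but an optimal rate control method for variable-rate transmission of moving image signals has not yet been established.
[0007]
[Problem to Be Solved by the Invention]
As described above, when moving image coding is applied to storage systems, performing optimal rate control based on the properties of the entire input moving image signal is considered effective for extending recording time and improving image quality. No such rate control method has yet been established, however, and one has been sought.
[0008]
An object of the present invention is to provide a moving image encoding apparatus suited to storage systems that enables substantially higher image quality and longer recording time by optimizing rate control according to the properties of the entire input moving image signal sequence.
[0009]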
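[Means for Solving the Problem]
To solve the above problem, the present invention provides a moving image encoding apparatus that encodes an entire input moving image signal sequence once and then encodes the same entire sequence again on the basis of statistics, including the generated code amount, gathered over the whole sequence. The apparatus comprises: encoding means for encoding the input moving image signal sequence while alternately switching among a plurality of quantization step widths for each pixel block; means for adding, independently for each frame, the code amounts of the pixel blocks encoded with each of the plurality of quantization step widths to obtain a plurality of summed code amounts; means for estimating, from the plurality of summed code amounts, the generated code amount of the frame for each quantization step width; and code amount allocation means for performing, on the basis of the estimated generated code amounts, optimal code amount allocation over the entire input moving image signal sequence for the final encoding by the encoding means.
[0010]
The moving image encoding apparatus according to the present invention further includes means for estimating the temporal variation of the smoothing buffer occupancy on the basis of the result of the code amount allocation, and means for correcting the code amount allocation according to the estimation result, and the encoding means encodes the entire moving image signal sequence in accordance with the corrected code amounts.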
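[0014]
[Operation]
In the first invention, the encoding passes, including at least the first, extract statistics such as the generated code amount over the entire input moving image signal sequence, which depend on the properties of the sequence as a whole. Based on these statistics, the code amount allocation, the quantization step width, or both are selected for every predetermined number of frames of the sequence. Then, based on the selected code amount allocation and quantization step width, the quantization step width used in the final pass, for example the second encoding, is controlled for each predetermined region within the picture. For a storage medium with a fixed total code amount, such as a DSM, optimal code amount allocation exploiting variable-rate transmission can be carried out before the encoded data is actually stored on the medium, so such control is readily achievable.
[0015]
Consequently, optimal rate control based on the properties of the entire input sequence becomes possible in variable-rate transmission of a moving image signal sequence, and higher image quality and longer recording time on the medium can be achieved without raising the average transmission rate. Furthermore, by making the unit for selecting the code amount allocation and quantization step width a single frame or a small group of frames, stable image quality can be obtained even under abrupt changes in the input sequence such as scene changes.
[0016]
In the second invention, the temporal variation of the occupancy of the smoothing buffer, the temporary storage needed for transmission, can also be estimated in advance. The code amount allocation can therefore be optimized to satisfy the two or three conditions given earlier while making full use of the storage capacity, that is, the size of the smoothing buffer. In variable-rate transmission, effective use of the smoothing buffer thus allows code to be allocated such that the instantaneous bit rate exceeds the maximum transmission rate of the transmission path. In fixed-rate transmission as well, higher image quality can be achieved by making full use of the variation in generated code amount that the smoothing buffer can absorb.
[0017]
In the present invention, to analyze the properties of the entire input moving image signal sequence, the first encoding is performed with the quantization step width in the encoding unit specified externally, and statistics are collected during that pass. An encoding apparatus that can output statistics can therefore analyze the properties of the entire input sequence without any hardware change. For an encoding apparatus with no statistics extraction function, a result equivalent to analyzing the properties of the entire input sequence can be obtained by, for example, specifying the quantization step width externally and analyzing the encoded bit stream. Providing the statistics extraction means separately from the existing encoding apparatus allows the analysis without major hardware changes.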
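[0018]
In general, the S/N of an encoded image is monotonically related to the quantization step width, so image quality can be set with reference to the quantization step width. The quantization step width is also used for dynamic control of the generated code amount. Estimating the parameters that describe the relationship between quantization step width and generated code amount, as one property of the entire input sequence, is therefore an effective basis for rate control that takes image quality into account. If the number of such parameters is n, estimating them requires encoding with at least n fixed quantization step widths and estimating the parameters from the results. In other words, the entire sequence would have to be encoded n times.
[0019]
In contrast, the present invention exploits the similarity of consecutive frames: by switching the quantization step width frame by frame within each frame group, the relationship between quantization step width and generated code amount for each frame group can be estimated in a single encoding pass. Furthermore, by switching the quantization step width pixel block by pixel block within a frame, the relationship for each individual frame can be estimated with high accuracy in a single pass.
[0020]
When adaptive quantization that takes visual characteristics into account is required according to the nature of the input image, statistics are extracted using a quantization step width obtained by multiplying the fixed quantization step width by a correction coefficient defined per frame or per pixel region. This makes it possible to estimate the relationship between quantization step width and generated code amount for each frame or each frame group.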
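[0021]
[Embodiments]
Embodiments of the present invention are described below with reference to the drawings.
(Embodiment 1)
Fig. 1 is a block diagram showing the schematic configuration of one embodiment of the present invention. In the figure, a digital VTR 10 holds the recorded moving image signal sequence of a digitized movie or other program. The moving image signal sequence (input moving image signal sequence) 11 reproduced from the digital VTR 10 is input to an encoding unit 12 and compression-encoded. The encoding unit 12 outputs the encoded data as a bit stream, and this encoded bit stream 13 is stored on a digital storage medium (DSM) 14. In addition to the encoding unit 12, the apparatus has a data memory 16 for accumulating statistics, an image analysis unit 17, a code amount allocation unit 19, and a data memory 21 for accumulating rate control parameters; these are used for rate control of the encoding unit 12.
[0022]
When the moving image signal of a desired program is reproduced from the digital VTR 10, encoded by the encoding unit 12, and stored on the DSM 14 as the encoded bit stream 13, the input moving image signal sequence 11 of the same program is reproduced from the digital VTR 10 twice. That is, the same input sequence 11 is input to the encoding unit 12 and encoded twice. The sequence input twice may be the moving image signal sequence of one entire program, or, if the program is long, such as a feature film, a sequence obtained by dividing the program into several time segments, for example a first half and a second half.
[0023]
The first encoding in the encoding unit 12 is performed to analyze the properties of the entire input sequence 11 and extract its statistics. The second encoding is performed for optimal rate control, through code amount allocation and quantization step width selection based on the statistics extracted in the first encoding. The encoded bit stream 13 obtained under optimal rate control in the second encoding is what is finally stored on the DSM 14.
[0024]
The detailed configuration and operation of this embodiment are described next.
As described above, the input sequence 11 from the digital VTR 10 is input to the encoding unit 12 and encoded twice. In the first encoding, the encoding unit 12 encodes with a fixed quantization step width. During this pass, statistics such as the generated code amount, activity, and prediction efficiency of each frame are extracted from the encoding unit 12 as statistics parameters 15 and accumulated in the data memory 16.
[0025]
When the first encoding is complete, the image analysis unit 17 automatically analyzes the statistics frame by frame from the statistics parameters accumulated in the data memory 16, yielding characteristic parameters 18 for each image of the input sequence 11.
[0026]
The characteristic parameters 18 are sent to the code amount allocation unit 19, which performs, over the entire input sequence 11, an optimal code amount allocation that suppresses subjective image quality variation, within the range that satisfies the buffering and transmission rate limits of the encoding unit 12 for each frame. The per-frame rate control parameters 20 obtained from this optimal allocation are accumulated in the other data memory 21.
[0027]
Next, the same input sequence 11 is input from the digital VTR 10 to the encoding unit 12 and the second encoding is performed. In this pass, rate control is based on the rate control parameters 22 read out from the data memory 21, where they were accumulated following the first encoding as described above. The encoded bit stream 13 obtained in the second encoding is stored on the DSM 14.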
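[0028]
Fig. 2 is a block diagram showing a specific configuration example of the encoding unit 12. The coding scheme itself is a known one, as specified by MPEG and the like. In Fig. 2, the input moving image signal sequence 11 is input to a subtractor 101 and a motion-compensated prediction circuit 109. The motion-compensated prediction circuit 109 detects motion vectors between the input sequence 11 and a reference image signal, previously obtained by encoding and local decoding and held in a frame memory 108, and generates a motion-compensated prediction signal 102 from these motion vectors. The subtractor 101 subtracts the prediction signal 102 from the input sequence 11 to produce a prediction residual signal. A discrete cosine transform (DCT) circuit 103 transforms the prediction residual signal in blocks of fixed size, producing DCT coefficient information, which is quantized by a quantization circuit 104.
[0029]
The quantized DCT coefficient information from the quantization circuit 104 is inverse-quantized by an inverse quantization circuit 105, whose output is inverse-transformed by an inverse discrete cosine transform (inverse DCT) circuit 106. That is, the inverse quantization circuit 105 and the inverse DCT circuit 106 perform the inverse of the processing in the quantization circuit 104 and the DCT circuit 103, respectively, so that the output of the inverse DCT circuit 106 approximates the prediction residual signal output by the subtractor 101. An adder 107 adds the output of the inverse DCT circuit 106 to the prediction signal 102 from the motion-compensated prediction circuit 109 to produce a locally decoded signal, which is stored in the frame memory 108 as the reference image signal.
[0030]
The quantized DCT coefficient information from the quantization circuit 104 is also input to a variable-length coding circuit 110 and variable-length coded. The variable-length coded data passes through a smoothing buffer 111 and is output as the encoded bit stream 13.
[0031]
The input sequence 11 is also input to an activity calculation circuit 112, which calculates the activity of the images of the input sequence 11 and passes the result to a rate control circuit 113. The rate control circuit 113 controls the quantization step width in the quantization circuit 104 on the basis of the activity, the occupancy of the smoothing buffer 111, and the rate control parameters 22 from the data memory 21 of Fig. 1, thereby performing rate control, that is, control of the transmission rate of the encoded bit stream 13.
[0032]
In Fig. 2, the generated code amount of each frame from the variable-length coding circuit 110, the activity calculated by the activity calculation circuit 112, and information indicating prediction efficiency, derived from the prediction residual signal output by the subtractor 101, are output to the data memory 16 of Fig. 1 as the statistics parameters 15.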
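The role of the quantization step width in circuits 104 and 105 can be illustrated with a minimal sketch. It assumes a plain uniform scalar quantizer; real MPEG quantization also applies weighting matrices and special handling of intra DC coefficients, which are omitted here. The function names are hypothetical.

```python
def quantize(coeffs, step):
    """Uniform quantization of DCT coefficients (simplified stand-in for circuit 104)."""
    return [int(round(c / step)) for c in coeffs]


def dequantize(levels, step):
    """Inverse quantization (simplified stand-in for circuit 105)."""
    return [level * step for level in levels]


# A larger step maps more coefficients to small levels or zero, which lowers
# the code amount after variable-length coding but increases the
# reconstruction error.
coeffs = [104.0, -37.0, 13.0, 3.0]
levels = quantize(coeffs, 8)
recon = dequantize(levels, 8)
```

With a step of 8, the smallest coefficient is quantized to zero and the others are reconstructed only approximately, which is the distortion the rate control has to trade against the code amount.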
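[0033]
(Embodiment 2)
Fig. 3 is a block diagram showing the schematic configuration of this embodiment. In the figure, the digital VTR 30, encoding unit 32, DSM 34, data memory 36, image analysis unit 37, code amount allocation unit 39, and data memory 41 are basically the same as the digital VTR 10, encoding unit 12, DSM 14, data memory 16, image analysis unit 17, code amount allocation unit 19, and data memory 21 shown in Fig. 1.
[0034]
In this embodiment, the encoded bit stream 33 output by the encoding unit 32 is recorded on the DSM 34 during the first encoding as well, that is, the statistics extraction pass described in Embodiment 1. The encoded bit stream of the first encoding recorded on the DSM 34 is analyzed sequentially by a statistics extraction unit 43 to extract the statistics, and the extracted statistics are recorded in the data memory 36.
[0035]
After the first encoding is complete, the statistics are automatically analyzed frame by frame and code amounts are allocated as in Embodiment 1, and the second encoding is performed under rate control based on the result.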
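[0036]
(Embodiment 3)
This embodiment describes a concrete example of the optimal code amount allocation performed by the code amount allocation units 19 and 39 in Embodiments 1 and 2. Figs. 4 to 6 schematically illustrate optimal code amount allocation and its effect.
[0037]
Fig. 4 shows the temporal variation of the entropy (complexity) of the images of the input moving image signal sequence 11 (hereinafter, input images). Fig. 5 shows the temporal variation of image quality and bit rate under uniform code amount allocation. With uniform allocation, the quality of the encoded images generally varies in antiphase with the complexity of the input images: the more complex the input image, the lower the encoded quality, while for input images with little information, such as near-still images, quality is high. In this case the degraded portions are conspicuous to the viewer, and the overall impression of the encoded video is poor.
[0038]
Fig. 6, by contrast, shows the temporal variation of image quality and bit rate under optimal code amount allocation based on analysis of the statistics. Optimal allocation assigns code according to the complexity of the input images so as to obtain stable quality within the limits of the maximum rate of the transmission path and of buffering. The resulting quality is very stable, and the overall visual impression of the encoded video improves. Moreover, because the rate-distortion function in entropy coding is normally nonlinear, optimal allocation also improves the S/N over the whole encoded sequence compared with uniform allocation for the same total code amount.
[0039]
For optimal code amount allocation and code amount control that take image quality into account, it is important to be able to estimate the relationship between quantization step width and generated code amount accurately. Fig. 7 shows examples of this relationship for individual images. As shown, the generated code amount generally decreases monotonically as the quantization step width increases. The relationship depends on the coding scheme and has characteristics specific to the nature of the input image. Because the MPEG scheme uses motion-compensated prediction and the DCT, as also shown in Fig. 2, the relationship depends on the prediction efficiency of each image, the spatial frequency distribution of the input sequence, and so on.
[0040]
Here, the relationship between quantization step width $Q$ and generated code amount $R$ for a predetermined number of frames (one frame or a relatively small frame group) can be written, with $a_i^j$ ($i = 1, 2, \ldots, n$) denoting the statistical parameters specific to image $j$, as
$R = f(Q, a_1^j, a_2^j, \ldots, a_n^j)$.
Attempts have been made to estimate the parameters $a_i^j$ of a modeled function $f$ from indirect parameters such as computed activity, spatial frequency distribution, or prediction efficiency. In general, however, measured data scatter widely around the modeled function, and it is difficult to estimate the relationship between generated code amount and quantization step width accurately from such indirect parameters. Instead, by directly measuring the relationship between quantization step width and generated code amount for every predetermined number of frames (every frame or every frame group) of the input sequence 11 and obtaining the statistical parameters $a_i^j$ ($i = 1, 2, \ldots, n$) by regression analysis or the like, the generated code amount can be estimated with high accuracy.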
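The regression step can be sketched as follows. The patent does not specify the form of f, so this example assumes a two-parameter power law R = a1 * Q^(-a2) purely for illustration, and fits it by ordinary least squares on log R versus log Q. The function names `fit_rate_model` and `predict_rate` are hypothetical.

```python
import math


def fit_rate_model(samples):
    """Fit R = a1 * Q**(-a2) to measured (Q, R) pairs.

    samples: list of (Q, R) pairs measured for one frame or frame group.
    The power-law form is an assumption; the source leaves f unspecified.
    Uses least squares on the line log R = log a1 - a2 * log Q.
    """
    xs = [math.log(q) for q, _ in samples]
    ys = [math.log(r) for _, r in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    a2 = -slope
    a1 = math.exp(mean_y - slope * mean_x)
    return a1, a2


def predict_rate(q, a1, a2):
    """Predicted generated code amount at quantization step width q."""
    return a1 * q ** (-a2)
```

At least two distinct step widths are needed to fit the two parameters; additional measurements average out noise in the measured code amounts.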
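[0041]
(Embodiment 4)
Fig. 8 is a flowchart showing the encoding process in this embodiment. The configuration of the moving image encoding apparatus is the same as in Fig. 1 or Fig. 3.
[0042]
With the relationship between generated code amount $R$ and quantization step width $Q$ for a predetermined number of frames (one frame or frame group $j$) of the input sequence 11 (or 31) written as $R = f(Q, a_1^j, a_2^j)$, the encoding unit 12 (or 32) first performs two successive encoding passes over the entire input sequence as pre-encoding for statistics extraction, using the fixed quantization step widths $Q = Q_1$ and $Q = Q_2$ (step S11).
[0043]
The generated code amount of each frame or frame group in these passes is then measured, for example from the output of the variable-length coding circuit 110 in Fig. 2, and the statistical parameters $a_1^j$, $a_2^j$ are estimated. That is, the statistics are analyzed (step S12).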
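As a worked illustration of step S12, suppose (as an assumption; the patent does not fix the form of f) that the model is a two-parameter power law, and write $R_1^j$ and $R_2^j$ for the code amounts measured for frame group $j$ at $Q_1$ and $Q_2$ (notation introduced here). The two fixed-step passes then determine both parameters in closed form:

```latex
R = a_1^j \, Q^{-a_2^j}
\quad\Longrightarrow\quad
\frac{R_1^j}{R_2^j} = \left(\frac{Q_2}{Q_1}\right)^{a_2^j}
\quad\Longrightarrow\quad
a_2^j = \frac{\ln\left(R_1^j / R_2^j\right)}{\ln\left(Q_2 / Q_1\right)},
\qquad
a_1^j = R_1^j \, Q_1^{\,a_2^j}
```

With exactly two measurements and two parameters the fit is exact, so any measurement noise passes straight into the parameters; more step widths would allow a least-squares fit instead.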
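[0044]
Using the relationship between quantization step width and generated code amount estimated in this way, the code amount allocation unit 19 (or 39) performs an optimal code amount allocation that suppresses image quality variation while satisfying transmission-path conditions such as the maximum rate, the average rate, and the smoothing buffer size (step S13), and the encoding unit 12 (or 32) performs compression encoding based on it (step S14).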
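One simple way to picture the allocation in step S13 is sketched below. It is not the patent's stated algorithm: it assumes the power-law model R_j(Q) = a1 * Q^(-a2) per frame group, treats "constant quality" as "one common quantization step width for all groups", and ignores the maximum-rate and buffer constraints. The function name and the default search range for Q are hypothetical.

```python
def allocate_constant_q(models, total_bits, q_lo=1.0, q_hi=62.0, iters=60):
    """Find the common step width Q whose predicted total meets the budget.

    models: list of (a1, a2) per frame group, with R_j(Q) = a1 * Q**(-a2)
            (an assumed model form).
    Returns (Q, per-group code amounts). Because the predicted total
    decreases monotonically in Q, plain bisection is enough.
    """
    def total(q):
        return sum(a1 * q ** (-a2) for a1, a2 in models)

    for _ in range(iters):
        mid = (q_lo + q_hi) / 2.0
        if total(mid) > total_bits:
            q_lo = mid   # over budget: quantize more coarsely
        else:
            q_hi = mid   # within budget: try a finer step width
    q = q_hi             # q_hi always satisfies the budget
    return q, [a1 * q ** (-a2) for a1, a2 in models]
```

Complex frame groups (large a1) automatically receive more bits than simple ones at the same Q, which is the behavior Fig. 6 describes qualitatively.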
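[0045]
(Embodiment 5)
This embodiment is described with reference to Fig. 9. In Fig. 9, I denotes a picture coded using intra-frame coding only, P a picture coded with forward prediction, and B a picture coded with forward and backward prediction. In this embodiment, for each of the picture types I, P, and B in temporal succession, the quantization step width is switched alternately between $Q_1$ and $Q_2$. Exploiting the similarity of temporally adjacent frames, a single pass over the entire input sequence thereby achieves the equivalent of two encoding passes. For example, for B1 and B2 in the figure, the relationship between generated code amount $R$ and quantization step width $Q$ is taken for both to be
$R = f(Q, a_1^{12}, a_2^{12})$,
and, with $R_{b1}$ and $R_{b2}$ denoting the measured generated code amounts when B1 and B2 are encoded with quantization step widths $Q_{b1}$ and $Q_{b2}$ respectively, the parameters $a_1^{12}$, $a_2^{12}$ are obtained from the two points
$(R, Q) = (R_{b1}, Q_{b1}),\ (R_{b2}, Q_{b2})$.
[0046]
In this way, a single encoding pass can yield essentially the same result as encoding the same moving image signal sequence twice.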
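The per-type alternation can be sketched as below. The function name is hypothetical, and the rule used here (each picture type keeps its own toggle, starting with Q1) is one plausible reading of the scheme in Fig. 9, stated as an assumption.

```python
def alternate_q_by_type(picture_types, q1, q2):
    """Assign q1 and q2 alternately within each picture type (I, P, B).

    picture_types: sequence such as "IBBPBBP" in coding or display order.
    Each type toggles independently, so consecutive pictures of the same
    type are measured at different step widths and can be paired up for
    parameter estimation.
    """
    use_q1_next = {}
    assigned = []
    for ptype in picture_types:
        use_q1 = use_q1_next.get(ptype, True)
        assigned.append(q1 if use_q1 else q2)
        use_q1_next[ptype] = not use_q1
    return assigned
```

Pairing B1 (coded at one step width) with B2 (coded at the other) then supplies the two (R, Q) points needed for the two-parameter model.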
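(Embodiment 6)
This embodiment is described with reference to Figs. 10 and 11, which show the quantization step width of each pixel block within one frame. Fig. 10 shows the case of two quantization step widths and Fig. 11 the case of three.
[0047]
Let the relationship between quantization step width $Q$ and generated code amount $R$ for frame $j$ be
$R = f(Q, a_1^j, a_2^j)$.
The quantization step width is switched alternately from pixel block to pixel block in frame $j$, among $Q_1$, $Q_2$ as in Fig. 10 or among $Q_1$, $Q_2$, $Q_3$ as in Fig. 11, and the generated code amounts are summed within the frame separately for the pixel blocks of each quantization step width. The number of quantization step widths in a frame is not limited to two or three; in general, $N$ are used per frame. With $R_n$ denoting the in-frame sum of per-block generated code amounts for quantization step width $Q_n$ ($n = 1, 2, \ldots, N$), the parameters $a_1^j$, $a_2^j$ are obtained from the points
$(R, Q) = (N \times R_n, Q_n)$ ($n = 1, 2, \ldots, N$).
A single encoding pass thereby yields the equivalent of $N$ encodings at fixed quantization step widths. Setting the number $N$ of quantization step widths used in a frame to at least the order (number) of the parameters $a_i^j$ makes parameter estimation by regression analysis possible.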
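The per-frame bookkeeping described above can be sketched in a few lines. Assigning step widths round-robin by block index is an assumption for illustration; the actual spatial patterns are those of Figs. 10 and 11. The function name is hypothetical.

```python
def estimate_frame_bits(block_bits, q_values):
    """Estimate whole-frame code amounts at each of N step widths.

    block_bits[i] is the measured code amount of pixel block i, which is
    assumed to have been coded with q_values[i % N] (round-robin switching).
    Each step width covers roughly 1/N of the blocks, so its in-frame sum
    R_n is scaled by N to approximate coding the whole frame at Q_n.
    Returns a list of (Q_n, N * R_n) points for parameter estimation.
    """
    n = len(q_values)
    sums = [0] * n
    for i, bits in enumerate(block_bits):
        sums[i % n] += bits
    return [(q, n * s) for q, s in zip(q_values, sums)]
```

The resulting points can be fed to whatever fitting procedure is used for the model f; the scaling by N is only accurate to the extent that the blocks assigned to each step width are representative of the frame.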
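[0048]
(Embodiment 7)
Fig. 12 is a block diagram showing the schematic configuration of this embodiment. In the figure, the digital VTR 50, encoding unit 52, DSM 54, data memory 56, image analysis unit 57, code amount allocation unit 59, and data memory 61 are basically the same as the digital VTR 10, encoding unit 12, DSM 14, data memory 16, image analysis unit 17, code amount allocation unit 19, and data memory 21 shown in Fig. 1.
[0049]
This embodiment differs from Embodiments 1 and 2 in that the quantization step width is given a visual-correction weighting, per pixel block or per frame, according to parameters that affect subjective image quality, such as activity, prediction efficiency, amount of motion, and prediction mode. Specifically, an adaptive quantization weight calculation unit 64 is added, which calculates adaptive quantization weight parameters 63 from these parameters 62. The purpose of adaptive quantization is to improve overall subjective quality by quantizing finely where degradation is visually conspicuous and coarsely where it is not.
[0050]
Figs. 13 and 14 show examples of adaptive quantization weight parameters for adaptive quantization in the temporal direction and in the spatial direction within a frame, respectively. When adaptive quantization is used and the temporal or spatial weight function shown in Figs. 13 and 14 is available before the frame is encoded, the fixed quantization step width is varied by this weight function during the first encoding pass, which uses a fixed quantization step width for statistics extraction. This allows the relationship between generated code amount and quantization step width for the predetermined number of frames (one frame or a frame group) to be estimated more accurately.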
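Applying the weight function to the fixed step width amounts to a per-block (or per-frame) multiplication, sketched below. The clamping range `q_min`..`q_max` and the rounding to integers are hypothetical parameters added for illustration, not values taken from the patent.

```python
def weighted_q(base_q, weights, q_min=1, q_max=31):
    """Scale a fixed step width by adaptive-quantization weights.

    base_q:  fixed quantization step width used in the statistics pass.
    weights: one weight per pixel block (or per frame); values below 1
             quantize more finely where degradation is conspicuous,
             values above 1 more coarsely where it is not.
    The result is rounded and clamped to [q_min, q_max] (assumed limits).
    """
    return [min(q_max, max(q_min, round(base_q * w))) for w in weights]
```

Using the same weights in the statistics pass and in the final pass keeps the measured relationship between step width and code amount consistent with how the frame will actually be quantized.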
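[0051]
(Embodiment 8)
Fig. 15 shows an embodiment of a variable-rate moving image decoding apparatus for decoding the original moving image signal sequence from the encoded data produced by the moving image encoding apparatus of the present invention. In the figure, encoded data is stored on a DSM 70; the encoded bit stream reproduced from the DSM 70 is input via a transmission path 72 to a moving image decoding apparatus 71, where it first enters a FIFO buffer 73 serving as the smoothing buffer. The transmission path 72 transmits encoded data at a specified maximum transmission rate Rmax and is configured to halt transmission when the occupancy of the FIFO buffer 73 exceeds a specified value.
[0052]
The FIFO buffer 73 sends encoded data to a decoder 75 on request from the decoder 75. The encoded data for one frame is assumed to be transferred instantaneously from the FIFO buffer 73 to the decoder 75 at the time that frame is to be decoded. This transmission model is specified in ISO/IEC 13818-2. The moving image signal sequence 76 decoded by the decoder 75 is sent to a display device 77 and displayed.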
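[0053]
(Embodiment 9)
Fig. 16 shows the procedure of one embodiment of the code amount allocation process in the moving image encoding apparatus according to the present invention. First, the statistical parameters extracted as described above are used to perform optimal code amount allocation over the entire input moving image sequence (step S21). Next, the temporal variation of the smoothing buffer occupancy is estimated from the allocation result (step S23).
[0054]
Fig. 17 shows the estimated evolution of smoothing buffer occupancy under the transmission model above when optimal code amount allocation is performed so that subjective image quality is constant for a given total code amount. The slope of the occupancy curve corresponds to the maximum transmission rate of the transmission path, and at each frame period the encoded data of a frame is removed from the smoothing buffer instantaneously. Because the allocation in Fig. 17 does not take the smoothing buffer constraint into account, an underflow of the smoothing buffer is predicted at time m.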
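The occupancy estimate can be sketched with a small simulation of the transmission model: the buffer fills at the maximum rate until it is full, and each frame's data is removed instantaneously at its decode time. Starting from an empty buffer and decoding the first frame after exactly one frame period are simplifying assumptions, and the function name is hypothetical.

```python
def simulate_buffer(frame_bits, r_max, frame_period, buffer_size, initial=0.0):
    """Trace smoothing-buffer occupancy for a given code amount allocation.

    frame_bits:   allocated code amount of each frame, in decode order.
    r_max:        maximum transmission rate of the path (bits per second).
    frame_period: time between decode instants (seconds).
    buffer_size:  buffer capacity (bits); input stalls when it is full.
    Returns (trace, underflow_index): trace[i] is the occupancy just before
    frame i is removed; underflow_index is the first frame whose data has
    not fully arrived by its decode time, or None if there is no underflow.
    """
    occupancy = initial
    trace = []
    for i, bits in enumerate(frame_bits):
        occupancy = min(buffer_size, occupancy + r_max * frame_period)
        trace.append(occupancy)
        if bits > occupancy:
            return trace, i
        occupancy -= bits
    return trace, None
```

A frame that is large relative to the data accumulated since the buffer was last full triggers the kind of underflow that Fig. 17 shows at time m.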
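[0055]
In this embodiment, step S24 of Fig. 16 therefore checks for possible underflow of the smoothing buffer. If underflow is predicted, the code amounts allocated between the predicted underflow time and an earlier time at which buffer occupancy is sufficiently high (n in the example of Fig. 17) are redistributed to other time regions, thereby correcting the code amount allocation (step S25). This yields a code amount allocation that does not cause underflow.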
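The correction loop of steps S24 and S25 might look like the sketch below. The specific redistribution rule (scale the window from n to m down proportionally by the deficit, then spread the removed bits evenly over all frames outside the window) is an assumption chosen for brevity; the patent only says that the code amount is redistributed to other time regions. Both function names are hypothetical, `fill` is the amount of data arriving per frame period, and the buffer is assumed to start empty.

```python
def first_underflow(frame_bits, fill, size, tol=1e-9):
    """Return (index of first underflow or None, occupancy trace)."""
    occupancy = 0.0
    trace = []
    for i, bits in enumerate(frame_bits):
        occupancy = min(size, occupancy + fill)
        trace.append(occupancy)
        if bits > occupancy + tol:
            return i, trace
        occupancy -= bits
    return None, trace


def correct_allocation(frame_bits, fill, size, max_iter=1000):
    """Iteratively remove predicted underflows while keeping the total."""
    bits = [float(b) for b in frame_bits]
    for _ in range(max_iter):
        m, trace = first_underflow(bits, fill, size)
        if m is None:
            return bits
        deficit = bits[m] - trace[m]
        # Walk back to the most recent frame at which the buffer was full.
        n = m
        while n > 0 and trace[n] < size:
            n -= 1
        window_total = sum(bits[n:m + 1])
        scale = (window_total - deficit) / window_total
        for i in range(n, m + 1):
            bits[i] *= scale
        others = [i for i in range(len(bits)) if i < n or i > m]
        if not others:
            return bits  # nowhere to move the bits; the total shrinks
        for i in others:
            bits[i] += deficit / len(others)
    raise RuntimeError("allocation did not converge")
```

Because bits moved outside the window can themselves create a new underflow, the check is repeated until the whole sequence passes; a production allocator would also need to respect the overflow and average-rate conditions.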
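[0056]
Fig. 18 shows the estimated temporal variation of smoothing buffer occupancy under the code amount allocation corrected for the smoothing buffer constraint. The short-term average transmission rate then varies as shown in Fig. 19; by exploiting buffer variation, an allocation that momentarily exceeds the maximum rate of the transmission path becomes possible. In other words, an optimal code amount allocation that combines the maximum transmission rate of the path with the smoothing buffer can be realized.
[0057]
In fixed-rate transmission as well, control is needed to prevent underflow and overflow of the smoothing buffer, but within the range the smoothing buffer can absorb, a code amount allocation that is optimal in terms of image quality is possible, just as in variable-rate transmission.
[0058]
In this embodiment, the code amount can thus be allocated optimally so as to satisfy three conditions: (a) the limit on the instantaneous maximum transmission rate, defined by the maximum transmission rate of the transmission path and the storage capacity of the smoothing buffer, is met; (b) the smoothing buffer neither underflows nor overflows; and (c) the average transmission rate specified for the transmission path is met.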
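[0059]
[Effects of the Invention]
As described above, according to the present invention, in moving image compression coding for storage systems, where real-time operation during encoding is not necessarily required, statistics are extracted by a first encoding pass over the entire input moving image signal sequence, for example with a fixed quantization step width. The relationship between generated code amount and quantization step width is estimated for each predetermined number of frames (one frame or a frame group), at least one of the code amount allocation and the quantization step width is selected on that basis, and the quantization step width in the second encoding pass is controlled for each predetermined region within the picture. This enables an optimal code amount allocation that achieves high image quality under both the transmission-path constraints and a limited total code amount.
[0060]
Furthermore, by optimizing the code amount allocation with respect to the temporal evolution of smoothing buffer occupancy, in addition to the limits on average transmission rate and maximum bit rate of the transmission path, overflow and underflow of the smoothing buffer can be prevented and still higher image quality achieved, in both variable-rate and fixed-rate transmission.
[0061]
In addition, by switching among a plurality of quantization step widths frame by frame or pixel block by pixel block during the first encoding pass, the relationship between generated code amount and quantization step width for each frame or frame group can be estimated from that single pass with accuracy comparable to encoding the entire input sequence several times.
[0062]
Finally, when adaptive quantization that takes visual characteristics into account is used per frame or per pixel block, varying the fixed quantization scale by the adaptive weight function in the first, fixed-quantization encoding pass as well allows the coding characteristics of each frame or frame group to be obtained without loss of accuracy.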
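[Brief Description of the Drawings]
[Fig. 1] Block diagram showing the schematic configuration of the moving image encoding apparatus according to Embodiment 1
[Fig. 2] Block diagram showing a specific configuration example of the encoding unit in Fig. 1
[Fig. 3] Block diagram showing the schematic configuration of the moving image encoding apparatus according to Embodiment 2
[Fig. 4] Diagram showing the temporal variation of input image entropy, for explaining Embodiment 3
[Fig. 5] Diagram showing the temporal variation of image quality and bit rate in fixed-rate coding, for explaining Embodiment 3
[Fig. 6] Diagram showing the temporal variation of image quality and bit rate in variable-rate coding, for explaining Embodiment 3
[Fig. 7] Diagram showing the relationship between quantization step width and generated code amount for individual images, for explaining Embodiment 3
[Fig. 8] Flowchart showing the encoding process in Embodiment 4
[Fig. 9] Diagram showing multiple picture types, for explaining Embodiment 5
[Fig. 10] Diagram showing the quantization step width of each pixel block within one frame, for explaining Embodiment 6
[Fig. 11] Diagram showing the quantization step width of each pixel block within one frame, for explaining Embodiment 6
[Fig. 12] Block diagram showing the schematic configuration of the moving image encoding apparatus according to Embodiment 7
[Fig. 13] Diagram showing an example of the weight function used for temporal adaptive quantization in Embodiment 7
[Fig. 14] Diagram showing an example of the weight function used for spatial adaptive quantization in Embodiment 7
[Fig. 15] Block diagram showing a configuration example of the storage-system moving image decoding apparatus according to Embodiment 8
[Fig. 16] Flowchart showing the code amount allocation process in Embodiment 9
[Fig. 17] Diagram showing the temporal variation of smoothing buffer occupancy, for explaining Embodiment 9
[Fig. 18] Diagram showing the temporal variation of smoothing buffer occupancy, for explaining Embodiment 9
[Fig. 19] Diagram showing the temporal variation of bit rate, for explaining Embodiment 9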
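[Explanation of Reference Numerals]
10, 30, 50 … digital VTR
11, 31, 51 … input image signal
12, 32, 52 … encoding unit
13, 33, 42, 53 … encoded bit stream
14, 34, 54 … DSM (digital storage medium)
15, 55 … statistics parameters
16, 36, 56 … data memory for statistics accumulation
17, 37, 57 … image analysis unit
18, 38, 58 … characteristic parameters
19, 39, 59 … code amount allocation unit
20, 40, 60 … rate control parameters
21, 41, 61 … data memory for rate control parameters
22 … rate control parameters
43 … statistics extraction unit
62 … activity, prediction efficiency, amount of motion, prediction mode, etc.
63 … adaptive quantization weight parameters
64 … adaptive quantization weight calculation unit
70 … DSM
71 … moving image decoding apparatus
72 … transmission path
73 … FIFO buffer
74 … encoded data
75 … decoder
76 … decoded moving image signal sequence
77 … display device
101 … subtractor
102 … prediction signal
103 … DCT circuit
104 … quantization circuit
105 … inverse quantization circuit
106 … inverse DCT circuit
107 … adder
108 … frame memory
109 … motion-compensated prediction circuit
110 … variable-length coding circuit
111 … smoothing buffer
112 … activity calculation circuit
113 … rate control circuit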
[Industrial application fields]
The present invention relates to a moving image encoding apparatus and a moving image transmission apparatus that realize high-quality encoding at a high compression rate, particularly in a moving image system that does not require real-time performance during encoding, such as a storage medium.
[0002]
[Prior art]
MPEG-1 (ISO / IEC11172) adopted as a video CD standard as an international standard for moving image compression coding, next-generation digital broadcasting, VOD (Video On Demand) or DVD (Digital Video Disk) MPEG-2 (ISO / IEC13818) and the like are expected to be applied to the above.
[0003]
In the MPEG encoding method, since entropy encoding based on motion compensated prediction and DCT (Discrete Cosine Transform) is used, when encoding with uniform image quality, prediction efficiency and resolution for an input moving image signal are used. The generated code amount varies greatly according to the above characteristics. Due to restrictions on the transmission path and media, it is necessary to limit the time variation of the generated code amount so as to be within a predetermined range. Normally, the amount of generated code is controlled so as to satisfy this condition by controlling the accuracy of quantization (quantization step width). However, in this method, if the prediction efficiency is low due to intense motion of the input moving image signal, or if the resolution is high and the amount of information in the frame is large, the quantization step width becomes large. Coding distortion increases and image quality deterioration becomes significant.
[0004]
On the other hand, when the fixed-rate transmission method is used as the transmission method of the signal thus encoded, the time variation of the generated code amount is absorbed only by the smoothing buffer that temporarily stores the encoded bit stream before transmission. Therefore, the time variation of the generated code amount is limited within the prescribed smoothing buffer size, but the degree of freedom cannot be taken large, and the image quality deterioration depends on the image characteristics of the input moving image signal. May become noticeable.
[0005]
As a transmission method of the encoded bit stream, a variable rate transmission method in which the transmission rate is temporally changed is also known. In a transmission method based on burst transfer such as ATM packet transmission or HDD, this variable rate transmission can be realized. In variable rate transmission, a large variation in the amount of generated code can be allowed as compared with fixed rate transmission within a range limited by the maximum rate of the transmission path and the size of the smoothing buffer. Therefore, it is possible to realize more efficient encoding according to the image properties of the input moving image signal.
[0006]
When the encoded bit stream is stored in a digital storage medium such as an optical disk, that is, DSM (Digital Storage Media), variable rate transmission limited by the maximum capacity and the maximum transfer rate of the DSM is possible. In application to such a storage system, encoding in real time is not always required, and it is important to improve image quality and extend recording time by efficient encoding. For this purpose, it is considered effective to perform optimum variable rate control based on the properties of the entire input moving image signal sequence, but an optimum rate control method for variable rate transmission of moving image signals has not yet been established. .
[0007]
[Problems to be solved by the invention]
As described above, when considering the application of moving image coding technology to storage systems, optimal rate control should be performed based on the overall characteristics of the input moving image signal in order to extend the recording time and improve the image quality. However, such a rate control method has not been established yet, and its realization has been desired.
[0008]
The present invention provides a moving image encoding apparatus suitable for a storage system that can greatly improve image quality and extend recording time by optimizing rate control in accordance with the overall properties of an input moving image signal sequence. With the goal.
[0009]
[Means for Solving the Problems]
In order to solve the above-described problem, in the present invention, the entire input video signal sequence is once encoded, and then the same input video signal sequence is encoded again based on a statistic including the generated code amount over the entire video sequence. And encoding means for encoding the input moving image signal sequence by alternately switching a plurality of quantization step widths for each pixel block, and encoding each of the plurality of quantization step widths. Means for independently adding the code amount of the pixel block for each frame to obtain a plurality of addition code amounts, means for estimating the generated code amount of the frame for each quantization step width from the plurality of addition code amounts, and estimation Code amount assigning means for assigning an optimal code amount of the entire input moving image signal sequence for final encoding to the encoding means based on the generated code amount To provide a moving picture coding apparatus is performed.
[0010]
Further, the moving picture encoding apparatus according to the present invention includes means for estimating a temporal variation of the smoothing buffer occupancy based on the result of the code amount assignment, and means for correcting the distribution of the code amount according to the estimation result. The encoding means encodes the entire moving image signal sequence according to the correction code amount.
[0014]
[Action]
In the first invention, at least the first encoding is performed to extract a statistic such as a generated code amount over the entire input moving image signal sequence depending on the properties of the entire input moving image signal sequence, and based on the extracted statistical amount The code amount distribution and / or the quantization step width are selected for each predetermined number of frames of the image signal sequence. Based on the code amount distribution and the quantization step width thus selected, the quantization step width when performing the final encoding, for example, the second encoding, is controlled for each predetermined area in the screen. In a storage medium with a constant total code amount such as DSM, optimal code amount distribution using variable rate transmission can be performed in advance before actually storing encoded data in the medium. Can be easily controlled.
[0015]
Therefore, it is possible to perform optimal rate control based on the properties of the entire input moving image signal sequence in variable rate transmission of the moving image signal sequence, realizing high image quality and extending the recording time on the medium without increasing the average transmission rate. be able to. In addition, by setting the number of frames as a selection unit of code amount distribution and quantization step width to one frame or a small number of frames, stable image quality corresponding to a sudden change in the input video signal sequence such as a scene change can be obtained. Can be obtained.
[0016]
Further, in the second invention, since the temporal fluctuation of the occupancy amount of the smoothing buffer, which is a temporary storage means necessary for transmission, can be estimated in advance, the storage capacity, that is, the size of the smoothing buffer is used to the maximum. Thus, the code amount distribution can be optimized so as to satisfy the above-described two or three conditions. Therefore, by effectively using a smoothing buffer in variable rate transmission, it is possible to allocate the code amount so that the instantaneous bit rate exceeds the maximum transmission rate of the transmission path. Even in the case of fixed rate transmission, it is possible to achieve high image quality by making maximum use of fluctuations in the amount of generated code within a range that can be absorbed by the smoothing buffer.
[0017]
In the present invention, in order to analyze the properties of the entire input video signal sequence, the quantization step width selection in the encoding unit is designated from the outside, the first encoding is performed, and the statistics at that time are collected. Do. Therefore, an encoding apparatus capable of extracting statistics can analyze the properties of the entire input moving image signal sequence without changing hardware. On the other hand, in the case of an encoding device that does not have a statistic extraction function, for example, by analyzing the encoded bitstream by specifying the selection of the quantization step width from the outside, the characteristics of the entire input video signal sequence are analyzed. Equivalent results can be obtained, and by having the statistic extraction means independently of the existing encoding device, analysis can be performed without requiring large hardware changes.
[0018]
In general, the S / N of an encoded image has a monotonous relationship with the quantization step width, and the image quality can be determined based on the quantization step width. The quantization step width is also used for dynamic control of the generated code amount. Therefore, as one of the properties of the entire input video signal sequence, estimating a parameter representing the relationship between the quantization step width and the generated code amount is an effective method for rate control in consideration of image quality. . If the dimension of the parameter representing the relationship between the quantization step width and the generated code amount is n, the parameter estimation is performed using at least n fixed quantization step widths, and the parameter estimation is performed based on the result. Need to do. That is, it is necessary to repeat encoding n times over the entire sequence.
[0019]
On the other hand, in the present invention, by considering the similarity of consecutive frames and performing encoding by switching the quantization step width for each frame group, the quantization step width and the generated code amount in each frame group Can be estimated by one encoding. Furthermore, by encoding by switching the quantization step width for each pixel block in the frame, it is possible to estimate the relationship between the quantization step width and the generated code amount for each frame with a single encoding with high accuracy. Become.
[0020]
Also, if you need adaptive quantization processing that considers visual characteristics according to the nature of the input image, a quantization step that is a fixed quantization step width multiplied by a correction coefficient in units of frames or pixel areas By extracting the statistic using the width, it is possible to estimate the relationship between the quantization step width of each frame or each frame group and the generated code amount.
[0021]
【Example】
Embodiments of the present invention will be described below with reference to the drawings.
Example 1
FIG. 1 is a block diagram showing a schematic configuration of an embodiment according to the present invention. In the figure, a digital VTR 10 records a moving image signal sequence of a digitized movie or other program. A moving image encoded sequence (input moving image signal sequence) 11 reproduced from the digital VTR 10 is input to the encoding unit 12 and compressed and encoded. The encoding unit 12 outputs encoded data in the form of a bit stream, and the encoded bit stream 13 is stored in a digital storage medium (DSM) 14. Further, in this moving image encoding apparatus, in addition to the encoding unit 12, a data memory 16 for statistic accumulation, an image analysis unit 17, a code amount allocation unit 19, and a data memory 21 for rate control parameter accumulation are provided. These are used for rate control for the encoding unit 12.
[0022]
When a moving image signal of a desired program is reproduced from the digital VTR 10, encoded by the encoding unit 12, and the encoded bit stream 13 is stored in the DSM 14, the input moving image signal sequence 11 of the same program is output from the digital VTR 10. Played twice repeatedly. That is, the same input video signal sequence 11 is repeatedly input twice to the encoding unit 12 and encoded. Here, the input moving image signal sequence 11 repeatedly input twice may be a moving image signal sequence of one entire program, or if the program is a long one such as a feature film, the program It may be a series divided into a plurality of time zones, for example, dividing into a first half and a second half.
[0023]
The first encoding in the encoding unit 12 is performed in order to analyze the properties of the entire input moving image signal sequence 11 and extract the statistics, and the second encoding is the first encoding. This is performed for optimal rate control by code amount allocation and quantization step width selection based on the extracted statistics. The encoded bit stream 13 obtained under the optimum rate control by the second encoding is finally stored in the DSM 14.
[0024]
Next, the detailed configuration and operation of this embodiment will be described.
As described above, the input moving image signal sequence 11 from the digital VTR 10 is input twice to the encoding unit 12 and encoded. The encoding unit 12 performs encoding using a fixed quantization step width at the first encoding. At the time of the first encoding, a statistic such as a generated code amount, activity, and prediction efficiency for each frame is extracted as a statistic parameter 15 from the encoding unit 12 and stored in the data memory 16.
[0025]
At the time when the first encoding is completed, the image analysis unit 17 automatically analyzes the statistics in units of frames from the statistical parameters of the statistics accumulated in the data memory 16, and the input moving image signal sequence 11 A characteristic parameter 18 of each image is obtained.
[0026]
This characteristic parameter 18 is sent to the code amount allocating unit 19 and is subjectively applied to the entire input video signal sequence 11 within a range that satisfies the buffering limitation and transmission rate limitation of the encoding unit 12 for each frame. Optimal code amount allocation with reduced image quality fluctuation is performed. The rate control parameter 20 for each frame obtained as a result of this optimal code amount allocation is stored in another data memory 21.
[0027]
Next, the same input moving image signal sequence 11 is input from the digital VTR 10 to the encoding unit 12, and the second encoding is performed. In the second encoding, rate control is performed based on the rate control parameter 22 stored in the data memory 21 and read from the first encoding as described above. Then, the encoded bit stream 13 obtained by the second encoding is accumulated in the DSM 14.
[0028]
FIG. 2 is a block diagram illustrating a specific configuration example of the encoding unit 12. The encoding method itself in the encoding unit 12 is a known one defined by MPEG or the like. In FIG. 2, an input moving image signal sequence 11 is input to a subtracter 101 and a motion compensation prediction circuit 109. The motion compensation prediction circuit 109 detects a motion vector between the input moving image signal sequence 11 and a reference image signal already obtained by encoding / local decoding stored in the frame memory 108, and the motion vector is used as the motion vector. Based on this, a motion compensated prediction signal 102 is created. In the subtractor 101, a prediction residual signal is generated by subtracting the prediction signal 102 from the input moving image signal sequence 11. This prediction residual signal is subjected to discrete cosine transform in a block unit having a certain size in a discrete cosine transform (DCT) circuit 103, and becomes DCT coefficient information. The DCT coefficient information is quantized by the quantization circuit 104.
[0029]
The quantized DCT coefficient information from the quantization circuit 104 is inversely quantized by the inverse quantization circuit 105. The output of the inverse quantization circuit 105 is subjected to inverse discrete cosine transform by an inverse discrete cosine transform (inverse DCT) circuit 106. That is, the inverse quantization circuit 105 and the inverse DCT circuit 106 perform processing opposite to that of the quantization circuit 104 and the DCT circuit 103, respectively, and approximate the prediction residual signal output from the subtractor 101 to the output of the inverse DCT circuit 106. Signal is obtained. The output of the inverse DCT circuit 106 is added to the prediction signal 102 from the motion compensation prediction circuit 109 in the addition circuit 107, and a local decoded signal is generated. This local decoded signal is stored in the frame memory 108 as a reference image signal.
[0030]
On the other hand, the quantized DCT coefficient information from the quantization circuit 104 is also input to the variable length coding circuit 110 and is variable length coded. The variable-length encoded data is extracted as an encoded bit stream 13 through the smoothing buffer 111.
[0031]
The input moving image signal sequence 11 is also input to the activity calculation circuit 112. The activity calculation circuit 112 calculates the activity of the image of the input moving image signal sequence 11, and the result is input to the rate control circuit 113. The rate control circuit 113 controls the quantization step width in the quantization circuit 104 based on this activity, the buffer amount (occupancy) of the smoothing buffer 111, and the rate control parameter 22 supplied from the data memory 21, thereby performing rate control, that is, control of the transmission rate of the encoded bit stream 13.
[0032]
In FIG. 2, information on the generated code amount for each frame from the variable length coding circuit 110, the activity calculated by the activity calculation circuit 112, and the prediction efficiency indicated by the prediction residual signal output from the subtractor 101 is also output to the data memory 16.
[0033]
(Example 2)
FIG. 3 is a block diagram showing a schematic configuration of the present embodiment. In the figure, the digital VTR 30, the encoding unit 32, the DSM 34, the data memory 36, the image analysis unit 37, the code amount allocation unit 39, and the data memory 41 are basically the same as the digital VTR 10, the encoding unit 12, the DSM 14, the data memory 16, the image analysis unit 17, the code amount allocation unit 19, and the data memory 21, respectively.
[0034]
In the present embodiment, the encoded bit stream 33 output from the encoding unit 32 is recorded in the DSM 34 even during the first encoding, which is performed to extract the statistics described in the first embodiment. The statistic extraction unit 43 sequentially extracts the statistics from the first-pass encoded bit stream recorded in the DSM 34, and each extracted statistic is recorded in the data memory 36.
[0035]
Then, after the first encoding is complete, statistical analysis and code amount allocation are performed automatically in units of frames as in the first embodiment, and the second encoding is performed under rate control based on the result.
[0036]
(Example 3)
In the present embodiment, a specific example of optimal code amount allocation in the code amount allocation units 19 and 39 of the first and second embodiments will be described. FIGS. 4 to 6 are diagrams schematically showing optimal code amount allocation and its effect.
[0037]
FIG. 4 is a diagram showing the temporal variation of the entropy (complexity) of an image of the input moving image signal sequence 11 (hereinafter referred to as an input image). FIG. 5 shows the temporal variations in image quality and bit rate when uniform code amount allocation is performed. If the code amount allocation is uniform, the image quality of the encoded image generally varies over time in opposite phase to the complexity of the input image. That is, the more complex the input image, the lower the image quality of the encoded image, while the image quality is higher for an input image with a small amount of information, such as a quasi-still image. In this case, the degraded portions are visually conspicuous, and the overall impression of the encoded image is poor.
[0038]
On the other hand, FIG. 6 shows the temporal variations in image quality and bit rate when optimal code amount allocation is performed based on statistical analysis. Optimal code amount allocation assigns a code amount according to the complexity of the input image so as to obtain stable image quality within a range that satisfies the maximum transmission rate and buffering restrictions. The resulting image quality is therefore very stable, and the visual impression of the encoded image as a whole is improved. In addition, since the rate-distortion function in entropy coding usually has non-linear characteristics, optimal code amount allocation also improves the S/N over the entire encoded image compared with uniform code amount allocation at the same total code amount.
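The S/N gain from the non-linear rate-distortion characteristic can be illustrated numerically. The sketch below assumes a textbook exponential model, D = c * 2^(-2R), where c is a per-frame complexity and R is the code amount given to that frame; this model, the function names, and the sample values are assumptions for illustration and are not taken from the specification.

```python
import math

def distortion(c, r):
    """Assumed rate-distortion model: D = c * 2**(-2r)."""
    return c * 2.0 ** (-2.0 * r)

def uniform_allocation(complexities, total_bits):
    """Give every frame the same code amount."""
    n = len(complexities)
    return [total_bits / n] * n

def equal_distortion_allocation(complexities, total_bits):
    """Give complex frames more bits so that every frame has the same distortion."""
    n = len(complexities)
    log_gm = sum(math.log2(c) for c in complexities) / n  # log2 of the geometric mean
    return [total_bits / n + 0.5 * (math.log2(c) - log_gm) for c in complexities]

complexities = [1.0, 4.0, 16.0, 64.0]  # assumed per-frame complexities
total_bits = 12.0                      # assumed total code amount (arbitrary units)

uniform = uniform_allocation(complexities, total_bits)
optimal = equal_distortion_allocation(complexities, total_bits)
d_uniform = [distortion(c, r) for c, r in zip(complexities, uniform)]
d_optimal = [distortion(c, r) for c, r in zip(complexities, optimal)]
```

Under this model the uniform allocation leaves the most complex frame with far more distortion than the others, while the complexity-dependent allocation equalizes the distortion and lowers its total for the same total code amount.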
[0039]
In order to perform optimal code amount allocation and code amount control in consideration of image quality, it is important that the relationship between the quantization step width and the generated code amount can be estimated with high accuracy. FIG. 7 shows an example of the relationship between the quantization step width and the generated code amount for each image. As shown in the figure, the generated code amount generally decreases monotonically with respect to the quantization step width. The relationship between the generated code amount and the quantization step width depends on the encoding method and has characteristics specific to the properties of the input image. Since the MPEG coding method uses motion compensated prediction and DCT as shown in FIG. 2, the relationship between the generated code amount and the quantization step width depends on the prediction efficiency of each image and on the spatial frequency distribution of the input video signal sequence.
[0040]
Here, the relationship between the quantization step width Q and the generated code amount R for a predetermined number of frames (one frame or a relatively small number of frames) can be expressed, using statistical parameters a_i^j (i = 0, 1, ..., n) specific to the image j, as
R = f(Q, a_1^j, a_2^j, ..., a_n^j)
Attempts have also been made to estimate the parameters a_i^j of the modeled function f from indirect parameters such as the calculated activity, spatial frequency distribution, or prediction efficiency. In general, however, actual measurement data deviates considerably from the modeled function, and it is difficult to estimate the relationship between the generated code amount and the quantization step width with high accuracy from these indirect parameters. Therefore, the relationship between the quantization step width and the generated code amount is measured directly for every predetermined number of frames (each frame or each frame group) of the input moving image signal sequence 11, and the statistical parameters a_i^j (i = 0, 1, ..., n) are obtained by regression analysis or the like, which makes it possible to estimate the generated code amount with high accuracy.
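As an illustration of obtaining the statistical parameters by regression, the sketch below fits measured (Q, R) pairs for one frame. The specification does not state the form of f; a two-parameter power law R = a1 * Q^(-a2) is assumed here purely for illustration, and `fit_power_law` and `predict_bits` are hypothetical names.

```python
import math

def fit_power_law(samples):
    """Least-squares fit of R = a1 * Q**(-a2) in the log-log domain.

    samples: list of measured (Q, R) pairs for one frame or frame group.
    The power-law form of f is an assumption, not taken from the source.
    """
    xs = [math.log(q) for q, _ in samples]
    ys = [math.log(r) for _, r in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    a2 = -slope
    a1 = math.exp(mean_y - slope * mean_x)
    return a1, a2

def predict_bits(q, a1, a2):
    """Estimated generated code amount at quantization step width q."""
    return a1 * q ** (-a2)

# Synthetic measurements generated from known parameters, for illustration.
true_a1, true_a2 = 800000.0, 1.2
samples = [(q, true_a1 * q ** (-true_a2)) for q in (2.0, 4.0, 8.0, 16.0)]
a1, a2 = fit_power_law(samples)
```

With more measured step widths than parameters, the fit averages out measurement scatter; with exactly two points it reduces to solving the two equations directly.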
[0041]
(Example 4)
FIG. 8 is a flowchart showing the flow of the encoding process in this embodiment. The configuration of the video encoding apparatus in the present embodiment is the same as that shown in FIG.
[0042]
The relationship between the generated code amount R and the quantization step width Q for a predetermined number of frames (one frame or frame group j) of the input moving image signal sequence 11 (or 31) is expressed as R = f(Q, a_1^j, a_2^j). First, as pre-encoding for extracting statistics over the entire input video signal sequence, the encoding unit 12 (or 32) sequentially performs encoding twice, using fixed quantization step widths Q = Q_1 and Q = Q_2 (step S11).
[0043]
Then, by measuring the generated code amount for each frame or frame group during this encoding, for example from the output of the variable length encoding circuit 110, the parameters a_1^j and a_2^j are estimated. That is, statistical analysis processing is performed (step S12).
[0044]
Using the relationship between the quantization step width and the generated code amount estimated in this way, the code amount allocation unit 19 (or 39) performs an optimal code amount distribution that reduces image quality variation and satisfies transmission path conditions such as the maximum rate, average rate, and smoothing buffer size (step S13). Based on this distribution, the encoding unit 12 (or 32) performs compression encoding (step S14).
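Steps S11 to S13 can be sketched as follows. The sketch assumes the power-law model R = a1 * Q^(-a2) for f, solves the two parameters of each frame from its two pre-encoding measurements, and then searches for one common step width whose predicted total meets the code amount budget, as one simple way of keeping image quality variation small. Transmission path constraints such as the maximum rate and the smoothing buffer are ignored here. All names and bounds (`solve_two_point`, `common_step_for_budget`, `allocate`, `q_min`, `q_max`) are hypothetical.

```python
import math

def solve_two_point(q1, r1, q2, r2):
    """Solve R = a1 * Q**(-a2) exactly from two measurements (assumed model)."""
    a2 = math.log(r1 / r2) / math.log(q2 / q1)
    a1 = r1 * q1 ** a2
    return a1, a2

def common_step_for_budget(params, total_bits, q_min=1.0, q_max=64.0, iters=60):
    """Bisection for the smallest Q whose predicted total fits in total_bits."""
    def total(q):
        return sum(a1 * q ** (-a2) for a1, a2 in params)
    lo, hi = q_min, q_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > total_bits:
            lo = mid   # too many bits: the step must be coarser
        else:
            hi = mid   # fits: try a finer step
    return hi

def allocate(first_pass, q1, q2, total_bits):
    """first_pass: per-frame (bits at q1, bits at q2) measured in step S11."""
    params = [solve_two_point(q1, r1, q2, r2) for r1, r2 in first_pass]  # step S12
    q = common_step_for_budget(params, total_bits)                       # step S13
    return q, [a1 * q ** (-a2) for a1, a2 in params]

# Assumed measurements for two frames at Q_1 = 4 and Q_2 = 8.
first_pass = [(100000.0, 50000.0), (200000.0, 1600000.0 * 8.0 ** -1.5)]
budget = 400000.0 / 6.0 + 1600000.0 * 6.0 ** -1.5
q_common, frame_bits = allocate(first_pass, 4.0, 8.0, budget)
```

The per-frame values in `frame_bits` would then serve as the targets for the final encoding in step S14.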
[0045]
(Example 5)
The present embodiment will be described with reference to FIG. 9. In FIG. 9, I indicates an image for which only intraframe coding is performed, P indicates an image for which forward prediction is performed, and B indicates an image for which forward and backward prediction are performed. In this embodiment, for each of the temporally consecutive I, P, and B image types, encoding is performed while alternately switching the quantization step width between Q_1 and Q_2. Taking the similarity of temporally adjacent frames into account, this equivalently realizes two encodings with a single encoding over the entire input video signal sequence. That is, for example, for B1 and B2 in the figure, the relationship between the generated code amount R and the quantization step width Q is expressed as
R = f(Q, a_1^{12}, a_2^{12})
Letting Q_b1 and Q_b2 be the quantization step widths, and R_b1 and R_b2 the measured generated code amounts, at the time of encoding B1 and B2, respectively, the parameters a_1^{12} and a_2^{12} are obtained from the above equation using
(R, Q) = (R_b1, Q_b1), (R_b2, Q_b2)
[0046]
As described above, a single encoding yields substantially the same result as encoding the same moving image signal sequence twice.
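The pairing of temporally adjacent frames of the same image type can be sketched as follows. The sketch assumes the power-law model R = a1 * Q^(-a2) for f and assumes that consecutive frames of one type were encoded with different step widths; the function name `pair_parameters` and the sample values are hypothetical.

```python
import math

def pair_parameters(frames):
    """Pair consecutive frames of the same type and solve (a1, a2) per pair.

    frames: list of (picture_type, q, bits) in coding order, where frames of
    the same type alternate between two quantization step widths.
    Returns {(index_a, index_b): (a1, a2)} under the assumed power-law model.
    """
    pending = {}   # picture type -> first frame of a not yet completed pair
    result = {}
    for index, (ptype, q, bits) in enumerate(frames):
        if ptype in pending:
            first_index, first_q, first_bits = pending.pop(ptype)
            a2 = math.log(first_bits / bits) / math.log(q / first_q)
            a1 = first_bits * first_q ** a2
            result[(first_index, index)] = (a1, a2)
        else:
            pending[ptype] = (index, q, bits)
    return result

# Assumed single-pass measurements with step widths alternating between 4 and 8.
frames = [
    ("I", 4.0, 200000.0),
    ("B", 4.0, 30000.0),
    ("B", 8.0, 15000.0),
    ("P", 4.0, 80000.0),
    ("B", 4.0, 32000.0),
    ("B", 8.0, 16000.0),
    ("P", 8.0, 20000.0),
]
pairs = pair_parameters(frames)
```

Each pair shares one parameter set, which relies on the two frames having similar coding characteristics, as the text notes.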
(Example 6)
The present embodiment will be described with reference to FIGS. 10 and 11. FIGS. 10 and 11 show the quantization step width of each pixel block in one frame: FIG. 10 shows the case where there are two types of quantization step widths, and FIG. 11 shows the case where there are three types.
[0047]
Now, let the relationship between the quantization step width Q of the frame j and the generated code amount R be expressed as
R = f(Q, a_1^j, a_2^j)
Here, the quantization step width for each pixel block of the frame j is set to Q_1, Q_2 as shown in FIG. 10, or to Q_1, Q_2, Q_3 as shown in FIG. 11, and the generated code amount is added up within the frame independently for the pixel blocks corresponding to each quantization step width. The number of quantization step widths in one frame is not limited to 2 or 3; in general, N types are used in one frame. Letting R_n be the intra-frame sum of the generated code amounts of the blocks encoded with the quantization step width Q_n (n = 1, 2, ..., N), the parameters a_1^j and a_2^j are obtained from
(R, Q) = (N × R_n, Q_n) (n = 1, 2, ..., N)
Thereby, the results of N encodings with fixed quantization step widths can be obtained equivalently from a single encoding. Note that by setting the number N of quantization step widths in one frame to be larger than the order of the parameters a_i^j, the parameters can be estimated by regression analysis.
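The accumulation that yields the sample points (N × R_n, Q_n) can be sketched as follows. The scaling by N presumes that each step width covers an equal share of the pixel blocks in the frame, which is an assumption made explicit here; `frame_samples_from_blocks` and the sample values are hypothetical.

```python
def frame_samples_from_blocks(block_bits, block_steps):
    """Build (Q_n, N * R_n) sample points for one frame.

    block_bits[i]  : generated code amount of pixel block i
    block_steps[i] : quantization step width used for pixel block i
    Assumes every step width is used for the same number of blocks, so that
    N * R_n approximates the whole-frame code amount at step width Q_n.
    """
    sums = {}
    for bits, q in zip(block_bits, block_steps):
        sums[q] = sums.get(q, 0) + bits   # intra-frame sum R_n per step width
    n_steps = len(sums)                   # N
    return [(q, n_steps * r) for q, r in sorted(sums.items())]

# Assumed measurements: six blocks alternating between two step widths.
block_steps = [4, 8, 4, 8, 4, 8]
block_bits = [100, 60, 120, 50, 110, 70]
samples = frame_samples_from_blocks(block_bits, block_steps)
```

The resulting pairs can be passed to whatever fitting procedure is used to estimate the parameters of f for the frame.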
[0048]
(Example 7)
FIG. 12 is a block diagram showing a schematic configuration of the present embodiment. In the figure, the digital VTR 50, the encoding unit 52, the DSM 54, the data memory 56, the image analysis unit 57, the code amount allocation unit 59, and the data memory 61 are basically the same as the digital VTR 10, the encoding unit 12, the DSM 14, the data memory 16, the image analysis unit 17, the code amount allocation unit 19, and the data memory 21 of the first embodiment, respectively.
[0049]
This embodiment differs from Examples 1 and 2 in that a visual correction weight is applied to the quantization step width for each pixel block or each frame according to parameters that affect subjective image quality, such as activity, prediction efficiency, motion amount, and prediction mode. That is, in this embodiment, an adaptive quantization weight calculation unit 64 is newly added. The adaptive quantization weight calculation unit 64 calculates an adaptive quantization weight parameter 63 from parameters 62 such as activity, prediction efficiency, motion amount, and prediction mode. The purpose of adaptive quantization processing is to improve the overall subjective image quality by quantizing finely the areas where image quality degradation is visually noticeable and quantizing coarsely the areas where it is less noticeable.
[0050]
FIG. 13 and FIG. 14 show examples of adaptive quantization weight parameters when adaptive quantization processing is used in the time direction and in the spatial direction within a frame, respectively. When adaptive quantization processing is used and the weight function of the adaptive processing in the time direction or spatial direction, as shown in FIGS. 13 and 14, is obtained before the frame is encoded, the first encoding with a fixed quantization step width is performed while varying the fixed quantization step width according to this weight function. This makes it possible to estimate the relationship between the generated code amount and the quantization step width for a predetermined number of frames (one frame or frame group) with higher accuracy.
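Varying the fixed step width with a weight function can be sketched as follows. The specification does not give the weight function; the ratio-based mapping from block activity to a weight between 0.5 and 2.0 used here is an assumption for illustration, and `activity_weights` and `weighted_steps` are hypothetical names.

```python
def activity_weights(activities):
    """Map per-block activities to weights in the range 0.5 to 2.0.

    Assumed form: low-activity (flat) blocks get weights below 1 and are
    quantized more finely; busy blocks get weights above 1.
    """
    avg = sum(activities) / len(activities)
    if avg == 0:
        return [1.0] * len(activities)
    return [(2.0 * a + avg) / (a + 2.0 * avg) for a in activities]

def weighted_steps(q_fixed, weights):
    """Apply the weights to the fixed step width of the first encoding."""
    return [q_fixed * w for w in weights]

activities = [0.0, 10.0, 20.0]   # assumed per-block activity values
weights = activity_weights(activities)
steps = weighted_steps(8.0, weights)
```

Because the first encoding then uses the same weighting as the final encoding, the measured code amounts reflect the adaptive quantization rather than a flat step width.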
[0051]
(Example 8)
FIG. 15 shows an embodiment of a variable rate video decoding device that decodes the original video signal sequence from encoded data obtained by the video encoding device according to the present invention. In the figure, encoded data is stored in the DSM 70, and the encoded bit stream reproduced from the DSM 70 is input to the moving picture decoding apparatus 71 via the transmission path 72, where it first enters the FIFO buffer 73, which serves as a smoothing buffer. The transmission path 72 is configured to transmit encoded data at a specified maximum transmission rate Rmax and to stop transmission when the occupancy of the FIFO buffer 73 exceeds a specified value.
[0052]
The FIFO buffer 73 sends the encoded data to the decoder 75 in response to a request from the decoder 75. At this time, it is assumed that the encoded data for one frame is transferred instantaneously from the FIFO buffer 73 to the decoder 75 at the time when the frame is to be decoded. This transmission model is defined in ISO/IEC 13818-2. The moving image signal sequence 76 decoded by the decoder 75 is sent to the display device 77 and displayed.
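The transmission model above can be sketched as a per-frame simulation. The sketch treats the "specified value" at which transmission stops as the buffer capacity and starts from an empty buffer; both are simplifying assumptions, and `simulate_fifo` and the sample numbers are hypothetical.

```python
def simulate_fifo(frame_bits, rmax, frame_period, capacity, initial=0.0):
    """Trace the FIFO occupancy just before and just after each frame is removed.

    frame_bits   : code amount of each frame in decoding order
    rmax         : maximum transmission rate of the path (bits per second)
    frame_period : time between decoded frames (seconds)
    capacity     : occupancy at which the path stops transmitting (assumed)
    A negative occupancy after removal indicates an underflow.
    """
    occupancy = initial
    trace = []
    for bits in frame_bits:
        # The path delivers data at Rmax and stalls once the buffer is full.
        occupancy = min(capacity, occupancy + rmax * frame_period)
        before = occupancy
        occupancy -= bits   # the whole frame is removed instantaneously
        trace.append((before, occupancy))
    return trace

# Assumed numbers: 400 bits arrive per frame period.
trace = simulate_fifo([300, 300, 900, 100], rmax=400, frame_period=1, capacity=1000)
stalled = simulate_fifo([0, 0, 0], rmax=400, frame_period=1, capacity=1000)
```

In this example the third frame is larger than the data available in the buffer, so its occupancy after removal becomes negative, which is the underflow condition the encoder has to avoid.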
[0053]
(Example 9)
FIG. 16 shows the processing procedure of an embodiment of the code amount allocation processing in the video encoding apparatus according to the present invention. First, using the statistical parameters extracted as described above, optimal code amount distribution is performed over the entire input video sequence (step S21). Next, on the basis of the code amount distribution result, the time variation of the smoothing buffer occupation amount is estimated (step S23).
[0054]
FIG. 17 shows the estimated transition of the smoothing buffer occupancy according to the above transmission model when optimal code amount allocation is performed so that subjective image quality is constant under a given total code amount. Here, the slope of the temporal variation of the buffer occupancy corresponds to the maximum transmission rate of the transmission path, and encoded data is removed from the smoothing buffer instantaneously at each frame period. Because the restriction of the smoothing buffer is not considered in FIG. 17, an underflow of the smoothing buffer is estimated to occur at time m.
[0055]
Therefore, in this embodiment, the possibility of an underflow of the smoothing buffer is verified in step S24 of FIG. 16. When an underflow is predicted, the code amount allocated to the interval extending back from the predicted underflow time to a time at which the buffer occupancy is sufficiently high (n in the example of FIG. 17) is partly redistributed to other time regions, thereby correcting the code amount distribution (step S25). This makes it possible to realize a code amount distribution that does not cause underflow.
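The verification and correction of steps S24 and S25 can be sketched as an iterative heuristic. The sketch below is one possible reading, not the procedure of the specification: it scales down the frames between n and m by exactly the predicted deficit and spreads the removed bits over the remaining frames in proportion to their size. The threshold `high_mark`, the helper names, and the sample numbers are assumptions.

```python
def occupancy_after(frame_bits, fill_per_frame, capacity):
    """Buffer occupancy after each frame is removed (negative means underflow)."""
    occ, out = 0.0, []
    for bits in frame_bits:
        occ = min(capacity, occ + fill_per_frame) - bits
        out.append(occ)
    return out

def correct_allocation(frame_bits, fill_per_frame, capacity, high_mark, max_rounds=100):
    """Redistribute code amounts until no underflow is predicted (heuristic sketch)."""
    bits = [float(b) for b in frame_bits]
    for _ in range(max_rounds):
        occ = occupancy_after(bits, fill_per_frame, capacity)
        m = next((i for i, o in enumerate(occ) if o < -1e-9), None)   # step S24
        if m is None:
            return bits
        n = m
        while n > 0 and occ[n - 1] < high_mark:   # go back until occupancy was high
            n -= 1
        deficit = -occ[m]
        inside = sum(bits[n:m + 1])
        outside = sum(bits) - inside
        scale = (inside - deficit) / inside
        for i in range(len(bits)):                # step S25
            if n <= i <= m:
                bits[i] *= scale
            elif outside > 0:
                bits[i] += deficit * bits[i] / outside
    raise RuntimeError("no underflow-free allocation found within max_rounds")

# Assumed allocation with two large frames that would underflow the buffer.
original = [200, 200, 200, 900, 900, 200, 200, 200]
corrected = correct_allocation(original, fill_per_frame=400, capacity=2000, high_mark=500)
final_occupancy = occupancy_after(corrected, 400, 2000)
```

The total code amount is preserved while the two large frames are reduced; a production allocator would additionally weigh the image quality impact of where the bits are moved, which this sketch does not attempt.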
[0056]
FIG. 18 shows the estimation result of the temporal variation of the smoothing buffer occupancy amount by the code amount distribution corrected by the restriction of the smoothing buffer. At this time, the temporal fluctuation of the short-time average transmission rate is as shown in FIG. 19, and by effectively utilizing the buffer fluctuation, it is possible to distribute the code amount instantaneously exceeding the maximum rate of the transmission path. That is, optimal code amount distribution combining the maximum transmission rate of the transmission path and the smoothing buffer can be realized.
[0057]
In fixed-rate transmission as well, the smoothing buffer must be controlled so that neither underflow nor overflow occurs. Within the range that can be absorbed by the smoothing buffer, however, optimal code amount distribution aimed at high image quality is possible in the same sense as in variable-rate transmission.
[0058]
As described above, in this embodiment, optimal code amount distribution can be performed so as to satisfy three conditions: (a) the instantaneous maximum transmission rate defined by the maximum transmission rate of the transmission path and the storage capacity of the smoothing buffer is not exceeded, (b) the smoothing buffer neither underflows nor overflows, and (c) the average transmission rate defined for the transmission path is satisfied.
[0059]
[Effects of the invention]
As described above, according to the present invention, in moving image compression coding for a storage system that does not necessarily require real-time performance at the time of coding, statistics are extracted by performing a first encoding over the entire input moving image signal sequence using, for example, a fixed quantization step width, and the relationship between the generated code amount and the quantization step width for a predetermined number of frames (one frame or a frame group) is estimated. Based on this relationship, at least one of the code amount allocation and the quantization step width is selected, and the quantization step width at the time of the second encoding is controlled for each predetermined area in the screen of the input moving image signal sequence. This makes it possible to perform optimal code amount distribution that achieves high image quality under the restrictions of the transmission path and a limited total code amount.
[0060]
Here, by optimizing the code amount distribution with the temporal transition of the smoothing buffer occupancy taken into account, in addition to the limits on the average transmission rate and the maximum bit rate of the transmission path, overflow and underflow of the smoothing buffer can be prevented and higher image quality can be realized in both variable rate transmission and fixed rate transmission.
[0061]
Also, by switching among a plurality of quantization step widths for each frame or each pixel block at the time of the first encoding, the relationship between the generated code amount and the quantization step width in frame units or frame group units can be estimated from a single first encoding with the same high accuracy as when the encoding of the entire input video signal sequence is repeated several times.
[0062]
Furthermore, when adaptive quantization processing that takes visual characteristics into consideration for each frame or pixel block is used, the encoding characteristics of each frame or frame group can be obtained without loss of accuracy by varying the value of the fixed quantization scale with the weight function of the adaptive processing, even during the first encoding using fixed quantization.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a schematic configuration of a video encoding apparatus according to a first embodiment.
FIG. 2 is a block diagram showing a specific configuration example of an encoding unit in FIG.
FIG. 3 is a block diagram illustrating a schematic configuration of a moving image encoding apparatus according to a second embodiment.
FIG. 4 is a diagram illustrating a temporal change in entropy of an input image for explaining Example 3;
FIG. 5 is a diagram showing temporal variations in image quality and bit rate in fixed-rate encoding for explaining the third embodiment.
FIG. 6 is a diagram showing temporal variations in image quality and bit rate in variable rate coding for explaining the third embodiment;
FIG. 7 is a diagram illustrating a relationship between a quantization step width for each image and a generated code amount for explaining the third embodiment;
FIG. 8 is a flowchart showing the flow of an encoding process in the fourth embodiment.
FIG. 9 is a view showing a plurality of image types for explaining the fifth embodiment;
FIG. 10 is a diagram illustrating the quantization step width of each pixel block in one frame for explaining the sixth embodiment;
FIG. 11 is a diagram illustrating the quantization step width of each pixel block in one frame for explaining the sixth embodiment;
FIG. 12 is a block diagram showing a schematic configuration of a video encoding apparatus according to Embodiment 7.
FIG. 13 is a diagram illustrating an example of a weight function used for time-direction adaptive quantization processing in the seventh embodiment;
FIG. 14 is a diagram illustrating an example of a weighting function used for spatial direction adaptive quantization processing in the seventh embodiment;
FIG. 15 is a block diagram illustrating a configuration example of an accumulation-type moving image decoding device according to an eighth embodiment;
FIG. 16 is a flowchart illustrating code amount distribution processing according to the ninth embodiment.
FIG. 17 is a diagram illustrating temporal variation of the smoothing buffer occupancy for explaining the ninth embodiment;
FIG. 18 is a diagram showing temporal variation of the smoothing buffer occupancy for explaining the ninth embodiment;
FIG. 19 is a diagram showing temporal fluctuations of the bit rate for explaining the ninth embodiment;
[Explanation of symbols]
10, 30, 50 ... Digital VTR
11, 31, 51 ... Input image signal
12, 32, 52 ... Encoding unit
13, 33, 42, 53 ... Encoded bit stream
14, 34, 54 ... DSM (digital storage media)
15, 55 ... Statistics parameter
16, 36, 56 ... Statistics accumulation data memory
17, 37, 57 ... Image analysis unit
18, 38, 58 ... Characteristic parameters
19, 39, 59 ... Code amount allocation unit
20, 40, 60 ... Rate control parameters
21, 41, 61 ... Data memory for rate control parameters
22 ... Rate control parameters
43 ... Statistics extraction unit
62 ... Activity, prediction efficiency, amount of motion, prediction mode, etc.
63 ... Adaptive quantization weight parameter
64 ... Adaptive quantization weight calculation processing unit
70 ... DSM
71 ... Moving picture decoding apparatus
72 ... Transmission path
73 ... FIFO buffer
74 ... Encoded data
75 ... Decoder
76 ... Decoded video signal sequence
77 ... Display device
101 ... Subtractor
102 ... Prediction signal
103 ... DCT circuit
104 ... Quantization circuit
105 ... Inverse quantization circuit
106 ... Inverse DCT circuit
107 ... Adder
108 ... Frame memory
109 ... Motion compensation prediction circuit
110 ... Variable length encoding circuit
111 ... Smoothing buffer
112 ... Activity calculation circuit
113 ... Rate control circuit

Claims

A moving image encoding apparatus that, after encoding an entire input moving image signal sequence once, encodes the same entire input moving image signal sequence again based on statistics, including generated code amounts, obtained over the entire sequence, the apparatus comprising: encoding means for encoding the input moving image signal sequence while alternately switching among a plurality of quantization step widths for each pixel block; means for adding, independently within each frame, the code amounts of the pixel blocks encoded with each of the plurality of quantization step widths to obtain a plurality of added code amounts; means for estimating the generated code amount of the frame for each quantization step width from the plurality of added code amounts; and code amount allocation means for allocating, based on the estimated generated code amounts, an optimum code amount over the entire input moving image signal sequence for the final encoding by the encoding means.

The moving image encoding apparatus according to claim 1, further comprising: means for estimating the temporal variation of the smoothing buffer occupancy based on the result of the code amount allocation; and means for correcting the distribution of the code amount according to the estimation result, wherein the encoding means encodes the entire moving image signal sequence according to the corrected code amount.