JP2004357086A - Encoding method and encoder of animation

Info

Publication number: JP2004357086A
Application number: JP2003153713A
Authority: JP
Inventors: Yoshinori Suzuki; 芳典鈴木; Shigeki Nagaya; 茂喜長屋
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2003-05-30
Filing date: 2003-05-30
Publication date: 2004-12-16

Abstract

PROBLEM TO BE SOLVED: When moving image data is encoded for surveillance monitoring, the activity of the video varies greatly with time and place, so storing data in a storage device in units of GOVs with a fixed number of frames leaves many disk regions unused.
SOLUTION: The number of frames in a random access unit is varied according to data quantity, and encoding is controlled so that a multiple of the data quantity of the random access unit approaches the prescribed data quantity of the storage device. This reduces the unused regions of the disk and improves the use efficiency of the storage device. In addition, retrieval time in random access processing is stabilized and the throughput of data transfer processing is improved.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、動画像データの符号化技術に属するものである。
【０００２】
【従来の技術】
動画像符号化した符号化データをストレージ装置に保存して管理する映像監視システムでは、データアクセス効率化を考慮したデータ格納処理を実施している。このようなシステムでは、ユーザからの要求に応じて、ストレージ装置から必要なデータの固まりを映像配信サーバに取り出し、データ配信処理を行うが、このとき、ストレージ装置から取り出すデータ量には、ディスクアレイの基本データサイズやディスクの論理サイズ、ならびに映像配信サーバのキャッシュサイズなどの面からそのシステムにとって最適な値が存在する。本明細書では、この最適なデータ量を規定格納データ量と定義し、規定格納データ量により区分けされるディスク上の領域を格納データ領域と定義する。
ストレージ装置内の必要な画像データを検索する際、まず、必要なデータが含まれる格納データ領域を検索し、その後、格納データ領域内の符号化データを復号して必要な画像データを検索するという手順をとる。そのため、このようなシステムでは、各格納データ領域の先頭フレームはランダムアクセス可能なフレームの符号化データとしている。
【０００３】
映像の符号化方法には、大きく分けて２種類あり、それぞれ、イントラ符号化、インター符号化と呼ばれている。インター符号化は、フレーム間相関を利用するため、以前に符号化したフレームを予測処理のために利用する。イントラ符号化は、フレーム内相関のみを利用した符号化方法である。イントラ符号化のみ用いて符号化されたフレームをイントラフレーム、インター符号化を用いて符号化されたフレームをインターフレームと呼ばれている。したがって、ランダムアクセスが可能なフレームは、そのフレームの符号化データだけで再生が可能なイントラフレームのみである。
【０００４】
図５に例としてＭＰＥＧ−４ビデオ符号化の基本的なデータ構造を示す。ＶＯＳヘッダはＭＰＥＧ−４ビデオ製品の適用範囲を決めるプロファイル・レベル情報、ＶＯヘッダはビデオ符号化のデータ構造を決めるバージョン情報、ＶＯＬヘッダは画像サイズ、符号化ビットレート、フレームメモリサイズ、適用ツール等の情報を含んでいる。ＧＯＶヘッダには時刻情報が含まれている。ＶＯＰは各フレームの符号化データであり、ＧＯＶヘッダの直後のＶＯＰはかならずイントラフレームとする。ＶＯＳヘッダ、ＶＯヘッダ、ＶＯＬヘッダ、ＧＯＶヘッダとも３２ビットのユニークワードから始まるため、容易に検索できる。シーケンスの終了を示すＥｎｄｃｏｄｅｏｆＶＯＳも３２ビットのユニークワードである。これらのユニークワードは、２３個の”０”と１個の”１”で始まり、その２４ビットに続く２バイトのデータがその区切れの種類を示すような構造となっている。
このように、ＧＯＶヘッダと複数のＶＯＰにて構成されるＧＯＶのようなイントラフレームから始まるランダムアクセスが可能なデータ単位（以降、ランダムアクセス単位と呼ぶ）で格納データ領域に符号化データを保存し、ランダムアクセス処理を実現する。
【０００５】
【発明が解決しようとする課題】
通信や放送を目的とした場合には、バッファ制御等の復号処理の観点から、ランダムアクセス単位内のフレーム数は一定とし、ランダムアクセス単位内の符号量がほぼ一定となるように各フレームへのデータ配分を行う。しかしながら、監視モニタリングを目的とした場合には、画質が重要であり、また、映像のアクティビティーは時刻や場所により大きく異なる。そのため、ランダムアクセス単位内のフレーム数を一定とし、ランダムアクセス単位でストレージ装置にデータを格納すると、規定格納データ量とランダムアクセス単位のデータ量の不一致に伴う未使用のディスク領域が多く発生し、ディスクの利用効率が低減する。例えば、図６（規定格納データ量とランダムアクセスのデータ量の関係）のデータ６０１のように、使用されないディスク領域が発生する。このような現象は、特に、映像内の動きが少ない深夜などに発生しやすい。さらに、データ６０２のように、ランダムアクセス単位のデータが２つの格納データ領域を跨ぐ場合には、大きな未使用ディスク領域が発生し、ディスク利用効率が低減する。このような現象は、映像内の動きが大きいときに発生しやすい。
【０００６】
また、ランダムアクセス単位内の検索処理に要する時間は、ランダムアクセス単位内のデータ量に依存するため、ランダムアクセス単位内のフレーム数を一定とした場合、ランダムアクセス処理に要する時間が安定せず、サービス品質の低下につながる。
【０００７】
さらに、ランダムアクセス単位内のフレーム数を一定とした場合、格納データ領域内のデータ量にばらつきが生じるため、復号装置へのデータ配信時の通信帯域の利用効率が低減する。
【０００８】
【課題を解決するための手段】
ランダムアクセス単位内のフレーム数をデータ量に応じて変化させると共に、ランダムアクセス単位のデータ量の倍数がストレージ装置の規定格納データ量に近づくように符号化制御する。より具体的には、次の入力画像の推定符号量と単位データ総量の和が規定格納データ量より大きい場合には、単位データ総量を初期化した後、入力画像の符号量が規定格納データ量より小さくなるように入力画像をイントラ符号化し、符号量を単位データ総量に加算する。次の入力画像の推定符号量と単位データ総量の和が規定格納データ量より小さい場合には、入力画像の符号量と単位データ総量の和が規定格納データ量より小さくなるように入力画像をインター符号化する。
【０００９】
【発明の実施の形態】
従来の技術でも示したように、ストレージ装置を含むネットワーク監視システムでは、ストレージ装置から復号装置へのデータ配信処理は、格納データ領域毎に実施する。そのため、各格納データ領域に保存される符号化データの先頭フレームはイントラフレームとする必要がある。符号化レートが変動しないように符号化制御（固定符号化レート）を実施する符号化装置では、定期処理として、一定フレーム数間隔でイントラフレームを発生させ、この１周期をランダムアクセス単位とする。例外処理として、シーンチェンジ時（監視映像ではカメラ動作時または撮影カメラ切り替え時に相当）に、イントラフレームを発生させる場合もあるが、これは、シーンチェンジ時は、インター符号化よりイントラ符号化の方が符号化データ量を抑えることができるためである。しかしながら、監視用途では、映像シーン内の１枚１枚の画像の画質を高品質とする必要であり、フレーム間変動が大きい時刻とフレーム間変動が少ない時刻では、各フレームの符号量は大きく異なる。そのため、一定フレーム数間隔でイントラフレームを挿入する符号化制御方法を適用した上で、格納データ領域の先頭フレームをイントラ符号化となるようにストレージ装置に符号化データを記録していくと、ディスク内の未使用領域の発生率が高くなり、ディスクの利用効率が低減する。
【００１０】
そこで、本発明では、ランダムアクセス単位に目安となるデータ量を設定し、その設定データ量に近づくように符号化処理を実施する。そして、ランダムアクセス単位の目標データ量をその倍数値が規定格納データ量となるように設定する。これにより、ランダムアクセス単位内のフレーム数を可変となるが、ランダムアクセス単位のデータ量が安定し、各格納データ量の値が規定格納データ量に近い値となる。その結果、
１）ストレージ装置の未使用領域が減少し、蓄積できるデータ量が増加する、
２）一定データ量を蓄積するために要する格納データ領域の数が減少し、アクセスしたいフレームの符号化データが含まれる格納データ領域の平均検索時間が短縮される、
３）２つのイントラフレーム間の符号化データ量が安定するため、ランダムアクセス単位内での最大検索時間と検索要求時の応答遅延が短縮される（ランダムアクセス処理には復号処理が伴うため、要する時間は符号化データ量に依存する）、
４）各格納データ領域に蓄積される符号量が安定化するため、ストレージ装置から配信サーバへのデータ取り出し効率が向上する、
５）各格納データ領域に蓄積される符号量が安定化するため、復号装置への通信帯域やストレージ装置や配信サーバを含むデータ蓄積装置内のデータパスの利用効率が向上する、
などの効果が生まれる。
【００１１】
図２にランダムアクセス単位の目標データ量と規定格納データ量を一致させた場合を例に、本願の符号量制御方法を示す。次の入力画像が入力されると（処理８０１）、規定格納データ量Ｄと単位総データ量Ｔと推定符号量ｐの和を比較する（処理８０２）。ここで、単位総データ量Ｔは１つの格納データ領域に記録される最初のフレームから直前の記録フレームまでの総符号量を示す。推定符号量ｐは、次の入力画像をインター符号化した場合の推定符号量であり、直前フレームの符号量、現在の時刻、カメラアングル、現在のランダムアクセス単位のアクティビティーなどから推測される値である。Ｄがｐ＋Ｔより大きい場合には、符号量ｅがＤ − Ｔより小さくなるように、入力画像をインター符号化する（処理８０３）。一方、Ｄがｐ＋Ｔより小さい場合には、まず、Ｔ値を０に初期化し、直前フレームにて現在の格納データ領域を対象とした符号化処理を終了する。そして、入力画像を新しい格納データ領域の先頭フレームとして、符号量ｅがＤより小さくなるように、イントラ符号化する（処理８０４）。処理８０３終了後、ｅとＤ−Ｔを比較し（処理８０５）、ｅがＤ−Ｔより大きくなる場合には、処理８０３の符号化処理を取り消し、処理８０４を実施する。処理８０５にてｅがＤ−Ｔより小さい場合には、Ｔの値を更新（Ｔ＝Ｔ＋ｅ）する（処理８０６）。また、処理８０４終了後にも、Ｔの値を更新（Ｔ＝Ｔ＋ｅ）する（処理８０６）。以降、処理８０１から処理８０６を繰り返す。
【００１２】
なお、上記の説明では、簡単のため、ランダムアクセス単位の目標データ量と規定格納データ量とした。しかしながら、本発明のポイントは、各データ格納領域の先頭フレームがイントラフレームとし、かつ各データ格納領域に蓄積されるデータ量を規定格納データ量に近づけることである。そのため、各データ格納領域内に含まれるイントラフレームの数は限定されず、ランダムアクセスに対する応答速度が重視される環境では、データ格納領域内に複数のイントラフレームを設け、サービス品質が向上させたほうがよい。たとえば、ランダムアクセス単位の目標データ量のｎ倍（ｎは２以上の整数）を規定格納データ量となるように設定する方法や、複数の目標データ量の候補を用意し、その組み合わせが規定格納データ量となるように設定してもよい。この場合、処理８０２の前に下記の処理を追加する。まず、ランダムアクセス単位の総データ量Ａ＋ｐと目標データ量Ｂを比較する（処理８０７）。この際、Ａ＋ｐがＢより大きい値であれば、Ａを０に初期化する（処理８０８）。Ａ＋ｐの値がＢよりも大きい値であり、さらに、現在のランダムアクセス単位がデータ格納領域に割り当てられた最後のランダムアクセス単位でなく（処理８０９）、ランダムアクセス単位内の符号化フレーム数ｃがランダムアクセス単位内最小フレーム数Ｆより大きい場合（処理８１２）には、入力画像を次のランダムアクセス単位のイントラフレームとし、ｐをイントラ符号化の推定符号量とし（処理８１０）、ｃの値を１とする（処理８１１）。Ａ＋Ｐの値がＢよりも大きい値であり、かつ現在のランダムアクセス単位がデータ格納領域に割り当てられた最後のランダムアクセス単位であるか（処理８０９）、あるいはｃの値がＦよりも小さい場合（処理８１２）には、ｃの値を１増やす（処理８１１）。処理８０７で、Ａ＋Ｐの値がＢよりも小さい場合には、ｃの値を１増やす（処理８１１）。処理８１２は、フレーム間の変化が激しいシーンにて符号化効率が低減する問題を解決する効果がある。本発明では、ランダムアクセス単位内のフレーム数は符号化データ量によって制御するため、フレーム間変動の大きいシーンではランダムアクセス単位内フレーム数は少なくなり、結果として、符号量の大きいイントラフレームの発生頻度が高くなる。そこで、１つのランダムアクセス単位内のフレーム数が規定値Ｆより小さい場合には、次のランダムアクセス単位を合成し、時間方向に対するイントラフレームの発生頻度を下げる。この処理により、符号化効率の低減を避けることが可能となる。Ｆの値は、平均的なランダムアクセス単位内のフレーム数よりも小さい値に設定する。また、ストレージ装置から復号装置へのデータ配信処理を規定格納データ量のｎ倍（ｎは２以上の整数）とするシステムでは、ランダムアクセス単位の目標データ量を、規定格納データ量よりも大きな値に設定できる。この場合、ランダムアクセスデータ単位の目標データ量のｍ倍（ｍは２以上で、ｎよりも小さい整数）値を規定格納データ量のｎ倍値とし、Ｄの値を規定格納データ量のｎ倍値として制御する。
【００１３】
上記の説明では、処理８０５にて、ｅ＋Ｔ＞Ｄの時に、処理８０３の符号化処理を取り消して、処理８０４を実施しているが、ｅを０とし（入力画像を符号化せずに）、処理８０６を実施してもよい。
【００１４】
図８のデータ８０１〜データ８０３に図２の処理により生成されるＭＰＥＧ−４符号化データの例（ランダムアクセス単位はＧＯＶ）を示す。このように、１つのランダムアクセス単位に属するフレームの数は映像シーンのアクティビティーや、カメラアングル、撮影時間などにより異なるが、各ランダムアクセス単位の符号量は、それぞれ規定格納データ量に近い値となり、図６の場合に比較して各格納データ領域の未使用領域は削減される。
図７に図２の処理を実施するストレージ装置を含むネットワーク監視システムの構成例を示す。図７では、符号化装置は多数台のカメラ（１ａ，１ｂ，１ｃ，１ｄ．．．）にて撮影された各アングルの映像が順次符号化装置２に入力される。符号化装置２では、各カメラからの入力映像を順次符号化する。例えば、３台のカメラが符号化装置２に接続されており、各カメラの符号化レートを各々１フレーム／秒とする場合、符号化装置への入力タイミングをずらしながら３フレーム／秒で符号化する。別の例としては、フレームレートを１０フレーム／秒とし、符号化装置にデータを入力するカメラの映像を数秒間ずつスイッチさせ、順に符号化する場合もある。図１０のデータ７０１がカメラ３台の場合の符号化データの構成例である。なお、図１０のデータ７０２（カメラ１の符号化データ）に示すように、各カメラの符号化データを個別のデータとしてストレージ装置に記録する場合も考えられる。個別に記録する場合には、図２の処理も各カメラに対して、個別に実施する。監視用途では、検索効率向上のため、カメラの番号情報やデータの特徴情報が符号化データに付加される場合がある。この場合にも、これらの情報のデータサイズを図２の符号量ｅの値に加算することで本発明が適用できる。また、符号化装置の処理に余裕がある場合には、異なるスペックの復号装置に対応するため、各入力画像に対して、複数のビットレートならびに画面サイズの符号化データを生成する場合もある。この場合には、各スペックの符号化データについて個別に図２の処理を適用することで、本発明が適用できる。符号化データは、ネットワークを介してデータ蓄積装置３に配信される。なお、図７には、１台の符号化監視装置しか記載していないが、ネットワークを介して複数の監視装置が接続されている。データ蓄積装置３では、受信サーバ３ａにて各符号化監視装置から提供された監視映像の符号化データを受信し、図８に示すような符号量が規定格納データ量（設定方法は後述）に近くかつランダムアクセス可能なデータ単位（１個以上のランダムアクセス単位から構成）に分割した後、ストレージ装置３ｂの格納データ領域に蓄積する（ストレージ装置から復号装置へのデータ配信処理を規定格納データ量のｎ倍とするシステムでは、ランダムアクセス単位の目標データ量を、規定格納データ量よりも大きな値に設定することもある）。この際、格納データ領域内の最初のフレームデータの前に図５のＶＯＬヘッダが存在しない格納データ領域については、ＧＯＶヘッダの前にＶＯＳヘッダ、ＶＯヘッダならびにＶＯＬヘッダを付け足しておくと受信装置への配信処理が容易となる。但し、これらの情報のデータサイズを符号量ｅの値に加算する必要がある。各格納データ領域に蓄積されたデータのカメラ番号と時刻情報（必要の場合には、画像サイズやビットレート）は配信サーバ３ｃにて管理される。なお、符号化装置２にて、図２の符号量制御が各カメラ個別に実施されている場合には、まず、図１０のように、入力された符号化データ（データ７０１）を各カメラの符号化データ（データ７０２、カメラ１のデータの例）に分割する。そして、各カメラ個別に、図８に示すような符号量が規定格納データ量（設定方法は後述）に近くかつランダムアクセス可能なデータ単位（１個以上のランダムアクセス単位から構成）に分割して、格納データ領域に蓄積する。配信サーバ３ｃは、ネットワークを介して、監視者５から指令を受けると、該当する時刻ならびにカメラの符号化データが含まれる格納データ領域を検索し、格納データ領域単位で配信サーバ内のメモリ領域に読み込み、復号装置４に配信する。この際、指令と同時に監視者側の再生端末のプロファイルを受信し、ストレージ装置に格納されている符号化データの画像サイズなどが再生端末のスペックに合わない場合には、トランスコーディングを実施して配信する。符号化装置の処理により、ストレージ装置に複数の端末スペックに対応する符号化データが保存されている場合には、最適なスペックの符号化データを検索して配信する。なお、図１０のデータ７０１のように複数カメラの符号化データを多重化してストレージ装置３ｂに蓄積している場合には、監視者からの指令に基づいて、必要なデータのみを配信することが可能となる。例えば、カメラ１のデータのみを要求された場合にはデータ７０２、カメラ１〜３の全てを要求された場合にはデータ７０１、カメラ１と２のみを要求された場合にはデータ７０３を配信する。
上記の説明では、監視者の要求によって、ストレージ装置から復号装置へのデータ配信が実施されているが、受信した符号化データをリアルタイムで復号装置に配信する場合もある。この場合には、受信サーバ３ａから配信サーバ３ｂにデータが転送した後に、ストレージ装置３ｂに蓄積するか、あるいは受信サーバ３ａから直接復号装置にデータを配信した後に、ストレージ装置３ｂに蓄積する。
【００１５】
また、監視用途では、数ヶ月単位の長時間の記録が必要となるため、ストレージ装置のディスク容量が不足する恐れがある。一方、ストレージ装置に蓄積された符号化データは時間が立つほど監視者からのアクセス頻度が下がり、その重要性が低下する。そこで、古いデータ量を削減する処理が実施される。その方法としては、
１）ストレージ装置内の符号化データにトランスコーディング（画像サイズを小さくする、フレームレートを落とす、画質を落とすなどの処理）を施す、
２）符号化データ内の双方向予測フレーム（他のフレームの予測に使用されないデータであり、ＭＰＥＧなどの符号化方式で利用されている）を取り除く、
などがあり、符号量が削減される。具体的には、受信サーバあるいは送信サーバが古いデータをストレージ装置から取り出し、上記のような符号量削減処理を実施し、再度ストレージ装置に蓄積する。元のデータが蓄積されている格納データ領域のデータは上書きすることが可能となる。この際にも、図２の処理２のような符号量制御方法を適用するとディクス利用効率が向上する。なお、このようなデータ量削減処理は、他のストレージ装置データへのバックアップ処理を行う場合にも有効である。
【００１６】
図１にて、図７における符号化装置２の内部構成を説明する。この構成では、再生画像（局部復号画像）を保存するフレームメモリを符号化装置に接続されるカメラの数だけ余分に用意し（双方向予測を用いる場合にはカメラの数の２倍）、各カメラに割り当てる。そして、入力画像を撮影したカメラに対応するフレームメモリに保存されている画像がフレーム間予測時の参照画像となるように制御し、通常はカメラ切り替え時に必ず発生する符号量の大きいイントラフレームの発生率を抑える。この際、図１０のデータ７０１あるいはデータ７０３のように、複数のカメラにより撮影された映像の符号化データを多重化して配信する場合には、参照画像の切り換え情報を復号側に通知する。
【００１７】
基本的な動画像の符号化処理では動画像の１フレームは、図１２に示すように、１個の輝度信号（Ｙ信号：２００１）と２個の色差信号（Ｃｒ信号：２００２，Ｃｂ信号：２００３）にて構成されており、色差信号の画像サイズは縦横とも輝度信号の１／２となる。符号化の際には、まず、入力画像２００は、ＭＢ分割部３００にて図１２に示すような小ブロックに分割される。この小ブロックは、マクロブロックと呼ばれる。図１３にマクロブロックの構造を示す。マクロブロックは１６ｘ１６画素の１個のＹ信号ブロック２１０１と、それと空間的に一致する８ｘ８画素のＣｒ信号ブロック２１０２ならびにＣｂ信号ブロック２１０３にて構成されている。なお、Ｙ信号ブロックは、更に４個の８ｘ８画素ブロック（２１０１−１，２１０１−２，２１０１−３，２１０１−４）に分割して処理されることがある。分割されたマクロブロックは、イントラ符号化方法、あるいはインター符号化方法の何れかの方法にて符号化される。
【００１８】
イントラ符号化では、入力マクロブロック画像２０１は、６個の８ｘ８画素ブロック（２１０１−１，２１０１−２，２１０１−３，２１０１−４，２００２，２００３）毎にＤＣＴ変換器２０３に入力され、６４個のＤＣＴ係数に変換される。各ＤＣＴ係数は、制御部３０１にて定められる量子化パラメータ（量子化の精度を決める値でＭＰＥＧ−４では１〜３１が移動範囲、図２の処理により決定される条件を満たすように制御する）に従って量子化器２０４にて量子化される。量子化されたＤＣＴ係数は、多重化器２０６に渡され、符号化される。この際、量子化パラメータも多重化器２０６に渡され、符号化される。量子化されたＤＣＴ係数は、局部復号器２２０の逆量子化器２０７と逆ＤＣＴ器２０８にて、入力ブロック画像に復号され、フレームメモリ２１０に合成される。局部復号器２２０は、復号側での復号画像と同じものを作成する能力をもつ必要がある。フレームメモリ２１０に蓄積された画像は時間方向のフレーム間予測に用いられる。
【００１９】
フレームメモリ２１０には、符号化対象の入力画像の復号画像が保存される。参照画像メモリ３１６には、符号化装置に接続されたカメラ位置の数だけフレームメモリが用意されている（双方向予測を行う場合には、各カメラに対して２枚ずつ必要）。参照画像メモリ３１６内の各フレームメモリは各カメラと１対１で対応し、対応するカメラに対する参照画像が保存される。フレームメモリ２１０のフレームデータと参照画像メモリ３１６のフレームデータは、実質上は区別なく管理されており、符号化装置に入力される画像を撮影するカメラが切り替えられた時に、フレームメモリ２１０のフレームデータが保存されているメモリのポインタと、参照画像メモリ３１６において切り替え前のカメラに対応するフレームデータが保存されているメモリのポインタとが交換される。
【００２０】
インター符号化では、まず、入力マクロブロック画像２０１と入力画像に対応するフレームメモリ２１０内の前フレームの局部復号画像との間で、動き補償処理が、動き補償器２１１にて行われる。動き補償とは、前フレームの局部復号画像（参照画像）から対象マクロブロックの内容と似通った部分（一般的には、前フレームの探索範囲に対して、輝度信号ブロック内の予測誤差信号の絶対値和が小さい部分を選択する）を検索し、その動き量（動きベクトル）を符号化する時間方向の圧縮技術である。図４に動き補償の処理構造を示す。図４は、太枠で囲んだ現フレーム５１の輝度信号ブロック５２について、参照画像５３上の予測ブロック５５と動きベクトル５６を探索範囲５７に対して示した図である。動きベクトル５６とは、現フレームの太枠ブロックに対して空間的に同位置に相当する参照画像上のブロック５４（破線）から、参照画像上の予測ブロック５５領域までの移動分を示している（色差信号用の動きベクトル長は、輝度信号の半分とし、符号化はしない）。通常は、フレームメモリ２１０内の局部復号画像が参照画像として動き補償器に提供されるが、カメラ切り替え直後のフレームの符号化時には、スイッチ３１７を参照画像メモリ３１６に切り替え、参照画像メモリ３１６から参照画像が提供される。参照画像メモリ３１６から、入力画像に対応する参照画像を選択する手順は次のようになる。符号化装置に入力される画像を撮影するカメラが切り換えられると、カメラシステムから、現在のカメラ番号を含むカメラスイッチ情報３１３が制御部３０１に入力される。これに対応して制御部３０１は、切り替え指令３１４をスイッチ３１７に、カメラ番号情報３１５を参照画像メモリ３１６に通知する。これにより、動き補償時の参照画像が、参照画像メモリ３１６内のカメラ番号情報３１５に対応するカメラの参照画像に切り換えられる。カメラ番号情報３１５は復号側に通知する必要がある。方法としては、ビデオの符号化データに合成して送る方法が考えられる。図９に示すような符号化データに発生し得ないユニークワードとカメラ番号情報の組み合わせ２０００をカメラ切り替えが発生したフレームのビデオデータの前に合成すればよい。なお、このデータ２０００のデータサイズは図２の符号量ｅに加算する必要がある。このような動き補償により検出された動きベクトル２１２は、多重化器２０６にて符号化される。また、動き補償によりフレームメモリ上の参照画像から抜き出された予測マクロブロック画像２１３は、現フレームの入力マクロブロック画像２０１との間で差分器２０２にて差分処理され、差分マクロブロック画像が生成される。差分マクロブロック画像は、図１３に示した６個の８×８画素ブロック（２１０１−１，２１０１−２，２１０１−３，２１０１−４，２００２，２００３）毎に、ＤＣＴ器２０３に入力され、６４個のＤＣＴ係数に変換される。各ＤＣＴ係数は、量子化パラメータ（量子化の精度を決める値でＭＰＥＧ−４では１〜３１が移動範囲、図２の処理により決定される条件を満たすように制御する）に従って量子化器２０４にて量子化され、量子化パラメータとともに多重化器２０６に渡され、符号化される。予測符号化の場合も、量子化ＤＣＴ係数を局部復号器２２０の逆量子化器２０７と逆ＤＣＴ器２０８にて、差分マクロブロック画像に復号し、加算器２０９にて予測マクロブロック画像と加算した後、フレームメモリ２１０に合成する。
【００２１】
イントラ符号化（ＩＮＴＲＡ）と予測符号化（ＩＮＴＥＲ）の判定は、ＩＮＴＲＡ／ＩＮＴＥＲ判定部２１４にてＭＢ単位で行われる。一般的に判定は、ＩＮＴＥＲは輝度信号ブロックにおける予測誤差の絶対値和、ＩＮＴＲＡは輝度信号ブロック内の平均値からの差分の絶対値和を評価値として行われる。なお、インター符号化の予測方法には、時間的に過去のフレームの情報を用いて予測マクロブロック画像の生成を行う前方予測以外に、時間的に未来のフレームの情報を用いて予測マクロブロック画像の生成を行う後方予測や時間的に過去と未来のフレームの情報を用いて予測マクロブロック画像の生成を行う双方向予測もある。後方予測や双方向予測を使用する符号化装置では、各カメラに対して、２枚の参照画像を用意する必要がある。また、本明細書では詳細は割愛するが、イントラ符号化でもフレーム内の予測が通常用いられる。イントラ予測の場合には、符号化中の画像の局部復号画素や符号化済みのＤＣＴ係数などが予測に用いられる。このようなフレーム内予測の特徴は本発明の処理手順には影響しない。
【００２２】
次に、図１における制御部３０１における量子化パラメータの設定処理について説明する。制御部３０１には、符号化処理を開始する前に予め規定格納データ量が記録されている必要がある。規定格納データ量の通知方法としては、予め符号化装置に設定されている場合、符号化装置の管理者が外部入力し、符号化装置の設定を変更する場合、ネットワークを介してデータ蓄積装置３から送信される場合などが考えられる。規定格納データ量は、データ蓄積装置の構成に依存する値であり、通常は変化しない。但し、ストレージ装置のディクス交換などが発生した場合には、更新する必要がある。
次の入力画像が入力されると制御部３０１は多重化部２０６から得られる前フレームのビット量情報３１０や入力画像のアクティビティー変動などから図２の推定値ｐを推測した後、図２の処理を開始する。そして、処理８０３あるいは処理８０４まで処理が進行すると、フレームの符号化タイプ（イントラフレーム、インターフレーム）と入力画像の目標符号量（処理８０３ではＤ−Ｔ、処理８０４ではイントラ符号化の推定符号量）が決定される。制御部３０１は、画質が落ちない範囲で、符号量が目標符号量に近く、かつ目標符号量よりも小さくなる値となるように量子化パラメータを制御し、入力画像の符号化処理を実施する。その後、実際の符号量にて処理８０５、処理８０６を実施する（条件によっては処理８０４も実施）。
【００２３】
図３と図１１にて、図７における復号装置４の内部構成を説明する。図１１は、図８のデータ７０２のように、１つのカメラにより撮影された画像の符号化データを再生する復号装置の構成であり、図３は、図８のデータ７０２に加えて、データ７０１やデータ７０３のような２つ以上のカメラにより撮影された画像の符号化データを多重化したデータも再生できる復号装置の構成である。
【００２４】
図１１では、まず、符号解読部５０１にて入力された符号化データを解析し、バイナリーコードから意味のある復号情報に変換する。そして、動きベクトル情報と予測モード情報（ＩＮＴＲＡ／ＩＮＴＥＲ判定）を動き補償器５０４に、量子化ＤＣＴ係数情報を逆量子化器５０２に振り分ける。解析したマクロブロックの予測モードがイントラ符号化であった場合には、復号した量子化ＤＣＴ係数情報を、逆量子化器５０２と逆ＤＣＴ器５０３において、８×８画素ブロック毎に逆量子化・逆ＤＣＴ処理し、マクロブロック画像を再生する。マクロブロックの予測モードがインター符号化であった場合には、まず、動き補償器５０４にて、予測マクロブロック画像が生成される。具体的には、動きベクトル情報の動き量に従って、前フレームの復号画像が蓄積されているフレームメモリ５０７から予測マクロブロック画像が抜き出される。次に、予測誤差信号に関する符号化データを、逆量子化器５０２と逆ＤＣＴ器５０３において、８×８画素ブロック毎に逆量子化・逆ＤＣＴ処理し、差分マクロブロック画像を再生する。そして、予測マクロブロック画像と差分マクロブロック画像を加算器５０５にて加算処理し、マクロブロック画像を再生する。再生されたマクロブロック画像は、合成器５０６にて復号フレーム画像に合成される。また、復号フレーム画像は、次フレームの予測用にフレームメモリ５０７に保存される。
【００２５】
図３では、動き補償器５０４に入力される参照画像を符号化側と同じ画像となるように制御する必要があり、復号カメラ番号情報に従って、スイッチ５０９を制御する。図１の符号化装置と同様に、再生画像を保存するフレームメモリ５０７に加えて、参照画像メモリ５０８に、符号化装置での設定カメラ位置の数だけ（双方向予測を用いる場合カメラ数の２倍）フレームメモリを用意している。フレームメモリ５０７には、復号された画像が保存される。参照画像メモリ５０８の各フレームメモリには、符号化装置における各設置カメラに対応する参照画像が保存される。フレームメモリ５０７のフレームデータと参照画像メモリ５０８のフレームデータは、実質上は区別なく管理されており、符号化解読部５０１がカメラ番号情報を受信したときに、フレームメモリ５０７のフレームデータが保存されているメモリのポインタと、参照画像メモリ５０８において最後に復号された画像に対応するカメラの参照画像が保存されているメモリのポインタとが交換される。符号解読部５０１にて復号されたカメラ番号情報が復号されると、スイッチ５０９に切り替え指令５１０が通知され、動き補償器５０４への入力パスが参照画像メモリ５０８側に切り替えられる。同時に、参照画像メモリ５０８にカメラ番号情報５１１が通知される。参照画像メモリ５０８は、カメラ番号情報５１１に対応する参照画像を動き補償器５０４に提供する。スイッチ５０９は、１フレーム分の符号化データの復号処理完了時にフレームメモリ５０７側に切り替えられる。
【００２６】
本発明の適用はストレージ装置内の構成には依存せず、ハードディスクであっても、ディスクアレイであっても、磁気テープであっても、光ストレージディスクメディアであっても本発明の符号化方法は実施できる。本発明では、ストレージ装置や配信サーバの性能により決定される規定格納データ量を考慮した符号化方法に特徴があり、規定格納データ量は、ストレージ装置内の構成に関わらず存在する。規定格納データ量を決める要素としては、ディスクアレイのディスクサイズ、ディスクのセクタサイズ、ディスクの論理データサイズ、メディアのセクタサイズ、メディアの論理データサイズ、配信サーバのキャッシュサイズなどが挙げられる。
【００２７】
本発明の符号化方法は、図７のように複数のカメラを持たない単一カメラのシステムでも適用できる。
【００２８】
本発明の符号化方法は、ストレージ装置に符号化データを記録するシステムに適用することが可能であり、監視システムに限定されない。例えば、符号化データを蓄積し、オンデマンドで映像を配信する映像配信サーバにも適用できる。
【００２９】
【発明の効果】
ディスクの不使用領域が削減され、ストレージ装置の利用効率が向上する。また、ランダムアクセス処理における検索時間が安定するとともに、データ転送処理のスループットが向上する。
【図面の簡単な説明】
【図１】本発明の符号化装置の構成を説明する図である。
【図２】本発明の符号量制御の処理を説明する図である。
【図３】本発明により生成された符号化データを復号する復号装置の構成を説明する図である。
【図４】動き補償の原理を説明する図である。
【図５】ビデオ符号化ビットストリームの全体構成を示した図である。
【図６】従来の符号化方法により生成されるランダムアクセス単位の符号化データ量と規定格納データ量の関係を説明する図である。
【図７】ストレージ装置を含むネットワーク映像監視システムの全体構成を説明する図である。
【図８】本発明の符号化方法により生成されるランダムアクセス単位の符号化データ量と規定格納データ量の関係を説明する図である。
【図９】カメラ情報のフォーマットを説明する図である。
【図１０】本発明の符号化方法により生成されるビデオデータの例を説明する図である。
【図１１】本発明により生成された符号化データを復号する復号装置の別構成を説明する図である。
【図１２】ビデオ符号化におけるマクロブロック分割を示した図である。
【図１３】ビデオ符号化におけるマクロブロックの構成を示した図である。
【符号の説明】
２００…入力画像、２０１…入力マクロブロック画像、２０２…差分器、２０３…ＤＣＴ処理部、２０４…量子化部、２０６…多重化部、２０７、５０２…逆量子化部、２０８、５０３…逆ＤＣＴ部、２０９、５０５…加算器、２１０、５０７…フレームメモリ、２１１、５０４…動き補償部、２１４…ＩＮＴＲＡ／ＩＮＴＥＲ判定部、３００…ＭＢ分割処理部、３０１…制御部、３１７、５０９…スイッチ、３１３…カメラスイッチ情報、３１４、５１０…切り替え指令、３１５、５１１…カメラ番号情報、３１６、５０８…参照画像メモリ、５０１…符号解読部、５０６…合成部、５１３…表示装置。
[0001]
[Technical field of the invention]
The present invention relates to techniques for encoding moving image data.
[0002]
[Prior art]
In a video surveillance system that stores and manages encoded moving image data in a storage device, data is stored in a way that takes access efficiency into account. In such a system, in response to a user request, the necessary chunk of data is read from the storage device into the video distribution server and then distributed. The amount of data read from the storage device at one time has a value that is optimal for the system, determined by factors such as the basic data size of the disk array, the logical size of the disk, and the cache size of the video distribution server. In this specification, this optimal data amount is defined as the prescribed storage data amount, and an area on the disk delimited by the prescribed storage data amount is defined as a storage data area.
When searching for required image data in the storage device, the storage data area containing the data is located first, and the encoded data in that area is then decoded to find the required image. For this reason, in such a system the first frame of each storage data area is the encoded data of a randomly accessible frame.
[0003]
Video coding methods fall broadly into two types, called intra coding and inter coding. Inter coding exploits inter-frame correlation and therefore uses previously encoded frames for prediction. Intra coding uses only intra-frame correlation. A frame encoded using only intra coding is called an intra frame, and a frame encoded using inter coding is called an inter frame. Consequently, the only frames that can be randomly accessed are intra frames, which can be reproduced from their own encoded data alone.
[0004]
FIG. 5 shows the basic data structure of MPEG-4 video encoding as an example. The VOS header contains profile and level information that determines the applicable range of MPEG-4 video products; the VO header contains version information that determines the data structure of the video encoding; and the VOL header contains information such as the image size, encoding bit rate, frame memory size, and applied tools. The GOV header contains time information. A VOP is the encoded data of one frame, and the VOP immediately after a GOV header is always an intra frame. The VOS, VO, VOL, and GOV headers each begin with a 32-bit unique word and are therefore easy to locate. The end code of the VOS, which marks the end of the sequence, is also a 32-bit unique word. These unique words begin with 23 "0" bits followed by one "1" bit, and the 2 bytes of data following those 24 bits indicate the type of delimiter.
In this way, encoded data is stored in the storage data areas in randomly accessible data units that begin with an intra frame (hereinafter, random access units), such as a GOV consisting of a GOV header and a plurality of VOPs, and random access processing is thereby realized.
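Locating random access points by their unique words can be sketched as follows. This is a minimal illustration, not part of the specification. It assumes the delimiter type is carried in the single byte that completes the 32-bit word after the 24-bit prefix, and it uses the type values commonly given for MPEG-4 Visual (0xB3 for a GOV header, 0xB6 for a VOP). The function names are hypothetical.

```python
from typing import Iterator, List, Tuple

START_CODE_PREFIX = b"\x00\x00\x01"  # 23 zero bits followed by a single one bit

# Assumed type bytes (commonly cited for MPEG-4 Visual); verify against your stream.
GOV_TYPE = 0xB3
VOP_TYPE = 0xB6


def find_start_codes(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (offset, type_byte) for every unique word found in data."""
    pos = data.find(START_CODE_PREFIX)
    # Stop if the prefix sits at the very end with no type byte after it.
    while pos != -1 and pos + 3 < len(data):
        yield pos, data[pos + 3]
        pos = data.find(START_CODE_PREFIX, pos + 4)


def random_access_offsets(data: bytes) -> List[int]:
    """Offsets of GOV headers, i.e. candidate starts of random access units."""
    return [off for off, kind in find_start_codes(data) if kind == GOV_TYPE]
```

Because every GOV header is followed by an intra frame, the offsets returned by `random_access_offsets` are the positions at which decoding can begin without earlier data.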
[0005]
[Problems to be solved by the invention]
For communication and broadcasting, the number of frames in a random access unit is fixed from the viewpoint of decoding processing such as buffer control, and data is allocated to each frame so that the code amount within the random access unit is almost constant. For surveillance monitoring, however, image quality is important, and the activity of the video differs greatly with time and place. Consequently, if the number of frames in a random access unit is fixed and data is stored in the storage device one random access unit at a time, the mismatch between the prescribed storage data amount and the data amount of the random access unit leaves many unused disk areas, and disk utilization efficiency drops. For example, an unused disk area arises as shown by data 601 in FIG. 6 (the relationship between the prescribed storage data amount and the data amount of a random access unit). This tends to occur especially late at night, when there is little motion in the video. Furthermore, when the data of a random access unit straddles two storage data areas, as with data 602, a large unused disk area arises and disk utilization efficiency drops. This tends to occur when there is a lot of motion in the video.
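The size of the waste can be made concrete with a little arithmetic. The sketch below is a simplified model, not taken from the specification: it assumes every random access unit starts at the beginning of a fresh storage data area and occupies as many whole areas as it needs, so the tail of its last area is unused. The function name and the byte values are illustrative.

```python
def unused_bytes(unit_sizes, D):
    """Total unused space when each random access unit starts a fresh area of size D."""
    # Ceiling division: number of whole areas each unit occupies.
    areas = sum(-(-size // D) for size in unit_sizes)
    return areas * D - sum(unit_sizes)


# A quiet scene (unit much smaller than one area) and a busy scene
# (unit slightly larger than one area, so it straddles two areas).
quiet = unused_bytes([300], D=1000)
busy = unused_bytes([1200], D=1000)
```

Under this model the quiet unit wastes most of one area and the busy unit wastes most of a second area, which mirrors the two cases described for data 601 and data 602.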
[0006]
In addition, the time required for search processing within a random access unit depends on the amount of data in that unit. If the number of frames in a random access unit is fixed, the time required for random access processing is therefore unstable, which degrades service quality.
[0007]
Furthermore, when the number of frames in the random access unit is fixed, the data amount in the storage data area varies, so that the efficiency of using the communication band when distributing data to the decoding device is reduced.
[0008]
[Means for Solving the Problems]
The number of frames in a random access unit is varied according to the data amount, and encoding is controlled so that a multiple of the data amount of the random access unit approaches the prescribed storage data amount of the storage device. More specifically, if the sum of the estimated code amount of the next input image and the unit data total is larger than the prescribed storage data amount, the unit data total is initialized, the input image is intra-coded so that its code amount is smaller than the prescribed storage data amount, and that code amount is added to the unit data total. If the sum of the estimated code amount of the next input image and the unit data total is smaller than the prescribed storage data amount, the input image is inter-coded so that the sum of its code amount and the unit data total is smaller than the prescribed storage data amount.
[0009]
[Embodiments of the invention]
As described under the prior art, in a network monitoring system that includes a storage device, data is distributed from the storage device to the decoding device one storage data area at a time. The first frame of the encoded data stored in each storage data area must therefore be an intra frame. An encoding device that controls encoding so that the coding rate does not fluctuate (fixed coding rate) generates intra frames at intervals of a fixed number of frames as regular processing, and treats one such cycle as a random access unit. As exception processing, an intra frame may also be generated at a scene change (which, in surveillance video, corresponds to camera movement or switching of the capturing camera), because at a scene change intra coding yields less encoded data than inter coding. In surveillance applications, however, every individual image in a video scene must be of high quality, and the code amount of each frame differs greatly between times with large inter-frame variation and times with little inter-frame variation. Consequently, if an encoding control method that inserts intra frames at intervals of a fixed number of frames is applied and the encoded data is recorded in the storage device so that the first frame of each storage data area is intra-coded, unused areas arise on the disk more often and disk utilization efficiency drops.
[0010]
Therefore, in the present invention, a guideline data amount is set for the random access unit, and encoding is performed so as to approach that amount. The target data amount of the random access unit is chosen so that a multiple of it equals the prescribed storage data amount. As a result, the number of frames in a random access unit becomes variable, but the data amount of the random access unit is stabilized, and the amount of data in each storage data area comes close to the prescribed storage data amount. This yields effects such as the following:
1) The unused area of the storage device decreases, and the amount of data that can be stored increases.
2) Fewer storage data areas are needed to store a given amount of data, so the average time to find the storage data area containing the encoded data of the desired frame is shortened.
3) The amount of encoded data between two intra frames is stable, so the maximum search time within a random access unit and the response delay to a search request are shortened (random access processing involves decoding, so the time it takes depends on the amount of encoded data).
4) The code amount stored in each storage data area is stabilized, so data is read from the storage device to the distribution server more efficiently.
5) The code amount stored in each storage data area is stabilized, so the communication bandwidth to the decoding device and the data paths within the data storage device, which includes the storage device and the distribution server, are used more efficiently.
[0011]
FIG. 2 shows the code amount control method of the present application, taking as an example the case where the target data amount of the random access unit equals the prescribed storage data amount. When the next input image is input (process 801), the prescribed storage data amount D is compared with the sum of the unit total data amount T and the estimated code amount p (process 802). Here, T is the total code amount from the first frame recorded in one storage data area up to the most recently recorded frame. The estimated code amount p is the estimated code amount if the next input image were inter-coded, and is inferred from the code amount of the immediately preceding frame, the current time, the camera angle, the activity of the current random access unit, and so on. If D is larger than p + T, the input image is inter-coded so that its code amount e is smaller than D − T (process 803). If D is smaller than p + T, T is first initialized to 0, and encoding for the current storage data area ends with the immediately preceding frame. The input image is then intra-coded as the first frame of a new storage data area so that its code amount e is smaller than D (process 804). After process 803, e is compared with D − T (process 805); if e is larger than D − T, the encoding of process 803 is cancelled and process 804 is performed. If e is smaller than D − T in process 805, T is updated (T = T + e) (process 806). T is likewise updated (T = T + e) after process 804 (process 806). Processes 801 to 806 are then repeated.
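The loop of processes 801 to 806 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the specification. The two encoder callables are hypothetical stand-ins that take a byte budget and return the code amount actually produced. The very first image is forced to intra, and the cases where the compared quantities are exactly equal (which the text does not specify) are sent to the intra path.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical encoder hook: takes a byte budget, returns the actual code amount e.
Encoder = Callable[[int], int]


@dataclass
class StorageAreaRateControl:
    D: int                 # prescribed storage data amount
    T: int = 0             # unit total data amount of the current storage data area
    started: bool = False  # assumption: the very first image is forced to intra

    def step(self, p: int, encode_inter: Encoder, encode_intra: Encoder) -> Tuple[str, int]:
        """Encode one input image; p is its estimated inter code amount."""
        if self.started and self.D > p + self.T:   # process 802
            e = encode_inter(self.D - self.T)      # process 803
            if e < self.D - self.T:                # process 805
                self.T += e                        # process 806
                return "inter", e
            # The inter result did not fit: discard it and fall through to intra.
        self.T = 0                                 # close the current storage data area
        e = encode_intra(self.D)                   # process 804
        self.T += e                                # process 806
        self.started = True
        return "intra", e
```

Calling `step` once per input image yields a stream in which every storage data area begins with an intra frame and holds less than D of data. A real encoder would meet the budget by adjusting its quantization parameter.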
[0012]
In the above description, for simplicity, the target data amount of the random access unit was taken to be equal to the prescribed storage data amount. The point of the present invention, however, is that the first frame of each data storage area is an intra frame and that the amount of data stored in each data storage area is brought close to the prescribed storage data amount. The number of intra frames in each data storage area is therefore not limited, and in environments where the response speed of random access matters, it is better to place several intra frames in a data storage area to improve service quality. For example, the target data amount of the random access unit may be set so that n times it (n is an integer of 2 or more) equals the prescribed storage data amount, or several candidate target data amounts may be prepared and combined so that they add up to the prescribed storage data amount. In this case, the following processing is added before process 802. First, A + p, where A is the total data amount of the random access unit, is compared with the target data amount B (process 807). If A + p is larger than B, A is initialized to 0 (process 808). If A + p is larger than B, the current random access unit is not the last random access unit allocated to the data storage area (process 809), and the number of encoded frames c in the random access unit is larger than the minimum number of frames F per random access unit (process 812), then the input image becomes the intra frame of the next random access unit, p is set to the estimated code amount for intra coding (process 810), and c is set to 1 (process 811). If A + p is larger than B and either the current random access unit is the last one allocated to the data storage area (process 809) or c is smaller than F (process 812), c is incremented by 1 (process 811). If A + p is smaller than B in process 807, c is incremented by 1 (process 811). Process 812 addresses the loss of coding efficiency in scenes with rapid change between frames. In the present invention the number of frames in a random access unit is controlled by the amount of encoded data, so in scenes with large inter-frame variation a random access unit contains fewer frames and, as a result, intra frames, which have a large code amount, occur more often. Therefore, when the number of frames in a random access unit is smaller than the prescribed value F, the next random access unit is merged into it, lowering the frequency of intra frames along the time axis. This avoids the loss of coding efficiency. F is set to a value smaller than the average number of frames in a random access unit. Further, in a system in which data is distributed from the storage device to the decoding device in units of n times the prescribed storage data amount (n is an integer of 2 or more), the target data amount of the random access unit can be set to a value larger than the prescribed storage data amount. In this case, m times the target data amount of the random access unit (m is an integer of 2 or more and smaller than n) is made equal to n times the prescribed storage data amount, and control is performed with D set to n times the prescribed storage data amount.
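The additional check placed before process 802 can be sketched as a small decision function. This is one reading of processes 807 to 812, offered under assumptions rather than as the definitive flow. The function and type names are hypothetical, the case c equal to F (which the text leaves open) is treated as not exceeding F, and the later update of A by the actual code amount is left to the caller.

```python
from typing import NamedTuple


class UnitDecision(NamedTuple):
    A: int           # running data amount of the random access unit after the check
    c: int           # number of encoded frames in the random access unit
    new_unit: bool   # True if the input image starts a new unit as an intra frame


def check_unit_boundary(A: int, p: int, B: int, c: int, F: int,
                        is_last_unit_in_area: bool) -> UnitDecision:
    """Decide whether the next image opens a new random access unit."""
    if A + p > B:                                   # process 807
        A = 0                                       # process 808
        if not is_last_unit_in_area and c > F:      # processes 809 and 812
            # Process 810: the caller should now use the intra estimate for p.
            return UnitDecision(A, 1, True)         # process 811 (c = 1)
        # Too few frames so far, or last unit of the area: merge and continue.
        return UnitDecision(A, c + 1, False)        # process 811 (c += 1)
    return UnitDecision(A, c + 1, False)            # process 811 (c += 1)
```

When `new_unit` is false but A was reset, the current unit simply continues into a further budget of B without an intra frame, which corresponds to merging in the next random access unit.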
[0013]
In the above description, when e + T > D in process 805, the encoding of process 803 is cancelled and process 804 is performed. Alternatively, e may be set to 0 (that is, the input image is not encoded) and process 806 performed.
[0014]
Data 801 to data 803 in FIG. 8 show examples of MPEG-4 encoded data generated by the processing of FIG. 2 (the random access unit is a GOV). As shown, the number of frames belonging to one random access unit varies with the activity of the video scene, the camera angle, the time of capture, and so on, but the code amount of each random access unit is close to the prescribed storage data amount, and the unused portion of each storage data area is smaller than in the case of FIG. 6.
FIG. 7 shows a configuration example of a network monitoring system including a storage device that performs the processing of FIG. In FIG. 7, the images of the respective angles captured by a number of cameras (1a, 1b, 1c, 1d, ...) are sequentially input to the encoding device 2. The encoding device 2 sequentially encodes the input video from each camera. For example, when three cameras are connected to the encoding device 2 and the encoding rate of each camera is 1 frame/sec, encoding is performed at 3 frames/sec while the input timing to the encoding device is shifted for each camera. As another example, the frame rate may be set to 10 frames/sec, with the camera whose video is input to the encoding device switched every few seconds and the video encoded in sequence. Data 701 in FIG. 10 is a configuration example of the encoded data when there are three cameras. As shown by data 702 (encoded data of camera 1) in FIG. 10, the encoded data of each camera may also be recorded in the storage device as individual data. When recording individually, the processing in FIG. 2 is also performed individually for each camera. In surveillance applications, camera number information and data feature information may be added to the encoded data to improve search efficiency. In this case as well, the present invention can be applied by adding the data size of such information to the value of the code amount e in FIG. In addition, when the encoding device has spare processing capacity, encoded data at a plurality of bit rates and screen sizes may be generated for each input image in order to support decoding devices of different specifications. In this case, the present invention can be applied by applying the processing of FIG. 2 individually to the encoded data of each specification. The encoded data is distributed to the data storage device 3 via the network. Although FIG. 7 shows only one encoding monitoring apparatus, a plurality of monitoring apparatuses are connected via the network.
In the data storage device 3, the receiving server 3a receives the encoded data of the monitoring video provided from each encoding monitoring device, divides it into randomly accessible data units (each consisting of one or more random access units) whose code amount is close to the prescribed storage data amount as shown in FIG. 8 (the setting method will be described later), and stores them in the storage data areas of the storage device 3b. (In some systems, the target data amount of a random access unit may be set to a value larger than the prescribed storage data amount.) At this time, for a storage data area in which the VOL header of FIG. 5 does not precede the first frame data, a VOS header, a VO header, and a VOL header may be added before the GOV header, which makes distribution processing to the receiving apparatus easier. In that case, however, the data size of such information must be added to the value of the code amount e. The camera number and time information (and, if necessary, the image size and bit rate) of the data stored in each storage data area are managed by the distribution server 3c. When the encoding device 2 performs the code amount control of FIG. 2 individually for each camera, the input encoded data (data 701) is first divided into the encoded data of each camera (data 702 is an example for camera 1), as shown in FIG. 10. Then, the data of each camera is divided into randomly accessible data units (each consisting of one or more random access units) whose code amount is close to the prescribed storage data amount as shown in FIG. 8, and stored in the storage data areas.
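As an illustration of how a receiving server might group consecutive random access units into storage data areas, the following is a minimal sketch and not the implementation of this specification. The function names `pack_units` and `unused_amounts` are hypothetical, and the sketch assumes that no single random access unit exceeds the prescribed storage data amount.

```python
from typing import List


def pack_units(unit_sizes: List[int], prescribed_amount: int) -> List[List[int]]:
    """Group consecutive random access units into storage data areas.

    unit_sizes: code amount of each random access unit, in stream order.
    prescribed_amount: prescribed storage data amount (capacity of one area).
    """
    areas: List[List[int]] = []
    current: List[int] = []
    used = 0
    for size in unit_sizes:
        if size > prescribed_amount:
            # Assumption of this sketch: a unit never exceeds one area.
            raise ValueError("random access unit exceeds the storage data area")
        if current and used + size > prescribed_amount:
            areas.append(current)  # close the current storage data area
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        areas.append(current)
    return areas


def unused_amounts(areas: List[List[int]], prescribed_amount: int) -> List[int]:
    """Unused portion left in each storage data area."""
    return [prescribed_amount - sum(area) for area in areas]


areas = pack_units([900, 480, 490, 300, 310, 320], 1000)
print(areas)
print(unused_amounts(areas, 1000))
```

Because each area starts at a random access unit boundary, the first frame of every area remains randomly accessible. The closer each unit's code amount comes to a divisor of the prescribed amount, the smaller the unused portions become.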
When receiving a command from the monitor 5 via the network, the distribution server 3c searches for the storage data areas containing the encoded data of the requested time and camera, reads the data into the memory area of the distribution server one storage data area at a time, and distributes it to the decoding device 4. The profile of the playback terminal on the monitor side is received together with the command, and if the image size of the encoded data stored in the storage device does not match the specifications of the playback terminal, the data is transcoded before delivery. When encoded data corresponding to a plurality of terminal specifications has been stored in the storage device by the processing of the encoding device, the encoded data of the most suitable specification is retrieved and distributed. When encoded data of a plurality of cameras is multiplexed and stored in the storage device 3b, as with data 701 in FIG. 10, only the necessary data can be distributed based on a command from the supervisor. For example, when only the data of camera 1 is requested, data 702 is delivered; when all of cameras 1 to 3 are requested, data 701 is delivered; and when only cameras 1 and 2 are requested, data 703 is delivered.
In the above description, data is distributed from the storage device to the decoding device at the request of the supervisor, but the received encoded data may also be distributed to the decoding device in real time. In this case, either the data is transferred from the receiving server 3a to the distribution server 3c and then stored in the storage device 3b, or the data is distributed directly from the receiving server 3a to the decoding device and then stored in the storage device 3b.
[0015]
Further, monitoring applications require long-term recording over several months, so the disk capacity of the storage device may become insufficient. On the other hand, as time elapses, the supervisor accesses the encoded data stored in the storage device less frequently, and its importance decreases. Therefore, processing for reducing the amount of old data is performed. Methods for reducing the code amount include:
1) performing transcoding (processing such as reducing the image size, reducing the frame rate, or reducing the image quality) on the encoded data in the storage device; and
2) removing bidirectional prediction frames (encoded data that is not used for prediction of other frames, used in encoding schemes such as MPEG) from the encoded data.
Specifically, the receiving server or the transmitting server extracts old data from the storage device, performs the above-described code amount reduction processing, and stores the data in the storage device again. The storage data area in which the original data was stored may be overwritten. In this case as well, applying a code amount control method such as the processing in FIG. 2 improves the disk use efficiency. Such a data amount reduction process is also effective when backing up data to another storage device.
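Method 2) can be illustrated with a short sketch. It assumes, as stated above, that bidirectional prediction frames are not used to predict other frames, so they can be dropped without breaking the remaining frames. The frame representation (a type letter and a code amount) and the function names are hypothetical.

```python
from typing import List, Tuple

Frame = Tuple[str, int]  # (frame type "I", "P" or "B", code amount in bytes)


def remove_b_frames(frames: List[Frame]) -> List[Frame]:
    """Drop bidirectional prediction frames; I and P frames are kept in order."""
    return [frame for frame in frames if frame[0] != "B"]


def code_amount(frames: List[Frame]) -> int:
    """Total code amount of a sequence of frames."""
    return sum(size for _, size in frames)


unit = [("I", 1000), ("B", 120), ("B", 110), ("P", 400), ("B", 130), ("P", 380)]
reduced = remove_b_frames(unit)
print(code_amount(unit), code_amount(reduced))
```

After such a reduction the code amount of each random access unit changes, so the units would have to be regrouped against the prescribed storage data amount before being stored again.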
[0016]
The internal configuration of the encoding device 2 in FIG. 7 will be described with reference to FIG. 1. In this configuration, additional frame memories for storing reproduced images (locally decoded images) are prepared, one for each camera connected to the encoding apparatus (twice the number of cameras when bidirectional prediction is used), and each is assigned to a camera. Control is then performed so that the image stored in the frame memory corresponding to the camera that captured the input image is used as the reference image for inter-frame prediction, thereby reducing the bit rate. At this time, when encoded data of video captured by a plurality of cameras is multiplexed and distributed, as in the case of data 701 or data 703 in FIG. 10, the switching information of the reference image is notified to the decoding side.
[0017]
In the basic moving image encoding process, one frame of a moving image consists of one luminance signal (Y signal: 2001) and two color difference signals (Cr signal: 2002, Cb signal: 2003), and the image size of each color difference signal is half that of the luminance signal both vertically and horizontally. At the time of encoding, the input image 200 is first divided into small blocks as shown in FIG. This small block is called a macroblock. FIG. 13 shows the structure of a macroblock. A macroblock is composed of one Y signal block 2101 of 16 × 16 pixels, and a Cr signal block 2102 and a Cb signal block 2103 of 8 × 8 pixels each that spatially coincide with the Y signal block 2101. The Y signal block may be further divided into four 8 × 8 pixel blocks (2101-1, 2101-2, 2101-3, and 2101-4) for processing. Each macroblock is encoded by either the intra coding method or the inter coding method.
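The macroblock arithmetic can be checked with a small sketch. It assumes the layout implied by a 16 × 16 luminance block paired with 8 × 8 color difference blocks, and a frame size that is a multiple of 16 in both directions. The function name `macroblock_layout` is hypothetical.

```python
def macroblock_layout(width: int, height: int) -> dict:
    """Return macroblock counts and color difference plane size for one frame."""
    if width % 16 or height % 16:
        # Assumption of this sketch: no padding of partial macroblocks.
        raise ValueError("frame size must be a multiple of 16")
    mb_cols, mb_rows = width // 16, height // 16
    return {
        "mb_cols": mb_cols,
        "mb_rows": mb_rows,
        "macroblocks": mb_cols * mb_rows,
        # Each color difference plane is half the luminance size in both directions.
        "chroma_size": (width // 2, height // 2),
        # Four 8x8 Y blocks + one 8x8 Cr block + one 8x8 Cb block.
        "blocks_per_macroblock": 4 + 1 + 1,
    }


print(macroblock_layout(352, 288))
```

For a 352 × 288 frame this gives 22 × 18 = 396 macroblocks, each contributing six 8 × 8 blocks to the transform stage.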
[0018]
In the intra coding, the input macroblock image 201 is input to the DCT converter 203 for each of the six 8 × 8 pixel blocks (2101-1, 2101-2, 2101-3, 2101-4, 2002, 2003) and converted into 64 DCT coefficients. Each DCT coefficient is quantized by the quantizer 204 according to a quantization parameter (a value that determines the accuracy of quantization; its range is 1 to 31 in MPEG-4, and it is determined by the control unit 301 so as to satisfy the condition determined by the processing in FIG. 2). The quantized DCT coefficients are passed to the multiplexer 206 and encoded. The quantization parameter is also passed to the multiplexer 206 and encoded at this time. The quantized DCT coefficients are also decoded back into the input block image by the inverse quantizer 207 and the inverse DCT unit 208 of the local decoder 220, and combined in the frame memory 210. The local decoder 220 must be able to produce the same decoded image as the decoding side. The image stored in the frame memory 210 is used for inter-frame prediction in the time direction.
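The transform and quantization of one 8 × 8 block can be sketched as follows. This is a simplified illustration: it uses an orthonormal two-dimensional DCT and a uniform quantizer whose step size is assumed to be twice the quantization parameter, which is not the exact quantization rule of MPEG-4. The function names are hypothetical.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an 8x8 block, returning 64 coefficients."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, qp):
    """Uniform quantization; the step size 2 * qp is an assumption of this sketch."""
    if not 1 <= qp <= 31:
        raise ValueError("quantization parameter must be in 1..31")
    step = 2 * qp
    return [[round(c / step) for c in row] for row in coeffs]


flat_block = [[128] * N for _ in range(N)]
levels = quantize(dct_2d(flat_block), qp=8)
print(levels[0][0], sum(abs(v) for row in levels for v in row))
```

A larger quantization parameter gives a coarser step and fewer non-zero levels, which is the lever the control unit uses to steer the code amount of a frame.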
[0019]
The frame memory 210 stores the decoded image of the input image being encoded. The reference image memory 316 is provided with as many frame memories as the number of camera positions connected to the encoding device (when bidirectional prediction is performed, two frame memories are required for each camera). Each frame memory in the reference image memory 316 corresponds one-to-one to a camera and stores the reference image for that camera. The frame data in the frame memory 210 and the frame data in the reference image memory 316 are in practice managed without distinction: when the camera that captures the image input to the encoding device is switched, the pointer to the memory holding the frame data of the frame memory 210 is exchanged with the pointer to the memory in the reference image memory 316 that holds the frame data corresponding to the camera before switching.
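The pointer exchange described above can be sketched with object references standing in for memory pointers. The class `ReferenceStore` and its attribute names are hypothetical, and the sketch covers forward prediction only (one reference image per camera).

```python
class ReferenceStore:
    """Frame memory plus one reference slot per camera (forward prediction only)."""

    def __init__(self, num_cameras, current_camera=0):
        self.frame_memory = None                       # plays the role of frame memory 210
        self.reference = [None] * num_cameras          # plays the role of reference image memory 316
        self.current_camera = current_camera

    def switch_camera(self, new_camera):
        """Exchange references on a camera switch; no pixel data is copied."""
        old = self.current_camera
        # The latest decoded image of the old camera becomes its stored reference,
        # and the old slot's buffer becomes the working frame memory.
        self.frame_memory, self.reference[old] = self.reference[old], self.frame_memory
        self.current_camera = new_camera
        # Reference image for the first frame after the switch.
        return self.reference[new_camera]


store = ReferenceStore(num_cameras=2)
store.frame_memory = ["decoded image of camera 0"]
store.reference[1] = ["stored reference of camera 1"]
reference = store.switch_camera(1)
print(reference, store.reference[0])
```

Exchanging references instead of copying frames keeps the cost of a camera switch independent of the image size.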
[0020]
In the inter encoding, motion compensation is first performed by the motion compensator 211 between the input macroblock image 201 and the locally decoded image of the previous frame, held in the frame memory 210, that corresponds to the input image. Motion compensation is a compression technique in the time direction that searches the locally decoded image (reference image) of the previous frame for a portion similar to the content of the target macroblock (generally, the portion within the search range of the previous frame that gives a small sum of absolute values of the prediction error signal in the luminance signal block) and encodes the amount of motion (motion vector). FIG. 4 shows the processing structure of motion compensation. For the luminance signal block 52 of the current frame 51 (surrounded by a thick frame), FIG. 4 shows the search range 57, the prediction block 55, and the motion vector 56 on the reference image 53. The motion vector 56 indicates the displacement from the block 54 (broken line) on the reference image, which is at the same spatial position as the thick-framed block of the current frame, to the prediction block 55 on the reference image. (The motion vector for the color difference signals is half the length of that for the luminance signal and is not encoded.) Normally, the locally decoded image in the frame memory 210 is provided to the motion compensator as the reference image. However, when encoding the frame immediately after a camera switch, the switch 317 is switched to the reference image memory 316 side, and the reference image is provided from the reference image memory 316. The procedure for selecting the reference image corresponding to the input image from the reference image memory 316 is as follows.
When the camera that captures the image input to the encoding device is switched, camera switch information 313 including the current camera number is input to the control unit 301 from the camera system. In response, the control unit 301 sends the switching instruction 314 to the switch 317 and the camera number information 315 to the reference image memory 316. As a result, the reference image used for motion compensation is switched to the reference image of the camera corresponding to the camera number information 315 in the reference image memory 316. The camera number information 315 must also be notified to the decoding side. One conceivable method is to combine it with the video encoded data and send them together. For example, data 2000, a combination of camera number information and a unique word that cannot occur in the encoded data, as shown in FIG. 9, may be inserted before the video data of the frame at which the camera switch occurred. The data size of the data 2000 must be added to the code amount e in FIG. The motion vector 212 detected by the motion compensation is encoded by the multiplexer 206. The difference between the input macroblock image 201 of the current frame and the predicted macroblock image 213, which is extracted from the reference image in the frame memory by the motion compensation, is taken by the differentiator 202 to generate a difference macroblock image. The difference macroblock image is input to the DCT unit 203 for each of the six 8 × 8 pixel blocks (2101-1, 2101-2, 2101-3, 2101-4, 2002, and 2003) shown in FIG. It is then converted into 64 DCT coefficients. Each DCT coefficient is quantized by the quantizer 204 in accordance with the quantization parameter (a value that determines the precision of quantization; its range is 1 to 31 in MPEG-4, and it is set to satisfy the condition determined by the processing in FIG. 2), passed to the multiplexer 206 together with the quantization parameter, and encoded. In predictive coding as well, the quantized DCT coefficients are decoded into a difference macroblock image by the inverse quantizer 207 and the inverse DCT unit 208 of the local decoder 220, added to the predicted macroblock image by the adder 209, and then combined in the frame memory 210.
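The search for a prediction block can be illustrated with a full-search block matching sketch that minimizes the sum of absolute differences (SAD) over integer displacements. The function names, the search range, and the tie-breaking rule (the first minimum in scan order wins) are assumptions of this sketch. Practical encoders typically use faster search strategies and sub-pixel accuracy.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def motion_search(cur, ref, bx, by, n=16, search=7):
    """Full search over [-search, +search]; returns ((dx, dy), best SAD)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the reference image.
            if by + dy < 0 or bx + dx < 0 or by + dy + n > height or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (best[1], best[2]), best[0]


# Synthetic 12x12 reference image; the current image is the reference moved by (2, 1).
size = 12
ref_img = [[7 * x + 13 * y for x in range(size)] for y in range(size)]
cur_img = [[ref_img[min(y + 1, size - 1)][min(x + 2, size - 1)] for x in range(size)]
           for y in range(size)]
print(motion_search(cur_img, ref_img, bx=4, by=4, n=4, search=3))
```

The returned displacement corresponds to the motion vector, and the block at that displacement corresponds to the predicted macroblock image subtracted from the input.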
[0021]
The choice between intra coding (INTRA) and predictive coding (INTER) is made by the INTRA/INTER determination unit 214 for each macroblock (MB). In general, the determination uses, as the evaluation value for INTER, the sum of the absolute values of the prediction errors in the luminance signal block and, as the evaluation value for INTRA, the sum of the absolute values of the differences from the average value in the luminance signal block. Prediction methods for inter encoding include, in addition to forward prediction, in which the predicted macroblock image is generated using information of a temporally past frame, backward prediction, in which the predicted macroblock image is generated using information of a temporally future frame, and bidirectional prediction, in which the predicted macroblock image is generated using information of both temporally past and future frames. An encoding device that uses backward prediction or bidirectional prediction must prepare two reference images for each camera. Although details are omitted in this specification, prediction within a frame is usually also used in intra coding. In intra prediction, locally decoded pixels of the image being coded, already coded DCT coefficients, and the like are used for prediction. These features of intra prediction do not affect the processing procedure of the present invention.
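The two evaluation values can be computed as in the following sketch. The text above defines the values but not the comparison rule, so choosing the mode with the smaller value is an assumption of this sketch, as is the function name `choose_mode`.

```python
def choose_mode(luma_block, predicted_block):
    """Return "INTRA" or "INTER" for one luminance block."""
    pixels = [p for row in luma_block for p in row]
    predicted = [p for row in predicted_block for p in row]
    # INTER evaluation value: sum of absolute prediction errors.
    inter_value = sum(abs(a - b) for a, b in zip(pixels, predicted))
    # INTRA evaluation value: sum of absolute differences from the block average.
    mean = sum(pixels) / len(pixels)
    intra_value = sum(abs(p - mean) for p in pixels)
    # Assumed rule: the mode with the smaller evaluation value is selected.
    return "INTRA" if intra_value < inter_value else "INTER"


flat = [[100] * 4 for _ in range(4)]
poor_prediction = [[0] * 4 for _ in range(4)]
textured = [[0, 255, 0, 255] for _ in range(4)]
print(choose_mode(flat, poor_prediction), choose_mode(textured, textured))
```

A flat block with a poor prediction is cheaper to code on its own, while a textured block with an accurate prediction is cheaper to code as a difference.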
[0022]
Next, the process by which the control unit 301 in FIG. 1 sets the quantization parameter will be described. The control unit 301 must record the prescribed storage data amount before starting the encoding process. The prescribed storage data amount may be set in the encoding device in advance, entered externally by the administrator of the encoding device to change its setting, or transmitted from the data storage device 3 via the network. The prescribed storage data amount is a value that depends on the configuration of the data storage device and does not usually change. However, it must be updated when, for example, a disk of the storage device is replaced.
When the next input image is input, the control unit 301 estimates the estimated value p of FIG. 2 from the bit amount information 310 of the previous frame obtained from the multiplexing unit 206, the fluctuation of the activity of the input image, and the like, and starts the processing. When the processing reaches process 803 or 804, the coding type of the frame (intra frame or inter frame) and the target code amount of the input image (DT in process 803, the estimated code amount of intra coding in process 804) are determined. The control unit 301 then encodes the input image while controlling the quantization parameter so that the code amount is close to, but smaller than, the target code amount within a range where the image quality does not deteriorate. Thereafter, process 805 and process 806 are performed using the actual code amount (process 804 is also performed depending on the condition).
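The decision made for each input image can be sketched as below. The sketch follows the rule stated in the claims (start a new random access unit with an intra frame when the estimate would exceed the prescribed storage data amount, otherwise continue the current unit) and does not reproduce the individual steps of FIG. 2. The class `UnitController`, the choice of inter coding for every continued frame, and the treatment of the first frame are assumptions.

```python
class UnitController:
    """Tracks the unit data total amount of the current random access unit."""

    def __init__(self, prescribed_amount):
        self.prescribed_amount = prescribed_amount
        self.total = 0  # unit data total amount

    def plan(self, estimated_amount):
        """Return (coding type, upper bound on the code amount) for the next image."""
        if self.total == 0 or self.total + estimated_amount > self.prescribed_amount:
            # Start a new random access unit with an intra frame.
            self.total = 0
            return "intra", self.prescribed_amount
        # Continue the current unit; the frame must fit into the remaining amount.
        return "inter", self.prescribed_amount - self.total

    def commit(self, actual_amount):
        """Add the actual code amount of the encoded image to the unit total."""
        self.total += actual_amount


controller = UnitController(prescribed_amount=1000)
for estimate, actual in [(400, 380), (100, 90), (600, 550)]:
    print(controller.plan(estimate))
    controller.commit(actual)
```

The upper bound returned by `plan` is what the quantization parameter would then be tuned against, so that each unit ends close to, but below, the prescribed storage data amount.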
[0023]
The internal configuration of the decoding device 4 in FIG. 7 will be described with reference to FIGS. FIG. 11 shows the configuration of a decoding device that reproduces encoded data of images captured by a single camera, such as data 702 in FIG. 10. FIG. 3 shows the configuration of a decoding device that can reproduce, in addition to data 702 in FIG., data in which encoded data of images captured by two or more cameras is multiplexed, such as data 701 and data 703.
[0024]
In FIG. 11, the input encoded data is first analyzed by the code decoding unit 501 and converted from binary code into meaningful decoded information. The motion vector information and the prediction mode information (INTRA/INTER determination) are then distributed to the motion compensator 504, and the quantized DCT coefficient information is distributed to the inverse quantizer 502. If the prediction mode of the analyzed macroblock is intra coding, the decoded quantized DCT coefficient information is subjected to inverse quantization and inverse DCT processing for each 8 × 8 pixel block by the inverse quantizer 502 and the inverse DCT unit 503 to reproduce the macroblock image. When the prediction mode of the macroblock is inter coding, the motion compensator 504 first generates a predicted macroblock image. Specifically, the predicted macroblock image is extracted from the frame memory 507, which stores the decoded image of the previous frame, according to the amount of motion given by the motion vector information. Next, the encoded data of the prediction error signal is subjected to inverse quantization and inverse DCT processing for each 8 × 8 pixel block by the inverse quantizer 502 and the inverse DCT unit 503 to reproduce the difference macroblock image. The predicted macroblock image and the difference macroblock image are then added by the adder 505 to reproduce the macroblock image. The reproduced macroblock image is combined into the decoded frame image by the combiner 506. The decoded frame image is stored in the frame memory 507 for prediction of the next frame.
[0025]
In FIG. 3, the reference image input to the motion compensator 504 must be controlled to be the same image as on the encoding side, so the switch 509 is controlled according to the decoded camera number information. As in the encoding device of FIG. 1, in addition to the frame memory 507 for storing the reproduced image, the reference image memory 508 is provided with as many frame memories as the number of camera positions set in the encoding device (or twice that number). The decoded image is stored in the frame memory 507. Each frame memory of the reference image memory 508 stores the reference image corresponding to one of the cameras installed on the encoding device side. The frame data of the frame memory 507 and of the reference image memory 508 are in practice managed without distinction: when the code decoding unit 501 receives the camera number information, the pointer to the memory holding the frame data of the frame memory 507 is exchanged with the pointer to the memory in the reference image memory 508 that holds the reference image of the camera corresponding to the most recently decoded image. When the code decoding unit 501 decodes the camera number information, the switching instruction 510 is sent to the switch 509, and the input path to the motion compensator 504 is switched to the reference image memory 508 side. At the same time, the camera number information 511 is sent to the reference image memory 508, which provides the reference image corresponding to the camera number information 511 to the motion compensator 504. The switch 509 is switched back to the frame memory 507 side when decoding of one frame of encoded data is completed.
[0026]
The application of the present invention does not depend on the configuration of the storage device. The encoding method of the present invention can be implemented with any of a hard disk, a disk array, a magnetic tape, and an optical storage disk medium. The present invention is characterized by an encoding method that takes into account the prescribed storage data amount determined by the performance of the storage device and the distribution server, and a prescribed storage data amount exists regardless of the configuration of the storage device. Elements that determine the prescribed storage data amount include the disk size of the disk array, the disk sector size, the logical data size of the disk, the sector size of the media, the logical data size of the media, and the cache size of the distribution server.
[0027]
The encoding method of the present invention can be applied to a single camera system having no multiple cameras as shown in FIG.
[0028]
The encoding method of the present invention can be applied to a system that records encoded data in a storage device, and is not limited to a monitoring system. For example, the present invention can be applied to a video distribution server that stores encoded data and distributes video on demand.
[0029]
[Effect of the invention]
The unused area of the disk is reduced, and the utilization efficiency of the storage device is improved. Further, the search time in the random access process is stabilized, and the throughput of the data transfer process is improved.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration of an encoding device according to the present invention.
FIG. 2 is a diagram illustrating a code amount control process according to the present invention.
FIG. 3 is a diagram illustrating a configuration of a decoding device that decodes encoded data generated according to the present invention.
FIG. 4 is a diagram illustrating the principle of motion compensation.
FIG. 5 is a diagram showing an overall configuration of a video encoded bit stream.
FIG. 6 is a diagram illustrating the relationship between the amount of encoded data in a random access unit generated by a conventional encoding method and a prescribed storage data amount.
FIG. 7 is a diagram illustrating an overall configuration of a network video monitoring system including a storage device.
FIG. 8 is a diagram illustrating the relationship between the amount of encoded data in a random access unit generated by the encoding method of the present invention and the prescribed amount of stored data.
FIG. 9 is a diagram illustrating a format of camera information.
FIG. 10 is a diagram illustrating an example of video data generated by the encoding method of the present invention.
FIG. 11 is a diagram illustrating another configuration of a decoding device that decodes encoded data generated according to the present invention.
FIG. 12 is a diagram illustrating macroblock division in video encoding.
FIG. 13 is a diagram illustrating a configuration of a macroblock in video encoding.
[Explanation of symbols]
200: input image, 201: input macroblock image, 202: difference unit, 203: DCT processing unit, 204: quantization unit, 206: multiplexing unit, 207, 502: inverse quantization unit, 208, 503: inverse DCT unit, 209, 505: adder, 210, 507: frame memory, 211, 504: motion compensator, 214: INTRA/INTER determination unit, 300: MB division processing unit, 301: control unit, 317, 509: switch, 313: camera switch information, 314, 510: switching instruction, 315, 511: camera number information, 316, 508: reference image memory, 501: code decoding unit, 506: combining unit, 513: display device.

Claims

A moving image encoding device including a storage device, wherein the encoding device has a control unit for controlling a code amount, and wherein: if the sum of an estimated code amount of the next input image and a unit data total amount is larger than a prescribed storage data amount defined in advance, the control unit initializes the unit data total amount, intra-encodes the input image so that the code amount of the next input image is smaller than the prescribed unit storage data amount, and adds the code amount to the unit data total amount; and if the sum of the estimated code amount of the next input image and the unit data total amount is smaller than the prescribed unit storage data amount, the control unit intra-encodes or inter-encodes the input image so that the sum of the code amount of the next input image and the unit data total amount is smaller than the prescribed unit storage data amount, and adds the code amount to the unit data total amount.

The encoding apparatus according to claim 1, further comprising a path for inputting the prescribed storage data amount, and having a function of updating a prescribed value of the prescribed storage data amount.

The encoding apparatus according to claim 1 or 2, being a moving image encoding apparatus included in a surveillance system having a camera switching function by a switcher, wherein the apparatus is provided with a path for inputting camera switch information and has, in addition to the normal frame memory for the reference image, a plurality of additional frame memories for storing locally decoded images; the additional frame memories correspond one-to-one to the monitoring locations; and the apparatus has a function of switching the reference image from the image stored in the normal frame memory to an image stored in an additional frame memory.