JP2004193687A

JP2004193687A - Method using non-initialized buffer model

Info

Publication number: JP2004193687A
Application number: JP2002356054A
Authority: JP
Inventors: Feltman Mark; フェルトマンマーク; Yoichi Yagasaki; 陽一矢ケ崎; Tadayuki Ishikawa; 忠幸石川
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2002-12-06
Filing date: 2002-12-06
Publication date: 2004-07-08

Abstract

PROBLEM TO BE SOLVED: To provide a method using a non-initialized buffer model that is easy to simplify.

SOLUTION: A timing scenario is described using an example. In the example, the encoder generates a random access point after the fifth picture. To achieve the best possible picture quality (as is often the case in such a scenario), the first picture after the random access point must be a comparatively large picture, because it is intra-coded. With the modified model there is no constraint that the buffer be emptied at this point, so a comparatively large amount of data can be allotted to the first picture after the random access point instead of to the fifth picture.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

[0001]
[Technical field of the invention]
The present invention relates to a method using an uninitialized buffer model for video, audio or other data.
[0002]
[Prior art]
The closest related prior art is described in the draft standard of the Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, dated 2002-11-18 and called JVT-E146d37. (The file name used in this specification is JVT-E146d37ncm.doc.) This model includes a hypothetical reference decoder (HRD) covering the decoder buffer, decode timing constraints, and other related requirements. The model serves as a reference for engineers who design encoding and decoding systems, as shown in FIG. 1.
Designers of encoding systems try to design their systems to produce streams that the model processes properly.
Engineers also design decoding systems based on this model and often strive to make the decoder behave as closely as possible to the HRD. Such a model may have ideal characteristics or ideal behavior that cannot be realized exactly, so decoder design engineers need to add means of compensating for non-ideal characteristics and non-ideal behavior.
[0003]
Equations are also used to describe the timing of this model. The following equations, quoted or derived from Section C.1.1.1 of JVT-E146d37, show the initial state of the model.
[0004]
The HRD can be initialized at the first picture following any random access point specified by a random access point SEI message. The buffer is initially empty. The first bit of the first picture after the random access point SEI message begins to enter the buffer at the initial arrival time tai(0) = 0, at the bit rate bit_rate[k] associated with the CPB.
[0005]
Explanation:
At t = 0, tai(0) = 0 and the decoder buffer fullness, denoted decoder_buffer_fullness(t), is:
decoder_buffer_fullness(0) = 0 bits.
The arrival time of each subsequent picture is defined by equation (1) below.
[0006]
(Equation 1)
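
The equation images are not reproduced in this text. As a rough illustration of the kind of arrival-time schedule under discussion, the Python sketch below assumes the variable-bit-rate rule tai(n) = max(taf(n-1), tai,earliest(n)) together with taf(n) = tai(n) + b(n)/R. That rule, the function name `arrival_schedule`, and the sample numbers are assumptions made for illustration and are not a quotation of equation (1).

```python
from typing import List, Tuple


def arrival_schedule(picture_bits: List[int],
                     earliest: List[float],
                     bit_rate: float,
                     t_start: float = 0.0) -> List[Tuple[float, float]]:
    """Return (tai, taf) for each picture: initial and final arrival times.

    Assumed rule (an illustration, not quoted from the source):
      tai(0) = t_start
      tai(n) = max(taf(n-1), earliest[n])   for n > 0
      taf(n) = tai(n) + picture_bits[n] / bit_rate
    """
    schedule: List[Tuple[float, float]] = []
    taf_prev = t_start
    for n, bits in enumerate(picture_bits):
        # The first picture starts at t_start; later pictures wait for the
        # previous picture to finish arriving and for their earliest time.
        tai = t_start if n == 0 else max(taf_prev, earliest[n])
        taf = tai + bits / bit_rate
        schedule.append((tai, taf))
        taf_prev = taf
    return schedule


if __name__ == "__main__":
    # Hypothetical numbers: three pictures delivered at 1000 bits per second.
    for n, (tai, taf) in enumerate(
            arrival_schedule([4000, 1000, 1000], [0.0, 2.0, 3.0], 1000.0)):
        print(n, tai, taf)
```

The `t_start` parameter defaults to zero, which corresponds to the prior-art initial state tai(0) = 0 described above.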

[0007]
The expression tai,earliest(n) was previously defined as in equation (2) below.
However, it was recently changed to equation (3).
(Equation 2)

(Equation 3)

The latter, however, is probably an editing error, for example an editing error with respect to the current definition of cpb_removal delay, and equation (4) below seems more appropriate.
(Equation 4)

Alternatively, cpb_removal can be defined to return a value equal to equation (5) below. In this specification, however, equation (4) is used.
(Equation 5)

[0008]
As in previous standards, the combined effect of all the constraints and limitations of this model is to place constraints on the values of certain fields in the bitstream, besides, for example, constraining the size and frequency of pictures. Although these equations do not mention encoding timing, the constraints probably become clearer when a hypothetical encoder is added to the model. That is, certain constraints can also be specified by a hypothetical encoder, using an encoder buffer and appropriate timing. An example of an encoding and decoding scenario according to this model is shown in FIG. 1. The relationship between the figure and the equations above is as follows.
[0009]
The encoder buffer input is curve A. This is the same as equation (4).
The initial arrival time of each picture is obtained as a point on the decoder input curve K as defined by equation (1) above.
The decoder buffer output curve L corresponds to the timing indicated by the cpb_removal_delay field and initial_cpb_removal_delay.
[0010]
Note that curve A is obtained by moving curve L to the left by the value "decoding_delay". That is, apart from the time offset, these two curves define the same timing.
Some of the variables described in FIG. 4 are stored and transmitted in the bitstream, while others are only described in the (draft) standard. Table 1 below describes these variables in more detail.
[0011]
[Table 1]

[0012]
The property "value depending on k" marks variables whose values can change when a different buffer model is applied. This is because the (draft) standard supports multiple buffer models. Each buffer model k can, for example, have a different buffer capacity (B) and a different bit rate (R). In this specification, instead of writing B(k), R(k), F(k), initial_cpb_removal_delay(k), and so on, the shorter forms B, R, F, and initial_cpb_removal_delay are often used to improve readability. In that case, they are simply the variables of one particular model, for example buffer model 0.
[0013]
[Problems to be solved by the invention]
This prior art has the following problems:
The model itself is complicated.
As shown in the table above, there is a need to encode and insert a relatively large number of fields in a bitstream. This increases the bitstream overhead.
In order to calculate and generate these fields, it is necessary to realize a relatively complicated encoder, for example, as shown in FIG. As a result, it is necessary to realize a relatively complicated decoder as shown in FIG.
[0014]
This model effectively requires the encoder to define decoder_delay at the start of the bitstream. This is shown in FIG. 4. For example, at t = 0 the first picture is transmitted (or written to a digital recording medium). Just before the picture is sent, a message must be sent. This message defines initial_cpb_removal_delay_offset and initial_cpb_removal_delay, which in effect fixes the value of decoder_delay.
In many applications, however, a one-pass encoder cannot set an appropriate value of decoder_delay at the start of the encoding process.
[0015]
Even without such explicit encoding of decoder_delay information in the bitstream, under this model the encoder sooner or later determines decoder_delay in many encoding scenarios. This happens when R(t) goes to zero. (Another way to describe this scenario: with the accompanying hypothetical reference encoder, R(t) becomes zero whenever the encoder buffer is empty.) In such a case, the associated initial_cpb_removal_delay is assumed to equal decoder_delay at the time of data transfer. Such an encoder or encoding system therefore effectively defines decoder_delay.
[0016]
A related problem: this model does not allow shorter initial_cpb_removal_delay values to be encoded. For example, it does not always allow the shortest initial_cpb_removal_delay value to be encoded. As a result, such bitstreams cannot support the fastest possible decoder start-up.
[0017]
This model has a discontinuity problem. That is, at a random access point the state of the buffer model is reset and lost. As a result, the state at these points is often unsuitable for supporting the video that follows. This is shown in FIG. 7. In this example, another SEI message is inserted into the video stream after five pictures. To comply with the constraint that the decoder buffer must be empty at this point, the preceding picture (for example, the fifth picture) requires wasteful stuffing. Furthermore, in many scenarios the next picture has to be decoded and presented one picture period later. As a result, only Rate/Picture_Period bits are available to the first picture after such a random access point, even though such access points often require a relatively large amount of data. In other words, the model is not practical in such cases. (Such cases may therefore be outside the scope of this model.)
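
The size limit described above can be made concrete with a small calculation. The sketch below reads the budget as the number of bits that can arrive during one picture period at the channel rate, that is, the rate multiplied by the period in seconds (equivalently, the rate divided by the picture rate). That reading, the function name, and the sample values are assumptions for illustration.

```python
def first_picture_budget_after_reset(bit_rate: float, picture_period: float) -> float:
    """Bits that can arrive in one picture period when the buffer starts empty.

    bit_rate is in bits per second and picture_period in seconds.
    Assumption: the picture must be fully received within one picture
    period after the buffer model has been reset to empty.
    """
    if bit_rate < 0 or picture_period < 0:
        raise ValueError("bit_rate and picture_period must be non-negative")
    return bit_rate * picture_period


if __name__ == "__main__":
    # Hypothetical stream: 1 Mbit/s at 25 pictures per second.
    print(first_picture_budget_after_reset(1_000_000, 0.04))
```

With these hypothetical values the first picture after the reset would be limited to about 40,000 bits, which is small for an intra-coded picture.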
[0018]
[Means for Solving the Problems]
The present invention relates to a method using an uninitialized buffer model for video, audio or other data.
The invention relates to a method using a non-zero start time of the buffer model.
The invention relates to a method using non-zero buffer fullness for said buffer model.
The present invention relates to a method of selectively transmitting a decoding delay parameter.
A variable bit rate timing model is defined as a reference that allows designers of encoding systems and designers of decoding systems, for video and the like, to achieve interoperability. The model can be initialized with certain non-zero characteristics so that the bitstream can be played back continuously. Certain timing information is hidden, which reduces implementation complexity and bit overhead.
[0019]
[Best mode for carrying out the invention]
The model is simplified by the following inventive steps:
Instead of shifting certain decoder-related timing events, the time base itself is allowed to shift, by permitting an arrival time later than zero for the first bit of the first picture.
This is represented by the following equation (6).
[0020]
(Equation 6)

[0021]
This also applies to the modified model, as shown in FIG. 8. The difference from the prior art is that the prior art permits only a zero value.
Equation (7) for the arrival times of subsequent pictures remains as defined in the prior art, namely:
(Equation 7)

[0022]
However, with this improved rule for tai(0), tai,earliest(n) is further simplified, as in equation (8) below.
[0023]
(Equation 8)

[0024]
An embodiment of the present invention is shown in FIG. This is explained using the example of FIG.
[0025]
Another aspect of the invention is to allow non-zero buffer fullness at random access points. This is supported by equation (9) below.
[0026]
(Equation 9)

[0027]
This is explained using the timing scenario in FIG. 11 as an example. In this example, the encoder again generates a random access point after the fifth picture. As is often the case in such a scenario, the first picture after the random access point must be relatively large in order to achieve the best possible picture quality, because it is intra-coded. With the modified model there is no constraint to empty the buffer at this point, so a relatively large amount of data can be assigned to the first picture after the random access point instead of to the fifth picture.
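
To make the effect of non-zero fullness at a random access point more tangible, the following Python sketch traces the decoder buffer fullness just before each picture is removed. It assumes constant-rate delivery starting at t = 0, a single decoder_delay before the first removal, and one removal per picture period. These assumptions, the function name `fullness_before_removal`, and the numbers are illustrative and are not taken from FIG. 11.

```python
from typing import List


def fullness_before_removal(picture_bits: List[int],
                            bit_rate: int,
                            decoder_delay: int,
                            picture_period: int) -> List[int]:
    """Decoder buffer fullness (bits) just before each picture is removed.

    Assumptions: bits arrive at a constant bit_rate from t = 0 until the
    whole stream has arrived; picture n is removed at
    decoder_delay + n * picture_period.
    """
    total = sum(picture_bits)
    removed = 0
    trace = []
    for n, bits in enumerate(picture_bits):
        t_removal = decoder_delay + n * picture_period
        arrived = min(total, bit_rate * t_removal)
        trace.append(arrived - removed)
        removed += bits
    return trace


if __name__ == "__main__":
    # Hypothetical stream: the sixth picture (index 5) follows a random
    # access point and is intra-coded, so it is as large as the first one.
    sizes = [300, 100, 100, 100, 100, 300, 100]
    trace = fullness_before_removal(sizes, bit_rate=200,
                                    decoder_delay=2, picture_period=1)
    print(trace)
    # No underflow: every picture is fully in the buffer when it is removed.
    print(all(f >= b for f, b in zip(trace, sizes)))
```

In this hypothetical trace the buffer still holds bits when the picture after the random access point begins, so that picture can be 300 bits even though only 200 bits arrive per picture period.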
[0028]
Another aspect of the present invention is to hide decoder_delay in the encoding system in order to support certain types of one-pass encoding systems. This is done by omitting fields that carry decoder_delay directly, or by omitting fields that carry this information indirectly (or help to carry it). Indirect encoding refers to the case where two or more fields together carry decoder_delay, for example as shown in FIG. 1. In such a case, it is preferable to hide decoder_delay by omitting the least important field. For example, in FIG. 6 (compared to the prior art), initial_cpb_removal_delay_offset is omitted because it is less important than initial_cpb_removal_delay. This is because initial_cpb_removal_delay is used, for example, to minimize the setup delay of the decoder. When such information is hidden, it does not need to be decoded, as shown in FIG. 7.
[0029]
Another aspect of the present invention is that decoder_delay can be derived even when it is encoded indirectly, for example by a combination of two or more fields. (In the example described above there are two such fields: initial_cpb_removal_delay_offset and initial_cpb_removal_delay.) As a further inventive step, decoder_delay is used, for example, by a decoder system to select the bitstream with the shortest startup delay. For example, the decoder system extracts decoder_delay (directly encoded or derived) from each such bitstream, compares these decoder_delay values, and selects the bitstream with the shortest delay.
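
The selection step can be sketched as follows. The sketch assumes that decoder_delay is derived as the sum of initial_cpb_removal_delay and initial_cpb_removal_delay_offset. The text says only that the two fields carry decoder_delay indirectly, so the sum, the `StreamInfo` type, and the function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StreamInfo:
    """Hypothetical summary of the delay fields read from one bitstream."""
    name: str
    initial_cpb_removal_delay: int
    initial_cpb_removal_delay_offset: int


def derived_decoder_delay(stream: StreamInfo) -> int:
    # Assumption: the two fields add up to decoder_delay (same time units).
    return (stream.initial_cpb_removal_delay
            + stream.initial_cpb_removal_delay_offset)


def pick_fastest_start(streams: Iterable[StreamInfo]) -> StreamInfo:
    """Return the stream with the smallest derived decoder_delay."""
    return min(streams, key=derived_decoder_delay)


if __name__ == "__main__":
    candidates = [StreamInfo("a", 9000, 1000), StreamInfo("b", 4000, 500)]
    print(pick_fastest_start(candidates).name)
```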
[0030]
On the encoding side, a new kind of operation is possible. In particular, the multiplexer uses the decoder_delay of the video (directly encoded or obtained) to set up the appropriate audio encode buffer size. The multiplexer also uses the decoder_delay of the video (directly encoded or obtained) to set up an appropriate initial transmission delay for the audio data (associated or unrelated).
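
The text does not state how the multiplexer turns the video decoder_delay into audio settings. One plausible reading, sketched below, is that audio is held back for roughly as long as the video waits in the decoder buffer, so the audio encode buffer must hold the audio produced during that time. The formulas, the function name `audio_mux_setup`, and the `audio_decoder_delay` parameter are assumptions for illustration.

```python
import math
from typing import Tuple


def audio_mux_setup(video_decoder_delay: float,
                    audio_bit_rate: float,
                    audio_decoder_delay: float = 0.0) -> Tuple[float, int]:
    """Return (initial audio transmission delay in seconds, buffer size in bits).

    Assumption: audio transmission is delayed by the difference between
    the video and audio decoder delays, and the audio encode buffer must
    hold everything the audio encoder produces during that delay.
    """
    transmit_delay = max(0.0, video_decoder_delay - audio_decoder_delay)
    buffer_bits = math.ceil(audio_bit_rate * transmit_delay)
    return transmit_delay, buffer_bits


if __name__ == "__main__":
    # Hypothetical values: 0.5 s video decoder_delay, 128 kbit/s audio.
    print(audio_mux_setup(0.5, 128000))
```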
[0031]
The present invention is also applicable where non-video data has bit rate characteristics like those of compressed video data. Likewise, certain coded video data has the same bit rate characteristics as certain coded audio data. Accordingly, every statement about video and audio in this specification may be replaced by the following generalized statement of the aspects of the invention described above.
[0032]
The multiplexer uses the decoder_delay (directly encoded or derived) of the data with the relatively bursty bit rate to set up an appropriate buffer size for the associated data with the less bursty bit rate. The multiplexer also uses that decoder_delay to set up an appropriate initial transmission delay for the associated data with the less bursty bit rate.
[0033]
Another aspect of the invention is to support both transmitting and hiding decoder_delay, so that all possible types of application are supported. This is done by adding to the bitstream a flag that indicates whether decoder_delay is encoded (directly or indirectly) in the bitstream. Certain types of one-pass encoding systems set the flag to zero to hide decoder_delay. This allows decoder_delay to be increased when the encoding process requires it. Other encoding systems set the flag to 1 to notify other systems that decoder_delay information is present in the bitstream, for example as shown in FIG. 8. The decoder can support both types of bitstream, for example as in the flowchart shown in FIG. 9.
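
A decoder that accepts both kinds of bitstream might branch on the flag as in the sketch below. The field names `decoder_delay_present_flag` and `decoder_delay`, the dictionary standing in for parsed header data, and the fallback policy are hypothetical, since the text does not name the flag or say what a decoder does when the value is hidden.

```python
from typing import Dict, Optional


def read_decoder_delay(header: Dict[str, int]) -> Optional[int]:
    """Return decoder_delay if the flag says it is present, otherwise None."""
    if header.get("decoder_delay_present_flag", 0) == 1:
        return header["decoder_delay"]
    return None


def startup_wait(header: Dict[str, int], fallback_delay: int) -> int:
    """Choose how long to wait before decoding the first picture.

    Assumption: when decoder_delay is hidden, the decoder falls back to a
    delay of its own choosing (fallback_delay).
    """
    delay = read_decoder_delay(header)
    return fallback_delay if delay is None else delay


if __name__ == "__main__":
    print(startup_wait({"decoder_delay_present_flag": 1, "decoder_delay": 4500}, 9000))
    print(startup_wait({"decoder_delay_present_flag": 0}, 9000))
```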
[0034]
Even without such a directly or indirectly encoded decoder_delay value, the encoding system has to fix the decoder_delay value once R(t) goes to zero. This is a problem for certain kinds of one-pass encoding. Both this problem and the problem that the minimum setup delay cannot always be encoded are solved as follows. Instead of requiring tai(n) = taf(n-1) when the encoder buffer is not empty, the following relation is used:
tai(n) >= taf(n-1)
That is, tai(n) >= taf(n-1) is permitted when the encoder buffer is empty.
(The other constraints, namely that the decoder buffer must neither underflow nor overflow, still apply.)
[0035]
The encoding system can signal such timing with an appropriate initial_cpb_delay value (or with a field that has a similar meaning). In the example of 4c, this is achieved, for example, by increasing the value of taf(n-1) before the max() operation.
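
The relation tai(n) >= taf(n-1) and the idea of raising taf(n-1) before the max() operation can be pictured with the short Python sketch below. The form max(taf(n-1) + gap, tai,earliest(n)) is an assumed reading, and the names `next_arrival` and `extra_gap` are hypothetical.

```python
def next_arrival(taf_prev: float, earliest: float, extra_gap: float = 0.0) -> float:
    """Initial arrival time of picture n under the relaxed rule.

    taf_prev is taf(n-1), the final arrival time of the previous picture.
    extra_gap is an optional idle time the encoder inserts; adding it to
    taf_prev before taking the maximum keeps tai(n) >= taf(n-1).
    """
    if extra_gap < 0:
        raise ValueError("extra_gap must be non-negative")
    return max(taf_prev + extra_gap, earliest)


if __name__ == "__main__":
    print(next_arrival(4.0, 2.0))        # no gap: starts when n-1 finishes
    print(next_arrival(4.0, 2.0, 0.5))   # gap inserted by the encoder
    print(next_arrival(4.0, 6.0, 0.5))   # earliest time dominates
```

Because the gap is never negative, the result can only move later than taf(n-1), so the relaxed relation always holds. The underflow and overflow constraints still have to be checked separately.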
[0036]
[Effects of the invention]
The advantages of the invention (as compared to the prior art) are as follows:
(1) The buffer model is simplified.
(2) initial_cpb_removal_delay_offset becomes redundant, which reduces bitstream overhead.
(3) The encoder does not need to calculate and encode these fields.
(4) The decoder does not need to decode and process these fields.
(5) The model is suitable for supporting random access scenarios.
(6) The encoder does not need to determine decoder_delay at the start of a stream.
[Brief description of the drawings]
FIG. 1 is a diagram for explaining how interoperability is achieved using a hypothetical reference decoder.
FIG. 2 is a diagram for explaining the hypothetical reference decoder model of the JVT.
FIG. 3 is a diagram for explaining details of the timing model of the prior art.
FIG. 4 is a diagram for explaining an example of a conforming bitstream.
FIG. 5 is a diagram for explaining an example of an HRD as part of an encoder implementation.
FIG. 6 is a diagram for explaining an example of an HRD as part of a decoder implementation.
FIG. 7 is a diagram for explaining the behavior of the model at a random access time.
FIG. 8 is a diagram for explaining the model according to the present invention.
FIG. 9 is a diagram for explaining the detailed timing of the model according to the present invention.
FIG. 10 is a diagram for explaining in more detail how tai(n), taf(n) and the initial_cpb_removal_delay values are calculated according to the present invention.
FIG. 11 is a diagram for explaining an example of a conforming bitstream according to the present invention.
FIG. 12 is a diagram for explaining the behavior of the model at a random access point.
FIG. 13 is a diagram for explaining an example of a simplified HRD as part of an encoder implementation.
FIG. 14 is a diagram for explaining an example of a simplified HRD as part of a decoder implementation.
FIG. 15 is a diagram showing an encoder that generates a maximum coded picture buffer delay (the present invention).
FIG. 16 is a diagram showing decoder processing that handles a conditionally coded maximum coded picture buffer delay.
[Explanation of symbols]
11 Encoder

Claims

A method using a non-initialized buffer model for video, audio or other data.

A method using a non-zero start time of the buffer model.

A method using non-zero buffer fullness for the buffer model.

A method of selectively transmitting a decoding delay parameter.