JP3174042B6

JP3174042B6 - B-VOP time decoding method

Info

Publication number: JP3174042B6
Application number: JP2000305131A
Authority: JP
Inventors: ケン・タンティオ; メイ・シェンシェン; ジュー・リーチャク
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1996-07-05
Filing date: 1997-07-03
Publication date: 2007-01-10
Anticipated expiration: 2017-07-03

Description

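[0001]
[Technical Field of the Invention]
The present invention is useful for encoding digital audiovisual material in which multiple independently encoded audiovisual objects must be synchronized for presentation. It is particularly useful when the temporal sampling of the audiovisual material is not identical.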
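[0002]
[Prior Art]
In MPEG-1 and MPEG-2, the input video consists of image frames sampled at regular time intervals. This represents the finest temporal resolution of the input video. FIG. 1 shows a video sequence with a constant frame rate, in which the image frames are sampled at regular intervals. In the coded representation of a video sequence under the MPEG-1 and MPEG-2 standards, the display order of the decoded frames is indicated by the temporal reference. This parameter is carried in the picture header of the bitstream syntax. Its value is incremented by one for each decoded frame when the frames are examined in display order.
[0003]
The H.263 standard allows frames to be skipped, so a variable frame rate video sequence can be decoded. The sampling of the frames, however, remains fixed. The temporal reference method used in MPEG-1 and MPEG-2 is therefore still adequate and needs only one modification: instead of incrementing by 1, the value is incremented by (1 + the number of non-transmitted pictures at the input frame rate).
[0004]
Work is currently under way on coding video as independent objects in multiple video object planes. This represents a new concept in the decoding and synchronization of the individual video objects. These video object planes are expected to be able to originate from several sources and to have entirely different frame rates. Some objects may have a nearly continuous temporal sampling rate. When the video object planes are combined and displayed, they form a composite image, so some form of synchronization is needed for the composition. The display frame rate may differ from the frame rate of every video object plane. FIG. 2 shows an example of two video object planes with different frame rates. Even if a frame rate common to the two video object planes can be found, there is no guarantee that it will be the same as the output frame rate of the compositor.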
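[0005]
The following discussion addresses the problem in the video domain, but the same principles of the invention can be extended to the audio domain and to combinations of the two.
[0006]
It is clear from the above that the current state of the art does not meet the requirements for synchronizing video object planes. Nor does it provide a common temporal reference when different video object planes have frame rates that are not multiples of one another.
[0007]
The first problem is how to provide a common local time base mechanism for each video object plane. This time base must be able to provide very fine temporal granularity and, at the same time, cope with the possibility of very long intervals between two successive instances of a video object plane.
[0008]
The second problem is how to provide a mechanism for synchronizing video object planes that have different frame rates.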
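[0009]
[Problems to Be Solved by the Invention]
The above problems can be solved by introducing a common temporal resolution that is used for all local time bases. To cover a wide range of temporal granularity, the local time base is divided into two parts. The first part contains the fine-granularity temporal resolution and provides a short time base. The second part contains the coarse-granularity temporal resolution and provides a long time base. The short time base is included in each video object plane and provides the temporal reference for each instance of the video object plane. The short time base is then synchronized to the long time base, which is common to all video object planes. The long time base is used to synchronize all the different video object planes to a common time base provided by the master clock.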
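[0010]
[Means for Solving the Problems]
The present invention is a method of decoding the time of a B-VOP (a VOP coded by bidirectional prediction) contained in compressed data, wherein the compressed data includes a modulo time base increment, which represents increments in units of one second, and a VOP time base increment, which represents an increment shorter than one second. The method obtains from the compressed data the time, in units of one second, of the I-VOP (intra-coded VOP) or P-VOP (predictively coded VOP) that immediately precedes the B-VOP in display order; decodes the modulo time base increment of the B-VOP; decodes the VOP time base increment of the B-VOP; and takes as the time of the B-VOP the result of adding the decoded modulo time base increment and the decoded VOP time base increment to the obtained one-second-unit time.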
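[0011]
[Embodiments of the Invention]
The invention operates using two forms of time representation. The first is a short time base attached to the video object plane, referred to below as the VOP (Video Object Plane) time increment. The VOP time increment serves as the timing mechanism for a video object plane, relative to a long time base attached to the group of video object planes that are to be decoded and composited together. This long time base is called the modulo time base. The VOP time increment and the modulo time base are used together to determine the actual time base used to composite the video object planes into the final composite sequence for display.
[0012]
To make it easier to edit bitstreams and to combine video object planes from different sources into a new group of video object planes, a third component is needed that provides a fixed offset between the common time base and the local time base of each individual video object plane. This offset is referred to below as the VOP time offset. It removes the need for different object planes to be synchronized at a granularity equal to the modulo time base interval. This component should be kept constant for each video object plane within the group of video object planes that are multiplexed together.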
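[0013]
First, the modulo time base is described.
[0014]
The modulo time base represents the coarse resolution of the local time base. Unlike the VOP time increment, it does not carry a value. In practice it acts more as a synchronization mechanism for synchronizing the VOP time increment to the local time base of the video object plane. It is placed in the coded bitstream as a marker indicating that the VOP time increment of the video object planes that follow is to be reset and that the reference to the local time base is to be incremented by one or more units of the modulo time base interval. In FIGS. 3, 4, 5 to 10, and 11, the modulo time base is shown as a run of consecutive "1"s followed by a "0", inserted in the bitstream header before the VOP time increment. The number of consecutive "1"s is zero or more. The number of "1"s inserted in the bitstream depends on the number of modulo time base units that have elapsed since the last I-VOP or P-VOP. In the encoder and the decoder, the modulo time base counter is incremented by one each time a "1" is detected. The counter has a finite length, so in a practical system the modulo time base is reset to zero when its maximum value is exceeded. In a typical video sequence the video object planes form groups of VOPs, and the modulo time base is therefore normally reset at the start of each VOP group.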
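[0015]
Next, the VOP time increment is described.
[0016]
The VOP time increment must be expressed in units that can support the shortest temporal sampling of the video object planes. It may also be a negative time base used for the object plane. It therefore represents the finest granularity of temporal resolution that is required or can be supported.
[0017]
The VOP time increment may accordingly be represented by a number of finite length that is greater than or equal to the ratio of the global time base interval to the local time base resolution. FIG. 3 shows an example of the VOP time increment for I- and P-video object planes and its reference to the modulo time base. An absolute time base is used. The VOP time increment is reset each time a modulo time base is detected. FIG. 4 shows another example, using I-, P-, and B-video object planes. The operation is the same, except that the modulo time base is repeated in the B-video object planes. If the modulo time base were not repeated in the B-video object planes, ambiguity would arise from the difference between decoding order and presentation order. This is described in detail below.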
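[0018]
Because the VOP time increment corresponds to the presentation time base, a potential problem arises when the coding order differs from the presentation order. This happens with B-video object planes. As with B-pictures in MPEG-1 and MPEG-2, B-video object planes are coded after their reference I- and P-video object planes, even though they precede those reference planes in presentation order. Because the VOP time increment is finite and is relative to the modulo time base, it is reset whenever a modulo time base is detected. The coding of the B-video object planes, however, is still delayed. FIGS. 5 to 8 show the ambiguity that can result: it is not possible to determine when the VOP time increment should be reset. Given the coded sequence of events shown in FIG. 5, there is no way to tell which of the timing arrangements in FIGS. 6, 7, and 8 it is meant to represent. The problem arises because a single modulo time base is shared among all the different types of video object planes, in which different coding orders and presentation orders are mixed. Nothing can be done about the coding order, because the B-video object planes need the reference information. Nor is it desirable for each prediction type to have its own modulo time base.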
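[0019]
Next, the VOP time offset is described.
[0020]
In addition to the above, the modulo time base is shared among all video object planes. This means that synchronization between different video object planes has a granularity equal to the modulo time base interval. That is unacceptable, particularly when video object planes from different groups are combined to form a new group of video object planes. FIG. 11 shows an example of two video object planes coded with two different local time bases that are offset from each other. When these video object planes are multiplexed, their synchronization is offset as well. Finer granularity is achieved by giving each individual video object plane a VOP time offset. Only this value then needs to change when video object planes are manipulated and multiplexed. The VOP time increments need not be changed, and different video object planes can be multiplexed without coarse-granularity timing differences. FIG. 11 illustrates the use of this time base offset.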
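[0021]
The preferred embodiment of the invention comprises a method of encoding the time base for each individual video object plane bitstream, a method of multiplexing different video object planes onto a common time base, a method of demultiplexing the multiplexed bitstream into its components, and a method of recovering the time base from a component bitstream.
[0022]
Next, encoding of the time base is described.
[0023]
FIG. 12 shows a flowchart of an embodiment of time base encoding. In the encoder, the local time base is first initialized to the local start time in step 1. Processing moves to step 2, where the encoder determines the current value of the local time base. In step 3 the local time base obtained is compared with the previously encoded modulo time base to check whether the difference exceeds the modulo time base interval. If it does, control passes to step 4, where the required number of modulo time bases is inserted into the bitstream. If it does not, no special handling is needed. Processing then proceeds to step 5, where the VOP time increment is inserted into the bitstream. In step 6 the object plane is encoded and inserted into the bitstream. In step 7 the encoder checks whether there are more object planes to encode. If there are, processing returns to step 2 to obtain the local time base; if not, the process ends.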
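[0024]
The following equations are used to determine the absolute and relative VOP time increments for I/P-video object planes and B-video object planes, respectively.
$t_{GTBn} = n \times t_{GTBI} + t_{GTB0} \quad (n = 0, 1, 2, 3, \ldots)$ (1)
$t_{AVTI} = t_{ETBI/P} - t_{GTBn}$ (2)
$t_{RVTI} = t_{ETBB} - t_{ETBI/P}$ (3)
Here, $t_{GTBn}$ is the encoder time base marked by the n-th encoded modulo time base.
[0025]
$t_{GTBI}$ is the predetermined modulo time base interval.
[0026]
$t_{GTB0}$ is the start time of the encoder time base.
[0027]
$t_{AVTI}$ is the absolute VOP time increment for an I- or P-video object plane.
[0028]
$t_{ETBI/P}$ is the encoder time base at the start of encoding of the I- or P-video object plane.
[0029]
$t_{RVTI}$ is the relative VOP time increment for a B-video object plane.
[0030]
$t_{ETBB}$ is the encoder time base at the start of encoding of the B-video object plane.
Next, multiplexing of multiple video object planes is described.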
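[0031]
When multiple video object planes are multiplexed together, the multiplexer examines the bitstreams of the video object planes to determine the order of multiplexing as well as the synchronization. The operations involved are shown in FIG. 13. In step 11, the VOP time offset for each video object plane to be multiplexed is inserted into the bitstream. In step 12, all bitstreams of the video object planes to be multiplexed are examined to determine whether all of the video object planes are at their respective modulo time bases. If they are, processing proceeds to step 13, where a common modulo time base is inserted into the multiplexed bitstream. If not, processing proceeds to step 14, where the next coded video object plane is inserted into the multiplexed bitstream. In step 15, the bitstreams of the video object planes to be multiplexed are examined again to see whether there are more video object planes to multiplex. If there are, control returns to step 12; if not, the process ends.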
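[0032]
Next, demultiplexing of a bitstream containing multiple video object planes is described.
[0033]
Demultiplexing of a bitstream containing multiple video object planes is shown in FIG. 14. The process begins at step 21, where the VOP time offsets are decoded and passed to the decoders for use in synchronization. In step 22, the multiplexed bitstream is examined to determine whether a modulo time base has been detected. If so, processing proceeds to step 23, where the modulo time base is inserted into all of the video object plane bitstreams. If not, processing proceeds to step 24, where the next video object plane is examined and inserted into the appropriate video object plane bitstream. Finally, the multiplexed bitstream is examined again to determine whether there are more video object planes to demultiplex. If there are, processing returns to step 22; if not, the process ends.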
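[0034]
Next, recovery of the time base is described.
[0035]
FIG. 15 shows an embodiment of time base recovery. To recover the local time base, the process begins at step 31, where the local time base is initialized, taking into account the VOP time offset decoded by the demultiplexer. Processing proceeds to step 32, where the bitstream is examined to determine whether a modulo time base has been decoded. If so, processing proceeds to step 33, where the local time base is incremented by the modulo time base increment, and then to step 37. If not, processing proceeds to step 34, where the video object plane is examined to determine whether it is a B-video object plane. If it is, processing proceeds to step 35, where the decoding time base of the B-video object plane is calculated from equation (6), and then to step 37. If it is not, processing proceeds to step 36, where the decoding time base is calculated from equation (5), and then to step 37. In step 37 the bitstream is examined to determine whether there are more video object planes to decode. If there are, processing returns to step 32; if not, the process ends.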
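[0036]
The following equations are used to determine the presentation time stamps of the video object planes.
$t_{GTBn} = n \times t_{GTBI} + t_{GTB0} \quad (n = 0, 1, 2, 3, \ldots)$ (4)
$t_{DTBI/P} = t_{AVTI} + t_{GTBn}$ (5)
$t_{DTBB} = t_{RVTI} + t_{DTBI/P}$ (6)
Here, $t_{GTBn}$ is the decoding time base marked by the n-th decoded modulo time base.
[0037]
$t_{GTBI}$ is the predetermined modulo time base interval.
[0038]
$t_{GTB0}$ is the start time of the decoding time base.
[0039]
$t_{DTBI/P}$ is the decoding time base at the start of decoding of the I- or P-video object plane.
[0040]
$t_{AVTI}$ is the decoded absolute VOP time increment for the I- or P-video object plane.
[0041]
$t_{DTBB}$ is the decoding time base at the start of decoding of the B-video object plane.
[0042]
$t_{RVTI}$ is the decoded relative VOP time increment for the B-video object plane.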
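Equations (4) to (6) can be written out directly as a small sketch. The function names, the millisecond unit, and the sample values below are illustrative assumptions and do not come from the patent text.

```python
# Sketch of equations (4)-(6); all names and units (milliseconds) are assumptions.

def modulo_time_base(n, t_gtbi, t_gtb0):
    """Equation (4): time base marked by the n-th decoded modulo time base."""
    return n * t_gtbi + t_gtb0

def ip_decoding_time(t_avti, t_gtbn):
    """Equation (5): decoding time base of an I- or P-video object plane."""
    return t_avti + t_gtbn

def b_decoding_time(t_rvti, t_dtb_ip):
    """Equation (6): decoding time base of a B-video object plane."""
    return t_rvti + t_dtb_ip

# Hypothetical values: 1000 ms interval, start time 0, second modulo time base.
t_gtb2 = modulo_time_base(2, 1000, 0)
t_ip = ip_decoding_time(350, t_gtb2)
t_b = b_decoding_time(-400, t_ip)
```

With these sample inputs the relative increment is negative, which reflects a B-video object plane that is presented before the reference plane it is measured against.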
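[0043]
Next, an embodiment of the bitstream encoder is described.
[0044]
FIG. 16 is a block diagram illustrating an embodiment of a bitstream encoder for encoding the modulo time base and the VOP time increment. The example shown in FIG. 4 is used for this description. Because bidirectional prediction is used, the coding order differs from the presentation order shown in FIG. 4. The coding order begins with the I-VOP, followed by the P-VOP, ahead of the B-VOPs. This is explained in the following three paragraphs.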
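[0045]
The process begins at the initializer, step 41, where the bitstream encoder initializes the local time base register to the initial value of the time code. The same time code value is encoded into the bitstream. At the start of encoding of the next I-VOP, the time code comparator, step 42, compares the presentation time of the I-VOP with the local time base register. The result is passed to the modulo time base encoder, step 43, which inserts into the bitstream the required number of "1"s, equal to the number of modulo time base increments that have elapsed. This is followed by the symbol "0", which marks the end of the modulo time base code. The local time base register is updated to the current modulo time base. Processing then proceeds to the VOP time base increment encoder, step 44, where the remainder of the presentation time code of the I-VOP is encoded.
[0046]
The process is repeated for the next coded video object plane, which is the P-VOP. The time code comparator, step 42, compares the presentation time of the P-VOP with the local time base register. The result is passed to the modulo time base encoder, step 43, which inserts the required number of "1"s, equal to the number of modulo time base increments that have elapsed, followed by the symbol "0" to mark the end of the modulo time base code. The B-VOP time base register is set to the value of the local time base register, and the local time base register is updated to the current modulo time base. Processing then proceeds to the VOP time base increment encoder, step 44, where the remainder of the presentation time code of the P-VOP is encoded.
[0047]
The process is then repeated for the next coded video object plane, which is the B-VOP. The time code comparator, step 42, compares the presentation time of the B-VOP with the B-VOP time base register. The result is passed to the modulo time base encoder, step 43, which inserts the required number of "1"s, equal to the number of modulo time base increments that have elapsed, followed by the symbol "0" to mark the end of the modulo time base code. Neither the B-VOP time base register nor the local time base register is changed after a B-VOP is processed. Processing then proceeds to the VOP time base increment encoder, step 44, where the remainder of the presentation time code of the B-VOP is encoded.
[0048]
The local time base register is reset at the next I-VOP, which marks the start of the next group of VOPs.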
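[0049]
Next, an embodiment of the bitstream decoder is described.
[0050]
FIG. 17 is a block diagram illustrating an embodiment of a decoder that uses the modulo time base and the VOP time increment to recover the presentation time stamps. As with the encoder embodiment, the example shown in FIG. 4 is used. The decoding order is the same as the coding order: the I-VOP is decoded first, followed by the P-VOP, ahead of the B-VOPs. This is explained in the following paragraphs.
[0051]
The process begins at the initializer, step 51, where the local time base register is set to the time code value decoded from the bitstream. Processing then proceeds to the modulo time base decoder, step 52, where the modulo time base increments are decoded. The total number of modulo time base increments decoded is given by the number of "1"s decoded before the symbol "0". Next, the VOP time base increment is decoded in the VOP time base increment decoder, step 53. In the time base calculator, step 54, the presentation time of the I-VOP is recovered: the total of the decoded modulo time base increments is added to the local time base register, and the VOP time base increment is then added to the local time base register to give the presentation time of the I-VOP. Processing then proceeds to the video object plane decoder, step 55, where the video object plane is decoded.
[0052]
For the P-VOP, the process is repeated at the modulo time base decoder, step 52, where the modulo time base increments are decoded. The total number of modulo time base increments decoded is given by the number of "1"s decoded before the symbol "0". Next, the VOP time base increment is decoded in the VOP time base increment decoder, step 53. In the time base calculator, step 54, the presentation time of the P-VOP is recovered: the B-VOP modulo time base register is set to the value of the local time base register, the total of the decoded modulo time base increments is added to the local time base register, and the VOP time base increment is then added to the local time base register to give the presentation time of the P-VOP. Processing proceeds to the video object plane decoder, where the video object plane is decoded.
[0053]
For the B-VOP, the process is repeated at the modulo time base decoder, step 52, where the modulo time base increments are decoded. The total number of modulo time base increments decoded is given by the number of "1"s decoded before the symbol "0". Next, the VOP time base increment is decoded in the VOP time base increment decoder, step 53. In the time base calculator, step 54, the presentation time of the B-VOP is recovered: the total of the decoded modulo time base increments and the VOP time base increment are added to the B-VOP time base register to give the presentation time of the B-VOP. Both the B-VOP time base register and the local time base register are left unchanged. Processing then proceeds to the video object plane decoder, where the video object plane is decoded.
[0054]
The local time base register is reset at the next I-VOP, which marks the start of the next group of VOPs.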
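[0055]
Next, a specific example is described.
[0056]
FIG. 18 shows an example of the steps for encoding compressed data into bitstream data. As shown in the top row of FIG. 18, the compressed video data VOPs are lined up in display order, I1, B1, B2, P1, B3, P2, with a GOP (group of pictures) header inserted at the start of the group of VOPs. As the VOPs are displayed, the local time at which each is displayed is determined using the local time clock. For example, the first VOP (I1-VOP) is displayed at 1 hour, 23 minutes, 45 seconds, 350 milliseconds (1:23:45:350), counted from the very start of the video data; the second VOP (B1-VOP) is displayed at 1:23:45:750; the third VOP (B2-VOP) is displayed at 1:23:46:150; and so on.
[0057]
To encode the VOPs, display time data must be inserted into each VOP. Inserting the time data in full, with hours, minutes, seconds, and milliseconds, would require considerable data space in the header of each VOP. An object of the invention is to reduce that data space and to simplify the time data inserted in each VOP.
[0058]
Each VOP shown in the top row of FIG. 18 stores the millisecond part of its display time in the VOP time increment field. Each VOP in the top row also temporarily stores the hour, minute, and second part of its display time. The GOP header stores the hours, minutes, and seconds of the display time of the first VOP (I1-VOP).
[0059]
As shown in the second row of FIG. 18, the VOPs are delayed by a predetermined time using a buffer (not shown). Under bidirectional prediction, the order of the VOPs changes as they leave the buffer, so that each bidirectional VOP, that is, each B-VOP, is placed after the P-VOP it refers to. The VOPs are therefore lined up in the order I1, P1, B1, B2, P2, B3.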
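[0060]
As shown in the third row of FIG. 18, at time T1, when the GOP header is about to be encoded, the hour, minute, and second data stored in the GOP header are stored as they are in the local time base register. In the example of FIG. 18 the local time base register stores 1:23:45. Before time T2, the bitstream data for the GOP header is obtained, with the hour, minute, and second data inserted as shown at the bottom of FIG. 18.
[0061]
At time T2 the first VOP (I1-VOP) is taken in. The time code comparator compares the time (hours, minutes, seconds) stored in the local time base register with the time (hours, minutes, seconds) temporarily stored in the first VOP (I1-VOP). In this example the two are the same. The comparator therefore produces "0", indicating that the first VOP (I1-VOP) occurs in the same second as that held in the local time base register. The "0" produced by the comparator is placed as it is in the modulo time base field of the first VOP (I1-VOP). At the same time, the hour, minute, and second data temporarily stored in the first VOP (I1-VOP) is removed. Thus, before time T3, the bitstream data for the first VOP (I1-VOP) is obtained, with "0" in the modulo time base field and "350" in the VOP time increment field.
[0062]
Next, at time T3 the second VOP (P1-VOP) is taken in. The time code comparator compares the time (hours, minutes, seconds) stored in the local time base register with the time (hours, minutes, seconds) temporarily stored in the second VOP (P1-VOP). In this example the time temporarily stored in the second VOP (P1-VOP) is one second greater than the time stored in the local time base register. The comparator therefore produces "10", indicating that the second VOP (P1-VOP) occurs in the second following the one held in the local time base register. If the second VOP (P1-VOP) occurred in the second after that, the comparator would produce "110".
[0063]
After time T3, the B-VOP time base register is set to the time that the local time base register held immediately before time T3. In this example the B-VOP time base register is set to 1:23:45. Also after time T3, the local time base register is incremented to the time temporarily stored in the second VOP (P1-VOP). In this example it is incremented to 1:23:46.
[0064]
The "10" produced by the comparator is placed as it is in the modulo time base field of the second VOP (P1-VOP). At the same time, the hour, minute, and second data temporarily stored in the second VOP (P1-VOP) is removed. Thus, before time T4, the bitstream data for the second VOP (P1-VOP) is obtained, with "10" in the modulo time base field and "550" in the VOP time increment field.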
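[0065]
At time T4 the third VOP (B1-VOP) is taken in. The time code comparator compares the time (hours, minutes, seconds) stored in the B-VOP time base register with the time (hours, minutes, seconds) temporarily stored in the third VOP (B1-VOP). In this example the two are the same. The comparator therefore produces "0", indicating that the third VOP (B1-VOP) occurs in the same second as that held in the B-VOP time base register. The "0" produced by the comparator is placed as it is in the modulo time base field of the third VOP (B1-VOP). At the same time, the hour, minute, and second data temporarily stored in the third VOP (B1-VOP) is removed. Thus, before time T5, the bitstream data for the third VOP (B1-VOP) is obtained, with "0" in the modulo time base field and "750" in the VOP time increment field.
[0066]
At time T5 the fourth VOP (B2-VOP) is taken in. The time code comparator compares the time (hours, minutes, seconds) stored in the B-VOP time base register with the time (hours, minutes, seconds) temporarily stored in the fourth VOP (B2-VOP). In this example the time temporarily stored in the fourth VOP (B2-VOP) is one second greater than the time stored in the B-VOP time base register. The comparator therefore produces "10", indicating that the fourth VOP (B2-VOP) occurs in the second following the one held in the B-VOP time base register.
[0067]
While B-type VOPs are being processed, neither the local time base register nor the B-VOP time base register is incremented, whatever result the comparator produces.
[0068]
The "10" produced by the comparator is placed as it is in the modulo time base field of the fourth VOP (B2-VOP). At the same time, the hour, minute, and second data temporarily stored in the fourth VOP (B2-VOP) is removed. Thus, before time T6, the bitstream data for the fourth VOP (B2-VOP) is obtained, with "10" in the modulo time base field and "150" in the VOP time increment field.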
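[0069]
At time T6 the fifth VOP (P2-VOP) is taken in. The time code comparator compares the time (hours, minutes, seconds) stored in the local time base register with the time (hours, minutes, seconds) temporarily stored in the fifth VOP (P2-VOP). In this example the time temporarily stored in the fifth VOP (P2-VOP) is one second greater than the time stored in the local time base register. The comparator therefore produces "10", indicating that the fifth VOP (P2-VOP) occurs in the second following the one held in the local time base register.
[0070]
After time T6, the B-VOP time base register is incremented to the time that the local time base register held immediately before time T6. In this example the B-VOP time base register is incremented to 1:23:46. Also after time T6, the local time base register is incremented to the time temporarily stored in the fifth VOP (P2-VOP). In this example it is incremented to 1:23:47.
[0071]
The "10" produced by the comparator is placed as it is in the modulo time base field of the fifth VOP (P2-VOP). At the same time, the hour, minute, and second data temporarily stored in the fifth VOP (P2-VOP) is removed. Thus, before time T7, the bitstream data for the fifth VOP (P2-VOP) is obtained, with "10" in the modulo time base field and "350" in the VOP time increment field.
[0072]
The same processing is then carried out to form the bitstream data for the subsequent VOPs.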
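The FIG. 18 walk-through can be reproduced with a short sketch that keeps the two registers described above. The function and variable names are hypothetical, times are held as whole seconds plus milliseconds, and the B-VOP register is copied only when an I- or P-VOP advances the local register, which is an assumption that agrees with the worked values but is not the only reading of the text.

```python
# Sketch of the two-register encoder from the FIG. 18 example (names are assumptions).

def hms(h, m, s):
    """Convert hours, minutes, seconds to whole seconds."""
    return h * 3600 + m * 60 + s

def encode_vops(vops, gop_seconds):
    """vops: (kind, seconds, milliseconds) tuples in coding order.
    Returns (kind, modulo_time_base_bits, vop_time_increment) tuples."""
    local_reg = gop_seconds   # local time base register, loaded from the GOP header
    bvop_reg = gop_seconds    # B-VOP time base register
    coded = []
    for kind, sec, ms in vops:
        if kind == "B":
            n = sec - bvop_reg            # B-VOPs are measured against the B-VOP register
        else:
            n = sec - local_reg           # I- and P-VOPs use the local register
            if n > 0:
                bvop_reg = local_reg      # remember the value held before the update
                local_reg = sec
        if n < 0:
            raise ValueError("VOP time precedes its reference register")
        coded.append((kind, "1" * n + "0", ms))
    return coded

start = hms(1, 23, 45)
coding_order = [
    ("I", hms(1, 23, 45), 350),  # I1
    ("P", hms(1, 23, 46), 550),  # P1
    ("B", hms(1, 23, 45), 750),  # B1
    ("B", hms(1, 23, 46), 150),  # B2
    ("P", hms(1, 23, 47), 350),  # P2
]
coded = encode_vops(coding_order, start)
```

B3 is left out of the input because its display time is not stated in the text.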
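[0073]
To decode this bitstream data, the reverse of the above processing is carried out. First, the time (hours, minutes, seconds) held in the GOP header is read and stored in the local time base register.
[0074]
When an I-type or P-type VOP, that is, a VOP other than a B-type, is received, the data stored in the modulo time base field is read. If the data read is "0", that is, if no "1" precedes the "0", the local time base register is not changed, and neither is the B-VOP time base register. If the data read is "10", the time stored in the local time base register is incremented by one second. If the data read is "110", the time stored in the local time base register is incremented by two seconds. The number of seconds to add is thus determined by the number of "1"s preceding the "0". In addition, when the data read is "10" or "110", the B-VOP time base register, which is a memory, copies the time that the local time base register held immediately before the increment. The time (hours, minutes, seconds) held in the local time base register is then combined with the time (milliseconds) held in the VOP time increment field to establish the time at which the I-type or P-type VOP is to occur.
[0075]
When a B-type VOP is received, the data stored in the modulo time base field is read. If the data read is "0", the time (hours, minutes, seconds) held in the B-VOP time base register is combined with the time (milliseconds) held in the VOP time increment field to establish the time at which the B-type VOP is to occur. If the data read is "10", one second is added to the time (hours, minutes, seconds) held in the B-VOP time base register, and the resulting time is combined with the time (milliseconds) held in the VOP time increment field to establish the time at which the B-type VOP is to occur. If the data read is "110", two seconds are added to the time (hours, minutes, seconds) held in the B-VOP time base register, and the resulting time is combined with the time (milliseconds) held in the VOP time increment field to establish the time at which the B-type VOP is to occur.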
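The decoding rules for I/P-type and B-type VOPs can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names are hypothetical, each coded VOP is assumed to arrive as a (kind, marker bits, milliseconds) tuple, and the result is returned in milliseconds.

```python
# Sketch of the two-register decoder described above (names are assumptions).

def decode_vops(coded, gop_seconds):
    """coded: (kind, modulo_time_base_bits, vop_time_increment_ms) in coding order.
    Returns (kind, presentation_time_ms) tuples."""
    local_reg = gop_seconds   # loaded from the GOP header
    bvop_reg = gop_seconds
    times = []
    for kind, marker, ms in coded:
        n = marker.index("0")             # number of "1"s before the terminating "0"
        if marker != "1" * n + "0":
            raise ValueError("malformed modulo time base marker")
        if kind == "B":
            seconds = bvop_reg + n        # registers are left unchanged for B-VOPs
        else:
            if n > 0:
                bvop_reg = local_reg      # copy the value held before the increment
                local_reg += n
            seconds = local_reg
        times.append((kind, seconds * 1000 + ms))
    return times

gop = 1 * 3600 + 23 * 60 + 45             # 1:23:45 from the worked example
decoded = decode_vops(
    [("I", "0", 350), ("P", "10", 550), ("B", "0", 750),
     ("B", "10", 150), ("P", "10", 350)],
    gop,
)
```

Sorting the decoded list by time recovers the display order I1, B1, B2, P1, P2 given in the example.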
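[0076]
An effect of the invention is that video object planes encoded by different encoders can be multiplexed. The invention also makes it easier to manipulate compressed data from different sources on an object basis to generate new bitstreams. The invention provides a method of synchronizing audiovisual objects.
[0077]
The invention has been described above, but what has been described may be varied in many ways. Such variations do not depart from the spirit and scope of the invention, and all such modifications as would be obvious to one skilled in the art are intended to be included within the scope of the claims.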
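[Brief Description of the Drawings]
[FIG. 1] A diagram illustrating prior-art temporal sampling, in which the frames of a video sequence are sampled at regular intervals.
[FIG. 2] A diagram illustrating the concept of video object planes and their relationship to one another. The sampling of the video object planes may be irregular, and the sampling period may change abruptly.
[FIG. 3] A diagram illustrating the invention, in which the temporal reference of a video object plane is indicated by the modulo time base and the VOP time increment. Only I-VOPs and P-VOPs are used in this illustration.
[FIG. 4] A diagram illustrating the invention, in which the temporal reference of a video object plane is indicated by the modulo time base and the VOP time increment. I-VOPs, P-VOPs, and B-VOPs are used in this illustration.
[FIG. 5] A diagram illustrating an example of the ambiguity that can arise when presentation order and coding order differ because of B-video object planes.
[FIG. 6] A diagram illustrating an example of the ambiguity that can arise when presentation order and coding order differ because of B-video object planes.
[FIG. 7] A diagram illustrating an example of the ambiguity that can arise when presentation order and coding order differ because of B-video object planes.
[FIG. 8] A diagram illustrating an example of the ambiguity that can arise when presentation order and coding order differ because of B-video object planes.
[FIG. 9] A diagram illustrating how the ambiguity is resolved by using absolute and relative time bases.
[FIG. 10] A diagram illustrating how the ambiguity is resolved by using absolute and relative time bases.
[FIG. 11] A diagram illustrating the combination of two VOPs and their synchronization to a common time base by means of VOP time offsets.
[FIG. 12] A flowchart illustrating encoding of the time base.
[FIG. 13] A flowchart illustrating multiplexing of multiple video object planes.
[FIG. 14] A flowchart illustrating demultiplexing of multiple video object planes.
[FIG. 15] A flowchart illustrating recovery of the presentation time stamps.
[FIG. 16] A block diagram illustrating the operation of a bitstream encoder for encoding the time base.
[FIG. 17] A block diagram illustrating the operation of a bitstream decoder for decoding the time base.
[FIG. 18] A timing chart illustrating the formation of bitstream data.
[0001]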
[Technical Field of the Invention]
The present invention is useful for encoding digital audiovisual materials that require multiple independently encoded audiovisual objects to be synchronized for presentation. The present invention is particularly useful when the temporal sampling of the audiovisual material is not identical.
[0002]
[Prior art]
In MPEG-1 and MPEG-2, the input video consists of image frames sampled at regular time intervals. This represents the finest temporal resolution of the input video. FIG. 1 shows a video sequence with a constant frame rate, in which the image frames are sampled at regular intervals. In the coded representation of a video sequence under the MPEG-1 and MPEG-2 standards, the display order of the decoded frames is indicated by the temporal reference. This parameter is carried in the picture header of the bitstream syntax. Its value is incremented by one for each decoded frame when the frames are examined in display order.
[0003]
The H.263 standard allows frames to be skipped, so a variable frame rate video sequence can be decoded. The sampling of the frames, however, remains fixed. The temporal reference method used in MPEG-1 and MPEG-2 is therefore still adequate and needs only one modification: instead of incrementing by 1, the value is incremented by (1 + the number of non-transmitted pictures at the input frame rate).
[0004]
Work is currently under way on coding video as independent objects in multiple video object planes. This represents a new concept in the decoding and synchronization of the individual video objects. These video object planes are expected to be able to originate from several sources and to have entirely different frame rates. Some objects may have a nearly continuous temporal sampling rate. When the video object planes are combined and displayed, they form a composite image, so some form of synchronization is needed for the composition. The display frame rate may differ from the frame rate of every video object plane. FIG. 2 shows an example of two video object planes with different frame rates. Even if a frame rate common to the two video object planes can be found, there is no guarantee that it will be the same as the output frame rate of the compositor.
[0005]
In the following, problems in the video domain will be described, but the same principle of the present invention can be extended to the audio domain, and can also be extended to a combination of the two.
[0006]
It is clear from the above that the current state of the art does not meet the requirements for video object picture synchronization. Also, the current situation in the art does not provide a common reference time when different video objects have different frame rates that are not multiples of each other.
[0007]
The first problem is how to provide a common local time base mechanism for each video object plane. This time base must be able to provide very fine temporal granularity and, at the same time, cope with the possibility of very long intervals between two successive instances of a video object plane.
[0008]
The second issue is how to provide a mechanism for synchronizing video objects with different frame rates.
[0009]
[Problems to be solved by the invention]
The above problems can be solved by introducing a common temporal resolution that is used for all local time bases. To cover a wide range of temporal granularity, the local time base is divided into two parts. The first part contains the fine-granularity temporal resolution and provides a short time base. The second part contains the coarse-granularity temporal resolution and provides a long time base. The short time base is included in each video object plane and provides the temporal reference for each instance of the video object plane. The short time base is then synchronized to the long time base, which is common to all video object planes. The long time base is used to synchronize all the different video object planes to a common time base provided by the master clock.
[0010]
[Means for Solving the Problems]
The present invention is a method of decoding the time of a B-VOP (a VOP coded by bidirectional prediction) contained in compressed data, wherein the compressed data includes a modulo time base increment, which represents increments in units of one second, and a VOP time base increment, which represents an increment shorter than one second. The method obtains from the compressed data the time, in units of one second, of the I-VOP (intra-coded VOP) or P-VOP (predictively coded VOP) that immediately precedes the B-VOP in display order; decodes the modulo time base increment of the B-VOP; decodes the VOP time base increment of the B-VOP; and takes as the time of the B-VOP the result of adding the decoded modulo time base increment and the decoded VOP time base increment to the obtained one-second-unit time.
[0011]
[Embodiments of the Invention]
The invention operates using two forms of time representation. The first is a short time base attached to the video object plane, referred to below as the VOP (Video Object Plane) time increment. The VOP time increment serves as the timing mechanism for a video object plane, relative to a long time base attached to the group of video object planes that are to be decoded and composited together. This long time base is called the modulo time base. The VOP time increment and the modulo time base are used together to determine the actual time base used to composite the video object planes into the final composite sequence for display.
[0012]
To make it easier to edit bitstreams and to combine video object planes from different sources into a new group of video object planes, a third component is needed that provides a fixed offset between the common time base and the local time base of each individual video object plane. This offset is referred to below as the VOP time offset. It removes the need for different object planes to be synchronized at a granularity equal to the modulo time base interval. This component should be kept constant for each video object plane within the group of video object planes that are multiplexed together.
[0013]
First, the modulo time base is described.
[0014]
The modulo time base represents the coarse resolution of the local time base. Unlike the VOP time increment, it does not carry a value. In practice it acts more as a synchronization mechanism for synchronizing the VOP time increment to the local time base of the video object plane. It is placed in the coded bitstream as a marker indicating that the VOP time increment of the video object planes that follow is to be reset and that the reference to the local time base is to be incremented by one or more units of the modulo time base interval. In FIGS. 3, 4, 5 to 10, and 11, the modulo time base is shown as a run of consecutive "1"s followed by a "0", inserted in the bitstream header before the VOP time increment. The number of consecutive "1"s is zero or more. The number of "1"s inserted in the bitstream depends on the number of modulo time base units that have elapsed since the last I-VOP or P-VOP. In the encoder and the decoder, the modulo time base counter is incremented by one each time a "1" is detected. The counter has a finite length, so in a practical system the modulo time base is reset to zero when its maximum value is exceeded. In a typical video sequence the video object planes form groups of VOPs, and the modulo time base is therefore normally reset at the start of each VOP group.
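The marker format described above, zero or more "1"s closed by a single "0", is a unary code. The sketch below shows one way to write and read it; the function names and the use of a character string to stand in for the bitstream are assumptions made for illustration.

```python
# Sketch of the unary modulo time base marker (names are assumptions).

def encode_modulo_time_base(elapsed_units):
    """Return one "1" per elapsed modulo time base unit, followed by "0"."""
    if elapsed_units < 0:
        raise ValueError("elapsed_units must be non-negative")
    return "1" * elapsed_units + "0"

def decode_modulo_time_base(bits, pos=0):
    """Count the "1"s starting at pos; return (count, position after the "0")."""
    count = 0
    while pos < len(bits) and bits[pos] == "1":
        count += 1
        pos += 1
    if pos >= len(bits) or bits[pos] != "0":
        raise ValueError("marker is missing its terminating 0")
    return count, pos + 1
```

Because the code is self-terminating, the decoder needs no length field: it simply reads until the first "0" and then continues with the VOP time increment.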
[0015]
Next, the VOP time increment will be described.
[0016]
The VOP time increment must be expressed in units that can support the shortest temporal sampling of the video object planes. It may also be a negative time base used for the object plane. It therefore represents the finest granularity of temporal resolution that is required or can be supported.
[0017]
The VOP time increment may accordingly be represented by a number of finite length that is greater than or equal to the ratio of the global time base interval to the local time base resolution. FIG. 3 shows an example of the VOP time increment for I- and P-video object planes and its reference to the modulo time base. An absolute time base is used. The VOP time increment is reset each time a modulo time base is detected. FIG. 4 shows another example, using I-, P-, and B-video object planes. The operation is the same, except that the modulo time base is repeated in the B-video object planes. If the modulo time base were not repeated in the B-video object planes, ambiguity would arise from the difference between decoding order and presentation order. This is described in detail below.
[0018]
Because the VOP time increment corresponds to the presentation time base, a potential problem arises when the coding order differs from the presentation order. This happens with B-video object planes. As with B-pictures in MPEG-1 and MPEG-2, B-video object planes are coded after their reference I- and P-video object planes, even though they precede those reference planes in presentation order. Because the VOP time increment is finite and is relative to the modulo time base, it is reset whenever a modulo time base is detected. The coding of the B-video object planes, however, is still delayed. FIGS. 5 to 8 show the ambiguity that can result: it is not possible to determine when the VOP time increment should be reset. Given the coded sequence of events shown in FIG. 5, there is no way to tell which of the timing arrangements in FIGS. 6, 7, and 8 it is meant to represent. The problem arises because a single modulo time base is shared among all the different types of video object planes, in which different coding orders and presentation orders are mixed. Nothing can be done about the coding order, because the B-video object planes need the reference information. Nor is it desirable for each prediction type to have its own modulo time base.
[0019]
Next, the VOP time offset will be described.
[0020]
In addition to the above, the modulo time base is shared among all video object planes. This means that synchronization between different video object planes has a granularity equal to the modulo time base interval. That is unacceptable, particularly when video object planes from different groups are combined to form a new group of video object planes. FIG. 11 shows an example of two video object planes coded with two different local time bases that are offset from each other. When these video object planes are multiplexed, their synchronization is offset as well. Finer granularity is achieved by giving each individual video object plane a VOP time offset. Only this value then needs to change when video object planes are manipulated and multiplexed. The VOP time increments need not be changed, and different video object planes can be multiplexed without coarse-granularity timing differences. FIG. 11 illustrates the use of this time base offset.
[0021]
The preferred embodiment of the invention comprises a method of encoding the time base for each individual video object plane bitstream, a method of multiplexing different video object planes onto a common time base, a method of demultiplexing the multiplexed bitstream into its components, and a method of recovering the time base from a component bitstream.
[0022]
Next, encoding of the time base is described.
[0023]
FIG. 12 shows a flowchart of an embodiment of time base encoding. In the encoder, the local time base is first initialized to the local start time in step 1. Processing moves to step 2, where the encoder determines the current value of the local time base. In step 3 the local time base obtained is compared with the previously encoded modulo time base to check whether the difference exceeds the modulo time base interval. If it does, control passes to step 4, where the required number of modulo time bases is inserted into the bitstream. If it does not, no special handling is needed. Processing then proceeds to step 5, where the VOP time increment is inserted into the bitstream. In step 6 the object plane is encoded and inserted into the bitstream. In step 7 the encoder checks whether there are more object planes to encode. If there are, processing returns to step 2 to obtain the local time base; if not, the process ends.
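The flowchart steps can be sketched as a single loop for the simple case of I- and P-video object planes presented in increasing time order. The function name, the millisecond unit, and the 1000 ms interval are assumptions for illustration, and the object plane payload itself (step 6) is omitted.

```python
# Sketch of the FIG. 12 encoding loop for I/P planes only (names and units are assumptions).

def encode_time_base(times_ms, start_ms=0, interval_ms=1000):
    """Return a (modulo_time_base_bits, vop_time_increment) pair for each time."""
    base = start_ms                               # step 1: initialize the local time base
    out = []
    for t in times_ms:                            # step 2: current local time base
        if t < base:
            raise ValueError("times must not fall before the current modulo time base")
        elapsed = (t - base) // interval_ms       # step 3: whole intervals crossed
        marker = "1" * elapsed + "0"              # step 4: insert modulo time bases
        base += elapsed * interval_ms
        out.append((marker, t - base))            # step 5: insert the VOP time increment
    return out                                    # step 7: loop ends when no planes remain

coded = encode_time_base([350, 750, 1150, 2550])
```

Each increment stays below the interval because the base is advanced before the increment is computed.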
[0024]
In order to determine the absolute and relative VOP time increments for each of the I / P-Video object picture and the B-Video object picture, the following equations are used:
$t_{GTBn} = n \times t_{GTBI} + t_{GTB0} \quad (n = 0, 1, 2, 3, \ldots)$ (1)
$t_{AVTI} = t_{ETBI/P} - t_{GTBn}$ (2)
$t_{RVTI} = t_{ETBB} - t_{ETBI/P}$ (3)
Here, $t_{GTBn}$ is the encoder time base marked by the n-th encoded modulo time base.
[0025]
$t_{GTBI}$ is the predetermined modulo time base interval.
[0026]
$t_{GTB0}$ is the start time of the encoder time base.
[0027]
$t_{AVTI}$ is the absolute VOP time increment for an I- or P-video object plane.
[0028]
$t_{ETBI/P}$ is the encoder time base at the start of encoding of the I- or P-video object plane.
[0029]
$t_{RVTI}$ is the relative VOP time increment for a B-video object plane.
[0030]
$t_{ETBB}$ is the encoder time base at the start of encoding of the B-video object plane.
Next, multiplexing of multiple video object planes is described.
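As a worked illustration of equations (1) to (3), assume (hypothetically) an interval of 1000 ms, a start time of 0, an I- or P-video object plane encoded at 2350 ms, and a B-video object plane at 1950 ms. None of these values come from the patent text.

```latex
% Assumed values: t_{GTBI} = 1000 ms, t_{GTB0} = 0 ms, n = 2,
% t_{ETBI/P} = 2350 ms, t_{ETBB} = 1950 ms.
\begin{align*}
t_{GTB2} &= 2 \times 1000 + 0 = 2000 \text{ ms} \\
t_{AVTI} &= t_{ETBI/P} - t_{GTB2} = 2350 - 2000 = 350 \text{ ms} \\
t_{RVTI} &= t_{ETBB} - t_{ETBI/P} = 1950 - 2350 = -400 \text{ ms}
\end{align*}
```

Under these assumptions the relative increment is negative because the B-video object plane precedes its reference plane in time.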
[0031]
When a plurality of video object pictures are multiplexed into one bitstream, the multiplexer examines the bitstreams of the video object pictures being multiplexed to determine both the synchronization and the multiplexing order. The operations involved are shown in FIG. 13. In step 11, the VOP time offset for each video object to be multiplexed is inserted into the bitstream. Next, in step 12, the bitstreams of all video objects to be multiplexed are examined to determine whether every video object is at its modulo time base. If so, processing proceeds to step 13, where a common modulo time base is inserted into the multiplexed bitstream. If not, processing proceeds to step 14, where the next encoded video object picture is inserted into the multiplexed bitstream. In step 15, the bitstreams of the video objects are checked again for further video object pictures to be multiplexed. If there are any, control returns to step 12. If not, the process ends.
[0032]
Next, demultiplexing of a bit stream including a plurality of video target images will be described.
[0033]
Demultiplexing of a bitstream containing multiple video object pictures is shown in FIG. 14. The process begins at step 21, where the VOP time offset is decoded and sent to the decoder for use in synchronization. Then, in step 22, the multiplexed bitstream is examined to determine whether a modulo time base has been detected. If so, processing proceeds to step 23, where the modulo time base is inserted into the bitstreams of all video objects. If not, processing proceeds to step 24, where the next video object picture is examined and inserted into the bitstream of the appropriate video object. Finally, the multiplexed bitstream is examined again to determine whether there are further video object pictures to demultiplex. If so, the process returns to step 22. If not, the process ends.
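The multiplexing and demultiplexing loops described above can be illustrated with a small round trip. This is a sketch under several assumptions that are not in the source: each per-object bitstream is modeled as a Python list of tokens, the token `MTB` stands for one modulo time base, the VOP time offset handling of steps 11 and 21 is left out, and in step 14 the next picture is taken from the first stream that is not waiting at a modulo time base. The function names are hypothetical.

```python
from collections import deque

MTB = "MTB"  # hypothetical token standing for one modulo time base


def multiplex(streams):
    """streams: dict mapping object id -> list of tokens (MTB or a VOP payload)."""
    queues = {oid: deque(tokens) for oid, tokens in streams.items()}
    muxed = []
    while any(queues.values()):                      # step 15: anything left?
        active = [q for q in queues.values() if q]
        if all(q[0] == MTB for q in active):         # step 12: all at a modulo time base
            for q in active:
                q.popleft()
            muxed.append((MTB, None))                # step 13: one common modulo time base
        else:
            for oid, q in queues.items():            # step 14: next coded picture
                if q and q[0] != MTB:
                    muxed.append((oid, q.popleft()))
                    break
    return muxed


def demultiplex(muxed, object_ids):
    """Rebuild the per-object token lists from the multiplexed list."""
    out = {oid: [] for oid in object_ids}
    for oid, payload in muxed:
        if oid == MTB:                               # steps 22-23: copy to every object
            for tokens in out.values():
                tokens.append(MTB)
        else:                                        # step 24: route to its own object
            out[oid].append(payload)
    return out


streams = {"A": ["a1", MTB, "a2"], "B": ["b1", "b2", MTB, "b3"]}
muxed = multiplex(streams)
print(muxed)
print(demultiplex(muxed, ["A", "B"]) == streams)
```

Because the common modulo time base is emitted only when every object has reached its own, the objects stay aligned second by second even though they carry different numbers of pictures.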
[0034]
Next, time-based reproduction will be described.
[0035]
An embodiment for reproducing the time base is shown in FIG. 15. To reproduce the local time base, the process begins at step 31, where the local time base is initialized taking into account the VOP time offset decoded by the demultiplexer. The process then proceeds to step 32, where the bitstream is examined to determine whether a modulo time base has been decoded. If so, processing proceeds to step 33, where the local time base is incremented by the modulo time base increment, and then to step 37. If not, processing proceeds to step 34, where the video object picture is examined to determine whether it is a B-video object picture. If it is, the process proceeds to step 35, where the decoding time base for the B-video object picture is calculated using equation (6), and then to step 37. If it is not, processing proceeds to step 36, where the decoding time base is calculated using equation (5), and then to step 37. In step 37, the bitstream is examined to determine whether there are further video object pictures to be decoded. If so, the process returns to step 32. If not, the process ends.
[0036]
To determine the presentation time stamp of the video object picture, the following equations are used:
$t_{GTBn} = n \times t_{GTBI} + t_{GTB0} \quad (n = 0, 1, 2, 3, \ldots)$ (4)
$t_{DTBI/P} = t_{AVTI} + t_{GTBn}$ (5)
$t_{DTBB} = t_{RVTI} + t_{DTBI/P}$ (6)
where $t_{GTBn}$ is the decoding time base represented by the n-th decoded modulo time base.
[0037]
$t_{GTBI}$ is a predetermined modulo time base interval.
[0038]
$t_{GTB0}$ is the start time of the decoding time reference.
[0039]
$t_{DTBI/P}$ is the decoding time reference at the start of decoding of the I or P-video object picture.
[0040]
$t_{AVTI}$ is the decoded absolute VOP time increment for the I or P-video object picture.
[0041]
$t_{DTBB}$ is the decoding time reference at the start of decoding of the B-video object picture.
[0042]
$t_{RVTI}$ is the decoded relative VOP time increment for the B-video object picture.
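Equations (4) to (6) undo equations (1) to (3), which a short round trip makes concrete. The sketch below is illustrative only: it assumes times in integer milliseconds and a modulo time base interval of 1000 ms, and the function and variable names are hypothetical.

```python
INTERVAL_MS = 1000  # t_GTBI: assumed modulo time base interval


def encode_increments(t_etb_ip, t_etb_b, n, t_gtb0=0):
    """Encoder side: equations (1) to (3)."""
    t_gtbn = n * INTERVAL_MS + t_gtb0   # (1) time base at the n-th modulo time base
    t_avti = t_etb_ip - t_gtbn          # (2) absolute increment for the I/P picture
    t_rvti = t_etb_b - t_etb_ip         # (3) relative increment for the B picture
    return t_avti, t_rvti


def decode_time_bases(t_avti, t_rvti, n, t_gtb0=0):
    """Decoder side: equations (4) to (6)."""
    t_gtbn = n * INTERVAL_MS + t_gtb0   # (4) same time base, rebuilt by the decoder
    t_dtb_ip = t_avti + t_gtbn          # (5) time of the I/P picture
    t_dtb_b = t_rvti + t_dtb_ip         # (6) time of the B picture
    return t_dtb_ip, t_dtb_b


# An I/P picture at 2400 ms and a B picture at 2700 ms, with n = 2 modulo time bases.
avti, rvti = encode_increments(2400, 2700, n=2)
print(avti, rvti)
print(decode_time_bases(avti, rvti, n=2))
```

The round trip recovers the original times only if encoder and decoder agree on n, the interval, and the start time, which is why the modulo time bases themselves have to be carried in the bitstream.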
[0043]
Next, an embodiment of the bit stream encoder will be described.
[0044]
FIG. 16 is a block diagram illustrating an embodiment of a bitstream encoder for encoding the modulo time base and the VOP time increment. For this description, the example shown in FIG. 4 is used. Since bidirectional prediction is used, the encoding order differs from the presentation order shown in FIG. 4. The encoding order starts with the I-VOP, followed by the P-VOP, and only then the B-VOP. This is explained in the following three paragraphs.
[0045]
Processing begins at step 41, the initializer, where the bitstream encoder initializes the local time base register to the initial value of the time code. The same time code value is encoded in the bitstream. At the start of the encoding of the next I-VOP, step 42, the time code comparator, compares the presentation time of the I-VOP with the local time base register. The result is sent to step 43, the modulo time base encoder. The modulo time base encoder inserts into the bitstream as many “1”s as the number of modulo time base increments that have elapsed, followed by the symbol “0” to indicate the end of the modulo time base code. The local time base register is updated to the current modulo time base. The process then proceeds to step 44, the VOP time base increment encoder, where the remaining portion of the I-VOP presentation time code is encoded.
[0046]
This process is repeated for the next encoded video object picture, which is a P-VOP. Step 42, the time code comparator, compares the presentation time of the P-VOP with the local time base register. The result is sent to step 43, the modulo time base encoder. The modulo time base encoder inserts as many “1”s as the number of modulo time base increments that have elapsed, followed by the symbol “0” to indicate the end of the modulo time base code. The B-VOP time base register is set to the value of the local time base register, and the local time base register is then updated to the current modulo time base. The process then proceeds to step 44, the VOP time base increment encoder, where the remaining portion of the P-VOP presentation time code is encoded.
[0047]
This process is then repeated for the next video object picture to be encoded, which is a B-VOP. Step 42, the time code comparator, compares the presentation time of the B-VOP with the B-VOP time base register. The result is sent to step 43, the modulo time base encoder. The modulo time base encoder inserts as many “1”s as the number of modulo time base increments that have elapsed, followed by the symbol “0” to indicate the end of the modulo time base code. Neither the B-VOP time base register nor the local time base register is changed after processing the B-VOP. The process then proceeds to step 44, the VOP time base increment encoder, where the remaining portion of the B-VOP presentation time code is encoded.
[0048]
The local time base register is reset at the next I-VOP that represents the beginning of the next VOP group.
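The register handling of steps 41 to 44 can be sketched as below. This is a hedged illustration, not the encoder of FIG. 16: it assumes a modulo time base increment of one second, represents each VOP as a hypothetical `(kind, seconds, milliseconds)` tuple in coding order, and starts the B-VOP time base register at the time code, which the text does not specify. The input values are the presentation times of the worked example of FIG. 18, with 1:23:45 written as 5025 seconds.

```python
def encode_vops(vops, time_code_s):
    """vops: (kind, seconds, milliseconds) tuples in coding order, kind in 'I', 'P', 'B'.

    Returns one (modulo_bits, vop_time_increment) pair per VOP.
    """
    local_reg = time_code_s      # step 41: local time base register
    b_reg = time_code_s          # assumption: B-VOP register starts at the time code
    out = []
    for kind, sec, ms in vops:
        ref = b_reg if kind == "B" else local_reg   # step 42: time code comparator
        ticks = sec - ref                           # elapsed modulo time base increments
        bits = "1" * ticks + "0"                    # step 43: "1"s terminated by "0"
        if kind == "P":
            b_reg = local_reg                       # B register takes the old local value
        if kind != "B":
            local_reg = sec                         # I/P update the local register
        out.append((bits, ms))                      # step 44: remaining part of the time
    return out


coding_order = [
    ("I", 5025, 350),   # I1 at 1:23:45:350
    ("P", 5026, 550),   # P1 at 1:23:46:550
    ("B", 5025, 750),   # B1 at 1:23:45:750
    ("B", 5026, 150),   # B2 at 1:23:46:150
    ("P", 5027, 350),   # P2 at 1:23:47:350
]
print(encode_vops(coding_order, time_code_s=5025))
```

Timing B-VOPs against a separate register is what lets a B-VOP that is displayed before the preceding P-VOP in coding order still be expressed with a non-negative count of “1”s.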
[0049]
Next, an embodiment of the bit stream decoder will be described.
[0050]
FIG. 17 is a block diagram illustrating an embodiment of a decoder that uses the modulo time base and the VOP time increment to reproduce the presentation time stamp. As in the encoder embodiment, the example shown in FIG. 4 is used. The decoding order is the same as the encoding order: the I-VOP and the subsequent P-VOP are decoded before the B-VOP. This is explained in the following paragraphs.
[0051]
Processing begins at step 51, the initializer, where the local time base register is set to the value of the time code decoded from the bitstream. The process then proceeds to step 52, the modulo time base decoder, where the modulo time base increment is decoded. The total number of modulo time base increments is given by the number of “1”s decoded before the symbol “0”. The VOP time base increment is then decoded in step 53, the VOP time base increment decoder. In step 54, the time base calculator, the presentation time of the I-VOP is reproduced. The sum of the decoded modulo time base increments is added to the local time base register. The VOP time base increment is then added to the local time base register to obtain the presentation time of the I-VOP. The process then proceeds to step 55, the video object picture decoder, where the video object picture is decoded.
[0052]
For the P-VOP, the process is repeated from step 52, the modulo time base decoder, where the modulo time base increment is decoded. The total number of modulo time base increments is given by the number of “1”s decoded before the symbol “0”. The VOP time base increment is then decoded in step 53, the VOP time base increment decoder. In step 54, the time base calculator, the presentation time of the P-VOP is reproduced. The B-VOP time base register is set to the value of the local time base register. The sum of the decoded modulo time base increments is then added to the local time base register. Finally, the VOP time base increment is added to the local time base register to obtain the presentation time of the P-VOP. Processing proceeds to the video object picture decoder, where the video object picture is decoded.
[0053]
For B-VOPs, the process is repeated from step 52, the modulo time base decoder, where the modulo time base increment is decoded. The total number of modulo time base increments is given by the number of “1”s decoded before the symbol “0”. The VOP time base increment is then decoded in step 53, the VOP time base increment decoder. In step 54, the time base calculator, the presentation time of the B-VOP is reproduced. The sum of the decoded modulo time base increments and the VOP time base increment is added to the B-VOP time base register to obtain the presentation time of the B-VOP. Both the B-VOP time base register and the local time base register remain unchanged. Processing then proceeds to the video object picture decoder, where the video object picture is decoded.
[0054]
The local time base register is reset at the next I-VOP that represents the beginning of the next VOP group.
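The decoder-side register handling of steps 51 to 54 can be sketched in the same way. As before, this is an illustration under assumptions rather than the decoder of FIG. 17: the modulo time base increment is taken to be one second, each VOP's header is modeled as a hypothetical `(kind, modulo_bits, increment_ms)` tuple, and the B-VOP time base register is assumed to start at the time code. The input fields are the bitstream values of the worked example of FIG. 18.

```python
def decode_vops(fields, time_code_s):
    """fields: (kind, modulo_bits, increment_ms) tuples in decoding order.

    Returns presentation times as (seconds, milliseconds) pairs.
    """
    local_reg = time_code_s   # step 51: time code decoded from the bitstream
    b_reg = time_code_s       # assumption: B-VOP register starts at the time code
    times = []
    for kind, bits, ms in fields:
        ticks = bits.index("0")            # step 52: count of "1"s before the "0"
        if kind == "B":
            # step 54 for B-VOPs: add to the B register, change no register
            times.append((b_reg + ticks, ms))
        else:
            if kind == "P":
                b_reg = local_reg          # keep the value from before the update
            local_reg += ticks             # step 54 for I/P: advance the local register
            times.append((local_reg, ms))  # step 53 value supplies the sub-second part
    return times


fields = [
    ("I", "0", 350),
    ("P", "10", 550),
    ("B", "0", 750),
    ("B", "10", 150),
    ("P", "10", 350),
]
print(decode_vops(fields, time_code_s=5025))
```

With 1:23:45 written as 5025 seconds, the recovered times correspond to 1:23:45:350, 1:23:46:550, 1:23:45:750, 1:23:46:150, and 1:23:47:350.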
[0055]
Next, a specific example will be described.
[0056]
Referring to FIG. 18, an example of the steps for encoding compressed data into bitstream data is shown. As shown in the top row of FIG. 18, the compressed video data VOPs are arranged in display order as I1, B1, B2, P1, B3, and P2, and a GOP (group of pictures) header is inserted at the start of the VOP group. The local time at which each VOP is displayed is determined using the local time clock. For example, the first VOP (I1-VOP) is displayed at 1 hour, 23 minutes, 45 seconds, and 350 milliseconds (1:23:45:350) counted from the very start of the video data, the second VOP (B1-VOP) is displayed at 1:23:45:750, the third VOP (B2-VOP) is displayed at 1:23:46:150, and so on.
[0057]
In order to encode a VOP, it is necessary to insert display time data into each VOP. If time data is inserted in a complete form including hours, minutes, seconds, and milliseconds, a considerable data area is required in the header portion of each VOP. The object of the present invention is to reduce such data areas and to simplify the time data to be inserted into each VOP.
[0058]
Each of the VOPs shown in the top row of FIG. 18 stores the millisecond part of its display time in the VOP time increment area. Each VOP in the top row also temporarily stores the hour, minute, and second part of its display time. The GOP header stores the hours, minutes, and seconds of the display time of the first VOP (I1-VOP).
[0059]
As shown in the second row of FIG. 18, the VOPs are delayed by a predetermined time using a buffer (not shown). Under the bidirectional prediction scheme, the order of the VOPs changes when they are output from the buffer, because a bidirectional VOP, that is, a B-VOP, must be placed after the P-VOP that it references. The VOPs are therefore arranged in the order I1, P1, B1, B2, P2, and B3.
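The reordering from display order to coding order can be illustrated with a few lines of code. This is a minimal sketch, assuming that VOPs are identified by names whose first letter gives the type and that any B-VOPs left without a later I-VOP or P-VOP are simply emitted at the end; neither assumption comes from the source, and the function name is hypothetical.

```python
def display_to_coding_order(vops):
    """vops: VOP names in display order, e.g. 'I1', 'B1', 'P1'."""
    coded, pending_b = [], []
    for vop in vops:
        if vop.startswith("B"):
            pending_b.append(vop)      # hold B-VOPs until their later reference arrives
        else:
            coded.append(vop)          # I-VOP or P-VOP goes out first
            coded.extend(pending_b)    # then the B-VOPs that were waiting for it
            pending_b.clear()
    coded.extend(pending_b)            # assumption: trailing B-VOPs are flushed as is
    return coded


print(display_to_coding_order(["I1", "B1", "B2", "P1", "B3", "P2"]))
```

For the display order of FIG. 18 this yields I1, P1, B1, B2, P2, B3, the order given in the text.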
[0060]
As shown in the third row of FIG. 18, at time T1, that is, when the GOP header is encoded, the hour, minute, and second data stored in the GOP header are stored unchanged in the local time base register. In the example shown in FIG. 18, the local time base register stores 1:23:45. Before time T2, the bitstream data corresponding to the GOP header is obtained, with the hour, minute, and second data inserted as shown in the bottom row of FIG. 18.
[0061]
At time T2, the first VOP (I1-VOP) is taken in. The time code comparator compares the time (hour, minute, second) stored in the local time base register with the time (hour, minute, second) temporarily stored in the first VOP (I1-VOP). In this example, the two times are the same. The comparator therefore generates “0”, indicating that the first VOP (I1-VOP) occurs in the same second as that held in the local time base register. The “0” generated by the comparator is added as is to the modulo time base area of the first VOP (I1-VOP). At the same time, the hour, minute, and second data temporarily stored in the first VOP (I1-VOP) are removed. Thus, before time T3, the bitstream data corresponding to the first VOP (I1-VOP) is obtained, with “0” inserted in the modulo time base area and “350” inserted in the VOP time increment area.
[0062]
Next, at time T3, the second VOP (P1-VOP) is taken in. The time code comparator compares the time (hour, minute, second) stored in the local time base register with the time (hour, minute, second) temporarily stored in the second VOP (P1-VOP). In this example, the time temporarily stored in the second VOP (P1-VOP) is one second greater than the time stored in the local time base register. The comparator therefore generates “10”, indicating that the second VOP (P1-VOP) occurs in the second following the one held in the local time base register. If the second VOP (P1-VOP) had instead occurred two seconds after the second held in the local time base register, the comparator would generate “110”.
[0063]
After time T3, the B-VOP time base register is set to the time held in the local time base register immediately before time T3. In this example, 1:23:45 is set in the B-VOP time base register. Further, after time T3, the local time base register is incremented to the time temporarily stored in the second VOP (P1-VOP). Thus, in this example, the local time base register is incremented to 1:23:46.
[0064]
The “10” generated by the comparator is added as is to the modulo time base area of the second VOP (P1-VOP). At the same time, the hour, minute, and second data temporarily stored in the second VOP (P1-VOP) are removed. Thus, before time T4, the bitstream data corresponding to the second VOP (P1-VOP) is obtained, with “10” inserted in the modulo time base area and “550” inserted in the VOP time increment area.
[0065]
At time T4, the third VOP (B1-VOP) is taken in. The time code comparator compares the time (hour, minute, second) stored in the B-VOP time base register with the time (hour, minute, second) temporarily stored in the third VOP (B1-VOP). In this example, the two times are the same. The comparator therefore generates “0”, indicating that the third VOP (B1-VOP) occurs in the same second as that held in the B-VOP time base register. The “0” generated by the comparator is added as is to the modulo time base area of the third VOP (B1-VOP). At the same time, the hour, minute, and second data temporarily stored in the third VOP (B1-VOP) are removed. Thus, before time T5, the bitstream data corresponding to the third VOP (B1-VOP) is obtained, with “0” inserted in the modulo time base area and “750” inserted in the VOP time increment area.
[0066]
At time T5, the fourth VOP (B2-VOP) is taken in. The time code comparator compares the time (hour, minute, second) stored in the B-VOP time base register with the time (hour, minute, second) temporarily stored in the fourth VOP (B2-VOP). In this example, the time temporarily stored in the fourth VOP (B2-VOP) is one second greater than the time stored in the B-VOP time base register. The comparator therefore generates “10”, indicating that the fourth VOP (B2-VOP) occurs in the second following the one held in the B-VOP time base register.
[0067]
While processing a B-type VOP, no matter what result the comparator produces, neither the local time base register nor the B-VOP time base register is incremented.
[0068]
The “10” generated by the comparator is added as is to the modulo time base area of the fourth VOP (B2-VOP). At the same time, the hour, minute, and second data temporarily stored in the fourth VOP (B2-VOP) are removed. Thus, before time T6, the bitstream data corresponding to the fourth VOP (B2-VOP) is obtained, with “10” inserted in the modulo time base area and “150” inserted in the VOP time increment area.
[0069]
At time T6, the fifth VOP (P2-VOP) is taken in. The time code comparator compares the time (hour, minute, second) stored in the local time base register with the time (hour, minute, second) temporarily stored in the fifth VOP (P2-VOP). In this example, the time temporarily stored in the fifth VOP (P2-VOP) is one second greater than the time stored in the local time base register. The comparator therefore generates “10”, indicating that the fifth VOP (P2-VOP) occurs in the second following the one held in the local time base register.
[0070]
After time T6, the B-VOP time base register is incremented to a time equal to the time held in the local time base register immediately before time T6. In this example, the B-VOP time base register is incremented to 1:23:46. Further, after time T6, the local time base register is incremented to a time equal to the time temporarily stored in the fifth VOP (P2-VOP). Thus, in this example, the local time base register is incremented to 1:23:47.
[0071]
The “10” generated by the comparator is added as is to the modulo time base area of the fifth VOP (P2-VOP). At the same time, the hour, minute, and second data temporarily stored in the fifth VOP (P2-VOP) are removed. Thus, before time T7, the bitstream data corresponding to the fifth VOP (P2-VOP) is obtained, with “10” inserted in the modulo time base area and “350” inserted in the VOP time increment area.
[0072]
Thereafter, similar processing is executed, and bit stream data for subsequent VOPs is formed.
[0073]
In order to decode this bit stream data, a process opposite to the process described above is executed. First, the time (hour, minute, second) held in the GOP header is read. The read time is stored in the local time base register.
[0074]
When an I-type or P-type VOP, that is, a VOP other than a B-type VOP, is received, the data stored in the modulo time base area is read. If the read data is “0”, that is, if there is no “1” before the “0”, the local time base register is not changed, and neither is the B-VOP time base register. If the read data is “10”, the time stored in the local time base register is incremented by 1 second. If the read data is “110”, the time stored in the local time base register is incremented by 2 seconds. Thus, the number of seconds to be added is given by the number of “1”s inserted before the “0”. When the read data is “10” or “110”, the B-VOP time base register, which serves as a memory, copies the time held by the local time base register immediately before the increment. The time (hour, minute, second) held in the local time base register is then combined with the time (millisecond) held in the VOP time increment area to determine the time at which the I-type or P-type VOP is to occur.
[0075]
When a B-type VOP is received, the data stored in the modulo time base area is read. If the read data is “0”, the time (hour, minute, second) held in the B-VOP time base register is combined with the time (millisecond) held in the VOP time increment area to determine the time at which the B-type VOP is to occur. If the read data is “10”, 1 second is added to the time (hour, minute, second) held in the B-VOP time base register, and the resulting time is combined with the time (millisecond) held in the VOP time increment area to determine the time at which the B-type VOP is to occur. If the read data is “110”, 2 seconds are added to the time (hour, minute, second) held in the B-VOP time base register, and the resulting time is combined with the time (millisecond) held in the VOP time increment area to determine the time at which the B-type VOP is to occur.
[0076]
The effect of the invention is that video object pictures encoded by different encoders can be multiplexed. Furthermore, the present invention facilitates object-based manipulation of compressed data obtained from different sources to generate a new bitstream. The present invention also provides a method for synchronizing audiovisual objects.
[0077]
Although the present invention has been described above, it will be apparent that it can be modified in various ways. Such variations are not to be regarded as departing from the spirit and scope of the invention, and all such modifications as would be apparent to those having ordinary skill in the art are intended to be encompassed by the following claims.
[Brief description of the drawings]
FIG. 1 illustrates temporal sampling according to the prior art in which frames of a video sequence are sampled at regular intervals.
FIG. 2 is a diagram for explaining the concept of a video target image and the relationship between the images. Sampling of the video object picture may be irregular and the sampling period may change rapidly.
FIG. 3 is a diagram for explaining the present invention in which a reference time of a video object picture is represented by a modulo time base and a VOP time increment. In this description, only I-VOP and P-VOP are used.
FIG. 4 is a diagram for explaining the present invention in which a reference time of a video object picture is represented by a modulo time base and a VOP time increment. In this description, I-VOP, P-VOP, and B-VOP are used.
FIG. 5 is a diagram illustrating an example of ambiguity that may occur when the presentation order and the encoding order are different for a B-video target picture.
FIG. 6 is a diagram illustrating an example of ambiguity that may occur when the presentation order and the encoding order are different for a B-video target picture.
FIG. 7 is a diagram illustrating an example of ambiguity that may occur when the presentation order and the encoding order are different for a B-video target picture.
FIG. 8 is a diagram for explaining an example of ambiguity that may occur when the presentation order and the encoding order are different for a B-video target picture.
FIG. 9 illustrates resolving ambiguity by using an absolute time reference and a relative time reference.
FIG. 10 illustrates resolving ambiguity by using an absolute time reference and a relative time reference.
FIG. 11 is a diagram illustrating a combination of two VOPs and synchronizing them to a common time reference by using a VOP time offset.
FIG. 12 is a flowchart illustrating time-based encoding.
FIG. 13 is a flowchart illustrating multiplexing of a plurality of video target images.
FIG. 14 is a flowchart illustrating demultiplexing of a plurality of video target images.
FIG. 15 is a flowchart illustrating reproduction of the presentation time stamp.
FIG. 16 is a block diagram illustrating the operation of a bitstream encoder for encoding a time reference.
FIG. 17 is a block diagram illustrating the operation of a bitstream decoder for decoding a time reference.
FIG. 18 is a time chart illustrating the formation of bit stream data.

Claims

A VOP time decoding method for decoding the time of a B-VOP (a VOP encoded by bidirectional prediction) included in compressed data, wherein
the compressed data includes a modulo time base increment representing an increment in units of 1 second and a VOP time base increment representing an increment shorter than 1 second;
the time, in 1-second units, of the I-VOP (intra-coded VOP) or P-VOP (predictive-coded VOP) immediately preceding the B-VOP in display order is obtained from the compressed data;
the modulo time base increment of the B-VOP is decoded;
the VOP time base increment of the B-VOP is decoded; and
the result of adding the decoded modulo time base increment and the decoded VOP time base increment to the obtained time in 1-second units is taken as the time of the B-VOP.