JP3380980B2

JP3380980B2 - Image encoding method, image decoding method, and image decoding device

Info

Publication number: JP3380980B2
Application number: JP08075898A
Authority: JP
Inventors: 陽一矢ヶ崎; 輝彦鈴木
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1997-04-01
Filing date: 1998-03-27
Publication date: 2003-02-24
Anticipated expiration: 2018-03-27
Also published as: JPH10336669A

Description

Detailed Description of the Invention

[0001]

[Technical Field] The present invention relates to an image encoding method, an image decoding method, and an image decoding device. In particular, it relates to an image encoding method, an image decoding method, and an image decoding device suitable for use when, for example, moving image data is recorded on a recording medium such as a magneto-optical disk or a magnetic tape and then played back and shown on a display, or when moving image data is transmitted from a sending side to a receiving side over a transmission line, as in video conference systems, videophone systems, broadcasting equipment, and multimedia database retrieval systems, and the received moving image data is displayed, or edited and recorded, on the receiving side.

[0002]

[Prior Art] In systems that transmit moving image data to remote locations, such as video conference systems and videophone systems, the image data is compression-encoded by exploiting its line correlation and inter-frame correlation so that the transmission line is used efficiently.

[0003] A representative high-efficiency coding scheme for moving images is MPEG (Moving Picture Experts Group), a moving image coding scheme for storage media. It was discussed in ISO-IEC/JTC1/SC2/WG11 and proposed as a draft standard, and it adopts a hybrid scheme that combines motion-compensated predictive coding with DCT (Discrete Cosine Transform) coding.

[0004] To support a variety of applications and functions, MPEG defines several profiles and levels. The most basic is Main Profile at Main Level (MP@ML).

[0005] FIG. 38 shows an example configuration of an MP@ML encoder in the MPEG scheme.

[0006] Image data to be encoded is input to the frame memory 31 and stored temporarily. The motion vector detector 32 then reads the image data stored in the frame memory 31 in units of macroblocks, each consisting of, for example, 16 × 16 pixels, and detects their motion vectors.

[0007] Here, the motion vector detector 32 processes the image data of each frame as an I picture (intra-frame coding), a P picture (forward predictive coding), or a B picture (bidirectional predictive coding). Which picture type (I, P, or B) each sequentially input frame is processed as is, for example, determined in advance (for example, the frames are processed as I, B, P, B, P, ..., B, P).

[0008] That is, the motion vector detector 32 refers to a predetermined reference frame in the image data stored in the frame memory 31 and detects the motion vector of a macroblock by pattern matching (block matching) between that reference frame and a small block (macroblock) of 16 pixels × 16 lines of the frame currently being encoded.
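The block matching described above can be illustrated with a small sketch. The source does not specify the matching criterion or the search range, so the sum of absolute differences (SAD), the exhaustive search, the `search` radius, and the function names below are all assumptions made for illustration.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the block of `cur` whose top-left
    corner is (cx, cy) and the block of `ref` whose top-left corner is (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def block_match(cur, ref, cx, cy, size=16, search=7):
    """Exhaustive block matching (assumed criterion: minimum SAD).

    Returns ((dx, dy), cost), where (dx, dy) is the displacement from the
    current block to the best matching block in the reference frame.
    Candidates that fall outside the reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best
```

Frames are plain lists of rows here to keep the sketch dependency-free; a practical encoder would use a faster search than this exhaustive one.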

[0009] In MPEG there are four picture prediction modes: intra coding (intra-frame coding), forward predictive coding, backward predictive coding, and bidirectional predictive coding. An I picture is intra coded; a P picture is coded by either intra coding or forward predictive coding; and a B picture is coded by intra coding, forward predictive coding, backward predictive coding, or bidirectional predictive coding.

[0010] That is, for an I picture the motion vector detector 32 sets the intra coding mode as the prediction mode. In this case it does not detect a motion vector and outputs the prediction mode (intra prediction mode) to the VLC (variable length coding) unit 36 and the motion compensator 42.

[0011] For a P picture, the motion vector detector 32 performs forward prediction and detects the motion vector. It then compares the prediction error produced by the forward prediction with, for example, the variance of the macroblock being encoded (a macroblock of the P picture). If the variance of the macroblock is smaller than the prediction error, the motion vector detector 32 sets the intra coding mode as the prediction mode and outputs it to the VLC unit 36 and the motion compensator 42. If the prediction error produced by the forward prediction is smaller, it sets the forward predictive coding mode as the prediction mode and outputs it, together with the detected motion vector, to the VLC unit 36 and the motion compensator 42.

[0012] For a B picture, the motion vector detector 32 performs forward prediction, backward prediction, and bidirectional prediction and detects the motion vector for each. It then finds the smallest of the prediction errors for forward, backward, and bidirectional prediction (hereinafter called the minimum prediction error where appropriate) and compares that minimum prediction error with, for example, the variance of the macroblock being encoded (a macroblock of the B picture). If the variance of the macroblock is smaller than the minimum prediction error, the motion vector detector 32 sets the intra coding mode as the prediction mode and outputs it to the VLC unit 36 and the motion compensator 42. If the minimum prediction error is smaller, it sets the prediction mode that yielded the minimum prediction error as the prediction mode and outputs it, together with the corresponding motion vector, to the VLC unit 36 and the motion compensator 42.
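The P-picture and B-picture mode decisions described above follow the same pattern and can be sketched together. The source names variance only as an example measure and does not define how the prediction error is computed, so the mean squared error, the tie-breaking rule, and all function names below are assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def variance(block):
    """Variance of the pixel values of a macroblock given as a flat list."""
    m = mean(block)
    return mean([(p - m) ** 2 for p in block])


def prediction_error(block, predicted):
    """Assumed error measure: mean squared difference from the prediction."""
    return mean([(p - q) ** 2 for p, q in zip(block, predicted)])


def choose_mode(block, predictions):
    """Pick a prediction mode for one macroblock.

    `predictions` maps a mode name to its predicted block: only "forward"
    for a P picture; "forward", "backward" and "bidirectional" for a B picture.
    Intra coding is chosen when the block variance is smaller than the
    minimum prediction error; on a tie the inter mode is kept (an assumption,
    since the source does not cover the equal case).
    """
    errors = {mode: prediction_error(block, pred) for mode, pred in predictions.items()}
    best_mode = min(errors, key=errors.get)
    if variance(block) < errors[best_mode]:
        return "intra"
    return best_mode
```

For a P picture the dictionary holds a single candidate, so the minimum prediction error is simply the forward prediction error.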

[0013] When the motion compensator 42 receives both a prediction mode and a motion vector from the motion vector detector 32, it reads, in accordance with that prediction mode and motion vector, image data that has been encoded and already locally decoded and is stored in the frame memory 41, and supplies the read image data to the arithmetic units 33 and 40 as predicted image data.

[0014] The arithmetic unit 33 reads from the frame memory 31 the same macroblock as the image data that the motion vector detector 32 read from the frame memory 31, and computes the difference between that macroblock and the predicted image from the motion compensator 42. The difference values are supplied to the DCT unit 34.

[0015] On the other hand, when the motion compensator 42 receives only a prediction mode from the motion vector detector 32, that is, when the prediction mode is the intra coding mode, it outputs no predicted image. In this case the arithmetic unit 33 (and likewise the arithmetic unit 40) performs no particular processing and outputs the macroblock read from the frame memory 31 to the DCT unit 34 unchanged.

[0016] The DCT unit 34 applies DCT processing to the output data of the arithmetic unit 33 and supplies the resulting DCT coefficients to the quantizer 35. The quantizer 35 sets a quantization step (quantization scale) according to the amount of data accumulated in the buffer 37 (the amount of data stored in the buffer 37; buffer feedback) and quantizes the DCT coefficients from the DCT unit 34 with that step. The quantized DCT coefficients (hereinafter called quantized coefficients where appropriate) are supplied to the VLC unit 36 together with the quantization step that was set.

[0017] The VLC unit 36 converts the quantized coefficients supplied from the quantizer 35 into variable length codes such as Huffman codes and outputs them to the buffer 37. The VLC unit 36 also variable-length codes the quantization step from the quantizer 35, the prediction mode from the motion vector detector 32 (the mode indicating which of intra coding (intra-picture predictive coding), forward predictive coding, backward predictive coding, or bidirectional predictive coding was set), and the motion vector, and outputs the resulting encoded data to the buffer 37.

[0018] The buffer 37 temporarily accumulates the encoded data from the VLC unit 36 to smooth the amount of data, and outputs it as an encoded bitstream to, for example, a transmission line, or records it on a recording medium.

[0019] The buffer 37 also reports its accumulated data amount to the quantizer 35, and the quantizer 35 sets the quantization step according to that amount. Specifically, when the buffer 37 is about to overflow, the quantizer 35 increases the quantization step, thereby reducing the amount of quantized coefficient data. When the buffer 37 is about to underflow, the quantizer 35 decreases the quantization step, thereby increasing the amount of quantized coefficient data. In this way, overflow and underflow of the buffer 37 are prevented.
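The buffer feedback described above can be sketched as a simple threshold rule. The source only states the direction of the adjustment, so the thresholds `high` and `low`, the unit step change, and the step range of 1 to 31 are assumptions chosen for illustration, as is the function name.

```python
def next_quant_step(step, fullness, capacity,
                    high=0.8, low=0.2, min_step=1, max_step=31):
    """Return the quantization step to use next, given buffer occupancy.

    A fuller buffer (risk of overflow) raises the step so that fewer bits
    are produced; an emptier buffer (risk of underflow) lowers it.
    All thresholds and limits are hypothetical parameters.
    """
    ratio = fullness / capacity
    if ratio > high:
        step += 1
    elif ratio < low:
        step -= 1
    # Clamp to the allowed range of quantization steps.
    return max(min_step, min(max_step, step))
```

A call such as `next_quant_step(10, 90, 100)` raises the step because the buffer is 90% full, while `next_quant_step(10, 50, 100)` leaves it unchanged.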

[0020] The quantized coefficients and quantization step output by the quantizer 35 are supplied not only to the VLC unit 36 but also to the inverse quantizer 38. The inverse quantizer 38 inversely quantizes the quantized coefficients from the quantizer 35 in accordance with the quantization step, also from the quantizer 35, thereby converting them into DCT coefficients. These DCT coefficients are supplied to the IDCT unit (inverse DCT unit) 39, which applies inverse DCT processing to them and supplies the resulting data to the arithmetic unit 40.

[0021] In addition to the output data of the IDCT unit 39, the arithmetic unit 40 is supplied, as described above, with the same data from the motion compensator 42 as the predicted image supplied to the arithmetic unit 33. The arithmetic unit 40 locally decodes the original image data by adding the output data of the IDCT unit 39 (the prediction residual, i.e., difference data) to the predicted image data from the motion compensator 42, and outputs this locally decoded image data. (When the prediction mode is intra coding, however, the output data of the IDCT unit 39 passes through the arithmetic unit 40 unchanged and is supplied to the frame memory 41 as locally decoded image data.) This decoded image data is identical to the decoded image data obtained on the receiving side.
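The local decoding loop can be sketched as follows to show why the encoder's reference data matches what the receiver reconstructs. This is a simplified illustration, not the circuit of FIG. 38: the DCT and inverse DCT stages are omitted, quantization is a plain uniform quantizer, and the function names are hypothetical.

```python
def quantize(values, step):
    """Uniform quantization of residual values (DCT omitted in this sketch)."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Inverse quantization: scale the levels back by the quantization step."""
    return [level * step for level in levels]


def reconstruct(levels, step, predicted=None):
    """Rebuild pixel data from quantized levels.

    With `predicted` given (inter coding), the dequantized residual is added
    to the prediction. With `predicted` omitted (intra coding), the
    dequantized data passes through unchanged.
    """
    residual = dequantize(levels, step)
    if predicted is None:
        return residual
    return [r + p for r, p in zip(residual, predicted)]
```

Because the encoder's local decoder and the receiver both start from the same quantized levels, quantization step, and prediction, calling `reconstruct` with the same arguments on either side gives the same result, even when quantization has discarded information.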

[0022] The decoded image data (locally decoded image data) obtained in the arithmetic unit 40 is supplied to and stored in the frame memory 41, and is subsequently used as reference image data (a reference frame) for images that are inter coded (forward, backward, or bidirectional predictive coding).

[0023] Next, FIG. 39 shows an example configuration of an MP@ML decoder in MPEG that decodes the encoded data output from the encoder of FIG. 38.

[0024] An encoded bitstream (encoded data) transmitted over a transmission line is received by a receiving device (not shown), or an encoded bitstream (encoded data) recorded on a recording medium is played back by a playback device (not shown), and is supplied to and stored in the buffer 101.

[0025] The IVLC unit (inverse VLC unit, i.e., variable length decoder) 102 reads the encoded data stored in the buffer 101 and variable-length decodes it, thereby separating the encoded data, macroblock by macroblock, into motion vectors, prediction modes, quantization steps, and quantized coefficients. Of these, the motion vector and prediction mode are supplied to the motion compensator 107, and the quantization step and the macroblock's quantized coefficients are supplied to the inverse quantizer 103.

[0026] The inverse quantizer 103 inversely quantizes the macroblock's quantized coefficients supplied from the IVLC unit 102 in accordance with the quantization step, also supplied from the IVLC unit 102, and outputs the resulting DCT coefficients to the IDCT unit 104. The IDCT unit 104 applies an inverse DCT to the macroblock's DCT coefficients from the inverse quantizer 103 and supplies the result to the arithmetic unit 105.

[0027] In addition to the output data of the IDCT unit 104, the arithmetic unit 105 is supplied with the output data of the motion compensator 107. That is, as with the motion compensator 42 of FIG. 38, the motion compensator 107 reads already decoded image data stored in the frame memory 106 in accordance with the motion vector and prediction mode from the IVLC unit 102 and supplies it to the arithmetic unit 105 as predicted image data. The arithmetic unit 105 decodes the original image data by adding the output data of the IDCT unit 104 (the prediction residual, i.e., difference values) to the predicted image data from the motion compensator 107. This decoded image data is supplied to and stored in the frame memory 106. When the output data of the IDCT unit 104 is intra coded, it passes through the arithmetic unit 105 unchanged and is supplied to and stored in the frame memory 106 as decoded image data.

[0028] The decoded image data stored in the frame memory 106 is used as reference image data for image data decoded afterward. The decoded image data is also supplied, as the output reproduced image, to, for example, a display (not shown) and displayed.

[0029] In MPEG1 and MPEG2, B pictures are not used as reference image data and are therefore not stored in the frame memory 41 (FIG. 38) of the encoder or the frame memory 106 (FIG. 39) of the decoder.

[0030]

[Problems to be Solved by the Invention] The encoder and decoder shown in FIGS. 38 and 39 above conform to the MPEG1/2 standards. At present, however, a scheme that encodes in units of VOs (Video Objects), each being a sequence of an object such as a physical body making up an image, is being standardized as MPEG (Moving Picture Experts Group) 4 in ISO-IEC/JTC1/SC29/WG11.

[0031] Because MPEG4 standardization has proceeded mainly with use in the communications field in mind, the GOP (Group Of Picture) defined in MPEG1/2 is not defined in MPEG4. Consequently, efficient random access is expected to be difficult when MPEG4 is used with storage media.

[0032] The present invention has been made in view of this situation and is intended to enable efficient random access.

[0033]

[Means for Solving the Problems] The image encoding method of the present invention comprises: a first adding step of grouping a plurality of VOPs and adding, to each group, absolute time information representing the absolute time at which encoding of the VOPs of that group started; a second-precision time information generating step of generating second-precision time information that represents relative time within the group with a precision of one second; a detailed time information generating step of generating detailed time information that represents, with a precision finer than one second, the time from the second-precision time information immediately preceding the display time of each I-VOP, P-VOP, or B-VOP to that display time; and a second adding step of adding the second-precision time information and the detailed time information to the corresponding I-VOP, P-VOP, or B-VOP as information representing its display time. In the second-precision time information generating step, the second-precision time information generated for a given VOP represents, with a precision of one second, either the time from the absolute time information to the display time of that VOP or the time from the display time of the I-VOP or P-VOP displayed immediately before that VOP to the display time of that VOP. The time obtained by adding the second-precision time information and the detailed time information attached to each I-VOP, P-VOP, or B-VOP to the absolute time information is taken as the display time of that I-VOP, P-VOP, or B-VOP.

[0034] The image decoding method of the present invention comprises: a display time calculating step of obtaining the display time of each I-VOP, P-VOP, or B-VOP by adding the second-precision time information and the detailed time information attached to that VOP to the absolute time information; and a decoding step of decoding the I-VOP, P-VOP, or B-VOP in accordance with the corresponding display time. The second-precision time information used for a given VOP represents, with a precision of one second, either the time from the absolute time information to the display time of that VOP or the time from the display time of the I-VOP or P-VOP displayed immediately before that VOP to the display time of that VOP.

[0035] The image decoding device of the present invention comprises: display time calculating means for obtaining the display time of each I-VOP, P-VOP, or B-VOP by adding the second-precision time information and the detailed time information attached to that VOP to the absolute time information; and decoding means for decoding the I-VOP, P-VOP, or B-VOP in accordance with the corresponding display time. The second-precision time information used for a given VOP represents, with a precision of one second, either the time from the absolute time information to the display time of that VOP or the time from the display time of the I-VOP or P-VOP displayed immediately before that VOP to the display time of that VOP.

[0036]

[0037]

[0038]

[0039]

[0040]

[0041]

[0042]

[0043] In the image encoding method of the present invention, a plurality of VOPs are grouped, and absolute time information representing the absolute time at which encoding of the VOPs of each group started is added to each group. Second-precision time information representing relative time within the group with a precision of one second is generated, and detailed time information is generated that represents, with a precision finer than one second, the time from the second-precision time information immediately preceding the display time of each I-VOP, P-VOP, or B-VOP to that display time. The second-precision time information and the detailed time information are then added to the corresponding I-VOP, P-VOP, or B-VOP as information representing its display time. Here, the second-precision time information generated for a given VOP represents, with a precision of one second, either the time from the absolute time information to the display time of that VOP or the time from the display time of the I-VOP or P-VOP displayed immediately before that VOP to the display time of that VOP. The time obtained by adding the second-precision time information and the detailed time information attached to each I-VOP, P-VOP, or B-VOP to the absolute time information is taken as the display time of that I-VOP, P-VOP, or B-VOP.

[0044] In the image decoding method and the image decoding device of the present invention, the display time of each I-VOP, P-VOP, or B-VOP is obtained by adding the second-precision time information and the detailed time information attached to that VOP to the absolute time information, and the I-VOP, P-VOP, or B-VOP is decoded in accordance with the corresponding display time. Here, the second-precision time information used for a given VOP represents, with a precision of one second, either the time from the absolute time information to the display time of that VOP or the time from the display time of the I-VOP or P-VOP displayed immediately before that VOP to the display time of that VOP.
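The display time arithmetic described above can be sketched for the first of the two variants, in which the second-precision time information counts whole seconds from the group's absolute time. The millisecond unit for the detailed time information, the representation of the absolute time as whole seconds, and the function and parameter names are assumptions for illustration. The variant relative to the preceding I-VOP or P-VOP is not covered here.

```python
def encode_display_time(display_ms, group_start_s):
    """Encoder side: split a display time into the two per-VOP fields.

    display_ms    -- display time of the VOP in milliseconds
    group_start_s -- absolute time attached to the group, in whole seconds
    Returns (seconds, fine_ms): the second-precision time information and
    the detailed time information (assumed here to be in milliseconds).
    """
    offset_ms = display_ms - group_start_s * 1000
    if offset_ms < 0:
        raise ValueError("display time precedes the group's absolute time")
    return divmod(offset_ms, 1000)


def decode_display_time(group_start_s, seconds, fine_ms):
    """Decoder side: absolute time + second-precision part + detailed part."""
    return (group_start_s + seconds) * 1000 + fine_ms
```

For example, with a group whose absolute time is 3600 s, a VOP displayed at 3,602,350 ms is described by 2 whole seconds and 350 ms, and the decoder recovers the same display time by adding the three values.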

[0045]

[0046]

[0047]

[0048]

[0049]

[0050]

[0051]

[0052]

[0053]

DETAILED DESCRIPTION OF THE INVENTION

[0054]

[0055]

[0056]

[0057]

[0058]

[0059]

[0060]

[0061] FIG. 1 shows an example configuration of an embodiment of an encoder to which the present invention is applied.

[0062] Image (moving image) data to be encoded is input to the VO (Video Object) constructing unit 1. For each object making up the input image, the VO constructing unit 1 constructs a VO, which is the sequence of that object, and outputs it to the VOP constructing units 2_1 through 2_N. That is, when the VO constructing unit 1 constructs N VOs, VO#1 through VO#N, these are output to the VOP constructing units 2_1 through 2_N, respectively.

[0063] Specifically, when the image data to be encoded consists of, for example, an independent sequence of a background F1 and a sequence of a foreground F2, the VO constructing unit 1 outputs, for example, the foreground F2 sequence to the VOP constructing unit 2_1 as VO#1 and the background F1 sequence to the VOP constructing unit 2_2 as VO#2.

[0064] When the image data to be encoded is, for example, a background F1 and a foreground F2 that have already been composited, the VO constructing unit 1 segments the image into regions according to a predetermined algorithm to extract the background F1 and the foreground F2, and outputs the VO for each sequence to the corresponding VOP constructing unit 2_n (where n = 1, 2, ..., N).

[0065] The VOP constructing unit 2_n constructs VOPs (VO Planes) from the output of the VO constructing unit 1. For example, it extracts the object from each frame and takes, for example, the smallest rectangle enclosing that object (hereinafter called the minimum rectangle where appropriate) as the VOP. In doing so, the VOP constructing unit 2_n constructs the VOP so that its horizontal and vertical pixel counts are, for example, multiples of 16. Having constructed a VOP, the VOP constructing unit 2_n outputs it to the VOP encoding unit 3_n.

[0066] The VOP constructing unit 2_n also detects size data (VOP size) representing the size of the VOP (for example, its horizontal and vertical lengths) and offset data (VOP offset) representing the position of the VOP in the frame (for example, its coordinates with the top-left corner of the frame as the origin), and supplies these data to the VOP encoding unit 3_n as well.
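The construction of a VOP rectangle together with its size and offset data can be sketched as follows. The source does not say how the object is represented or in which direction the minimum rectangle is enlarged to reach a multiple of 16, so the binary mask input, the choice to extend toward the right and bottom, and the function name are assumptions.

```python
def vop_rectangle(mask, multiple=16):
    """Find the rectangle enclosing the object pixels of one frame.

    mask -- list of rows; nonzero entries mark pixels that belong to the object
    Returns a dict with "offset" (left, top), measured from the top-left
    corner of the frame, and "size" (width, height), rounded up to a
    multiple of `multiple`. Returns None if the mask contains no object.
    The rectangle is extended to the right and downward (an assumption);
    this sketch does not check whether it then exceeds the frame.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    left, top = cols[0], rows[0]
    width = cols[-1] - left + 1
    height = rows[-1] - top + 1
    # Round each dimension up to the next multiple (ceiling division).
    width = -(-width // multiple) * multiple
    height = -(-height // multiple) * multiple
    return {"offset": (left, top), "size": (width, height)}
```

An object that is 20 pixels wide and 10 pixels tall thus yields a 32 × 16 rectangle, with the offset given by the top-left corner of the object's minimum rectangle.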

【００６７】The VOP encoding unit 3_n encodes the output of the VOP constructing unit 2_n by a method conforming to a standard such as MPEG or H.263, and outputs the resulting bitstream to the multiplexing unit 4. The multiplexing unit 4 multiplexes the bitstreams from the VOP encoding units 3₁ to 3_N, and either transmits the resulting multiplexed data over a transmission path 5 such as a terrestrial broadcast channel, a satellite link, or a CATV network, or records it on a recording medium 6 such as a magnetic disk, a magneto-optical disk, an optical disk, or a magnetic tape.

【００６８】VO and VOP are now described.

【００６９】When a sequence of composite images exists, a VO is the sequence of each object making up that composite image, and a VOP is a VO at a given time. That is, for example, if a composite image F3 is formed by compositing images F1 and F2, then the time series of image F1 and the time series of image F2 are each a VO, and image F1 or F2 at a given time is each a VOP. A VO can therefore be regarded as the set of VOPs of the same object at different times.

【００７０】Note that if, for example, image F1 is the background and image F2 is the foreground, the composite image F3 is obtained by compositing images F1 and F2 using a key signal for keying out image F2; in this case, the VOP of image F2 is taken to include, where appropriate, that key signal in addition to the image data (luminance and color-difference signals) making up image F2.
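
The keying operation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the key signal is taken to be binary (1 where the foreground F2 is shown), the foreground has already been positioned on a canvas the same size as the background, and each image is a single plane of sample values. The function name composite is hypothetical.

```python
def composite(background, foreground, key):
    """Composite F3 from background F1 and foreground F2 with a binary key.

    All three arguments are equally sized lists of rows. Where the key is
    nonzero the foreground sample is taken, otherwise the background sample.
    """
    return [
        [f if k else b for b, f, k in zip(b_row, f_row, k_row)]
        for b_row, f_row, k_row in zip(background, foreground, key)
    ]
```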

【００７１】In a sequence of image frames, neither the size nor the position changes, but the size and position of a VO may change. That is, even VOPs belonging to the same VO may differ in size and position from one time to another.

【００７２】Specifically, FIG. 2 shows a composite image made up of image F1, the background, and image F2, the foreground.

【００７３】Image F1 is, for example, footage of a natural landscape, and the sequence of the entire image is treated as one VO (denoted VO#0). Image F2 is, for example, footage of a person walking, and the sequence of the minimum rectangles enclosing that person is treated as one VO (denoted VO#1).

【００７４】In this case, since VO#0 is a landscape image, basically neither its position nor its size changes, as with an ordinary image frame. VO#1, on the other hand, is an image of a person, so its size and position change as the person moves left or right, or toward or away from the viewer in the drawing. Thus, although FIG. 2 shows VO#0 and VO#1 at the same time, the position and size of a VO may change over time.

【００７５】The VOP encoding unit 3_n of FIG. 1 is therefore configured to include in its output bitstream, in addition to the encoded VOP data, information on the position (coordinates) and size of the VOP in a predetermined absolute coordinate system. In FIG. 2, the vector indicating the position of the VOP (image F1) of VO#0 at a given time is denoted OST0, and the vector indicating the position of the VOP (image F2) of VO#1 at the same time is denoted OST1.

【００７６】Next, FIG. 3 shows a configuration example of the VOP encoding unit 3_n of FIG. 1 that realizes scalability. That is, MPEG has introduced scalable coding schemes that realize scalability with respect to different image sizes and frame rates, and the VOP encoding unit 3_n shown in FIG. 3 is configured to be able to realize such scalability.

【００７７】The VOP (image data) from the VOP constructing unit 2_n, together with its size data (VOP size) and offset data (VOP offset), are all supplied to the image layering unit 21.

【００７８】The image layering unit 21 generates image data of one or more layers from the VOP (that is, it divides the VOP into one or more layers). For example, when encoding with spatial scalability, the image layering unit 21 outputs the image data input to it unchanged as image data of the upper layer, and also reduces that image data (lowers its resolution), for example by decimating the pixels making it up, and outputs the result as image data of the lower layer.
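
The spatial layering described above can be sketched as follows. This is a minimal illustration, assuming plain decimation with no pre-filtering and an integer decimation step (so that the magnification back to the upper layer equals that step). The function name spatial_layers is hypothetical.

```python
def spatial_layers(image, step=2):
    """Split one input VOP into an upper and a lower layer.

    image: list of rows of sample values.
    The upper layer is the input unchanged; the lower layer keeps every
    `step`-th sample in both directions (simple decimation, an assumption).
    """
    upper = [row[:] for row in image]
    lower = [row[::step] for row in image[::step]]
    return upper, lower
```

With a 4 x 4 input and step 2, the lower layer is the 2 x 2 image formed by the samples at even row and column indices.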

【００７９】It is also possible to use the input VOP as lower-layer data and to raise the resolution of that VOP (increase its pixel count) by some method, using the result as upper-layer data.

【００８０】The number of layers can also be one, but in that case scalability is not realized. In that case, the VOP encoding unit 3_n consists, for example, of the lower layer encoding unit 25 alone.

【００８１】The number of layers can also be three or more, but for simplicity the case of two layers is described here.

【００８２】When encoding with temporal scalability, for example, the image layering unit 21 outputs the image data as lower-layer or upper-layer data, for example alternately, according to time. That is, if the VOPs making up a VO are input to it in the order VOP0, VOP1, VOP2, VOP3, ..., the image layering unit 21 outputs VOP0, VOP2, VOP4, VOP6, ... as lower-layer data and VOP1, VOP3, VOP5, VOP7, ... as upper-layer data. With temporal scalability, the decimated VOPs simply become the lower-layer and upper-layer data, and the image data is not enlarged or reduced (no resolution conversion is performed), although it is also possible to do so.
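
The alternating assignment used for temporal scalability can be sketched in a few lines. The function name temporal_layers is hypothetical, and the VOPs are represented here only by labels.

```python
def temporal_layers(vops):
    """Assign VOPs alternately to the lower and upper layers.

    Even-indexed VOPs (VOP0, VOP2, ...) go to the lower layer and
    odd-indexed VOPs (VOP1, VOP3, ...) to the upper layer.
    """
    lower = vops[0::2]
    upper = vops[1::2]
    return lower, upper
```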

【００８３】When encoding with SNR (Signal to Noise Ratio) scalability, for example, the image layering unit 21 outputs the input image data unchanged as both the upper-layer data and the lower-layer data. In this case, the lower-layer and upper-layer image data are therefore identical.

【００８４】For spatial scalability when encoding is performed per VOP, the following three types are conceivable, for example.

【００８５】Suppose, for example, that a composite image consisting of images F1 and F2 as shown in FIG. 2 is input as a VOP. In the first type of spatial scalability, as shown in FIG. 4, the entire input VOP (FIG. 4(A)) is taken as the upper layer (Enhancement Layer), and a reduced version of the entire VOP (FIG. 4(B)) is taken as the lower layer (Base Layer).

【００８６】In the second type of spatial scalability, as shown in FIG. 5, a partial object of the input VOP (FIG. 5(A); here, the part corresponding to image F2) is extracted and taken as the upper layer, and a reduced version of the entire VOP (FIG. 5(B)) is taken as the lower layer. This extraction is performed, for example, in the same way as in the VOP constructing unit 2_n, so the extracted object can itself be regarded as a VOP.

【００８７】In the third type of spatial scalability, as shown in FIGS. 6 and 7, the objects (VOPs) making up the input VOP are extracted, and an upper layer and a lower layer are generated for each object. FIG. 6 shows the case where the upper and lower layers are generated from the background (image F1) of the VOP of FIG. 2, and FIG. 7 shows the case where they are generated from the foreground (image F2) of the VOP of FIG. 2.

【００８８】Which of these types of scalability is used is determined in advance, and the image layering unit 21 layers the VOP so that encoding can be performed with the predetermined scalability.

【００８９】Furthermore, from the size data and offset data of the VOP input to it (hereinafter referred to as the initial size data and initial offset data, respectively, where appropriate), the image layering unit 21 calculates (determines) offset data representing the position, in the predetermined absolute coordinate system, of each generated lower-layer and upper-layer VOP, and size data representing its size.

【００９０】The method of determining the offset data (position information) and size data of the lower-layer and upper-layer VOPs is now described, taking as an example the case of the second type of scalability described above (FIG. 5).

【００９１】In this case, the lower-layer offset data FPOS_B is determined, as shown for example in FIG. 8(A), so that when the lower-layer image data is enlarged (upsampled) based on the difference between its resolution and that of the upper layer, the offset data of the enlarged image in the absolute coordinate system matches the initial offset data. Here, the enlargement uses the ratio that makes the lower-layer image the same size as the upper-layer image, that is, the reciprocal of the reduction ratio used when the lower-layer image was generated by reducing the upper-layer image (hereinafter referred to as the magnification FR where appropriate). Likewise, the lower-layer size data FSZ_B is determined so that the size data of the enlarged image obtained by enlarging the lower-layer image by the magnification FR matches the initial size data. In other words, the offset data FPOS_B and the size data FSZ_B are determined so that FR times each of them matches the initial offset data and the initial size data, respectively.

【００９２】The upper-layer offset data FPOS_E, on the other hand, is determined, as shown for example in FIG. 8(B), as the coordinates of, for example, the upper-left vertex of the minimum rectangle (VOP) enclosing the object extracted from the input VOP, these coordinates being obtained from the initial offset data. The upper-layer size data FSZ_E is determined as, for example, the horizontal and vertical lengths of the minimum rectangle enclosing the object extracted from the input VOP.

【００９３】Accordingly, in this case, suppose the lower-layer offset data FPOS_B and size data FSZ_B are converted according to the magnification FR (the converted offset data FPOS_B and size data FSZ_B being referred to as the converted offset data FPOS_B and the converted size data FSZ_B, respectively), an image frame of the size corresponding to the converted size data FSZ_B is considered at the position corresponding to the converted offset data FPOS_B in the absolute coordinate system, and the enlarged image obtained by multiplying the lower-layer image data by FR is placed there (FIG. 8(A)). If the upper-layer image is likewise placed in that absolute coordinate system according to the upper-layer offset data FPOS_E and size data FSZ_E (FIG. 8(B)), each pixel of the enlarged image and the corresponding pixel of the upper-layer image end up at the same position. That is, in FIG. 8 for example, the person in the upper-layer image and the person in the enlarged image are placed at the same position.
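
The relationships among the initial data, FPOS_B, FSZ_B, FPOS_E, FSZ_E, and FR for the second type of scalability can be sketched as follows. This is an illustration under assumptions, not the computation of the embodiment: the extracted object's rectangle is assumed to be given relative to the upper-left corner of the input VOP, and the function names layer_geometry and lower_pixel_for are hypothetical.

```python
def layer_geometry(init_offset, init_size, obj_rect, fr):
    """Derive offset and size data for both layers.

    init_offset, init_size: (x, y) and (w, h) of the input VOP.
    obj_rect: (x, y, w, h) of the extracted object, relative to the
              input VOP (assumption).
    fr: magnification FR from the lower layer to the upper layer.
    """
    # Lower layer: FR times FPOS_B / FSZ_B must equal the initial data.
    fpos_b = (init_offset[0] / fr, init_offset[1] / fr)
    fsz_b = (init_size[0] / fr, init_size[1] / fr)
    # Upper layer: the object's minimum rectangle in absolute coordinates.
    fpos_e = (init_offset[0] + obj_rect[0], init_offset[1] + obj_rect[1])
    fsz_e = (obj_rect[2], obj_rect[3])
    return fpos_b, fsz_b, fpos_e, fsz_e


def lower_pixel_for(abs_xy, fpos_b, fr):
    """Map an absolute position to the lower-layer sample that covers it
    once the lower layer is enlarged by FR and placed at FR * FPOS_B."""
    return ((abs_xy[0] - fr * fpos_b[0]) / fr,
            (abs_xy[1] - fr * fpos_b[1]) / fr)
```

For example, with an initial offset of (16, 32), an initial size of (352, 288), an object rectangle of (100, 60, 64, 128), and FR = 2, the upper-left pixel of the upper layer lies at absolute position (116, 92), which corresponds to lower-layer sample (50, 30).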

【００９４】For the first and third types of scalability as well, the offset data FPOS_B and FPOS_E and the size data FSZ_B and FSZ_E are determined in the same way, so that corresponding pixels of the enlarged lower-layer image and the upper-layer image are placed at the same position in the absolute coordinate system.

【００９５】Returning to FIG. 3, the upper-layer image data, offset data FPOS_E, and size data FSZ_E generated by the image layering unit 21 are delayed in the delay circuit 22 by the processing time of the lower layer encoding unit 25, described later, and supplied to the upper layer encoding unit 23. The lower-layer image data, offset data FPOS_B, and size data FSZ_B are supplied to the lower layer encoding unit 25. The magnification FR is supplied via the delay circuit 22 to the upper layer encoding unit 23 and the resolution converting unit 24.

【００９６】The lower layer encoding unit 25 encodes the lower-layer image data, includes the offset data FPOS_B and size data FSZ_B in the resulting encoded data (bitstream), and supplies it to the multiplexing unit 26.

【００９７】The lower layer encoding unit 25 also locally decodes the encoded data and outputs the resulting locally decoded lower-layer image data to the resolution converting unit 24. The resolution converting unit 24 enlarges (or reduces) the lower-layer image data from the lower layer encoding unit 25 according to the magnification FR, thereby restoring it to its original size, and outputs the resulting enlarged image to the upper layer encoding unit 23.
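
The enlargement performed by the resolution converting unit 24 can be sketched as below. The text does not specify the interpolation method, so nearest-neighbor sampling is used here purely as an assumption; the function name upsample_nearest is hypothetical.

```python
def upsample_nearest(image, fr):
    """Enlarge `image` (list of rows) by the magnification `fr`.

    Each output sample copies the nearest source sample toward the
    upper left (nearest-neighbor interpolation, an assumption).
    """
    h, w = len(image), len(image[0])
    out_h, out_w = round(h * fr), round(w * fr)
    return [
        [image[min(h - 1, int(y / fr))][min(w - 1, int(x / fr))]
         for x in range(out_w)]
        for y in range(out_h)
    ]
```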

【００９８】The upper layer encoding unit 23, meanwhile, encodes the upper-layer image data, includes the offset data FPOS_E and size data FSZ_E in the resulting encoded data (bitstream), and supplies it to the multiplexing unit 26. In the upper layer encoding unit 23, the upper-layer image data is encoded using the enlarged image supplied from the resolution converting unit 24 as a reference image as well.

【００９９】The multiplexing unit 26 multiplexes and outputs the outputs of the upper layer encoding unit 23 and the lower layer encoding unit 25.

【０１００】The lower layer encoding unit 25 supplies the upper layer encoding unit 23 with the lower-layer size data FSZ_B, offset data FPOS_B, motion vector MV, flag COD, and so on, and the upper layer encoding unit 23 performs its processing while referring to these data as necessary; the details are described later.

【０１０１】Next, FIG. 9 shows a detailed configuration example of the lower layer encoding unit 25 of FIG. 3. In the figure, parts corresponding to those in FIG. 38 are given the same reference numerals. That is, the lower layer encoding unit 25 is basically configured in the same way as the encoder of FIG. 38.

【０１０２】The image data from the image layering unit 21 (FIG. 3), that is, the lower-layer VOP, is supplied to and stored in the frame memory 31 as in FIG. 38, and the motion vector detector 32 detects motion vectors in units of macroblocks.

【０１０３】However, the motion vector detector 32 of the lower layer encoding unit 25 is also supplied with the size data FSZ_B and offset data FPOS_B of the lower-layer VOP, and detects the motion vectors of macroblocks based on this size data FSZ_B and offset data FPOS_B.

【０１０４】That is, as described above, the size and position of a VOP change with time (frame), so detecting its motion vectors requires setting a reference coordinate system for the detection and detecting motion in that coordinate system. Here, therefore, the motion vector detector 32 takes the absolute coordinate system described above as the reference coordinate system, places the VOP to be encoded and the VOP serving as the reference image in that absolute coordinate system according to the size data FSZ_B and offset data FPOS_B, and detects the motion vectors.
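
Why a common coordinate system is needed can be seen from a small sketch. It assumes that a matching block has already been found in the reference VOP by some search (not shown), that block positions are given relative to each VOP's upper-left corner, and that the motion vector is defined as reference position minus current position; the sign convention and the function names are assumptions for illustration only.

```python
def to_absolute(fpos, local_xy):
    """Convert a position local to a VOP into absolute coordinates,
    given that VOP's offset data `fpos`."""
    return (fpos[0] + local_xy[0], fpos[1] + local_xy[1])


def motion_vector(cur_fpos, cur_block, ref_fpos, ref_block):
    """Displacement from the current block to its matching reference
    block, measured in the absolute coordinate system."""
    cx, cy = to_absolute(cur_fpos, cur_block)
    rx, ry = to_absolute(ref_fpos, ref_block)
    return (rx - cx, ry - cy)
```

If the current VOP has offset (10, 10), the reference VOP has offset (14, 8), and the matching blocks sit at the same local position (16, 0) in both, the vector is (4, -2) rather than (0, 0), because the VOP itself has moved within the frame.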

【０１０５】The detected motion vector (MV) is supplied, together with the prediction mode, to the VLC unit 36 and the motion compensator 42, and is also supplied to the upper layer encoding unit 23 (FIG. 3).

【０１０６】When motion compensation is performed, motion must likewise be detected in the reference coordinate system as described above, so the size data FSZ_B and offset data FPOS_B are also supplied to the motion compensator 42.

【０１０７】A VOP whose motion vectors have been detected is converted to quantized coefficients as in FIG. 38 and supplied to the VLC unit 36. As in FIG. 38, the VLC unit 36 is supplied with the quantized coefficients, quantization step, motion vectors, and prediction mode; it is also supplied with the size data FSZ_B and offset data FPOS_B from the image layering unit 21, and variable-length encodes all of these data.

【０１０８】A VOP whose motion vectors have been detected is encoded as described above and is also locally decoded as in FIG. 38 and stored in the frame memory 41. This decoded image is used as a reference image as described earlier and is also output to the resolution converting unit 24 (FIG. 3).

【０１０９】In MPEG4, unlike MPEG1 and MPEG2, B pictures (B-VOPs) are also used as reference images, so B pictures are also locally decoded and stored in the frame memory 41 (at present, however, B pictures are used as reference images only in the upper layer).

【０１１０】The VLC unit 36, meanwhile, determines, as described with reference to FIG. 38, whether each macroblock of an I, P, or B picture (I-VOP, P-VOP, B-VOP) is to be a skip macroblock, and sets the flags COD and MODB indicating the result. These flags COD and MODB are also variable-length encoded and transmitted. The flag COD is additionally supplied to the upper layer encoding unit 23.

【０１１１】Next, FIG. 10 shows a configuration example of the upper layer encoding unit 23 of FIG. 3. In the figure, parts corresponding to those in FIG. 9 or FIG. 38 are given the same reference numerals. That is, the upper layer encoding unit 23 is basically configured in the same way as the lower layer encoding unit 25 of FIG. 9 or the encoder of FIG. 38, except that a frame memory 52 is newly provided.

【０１１２】The image data from the image layering unit 21 (FIG. 3), that is, the upper-layer VOP, is supplied to and stored in the frame memory 31 as in FIG. 38, and the motion vector detector 32 detects motion vectors in units of macroblocks. In this case too, as in FIG. 9, the motion vector detector 32 is supplied with the size data FSZ_E and offset data FPOS_E of the upper-layer VOP in addition to the VOP itself; as described above, it recognizes the placement of the upper-layer VOP in the absolute coordinate system based on this size data FSZ_E and offset data FPOS_E, and detects the motion vectors of macroblocks.

【０１１３】In the motion vector detectors 32 of the upper layer encoding unit 23 and the lower layer encoding unit 25, VOPs are processed according to a preset sequence, as described with reference to FIG. 38; here, the sequence is set, for example, as follows.

【０１１４】In the case of spatial scalability, as shown in FIG. 11(A) and FIG. 11(B), the VOPs of the upper layer and the lower layer are processed in the order P, B, B, B, ... and I, P, P, P, ..., respectively.

【０１１５】In this case, the P picture (P-VOP) that is the first VOP of the upper layer is encoded using, for example, the lower-layer VOP at the same time (here, an I picture (I-VOP)) as the reference image. The B pictures (B-VOPs) that are the second and subsequent VOPs of the upper layer are encoded using, for example, the immediately preceding upper-layer VOP and the lower-layer VOP at the same time as reference images. That is, here, the B pictures of the upper layer, like the P pictures of the lower layer, are used as reference images when other VOPs are encoded.

【０１１６】The lower layer is encoded in the same way as in, for example, MPEG1, MPEG2, or H.263.

【０１１７】SNR scalability can be regarded as spatial scalability with the magnification FR equal to 1, and is therefore processed in the same way as the spatial scalability described above.

【０１１８】In the case of temporal scalability, that is, when for example, as described above, a VO consists of VOP0, VOP1, VOP2, VOP3, ..., with VOP1, VOP3, VOP5, VOP7, ... assigned to the upper layer (FIG. 12(A)) and VOP0, VOP2, VOP4, VOP6, ... to the lower layer (FIG. 12(B)), the VOPs of the upper layer and the lower layer are processed, as shown in FIG. 12, in the order B, B, B, ... and I, P, P, P, ..., respectively.

【０１１９】In this case, the first upper-layer VOP, VOP1 (a B picture), is encoded using, for example, the lower-layer VOP0 (an I picture) and VOP2 (a P picture) as reference images. The second upper-layer VOP, VOP3 (a B picture), is encoded using, for example, the upper-layer VOP1, which was encoded as a B picture immediately before it, and the lower-layer VOP4 (a P picture), which is the image at the time (frame) following VOP3, as reference images. The third upper-layer VOP, VOP5 (a B picture), is encoded in the same way as VOP3, using, for example, the upper-layer VOP3, encoded as a B picture immediately before it, and the lower-layer VOP6 (a P picture), the image at the time (frame) following VOP5, as reference images.
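
The reference pattern for temporal scalability given in this example can be written out as a small function. It simply encodes the pattern stated above for VOP1, VOP3, and VOP5 and extends it to later odd-numbered VOPs on the assumption that the same rule continues; the function name temporal_refs is hypothetical.

```python
def temporal_refs(k):
    """Reference VOPs for upper-layer VOP number k (k odd).

    Returns a list of (layer, vop_number) pairs.
    VOP1 refers to lower-layer VOP0 and VOP2; every later upper-layer
    VOPk refers to upper-layer VOP(k-2) and lower-layer VOP(k+1).
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("upper-layer VOPs have odd numbers 1, 3, 5, ...")
    if k == 1:
        return [("lower", 0), ("lower", 2)]
    return [("upper", k - 2), ("lower", k + 1)]
```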

【０１２０】As described above, for a VOP of one layer (here, the upper layer), a VOP of another layer (a scalable layer; here, the lower layer) can be used as the reference image for encoding P and B pictures. When a VOP of another layer is used as the reference image for encoding a VOP of a given layer in this way, that is, here, when a lower-layer VOP is used as the reference image for predictively encoding an upper-layer VOP, the motion vector detector 32 of the upper layer encoding unit 23 (FIG. 10) sets and outputs a flag ref_layer_id indicating this (when there are three or more layers, the flag ref_layer_id indicates the layer to which the VOP used as the reference image belongs).

【０１２１】Furthermore, the motion vector detector 32 of the upper layer encoding unit 23 also sets and outputs, according to the flag ref_layer_id for the VOP, a flag ref_select_code (reference image information) indicating which layer's VOP is used as the reference image for forward predictive coding and for backward predictive coding, respectively.

【０１２２】For example, when a P picture of the upper layer (Enhancement Layer) is encoded using as the reference image a VOP that belongs to the same layer and is decoded (locally decoded) immediately before it, the flag ref_select_code is set to "00". When the P picture is encoded using as the reference image a VOP that belongs to a different layer (here, the lower layer) (Reference Layer) and is displayed immediately before it, the flag ref_select_code is set to "01". When the P picture is encoded using as the reference image a VOP that belongs to a different layer and is displayed immediately after it, the flag ref_select_code is set to "10". And when the P picture is encoded using as the reference image a VOP of a different layer at the same time, the flag ref_select_code is set to "11".

【０１２３】一方、例えば、上位レイヤのＢピクチャ
が、それと同時刻における、異なるレイヤのＶＯＰを前
方予測のための参照画像として用い、かつ、その直前に
復号される、それと同一のレイヤに属するＶＯＰを後方
予測のための参照画像として用いて符号化される場合、
フラグｒｅｆ＿ｓｅｌｅｃｔ＿ｃｏｄｅは「００」とさ
れる。また、上位レイヤのＢピクチャが、それと同一の
レイヤに属するＶＯＰを前方予測のための参照画像とし
て用い、かつ、その直前に表示される、それと異なるレ
イヤに属するＶＯＰを後方予測のための参照画像として
用いて符号化される場合、フラグｒｅｆ＿ｓｅｌｅｃｔ
＿ｃｏｄｅは「０１」とされる。さらに、上位レイヤの
Ｂピクチャが、その直前に復号される、それと同一のレ
イヤに属するＶＯＰを前方予測のための参照画像として
用い、かつその直後に表示される、それと異なるレイヤ
に属するＶＯＰを後方予測のための参照画像として用い
て符号化される場合、フラグｒｅｆ＿ｓｅｌｅｃｔ＿ｃ
ｏｄｅは「１０」とされる。また、上位レイヤのＢピク
チャが、その直前に表示される、それと異なるレイヤに
属するＶＯＰを前方予測のための参照画像として用い、
かつその直後に表示される、それと異なるレイヤに属す
るＶＯＰを後方予測のための参照画像として用いて符号
化される場合、フラグｒｅｆ＿ｓｅｌｅｃｔ＿ｃｏｄｅ
は「１１」とされる。On the other hand, for example, when a B picture of the upper layer is encoded using the VOP of a different layer at the same time as the reference image for forward prediction, and the VOP of the same layer decoded immediately before it as the reference image for backward prediction, the flag ref_select_code is set to "00". When the B picture of the upper layer is encoded using a VOP of the same layer as the reference image for forward prediction, and the VOP that is displayed immediately before it and belongs to a different layer as the reference image for backward prediction, the flag ref_select_code is set to "01". When the B picture of the upper layer is encoded using the VOP of the same layer decoded immediately before it as the reference image for forward prediction, and the VOP that is displayed immediately after it and belongs to a different layer as the reference image for backward prediction, the flag ref_select_code is set to "10". Finally, when the B picture of the upper layer is encoded using the VOP that is displayed immediately before it and belongs to a different layer as the reference image for forward prediction, and the VOP that is displayed immediately after it and belongs to a different layer as the reference image for backward prediction, the flag ref_select_code is set to "11".
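
The ref_select_code assignments for upper-layer P and B pictures described above can be collected into lookup tables. The following is a minimal sketch, not the encoder's actual data structure: the table names, the tuple layout, and the labels "same" and "other" are assumptions introduced for illustration, and a timing of None marks a case where the text does not state the timing.

```python
# Reference pictures for an upper-layer picture, keyed by ref_select_code.
# "same"  = the layer the picture itself belongs to (the upper layer)
# "other" = a different layer (the one named by ref_layer_id)
# The second element is the timing stated in the text; None means the
# text does not state it. This tuple layout is an assumption of the sketch.
P_REFERENCES = {
    "00": ("same", "decoded immediately before"),
    "01": ("other", "displayed immediately before"),
    "10": ("other", "displayed immediately after"),
    "11": ("other", "same time"),
}

# For B pictures each entry is (forward reference, backward reference).
B_REFERENCES = {
    "00": (("other", "same time"), ("same", "decoded immediately before")),
    "01": (("same", None), ("other", "displayed immediately before")),
    "10": (("same", "decoded immediately before"),
           ("other", "displayed immediately after")),
    "11": (("other", "displayed immediately before"),
           ("other", "displayed immediately after")),
}


def reference_layers(picture_type, ref_select_code):
    """Return the referenced layer(s): one entry for P, (forward, backward) for B."""
    if picture_type == "P":
        return (P_REFERENCES[ref_select_code][0],)
    if picture_type == "B":
        forward, backward = B_REFERENCES[ref_select_code]
        return (forward[0], backward[0])
    raise ValueError("only P and B pictures carry ref_select_code in this sketch")
```

For example, `reference_layers("B", "00")` reports that forward prediction refers to the other layer and backward prediction to the picture's own layer.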

【０１２４】ここで、図１１および図１２で説明した予
測符号化の方法は、１つの例であり、前方予測符号化、
後方予測符号化、または両方向予測符号化における参照
画像として、どのレイヤの、どのＶＯＰを用いるかは、
例えば、上述した範囲で、自由に設定することが可能で
ある。The predictive coding methods described with reference to FIGS. 11 and 12 are only examples. Which VOP of which layer is used as the reference image in forward predictive coding, backward predictive coding, or bidirectional predictive coding can be set freely, for example, within the range described above.

【０１２５】なお、上述の場合においては、便宜的に、
「空間スケーラビリティ」、「時間スケーラビリテ
ィ」、「ＳＮＲスケーラビリティ」という語を用いた
が、フラグｒｅｆ＿ｓｅｌｅｃｔ＿ｃｏｄｅによって、
予測符号化に用いる参照画像を設定する場合、空間スケ
ーラビリティや、テンポラルスケーラビリティ、ＳＮＲ
スケーラビリティを明確に区別することは困難となる。
即ち、逆にいえば、フラグｒｅｆ＿ｓｅｌｅｃｔ＿ｃｏ
ｄｅを用いることによって、上述のようなスケーラビリ
ティの区別をせずに済むようになる。In the above description, the terms "spatial scalability", "temporal scalability", and "SNR scalability" are used for convenience. However, when the reference image used for predictive coding is set by the flag ref_select_code, it becomes difficult to distinguish clearly among spatial scalability, temporal scalability, and SNR scalability. Put the other way around, using the flag ref_select_code makes it unnecessary to draw such distinctions.

【０１２６】ここで、上述のスケーラビリティとフラグ
ｒｅｆ＿ｓｅｌｅｃｔ＿ｃｏｄｅとを対応付けるとすれ
ば、例えば、次のようになる。即ち、Ｐピクチャについ
ては、フラグｒｅｆ＿ｓｅｌｅｃｔ＿ｃｏｄｅが「１
１」の場合が、フラグｒｅｆ＿ｌａｙｅｒ＿ｉｄが示す
レイヤの同時刻におけるＶＯＰを参照画像（前方予測の
ための参照画像）として用いる場合であるから、これ
は、空間スケーラビリティまたはＳＮＲスケーラビリテ
ィに対応する。そして、フラグｒｅｆ＿ｓｅｌｅｃｔ＿
ｃｏｄｅが「１１」の場合以外は、テンポラルスケーラ
ビリティに対応する。If the types of scalability described above are nevertheless associated with the flag ref_select_code, the correspondence is, for example, as follows. For a P picture, a flag ref_select_code of "11" means that the VOP at the same time in the layer indicated by the flag ref_layer_id is used as the reference image (the reference image for forward prediction), so this case corresponds to spatial scalability or SNR scalability. Every value of the flag ref_select_code other than "11" corresponds to temporal scalability.

【０１２７】また、Ｂピクチャについては、フラグｒｅ
ｆ＿ｓｅｌｅｃｔ＿ｃｏｄｅが「００」の場合が、やは
り、フラグｒｅｆ＿ｌａｙｅｒ＿ｉｄが示すレイヤの同
時刻におけるＶＯＰを前方予測のための参照画像として
用いる場合であるから、これが、空間スケーラビリティ
またはＳＮＲスケーラビリティに対応する。そして、フ
ラグｒｅｆ＿ｓｅｌｅｃｔ＿ｃｏｄｅが「００」の場合
以外は、テンポラルスケーラビリティに対応する。For a B picture, a flag ref_select_code of "00" likewise means that the VOP at the same time in the layer indicated by the flag ref_layer_id is used as the reference image for forward prediction, so this case corresponds to spatial scalability or SNR scalability. Every value of the flag ref_select_code other than "00" corresponds to temporal scalability.
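
The correspondence between ref_select_code and the type of scalability for P and B pictures can be written as a small function. This is only an illustrative sketch; the function name and the returned strings are assumptions, not part of the bitstream syntax.

```python
def scalability_kind(picture_type, ref_select_code):
    """Classify the scalability implied by ref_select_code for a P or B picture."""
    # Code meaning "the VOP at the same time in the layer named by
    # ref_layer_id is the (forward) reference image".
    same_time_code = {"P": "11", "B": "00"}[picture_type]
    if ref_select_code == same_time_code:
        return "spatial or SNR"
    return "temporal"
```

The asymmetry between the two picture types is the point of the sketch: the same-time reference is signalled by "11" for a P picture but by "00" for a B picture.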

【０１２８】なお、上位レイヤのＶＯＰの予測符号化の
ために、それと異なるレイヤ（ここでは、下位レイヤ）
の、同時刻におけるＶＯＰを参照画像として用いる場
合、両者の間に動きはないので、動きベクトルは、常に
０（（０，０））とされる。Note that when the VOP at the same time in a different layer (here, the lower layer) is used as the reference image for predictive coding of an upper-layer VOP, there is no motion between the two, so the motion vector is always 0 ((0, 0)).

【０１２９】図１０に戻り、上位レイヤ符号化部２３の
動きベクトル検出器３２では、以上のようなフラグｒｅ
ｆ＿ｌａｙｅｒ＿ｉｄおよびｒｅｆ＿ｓｅｌｅｃｔ＿ｃ
ｏｄｅが設定され、動き補償器４２およびＶＬＣ器３６
に供給される。Returning to FIG. 10, the motion vector detector 32 of the upper layer encoding unit 23 sets the flags ref_layer_id and ref_select_code as described above and supplies them to the motion compensator 42 and the VLC unit 36.

【０１３０】また、動きベクトル検出器３２では、フラ
グｒｅｆ＿ｌａｙｅｒ＿ｉｄおよびｒｅｆ＿ｓｅｌｅｃ
ｔ＿ｃｏｄｅにしたがって、フレームメモリ３１を参照
するだけでなく、必要に応じて、フレームメモリ５２を
も参照して、動きベクトルが検出される。In accordance with the flags ref_layer_id and ref_select_code, the motion vector detector 32 detects the motion vector by referring not only to the frame memory 31 but also, as necessary, to the frame memory 52.

【０１３１】ここで、フレームメモリ５２には、解像度
変換部２４（図３）から、局所復号された下位レイヤの
拡大画像が供給されるようになされている。即ち、解像
度変換部２４では、局所復号された下位レイヤのＶＯＰ
が、例えば、いわゆる補間フィルタなどによって拡大さ
れ、これにより、そのＶＯＰを、ＦＲ倍だけした拡大画
像、つまり、その下位レイヤのＶＯＰに対応する上位レ
イヤのＶＯＰと同一の大きさとした拡大画像が生成さ
れ、上位レイヤ符号化部２３に供給される。フレームメ
モリ５２では、このようにして解像度変換部２４から供
給される拡大画像が記憶される。Here, the frame memory 52 is supplied with the enlarged image of the locally decoded lower layer from the resolution conversion unit 24 (FIG. 3). That is, the resolution conversion unit 24 enlarges the locally decoded lower-layer VOP, for example, with a so-called interpolation filter, thereby generating an enlarged image that is FR times the size of that VOP, in other words, an enlarged image of the same size as the upper-layer VOP corresponding to that lower-layer VOP, and supplies it to the upper layer encoding unit 23. The frame memory 52 stores the enlarged image supplied from the resolution conversion unit 24 in this way.

【０１３２】従って、倍率ＦＲが１の場合は、解像度変
換部２４は、下位レイヤ符号化部２５からの局所復号さ
れたＶＯＰに対して、特に処理を施すことなく、そのま
ま、上位レイヤ符号化部２３に供給する。Therefore, when the magnification FR is 1, the resolution conversion unit 24 supplies the locally decoded VOP from the lower layer encoding unit 25 to the upper layer encoding unit 23 as is, without performing any particular processing on it.
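
The behavior of the resolution conversion unit 24 can be sketched as follows. The text only says that a so-called interpolation filter is used, so this sketch substitutes plain pixel replication as a stand-in and assumes an integer magnification FR; the function name and the list-of-rows image representation are likewise assumptions made for illustration.

```python
def enlarge(vop, fr):
    """Enlarge a VOP (a list of pixel rows) by an integer magnification fr."""
    if fr == 1:
        # FR = 1: the locally decoded VOP is passed on without any processing.
        return vop
    enlarged = []
    for row in vop:
        # Repeat every pixel fr times horizontally ...
        wide_row = [pixel for pixel in row for _ in range(fr)]
        # ... and every widened row fr times vertically.
        for _ in range(fr):
            enlarged.append(list(wide_row))
    return enlarged
```

A real interpolation filter would compute intermediate pixel values rather than copying them; replication is used here only because it keeps the size relationship (output is FR times the input in each dimension) easy to see.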

【０１３３】動きベクトル検出器３２には、下位レイヤ
符号化部２５からサイズデータＦＳＺ＿Ｂおよびオフセ
ットデータＦＰＯＳ＿Ｂが供給されるとともに、遅延回
路２２（図３）からの倍率ＦＲが供給されるようになさ
れており、動きベクトル検出器３２は、フレームメモリ
５２に記憶された拡大画像を参照画像として用いる場
合、即ち、上位レイヤのＶＯＰの予測符号化に、そのＶ
ＯＰと同時刻における下位レイヤのＶＯＰを参照画像と
して用いる場合（この場合、フラグｒｅｆ＿ｓｅｌｅｃ
ｔ＿ｃｏｄｅは、Ｐピクチャについては「１１」に、Ｂ
ピクチャについては「００」にされる）、その拡大画像
に対応するサイズデータＦＳＺ＿Ｂおよびオフセットデ
ータＦＰＯＳ＿Ｂに、倍率ＦＲを乗算する。そして、そ
の乗算結果に基づいて、絶対座標系における拡大画像の
位置を認識し、動きベクトルの検出を行う。The motion vector detector 32 is supplied with the size data FSZ_B and the offset data FPOS_B from the lower layer encoding unit 25, and with the magnification FR from the delay circuit 22 (FIG. 3). When the motion vector detector 32 uses the enlarged image stored in the frame memory 52 as the reference image, that is, when the lower-layer VOP at the same time as an upper-layer VOP is used as the reference image for predictive coding of that upper-layer VOP (in this case, the flag ref_select_code is set to "11" for a P picture and to "00" for a B picture), it multiplies the size data FSZ_B and the offset data FPOS_B corresponding to the enlarged image by the magnification FR. Based on the products, it recognizes the position of the enlarged image in the absolute coordinate system and detects the motion vector.
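
The multiplication of the size data FSZ_B and the offset data FPOS_B by the magnification FR amounts to the following. This is a sketch under the assumption that the size is a (width, height) pair and the offset an (x, y) pair in the absolute coordinate system; the function name is hypothetical.

```python
def enlarged_geometry(fsz_b, fpos_b, fr):
    """Scale lower-layer size and offset by FR to locate the enlarged image."""
    width, height = fsz_b
    x, y = fpos_b
    # Both the size and the offset are multiplied by the same factor, so the
    # enlarged image keeps its relative position in the absolute coordinates.
    return (width * fr, height * fr), (x * fr, y * fr)
```

For instance, a 176x144 lower-layer VOP at offset (8, 4) with FR = 2 is treated as a 352x288 image at offset (16, 8).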

【０１３４】なお、動きベクトル検出器３２には、下位
レイヤの動きベクトルと予測モードが供給されるように
なされており、これは、次のような場合に使用される。
即ち、動きベクトル検出部３２は、例えば、上位レイヤ
のＢピクチャについてのフラグｒｅｆ＿ｓｅｌｅｃｔ＿
ｃｏｄｅが「００」である場合において、倍率ＦＲが１
であるとき、即ち、ＳＮＲスケーラビリティのとき（但
し、この場合、上位レイヤの予測符号化に、上位レイヤ
のＶＯＰが用いられるので、この点で、ここでいうＳＮ
Ｒスケーラビリティは、ＭＰＥＧ２に規定されているも
のと異なる）、上位レイヤと下位レイヤは同一の画像で
あるから、上位レイヤのＢピクチャの予測符号化には、
下位レイヤの同時刻における画像の動きベクトルと予測
モードをそのまま用いることができる。そこで、この場
合、動きベクトル検出部３２は、上位レイヤのＢピクチ
ャについては、特に処理を行わず、下位レイヤの動きベ
クトルと予測モードをそのまま採用する。The motion vector detector 32 is also supplied with the motion vector and the prediction mode of the lower layer, which are used in the following case. When the flag ref_select_code for a B picture of the upper layer is "00" and the magnification FR is 1, that is, in the case of SNR scalability (although the SNR scalability referred to here differs from that defined in MPEG2 in that an upper-layer VOP is also used for predictive coding of the upper layer), the upper layer and the lower layer are the same image, so the motion vector and the prediction mode of the lower-layer image at the same time can be used as they are for predictive coding of the upper-layer B picture. In this case, therefore, the motion vector detector 32 performs no particular processing for the upper-layer B picture and adopts the motion vector and the prediction mode of the lower layer as they are.

【０１３５】なお、この場合、上位レイヤ符号化部２３
では、動きベクトル検出器３２からＶＬＣ器３６には、
動きベクトルおよび予測モードは出力されない（従っ
て、伝送されない）。これは、受信側において、上位レ
イヤの動きベクトルおよび予測モードを、下位レイヤの
復号結果から認識することができるからである。In this case, in the upper layer encoding unit 23, the motion vector and the prediction mode are not output from the motion vector detector 32 to the VLC unit 36 (and are therefore not transmitted). This is because the receiving side can recognize the motion vector and the prediction mode of the upper layer from the decoding result of the lower layer.
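
The rule for reusing the lower-layer motion vector and prediction mode for an upper-layer B picture can be sketched as a simple decision. The function name, the `detect` callback standing in for an ordinary motion search, and the boolean "transmit" result are assumptions introduced for illustration only.

```python
def upper_layer_b_motion(ref_select_code, fr, lower_mv, lower_mode, detect):
    """Return (motion_vector, prediction_mode, transmit) for an upper-layer B picture."""
    if ref_select_code == "00" and fr == 1:
        # SNR-scalability case: both layers show the same image, so the
        # lower-layer result is adopted as is and nothing is sent to the
        # VLC unit; the receiver recovers it from the lower-layer decoding.
        return lower_mv, lower_mode, False
    # Otherwise run the ordinary detection (hypothetical callback) and
    # transmit its result.
    motion_vector, prediction_mode = detect()
    return motion_vector, prediction_mode, True
```

The third element of the result makes explicit the bitrate saving described in the text: in the reuse case the upper layer carries no motion vector or prediction mode of its own.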

【０１３６】以上のように、動きベクトル検出器３２
は、上位レイヤのＶＯＰの他、拡大画像をも参照画像と
して用いて、動きベクトルを検出し、さらに、図３８で
説明したように、予測誤差（あるいは分散）を最小にす
る予測モードを設定する。また、動きベクトル検出器３
２は、例えば、フラグｒｅｆ＿ｓｅｌｅｃｔ＿ｃｏｄｅ
やｒｅｆ＿ｌａｙｅｒ＿ｉｄその他の必要な情報を設定
して出力する。As described above, the motion vector detector 32 detects the motion vector using not only the upper-layer VOP but also the enlarged image as reference images, and, as described with reference to FIG. 38, sets the prediction mode that minimizes the prediction error (or the variance). The motion vector detector 32 also sets and outputs, for example, the flags ref_select_code and ref_layer_id and other necessary information.

【０１３７】なお、図１０では、下位レイヤ符号化部２
５から、下位レイヤにおけるＩまたはＰピクチャを構成
するマクロブロックがスキップマクロブロックであるか
どうかを示すフラグＣＯＤが、動きベクトル検出器３
２、ＶＬＣ器３６、および動き補償器４２に供給される
ようになされている。In FIG. 10, the flag COD, which indicates whether a macroblock constituting an I or P picture in the lower layer is a skip macroblock, is supplied from the lower layer encoding unit 25 to the motion vector detector 32, the VLC unit 36, and the motion compensator 42.

【０１３８】動きベクトルの検出されたマクロブロック
は、上述した場合と同様に符号化され、これにより、Ｖ
ＬＣ器３６からは、その符号化結果としての可変長符号
が出力される。A macroblock for which a motion vector has been detected is encoded in the same manner as described above, and the VLC unit 36 outputs the resulting variable-length code.

【０１３９】なお、上位レイヤ符号化部２３のＶＬＣ器
３６は、下位レイヤ符号化部２５における場合と同様
に、フラグＣＯＤ，ＭＯＤＢを設定して出力するように
なされている。ここで、フラグＣＯＤは、上述したよう
に、ＩまたはＰピクチャのマクロブロックがスキップマ
クロブロックであるかどうかを示すものであるが、フラ
グＭＯＤＢは、Ｂピクチャのマクロブロックがスキップ
マクロブロックであるかどうかを示すものである。The VLC unit 36 of the upper layer encoding unit 23 sets and outputs the flags COD and MODB, as in the lower layer encoding unit 25. As described above, the flag COD indicates whether a macroblock of an I or P picture is a skip macroblock, whereas the flag MODB indicates whether a macroblock of a B picture is a skip macroblock.

【０１４０】また、ＶＬＣ器３６には、量子化係数、量
子化ステップ、動きベクトル、および予測モードの他、
倍率ＦＲ、フラグｒｅｆ＿ｓｅｌｅｃｔ＿ｃｏｄｅ，ｒ
ｅｆ＿ｌａｙｅｒ＿ｉｄ、サイズデータＦＳＺ＿Ｅ、オ
フセットデータＦＰＯＳ＿Ｅ、も供給されるようになさ
れており、ＶＬＣ器３６では、これらのデータがすべて
可変長符号化されて出力される。In addition to the quantization coefficients, the quantization step, the motion vector, and the prediction mode, the VLC unit 36 is supplied with the magnification FR, the flags ref_select_code and ref_layer_id, the size data FSZ_E, and the offset data FPOS_E, and it variable-length-encodes and outputs all of these data.

【０１４１】一方、動きベクトルの検出されたマクロブ
ロックは符号化された後、やはり上述したように局所復
号され、フレームメモリ４１に記憶される。そして、動
き補償器４２において、動きベクトル検出器３２におけ
る場合と同様にして、フレームメモリ４１に記憶され
た、局所復号された上位レイヤのＶＯＰだけでなく、フ
レームメモリ５２に記憶された、局所復号されて拡大さ
れた下位レイヤのＶＯＰをも参照画像として用いて動き
補償が行われ、予測画像が生成される。Meanwhile, a macroblock for which a motion vector has been detected is encoded and then, again as described above, locally decoded and stored in the frame memory 41. The motion compensator 42 then performs motion compensation, in the same manner as the motion vector detector 32, using as reference images not only the locally decoded upper-layer VOP stored in the frame memory 41 but also the locally decoded and enlarged lower-layer VOP stored in the frame memory 52, and generates a predicted image.

【０１４２】即ち、動き補償器４２には、動きベクトル
および予測モードの他、フラグｒｅｆ＿ｓｅｌｅｃｔ＿
ｃｏｄｅ，ｒｅｆ＿ｌａｙｅｒ＿ｉｄ、倍率ＦＲ、サイ
ズデータＦＳＺ＿Ｂ，ＦＳＺ＿Ｅ、オフセットデータＦ
ＰＯＳ＿Ｂ，ＦＰＯＳ＿Ｅが供給されるようになされて
おり、動き補償器４２は、フラグｒｅｆ＿ｓｅｌｅｃｔ
＿ｃｏｄｅ，ｒｅｆ＿ｌａｙｅｒ＿ｉｄに基づいて、動
き補償すべき参照画像を認識し、さらに、参照画像とし
て、局所復号された上位レイヤのＶＯＰ、または拡大画
像を用いる場合には、その絶対座標系における位置と大
きさを、サイズデータＦＳＺ＿Ｅおよびオフセットデー
タＦＰＯＳ＿Ｅ、またはサイズデータＦＳＺ＿Ｂおよび
オフセットデータＦＰＯＳ＿Ｂに基づいて認識し、必要
に応じて、倍率ＦＲを用いて予測画像を生成する。That is, in addition to the motion vector and the prediction mode, the motion compensator 42 is supplied with the flags ref_select_code and ref_layer_id, the magnification FR, the size data FSZ_B and FSZ_E, and the offset data FPOS_B and FPOS_E. The motion compensator 42 recognizes the reference image to be motion-compensated based on the flags ref_select_code and ref_layer_id. When it uses the locally decoded upper-layer VOP or the enlarged image as the reference image, it recognizes the position and size of that image in the absolute coordinate system based on the size data FSZ_E and the offset data FPOS_E, or on the size data FSZ_B and the offset data FPOS_B, respectively, and generates the predicted image, using the magnification FR as necessary.

【０１４３】次に、図１３は、図１のエンコーダから出
力されるビットストリームを復号するデコーダの一実施
の形態の構成例を示している。Next, FIG. 13 shows a configuration example of an embodiment of a decoder that decodes the bitstream output from the encoder of FIG. 1.

【０１４４】このデコーダには、図１のエンコーダから
伝送路５または記録媒体６を介して提供されるビットス
トリームが供給される。即ち、図１のエンコーダから出
力され、伝送路５を介して伝送されてくるビットストリ
ームは、図示せぬ受信装置で受信され、あるいは、記録
媒体６に記録されたビットストリームは、図示せぬ再生
装置で再生され、逆多重化部７１に供給される。This decoder is supplied with the bitstream provided from the encoder of FIG. 1 via the transmission path 5 or the recording medium 6. That is, the bitstream output from the encoder of FIG. 1 and transmitted via the transmission path 5 is received by a receiving device (not shown), or the bitstream recorded on the recording medium 6 is reproduced by a reproducing device (not shown), and is supplied to the demultiplexing unit 71.

【０１４５】逆多重化部７１では、そこに入力されたビ
ットストリーム（後述するＶＳ（Video Stream））が受
信される。さらに、逆多重化部７１では、入力されたビ
ットストリームが、ＶＯごとのビットストリームＶＯ＃
１，ＶＯ＃２，・・・に分離され、それぞれ、対応する
ＶＯＰ復号部７２_nに供給される。ＶＯＰ復号部７２_nで
は、逆多重化部７１からのビットストリームから、ＶＯ
を構成するＶＯＰ（画像データ）、サイズデータ（VOP
size）、およびオフセットデータ（VOP offset）が復号
され、画像再構成部７３に供給される。The demultiplexing unit 71 receives the bitstream input to it (the VS (Video Stream) described later). The demultiplexing unit 71 then separates the input bitstream into the bitstreams VO#1, VO#2, ... of the individual VOs and supplies each to the corresponding VOP decoding unit 72_n. The VOP decoding unit 72_n decodes, from the bitstream supplied by the demultiplexing unit 71, the VOPs (image data) constituting the VO, the size data (VOP size), and the offset data (VOP offset), and supplies them to the image reconstruction unit 73.

【０１４６】画像再構成部７３では、ＶＯＰ復号部７２
₁乃至７２_Nそれぞれからの出力に基づいて、元の画像が
再構成される。この再構成された画像は、例えば、モニ
タ７４に供給されて表示される。The image reconstruction unit 73 reconstructs the original image based on the outputs of the VOP decoding units 72_1 to 72_N. The reconstructed image is supplied to, for example, the monitor 74 and displayed.

【０１４７】次に、図１４は、スケーラビリティを実現
する、図１３のＶＯＰ復号部７２_nの構成例を示してい
る。Next, FIG. 14 shows a configuration example of the VOP decoding unit 72_n of FIG. 13 that realizes scalability.

【０１４８】逆多重化部７１（図１３）から供給される
ビットストリームは、逆多重化部９１に入力され、そこ
で、上位レイヤのＶＯＰのビットストリームと、下位レ
イヤのＶＯＰのビットストリームとに分離される。上位
レイヤのＶＯＰのビットストリームは、遅延回路９２に
おいて、下位レイヤ復号部９５における処理の時間だけ
遅延された後、上位レイヤ復号部９３に供給され、ま
た、下位レイヤのＶＯＰのビットストリームは、下位レ
イヤ復号部９５に供給される。The bitstream supplied from the demultiplexing unit 71 (FIG. 13) is input to the demultiplexing unit 91, where it is separated into the bitstream of the upper-layer VOPs and the bitstream of the lower-layer VOPs. The bitstream of the upper-layer VOPs is delayed in the delay circuit 92 by the processing time of the lower layer decoding unit 95 and then supplied to the upper layer decoding unit 93, while the bitstream of the lower-layer VOPs is supplied to the lower layer decoding unit 95.

【０１４９】下位レイヤ復号部９５では、下位レイヤの
ビットストリームが復号され、その結果得られる下位レ
イヤの復号画像が解像度変換部９４に供給される。ま
た、下位レイヤ復号部９５は、下位レイヤのビットスト
リームを復号することにより得られるサイズデータＦＳ
Ｚ＿Ｂ、オフセットデータＦＰＯＳ＿Ｂ、動きベクトル
（ＭＶ）、予測モード、フラグＣＯＤなどの、上位レイ
ヤのＶＯＰを復号するのに必要な情報を、上位レイヤ復
号部９３に供給する。The lower layer decoding unit 95 decodes the lower-layer bitstream and supplies the resulting lower-layer decoded image to the resolution conversion unit 94. The lower layer decoding unit 95 also supplies the upper layer decoding unit 93 with the information needed to decode the upper-layer VOP, such as the size data FSZ_B, the offset data FPOS_B, the motion vector (MV), the prediction mode, and the flag COD, all obtained by decoding the lower-layer bitstream.

【０１５０】上位レイヤ復号部９３では、遅延回路９２
を介して供給される上位レイヤのビットストリームが、
下位レイヤ復号部９５および解像度変換部９４の出力を
必要に応じて参照することにより復号され、その結果得
られる上位レイヤの復号画像、サイズデータＦＳＺ＿
Ｅ、およびオフセットデータＦＰＯＳ＿Ｅが出力され
る。さらに、上位レイヤ復号部９３は、上位レイヤのビ
ットストリームを復号することにより得られる倍率ＦＲ
を、解像度変換部９４に出力する。解像度変換部９４で
は、上位レイヤ復号部９３からの倍率ＦＲを用いて、図
３における解像度変換部２４における場合と同様にし
て、下位レイヤの復号画像が変換される。この変換によ
り得られる拡大画像は、上位レイヤ復号部９３に供給さ
れ、上述したように、上位レイヤのビットストリームの
復号に用いられる。The upper layer decoding unit 93 decodes the upper-layer bitstream supplied via the delay circuit 92, referring as necessary to the outputs of the lower layer decoding unit 95 and the resolution conversion unit 94, and outputs the resulting upper-layer decoded image, size data FSZ_E, and offset data FPOS_E. The upper layer decoding unit 93 also outputs the magnification FR, obtained by decoding the upper-layer bitstream, to the resolution conversion unit 94. Using the magnification FR from the upper layer decoding unit 93, the resolution conversion unit 94 converts the lower-layer decoded image in the same manner as the resolution conversion unit 24 in FIG. 3. The enlarged image obtained by this conversion is supplied to the upper layer decoding unit 93 and used for decoding the upper-layer bitstream, as described above.
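
The order of operations in the VOP decoding unit 72_n (lower layer first, then resolution conversion with FR, then upper layer) can be sketched as a small driver function. All four callables and their signatures are hypothetical placeholders for the units 95, 94, and 93; the sketch shows only the data flow, not any actual decoding.

```python
def decode_layers(upper_bits, lower_bits, decode_lower, read_fr, enlarge, decode_upper):
    """Sketch of the data flow between units 95, 94, and 93 (callables are stand-ins)."""
    # 1. The lower layer is decoded first; the upper-layer bitstream waits
    #    meanwhile, which is the role of the delay circuit 92.
    lower_image, lower_info = decode_lower(lower_bits)
    # 2. FR is obtained from the upper-layer bitstream and drives the
    #    resolution conversion of the lower-layer decoded image.
    fr = read_fr(upper_bits)
    enlarged = enlarge(lower_image, fr)
    # 3. The upper layer is decoded with the enlarged image and the
    #    lower-layer side information (FSZ_B, FPOS_B, MV, mode, COD) available.
    upper_image = decode_upper(upper_bits, enlarged, lower_info)
    return upper_image, lower_image
```

Passing the units in as callables keeps the sketch independent of any particular codec implementation.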

【０１５１】次に、図１５は、図１４の下位レイヤ復号
部９５の構成例を示している。なお、図中、図３９のデ
コーダにおける場合と対応する部分については、同一の
符号を付してある。即ち、下位レイヤ復号部９５は、基
本的に、図３９のデコーダと同様に構成されている。Next, FIG. 15 shows a configuration example of the lower layer decoding unit 95 of FIG. 14. In the figure, parts corresponding to those of the decoder of FIG. 39 are denoted by the same reference numerals. That is, the lower layer decoding unit 95 is configured basically in the same way as the decoder of FIG. 39.

【０１５２】逆多重化部９１からの下位レイヤのビット
ストリームは、バッファ１０１に供給され、そこで受信
されて一時記憶される。ＩＶＬＣ器１０２は、その後段
のブロックの処理状態に対応して、バッファ１０１から
ビットストリームを適宜読み出し、そのビットストリー
ムを可変長復号することで、量子化係数、動きベクト
ル、予測モード、量子化ステップ、サイズデータＦＳＺ
＿Ｂ、オフセットデータＦＰＯＳ＿Ｂ、およびフラグＣ
ＯＤなどを分離する。量子化係数および量子化ステップ
は、逆量子化器１０３に供給され、動きベクトルおよび
予測モードは、動き補償器１０７と上位レイヤ復号部９
３（図１４）に供給される。また、サイズデータＦＳＺ
＿ＢおよびオフセットデータＦＰＯＳ＿Ｂは、動き補償
器１０７、画像再構成部７３（図１３）、および上位レ
イヤ復号部９３に供給され、フラグＣＯＤは、上位レイ
ヤ復号部９３に供給される。The lower-layer bitstream from the demultiplexing unit 91 is supplied to the buffer 101, where it is received and temporarily stored. The IVLC unit 102 reads the bitstream from the buffer 101 as appropriate, in accordance with the processing state of the subsequent blocks, and variable-length-decodes it to separate the quantization coefficients, the motion vector, the prediction mode, the quantization step, the size data FSZ_B, the offset data FPOS_B, the flag COD, and so on. The quantization coefficients and the quantization step are supplied to the inverse quantizer 103, and the motion vector and the prediction mode are supplied to the motion compensator 107 and the upper layer decoding unit 93 (FIG. 14). The size data FSZ_B and the offset data FPOS_B are supplied to the motion compensator 107, the image reconstruction unit 73 (FIG. 13), and the upper layer decoding unit 93, and the flag COD is supplied to the upper layer decoding unit 93.

【０１５３】逆量子化器１０３、ＩＤＣＴ器１０４、演
算器１０５、フレームメモリ１０６、または動き補償器
１０７では、図９の下位レイヤ符号化部２５の逆量子化
器３８、ＩＤＣＴ器３９、演算器４０、フレームメモリ
４１、または動き補償器４２における場合とそれぞれ同
様の処理が行われることで、下位レイヤのＶＯＰが復号
され、画像再構成部７３、上位レイヤ復号部９３、およ
び解像度変換部９４（図１４）に供給される。The inverse quantizer 103, the IDCT unit 104, the arithmetic unit 105, the frame memory 106, and the motion compensator 107 perform the same processing as the inverse quantizer 38, the IDCT unit 39, the arithmetic unit 40, the frame memory 41, and the motion compensator 42, respectively, of the lower layer encoding unit 25 in FIG. 9. The lower-layer VOP is thereby decoded and supplied to the image reconstruction unit 73, the upper layer decoding unit 93, and the resolution conversion unit 94 (FIG. 14).

【０１５４】次に、図１６は、図１４の上位レイヤ復号
部９３の構成例を示している。なお、図中、図３９にお
ける場合と対応する部分については、同一の符号を付し
てある。即ち、上位レイヤ復号部９３は、フレームメモ
リ１１２が新たに設けられていることを除けば、基本的
に、図３９のデコーダと同様に構成されている。Next, FIG. 16 shows a configuration example of the upper layer decoding unit 93 of FIG. 14. In the figure, parts corresponding to those in FIG. 39 are denoted by the same reference numerals. That is, the upper layer decoding unit 93 is configured basically in the same way as the decoder of FIG. 39, except that the frame memory 112 is newly provided.

【０１５５】逆多重化部９１からの上位レイヤのビット
ストリームは、バッファ１０１を介してＩＶＬＣ器１０
２に供給される。ＩＶＬＣ器１０２は、上位レイヤのビ
ットストリームを可変長復号することで、量子化係数、
動きベクトル、予測モード、量子化ステップ、サイズデ
ータＦＳＺ＿Ｅ、オフセットデータＦＰＯＳ＿Ｅ、倍率
ＦＲ、フラグｒｅｆ＿ｌａｙｅｒ＿ｉｄ，ｒｅｆ＿ｓｅ
ｌｅｃｔ＿ｃｏｄｅ，ＣＯＤ，ＭＯＤＢなどを分離す
る。量子化係数および量子化ステップは、図１５におけ
る場合と同様に、逆量子化器１０３に供給され、動きベ
クトルおよび予測モードは、動き補償器１０７に供給さ
れる。また、サイズデータＦＳＺ＿Ｅおよびオフセット
データＦＰＯＳ＿Ｅは、動き補償器１０７および画像再
構成部７３（図１３）に供給され、フラグＣＯＤ，ＭＯ
ＤＢ，ｒｅｆ＿ｌａｙｅｒ＿ｉｄ、およびｒｅｆ＿ｓｅ
ｌｅｃｔ＿ｃｏｄｅは、動き補償器１０７に供給され
る。さらに、倍率ＦＲは、動き補償器１０７および解像
度変換部９４（図１４）に供給される。The upper-layer bitstream from the demultiplexing unit 91 is supplied to the IVLC unit 102 via the buffer 101. The IVLC unit 102 variable-length-decodes the upper-layer bitstream to separate the quantization coefficients, the motion vector, the prediction mode, the quantization step, the size data FSZ_E, the offset data FPOS_E, the magnification FR, the flags ref_layer_id, ref_select_code, COD, and MODB, and so on. As in FIG. 15, the quantization coefficients and the quantization step are supplied to the inverse quantizer 103, and the motion vector and the prediction mode are supplied to the motion compensator 107. The size data FSZ_E and the offset data FPOS_E are supplied to the motion compensator 107 and the image reconstruction unit 73 (FIG. 13), and the flags COD, MODB, ref_layer_id, and ref_select_code are supplied to the motion compensator 107. The magnification FR is supplied to the motion compensator 107 and the resolution conversion unit 94 (FIG. 14).

【０１５６】なお、動き補償器１０７には、上述したデ
ータの他、下位レイヤ復号部９５（図１４）から、下位
レイヤの動きベクトル、フラグＣＯＤ、サイズデータＦ
ＳＺ＿Ｂ、およびオフセットデータＦＰＯＳ＿Ｂが供給
されるようになされている。また、フレームメモリ１１
２には、解像度変換部９４から拡大画像が供給される。In addition to the data described above, the motion compensator 107 is supplied with the lower-layer motion vector, the flag COD, the size data FSZ_B, and the offset data FPOS_B from the lower layer decoding unit 95 (FIG. 14). The frame memory 112 is supplied with the enlarged image from the resolution conversion unit 94.

【０１５７】逆量子化器１０３、ＩＤＣＴ器１０４、演
算器１０５、フレームメモリ１０６、動き補償器１０
７、またはフレームメモリ１１２では、図１０の上位レ
イヤ符号化部２３の逆量子化器３８、ＩＤＣＴ器３９、
演算器４０、フレームメモリ４１、動き補償器４２、ま
たはフレームメモリ５２における場合とそれぞれ同様の
処理が行われることで、上位レイヤのＶＯＰが復号さ
れ、画像再構成部７３に供給される。The inverse quantizer 103, the IDCT unit 104, the arithmetic unit 105, the frame memory 106, the motion compensator 107, and the frame memory 112 perform the same processing as the inverse quantizer 38, the IDCT unit 39, the arithmetic unit 40, the frame memory 41, the motion compensator 42, and the frame memory 52, respectively, of the upper layer encoding unit 23 in FIG. 10. The upper-layer VOP is thereby decoded and supplied to the image reconstruction unit 73.

【０１５８】ここで、以上のように構成される上位レイ
ヤ復号部９３および下位レイヤ復号部９５を有するＶＯ
Ｐ復号部７２_nにおいては、上位レイヤについての復号
画像、サイズデータＦＳＺ＿Ｅ、およびオフセットデー
タＦＰＯＳ＿Ｅ（以下、適宜、これらをすべて含めて、
上位レイヤデータという）と、下位レイヤについての
復号画像、サイズデータＦＳＺ＿
Ｂ、およびオフセットデータＦＰＯＳ＿Ｂ（以下、適
宜、これらをすべて含めて、下位レイヤデータという）
が得られるが、画像再構成部７３では、この上位レイヤ
データまたは下位レイヤデータから、例えば、次のよう
にして画像が再構成されるようになされている。The VOP decoding unit 72_n, which has the upper layer decoding unit 93 and the lower layer decoding unit 95 configured as described above, yields the decoded image, the size data FSZ_E, and the offset data FPOS_E for the upper layer (hereinafter collectively referred to as upper layer data, as appropriate) and the decoded image, the size data FSZ_B, and the offset data FPOS_B for the lower layer (hereinafter collectively referred to as lower layer data, as appropriate). The image reconstruction unit 73 reconstructs the image from the upper layer data or the lower layer data, for example, as follows.

【０１５９】即ち、例えば、第１の空間スケーラビリテ
ィ（図４）が行われた場合（入力されたＶＯＰ全体が上
位レイヤとされるとともに、そのＶＯＰ全体を縮小した
ものが下位レイヤとされた場合）において、下位レイヤデ
ータおよび上位レイヤデータの両方のデータが復号され
たときには、画像再構成部７３は、上位レイヤデータの
みに基づき、サイズデータＦＳＺ＿Ｅに対応する大きさ
の上位レイヤの復号画像（ＶＯＰ）を、オフセットデー
タＦＰＯＳ＿Ｅによって示される位置に配置する。ま
た、例えば、上位レイヤのビットストリームにエラーが
生じたり、また、モニタ７４が、低解像度の画像にしか
対応していないため、下位レイヤデータのみの復号が行
われたときには、画像再構成部７３は、その下位レイヤ
データのみに基づき、サイズデータＦＳＺ＿Ｂに対応す
る大きさの下位レイヤの復号画像（ＶＯＰ）を、オフセ
ットデータＦＰＯＳ＿Ｂによって示される位置に配置す
る。That is, for example, when the first spatial scalability (FIG. 4) has been performed (the entire input VOP is the upper layer, and a reduced version of the entire VOP is the lower layer) and both the lower layer data and the upper layer data have been decoded, the image reconstruction unit 73 places the upper-layer decoded image (VOP), whose size corresponds to the size data FSZ_E, at the position indicated by the offset data FPOS_E, based on the upper layer data alone. When only the lower layer data has been decoded, for example, because an error occurred in the upper-layer bitstream or because the monitor 74 supports only low-resolution images, the image reconstruction unit 73 places the lower-layer decoded image (VOP), whose size corresponds to the size data FSZ_B, at the position indicated by the offset data FPOS_B, based on the lower layer data alone.

【０１６０】また、例えば、第２の空間スケーラビリテ
ィ（図５）が行われた場合（入力されたＶＯＰの一部が
上位レイヤとされるとともに、そのＶＯＰ全体を縮小し
たものが下位レイヤとされた場合）において、下位レイ
ヤデータおよび上位レイヤデータの両方のデータが復号
されたときには、画像再構成部７３は、サイズデータＦ
ＳＺ＿Ｂに対応する大きさの下位レイヤの復号画像を、
倍率ＦＲにしたがって拡大し、その拡大画像を生成す
る。さらに、画像再構成部７３は、オフセットデータＦ
ＰＯＳ＿ＢをＦＲ倍し、その結果得られる値に対応する
位置に、拡大画像を配置する。そして、画像再構成部７
３は、サイズデータＦＳＺ＿Ｅに対応する大きさの上位
レイヤの復号画像を、オフセットデータＦＰＯＳ＿Ｅに
よって示される位置に配置する。When the second spatial scalability (FIG. 5) has been performed (a part of the input VOP is the upper layer, and a reduced version of the entire VOP is the lower layer) and both the lower layer data and the upper layer data have been decoded, the image reconstruction unit 73 enlarges the lower-layer decoded image, whose size corresponds to the size data FSZ_B, according to the magnification FR to generate an enlarged image. The image reconstruction unit 73 further multiplies the offset data FPOS_B by FR and places the enlarged image at the position corresponding to the resulting value. It then places the upper-layer decoded image, whose size corresponds to the size data FSZ_E, at the position indicated by the offset data FPOS_E.

【０１６１】この場合、上位レイヤの復号画像の部分
が、それ以外の部分に比較して高い解像度で表示される
ことになる。In this case, the portion occupied by the upper-layer decoded image is displayed at a higher resolution than the rest of the image.

【０１６２】なお、上位レイヤの復号画像を配置する場
合においては、その復号画像と、拡大画像とは合成され
る。When the decoded image of the upper layer is arranged, the decoded image and the enlarged image are combined.
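
The reconstruction for the second spatial scalability can be sketched as pasting two images onto a canvas. This is a minimal illustration under several assumptions: images are lists of pixel rows, FR is an integer, enlargement is done by pixel replication rather than a real interpolation filter, and "combining" is modelled as the upper-layer pixels overwriting the enlarged lower-layer pixels (the text does not specify the combining rule). All function names are hypothetical.

```python
def replicate(image, fr):
    """Enlarge an image by repeating each pixel fr times in both directions."""
    return [[p for p in row for _ in range(fr)] for row in image for _ in range(fr)]


def paste(canvas, image, position):
    """Copy image onto canvas with its top-left corner at position (x, y)."""
    x0, y0 = position
    for dy, row in enumerate(image):
        for dx, pixel in enumerate(row):
            canvas[y0 + dy][x0 + dx] = pixel


def reconstruct_second_spatial(canvas_w, canvas_h, lower, fpos_b, fr, upper, fpos_e):
    canvas = [[None] * canvas_w for _ in range(canvas_h)]
    # Enlarged lower layer goes at FPOS_B multiplied by FR.
    paste(canvas, replicate(lower, fr), (fpos_b[0] * fr, fpos_b[1] * fr))
    # Upper layer goes at FPOS_E and replaces the pixels underneath it.
    paste(canvas, upper, fpos_e)
    return canvas
```

With a 2x2 lower layer, FR = 2, and a single upper-layer pixel placed at (1, 1), only that one position of the 4x4 result carries upper-layer detail, which mirrors the statement that the upper-layer portion is shown at higher resolution than the rest.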

【０１６３】また、図１４（図１３）には図示しなかっ
たが、上位レイヤ復号部９３（ＶＯＰ復号部７２_n）か
ら画像再構成部７３に対しては、上述したデータの他、
倍率ＦＲも供給されるようになされており、画像再構成
部７３は、これを用いて、拡大画像を生成するようにな
されている。Although not shown in FIG. 14 (or FIG. 13), the upper layer decoding unit 93 (VOP decoding unit 72_n) also supplies the magnification FR, in addition to the data described above, to the image reconstruction unit 73, which uses it to generate the enlarged image.

【０１６４】一方、第２の空間スケーラビリティが行わ
れた場合において、下位レイヤデータのみが復号された
ときには、上述の第１の空間スケーラビリティが行われ
た場合と同様にして、画像が再構成される。On the other hand, when the second spatial scalability has been performed and only the lower layer data has been decoded, the image is reconstructed in the same manner as in the case of the first spatial scalability described above.

【０１６５】さらに、第３の空間スケーラビリティ（図
６、図７）が行われた場合（入力されたＶＯＰを構成す
る物体ごとに、その物体（オブジェクト）全体を上位レ
イヤとするとともに、その物体全体を間引いたものを下
位レイヤとした場合）においては、上述の第２の空間ス
ケーラビリティが行われた場合と同様にして、画像が再
構成される。Furthermore, when the third spatial scalability (FIGS. 6 and 7) has been performed (for each object constituting the input VOP, the entire object is the upper layer and a thinned-out version of the entire object is the lower layer), the image is reconstructed in the same manner as in the case of the second spatial scalability described above.

【０１６６】上述したように、オフセットデータＦＰＯ
Ｓ＿ＢおよびＦＰＯＳ＿Ｅは、下位レイヤの拡大画像お
よび上位レイヤの画像を構成する、対応する画素どうし
が、絶対座標系において同一の位置に配置されるように
なっているため、以上のように画像を再構成すること
で、正確な（位置ずれのない）画像を得ることができ
る。As described above, the offset data FPOS_B and FPOS_E are set so that corresponding pixels of the enlarged lower-layer image and the upper-layer image are located at the same position in the absolute coordinate system. Reconstructing the image as described above therefore yields an accurate image, free of positional misalignment.

【０１６７】次に、図１のエンコーダが出力する符号化
ビットストリームのシンタクスについて、例えば、MPEG
4規格のVideo Verification Model(Version6.0)（以
下、適宜、VM6.0と記述する）を例に説明する。Next, the syntax of the coded bitstream output by the encoder of FIG. 1 will be described, taking as an example the Video Verification Model (Version 6.0) of the MPEG4 standard (hereinafter referred to as VM6.0, as appropriate).

【０１６８】図１７は、VM6.0における符号化ビットス
トリームの構成を示している。FIG. 17 shows the structure of a coded bitstream in VM6.0.

【０１６９】符号化ビットストリームは、ＶＳ（Video
Session Class）を単位として構成され、各ＶＳは、１
以上のＶＯ（Video Object Class）から構成される。そ
して、ＶＯは、１以上のＶＯＬ（Video Object Layer C
lass）から構成され（画像を階層化しないときは１のＶ
ＯＬで構成され、画像を階層化する場合には、その階層
数だけのＶＯＬで構成される）、ＶＯＬは、ＶＯＰ（Vi
deo Object Plane Class）から構成される。The coded bitstream is organized in units of VS (Video Session Class), and each VS is composed of one or more VOs (Video Object Class). Each VO is composed of one or more VOLs (Video Object Layer Class): a single VOL when the image is not layered, and as many VOLs as there are layers when it is. Each VOL is composed of VOPs (Video Object Plane Class).
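The containment relationship VS, VO, VOL, VOP can be pictured as nested containers. The following is only an illustrative sketch: the class and field names other than video_object_layer_id are hypothetical, and the actual syntax elements are those of FIGS. 18 to 21.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VOP:
    """One Video Object Plane; coding_type is a hypothetical illustrative field."""
    coding_type: str  # "I", "P" or "B"


@dataclass
class VOL:
    """One Video Object Layer, identified by video_object_layer_id."""
    video_object_layer_id: int
    vops: List[VOP] = field(default_factory=list)


@dataclass
class VO:
    """One Video Object: one VOL if not layered, one VOL per layer otherwise."""
    vols: List[VOL] = field(default_factory=list)


@dataclass
class VS:
    """One Video Session: one or more VOs."""
    vos: List[VO] = field(default_factory=list)


def layer_count(vo: VO) -> int:
    """Number of layers of a VO equals its number of VOLs."""
    return len(vo.vols)


# A two-layer object: a lower-layer VOL and an upper-layer VOL.
scalable_vo = VO(vols=[VOL(0, [VOP("I")]), VOL(1, [VOP("P")])])
session = VS(vos=[scalable_vo])
```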

【０１７０】なお、ＶＳは、画像シーケンスであり、例
えば、一本の番組や映画などに相当する。The VS is an image sequence and corresponds to, for example, one program or movie.

【０１７１】図１８または図１９は、ＶＳまたはＶＯの
シンタクスをそれぞれ示している。ＶＯは、画像全体ま
たは画像の一部（物体）のシーケンスに対応するビット
ストリームであり、従って、ＶＳは、そのようなシーケ
ンスの集合で構成される（よって、ＶＳは、例えば、一
本の番組などに相当する）。FIGS. 18 and 19 show the syntax of the VS and the VO, respectively. A VO is a bitstream corresponding to a sequence of an entire image or of a part of an image (an object), and a VS is therefore composed of a set of such sequences (so that a VS corresponds to, for example, one program).

【０１７２】図２０は、ＶＯＬのシンタクスを示してい
る。FIG. 20 shows the syntax of the VOL.

【０１７３】ＶＯＬは、上述したようなスケーラビリテ
ィのためのクラスであり、video_object_layer_idで示
される番号によって識別される。即ち、例えば、下位レ
イヤのＶＯＬについてのvideo_object_layer_idは０と
され、また、例えば、上位レイヤのＶＯＬについてのvi
deo_object_layer_idは１とされる。なお、上述したよ
うに、スケーラブルのレイヤの数は２に限られることな
く、１や３以上を含む任意の数とすることができる。The VOL is a class for scalability as described above and is identified by the number given in video_object_layer_id. For example, video_object_layer_id is set to 0 for the lower-layer VOL and to 1 for the upper-layer VOL. As noted above, the number of scalable layers is not limited to two and may be any number, including one or three or more.

【０１７４】また、各ＶＯＬについて、それが画像全体
であるのか、画像の一部であるのかは、video_object_l
ayer_shapeで識別される。このvideo_object_layer_sha
peは、ＶＯＬの形状を示すフラグで、例えば、以下のよ
うに設定される。Whether each VOL is an entire image or a part of an image is identified by video_object_layer_shape. This video_object_layer_shape is a flag indicating the shape of the VOL and is set, for example, as follows.

【０１７５】即ち、ＶＯＬの形状が長方形状であると
き、video_object_layer_shapeは、例えば「００」とさ
れる。また、ＶＯＬが、ハードキー（０または１のうち
のいずれか一方の値をとる２値（Binary）の信号）によ
って抜き出される領域の形状をしているとき、video_ob
ject_layer_shapeは、例えば「０１」とされる。さら
に、ＶＯＬが、ソフトキー（０乃至１の範囲の連続した
値（Gray-Scale）をとることが可能な信号）によって抜
き出される領域の形状をしているとき（ソフトキーを用
いて合成されるものであるとき）、video_object_layer
_shapeは、例えば「１０」とされる。That is, when the VOL is rectangular, video_object_layer_shape is set to, for example, "00". When the VOL has the shape of a region extracted by a hard key (a binary signal taking the value 0 or 1), video_object_layer_shape is set to, for example, "01". When the VOL has the shape of a region extracted by a soft key (a gray-scale signal that can take continuous values in the range 0 to 1), that is, when it is composited using a soft key, video_object_layer_shape is set to, for example, "10".

【０１７６】ここで、video_object_layer_shapeが「０
０」とされるのは、ＶＯＬの形状が長方形状であり、か
つ、そのＶＯＬの絶対座標形における位置および大きさ
が、時間とともに変化しない、即ち、一定の場合であ
る。なお、この場合、その大きさ（横の長さと縦の長
さ）は、video_object_layer_widthとvideo_object_lay
er_heightによって示される。video_object_layer_widt
hおよびvideo_object_layer_heightは、いずれも１０ビ
ットの固定長のフラグで、video_object_layer_shapeが
「００」の場合には、最初に、一度だけ伝送される（こ
れは、video_object_layer_shapeが「００」の場合、上
述したように、ＶＯＬの絶対座標系における大きさが一
定であるからである）。Here, video_object_layer_shape is set to "00" when the VOL is rectangular and its position and size in the absolute coordinate system do not change with time, that is, are constant. In this case, its size (width and height) is given by video_object_layer_width and video_object_layer_height. Both are 10-bit fixed-length flags and, when video_object_layer_shape is "00", are transmitted only once, at the beginning (because, as stated above, the size of the VOL in the absolute coordinate system is constant in that case).

【０１７７】また、ＶＯＬが、下位レイヤまたは上位レ
イヤのうちのいずれであるかは、１ビットのフラグであ
るscalabilityによって示される。ＶＯＬが下位レイヤ
の場合、scalabilityは、例えば１とされ、それ以外の
場合、scalabilityは、例えば０とされる。Whether the VOL is a lower layer or an upper layer is indicated by scalability, a 1-bit flag. When the VOL is a lower layer, scalability is set to, for example, 1; otherwise it is set to, for example, 0.

【０１７８】さらに、ＶＯＬが、自身以外のＶＯＬにお
ける画像を参照画像として用いる場合、その参照画像が
属するＶＯＬは、上述したように、ref_layer_idで表さ
れる。なお、ref_layer_idは、上位レイヤについてのみ
伝送される。Further, when a VOL uses an image in a VOL other than itself as a reference image, the VOL to which the reference image belongs is represented by ref_layer_id as described above. Note that ref_layer_id is transmitted only for the upper layer.

【０１７９】また、図２０において、hor_sampling_fac
tor_nとhor_sampling_factor_mは、下位レイヤのＶＯＰ
の水平方向の長さに対応する値と、上位レイヤのＶＯＰ
の水平方向の長さに対応する値をそれぞれ示す。従っ
て、下位レイヤに対する上位レイヤの水平方向の長さ
（水平方向の解像度の倍率）は、式hor_sampling_facto
r_n/hor_sampling_factor_mで与えられる。In FIG. 20, hor_sampling_factor_n and hor_sampling_factor_m indicate a value corresponding to the horizontal length of the lower-layer VOP and a value corresponding to the horizontal length of the upper-layer VOP, respectively. The horizontal length of the upper layer relative to the lower layer (the horizontal resolution magnification) is therefore given by hor_sampling_factor_n / hor_sampling_factor_m.

【０１８０】さらに、図２０において、ver_sampling_f
actor_nとver_sampling_factor_mは、下位レイヤのＶＯ
Ｐの垂直方向の長さに対応する値と、上位レイヤのＶＯ
Ｐの垂直方向の長さに対応する値をそれぞれ示す。従っ
て、下位レイヤに対する上位レイヤの垂直方向の長さ
（垂直方向の解像度の倍率）は、式ver_sampling_facto
r_n/ver_sampling_factor_mで与えられる。Likewise, in FIG. 20, ver_sampling_factor_n and ver_sampling_factor_m indicate a value corresponding to the vertical length of the lower-layer VOP and a value corresponding to the vertical length of the upper-layer VOP, respectively. The vertical length of the upper layer relative to the lower layer (the vertical resolution magnification) is therefore given by ver_sampling_factor_n / ver_sampling_factor_m.
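As a minimal sketch of how the magnification follows from the two sampling factors, the ratio can be kept as an exact fraction. The function name and the sample values are hypothetical; only the n/m ratio itself comes from the description above.

```python
from fractions import Fraction


def resolution_magnification(sampling_factor_n: int, sampling_factor_m: int) -> Fraction:
    """Magnification of the upper layer relative to the lower layer, n / m.

    Applies equally to the hor_sampling_factor_* and ver_sampling_factor_* pairs.
    """
    if sampling_factor_n <= 0 or sampling_factor_m <= 0:
        raise ValueError("sampling factors must be positive")
    return Fraction(sampling_factor_n, sampling_factor_m)


# Illustrative values only: an upper layer twice as wide and 1.5 times as tall.
horizontal = resolution_magnification(2, 1)
vertical = resolution_magnification(3, 2)
```

Keeping the ratio as a fraction avoids rounding when the magnification is not an integer.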

【０１８１】次に、図２１は、ＶＯＰ（Video Object P
lane Class）のシンタクスを示している。Next, FIG. 21 shows the syntax of the VOP (Video Object Plane Class).

【０１８２】ＶＯＰの大きさ（横と縦の長さ）は、例え
ば、１０ビット固定長のVOP_widthとVOP_heightで表さ
れる。また、ＶＯＰの絶対座標系における位置は、例え
ば、１０ビット固定長のVOP_horizontal_spatial_mc_re
fとVOP_vertical_mc_refで表される。なお、VOP_width
またはVOP_heightは、ＶＯＰの水平方向または垂直方向
の長さをそれぞれ表し、これらは、上述のサイズデータ
ＦＳＺ＿ＢやＦＳＺ＿Ｅに相当する。また、VOP_horizo
ntal_spatial_mc_refまたはVOP_vertical_mc_refは、Ｖ
ＯＰの水平方向または垂直方向の座標（ｘまたはｙ座
標）をそれぞれ表し、これらは、上述のオフセットデー
タＦＰＯＳ＿ＢやＦＰＯＳ＿Ｅに相当する。The size of a VOP (its width and height) is represented by, for example, VOP_width and VOP_height, each of 10-bit fixed length. The position of the VOP in the absolute coordinate system is represented by, for example, VOP_horizontal_spatial_mc_ref and VOP_vertical_mc_ref, each of 10-bit fixed length. VOP_width and VOP_height represent the horizontal and vertical lengths of the VOP, respectively, and correspond to the size data FSZ_B and FSZ_E described above. VOP_horizontal_spatial_mc_ref and VOP_vertical_mc_ref represent the horizontal and vertical coordinates (x and y coordinates) of the VOP, respectively, and correspond to the offset data FPOS_B and FPOS_E described above.

【０１８３】VOP_width，VOP_height，VOP_horizontal_
spatial_mc_ref、およびVOP_vertical_mc_refは、video
_object_layer_shapeが「００」以外の場合にのみ伝送
される。即ち、video_object_layer_shapeが「００」の
場合、上述したように、ＶＯＰの大きさおよび位置はい
ずれも一定であるから、VOP_width，VOP_height，VOP_h
orizontal_spatial_mc_ref、およびVOP_vertical_mc_re
fは伝送する必要がない。この場合、受信側では、ＶＯ
Ｐは、その左上の頂点が、例えば、絶対座標系の原点に
一致するように配置され、また、その大きさは、図２０
で説明したvideo_object_layer_widthおよびvideo_obje
ct_layer_heightから認識される。VOP_width, VOP_height, VOP_horizontal_spatial_mc_ref, and VOP_vertical_mc_ref are transmitted only when video_object_layer_shape is other than "00". When video_object_layer_shape is "00", the size and position of the VOP are both constant, as described above, so these four fields need not be transmitted. In this case, the receiving side places the VOP so that its upper-left vertex coincides with, for example, the origin of the absolute coordinate system, and recognizes its size from video_object_layer_width and video_object_layer_height described with reference to FIG. 20.
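The receiving-side rule just described can be sketched as follows. This is an illustration under stated assumptions: the headers are modeled as plain dictionaries keyed by the syntax element names, the function name is hypothetical, and the origin placement for the "00" case follows the "for example" in the text.

```python
RECTANGULAR = "00"  # fixed rectangle: size sent once in the VOL header
BINARY = "01"       # region extracted by a hard key
GRAY_SCALE = "10"   # region extracted by a soft key


def vop_geometry(shape, vol_header, vop_header=None):
    """Return (x, y, width, height) of a VOP in the absolute coordinate system."""
    if shape == RECTANGULAR:
        # Per-VOP size/position fields are not transmitted; use the VOL-level
        # size and place the upper-left vertex at the origin.
        return (0, 0,
                vol_header["video_object_layer_width"],
                vol_header["video_object_layer_height"])
    if shape not in (BINARY, GRAY_SCALE):
        raise ValueError("unknown video_object_layer_shape: %r" % shape)
    if vop_header is None:
        raise ValueError("per-VOP size and position are required for this shape")
    return (vop_header["VOP_horizontal_spatial_mc_ref"],
            vop_header["VOP_vertical_mc_ref"],
            vop_header["VOP_width"],
            vop_header["VOP_height"])


# Illustrative values only.
vol = {"video_object_layer_width": 352, "video_object_layer_height": 288}
vop = {"VOP_horizontal_spatial_mc_ref": 16, "VOP_vertical_mc_ref": 32,
       "VOP_width": 64, "VOP_height": 48}
```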

【０１８４】図２１において、ref_select_codeは、図
１７で説明したように、参照画像として用いる画像を表
すもので、ＶＯＰのシンタクスにおいて規定されてい
る。In FIG. 21, ref_select_code represents an image used as a reference image as described in FIG. 17, and is defined in the VOP syntax.

【０１８５】ところで、VM6.0では、各VOP(Video Objec
t Plane:従来のFrameに相当する)の表示時刻は、modulo
_time_baseと、VOP_time_increment（図２１）によっ
て、次のように定められる。In VM6.0, the display time of each VOP (Video Object Plane, corresponding to a conventional frame) is determined by modulo_time_base and VOP_time_increment (FIG. 21) as follows.

【０１８６】即ち、modulo_time_baseは、エンコーダの
ローカルな時間軸上における時刻を、１秒（1000ms（ミ
リ秒））の精度で表す。modulo_time_baseは、VOPヘッ
ダの中で伝送されるマーカ（marker）で表現され、必要
な数の「1」と、１の「0」とで構成される。modulo_tim
e_baseを構成する「1」の数が、最後に（現在から遡っ
て、最も最近に）（直前に）符号化／復号されたmodulo
_time_baseによって示された同期点（１秒精度の時刻）
からの累積時間を表す。即ち、modulo_time_baseが、例
えば、「０」の場合は、直前に符号化／復号されたmodu
lo_time_baseによって示された同期点からの累積時間が
０秒であることを表す。また、modulo_time_baseが、例
えば、「１０」の場合は、直前に符号化／復号されたmo
dulo_time_baseによって示された同期点からの累積時間
が１秒であることを表す。さらに、modulo_time_base
が、例えば、「１１０」の場合は、直前に符号化／復号
されたmodulo_time_baseによって示された同期点からの
累積時間が２秒であることを表す。以上のように、modu
lo_time_baseの「１」の数が、直前に符号化／復号され
たmodulo_time_baseによって示された同期点からの秒数
になっている。That is, modulo_time_base represents the time on the encoder's local time axis with a precision of one second (1000 ms). It is expressed as a marker transmitted in the VOP header and consists of the required number of "1"s followed by a single "0". The number of "1"s in modulo_time_base represents the time accumulated since the synchronization point (a time with one-second precision) indicated by the most recently encoded/decoded modulo_time_base. For example, a modulo_time_base of "0" indicates that 0 seconds have accumulated since that synchronization point, "10" indicates 1 second, and "110" indicates 2 seconds. In short, the number of "1"s in modulo_time_base is the number of seconds elapsed since the synchronization point indicated by the most recently encoded/decoded modulo_time_base.
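The marker format described above (a run of "1"s terminated by a single "0") can be sketched as follows. The function names are hypothetical, and the bits are modeled as a character string purely for readability.

```python
def encode_modulo_time_base(elapsed_seconds: int) -> str:
    """Emit one "1" per elapsed second since the last sync point, then a "0"."""
    if elapsed_seconds < 0:
        raise ValueError("elapsed_seconds must be non-negative")
    return "1" * elapsed_seconds + "0"


def decode_modulo_time_base(bits: str) -> int:
    """Count the "1"s that precede the terminating "0"."""
    seconds = 0
    for bit in bits:
        if bit == "1":
            seconds += 1
        elif bit == "0":
            return seconds
        else:
            raise ValueError("unexpected symbol in modulo_time_base: %r" % bit)
    raise ValueError("modulo_time_base is missing its terminating 0")
```

With this representation, "0", "10", and "110" decode to 0, 1, and 2 seconds, matching the examples in the text.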

【０１８７】なお、VM6.0では、modulo_time_baseにつ
いて、「This value represents thelocal time base a
t the one second resolution unit (1000 millisecond
s).It is represented as a marker transmitted in th
e VOP header. The numberof consecutive "1" followe
d by a "0" indicates the number of seconds has ela
psed since the synchronization point marked by the
last encoded/decoded modulo_time_base.」と記載さ
れている。VM6.0 describes modulo_time_base as follows: "This value represents the local time base at the one second resolution unit (1000 milliseconds). It is represented as a marker transmitted in the VOP header. The number of consecutive "1" followed by a "0" indicates the number of seconds has elapsed since the synchronization point marked by the last encoded/decoded modulo_time_base."

【０１８８】VOP_time_incrementは、エンコーダのロー
カルな時間軸上における時刻を、1msの精度で表す。VM
6.0では、VOP_time_incrementは、I-VOPおよびP-VOPに
ついては、直前に符号化／復号されたmodulo_time_base
によって示された同期点からの時間を表し、B-VOPにつ
いては、直前に符号化／復号されたI-VOPまたはP-VOPか
らの相対時間を表す。VOP_time_increment represents the time on the encoder's local time axis with a precision of 1 ms. In VM6.0, for I-VOPs and P-VOPs it represents the time from the synchronization point indicated by the most recently encoded/decoded modulo_time_base, and for B-VOPs it represents the time relative to the most recently encoded/decoded I-VOP or P-VOP.

【０１８９】なお、VM6.0では、VOP_time_incrementに
ついて、「This value represents the local time bas
e in the units of milliseconds. For I and P-VOP's
thisvalue is the absolute VOP_time_increment from
the synchronization pointmarked by the last modulo
_time_base. For the B-VOP's this value is the rela
tive VOP_time_increment from the last encoded/deco
ded I- or P-VOP.」と記載されている。VM6.0 describes VOP_time_increment as follows: "This value represents the local time base in the units of milliseconds. For I and P-VOP's this value is the absolute VOP_time_increment from the synchronization point marked by the last modulo_time_base. For the B-VOP's this value is the relative VOP_time_increment from the last encoded/decoded I- or P-VOP."

【０１９０】そして、VM6.0では、「At the encoder, t
he following formula are used todetermine the abso
lute and relative VOP_time_increments for I/P-VOP'
s and B-VOP's, respectively.」と記載されている。VM6.0 then states: "At the encoder, the following formula are used to determine the absolute and relative VOP_time_increments for I/P-VOP's and B-VOP's, respectively."

【０１９１】即ち、エンコーダにおいて、以下の式を使
って、I-VOPおよびP-VOPと、B-VOPとについて、それぞ
れの表示時刻を符号化する旨が規定されている。That is, it is stipulated that the encoder uses the following equations to encode the display times of I-VOPs and P-VOPs and of B-VOPs, respectively.

【０１９２】ｔ_GTB(n)＝ｎ×１０００ｍｓ＋ｔ_EST ｔ_AVTI＝ｔ_ETB(I/P)−ｔ_GTB(n) ｔ_RVTI＝ｔ_ETB(B)−ｔ_ETB(I/P) ・・・（１）但し、式（１）において、ｔ_GTB(n)は、ｎ番目に符号化
されたmodulo_time_baseによって示された同期点の時刻
（上述したように、秒精度）を表し、ｔ_ESTは、エンコ
ーダにおけるＶＯの符号化開始時刻（ＶＯの符号化が開
始された絶対時刻）を表す。また、ｔ_AVTIは、I-VOPま
たはP-VOPについてのVOP_time_incrementを表し、ｔ
_ETB(I/P)は、エンコーダにおけるI-VOPまたはP-VOPの符
号化開始時刻（ＶＯＰの符号化が開始された絶対時刻）
を表す。さらに、ｔ_RVTIは、B-VOPについてのVOP_time_
incrementを表し、ｔ_ETB(B)は、エンコーダにおけるB-V
OPの符号化開始時刻を表す。
t_GTB(n) = n × 1000 ms + t_EST
t_AVTI = t_ETB(I/P) - t_GTB(n)
t_RVTI = t_ETB(B) - t_ETB(I/P) ... (1)
In equation (1), t_GTB(n) represents the time of the synchronization point indicated by the n-th encoded modulo_time_base (with one-second precision, as described above), and t_EST represents the time at which the encoder started encoding the VO (the absolute time at which encoding of the VO started). t_AVTI represents the VOP_time_increment of an I-VOP or P-VOP, and t_ETB(I/P) represents the time at which the encoder started encoding the I-VOP or P-VOP (the absolute time at which encoding of the VOP started). t_RVTI represents the VOP_time_increment of a B-VOP, and t_ETB(B) represents the time at which the encoder started encoding the B-VOP.
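Equation (1) can be written out as three small functions. This is a sketch only: the function names are hypothetical, all times are integers in milliseconds, and the sample values (t_EST = 0, a P-VOP at 1550 ms, a B-VOP at 1950 ms) are chosen for illustration.

```python
def t_gtb_ms(n: int, t_est_ms: int) -> int:
    """Sync point marked by the n-th encoded modulo_time_base: n * 1000 ms + t_EST."""
    return n * 1000 + t_est_ms


def t_avti_ms(t_etb_ip_ms: int, t_gtb: int) -> int:
    """Absolute VOP_time_increment of an I/P-VOP: t_ETB(I/P) - t_GTB(n)."""
    return t_etb_ip_ms - t_gtb


def t_rvti_ms(t_etb_b_ms: int, t_etb_ip_ms: int) -> int:
    """Relative VOP_time_increment of a B-VOP: t_ETB(B) - t_ETB(I/P)."""
    return t_etb_b_ms - t_etb_ip_ms


# Illustrative values: encoding starts at 0 ms; the P-VOP lies in the second
# that begins at the first sync point (n = 1).
sync = t_gtb_ms(1, 0)
p_increment = t_avti_ms(1550, sync)
b_increment = t_rvti_ms(1950, 1550)
```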

【０１９３】なお、VM6.0では、式（１）におけるｔ
_GTB(n)，ｔ_EST，ｔ_AVTI，ｔ_ETB(I/P)，ｔ_RVTI，ｔ
_ETB(B)について、「t_GTB(n) is the encoder time base
marked by the nth encoded modulo_time_base, t_EST
is the encoder time base start time, t_AVTI is the
absolute VOP_time_increment for the I or P-VOP, t
_ETB(I/P ₎ is the encoder time base at the start of
the encoding of the I or P-VOP, t_RVTI is the relat
ive VOP_time_increment for the B-VOP, and t_ETB(B)
is the encoder time base at the start of the encod
ing of the B-VOP.」と記載されている。VM6.0 describes t_GTB(n), t_EST, t_AVTI, t_ETB(I/P), t_RVTI, and t_ETB(B) in equation (1) as follows: "t_GTB(n) is the encoder time base marked by the nth encoded modulo_time_base, t_EST is the encoder time base start time, t_AVTI is the absolute VOP_time_increment for the I or P-VOP, t_ETB(I/P) is the encoder time base at the start of the encoding of the I or P-VOP, t_RVTI is the relative VOP_time_increment for the B-VOP, and t_ETB(B) is the encoder time base at the start of the encoding of the B-VOP."

【０１９４】また、VM6.0では、「At the decoder, the
following formula are used to determine the recov
ered time base of the I/P-VOP's and B-VOP's, respe
ctively:」と記載されている。VM6.0 further states: "At the decoder, the following formula are used to determine the recovered time base of the I/P-VOP's and B-VOP's, respectively:"

【０１９５】即ち、デコーダ側では、以下の式を使っ
て、I-VOPおよびP-VOPと、B-VOPについて、それぞれの
表示時刻を復号する旨が規定されている。That is, it is stipulated that the decoder uses the following equations to decode the display times of I-VOPs and P-VOPs and of B-VOPs, respectively.

【０１９６】ｔ_GTB(n)＝ｎ×１０００ｍｓ＋ｔ_DST ｔ_DTB(I/P)＝ｔ_AVTI＋ｔ_GTB(n) ｔ_DTB(B)＝ｔ_RVTI＋ｔ_DTB(I/P) ・・・（２）但し、式（２）において、ｔ_GTB(n)は、ｎ番目に復号さ
れたmodulo_time_baseによって示された同期点の時刻を
表し、ｔ_DSTは、デコーダにおけるＶＯの復号開始時刻
（ＶＯの復号が開始された絶対時刻）を表す。また、ｔ
_DTB(I/P)は、デコーダにおけるI-VOPまたはP-VOPの復号
開始時刻を表し、ｔ_AVTIは、I-VOPまたはP-VOPについて
のVOP_time_incrementを表す。さらに、ｔ_DTB(B)は、デ
コーダにおけるB-VOPの復号開始時刻（ＶＯＰの復号が
開始された絶対時刻）を表し、ｔ_RVT _Iは、B-VOPについ
てのVOP_time_incrementを表す。
t_GTB(n) = n × 1000 ms + t_DST
t_DTB(I/P) = t_AVTI + t_GTB(n)
t_DTB(B) = t_RVTI + t_DTB(I/P) ... (2)
In equation (2), t_GTB(n) represents the time of the synchronization point indicated by the n-th decoded modulo_time_base, and t_DST represents the time at which the decoder started decoding the VO (the absolute time at which decoding of the VO started). t_DTB(I/P) represents the time at which the decoder started decoding the I-VOP or P-VOP, and t_AVTI represents the VOP_time_increment of the I-VOP or P-VOP. t_DTB(B) represents the time at which the decoder started decoding the B-VOP (the absolute time at which decoding of the VOP started), and t_RVTI represents the VOP_time_increment of the B-VOP.
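The decoder-side recovery of equation (2) can be sketched in the same way. The function names are hypothetical, times are integers in milliseconds, and the sample values are illustrative.

```python
def t_dtb_ip_ms(t_avti_ms: int, n: int, t_dst_ms: int) -> int:
    """Recovered time base of an I/P-VOP: t_AVTI + (n * 1000 ms + t_DST)."""
    t_gtb = n * 1000 + t_dst_ms
    return t_avti_ms + t_gtb


def t_dtb_b_ms(t_rvti_ms: int, t_dtb_ip: int) -> int:
    """Recovered time base of a B-VOP: t_RVTI + t_DTB(I/P)."""
    return t_rvti_ms + t_dtb_ip


# Illustrative values: decoding starts at 0 ms; a P-VOP carries an increment of
# 550 ms after the first sync point, and a B-VOP an increment of 400 ms.
p_time = t_dtb_ip_ms(550, 1, 0)
b_time = t_dtb_b_ms(400, p_time)
```

Note that only relative quantities are recovered here; the absolute position depends on t_DST, which the decoder chooses when it starts decoding.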

【０１９７】なお、VM6.0では、式（２）におけるｔ
_GTB(n)，ｔ_DST，ｔ_DTB(I/P)，ｔ_AVTI，ｔ_DTB(B)，ｔ
_RVTIについて、「t_GTB(n) is the encoding time base
marked bythe nth decoded modulo_time_base, t_DST is
the decoding time base start time, t_DTB(I/P) is t
he decoding time base at the start of the decoding
ofthe I or P-VOP, t_AVTI is the decoding absolute
VOP_time_increment for the I or P-VOP, t_DTB(B) is
the decoding time base at the start of the decodin
g of the B-VOP, and t_RVTI is the decoded relative
VOP_time_incrementfor the B-VOP.」と記載されてい
る。VM6.0 describes t_GTB(n), t_DST, t_DTB(I/P), t_AVTI, t_DTB(B), and t_RVTI in equation (2) as follows: "t_GTB(n) is the encoding time base marked by the nth decoded modulo_time_base, t_DST is the decoding time base start time, t_DTB(I/P) is the decoding time base at the start of the decoding of the I or P-VOP, t_AVTI is the decoding absolute VOP_time_increment for the I or P-VOP, t_DTB(B) is the decoding time base at the start of the decoding of the B-VOP, and t_RVTI is the decoded relative VOP_time_increment for the B-VOP."

【０１９８】図２２は、以上の定義に基づいて、modulo
_time_baseとVOP_time_incrementとの関係を示した図で
ある。FIG. 22 shows the relationship between modulo_time_base and VOP_time_increment based on the above definitions.

【０１９９】図２２において、ＶＯは、Ｉ１（Ｉ−ＶＯ
Ｐ），Ｂ２（Ｂ−ＶＯＰ），Ｂ３，Ｐ４（Ｐ−ＶＯ
Ｐ），Ｂ５，Ｐ６，・・・というＶＯＰのシーケンスで
構成されている。いま、ＶＯの符号化／復号開始時刻
（絶対時刻）をｔ０とすると、modulo_time_baseは、時
刻ｔ０からの経過時間を、１秒精度で表すから、ｔ０＋
１秒、ｔ０＋２秒，・・・という時刻（同期点）を表
す。なお、図２２において、表示順は、Ｉ１，Ｂ２，Ｂ
３，Ｐ４，Ｂ５，Ｐ６，・・・であるが、符号化／復号
順は、Ｉ１，Ｐ４，Ｂ２，Ｂ３，Ｐ６，・・・である。In FIG. 22, the VO consists of a sequence of VOPs: I1 (I-VOP), B2 (B-VOP), B3, P4 (P-VOP), B5, P6, and so on. If the encoding/decoding start time (absolute time) of the VO is t0, modulo_time_base represents the time elapsed since t0 with one-second precision and thus indicates the times (synchronization points) t0 + 1 s, t0 + 2 s, and so on. In FIG. 22 the display order is I1, B2, B3, P4, B5, P6, ..., whereas the encoding/decoding order is I1, P4, B2, B3, P6, ....

【０２００】図２２では（後述する図２５乃至図２８、
および図３３においても同様）、各ＶＯＰについてのVO
P_time_incrementを、四角形で囲んだ数字（単位はms）
で示してあり、modulo_time_baseによって示される同期
点の切り替わりを、▼印で示してある。従って、図２２
では、Ｉ１，Ｂ２，Ｂ３，Ｐ４，Ｂ５，Ｐ６についての
VOP_time_incrementが、３５０ｍｓ，４００ｍｓ，８０
０ｍｓ，５５０ｍｓ，４００ｍｓ，３５０ｍｓとそれぞ
れされており、Ｐ４およびＰ６において、同期点が切り
替わっている。In FIG. 22 (and likewise in FIGS. 25 to 28 and FIG. 33 described later), the VOP_time_increment of each VOP is shown as a number enclosed in a rectangle (in ms), and each switch of the synchronization point indicated by modulo_time_base is marked with ▼. Thus, in FIG. 22, the VOP_time_increment values for I1, B2, B3, P4, B5, and P6 are 350 ms, 400 ms, 800 ms, 550 ms, 400 ms, and 350 ms, respectively, and the synchronization point switches at P4 and P6.

【０２０１】いま、図２２において、Ｉ１のVOP_time_i
ncrementは、３５０msであるから、Ｉ１の符号化／復号
時刻は、直前に符号化／復号されたmodulo_time_baseに
よって示された同期点から３５０ｍｓ後の時刻となる。
なお、符号化／復号の開始直後は、その開始時刻（符号
化／復号開始時刻）ｔ０が同期点となるので、Ｉ１の符
号化／復号時刻は、符号化／復号開始時刻ｔ０から３５
０ｍｓ後の時刻ｔ０＋３５０ｍｓということになる。In FIG. 22, the VOP_time_increment of I1 is 350 ms, so the encoding/decoding time of I1 is 350 ms after the synchronization point indicated by the most recently encoded/decoded modulo_time_base. Immediately after encoding/decoding starts, the start time t0 serves as the synchronization point, so the encoding/decoding time of I1 is t0 + 350 ms.

【０２０２】そして、Ｂ２またはＢ３の符号化／復号時
刻は、直前に符号化／復号されたI-VOPまたはP-VOPか
ら、VOP_time_incrementだけ経過した時刻であるから、
いまの場合、最後の符号化／復号されたＩ１の符号化／
復号時刻ｔ０＋３５０ｍｓから、４００ｍｓまたは８０
０ｍｓ後の時刻ｔ０＋７５０ｍｓまたはｔ０＋１２００
ｍｓということに、それぞれなる。The encoding/decoding time of B2 or B3 is the time at which VOP_time_increment has elapsed since the most recently encoded/decoded I-VOP or P-VOP. Here, it is 400 ms or 800 ms after the encoding/decoding time t0 + 350 ms of I1, the last VOP encoded/decoded, that is, t0 + 750 ms or t0 + 1200 ms, respectively.

【０２０３】次に、Ｐ４についてであるが、Ｐ４では、
modulo_time_baseによって示される同期点が切り替わっ
ており、従って、同期点は時刻ｔ０＋１秒となる。その
結果、Ｐ４の符号化／復号時刻は、時刻ｔ０＋１秒から
５５０ｍｓ後の時刻（ｔ０＋１）秒＋５５０ｍｓという
ことになる。Next, at P4 the synchronization point indicated by modulo_time_base switches, so the synchronization point becomes t0 + 1 s. As a result, the encoding/decoding time of P4 is 550 ms after t0 + 1 s, that is, (t0 + 1) s + 550 ms.

【０２０４】Ｂ５の符号化／復号時刻は、直前に符号化
／復号されたI-VOPまたはP-VOPから、VOP_time_increme
ntだけ経過した時刻であるから、いまの場合、最後の符
号化／復号されたＰ４の符号化／復号時刻（ｔ０＋１）
秒＋５５０ｍｓから、４００ｍｓ後の時刻（ｔ０＋１）
秒＋９５０ｍｓということになる。The encoding/decoding time of B5 is the time at which VOP_time_increment has elapsed since the most recently encoded/decoded I-VOP or P-VOP. Here, it is 400 ms after the encoding/decoding time (t0 + 1) s + 550 ms of P4, the last VOP encoded/decoded, that is, (t0 + 1) s + 950 ms.

【０２０５】次に、Ｐ６についてであるが、Ｐ６では、
modulo_time_baseによって示される同期点が切り替わっ
ており、従って、同期点は時刻ｔ０＋２秒となる。その
結果、Ｐ６の符号化／復号時刻は、時刻ｔ０＋２秒から
３５０ｍｓ後の時刻（ｔ０＋２）秒＋３５０ｍｓという
ことになる。Next, at P6 the synchronization point indicated by modulo_time_base switches again, so the synchronization point becomes t0 + 2 s. As a result, the encoding/decoding time of P6 is 350 ms after t0 + 2 s, that is, (t0 + 2) s + 350 ms.
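The walk-through of FIG. 22 can be reproduced with a short sketch. It is an illustration under stated assumptions: the VOPs are listed in display order, the modulo_time_base strings attached to each VOP are inferred from the statement that the synchronization point switches at P4 and P6 (they are not quoted from the figure), and the function name is hypothetical.

```python
def vop_times_ms(vops, t0_ms=0):
    """Compute each VOP's time from modulo_time_base and VOP_time_increment.

    vops: iterable of (name, kind, modulo_time_base, increment_ms) in display order.
    I/P-VOPs are measured from the current sync point; B-VOPs are measured from
    the preceding I/P-VOP in the list.
    """
    sync_point_s = 0
    last_ip_time_ms = None
    times = {}
    for name, kind, modulo_time_base, increment_ms in vops:
        if kind in ("I", "P"):
            # Each "1" advances the sync point by one second.
            sync_point_s += modulo_time_base.count("1")
            time_ms = t0_ms + sync_point_s * 1000 + increment_ms
            last_ip_time_ms = time_ms
        else:
            if last_ip_time_ms is None:
                raise ValueError("B-VOP without a preceding I/P-VOP")
            time_ms = last_ip_time_ms + increment_ms
        times[name] = time_ms
    return times


# Increments as listed for FIG. 22; the marker strings are assumptions.
FIG22 = [
    ("I1", "I", "0", 350),
    ("B2", "B", "0", 400),
    ("B3", "B", "0", 800),
    ("P4", "P", "10", 550),
    ("B5", "B", "0", 400),
    ("P6", "P", "10", 350),
]
times = vop_times_ms(FIG22)
```

With t0 = 0 this gives 350, 750, 1550, 1950, and 2350 ms for I1, B2, P4, B5, and P6, in agreement with the text. For B3 the same rule applied to the listed 800 ms increment gives t0 + 1150 ms, whereas the text states t0 + 1200 ms; the sketch simply applies the rule to the listed increments.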

【０２０６】なお、VM6.0では、modulo_time_baseによ
って示される同期点の切り替わりは、Ｉ−ＶＯＰとＰ−
ＶＯＰとに対してだけ許されており、Ｂ−ＶＯＰに対し
ては許されていない。Note that in VM6.0 the synchronization point indicated by modulo_time_base is allowed to switch only at I-VOPs and P-VOPs, not at B-VOPs.

【０２０７】また、VM6.0において、VOP_time_incremen
tが、I−VOPとP−VOPについては、直前に符号化／復号
されたmodulo_time_baseによって示された同期点からの
時間を表すのに対し、B-VOPについてだけは、直前に符
号化／復号されたI-VOPまたはP-VOPからの相対時間を表
すこととされているのは、主として、次のような理由に
よる。即ち、B-VOPは、表示順で、そのB-VOPを挟むI−V
OPまたはP−VOPを参照画像として予測符号化されるの
で、その予測符号化時に参照画像として用いるI−VOPま
たはP−VOPに対する重みを、B-VOPから、それを挟むI−
VOPまたはP−VOPまでの時間的距離に基づいて決めるた
めに、その時間的距離を、B-VOPについてのVOP_time_in
crementとしたことによる。In VM6.0, VOP_time_increment represents, for I-VOPs and P-VOPs, the time from the synchronization point indicated by the most recently encoded/decoded modulo_time_base, whereas for B-VOPs alone it represents the time relative to the most recently encoded/decoded I-VOP or P-VOP. The main reason is as follows. A B-VOP is predictively coded using, as reference images, the I-VOPs or P-VOPs that sandwich it in display order. The weights applied to those reference I-VOPs or P-VOPs during predictive coding are determined from the temporal distance between the B-VOP and each of them, and that temporal distance was therefore adopted as the VOP_time_increment of the B-VOP.

【０２０８】ところで、上述したVM6.0のVOP_time_incr
ementの定義では、不都合が生じる。即ち、図２２で
は、B-VOPについてのVOP_time_incrementが、そのB-VOP
の直前に符号化／復号されるI-VOPまたはP-VOPからの相
対時間ではなく、直前に表示されるI-VOPまたはP-VOPか
らの相対時間を表すものとしてある。これは、次のよう
な理由による。即ち、例えば、Ｂ２やＢ３に注目した場
合、その直前に符号化／復号されるI-VOPまたはP-VOP
は、上述した符号化／復号順からいって、Ｐ４である。
従って、B-VOPについて、VOP_time_incrementが、そのB
-VOPの直前に符号化／復号されたI-VOPまたはP-VOPから
の相対時間を表すとした場合、Ｂ２やＢ３についてのVO
P_time_incrementは、Ｐ４の符号化／復号時刻からの相
対時間を表すこととなり、負の値になる。However, the VM6.0 definition of VOP_time_increment described above causes a problem. In FIG. 22, the VOP_time_increment of a B-VOP is taken to represent the time relative not to the I-VOP or P-VOP encoded/decoded immediately before that B-VOP, but to the I-VOP or P-VOP displayed immediately before it, for the following reason. Consider, for example, B2 and B3. Given the encoding/decoding order described above, the I-VOP or P-VOP encoded/decoded immediately before them is P4. Therefore, if the VOP_time_increment of a B-VOP represented the time relative to the I-VOP or P-VOP encoded/decoded immediately before that B-VOP, the VOP_time_increment of B2 and B3 would be relative to the encoding/decoding time of P4 and would be negative.
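The sign problem can be seen with the times given for FIG. 22 (I1 at t0 + 350 ms, B2 at t0 + 750 ms, P4 at t0 + 1550 ms). The following sketch, with a hypothetical function name and t0 taken as 0, computes B2's increment under both readings of the definition.

```python
def b_vop_increment_ms(b_time_ms: int, reference_time_ms: int) -> int:
    """Relative VOP_time_increment of a B-VOP with respect to a reference I/P-VOP."""
    return b_time_ms - reference_time_ms


I1_MS, B2_MS, P4_MS = 350, 750, 1550  # times from the FIG. 22 walk-through, t0 = 0

# Reference = I/P-VOP encoded/decoded immediately before B2 (P4 in coding order).
relative_to_last_coded = b_vop_increment_ms(B2_MS, P4_MS)

# Reference = I/P-VOP displayed immediately before B2 (I1 in display order).
relative_to_last_displayed = b_vop_increment_ms(B2_MS, I1_MS)
```

The first value is -800 ms and the second is 400 ms, the value shown for B2 in FIG. 22.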

【０２０９】一方、MPEG4規格では、VOP_time_incremen
tは、１０ビットとされており、０以上の値のみをとる
ものとすれば、０乃至１０２３の範囲の値を表現するこ
とができるから、隣接する同期点の間の位置を、時間的
に前（図２２において左方向）に位置する同期点を基準
として、1ms単位で表すことができる。On the other hand, in the MPEG4 standard VOP_time_increment is 10 bits long. If it takes only non-negative values, it can express values in the range 0 to 1023, so a position between adjacent synchronization points can be expressed in units of 1 ms relative to the temporally preceding synchronization point (the one to the left in FIG. 22).

【０２１０】しかしながら、VOP_time_incrementが、０
以上の値だけでなく、負の値もとることを許すと、例え
ば、隣接する同期点の間の位置が、時間的に前に位置す
る同期点を基準として表されたり、また、時間的に後に
位置する同期点を基準として表されたりすることになる
ため、ＶＯＰの符号化時刻や復号時刻を求める処理が煩
雑になる。However, if VOP_time_increment were allowed to take negative as well as non-negative values, a position between adjacent synchronization points would sometimes be expressed relative to the temporally preceding synchronization point and sometimes relative to the temporally following one, which would complicate the computation of VOP encoding and decoding times.

【０２１１】従って、VM6.0では、上述したように、VOP
_time_incrementが、「This valuerepresents the loca
l time base in the units of milliseconds. For I a
ndP-VOP's this value is the absolute VOP_time_incr
ement from the synchronization point marked by the
last modulo_time_base. For the B-VOP's thisvalue
is the relative VOP_time_increment from the last
encoded/decoded I- or P-VOP.」と定義されているが、
最後の文の“For the B-VOP's this valueis the relat
ive VOP_time_increment from the last encoded/decod
ed I- or P-VOP”は、“For the B-VOP's this value i
s the relative VOP_time_increment from the last di
splayed I- or P-VOP”と変更するべきであり、これに
より、VOP_time_incrementが、直前に符号化／復号され
たI-VOPまたはP-VOPからの相対時間ではなく、直前に表
示されるI-VOPまたはP-VOPからの相対時間を表すものと
定義すべきである。Therefore, although VM6.0 defines VOP_time_increment, as quoted above, as "This value represents the local time base in the units of milliseconds. For I and P-VOP's this value is the absolute VOP_time_increment from the synchronization point marked by the last modulo_time_base. For the B-VOP's this value is the relative VOP_time_increment from the last encoded/decoded I- or P-VOP.", the last sentence, "For the B-VOP's this value is the relative VOP_time_increment from the last encoded/decoded I- or P-VOP", should be changed to "For the B-VOP's this value is the relative VOP_time_increment from the last displayed I- or P-VOP". VOP_time_increment should thus be defined to represent the time relative not to the most recently encoded/decoded I-VOP or P-VOP but to the most recently displayed I-VOP or P-VOP.

【０２１２】VOP_time_incrementを、このような定義に
することにより、B-VOPについての符号化／復号時刻の
計算の基準が、B-VOPよりも過去の表示時刻を持つI/P-V
OP（I-VOPまたはP-VOP）の表示時刻になるので、B-VOP
についてのVOP_time_incrementは、それが参照するI-VO
Pが、そのB-VOPよりも先に表示されない限り、常に、正
の値をとることになり、従って、I/P-VOPのVOP_time_in
crementも、常に正の値をとることになる。With VOP_time_increment defined in this way, the reference for calculating the encoding/decoding time of a B-VOP is the display time of an I/P-VOP (I-VOP or P-VOP) whose display time precedes that of the B-VOP. The VOP_time_increment of a B-VOP therefore always takes a positive value, so long as the I-VOP it refers to is not displayed before that B-VOP, and consequently the VOP_time_increment of an I/P-VOP also always takes a positive value.

【０２１３】また、図２２では、さらにVM6.0の定義を
変更して、modulo_time_baseおよびVOP_time_increment
によって表される時刻が、符号化／復号時刻ではなく、
VOPの表示時刻であるとしてある。即ち、図２２では、V
OPのシーケンス上の絶対時刻を考えた場合に、式（１）
におけるt_EST(I/P)および式（２）におけるt
_DTB(I/P)は、IまたはP-VOPが位置するシーケンス上の絶
対時刻を、式（１）におけるt_EST _(B)および式（２）に
おけるt_DTB(B)は、B-VOPが位置するシーケンス上の絶対
時刻を、それぞれ表すものとしてある。In FIG. 22, the VM6.0 definition is further changed so that the time represented by modulo_time_base and VOP_time_increment is not the encoding/decoding time but the display time of the VOP. That is, considering absolute time on the sequence of VOPs, in FIG. 22 t_ETB(I/P) in equation (1) and t_DTB(I/P) in equation (2) represent the absolute time on the sequence at which the I-VOP or P-VOP is located, and t_ETB(B) in equation (1) and t_DTB(B) in equation (2) represent the absolute time on the sequence at which the B-VOP is located.

【０２１４】次に、VM6.0では、式（１）における符号
化開始時刻t_EST (the encoder timebase start time)は
符号化されず、その符号化開始時刻t_ESTと、各VOPの表
示時刻（VOPのシーケンス上の各VOPの位置を表す絶対時
刻）との差分情報としてのmodulo_time_baseおよびVOP_
time_incrementが符号化される。このため、デコーダ側
では、modulo_time_baseおよびVOP_time_incrementを用
いて、各VOPの間の相対的な時間関係は定めることがで
きるが、各VOPの絶対的な表示時刻、即ち、各VOPが、VO
Pのシーケンスの中のどの位置にあるものなのかを定め
ることはできない。従って、modulo_time_baseおよびVO
P_time_incrementだけでは、ビットストリームの途中に
アクセスすること、つまり、ランダムアクセスを行うこ
とはできない。Next, in VM6.0 the encoding start time t_EST (the encoder time base start time) in equation (1) is not encoded; only modulo_time_base and VOP_time_increment, which are difference information between t_EST and the display time of each VOP (the absolute time representing the position of each VOP in the VOP sequence), are encoded. The decoder can therefore use modulo_time_base and VOP_time_increment to determine the relative timing between VOPs, but it cannot determine the absolute display time of each VOP, that is, where in the VOP sequence each VOP is located. Consequently, modulo_time_base and VOP_time_increment alone do not allow access to the middle of the bitstream, that is, random access.

【０２１５】一方、単に符号化開始時刻t_ESTを符号化す
ると、デコーダでは、それを用いて、各VOPの絶対時刻
を復号することはできるが、常に、符号化ビットストリ
ームの先頭から、符号化開始時刻t_ESTと、各VOPの相対
的な時間情報であるmodulo_time_baseおよびVOP_time_i
ncrementを復号しながら、それを累積して、絶対時刻を
管理する必要があり、これは面倒であり、効率的なラン
ダムアクセスができない。On the other hand, if the encoding start time t_EST were simply encoded, the decoder could use it to recover the absolute time of each VOP. However, the decoder would always have to decode t_EST and the relative time information of each VOP (modulo_time_base and VOP_time_increment) from the head of the coded bitstream and accumulate them to keep track of absolute time. This is cumbersome and does not permit efficient random access.

【０２１６】そこで、本実施の形態では、容易に、効率
的なランダムアクセスを行うことができるように、VM6.
0の符号化ビットストリームの構成（階層）の中に、VOP
のシーケンス上の絶対時刻を符号化する階層（この階層
は、スケーラビリティを実現する階層（上述の下位レイ
ヤや上位レイヤ）ではなく、符号化ビットストリームの
階層である）を導入する。この階層は、符号化ビットス
トリームの先頭だけでなく、適当な位置に挿入できるよ
うな符号化ビットストリームの階層とする。Therefore, in the present embodiment, to enable easy and efficient random access, a layer that encodes the absolute time on the VOP sequence is introduced into the structure (hierarchy) of the VM6.0 coded bitstream. (This layer is a layer of the coded bitstream, not a scalability layer such as the lower and upper layers described above.) It is a bitstream layer that can be inserted not only at the head of the coded bitstream but at any suitable position.

【０２１７】ここでは、この階層として、例えば、MPEG
1/2で用いられているGOP(Group ofPicture)層と同様に
規定されるものを導入する。これにより、MPEG4に独自
な符号化ストリームの階層を用いる場合に比べて、MPEG
4と、MPEG1/2とのコンパチビリティ（Compatibility）
を高めることができる。この新規に導入する階層を、こ
こでは、ＧＯＶ（またはＧＶＯＰ）（Group Of Video O
bject Plane）と呼ぶ。Here, a layer defined in the same way as the GOP (Group of Pictures) layer used in MPEG1/2 is introduced as this layer. This improves compatibility between MPEG4 and MPEG1/2 compared with using a coded-stream layer unique to MPEG4. The newly introduced layer is referred to here as the GOV (or GVOP) (Group Of Video Object Plane).

[0218] FIG. 23 shows an example of the structure of an encoded bitstream into which the GOV layer, which encodes the absolute time on the VOP sequence, has been introduced.

[0219] The GOV layer is defined between the VOL layer and the VOP layer so that it can be inserted not only at the head of the bitstream but also at any position in the encoded bitstream.

[0220] Accordingly, when a certain VOL#0 consists of a sequence of VOPs such as VOP#0, VOP#1, ..., VOP#n, VOP#(n+1), ..., VOP#m, the GOV layer can be inserted not only immediately before the first VOP#0 but also immediately before VOP#(n+1). The encoder can therefore insert the GOV layer, for example, at a position in the encoded stream where random access is desired. By inserting the GOV layer, the series of VOPs constituting a VOL is divided by the GOV layer into a plurality of groups (hereinafter referred to as GOVs where appropriate) and encoded.

[0221] The syntax of the GOV layer is defined, for example, as shown in FIG. 24.

[0222] As shown in the figure, the GOV layer consists of a group start code (group_start_code), a time code (time_code), a closed GOP flag (closed_gop), a broken link flag (broken_link), and a next start code (next_start_code()), arranged in that order.

[0223] Next, the semantics of the GOV layer will be described. The semantics of the GOV layer are basically the same as those of the GOP layer of MPEG2; for anything not specifically described here, refer to the MPEG2 Video standard (ISO/IEC 13818-2).

[0224] group_start_code is 000001B8 (hexadecimal) and indicates the start position of a GOV.

[0225] As shown in Table 1, time_code consists of a 1-bit drop_frame_flag, a 5-bit time_code_hours, a 6-bit time_code_minutes, a 1-bit marker_bit, a 6-bit time_code_seconds, and a 6-bit time_code_pictures, for a total of 25 bits.

[0226]

[Table 1]

[0227] time_code corresponds to the "time and control codes for video tape recorders" defined in IEC standard publication 461. MPEG4 has no concept of a video frame rate (a VOP can therefore be displayed at an arbitrary time), so drop_frame_flag, which indicates whether time_code is written in drop frame mode (drop_frame_mode), is not used here, and its value is fixed, for example, to 0. For the same reason, time_code_pictures is not used either, and its value is fixed, for example, to 0. Accordingly, time_code here represents the time at the head of the GOV by means of time_code_hours, which represents the hours, time_code_minutes, which represents the minutes, and time_code_seconds, which represents the seconds. As a result, the time_code of the GOV layer (the second-precision encoding start absolute time) expresses, with a precision of one second, the time at the head of the GOV, that is, the absolute time on the VOP sequence at which encoding of that GOV layer was started. For this reason, in the present embodiment, the part of the time finer than one second (here, milliseconds) is set for each VOP.

[0228] The marker_bit of time_code is set to 1 so that 23 or more consecutive 0s do not occur in the encoded bitstream.
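The 25-bit layout of time_code described above can be sketched as follows. This is only an illustration: the field order and widths come from the text, but the MSB-first packing into a single integer and the names `TimeCode`, `pack_time_code`, and `unpack_time_code` are assumptions made for the example, not part of the specification.

```python
from typing import NamedTuple


class TimeCode(NamedTuple):
    """Second-precision start time of a GOV (hypothetical helper type)."""
    hours: int
    minutes: int
    seconds: int


def pack_time_code(tc: TimeCode) -> int:
    """Pack hours/minutes/seconds into a 25-bit time_code value.

    Assumed layout, most significant bit first:
    drop_frame_flag(1) hours(5) minutes(6) marker_bit(1) seconds(6) pictures(6)
    """
    if not (0 <= tc.hours <= 23 and 0 <= tc.minutes <= 59 and 0 <= tc.seconds <= 59):
        raise ValueError("time out of range")
    drop_frame_flag = 0      # unused here, fixed to 0
    marker_bit = 1           # breaks up long runs of zero bits
    time_code_pictures = 0   # unused here, fixed to 0
    value = drop_frame_flag
    value = (value << 5) | tc.hours
    value = (value << 6) | tc.minutes
    value = (value << 1) | marker_bit
    value = (value << 6) | tc.seconds
    value = (value << 6) | time_code_pictures
    return value


def unpack_time_code(value: int) -> TimeCode:
    """Recover hours/minutes/seconds from a 25-bit time_code value."""
    seconds = (value >> 6) & 0x3F
    minutes = (value >> 13) & 0x3F
    hours = (value >> 19) & 0x1F
    return TimeCode(hours, minutes, seconds)
```

For example, `pack_time_code(TimeCode(0, 12, 35))` yields a value below 2**25 whose marker bit (bit 12) is set, and `unpack_time_code` returns the original hours, minutes, and seconds.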

[0229] closed_gop has the meaning obtained by replacing the I, P, and B pictures in the definition of closed_gop in the MPEG2 Video standard (ISO/IEC 13818-2) with I-VOPs, P-VOPs, and B-VOPs, respectively. It therefore indicates whether a B-VOP in a given GOV has been encoded using as reference images not only VOPs of that GOV but also VOPs of another GOV. The definition of closed_gop in the MPEG2 Video standard (ISO/IEC 13818-2), with the above replacement applied, reads as follows.

[0230] This is a one-bit flag which indicates the nature of the predictions used in the first consecutive B-VOPs (if any) immediately following the first coded I-VOP following the group of plane header. The closed_gop is set to 1 to indicate that these B-VOPs have been encoded using only backward prediction or intra coding. This bit is provided for use during any editing which occurs after encoding. If the previous pictures have been removed by editing, broken_link may be set to 1 so that a decoder may avoid displaying these B-VOPs following the first I-VOP following the group of plane header. However if the closed_gop bit is set to 1, then the editor may choose not to set the broken_link bit as these B-VOPs can be correctly decoded.

[0231] broken_link likewise has the meaning obtained by applying the same replacement as for closed_gop to the description of broken_link in the MPEG2 Video standard (ISO/IEC 13818-2). It therefore indicates whether the B-VOPs at the head of the GOV can be reproduced correctly. The definition of broken_link in the MPEG2 Video standard (ISO/IEC 13818-2), with the above replacement applied, reads as follows.

[0232] This is a one-bit flag which shall be set to 0 during encoding. It is set to 1 to indicate that the first consecutive B-VOPs (if any) immediately following the first coded I-VOP following the group of plane header may not be correctly decoded because the reference frame which is used for prediction is not available (because of the action of editing). A decoder may use this flag to avoid displaying frames that cannot be correctly decoded.

[0233] next_start_code() gives the position of the head of the next GOV.
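As a rough illustration of the GOV layer syntax described above, the following sketch serializes a GOV header into bytes. The field order, the start code value 000001B8, and the 25-bit time_code layout are taken from the text. The zero-bit padding up to the next byte boundary is an assumption borrowed from the way next_start_code() behaves in MPEG2, and the function name `build_gov_header` is hypothetical.

```python
GROUP_START_CODE = 0x000001B8  # 32-bit group_start_code from the text


def build_gov_header(hours: int, minutes: int, seconds: int,
                     closed_gop: int = 0, broken_link: int = 0) -> bytes:
    """Serialize group_start_code, time_code, closed_gop, broken_link.

    Assumption: the header is then padded with zero bits to a byte
    boundary, as MPEG2's next_start_code() does.
    """
    if not (0 <= hours <= 23 and 0 <= minutes <= 59 and 0 <= seconds <= 59):
        raise ValueError("time out of range")
    if closed_gop not in (0, 1) or broken_link not in (0, 1):
        raise ValueError("flags must be 0 or 1")
    # time_code: drop_frame_flag=0, marker_bit=1, time_code_pictures=0
    time_code = (hours << 19) | (minutes << 13) | (1 << 12) | (seconds << 6)
    # 32 + 25 + 1 + 1 = 59 bits of header fields
    bits = (GROUP_START_CODE << 27) | (time_code << 2) | (closed_gop << 1) | broken_link
    bits <<= 5  # five zero bits of padding give 64 bits in total
    return bits.to_bytes(8, "big")
```

With these assumptions the header occupies eight bytes, the first four of which are the start code that a decoder can search for when performing random access.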

[0234] With the GOV layer introduced as described above, the absolute time on the GOV sequence at which encoding of a GOV is started (hereinafter referred to as the encoding start absolute time where appropriate) is set in the time code time_code of the GOV. Furthermore, since the time_code of the GOV layer has second precision as described above, the finer-precision part of the absolute time of each VOP on the VOP sequence is set for each VOP.

[0235] FIG. 25 shows the relationship between time_code and modulo_time_base and VOP_time_increment when the GOV layer of FIG. 24 is introduced.

[0236] In FIG. 25, the GOV consists of I1, B2, B3, P4, B5, and P6, arranged in display order from its head.

[0237] Suppose, for example, that the encoding start absolute time of the GOV is 0h:12m:35sec:350msec (0 hours 12 minutes 35 seconds 350 milliseconds). Since the time_code of the GOV has second precision (units of one second) as described above, it is set to 0h:12m:35sec (time_code_hours, time_code_minutes, and time_code_seconds, which make up time_code, are set to 0, 12, and 35, respectively). Now suppose that the absolute time of I1 on the VOP sequence is, for example, 0h:12m:35sec:350msec. (This is the absolute time on the VOP sequence, before encoding or after decoding, of the VS containing the GOV of FIG. 25. It corresponds to the time at which I1 is displayed when the VOP sequence is displayed, and is hereinafter referred to as the display time where appropriate.) In this case, the semantics of VOP_time_increment are changed so that the part of the display time finer than second precision, 350 ms, is set in the VOP_time_increment of the I-VOP for I1 and encoded (that is, so that I1 is encoded with VOP_time_increment = 350).

[0238] That is, in FIG. 25, the VOP_time_increment of the first I-VOP (I1) of the GOV in display order is the difference between the time_code of the GOV and the display time of the I-VOP. The time represented by the second-precision time_code is therefore the first synchronization point of the GOV (here, a point representing a time with second precision).

[0239] In FIG. 25, the semantics of VOP_time_increment for B2, B3, P4, B5, and P6, the VOPs arranged second and later in the GOV, are the same as those obtained by changing the VM6.0 definition as described with reference to FIG. 22.

[0240] Accordingly, in FIG. 25, the display time of B2 or B3 is the time at which VOP_time_increment has elapsed from the display time of the I-VOP or P-VOP displayed immediately before it. In this case, the display times are 400 ms and 800 ms after the display time 0h:12m:35s:350ms of I1, which is displayed immediately before, namely 0h:12m:35s:750ms and 0h:12m:36s:200ms, respectively.

[0241] Next, for P4, the synchronization point indicated by modulo_time_base has switched, so the synchronization point is 0h:12m:36s, one second after 0h:12m:35s. As a result, the display time of P4 is 0h:12m:36s:550ms, 550 ms after 0h:12m:36s.

[0242] The display time of B5 is the time at which VOP_time_increment has elapsed from the display time of the I-VOP or P-VOP displayed immediately before it. In this case, it is 0h:12m:36s:950ms, 400 ms after the display time 0h:12m:36s:550ms of P4, which is displayed immediately before.

[0243] Finally, for P6, the synchronization point indicated by modulo_time_base has switched again, so the synchronization point is 0h:12m:35s + 2 seconds, that is, 0h:12m:37s. As a result, the display time of P6 is 0h:12m:37s:350ms, 350 ms after 0h:12m:37s.
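The way display times follow from time_code, modulo_time_base, and VOP_time_increment in the FIG. 25 walkthrough can be sketched as below. This is a minimal sketch under stated assumptions: the VOPs are supplied in display order, modulo_time_base is represented as a string of 1s terminated by a 0, and times are held in milliseconds. The function name `display_times_ms` is hypothetical. The example uses the values stated for I1, B2, P4, B5, and P6.

```python
def display_times_ms(time_code_s, vops):
    """Return display times in ms for VOPs given in display order.

    vops: list of (vop_type, modulo_time_base, vop_time_increment),
    where vop_type is "I", "P" or "B" and modulo_time_base is a string
    such as "0" or "10" (assumed representation).
    """
    times = []
    sync_s = time_code_s   # current synchronization point, whole seconds
    last_ip_ms = None      # display time of the last displayed I/P-VOP
    for vop_type, modulo_time_base, increment in vops:
        if vop_type in ("I", "P"):
            # Each "1" moves the synchronization point forward by one second.
            sync_s += modulo_time_base.count("1")
            t = sync_s * 1000 + increment
            last_ip_ms = t
        else:
            if last_ip_ms is None:
                raise ValueError("B-VOP before the first I-VOP is not handled here")
            # B-VOP: elapsed time since the I/P-VOP displayed just before it.
            t = last_ip_ms + increment
        times.append(t)
    return times


# GOV time_code 0h:12m:35s expressed in seconds
gov = [("I", "0", 350), ("B", "0", 400), ("P", "10", 550),
       ("B", "0", 400), ("P", "10", 350)]
print(display_times_ms(12 * 60 + 35, gov))
```

The printed values correspond to 0h:12m:35s:350ms, 0h:12m:35s:750ms, 0h:12m:36s:550ms, 0h:12m:36s:950ms, and 0h:12m:37s:350ms.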

[0244] Next, FIG. 26 shows the relationship between the time_code of a GOV and modulo_time_base and VOP_time_increment when the first VOP in display order is a B-VOP.

[0245] In FIG. 26, the GOV consists of B0, I1, B2, B3, P4, B5, and P6, arranged in display order from its head. That is, the GOV of FIG. 26 is that of FIG. 25 with B0 added before I1.

[0246] In this case, if the VOP_time_increment for B0 at the head of the GOV were determined with reference to the display time of an I/P-VOP of that GOV, for example the display time of I1, its value would be negative, which is inconvenient as described above.

[0247] Therefore, the semantics of VOP_time_increment for a B-VOP displayed before the I-VOP in a GOV (a B-VOP displayed ahead of the I-VOP that is displayed first in the GOV) are changed as follows.

[0248] That is, the VOP_time_increment of such a B-VOP is the difference between the time of the GOV's time_code and the display time of the B-VOP. In this case, as shown in FIG. 26, when the display time of B0 is, for example, 0h:12m:35s:200ms and the time_code of the GOV is, for example, 0h:12m:35s, the VOP_time_increment of B0 is 200 ms (= 0h:12m:35s:200ms − 0h:12m:35s). In this way, VOP_time_increment is always a positive value.

[0249] These two changes to the semantics of VOP_time_increment make it possible to relate the time_code of the GOV to the modulo_time_base and VOP_time_increment of each VOP, and thereby to determine the absolute time at which each VOP is displayed (its display time).

[0250] Next, FIG. 27 shows the relationship between the time_code of a GOV and modulo_time_base and VOP_time_increment when the interval between the display time of an I-VOP and the display time of a B-VOP predicted from it is greater than 1 second (more precisely, 1.023 seconds).

[0251] In FIG. 27, the GOV consists of I1, B2, B3, B4, and P6, arranged in display order, and B4 is displayed more than one second after the display time of I1, the I-VOP displayed immediately before it.

[0252] In this case, even if one tries to encode the display time of B4 using VOP_time_increment with the semantics changed as described above, VOP_time_increment is 10 bits long, as described above, and can only express values up to 1023; it cannot express a time longer than 1.023 seconds. Therefore, the semantics of VOP_time_increment are changed further, and the semantics of modulo_time_base are also changed, so that such a case can be handled.

[0253] Here, this is handled, for example, by either the first or the second method described below.

[0254] In the first method, the time between the display time of an I/P-VOP and the display time of a B-VOP predicted from it is obtained with millisecond precision. The whole-second part of that time is expressed by modulo_time_base, and the remaining millisecond part is expressed by VOP_time_increment.

[0255] FIG. 28 shows the relationship between the time_code of the GOV and modulo_time_base and VOP_time_increment when modulo_time_base and VOP_time_increment are encoded according to the first method in the case shown in FIG. 27.

[0256] That is, in the first method, modulo_time_base may be added not only to I-VOPs and P-VOPs but also to B-VOPs. The modulo_time_base added to a B-VOP does not indicate switching of the synchronization point; it represents the carry, in seconds, from the display time of the I/P-VOP displayed immediately before.

[0257] Furthermore, in the first method, the VOP_time_increment of a B-VOP is set to the value obtained by subtracting, from the display time of that B-VOP, the time reached by advancing the display time of the I/P-VOP displayed immediately before by the whole seconds indicated by the modulo_time_base added to the B-VOP.

[0258] Accordingly, with the first method, if in FIG. 27 the display time of I1 is, for example, 0h:12m:35s:350ms and the display time of B4 is 0h:12m:36s:550ms, the difference between the display times of I1 and B4 is 1200 msec, which is one second or more. As shown in FIG. 28, a modulo_time_base (indicated by the ▼ mark in FIG. 28) representing the carry in seconds from the display time of I1, which is displayed immediately before, is therefore added to B4. Specifically, the modulo_time_base added to B4 is set to "10", which represents a carry of 1 second, the value of the seconds digit of 1200 ms. The VOP_time_increment of B4 is set, as shown in FIG. 28, to 200, the part of the difference between the display times of I1 and B4 that is less than one second (the value obtained by subtracting, from the display time of B4, the time reached by advancing the display time of I1, the I/P-VOP displayed immediately before, by the whole seconds indicated by that modulo_time_base).
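On the decoding side, the first method implies that the display time of a B-VOP can be recovered from the display time of the preceding I/P-VOP, the number of 1s in modulo_time_base, and VOP_time_increment. The sketch below illustrates this with the B4 values from the text. The string representation of modulo_time_base and the helper names `to_ms`, `format_ms`, and `b_vop_display_ms` are assumptions made for the example.

```python
def to_ms(hours, minutes, seconds, millis=0):
    """Convert a time of day to milliseconds."""
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def format_ms(t_ms):
    """Format milliseconds in the h:m:s:ms notation used in the text."""
    h, rem = divmod(t_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h}h:{m}m:{s}s:{ms}ms"


def b_vop_display_ms(prev_ip_display_ms, modulo_time_base, vop_time_increment):
    """First method: each "1" is a one-second carry from the display time
    of the I/P-VOP displayed immediately before the B-VOP."""
    carry_s = modulo_time_base.count("1")
    return prev_ip_display_ms + carry_s * 1000 + vop_time_increment


i1 = to_ms(0, 12, 35, 350)
b4 = b_vop_display_ms(i1, "10", 200)
print(format_ms(b4))
```

Here B4 lands 1200 ms after I1, which matches the display time 0h:12m:36s:550ms given above.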

[0259] The processing of modulo_time_base and VOP_time_increment according to the first method described above is performed, on the encoder side, for example in the VLC unit 36 shown in FIGS. 9 and 10, and, on the decoder side, for example in the IVLC unit 102 shown in FIGS. 15 and 16.

[0260] First, the processing that the VLC unit 36 performs on the modulo_time_base and VOP_time_increment of an I/P-VOP will be described with reference to the flowchart of FIG. 29.

[0261] The VLC unit 36 processes the VOP sequence GOV by GOV. Each GOV is constructed so as to include at least one intra-coded VOP.

[0262] On receiving a GOV, the VLC unit 36 takes, for example, the reception time as the encoding start absolute time of that GOV, encodes the encoding start absolute time down to second precision (the encoding start absolute time down to the seconds digit) as time_code, and includes it in the encoded bitstream. Thereafter, each time the VLC unit 36 receives an I/P-VOP of the GOV, it treats that I/P-VOP as the I/P-VOP of interest, obtains the modulo_time_base and VOP_time_increment of the I/P-VOP of interest according to the flowchart of FIG. 29, and encodes them.

[0263] That is, in the VLC unit 36, first, in step S1, modulo_time_base is set to 0B (B denotes a binary number) and VOP_time_increment is set to 0, whereby modulo_time_base and VOP_time_increment are reset.

[0264] The process then proceeds to step S2, where it is determined whether the I/P-VOP of interest is the I-VOP displayed first (First I-VOP) in the GOV being processed (the GOV to be processed). If it is determined in step S2 that the I/P-VOP of interest is the I-VOP displayed first in the GOV to be processed, the process proceeds to step S4, where the difference between the time_code of the GOV to be processed and the second-precision display time of the I/P-VOP of interest (here, the I-VOP displayed first in the GOV to be processed), that is, the difference between time_code and the display time of the I/P-VOP of interest down to the seconds digit, is obtained and set in variable D. The process then proceeds to step S5.

[0265] If it is determined in step S2 that the I/P-VOP of interest is not the I-VOP displayed first in the GOV to be processed, the process proceeds to step S3, where the difference between the display time of the I/P-VOP of interest down to the seconds digit and the display time, down to the seconds digit, of the I/P-VOP displayed immediately before it (Last display I/P-VOP, that is, the I/P-VOP displayed immediately before the I/P-VOP of interest among the VOPs of the GOV to be processed) is obtained and set in variable D. The process then proceeds to step S5.

[0266] In step S5, it is determined whether variable D is equal to 0, that is, whether the difference between time_code and the display time of the I/P-VOP of interest down to the seconds digit, or the difference between the display time of the I/P-VOP of interest down to the seconds digit and that of the I/P-VOP displayed immediately before it, is 0 seconds. If it is determined in step S5 that variable D is not equal to 0, that is, if variable D is 1 or more, the process proceeds to step S6, where a 1 is added as the MSB (Most Significant Bit) of modulo_time_base. For example, if modulo_time_base is 0B, as it is immediately after the reset, it becomes 10B; if modulo_time_base is 10B, it becomes 110B.

[0267] The process then proceeds to step S7, where variable D is decremented by 1, and returns to step S5. Steps S5 to S7 are repeated until it is determined in step S5 that variable D is equal to 0. As a result, modulo_time_base becomes a run of 1s followed by a final 0, where the number of 1s equals the number of seconds in the difference between time_code and the display time of the I/P-VOP of interest down to the seconds digit, or in the difference between the display time of the I/P-VOP of interest down to the seconds digit and that of the I/P-VOP displayed immediately before it.

[0268] If it is determined in step S5 that variable D is equal to 0, the process proceeds to step S8, where VOP_time_increment is set to the part of the display time of the I/P-VOP of interest finer than second precision, that is, the time in milliseconds, and the processing ends.
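Steps S1 to S8 of FIG. 29, as described above, can be sketched as follows. This is an illustrative reading of the flowchart, not the actual implementation of the VLC unit 36: times are assumed to be integers in milliseconds, modulo_time_base is modeled as a string of bits, and the function name `encode_ip_vop_time` and its `reference_ms` parameter are hypothetical.

```python
def encode_ip_vop_time(display_ms, reference_ms):
    """Return (modulo_time_base, VOP_time_increment) for an I/P-VOP.

    reference_ms is the GOV time_code (in ms) for the I-VOP displayed
    first in the GOV, otherwise the display time of the I/P-VOP
    displayed immediately before the VOP of interest.
    """
    # S1: reset both fields.
    modulo_time_base = "0"
    vop_time_increment = 0
    # S3 / S4: difference between the two times, seconds digit only.
    d = display_ms // 1000 - reference_ms // 1000
    if d < 0:
        raise ValueError("display time precedes the reference time")
    # S5 to S7: prepend one "1" (as the MSB) per whole second.
    while d != 0:
        modulo_time_base = "1" + modulo_time_base
        d -= 1
    # S8: the sub-second part of the display time, in milliseconds.
    vop_time_increment = display_ms % 1000
    return modulo_time_base, vop_time_increment
```

Applied to the FIG. 25 example, I1 (reference: time_code 0h:12m:35s) gives ("0", 350), P4 (reference: I1) gives ("10", 550), and P6 (reference: P4) gives ("10", 350).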

[0269] The modulo_time_base and VOP_time_increment of the I/P-VOP of interest obtained in this way are added to the I/P-VOP of interest in the VLC unit 36 and are thereby included in the encoded bitstream.

[0270] modulo_time_base, VOP_time_increment, and time_code are variable-length coded in the VLC unit 36.

[0271] Next, each time the VLC unit 36 receives a B-VOP of the GOV to be processed, it treats that B-VOP as the B-VOP of interest, obtains the modulo_time_base and VOP_time_increment of the B-VOP of interest according to the flowchart of FIG. 30, and encodes them.

[0272] That is, in the VLC unit 36, first, in step S11, modulo_time_base and VOP_time_increment are reset as in step S1 of FIG. 29.

[0273] The process then proceeds to step S12, where it is determined whether the B-VOP of interest is displayed before the I-VOP displayed first (First I-VOP) in the GOV to be processed. If it is determined in step S12 that the B-VOP of interest is displayed before the I-VOP displayed first in the GOV to be processed, the process proceeds to step S14, where the difference between the time_code of the GOV to be processed and the display time of the B-VOP of interest (here, a B-VOP displayed ahead of the I-VOP displayed first in the GOV to be processed) is obtained and set in variable D, and the process proceeds to step S15. Here, therefore, variable D is set to a time with millisecond precision (a time down to the milliseconds digit), whereas variable D in FIG. 29 is set to a time with second precision, as described above.

[0274] If it is determined in step S12 that the B-VOP of interest is displayed after the I-VOP displayed first in the GOV to be processed, the process proceeds to step S13, where the difference between the display time of the B-VOP of interest and the display time of the I/P-VOP displayed immediately before it (Last display I/P-VOP, that is, the I/P-VOP displayed immediately before the B-VOP of interest among the VOPs of the GOV to be processed) is obtained and set in variable D, and the process proceeds to step S15.

[0275] In step S15, it is determined whether variable D is greater than 1, that is, whether the difference between time_code and the display time of the B-VOP of interest, or the difference between the display time of the B-VOP of interest and that of the I/P-VOP displayed immediately before it, is greater than 1 second. If it is determined in step S15 that variable D is greater than 1, the process proceeds to step S16, where a 1 is added as the MSB of modulo_time_base, and then to step S17. In step S17, variable D is decremented by 1, and the process returns to step S15. Steps S15 to S17 are repeated until it is determined in step S15 that variable D is not greater than 1. As a result, modulo_time_base becomes a run of 1s followed by a final 0, where the number of 1s equals the number of seconds in the difference between time_code and the display time of the B-VOP of interest, or in the difference between the display time of the B-VOP of interest and that of the I/P-VOP displayed immediately before it.

【０２７６】 When step S15 determines that D is not greater than 1, the process proceeds to step S18. There, the value of D at that point, that is, the part below the seconds digit (the time in milliseconds) of the difference between time_code and the display time of the B-VOP of interest, or of the difference between the display time of the B-VOP of interest and that of the I/P-VOP displayed immediately before it, is set in VOP_time_increment, and the process ends.
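The loop described for FIG. 30 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: times are held as integer milliseconds, the function name and parameters are hypothetical, and the boundary case follows the text literally (a difference of exactly one second is not "greater than 1").

```python
def encode_b_vop_first_method(b_display_ms, ref_ms):
    """Sketch of the first method for a B-VOP (hypothetical helper).

    ref_ms is either the GOV time_code (B-VOP displayed before the first
    I-VOP) or the display time of the last displayed I/P-VOP, in ms.
    Returns (modulo_time_base as a bit string, VOP_time_increment in ms).
    """
    d_ms = b_display_ms - ref_ms            # variable D, millisecond precision
    if d_ms < 0:
        raise ValueError("B-VOP must not precede its reference time")
    modulo_time_base = "0"                  # terminating 0
    while d_ms > 1000:                      # is D greater than 1 second?
        modulo_time_base = "1" + modulo_time_base  # add a 1 as the MSB
        d_ms -= 1000                        # decrement D by 1 second
    return modulo_time_base, d_ms % 1000    # part below the seconds digit


# Hypothetical values: reference I/P-VOP at 0h:12m:35s:350ms,
# B-VOP at 0h:12m:36s:550ms, so D = 1.2 s.
ref = (12 * 60 + 35) * 1000 + 350
print(encode_b_vop_first_method(ref + 1200, ref))  # ('10', 200)
```

Note that in this first method the B-VOP is measured against the full millisecond-precision reference time, not against a synchronization point.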

【０２７７】 The modulo_time_base and VOP_time_increment obtained in this way for the B-VOP of interest are attached to that B-VOP in the VLC unit 36 and are thereby included in the coded bitstream.

【０２７８】 The IVLC unit 102, for its part, recognizes the display time of each VOP in the coded stream that the VLC unit 36 outputs by processing the VOP sequence GOV by GOV as described above. Each time it receives the coded data of a VOP, it treats that VOP as the VOP of interest, and it performs variable-length decoding so that the VOP is displayed at the recognized display time. Specifically, on receiving a GOV, the IVLC unit 102 recognizes the time_code of that GOV. Each time it receives an I/P-VOP belonging to the GOV, it treats that I/P-VOP as the I/P-VOP of interest and obtains its display time from its modulo_time_base and VOP_time_increment in accordance with the flowchart of FIG. 31.

【０２７９】 In the IVLC unit 102, step S21 first determines whether the I/P-VOP of interest is the first-displayed I-VOP (First I-VOP) of the GOV being processed. If so, the process proceeds to step S23, where the time_code of the GOV being processed is set in the variable T, and then to step S24.

【０２８０】 If step S21 determines that the I/P-VOP of interest is not the first-displayed I-VOP of the GOV being processed, the process proceeds to step S22. There, the display time, down to the seconds digit, of the I/P-VOP displayed immediately before the I/P-VOP of interest (the last displayed I/P-VOP among the VOPs constituting the GOV being processed) is set in the variable T, and the process proceeds to step S24.

【０２８１】 Step S24 determines whether the modulo_time_base attached to the I/P-VOP of interest is equal to 0B. If it is not, that is, if the modulo_time_base contains a 1, the process proceeds to step S25, where the 1 at the MSB of the modulo_time_base is removed, and then to step S26. In step S26, the variable T is incremented by one second and the process returns to step S24. Steps S24 through S26 are repeated until step S24 determines that the modulo_time_base attached to the I/P-VOP of interest is equal to 0B. As a result, T is incremented by as many seconds as there were 1s in the modulo_time_base originally attached to the I/P-VOP of interest.

【０２８２】 When step S24 determines that the modulo_time_base attached to the I/P-VOP of interest is equal to 0B, the process proceeds to step S27. There, the millisecond-precision time represented by VOP_time_increment is added to the variable T, the sum is recognized as the display time of the I/P-VOP of interest, and the process ends.
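The decoding steps S21 through S27 can be sketched as follows. This is a minimal illustration, not the decoder of the embodiment: the function name is hypothetical, modulo_time_base is modeled as a bit string such as "110", the reference time is passed in as whole seconds, and the result is returned in milliseconds.

```python
def decode_ip_vop_display_ms(modulo_time_base, vop_time_increment_ms, ref_seconds):
    """Sketch of the FIG. 31 procedure (hypothetical helper).

    ref_seconds is the GOV time_code for the first-displayed I-VOP, or the
    display time (down to the seconds digit) of the last displayed I/P-VOP.
    """
    t = ref_seconds                          # variable T (steps S22/S23)
    bits = modulo_time_base
    while bits != "0":                       # step S24: equal to 0B?
        if not bits.startswith("1"):
            raise ValueError("modulo_time_base must be 1s followed by one 0")
        bits = bits[1:]                      # step S25: remove the MSB 1
        t += 1                               # step S26: T += 1 second
    return t * 1000 + vop_time_increment_ms  # step S27: add the ms part


# Hypothetical input: reference second 0h:12m:35s, two 1s, 300 ms.
print(decode_ip_vop_display_ms("110", 300, 12 * 60 + 35))  # 757300
```

The returned value corresponds to 0h:12m:37s:300ms, i.e. the reference second advanced by the number of 1s, plus the millisecond part.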

【０２８３】 When the IVLC unit 102 receives a B-VOP belonging to a GOV, it obtains the display time of that B-VOP of interest from its modulo_time_base and VOP_time_increment in accordance with the flowchart of FIG. 32.

【０２８４】 In the IVLC unit 102, step S31 first determines whether the B-VOP of interest is displayed before the first-displayed I-VOP (First I-VOP) of the GOV being processed. If so, the process proceeds to step S33, and in steps S33 through S37 the same processing as in steps S23 through S27 of FIG. 31, respectively, is performed to obtain the display time of the B-VOP of interest.

【０２８５】 If, on the other hand, step S31 determines that the B-VOP of interest is displayed after the first-displayed I-VOP of the GOV being processed, the process proceeds to step S32, and in steps S32 and S34 through S37 the same processing as in steps S22 and S24 through S27 of FIG. 31, respectively, is performed to obtain the display time of the B-VOP of interest.

【０２８６】 In the second method, the time between the display time of an I-VOP and the display time of a B-VOP predicted from it is obtained down to the seconds digit and expressed by modulo_time_base, while the millisecond-precision part of the B-VOP's display time is expressed by VOP_time_increment. In VM6.0, as described above, the weights for the I-VOP or P-VOP used as reference images in the predictive coding of a B-VOP are determined from the temporal distances between the B-VOP and the I-VOPs or P-VOPs on either side of it, and that temporal distance is used as the VOP_time_increment of the B-VOP. This differs from the VOP_time_increment of an I-VOP or P-VOP, which represents the time elapsed since the synchronization point indicated by the most recently encoded/decoded modulo_time_base. However, once the display time of a B-VOP and the display times of the I-VOPs or P-VOPs on either side of it are known, the temporal distance between them can be obtained simply by taking a difference. There is therefore little need to treat the VOP_time_increment of B-VOPs differently from that of I-VOPs and P-VOPs. On the contrary, from the standpoint of processing efficiency it is desirable that VOP_time_increment (detailed time information) and modulo_time_base (second-precision time information) be handled identically for all VOPs, whether I, B, or P.

【０２８７】 In the second method, therefore, the modulo_time_base and VOP_time_increment of B-VOPs are handled in the same way as those of I/P-VOPs.

【０２８８】 FIG. 33 shows, for the case illustrated in FIG. 27, the relationship between the time_code of the GOV and the modulo_time_base and VOP_time_increment values when these are encoded in accordance with the second method.

【０２８９】 In the second method as well, modulo_time_base may be attached not only to I-VOPs and P-VOPs but also to B-VOPs. The modulo_time_base attached to a B-VOP, like that attached to an I/P-VOP, represents a switch of synchronization points.

【０２９０】 Furthermore, in the second method, the value obtained by subtracting the time of the synchronization point indicated by the modulo_time_base attached to a B-VOP from the display time of that B-VOP is set as its VOP_time_increment.

【０２９１】 Accordingly, under the second method, the modulo_time_base of I1 and of B2 in FIG. 27, which are displayed between the first synchronization point of the GOV (the time represented by the GOV's time_code) and the synchronization point at time_code + 1 second, is 0B in both cases, and each VOP_time_increment is set to the part of the display time of I1 or B2 below the seconds digit, in milliseconds. The modulo_time_base of B3 and of B4, which are displayed between the synchronization point at time_code + 1 second and the synchronization point at time_code + 2 seconds, is 10B in both cases, and each VOP_time_increment is set to the part of the display time of B3 or B4 below the seconds digit, in milliseconds. The modulo_time_base of P5, which is displayed between the synchronization point at time_code + 2 seconds and the synchronization point at time_code + 3 seconds, is 110B, and its VOP_time_increment is set to the part of the display time of P5 below the seconds digit, in milliseconds.

【０２９２】 For example, if in FIG. 27 the display time of I1 is 0h:12m:35s:350ms and the display time of B4 is 0h:12m:36s:550ms, as described above, then the modulo_time_base of I1 is 0B and that of B4 is 10B, as explained above. The VOP_time_increment of I1 and B4 is the millisecond part of the respective display time, namely 350 ms for I1 and 550 ms for B4.

【０２９３】 The processing of modulo_time_base and VOP_time_increment under the second method is also performed, as with the first method, in, for example, the VLC unit 36 shown in FIGS. 9 and 10 and the IVLC unit 102 shown in FIGS. 15 and 16.

【０２９４】 That is, for I/P-VOPs the VLC unit 36 obtains modulo_time_base and VOP_time_increment in the same manner as in FIG. 29.

【０２９５】 For B-VOPs, each time the VLC unit 36 receives a B-VOP belonging to a GOV, it treats that B-VOP as the B-VOP of interest and obtains its modulo_time_base and VOP_time_increment in accordance with the flowchart of FIG. 34.

【０２９６】 In the VLC unit 36, modulo_time_base and VOP_time_increment are first reset in step S41, in the same manner as in step S1 of FIG. 29.

【０２９７】 The process then proceeds to step S42, which determines whether the B-VOP of interest is displayed before the first-displayed I-VOP (First I-VOP) of the GOV being processed. If step S42 determines that it is, the process proceeds to step S44. There, the difference between the time_code of the GOV being processed and the second-precision display time of the B-VOP of interest, that is, the difference between time_code and the display time of the B-VOP of interest down to the seconds digit, is calculated and set in the variable D, and the process proceeds to step S45.

【０２９８】 If step S42 determines that the B-VOP of interest is displayed after the first-displayed I-VOP of the GOV being processed, the process proceeds to step S43. There, the difference between the display time of the B-VOP of interest down to the seconds digit and the display time, down to the seconds digit, of the I/P-VOP displayed immediately before it (the last displayed I/P-VOP among the VOPs constituting the GOV being processed) is calculated and set in the variable D, and the process proceeds to step S45.

【０２９９】 Step S45 determines whether the variable D is equal to 0, that is, whether the difference between time_code and the display time of the B-VOP of interest down to the seconds digit, or the difference between the display time of the B-VOP of interest down to the seconds digit and that of the I/P-VOP displayed immediately before it down to the seconds digit, is 0 seconds. If step S45 determines that D is not equal to 0, that is, if D is 1 or more, the process proceeds to step S46, where a 1 is added as the MSB of modulo_time_base.

【０３００】 The process then proceeds to step S47, where the variable D is decremented by 1, and returns to step S45. Steps S45 through S47 are repeated until step S45 determines that D is equal to 0. As a result, modulo_time_base becomes a run of 1s, equal in number to the seconds in the difference between time_code and the display time of the B-VOP of interest down to the seconds digit (or between the display time of the B-VOP of interest down to the seconds digit and that of the I/P-VOP displayed immediately before it down to the seconds digit), followed by a single 0.

【０３０１】 When step S45 determines that the variable D is equal to 0, the process proceeds to step S48. There, the part of the display time of the B-VOP of interest that is finer than second precision, that is, the time in milliseconds, is set in VOP_time_increment, and the process ends.
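The FIG. 34 procedure for a B-VOP under the second method can be sketched as below. The sketch assumes times expressed as integer milliseconds and uses a hypothetical function name; it is meant only to make the seconds-digit arithmetic concrete. The example reuses the display times given for I1 and B4 (0h:12m:35s:350ms and 0h:12m:36s:550ms).

```python
def encode_b_vop_second_method(b_display_ms, ref_ms):
    """Sketch of steps S42 to S48 (hypothetical helper).

    ref_ms is the GOV time_code (B-VOP displayed before the first I-VOP)
    or the display time of the last displayed I/P-VOP, in ms. Only the
    seconds digits of both times enter the difference D.
    """
    d = b_display_ms // 1000 - ref_ms // 1000   # steps S43/S44: D in whole seconds
    if d < 0:
        raise ValueError("B-VOP must not precede its reference second")
    modulo_time_base = "0"
    while d != 0:                               # step S45: is D equal to 0?
        modulo_time_base = "1" + modulo_time_base   # step S46: add a 1 as the MSB
        d -= 1                                  # step S47: decrement D
    return modulo_time_base, b_display_ms % 1000    # step S48: millisecond part


i1 = (12 * 60 + 35) * 1000 + 350   # 0h:12m:35s:350ms
b4 = (12 * 60 + 36) * 1000 + 550   # 0h:12m:36s:550ms
print(encode_b_vop_second_method(b4, i1))  # ('10', 550)
```

Unlike the first method, the millisecond part here is taken from the B-VOP's own display time, so B-VOPs and I/P-VOPs share the same interpretation of VOP_time_increment.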

【０３０２】 In the IVLC unit 102, meanwhile, the display time of an I/P-VOP is obtained from its modulo_time_base and VOP_time_increment in the same manner as in FIG. 31 described above.

【０３０３】 For B-VOPs, each time the IVLC unit 102 receives a B-VOP belonging to a GOV, it treats that B-VOP as the B-VOP of interest and obtains its display time from its modulo_time_base and VOP_time_increment in accordance with the flowchart of FIG. 35.

【０３０４】 In the IVLC unit 102, step S51 first determines whether the B-VOP of interest is displayed before the first-displayed I-VOP (First I-VOP) of the GOV being processed. If so, the process proceeds to step S52, where the time_code of the GOV being processed is set in the variable T, and then to step S54.

【０３０５】 If step S51 determines that the B-VOP of interest is displayed after the first-displayed I-VOP of the GOV being processed, the process proceeds to step S53. There, the display time, down to the seconds digit, of the I/P-VOP displayed immediately before the B-VOP of interest (the last displayed I/P-VOP among the VOPs constituting the GOV being processed) is set in the variable T, and the process proceeds to step S54.

【０３０６】 Step S54 determines whether the modulo_time_base attached to the B-VOP of interest is equal to 0B. If it is not, that is, if the modulo_time_base contains a 1, the process proceeds to step S55, where the 1 at the MSB of the modulo_time_base is removed, and then to step S56. In step S56, the variable T is incremented by one second and the process returns to step S54. Steps S54 through S56 are repeated until step S54 determines that the modulo_time_base attached to the B-VOP of interest is equal to 0B. As a result, T is incremented by as many seconds as there were 1s in the modulo_time_base originally attached to the B-VOP of interest.

【０３０７】 When step S54 determines that the modulo_time_base attached to the B-VOP of interest is equal to 0B, the process proceeds to step S57. There, the millisecond-precision time represented by VOP_time_increment is added to the variable T, the sum is recognized as the display time of the B-VOP of interest, and the process ends.
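As an illustration of steps S51 through S57, the following sketch recovers a B-VOP display time under the second method. All names are hypothetical, modulo_time_base is modeled as a bit string, and times are integer seconds or milliseconds as noted; it is a sketch of the described flow rather than the decoder itself.

```python
def decode_b_vop_display_ms(modulo_time_base, vop_time_increment_ms,
                            time_code_s, last_ip_display_ms,
                            before_first_i_vop):
    """Sketch of the FIG. 35 procedure (hypothetical helper)."""
    if before_first_i_vop:                 # step S51
        t = time_code_s                    # step S52: T = time_code
    else:
        t = last_ip_display_ms // 1000     # step S53: last I/P-VOP, seconds digit
    ones = len(modulo_time_base) - 1
    if modulo_time_base != "1" * ones + "0":
        raise ValueError("modulo_time_base must be 1s followed by one 0")
    t += ones                              # steps S54 to S56, one second per 1
    return t * 1000 + vop_time_increment_ms    # step S57


# B4 from the earlier example: modulo_time_base 10B, VOP_time_increment 550 ms,
# last displayed I/P-VOP (I1) at 0h:12m:35s:350ms.
i1_ms = (12 * 60 + 35) * 1000 + 350
print(decode_b_vop_display_ms("10", 550, 12 * 60 + 35, i1_ms, False))  # 756550
```

The result, 756550 ms, corresponds to 0h:12m:36s:550ms. The time_code of 0h:12m:35s passed in the example is an assumed value and is not used on this branch.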

【０３０８】 As described above, a GOV layer that encodes the absolute coding start time is introduced into the structure (hierarchy) of the coded bitstream, this GOV layer can be inserted not only at the head of the bitstream but at any suitable position, and the definitions of modulo_time_base and VOP_time_increment specified in VM6.0 are changed as described above. Consequently, the display time (absolute time) of each VOP can be obtained in all cases, regardless of the sequence of VOP picture types or the time interval between adjacent VOPs.

【０３０９】 Accordingly, the encoder encodes the absolute coding start time for each GOV, encodes the modulo_time_base and VOP_time_increment of each VOP, and includes them in the coded bitstream. The decoder can then decode the absolute coding start time for each GOV, decode the modulo_time_base and VOP_time_increment of each VOP, and from these recover the display time of each VOP. Random access can therefore be performed efficiently in units of GOVs.

【０３１０】 If the number of 1s added to modulo_time_base were simply increased at every switch of synchronization points, then one hour (3600 seconds) after the time indicated by time_code, for example (assuming the GOV contains VOPs spanning that much time), modulo_time_base would consist of 3600 one-bits and a single zero-bit, an enormous 3601 bits.
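The growth can be checked with a one-line model. The helper below is hypothetical and simply builds the bit string that would result if no reset were ever applied.

```python
def modulo_time_base_without_reset(elapsed_seconds):
    """Bit string after `elapsed_seconds` sync-point switches, with no reset."""
    return "1" * elapsed_seconds + "0"


print(len(modulo_time_base_without_reset(3600)))  # 3601
```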

【０３１１】 For this reason, MPEG4 specifies that modulo_time_base is reset at the first I/P-VOP that appears after a switch of synchronization points.

【０３１２】 Suppose, for example, that a GOV is composed as shown in FIG. 36: I1 and B2 are displayed between the first synchronization point of the GOV (the time represented by its time_code) and the synchronization point at time_code + 1 second; B3 and B4 are displayed between the synchronization points at time_code + 1 second and time_code + 2 seconds; P5 and B6 are displayed between the synchronization points at time_code + 2 seconds and time_code + 3 seconds; B7 is displayed between the synchronization points at time_code + 3 seconds and time_code + 4 seconds; and B8 is displayed between the synchronization points at time_code + 4 seconds and time_code + 5 seconds. In this case, the modulo_time_base of I1 and B2, which are displayed between the first synchronization point of the GOV and the synchronization point at time_code + 1 second, is 0B.

【０３１３】 The modulo_time_base of B3 and B4, which are displayed between the synchronization points at time_code + 1 second and time_code + 2 seconds, is 10B. The modulo_time_base of P5, which is displayed between the synchronization points at time_code + 2 seconds and time_code + 3 seconds, is 110B.

【０３１４】 Because P5 is the first P-VOP displayed after the switch from the first synchronization point of the GOV to the synchronization point at time_code + 1 second, modulo_time_base is reset to 0B. The modulo_time_base of B6, which is displayed afterward, is set by regarding the synchronization point referred to when obtaining the display time of P5, in this case the synchronization point at time_code + 2 seconds, as the first synchronization point of the GOV. The modulo_time_base of B6 is therefore 0B.

【０３１５】 After that, the modulo_time_base of B7, which is displayed between the synchronization points at time_code + 3 seconds and time_code + 4 seconds, is 10B, and the modulo_time_base of B8, which is displayed between the synchronization points at time_code + 4 seconds and time_code + 5 seconds, is 110B.

【０３１６】 The encoder-side processing (in the VLC unit 36) described with reference to FIGS. 29, 30, and 34 sets modulo_time_base in the manner described above.

【０３１７】 In this case, when the decoder side (the IVLC unit 102) detects the first I/P-VOP displayed after a switch of synchronization points, it must cumulatively add the number of seconds indicated by the modulo_time_base attached to that VOP to time_code in order to obtain display times. In the case shown in FIG. 36, for example, the display times of I1 through P5 can be obtained by adding, to time_code, the number of seconds corresponding to the modulo_time_base attached to each VOP and its VOP_time_increment. The display times of B6 through B8, which are displayed after P5 (the first I/P-VOP displayed after the switch of synchronization points), must be obtained by adding not only those two values to time_code but also the 2 seconds corresponding to the modulo_time_base of P5. The processing described with reference to FIGS. 31, 32, and 35 obtains display times in this way.
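The reset rule and the cumulative addition on the decoder side can be tried out on a GOV shaped like the one in FIG. 36. The sketch below is hypothetical in several respects: the function names, the tuple layout, and the concrete display times are invented for illustration, times are integer milliseconds, and the VOPs are walked in display order for simplicity (a real bitstream carries them in coding order).

```python
def encode_gov(vops, time_code_s):
    """vops: list of (name, vop_type, display_ms) in display order."""
    ref_s = time_code_s                    # current reference sync point
    coded = []
    for name, vop_type, display_ms in vops:
        seconds = display_ms // 1000 - ref_s
        coded.append((name, vop_type, "1" * seconds + "0", display_ms % 1000))
        if vop_type in ("I", "P"):         # reset: later VOPs count from here
            ref_s = display_ms // 1000
    return coded


def decode_gov(coded, time_code_s):
    """Recover display times (ms), accumulating seconds at each I/P-VOP."""
    ref_s = time_code_s
    times = {}
    for name, vop_type, modulo_time_base, increment_ms in coded:
        t_s = ref_s + modulo_time_base.count("1")
        times[name] = t_s * 1000 + increment_ms
        if vop_type in ("I", "P"):         # cumulative addition
            ref_s = t_s
    return times


tc = 12 * 60 + 35                          # assumed time_code 0h:12m:35s
gov = [("I1", "I", 755350), ("B2", "B", 755750), ("B3", "B", 756150),
       ("B4", "B", 756550), ("P5", "P", 757100), ("B6", "B", 757600),
       ("B7", "B", 758200), ("B8", "B", 759400)]
coded = encode_gov(gov, tc)
print([c[2] for c in coded])
# ['0', '0', '10', '10', '110', '0', '10', '110']
print(decode_gov(coded, tc)["B8"])         # 759400
```

With these assumed display times the modulo_time_base values match those listed for I1 through B8, and B8 is recovered as time_code + 2 seconds (from P5) + 2 seconds (its own 110B) + 400 ms.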

【０３１８】 The encoder and decoder described above can be implemented with dedicated hardware, or by having a computer execute a program that causes it to perform the processing described above.

【０３１９】 FIG. 37 shows a configuration example of an embodiment of a computer that functions as the encoder of FIG. 1 or the decoder of FIG. 13.

【０３２０】 A ROM (Read Only Memory) 201 stores, for example, a boot program. A CPU (Central Processing Unit) 202 performs various kinds of processing by, for example, loading a program stored on an HD (Hard Disk) 206 into a RAM (Random Access Memory) 203 and executing it. The RAM 203 temporarily stores the programs executed by the CPU 202 and the data the CPU 202 needs for its processing. An input unit 204 consists of, for example, a keyboard and a mouse, and is operated to enter necessary commands and data. An output unit 205 consists of, for example, a display, and displays data under the control of the CPU 202. The HD 206 stores the programs to be executed by the CPU 202, as well as image data to be encoded, encoded data (the coded bitstream), decoded image data, and the like. A communication I/F (Interface) 207 controls communication with the outside, for example to receive image data to be encoded from the outside or to transmit the encoded bitstream to the outside. The communication I/F 207 also receives bitstreams encoded externally and transmits decoded image data to the outside.

【０３２１】[0321] By causing the CPU 202 of the computer configured as described above to execute a program that performs the processing described above, the computer functions as the encoder shown in FIG. 1 or the decoder shown in FIG. 13.

【０３２２】[0322] In this embodiment, VOP_time_increment represents the display time of a VOP in units of 1 ms, but VOP_time_increment may also be defined, for example, as follows. The interval from one synchronization point to the next is divided into N parts, and the value indicating which division point, counted from the first of the two synchronization points, corresponds to the display time of the VOP is used as VOP_time_increment. With VOP_time_increment defined in this way, setting N = 1000 makes it represent the display time of the VOP in units of 1 ms. In this case the decoder needs to know into how many parts the interval between synchronization points was divided. The number of divisions may be fixed in advance, or it may be included in a layer above the GOV layer and provided to the decoder.
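The N-division definition of VOP_time_increment can be illustrated with the following minimal Python sketch. It assumes that synchronization points are one second apart, so that each division is 1/N of a second. The function names `increment_to_offset` and `offset_to_increment` are hypothetical and are introduced here only for illustration.

```python
from fractions import Fraction


def increment_to_offset(vop_time_increment: int, n_divisions: int) -> Fraction:
    """Return the offset, in seconds, from the preceding synchronization point.

    Assumption: synchronization points are one second apart, so each of the
    n_divisions steps is 1/n_divisions of a second.
    """
    if n_divisions <= 0:
        raise ValueError("n_divisions must be positive")
    if not 0 <= vop_time_increment < n_divisions:
        raise ValueError("VOP_time_increment must lie in [0, N)")
    return Fraction(vop_time_increment, n_divisions)


def offset_to_increment(offset_seconds: Fraction, n_divisions: int) -> int:
    """Return the index of the division point located at offset_seconds."""
    if n_divisions <= 0:
        raise ValueError("n_divisions must be positive")
    if not 0 <= offset_seconds < 1:
        raise ValueError("offset must lie in [0, 1) seconds")
    index = offset_seconds * n_divisions
    if index.denominator != 1:
        raise ValueError("display time does not fall on a division point")
    return int(index)
```

With N = 1000, `offset_to_increment(Fraction(350, 1000), 1000)` yields 350, which matches the 1 ms unit of the embodiment. A different N, for example 30000, lets a display time such as 1001/30000 of a second fall exactly on a division point, which a 1 ms grid cannot represent. The value 30000 is only an illustrative choice and does not appear in the description above.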

【０３２３】[0323]

[Effects of the Invention] According to the image encoding method of the present invention, a plurality of VOPs are grouped, and absolute time information representing the absolute time at which encoding of the VOPs of each group was started is added to each group. Second-precision time information, which represents relative time within the group with one-second precision, is also generated, together with detailed time information, which represents, with a precision finer than one second, the time from the second-precision time information immediately preceding the display time of each I-VOP, P-VOP, or B-VOP to that display time. The second-precision time information and the detailed time information are then added to the corresponding I-VOP, P-VOP, or B-VOP as information representing its display time. Here, the second-precision time information generated for a given VOP is either the time from the absolute time information to the display time of that VOP, expressed with one-second precision, or the time from the display time of the I-VOP or P-VOP displayed immediately before that VOP to the display time of that VOP, expressed with one-second precision. The display time of each I-VOP, P-VOP, or B-VOP is the time obtained by adding the second-precision time information and the detailed time information attached to it to the absolute time information. It is therefore possible to perform random access to the encoded result in units of groups and, for example, to prevent the second-precision time information from growing to an enormous number of bits.
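The split of a display time into second-precision and detailed time information can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the encoder of the embodiment. It assumes that the second-precision information (modulo_time_base) is written as one "1" bit per elapsed second followed by a terminating "0", and that the detailed information (VOP_time_increment) is in units of 1 ms. The function name `encode_display_time` and the example times are hypothetical.

```python
from typing import Tuple


def encode_display_time(display_ms: int, reference_s: int) -> Tuple[str, int]:
    """Split a display time into (modulo_time_base bits, VOP_time_increment).

    display_ms  : display time of the VOP in milliseconds.
    reference_s : reference time in whole seconds. This is either the
                  absolute time of the group (the GOV time_code) or the
                  whole-second part of the display time of the I-VOP or
                  P-VOP displayed immediately before.
    Assumption: modulo_time_base is a run of '1' bits, one per elapsed
    second, terminated by a single '0'.
    """
    whole_s, increment_ms = divmod(display_ms, 1000)
    elapsed_s = whole_s - reference_s
    if elapsed_s < 0:
        raise ValueError("display time precedes the reference time")
    return "1" * elapsed_s + "0", increment_ms


# Hypothetical example: a group whose time_code is 0h12m34s.
time_code_s = 12 * 60 + 34            # 754 seconds
p_vop_display_ms = 758_100            # a P-VOP displayed at 0h12m38.100s

# Reference is the absolute time of the group.
from_time_code = encode_display_time(p_vop_display_ms, time_code_s)

# Reference is the previous I/P-VOP, assumed displayed at 0h12m36.550s.
from_previous = encode_display_time(p_vop_display_ms, 756)
```

In this example `from_time_code` is `("11110", 100)` and `from_previous` is `("110", 100)`. Measuring from the previously displayed I-VOP or P-VOP keeps the run of "1" bits short even late in a long group, which is the bit-count saving described above.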

【０３２４】[0324] According to the image decoding method and the image decoding device of the present invention, the display time of each I-VOP, P-VOP, or B-VOP is obtained by adding the second-precision time information and the detailed time information attached to it to the absolute time information, and each I-VOP, P-VOP, or B-VOP is decoded in accordance with its display time. Here, the second-precision time information used for a given VOP is either the time from the absolute time information to the display time of that VOP, expressed with one-second precision, or the time from the display time of the I-VOP or P-VOP displayed immediately before that VOP to the display time of that VOP, expressed with one-second precision. It is therefore possible to perform random access to the encoded bit stream in units of groups and decode it. It is also possible, for example, to prevent the second-precision time information from growing to an enormous number of bits.
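The display time calculation on the decoder side can be sketched in Python as follows. As before, this is a sketch under assumptions, not the decoder of the embodiment: modulo_time_base is taken to be a run of "1" bits terminated by "0", VOP_time_increment is taken to be in units of 1 ms, and the function name `decode_display_time` is hypothetical.

```python
def decode_display_time(reference_s: int,
                        modulo_time_base: str,
                        vop_time_increment_ms: int) -> int:
    """Return the display time of a VOP in milliseconds.

    reference_s is the reference time in whole seconds: the absolute time
    of the group (GOV time_code), or the whole-second part of the display
    time of the I-VOP or P-VOP displayed immediately before.
    Assumption: modulo_time_base is a run of '1' bits, one per elapsed
    second, terminated by a single '0'.
    """
    if not modulo_time_base.endswith("0") or set(modulo_time_base[:-1]) - {"1"}:
        raise ValueError("modulo_time_base must be a run of '1's ending in '0'")
    if not 0 <= vop_time_increment_ms < 1000:
        raise ValueError("VOP_time_increment must lie in [0, 1000) ms")
    elapsed_s = len(modulo_time_base) - 1
    return (reference_s + elapsed_s) * 1000 + vop_time_increment_ms


# Hypothetical example: group time_code 0h12m34s, first I-VOP "0" / 350 ms.
i_vop_ms = decode_display_time(12 * 60 + 34, "0", 350)
# Next P-VOP is measured from the whole second of that I-VOP: "110" / 550 ms.
p_vop_ms = decode_display_time(i_vop_ms // 1000, "110", 550)
```

Here `i_vop_ms` is 754350 and `p_vop_ms` is 756550. Because the calculation needs only the absolute time of the group and the information attached to the VOPs of that group, decoding can start at any group boundary, which is what makes group-unit random access possible.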

【０３２５】[0325]

【０３２６】[0326]

【０３２７】[0327]

【０３２８】[0328]

[Brief description of drawings]

FIG. 1 is a block diagram showing a configuration example of an embodiment of an encoder to which the present invention is applied.

FIG. 2 is a diagram for explaining that the position and size of a VO change with time.

FIG. 3 is a block diagram showing a configuration example of the VOP encoding units 3_1 to 3_N of FIG. 1.

FIG. 4 is a diagram for explaining spatial scalability.

FIG. 5 is a diagram for explaining spatial scalability.

FIG. 6 is a diagram for explaining spatial scalability.

FIG. 7 is a diagram for explaining spatial scalability.

FIG. 8 is a diagram for explaining a method of determining the size data and offset data of a VOP.

FIG. 9 is a block diagram showing a configuration example of the lower layer encoding unit 25 of FIG. 3.

FIG. 10 is a block diagram showing a configuration example of the upper layer encoding unit 23 of FIG. 3.

FIG. 11 is a diagram for explaining spatial scalability.

FIG. 12 is a diagram for explaining temporal scalability.

FIG. 13 is a block diagram showing a configuration example of an embodiment of a decoder to which the present invention is applied.

FIG. 14 is a block diagram showing another configuration example of the VOP decoding units 72_1 to 72_N of FIG. 13.

FIG. 15 is a block diagram showing a configuration example of the lower layer decoding unit 95 of FIG. 14.

FIG. 16 is a block diagram showing a configuration example of the upper layer decoding unit 93 of FIG. 14.

FIG. 17 is a diagram showing the syntax of a bit stream obtained by scalable coding.

FIG. 18 is a diagram showing the syntax of a VS.

FIG. 19 is a diagram showing the syntax of a VO.

FIG. 20 is a diagram showing the syntax of a VOL.

FIG. 21 is a diagram showing the syntax of a VOP.

FIG. 22 is a diagram showing the relationship between modulo_time_base and VOP_time_increment.

FIG. 23 is a diagram showing the syntax of a bit stream according to the present invention.

FIG. 24 is a diagram showing the syntax of a GOV.

FIG. 25 is a diagram showing how the time_code of the GOV layer and the modulo_time_base and VOP_time_increment of the I-VOP at the head of a GOV are encoded.

FIG. 26 is a diagram showing how the time_code of the GOV layer and the modulo_time_base and VOP_time_increment of a B-VOP positioned before the I-VOP at the head of a GOV are encoded.

FIG. 27 is a diagram showing the relationship between modulo_time_base and VOP_time_increment when their definitions are not changed.

FIG. 28 is a diagram showing the encoding of the modulo_time_base and VOP_time_increment of a B-VOP by the first method.

FIG. 29 is a flowchart showing the encoding of the modulo_time_base and VOP_time_increment of an I/P-VOP by the first and second methods.

FIG. 30 is a flowchart showing the encoding of the modulo_time_base and VOP_time_increment of a B-VOP by the first method.

FIG. 31 is a flowchart showing the decoding of the modulo_time_base and VOP_time_increment of an I/P-VOP encoded by the first and second methods.

FIG. 32 is a flowchart showing the decoding of the modulo_time_base and VOP_time_increment of a B-VOP encoded by the first method.

FIG. 33 is a diagram showing the encoding of the modulo_time_base and VOP_time_increment of a B-VOP by the second method.

FIG. 34 is a flowchart showing the encoding of the modulo_time_base and VOP_time_increment of a B-VOP by the second method.

FIG. 35 is a flowchart showing the decoding of the modulo_time_base and VOP_time_increment of a B-VOP encoded by the second method.

FIG. 36 is a diagram for explaining modulo_time_base.

FIG. 37 is a block diagram showing a configuration example of another embodiment of an encoder and a decoder to which the present invention is applied.

FIG. 38 is a block diagram showing the configuration of an example of a conventional encoder.

FIG. 39 is a block diagram showing the configuration of an example of a conventional decoder.

[Explanation of symbols]

1 VO constructing unit, 2_1 to 2_N VOP constructing units, 3_1 to 3_N VOP encoding units, 4 multiplexing unit, 21 image layering unit, 23 upper layer encoding unit, 24 resolution conversion unit, 25 lower layer encoding unit, 26 multiplexing unit, 31 frame memory, 32 motion vector detector, 33 arithmetic unit, 34 DCT unit, 35 quantizer, 36 VLC unit, 38 inverse quantizer, 39 IDCT unit, 40 arithmetic unit, 41 frame memory, 42 motion compensator, 53 frame memory, 71 demultiplexing unit, 72_1 to 72_N VOP decoding units, 73 image reconstruction unit, 91 demultiplexing unit, 93 upper layer decoding unit, 94 resolution conversion unit, 95 lower layer decoding unit, 102 IVLC unit, 103 inverse quantizer, 104 IDCT unit, 105 arithmetic unit, 106 frame memory, 107 motion compensator, 112 frame memory, 201 ROM, 202 CPU, 203 RAM, 204 input unit, 205 output unit, 206 HD, 207 communication I/F

Continuation of the front page

(56) References: Japanese Unexamined Patent Publication No. H5-128823 (JP, A); Japanese Translation of PCT Publication No. H11-513222 (JP, A); International Publication No. 95/23411 (WO, A1); "NPEG-4 latest information", IEICE Technical Report, March 19, 1997, IE96-141, pp. 1-8

(58) Fields searched (Int. Cl.7, DB name): H04N 7/24 - 7/68

Claims

(57) [Claims]

1. An image encoding method for encoding an image for each VOP (Video Object Plane), which is an object constituting the image, and outputting the resulting encoded bit stream, wherein, with a VOP encoded by intra coding being an I-VOP (Intra-VOP), a VOP encoded by either intra coding or forward predictive coding being a P-VOP (Predictive-VOP), and a VOP encoded by any of intra coding, forward predictive coding, backward predictive coding, or bidirectional predictive coding being a B-VOP (Bidirectionally Predictive-VOP), the method comprises:
a first adding step of grouping a plurality of the VOPs and adding, to each group, absolute time information representing the absolute time at which encoding of the VOPs of that group was started;
a second-precision time information generating step of generating second-precision time information representing relative time within the group with one-second precision;
a detailed time information generating step of generating detailed time information representing, with a precision finer than one second, the time from the second-precision time information immediately preceding the display time of each I-VOP, P-VOP, or B-VOP to that display time; and
a second adding step of adding the second-precision time information and the detailed time information to the corresponding I-VOP, P-VOP, or B-VOP as information representing its display time,
wherein, in the second-precision time information generating step, the second-precision time information for a given VOP is generated as either the time from the absolute time information to the display time of the given VOP, expressed with one-second precision, or the time from the display time of the I-VOP or P-VOP displayed immediately before the given VOP to the display time of the given VOP, expressed with one-second precision, and
the time obtained by adding, to the absolute time information, the second-precision time information and the detailed time information added to each I-VOP, P-VOP, or B-VOP is taken as the display time of that I-VOP, P-VOP, or B-VOP.

2. An image decoding method for decoding an encoded bit stream obtained by encoding an image for each VOP (Video Object Plane), which is an object constituting the image, wherein, with a VOP encoded by intra coding being an I-VOP (Intra-VOP), a VOP encoded by either intra coding or forward predictive coding being a P-VOP (Predictive-VOP), and a VOP encoded by any of intra coding, forward predictive coding, backward predictive coding, or bidirectional predictive coding being a B-VOP (Bidirectionally Predictive-VOP), and in a case where a plurality of the VOPs are grouped, absolute time information representing the absolute time at which encoding of the VOPs of each group was started is added to each group, and second-precision time information representing relative time within the group with one-second precision and detailed time information representing, with a precision finer than one second, the time from the second-precision time information immediately preceding the display time of each I-VOP, P-VOP, or B-VOP to that display time are added to the corresponding I-VOP, P-VOP, or B-VOP as information representing its display time, the method comprises:
a display time calculating step of obtaining the display time of each I-VOP, P-VOP, or B-VOP by adding, to the absolute time information, the second-precision time information and the detailed time information added to that I-VOP, P-VOP, or B-VOP; and
a decoding step of decoding the I-VOP, P-VOP, or B-VOP in accordance with the corresponding display time,
wherein the second-precision time information used for a given VOP is either the time from the absolute time information to the display time of the given VOP, expressed with one-second precision, or the time from the display time of the I-VOP or P-VOP displayed immediately before the given VOP to the display time of the given VOP, expressed with one-second precision.

3. An image decoding device for decoding an encoded bit stream obtained by encoding an image for each VOP (Video Object Plane), which is an object constituting the image, wherein, with a VOP encoded by intra coding being an I-VOP (Intra-VOP), a VOP encoded by either intra coding or forward predictive coding being a P-VOP (Predictive-VOP), and a VOP encoded by any of intra coding, forward predictive coding, backward predictive coding, or bidirectional predictive coding being a B-VOP (Bidirectionally Predictive-VOP), and in a case where a plurality of the VOPs are grouped, absolute time information representing the absolute time at which encoding of the VOPs of each group was started is added to each group, and second-precision time information representing relative time within the group with one-second precision and detailed time information representing, with a precision finer than one second, the time from the second-precision time information immediately preceding the display time of each I-VOP, P-VOP, or B-VOP to that display time are added to the corresponding I-VOP, P-VOP, or B-VOP as information representing its display time, the device comprises:
display time calculating means for obtaining the display time of each I-VOP, P-VOP, or B-VOP by adding, to the absolute time information, the second-precision time information and the detailed time information added to that I-VOP, P-VOP, or B-VOP; and
decoding means for decoding the I-VOP, P-VOP, or B-VOP in accordance with the corresponding display time,
wherein the second-precision time information used for a given VOP is either the time from the absolute time information to the display time of the given VOP, expressed with one-second precision, or the time from the display time of the I-VOP or P-VOP displayed immediately before the given VOP to the display time of the given VOP, expressed with one-second precision.