JP2015188249A

JP2015188249A - Video coding device and video coding method

Info

Publication number: JP2015188249A
Application number: JP2015113470A
Authority: JP
Inventors: Yuji Kawashima (川島 裕司); Yoshihiro Kikuchi (菊池 義浩)
Original assignee: Toshiba Corp; Toshiba Lifestyle Products and Services Corp
Current assignee: Toshiba Corp; Toshiba Lifestyle Products and Services Corp
Priority date: 2015-06-03
Filing date: 2015-06-03
Publication date: 2015-10-29

Abstract

PROBLEM TO BE SOLVED: To provide a video coding device and a video coding method that further improve coding efficiency. SOLUTION: A video coding device comprises control means. The control means generates a B picture by using an inter-picture prediction structure that allows reference from a reference B picture in one GOP (Group of Pictures) to another reference B picture in the GOP.

Description

The present embodiment relates to a video coding device and a video coding method.

H.264, one of the video coding schemes, can refer to multiple reference pictures thanks to the introduction of the DPB (Decoded Picture Buffer). The introduction of the DPB contributes to the improved coding efficiency of the H.264 specification. Although the upper limit on the DPB size restricts the number of reference pictures, decoded picture marking and similar processing make it possible to refer not only to pictures that are temporally close to the decoded picture but also to pictures that are far from it.

Video coding schemes such as H.264 use I pictures, P pictures, and B pictures. In general, the amount of code generated decreases in the order I picture, P picture, B picture. Therefore, the more B pictures a stream contains, the smaller its code amount becomes and the higher its coding efficiency.

In MPEG-2, another video coding scheme, the temporal distance from a B picture to the pictures it references grows as the number of B pictures increases. It has therefore been known that, under the MPEG-2 specification, B-picture prediction becomes less accurate and coding efficiency deteriorates. H.264 improves coding efficiency by introducing the reference B picture, that is, a B picture that other B pictures are allowed to reference.

To enable random-access playback, high-speed playback, and the like in broadcasting and distribution, the H.264 specification of the ARIB standard places the following constraints on the inter-picture prediction structure (GOP (Group of Pictures) structure). Non-reference B pictures and reference B pictures must be decoded immediately after the I picture or P picture that immediately follows them in display order. Here, the I picture or P picture is a picture in the same GOP as the non-reference B picture or reference B picture. A non-reference B picture may refer only to (a) the frame or field pair of the I picture or P picture immediately before or after it in display order, or (b) the frame or field pair of a reference B picture that is immediately before or after it in display order and is closer than the I picture or P picture immediately before or after it in display order. A reference B picture may refer only to (a) the frame or field pair of the I picture or P picture immediately before or after it in display order, or (b) a field of the reference B picture that constitutes the same frame.
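The conventional reference rules above can be sketched as a small lookup, under simplifying assumptions that are not part of the specification text: pictures are whole frames (field pairs and same-frame fields are ignored), a GOP is a sequence of one-letter picture types in display order, and the helper names `nearest` and `arib_allowed_refs` are hypothetical.

```python
def nearest(gop, i, step, kinds):
    """Index of the closest picture whose type is in `kinds`, searching from i in direction step."""
    j = i + step
    while 0 <= j < len(gop):
        if gop[j] in kinds:
            return j
        j += step
    return None


def arib_allowed_refs(gop, i):
    """Display indices that B picture i may reference under the conventional rules (frames only).

    gop holds picture types in display order: 'I', 'P', 'B' (reference B), 'b' (non-reference B).
    """
    kind = gop[i]
    if kind not in ("B", "b"):
        raise ValueError("picture i must be a B picture")
    allowed = set()
    for step in (-1, +1):
        anchor = nearest(gop, i, step, "IP")
        if anchor is not None:
            allowed.add(anchor)  # rule (a): immediately preceding or following I/P picture
        if kind == "b":
            ref_b = nearest(gop, i, step, "B")
            # rule (b): the adjacent reference B picture, only if it is closer than the I/P picture
            if ref_b is not None and (anchor is None or abs(ref_b - i) < abs(anchor - i)):
                allowed.add(ref_b)
    return allowed


gop = list("IbBbP")  # I0 b1 B2 b3 P4
print(arib_allowed_refs(gop, 2))  # reference B picture B2
print(arib_allowed_refs(gop, 1))  # non-reference B picture b1
```

In this model a reference B picture never obtains another reference B picture as a candidate, which is why the conventional structure cannot nest reference B pictures.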

The reference relationships between B pictures that obey the above constraints can form a hierarchy in which references are allowed only from an upper layer to a lower layer. As a result, a picture belonging to a given layer can always be decoded as long as the pictures in the layers below it have been decoded. This hierarchy can be exploited for high-speed playback.

ARIB STD-B32, Part 1, Appendix 2, Chapter 3, Section 3.6

However, under the current constraints on the inter-picture prediction structure, a reference B picture cannot refer to a reference B picture of another frame. FIG. 9 shows an example of the inter-picture prediction structure of the pictures in a GOP under the current H.264 specification of the ARIB standard. According to the reference relationships between the pictures, I0 and P4 are in layer 0, B2 is in layer 1, and b1 and b3 are in layer 2. Layer 0 consists of I pictures or P pictures. Layer 1 consists of reference B pictures. Layer 2 consists of non-reference B pictures. The reference relationships between B pictures are therefore limited to the two-layer structure shown in FIG. 9. Under the current constraints, when the frame rate of the input image signal increases, the number of I pictures or P pictures per unit time increases in proportion to the frame rate, and coding efficiency falls as a result. If the number of B pictures could be increased even when the frame rate of the input image signal rises, coding efficiency could be improved further.
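The frame-rate argument can be made concrete with a little arithmetic. The sketch below assumes that every run of B pictures between two I/P pictures has the maximum permitted length; the function name and the frame rates are hypothetical. A run length of 3 matches the FIG. 9 example, and 7 is the value the embodiment later gives as an example.

```python
def anchors_per_second(frame_rate, max_consecutive_b):
    """I/P pictures per second when every run of B pictures has the maximum length."""
    if frame_rate <= 0 or max_consecutive_b < 0:
        raise ValueError("frame_rate must be positive and max_consecutive_b non-negative")
    # One I/P picture is followed by max_consecutive_b B pictures.
    return frame_rate / (max_consecutive_b + 1)


for fps, run in [(60, 3), (120, 3), (120, 7)]:
    print(f"{fps} fps, up to {run} consecutive B pictures: "
          f"{anchors_per_second(fps, run)} I/P pictures per second")
```

Doubling the frame rate with the run length fixed doubles the number of I/P pictures per second, while lengthening the run from 3 to 7 brings it back to the original figure.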

An object of the present invention is to provide a video coding device and a video coding method that further improve coding efficiency.

According to the embodiment, a video coding device includes control means. The control means performs control so that a B picture is generated using an inter-picture prediction structure that allows reference from a reference B picture in one GOP to another reference B picture in the GOP.

FIG. 1 is a block diagram showing a configuration example of a video coding device according to the embodiment.
FIG. 2 is a diagram showing an example inter-picture prediction structure of a reference B picture according to the embodiment.
FIG. 3 is a diagram showing an example inter-picture prediction structure of a non-reference B picture according to the embodiment.
FIG. 4 is a diagram showing an example inter-picture prediction structure of the pictures included in a GOP according to the embodiment.
FIG. 5 is a diagram for explaining an example of high-speed playback according to the embodiment.
FIG. 6 is a diagram for explaining an example of high-speed playback according to the embodiment.
FIG. 7 is a diagram for explaining an example of high-speed playback according to the embodiment.
FIG. 8 is a diagram for explaining an example of changing the playback speed according to the embodiment.
FIG. 9 is a diagram showing an example inter-picture prediction structure of the pictures included in a GOP under the H.264 specification of the ARIB standard.

The embodiment is described below with reference to the drawings.
FIG. 1 is a block diagram showing a configuration example of a video coding device according to the embodiment. The video coding device 10 generates an encoded bit string (encoded data) 260 from an input image signal (image data) 200. The video coding device 10 includes a control unit (control means) 101, a subtractor 102, an orthogonal transformer 103, a quantizer 104, an inverse quantizer 105, an inverse orthogonal transformer 106, an adder 107, a loop filter 108, a frame memory 109, a predicted image generator 110, and an entropy encoder 111.

The control unit 101 controls the operation of each element of the video coding device 10.
The subtractor 102 receives the input image signal 200 from outside and also receives a predicted image signal 250 from the predicted image generator 110 described later. The subtractor 102 subtracts the predicted image signal 250 from the input image signal 200 to obtain a prediction error signal 210, and outputs the prediction error signal 210 to the orthogonal transformer 103.
The orthogonal transformer 103 applies an orthogonal transform, for example a discrete cosine transform, to the prediction error signal 210 to obtain orthogonal transform coefficient information 220, and outputs the orthogonal transform coefficient information 220 to the quantizer 104.
The quantizer 104 quantizes the orthogonal transform coefficient information 220 to obtain quantized orthogonal transform coefficient information (quantized data) 230, and outputs the quantized orthogonal transform coefficient information 230 to the inverse quantizer 105 and the entropy encoder 111.

The inverse quantizer 105 and the inverse orthogonal transformer 106 locally decode the quantized orthogonal transform coefficient information 230. The inverse orthogonal transformer 106 outputs the locally decoded quantized orthogonal transform coefficient information 230 to the adder 107.
The adder 107 adds the predicted image signal 250 to the locally decoded quantized orthogonal transform coefficient information 230 to obtain a locally decoded image signal 240, and outputs the locally decoded image signal 240 to the loop filter 108. The locally decoded image signal 240 is supplied to the frame memory 109 via the loop filter 108.

The frame memory 109 supplies the stored locally decoded image signal 240 to the predicted image generator 110.
The predicted image generator 110 obtains the predicted image signal 250 from the locally decoded image signal 240, and outputs the predicted image signal 250 to the subtractor 102 and the adder 107.
The entropy encoder 111 encodes the quantized orthogonal transform coefficient information 230 to obtain the encoded bit string 260, and outputs the encoded bit string 260 to the outside.
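The signal flow through elements 102 to 107 can be illustrated with a toy one-dimensional block. This is a minimal sketch rather than the device's implementation: the block length, the uniform quantization step `qstep`, and the function names are assumptions, the discrete cosine transform is the example transform named in the text, and the loop filter 108, frame memory 109, predicted image generator 110, and entropy encoder 111 are left out.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1 / n)
        s += sum(c[k] * math.sqrt(2 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def encode_block(block, prediction, qstep):
    """Return (quantized levels, locally decoded block) for one block of samples."""
    residual = [s - p for s, p in zip(block, prediction)]   # subtractor 102
    coeffs = dct(residual)                                  # orthogonal transformer 103
    levels = [round(c / qstep) for c in coeffs]             # quantizer 104
    dequantized = [lv * qstep for lv in levels]             # inverse quantizer 105
    rec_residual = idct(dequantized)                        # inverse orthogonal transformer 106
    reconstructed = [p + r for p, r in zip(prediction, rec_residual)]  # adder 107
    return levels, reconstructed


block = [52, 55, 61, 66, 70, 61, 64, 73]
prediction = [50] * 8
levels, reconstructed = encode_block(block, prediction, qstep=1.0)
print(levels)
print([round(v, 2) for v in reconstructed])
```

The locally decoded block is what a decoder would also reconstruct, which is why the encoder predicts from it rather than from the original input.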

With the above configuration, the video coding device 10 generates I pictures, P pictures, and B pictures, and generates, as the encoded bit string 260, GOPs each made up of a plurality of pictures including at least one I picture. An I picture is a picture encoded using only its own content. A P picture is a picture encoded with unidirectional prediction. A B picture is a picture encoded with bidirectional prediction. There are two kinds of B picture: B pictures that other pictures can reference (hereinafter, reference B pictures) and B pictures that other pictures do not reference (hereinafter, non-reference B pictures).

Next, the constraints on the inter-picture prediction structure for B pictures defined in this embodiment are described. The control unit 101 performs control so that B pictures are generated using at least one of the five constraints (1) to (5) below. In the following, an I picture or P picture means a picture in the same GOP as the non-reference B picture or reference B picture in question.

(1) An inter-picture prediction structure that allows reference from a reference B picture to a reference B picture. That is, this structure allows a reference B picture in one GOP to refer to another reference B picture in the same GOP. Reference from a non-reference B picture to a reference B picture remains possible, as in the conventional H.264 specification of the ARIB standard.
(2) An inter-picture prediction structure that allows a B picture to refer to an I or P picture that precedes it in display order. That is, within a GOP, this structure allows a first B picture to refer to an I picture or P picture whose display order is earlier than that of the first B picture. A B picture can thus refer to any preceding I or P picture, not only the immediately preceding I or P picture that was conventionally permitted.
(3) An inter-picture prediction structure that prohibits a B picture from referring to a B picture that is farther away than the immediately preceding P picture in display order. That is, within a GOP, this structure prohibits a first B picture from referring to a second B picture that is farther away than the I picture or P picture immediately preceding the first B picture in display order.
(4) An inter-picture prediction structure that prohibits a B picture from referring to a P picture that is farther away than the immediately following P picture in display order. That is, within a GOP, this structure prohibits a first B picture from referring to another I picture or P picture that is farther away than the I picture or P picture immediately following the first B picture in display order. In other words, of the I pictures and P pictures in the GOP that come after the first B picture in display order, the first B picture refers only to the I picture or P picture that immediately follows it.
(5) An inter-picture prediction structure in which a B picture refers only to reference B pictures that are closer than the I picture or P picture immediately before or after it in display order. That is, with respect to the reference B pictures in a GOP, this structure allows a first B picture to refer only to reference B pictures that are closer than the I picture or P picture immediately before or after the first B picture in display order.
Unlike the conventional constraint, the maximum number of consecutive B-picture frames or field pairs (non-reference B pictures or reference B pictures) is set to 7 as an example.
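One way to read constraints (1) to (5) is as a predicate on a pair of pictures. The sketch below is an interpretation under stated assumptions, not the control unit's actual logic: pictures are whole frames within a single GOP, the GOP is a list of one-letter types in display order, decoding order and layer membership are not modeled, and `nearest_anchor` and `may_reference` are hypothetical names.

```python
def nearest_anchor(gop, src, step):
    """Index of the closest I or P picture from src in direction step, or None."""
    j = src + step
    while 0 <= j < len(gop):
        if gop[j] in "IP":
            return j
        j += step
    return None


def may_reference(gop, src, dst):
    """True if B picture src may reference picture dst under constraints (1) to (5).

    gop holds picture types in display order: 'I', 'P', 'B' (reference B), 'b' (non-reference B).
    """
    if gop[src] not in "Bb":
        raise ValueError("src must be a B picture")
    if dst == src or gop[dst] == "b":
        return False  # non-reference B pictures are never referenced
    prev_anchor = nearest_anchor(gop, src, -1)
    next_anchor = nearest_anchor(gop, src, +1)
    if gop[dst] in "IP":
        # (2): any earlier I/P picture; (4): of the later ones, only the immediately following one
        return dst < src or dst == next_anchor
    # dst is a reference B picture: (1) permits B-to-B reference,
    # (3) and (5) keep it between the surrounding I/P pictures
    lower = -1 if prev_anchor is None else prev_anchor
    upper = len(gop) if next_anchor is None else next_anchor
    return lower < dst < upper


gop = list("IbBbBbBbPbBbBbBbP")  # hypothetical GOP: I0 ... P8 ... P16
print(may_reference(gop, 10, 12))  # reference B to a nearby reference B
print(may_reference(gop, 10, 4))   # reference B beyond the preceding P picture
print(may_reference(gop, 2, 16))   # P picture beyond the immediately following one
```

Keeping the rules in one predicate makes it easy to compare against the conventional rules, where the reference-B-to-reference-B case is never allowed.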

FIG. 2 is a diagram showing an example inter-picture prediction structure of a reference B picture according to the embodiment. Here the structure is described taking reference B picture 301 as an example. In FIG. 2, "I" denotes an I picture, "P" a P picture, "B" a reference B picture, and "b" a non-reference B picture. FIG. 2 arranges the pictures of one GOP in display order. The solid arrows show examples of relationships between reference B picture 301 and other pictures that it may reference under the above constraints ((1), (2), (4), and (5)). A "○" symbol on a solid arrow indicates that the reference is also permitted by the H.264 specification of the ARIB standard. A "◎" symbol on a solid arrow indicates that the reference has become permitted through the constraints defined in this embodiment. The dashed arrows show examples of relationships between reference B picture 301 and other pictures that it may not reference under the above constraints ((3) and (4)). The number shown with each arrow is the number of the constraint that applies. A "×" symbol on an arrow indicates that the reference is not permitted.

FIG. 3 is a diagram showing an example inter-picture prediction structure of a non-reference B picture according to the embodiment. Here the structure is described taking non-reference B picture 302 as an example. "I", "P", "B", and "b" in FIG. 3 have the same meanings as in FIG. 2. FIG. 3 arranges the pictures of one GOP in display order. The solid arrows show examples of relationships between non-reference B picture 302 and other pictures that it may reference under the above constraints ((2), (4), and (5)). A "○" symbol on a solid arrow indicates that the reference is also permitted by the H.264 specification of the ARIB standard. A "◎" symbol on a solid arrow indicates that the reference has become permitted through the constraints defined in this embodiment. The dashed arrows show examples of relationships between non-reference B picture 302 and other pictures that it may not reference under the above constraints ((3) and (4)). The number shown with each arrow is the number of the constraint that applies. A "×" symbol on an arrow indicates that the reference is not permitted.

As FIGS. 2 and 3 show, the pictures that a reference B picture may and may not reference are the same as those that a non-reference B picture may and may not reference.

FIG. 4 is a diagram showing an example inter-picture prediction structure of the pictures included in a GOP according to the embodiment. FIG. 4 arranges the pictures of one GOP in display order. The arrows show the reference relationships between pictures that follow constraints (1) to (5). According to these reference relationships, I0 and P8 are in layer 0, B4 is in layer 1, B2 and B6 are in layer 2, and b1, b3, b5, and b7 are in layer 3. Layer 0 consists of I pictures or P pictures. Layers 1 and 2 consist of reference B pictures. Layer 3 consists of non-reference B pictures. In other words, one GOP can have an inter-picture prediction structure with three or more layers of B pictures. The reference relationships between B pictures that follow constraints (1) to (5) can thus form a hierarchy of three or more layers in which references are allowed only from an upper layer to a lower layer.
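The layer assignment in FIG. 4 follows a dyadic pattern, which the sketch below generalizes. The formula is an assumption drawn from the figure rather than something the text states: it presumes that I/P pictures are spaced a power of two apart and that each interval is split in half repeatedly. The name `layer_of` is hypothetical.

```python
def layer_of(display_index, anchor_spacing):
    """Layer of a picture in a dyadic hierarchy with I/P pictures every anchor_spacing frames."""
    if anchor_spacing < 1 or anchor_spacing & (anchor_spacing - 1):
        raise ValueError("anchor_spacing must be a power of two")
    offset = display_index % anchor_spacing
    if offset == 0:
        return 0  # I or P picture
    depth = anchor_spacing.bit_length() - 1               # log2(anchor_spacing)
    trailing_zeros = (offset & -offset).bit_length() - 1  # how evenly the offset divides
    return depth - trailing_zeros


print([layer_of(i, 8) for i in range(9)])  # FIG. 4 layout: I0 b1 B2 b3 B4 b5 B6 b7 P8
print([layer_of(i, 4) for i in range(5)])  # FIG. 9 layout: I0 b1 B2 b3 P4
```

With a spacing of 8 the pictures in the highest layer are the non-reference B pictures, and every lower B layer consists of reference B pictures, matching the description of FIG. 4.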

A decoder can decode each picture according to the example inter-picture prediction structure shown in FIG. 4 and show the pictures on a display in display order. For normal playback, the decoder decodes and displays all pictures in layers 0 to 3 of the GOP shown in FIG. 4. By decoding only the minimum necessary pictures, the decoder can also perform high-speed playback at 2^n times the normal playback speed described with FIG. 4. FIGS. 5 to 7 are diagrams for explaining examples of high-speed playback with the hierarchy shown in FIG. 4. Like FIG. 4, FIGS. 5 to 7 arrange the pictures of one GOP in display order, and the arrows show the reference relationships between pictures that follow constraints (1) to (5). Solid lines in FIGS. 5 to 7 indicate the pictures used for high-speed playback and their reference relationships; dashed lines indicate the pictures not used for high-speed playback and their reference relationships. The high-speed playback shown in FIG. 5 decodes and displays only the pictures in layer 0. The high-speed playback shown in FIG. 6 decodes and displays only the pictures in layers 0 and 1. The high-speed playback shown in FIG. 7 decodes and displays only the pictures in layers 0 to 2. The playback speed depends on the number of pictures that are decoded and displayed. The playback speed therefore increases in the order: normal playback (FIG. 4), high-speed playback of FIG. 7, high-speed playback of FIG. 6, high-speed playback of FIG. 5.
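The relation between the layers decoded and the resulting speed can be checked with a short sketch. It assumes that decoded pictures are displayed at a constant rate, so that the speed-up equals the ratio of all pictures to decoded pictures. The layer table reproduces FIG. 4 for one period of eight pictures (P8 is omitted because it starts the next period), and the names `FIG4_LAYERS`, `pictures_to_decode`, and `speedup` are hypothetical.

```python
# Layers of one period of the FIG. 4 structure, in display order.
FIG4_LAYERS = {"I0": 0, "b1": 3, "B2": 2, "b3": 3, "B4": 1, "b5": 3, "B6": 2, "b7": 3}


def pictures_to_decode(layers, max_layer):
    """Pictures in layers 0..max_layer, in display order."""
    return [name for name, layer in layers.items() if layer <= max_layer]


def speedup(layers, max_layer):
    """Playback speed relative to normal when only layers 0..max_layer are decoded."""
    kept = pictures_to_decode(layers, max_layer)
    return len(layers) // len(kept)


for max_layer, figure in [(3, "FIG. 4"), (2, "FIG. 7"), (1, "FIG. 6"), (0, "FIG. 5")]:
    print(figure, pictures_to_decode(FIG4_LAYERS, max_layer), f"x{speedup(FIG4_LAYERS, max_layer)}")
```

Each layer that is dropped halves the number of decoded pictures, which gives speeds of 1, 2, 4, and 8 times normal, that is, 2^n for n dropped layers.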

FIG. 8 is a diagram for explaining an example of changing the playback speed. FIG. 8 arranges the pictures of one GOP in display order. Here, arrows show some of the reference relationships for B10 that follow the above constraints. The solid arrows show examples of relationships between B10 and other pictures that it may reference; a "○" symbol indicates that the reference is permitted. The dashed arrows show examples of relationships between B10 and other pictures that it may not reference; a "×" symbol indicates that the reference is not permitted. As an example, suppose the decoder has been processing the pictures from I0 up to just before B10 in the high-speed playback mode described with FIG. 5, which decodes only the pictures in layer 0. Suppose also that, just before B10, the playback speed is switched down to the normal playback speed described with FIG. 4, which plays back the pictures in layers 0 to 3. Under constraint (3), B10 cannot refer to B4. The decoder therefore does not need to decode the undecoded B4 in order to decode B10. Under constraint (2), on the other hand, B10 can refer not only to P8, which was decoded during high-speed playback, but also to I0. Because the decoder never has to decode a previously skipped picture solely in order to decode B10, it can switch the playback speed easily.
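The B10 example can be replayed as a set comparison. The sketch below is illustrative only and rests on assumptions: the GOP layout (a P picture every eight frames, ending at a hypothetical P16) is guessed from the picture names in the text, pictures are whole frames, only references to earlier pictures in display order are considered, and `previous_anchor` and `backward_refs` are hypothetical names.

```python
def previous_anchor(gop, src):
    """Index of the I or P picture immediately preceding src in display order, or None."""
    for j in range(src - 1, -1, -1):
        if gop[j] in "IP":
            return j
    return None


def backward_refs(gop, src):
    """Earlier pictures that src may reference under constraints (2), (3), and (5)."""
    anchor = previous_anchor(gop, src)
    lower = -1 if anchor is None else anchor
    refs = set()
    for j in range(src):
        if gop[j] in "IP":
            refs.add(j)  # (2): any preceding I/P picture
        elif gop[j] == "B" and j > lower:
            refs.add(j)  # (5): reference B closer than the preceding I/P; (3) blocks the rest
    return refs


gop = list("IbBbBbBbPbBbBbBbP")  # assumed layout: I0 ... P8 ... P16
# Layer-0-only playback up to just before B10 has decoded only the I/P pictures.
decoded_in_fast_playback = {i for i in range(10) if gop[i] in "IP"}
needed = backward_refs(gop, 10)
print(sorted(needed), sorted(needed - decoded_in_fast_playback))
```

For B10 the set of missing pictures is empty, which corresponds to the statement that no skipped picture has to be decoded after the switch. The same check for a picture deeper into the interval, such as index 14 in this layout, would list earlier reference B pictures of the same interval, so the sketch says nothing about switching at arbitrary positions.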

Under constraints (1) to (5), an inter-picture prediction structure with three or more layers of B pictures becomes possible. Constraints (1), (2), and (5) are the ones that chiefly allow coding efficiency to be maintained or improved as far as possible. Constraints (3) and (4) are the ones that chiefly allow the decoder to play back the encoded bit string at 2^n times normal speed and to change the playback speed easily. Therefore, according to the present embodiment, even if the frame rate of the input image signal increases, the video coding device 10 can generate an encoded bit string that allows high-speed playback on the decoder side while maintaining or improving coding efficiency as far as possible, without increasing the number of I pictures or P pictures per unit time.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

Description of symbols: 10: video coding device, 101: control unit (control means), 102: subtractor, 103: orthogonal transformer, 104: quantizer, 105: inverse quantizer, 106: inverse orthogonal transformer, 107: adder, 108: loop filter, 109: frame memory, 110: predicted image generator, 111: entropy encoder.

Claims

A video coding device comprising control means for performing control to generate encoded data in accordance with constraints when generating encoded data having a picture hierarchy of 0th to 3rd layers, wherein
the encoded data is distributed by broadcast waves,
one GOP (Group of Pictures) included in the encoded data includes a plurality of pictures,
the types of pictures included in the encoded data include at least an I picture, a P picture, and a B picture,
the 0th layer has the I picture and the P picture, and the 1st to 3rd layers have the B picture,
the B picture is encoded with reference to a plurality of other pictures in both directions in display time,
and includes a reference B picture that is referred to for encoding another B picture and a non-reference B picture that is not referred to for encoding another B picture,
the P picture of the 0th layer is encoded with reference to another picture of the 0th layer in one direction in display time,
the encoded data has a format that allows switching, partway through the encoded data, between normal playback that decodes the pictures of the 0th to 3rd layers and high-speed playback that omits decoding of (1) the pictures of the 1st to 3rd layers, (2) the pictures of the 2nd and 3rd layers, or (3) the pictures of the 3rd layer, and
the constraints
enable reference from the reference B picture in the one GOP to another reference B picture, and,
for the B picture, prohibit reference to another B picture that is earlier in display time than the I picture or P picture immediately preceding in display time and that is in a layer higher than the first layer, while enabling reference to the I picture or P picture immediately preceding in display time and to another picture that is earlier than that immediately preceding I picture or P picture and is in the 0th layer.

A video coding method comprising performing control to generate encoded data in accordance with constraints when generating, using a first coding scheme, encoded data having a picture hierarchy of 0th to 3rd layers, wherein
the encoded data is distributed by broadcast waves,
one GOP (Group of Pictures) included in the encoded data includes a plurality of pictures,
the types of pictures included in the encoded data include at least an I picture, a P picture, and a B picture,
the 0th layer has the I picture and the P picture, and the 1st to 3rd layers have the B picture,
the B picture is encoded with reference to a plurality of other pictures in both directions in display time,
and includes a reference B picture that is referred to for encoding another B picture and a non-reference B picture that is not referred to for encoding another B picture,
the P picture of the 0th layer is encoded with reference to another picture of the 0th layer in one direction in display time,
the encoded data has a format that allows switching, partway through the encoded data, between normal playback that decodes the pictures of the 0th to 3rd layers and high-speed playback that omits decoding of (1) the pictures of the 1st to 3rd layers, (2) the pictures of the 2nd and 3rd layers, or (3) the pictures of the 3rd layer, and
the constraints
enable reference from the reference B picture in the one GOP to another reference B picture, and,
for the B picture, prohibit reference to another B picture that is earlier in display time than the I picture or P picture immediately preceding in display time and that is in a layer higher than the first layer, while enabling reference to the I picture or P picture immediately preceding in display time and to another picture that is earlier than that immediately preceding I picture or P picture and is in the 0th layer.