JP6888695B2

JP6888695B2 - Receiver and receiving method

Info

Publication number: JP6888695B2
Application number: JP2020002576A
Authority: JP
Inventors: Ikuo Tsukagoshi (塚越 郁夫)
Original assignee: Sony Corp; Sony Group Corp
Current assignee: Sony Corp; Sony Group Corp
Priority date: 2013-08-09
Filing date: 2020-01-10
Publication date: 2021-06-16
Anticipated expiration: 2034-01-17
Also published as: JP2021132396A; JP2018142984A; JP2016054543A; JP2016054544A; JP7338745B2; JP6330667B2; JP6817250B2; JP2020074560A; JP2022126774A; JP2015097418A; JP5947443B2; JP5947444B2; JP7095775B2

Description

The present technology relates to a receiving device and a receiving method.

When compressed video is provided as a service over broadcasting, the Internet, or the like, the upper limit of the frame frequency that can be reproduced is restricted by the decoding capability of the receiver. The service side must therefore take into account the reproduction capability of widely deployed receivers, and either limit the service to a low frame frequency or provide services at both high and low frame frequencies simultaneously.

Supporting high-frame-frequency services makes receivers expensive, which hinders their adoption. If, initially, only inexpensive receivers dedicated to low-frame-frequency services are widespread and the service side later starts a high-frame-frequency service, viewers cannot watch it at all without a new receiver, which in turn hinders the adoption of the service.

For example, in HEVC (High Efficiency Video Coding), temporal scalability obtained by hierarchically encoding the image data of each picture constituting moving image data has been proposed (see Non-Patent Document 1). On the receiving side, the layer of each picture can be identified from the temporal ID (temporal_id) inserted in the header of the NAL (Network Abstraction Layer) unit, which allows selective decoding up to the layer that matches the decoding capability.
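As a rough illustration of how a receiver can read the temporal ID, the sketch below parses the two-byte HEVC NAL unit header (forbidden_zero_bit, nal_unit_type, nuh_layer_id, nuh_temporal_id_plus1, as defined in H.265) and derives TemporalId = nuh_temporal_id_plus1 - 1. The function name and the assumption that the Annex B start code has already been stripped are illustrative choices, not taken from this document.

```python
def parse_hevc_nal_header(nal: bytes) -> dict:
    """Parse the 2-byte HEVC NAL unit header.

    Assumes `nal` begins at the header (start code already removed).
    """
    if len(nal) < 2:
        raise ValueError("NAL unit shorter than its 2-byte header")
    b0, b1 = nal[0], nal[1]
    forbidden_zero_bit = b0 >> 7
    nal_unit_type = (b0 >> 1) & 0x3F                # 6 bits
    nuh_layer_id = ((b0 & 0x01) << 5) | (b1 >> 3)  # 6 bits spanning both bytes
    nuh_temporal_id_plus1 = b1 & 0x07              # 3 bits, never 0 in a valid stream
    if forbidden_zero_bit != 0 or nuh_temporal_id_plus1 == 0:
        raise ValueError("invalid NAL unit header")
    return {
        "nal_unit_type": nal_unit_type,
        "nuh_layer_id": nuh_layer_id,
        "temporal_id": nuh_temporal_id_plus1 - 1,
    }

# A TRAIL_R slice (nal_unit_type 1) in temporal layer 2
print(parse_hevc_nal_header(bytes([0x02, 0x03])))
# {'nal_unit_type': 1, 'nuh_layer_id': 0, 'temporal_id': 2}
```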

Non-Patent Document 1: Gary J. Sullivan, Jens-Rainer Ohm, Woo-Jin Han, Thomas Wiegand, “Overview of the High Efficiency Video Coding (HEVC) Standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, December 2012.

An object of the present technology is to enable good decoding processing according to the decoding capability on the receiving side.

The concept of the present technology is a transmission device including:
an image coding unit that classifies the image data of each picture constituting moving image data into a plurality of layers, encodes the image data of each classified layer, and generates a video stream having the encoded image data of the pictures of each layer; and
a transmission unit that transmits a container of a predetermined format including the generated video stream,
wherein the image coding unit adds, to the encoded image data of the pictures of each layer, decoding timing information set so that the decoding time interval of the encoded image data of each picture becomes smaller as the layer becomes higher.

In the present technology, the image coding unit encodes the image data of each picture constituting the moving image data to generate a video stream (encoded stream). Specifically, the image data of each picture is classified into a plurality of layers and encoded, producing a video stream that carries the image data of the pictures of each layer. Decoding timing information, for example a decoding time stamp, set so that the decoding time interval of the encoded image data of each picture becomes smaller as the layer becomes higher, is added to the encoded image data of the pictures of each layer.

The transmission unit transmits a container of a predetermined format containing the above video stream. For example, the container may be the transport stream (MPEG-2 TS) adopted in digital broadcasting standards. Alternatively, the container may be MP4, which is used for distribution over the Internet, or a container of another format.

For example, the image coding unit may generate a single video stream having the encoded image data of the pictures of all layers, divide the plurality of layers into a predetermined number (two or more) of layer sets, and add, to the encoded image data of the pictures of each layer set, identification information for identifying the layer set to which they belong. In this case, the identification information may be, for example, a bitstream level designation value, set higher for layer sets on the higher-layer side.
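The layer-set split and its identification can be sketched as follows, assuming each layer set is described by its highest temporal_id and that the identification value is a level_idc that increases toward the higher-layer side. The function names and the example values (153 and 156, i.e. HEVC levels 5.1 and 5.2) are hypothetical and only illustrate the rule stated above.

```python
from bisect import bisect_left
from typing import Sequence


def layer_set_index(temporal_id: int, set_top_tids: Sequence[int]) -> int:
    """Index of the layer set containing `temporal_id`.

    `set_top_tids` lists the highest temporal_id of each layer set in
    ascending order, e.g. [2, 4] means set 0 = layers 0-2, set 1 = layers 3-4.
    """
    i = bisect_left(set_top_tids, temporal_id)
    if i == len(set_top_tids):
        raise ValueError("temporal_id lies above the top layer set")
    return i


def identification_for(temporal_id: int, set_top_tids: Sequence[int],
                       set_level_idc: Sequence[int]) -> int:
    """Identification value attached to a picture: its layer set's level_idc."""
    if any(a >= b for a, b in zip(set_level_idc, set_level_idc[1:])):
        raise ValueError("level values must increase toward higher layer sets")
    return set_level_idc[layer_set_index(temporal_id, set_top_tids)]


print(identification_for(3, [2, 4], [153, 156]))  # 156
```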

Alternatively, for example, the image coding unit may divide the plurality of layers into a predetermined number (two or more) of layer sets and generate the same number of video streams, each having the encoded image data of the pictures of one layer set. In this case, the image coding unit may add, to the encoded image data of the pictures of each layer set, identification information for identifying the layer set to which they belong, and this identification information may be a bitstream level designation value set higher for layer sets on the higher-layer side.

As described above, in the present technology, decoding timing information set so that the decoding time interval of the encoded image data of each picture becomes smaller as the layer becomes higher is added to the encoded image data of the pictures of each layer. This enables good decoding processing on the receiving side according to its decoding performance. For example, even a receiver with low decoding capability can selectively decode the encoded image data of low-layer pictures without causing a buffer failure.
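Why the layer-dependent timing matters can be checked numerically. The sketch below uses a deliberately simplified model (an assumption, not this document's buffer model): a decoder that can decode `decoder_fps` pictures per second keeps up only if consecutive decode times of the pictures it selects are at least 1/decoder_fps apart. DTS values are in 90 kHz ticks; the two 120 fps picture sequences are hypothetical, with layers 0-1 as a 60 fps base set and layer 2 as the enhancement set.

```python
from typing import List, Tuple

# (temporal_id, dts) per picture, in decode order; dts in 90 kHz ticks.
Picture = Tuple[int, int]


def min_decode_interval(pictures: List[Picture], max_tid: int) -> int:
    """Smallest gap between decode times of the pictures with temporal_id <= max_tid."""
    dts = sorted(d for tid, d in pictures if tid <= max_tid)
    if len(dts) < 2:
        raise ValueError("need at least two selected pictures")
    return min(b - a for a, b in zip(dts, dts[1:]))


def decodable(pictures: List[Picture], max_tid: int, decoder_fps: int,
              clock_hz: int = 90_000) -> bool:
    """Simplified check: interval / clock_hz >= 1 / decoder_fps, in integers."""
    return min_decode_interval(pictures, max_tid) * decoder_fps >= clock_hz


# Uniform 1/120 s spacing in decode order, ignoring layers:
uniform = [(0, 0), (1, 750), (2, 1500), (2, 2250),
           (0, 3000), (1, 3750), (2, 4500), (2, 5250)]
# Base-set pictures spaced 1/60 s apart, enhancement pictures in between:
layered = [(0, 0), (2, 750), (1, 1500), (2, 2250),
           (0, 3000), (2, 3750), (1, 4500), (2, 5250)]

print(decodable(uniform, 1, 60), decodable(layered, 1, 60))  # False True
```

With uniform timing, a 60 fps decoder that drops layer 2 still sees base-set pictures only 1/120 s apart; with the layered timing, the same decoder sees them 1/60 s apart, while a 120 fps decoder can still handle the full stream.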

In the present technology, for example, the image coding unit may either generate a single video stream having the encoded image data of the pictures of all layers, or divide the plurality of layers into a predetermined number (two or more) of layer sets and generate that number of video streams, each having the encoded image data of the pictures of one layer set; the device may further include an information insertion unit that inserts, into the container layer, configuration information of the video streams included in the container. In this case, the receiving side can easily grasp the video stream configuration from this configuration information and perform appropriate decoding processing.

Further, in the present technology, for example, the transmission unit may divide the plurality of layers into a predetermined number (two or more) of layer sets and set a higher priority for packets carrying the encoded image data of the pictures of layer sets on the lower-layer side. In this case, the receiving side can, based on this packet priority, take into its buffer only the encoded image data of the pictures of the layer sets that match its own decoding capability.
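On an MPEG-2 TS, the packet priority can be carried in the one-bit transport_priority field of the 4-byte TS packet header (ISO/IEC 13818-1). The sketch below builds such a header for a payload-only packet; the policy that layer set 0 gets transport_priority = 1 and all other sets get 0 is an assumption for illustration, and both function names are hypothetical.

```python
def ts_header(pid: int, transport_priority: int, payload_unit_start: bool,
              continuity_counter: int) -> bytes:
    """Build a 4-byte MPEG-2 TS packet header for a payload-only packet."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID must fit in 13 bits")
    if transport_priority not in (0, 1):
        raise ValueError("transport_priority is a single bit")
    # transport_error_indicator (bit 7) left at 0
    byte1 = (int(payload_unit_start) << 6) | (transport_priority << 5) | (pid >> 8)
    byte2 = pid & 0xFF
    # scrambling_control = '00', adaptation_field_control = '01' (payload only)
    byte3 = (0b01 << 4) | (continuity_counter & 0x0F)
    return bytes([0x47, byte1, byte2, byte3])


def priority_for_layer_set(set_index: int) -> int:
    """Hypothetical policy: only the lowest layer set is marked high priority."""
    return 1 if set_index == 0 else 0


print(ts_header(0x100, priority_for_layer_set(0), True, 0).hex())  # 47610010
```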

Another concept of the present technology is a receiving device including:
a receiving unit that receives a container of a predetermined format including a video stream having the encoded image data of the pictures of each layer, obtained by classifying the image data of each picture constituting moving image data into a plurality of layers and encoding it,
wherein decoding timing information set so that the decoding time interval of the encoded image data of each picture becomes smaller as the layer becomes higher is added to the encoded image data of the pictures of each layer; and
a processing unit that decodes the encoded image data of the pictures in the layers at or below a predetermined layer, selected from the video stream included in the received container, at the decoding timing indicated by the decoding timing information, to obtain the image data of the pictures in the layers at or below the predetermined layer.

In the present technology, the receiving unit receives a container of a predetermined format. This container includes a video stream having the image data of the pictures of each layer, obtained by classifying the image data of each picture constituting the moving image data into a plurality of layers and encoding it. Decoding timing information, for example a decoding time stamp, set so that the decoding time interval of the encoded image data of each picture becomes smaller as the layer becomes higher, is added to the encoded image data of the pictures of each layer.

The processing unit decodes the encoded image data of the pictures in the layers at or below a predetermined layer, selected from the video stream included in the received container, to obtain the image data of each picture. In this case, the encoded image data of each picture is decoded no later than the decoding timing indicated by the decoding timing information added to it.

For example, the received container may include a single video stream having the encoded image data of the pictures of all layers, with the plurality of layers divided into a predetermined number (two or more) of layer sets and a higher priority set for packets carrying the encoded image data of the pictures of layer sets on the lower-layer side. The processing unit may then take into the buffer and decode the encoded image data of the pictures of the predetermined layer sets carried in packets of the priority selected according to its decoding capability.
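On the receiving side, the priority-based selection could look like the sketch below, which walks 188-byte TS packets, reads the PID and transport_priority bits from the header, and keeps only the video packets the decoder should buffer. It assumes (as a hypothetical policy) that transport_priority = 1 marks the lower layer set; the function name and the `high_capability` flag are illustrative.

```python
TS_PACKET_SIZE = 188


def select_video_packets(ts: bytes, video_pid: int, high_capability: bool) -> list:
    """Return the TS packets of `video_pid` that should enter the decoder buffer.

    Assumes transport_priority == 1 marks the lower layer set; a
    low-capability receiver (high_capability=False) keeps only those.
    """
    selected = []
    for offset in range(0, len(ts) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = ts[offset:offset + TS_PACKET_SIZE]
        if packet[0] != 0x47:
            raise ValueError(f"sync byte missing at offset {offset}")
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        transport_priority = (packet[1] >> 5) & 0x01
        if pid == video_pid and (transport_priority == 1 or high_capability):
            selected.append(packet)
    return selected
```

A real demultiplexer would also handle adaptation fields, continuity counters, and PES reassembly; they are omitted here to keep the selection step visible.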

Alternatively, for example, the received container may include a predetermined number of video streams, each having the image data of the pictures of one of two or more layer sets obtained by dividing the plurality of layers. The processing unit may then take into the buffer and decode the encoded image data of the pictures of the layer sets carried by the video streams selected according to its decoding capability.
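Stream selection by decoding capability can be sketched as a walk from the lowest layer set upward, adding streams while their level stays within the decoder's level. This assumes, as described later in this document for substreams, that each stream's level_idc covers its own layer set and all lower ones; `StreamInfo` and `streams_to_decode` are hypothetical names.

```python
from typing import List, NamedTuple


class StreamInfo(NamedTuple):
    pid: int
    level_idc: int  # assumed to cover this layer set and all lower ones


def streams_to_decode(streams: List[StreamInfo], decoder_level_idc: int) -> List[int]:
    """PIDs to buffer, taking streams from the lowest layer set upward."""
    chosen = []
    for s in streams:
        if s.level_idc > decoder_level_idc:
            break  # this set (and everything above it) exceeds the decoder
        chosen.append(s.pid)
    return chosen


streams = [StreamInfo(0x100, 153), StreamInfo(0x101, 156)]
print(streams_to_decode(streams, 153))  # [256]
```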

As described above, in the present technology, a decoding time stamp set so that the decoding time interval of the encoded image data of each picture becomes smaller as the layer becomes higher is added to the encoded image data of the pictures of each layer, and the encoded image data of the pictures in the selected layers at or below the predetermined layer is decoded at the decoding timing indicated by the decoding timing information added to it. This enables good decoding processing according to decoding performance. For example, even a receiver with low decoding capability can selectively decode the encoded image data of low-layer pictures without causing a buffer failure.

In the present technology, for example, a post-processing unit that matches the frame rate of the image data of each picture obtained by the processing unit to the display capability may further be provided. In this case, even when the decoding capability is low, image data at a frame rate suited to a high display capability can be obtained.
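The document does not say how the post-processing unit converts the frame rate. As a minimal sketch under that caveat, the function below uses the simplest possible method, repeating frames to go up and dropping frames to go down, and supports only integer ratios; a real device might interpolate frames instead. The function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def match_frame_rate(frames: Sequence[T], src_fps: int, dst_fps: int) -> List[T]:
    """Match decoded frames to the display rate by repetition or dropping."""
    if dst_fps >= src_fps:
        if dst_fps % src_fps:
            raise ValueError("only integer up-conversion ratios are supported")
        repeat = dst_fps // src_fps
        return [f for f in frames for _ in range(repeat)]
    if src_fps % dst_fps:
        raise ValueError("only integer down-conversion ratios are supported")
    return list(frames[:: src_fps // dst_fps])


print(match_frame_rate(["a", "b"], 60, 120))  # ['a', 'a', 'b', 'b']
```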

Another concept of the present technology is an encoding device including an image coding unit that classifies the image data of each picture constituting moving image data into a plurality of layers, encodes the image data of the pictures of each classified layer, and generates a video stream having the encoded image data of the pictures of each layer,
wherein the image coding unit divides the plurality of layers into a predetermined number (two or more) of layer sets and inserts a bitstream level designation value into each of the substreams corresponding to the layer sets, and
the bitstream level designation value inserted into the substream corresponding to each layer set is a level value covering the pictures of all layers included in that layer set and the layer sets below it.

In the present technology, the image coding unit classifies the image data of each picture constituting the moving image data into a plurality of layers, encodes the image data of the pictures of each classified layer, and generates a video stream having the encoded image data of the pictures of each layer.

In this case, the plurality of layers is divided into a predetermined number (two or more) of layer sets, and a bitstream level designation value is inserted into each of the substreams corresponding to the layer sets. The level designation value inserted into the substream of each layer set is a level value covering the pictures of all layers included in that layer set and the layer sets below it.

For example, the image coding unit may generate a predetermined number of video streams, each including one of the substreams corresponding to the layer sets. Alternatively, the image coding unit may generate a single video stream that includes all of the substreams corresponding to the layer sets.

As described above, in the present technology, a bitstream level designation value is inserted into each substream corresponding to a layer set, and this value is a level value covering the pictures of all layers included in that layer set and the layer sets below it. The receiving side of the video stream can therefore easily determine, from the inserted bitstream level designation value, whether each substream can be decoded.
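The cumulative rule can be illustrated with HEVC level limits. The sketch assigns to each substream the smallest level whose maximum luma picture size and luma sample rate accommodate the frame rate of its own layer set plus all lower sets. It is a simplification under stated assumptions: only a subset of the H.265 Annex A levels is listed, per-dimension and bitrate limits are ignored, and the per-set frame rates are assumed to add up. Function names are hypothetical.

```python
from typing import List

# Subset of HEVC level limits: (general_level_idc, MaxLumaPs, MaxLumaSr).
# general_level_idc is 30 x the level number.
HEVC_LEVELS = [
    (120, 2_228_224, 66_846_720),      # level 4
    (123, 2_228_224, 133_693_440),     # level 4.1
    (150, 8_912_896, 267_386_880),     # level 5
    (153, 8_912_896, 534_773_760),     # level 5.1
    (156, 8_912_896, 1_069_547_520),   # level 5.2
    (180, 35_651_584, 1_069_547_520),  # level 6
    (183, 35_651_584, 2_139_095_040),  # level 6.1
    (186, 35_651_584, 4_278_190_080),  # level 6.2
]


def min_level_idc(width: int, height: int, fps: int) -> int:
    """Smallest listed level whose picture-size and sample-rate limits fit."""
    luma_ps = width * height
    luma_sr = luma_ps * fps
    for idc, max_ps, max_sr in HEVC_LEVELS:
        if luma_ps <= max_ps and luma_sr <= max_sr:
            return idc
    raise ValueError("exceeds the listed levels")


def substream_levels(width: int, height: int, set_fps: List[int]) -> List[int]:
    """Level per substream, covering its own layer set and all lower ones.

    set_fps[k] is the frame rate contributed by layer set k (lowest first).
    """
    levels, cumulative = [], 0
    for fps in set_fps:
        cumulative += fps
        levels.append(min_level_idc(width, height, cumulative))
    return levels


print(substream_levels(3840, 2160, [60, 60]))  # [153, 156]
```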

Another concept of the present technology is a transmission device including:
an image coding unit that classifies the image data of each picture constituting moving image data into a plurality of layers, encodes the image data of the pictures of each classified layer, and generates a video stream having the encoded image data of the pictures of each layer,
the image coding unit dividing the plurality of layers into a predetermined number (two or more) of layer sets and inserting a bitstream level designation value into each of the substreams corresponding to the layer sets,
the bitstream level designation value inserted into the substream corresponding to each layer set being a level value covering the pictures of all layers included in that layer set and the layer sets below it;
a transmission unit that transmits a container of a predetermined format including the generated video stream; and
an information insertion unit that inserts, into the container layer, flag information indicating that the bitstream level designation value inserted into the substream of each layer set is a level value covering the pictures of all layers included in that layer set and the layer sets below it.

In this case, the plurality of layers is divided into a predetermined number (two or more) of layer sets, and a bitstream level designation value is inserted into each of the substreams corresponding to the layer sets. The level designation value inserted into the substream of each layer set is a level value covering the pictures of all layers included in that layer set and the layer sets below it.

The transmission unit transmits a container of a predetermined format including the generated video stream. The information insertion unit inserts, into the container layer, flag information indicating that the bitstream level designation value inserted into the substream of each layer set is a level value covering the pictures of all layers included in that layer set and the layer sets below it.

As described above, in the present technology, the receiving side can tell from the flag information inserted into the container layer that the bitstream level designation value inserted into the substream of each layer set is a level value covering the pictures of all layers included in that layer set and the layer sets below it. The receiving side therefore does not need to confirm, for example from per-layer level designation values, the level value covering all layers in the substreams at or below a given layer set, which makes the decoding process more efficient.

According to the present technology, good decoding processing according to the decoding capability becomes possible. The effects described here are not necessarily limited, and may be any of the effects described in the present disclosure.

A block diagram showing a configuration example of a transmission/reception system as an embodiment.
A block diagram showing a configuration example of a transmission device.
A diagram showing an example of hierarchical coding performed by an encoder.
A diagram showing a structure example (Syntax) of a NAL unit header and the contents (Semantics) of the main parameters in that structure example.
A diagram for explaining the structure of the encoded image data of each picture in HEVC.
A diagram showing an example of the encoding, decoding, and display order and the delay in hierarchical coding.
A diagram showing a hierarchically coded stream and the expected display (display order) at a designated layer.
A diagram showing the encoder input order and the display order of the decoder output at a designated layer.
A diagram showing an example of picture encoding timing in hierarchical coding (which becomes the decoding timing at decode time).
A diagram showing an output example of a single video stream (encoded stream) from the encoder.
A diagram showing an output example of two video streams (encoded streams) from the encoder: a base stream (B-stream) and an enhancement stream (E-stream).
A block diagram showing a configuration example of the encoder.
A diagram showing an example of the processing flow of the encoder.
A diagram showing a structure example (Syntax) of an HEVC descriptor (HEVC_descriptor).
A diagram showing the contents (Semantics) of the main information in the structure example of the HEVC descriptor.
A diagram showing a structure example (Syntax) of a scalability extension descriptor (scalability_extension_descriptor).
A diagram showing the contents (Semantics) of the main information in the structure example of the scalability extension descriptor.
A diagram showing a structure example (Syntax) of a TS packet.
A diagram showing the relationship between the level designation value (general_level_idc) included in the VPS and the setting value of “transport_priority” in the TS packet header.
A block diagram showing a configuration example of a multiplexer.
A diagram showing an example of the processing flow of the multiplexer.
A diagram showing a configuration example of the transport stream TS when delivering with a single stream.
A diagram showing a specific configuration example of the transport stream TS when delivering with a single stream.
A diagram showing a configuration example of the transport stream TS when delivering with multiple streams (two streams).
A diagram showing a specific configuration example of the transport stream TS when delivering with two streams.
A diagram showing another specific configuration example of the transport stream TS when delivering with two streams.
A block diagram showing a configuration example of a receiving device.
A block diagram showing a configuration example of a demultiplexer.
A diagram showing a case where the transport stream TS contains a single video stream (encoded stream).
A diagram showing a case where the transport stream TS contains two video streams (encoded streams): a base stream and an enhancement stream.
A diagram showing an example of the processing flow (one frame) of the demultiplexer.
A diagram showing an example of the processing flow (two frames) of the demultiplexer.
A block diagram showing a configuration example of a decoder.
A flowchart showing an example of the decoding procedure for each video stream, taking into account the decoder processing capability of the receiving device.
A diagram showing a configuration example of a post-processing unit.
A diagram showing an example of the processing flow of the decoder and the post-processing unit.

Hereinafter, modes for carrying out the invention (hereinafter referred to as “embodiments”) will be described. The description will be given in the following order.
1. Embodiment
2. Modification example

<1. Embodiment>
[Transmission/Reception System]
FIG. 1 shows a configuration example of a transmission/reception system 10 as an embodiment. The transmission/reception system 10 includes a transmission device 100 and a receiving device 200.

The transmission device 100 transmits a transport stream TS as a container on broadcast waves. The transport stream TS includes a video stream in which the image data of each picture constituting moving image data is classified into a plurality of layers, and which has the encoded image data of the pictures of each layer. Encoding such as H.264/AVC or H.265/HEVC is applied so that each referenced picture belongs to the referencing picture's own layer and/or a layer lower than it.
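The reference constraint is what lets a receiver discard higher layers without breaking prediction. A minimal checker, assuming a hypothetical description of each picture by its temporal layer and its list of referenced pictures, reports every reference that points to a higher layer:

```python
from typing import Dict, List


def check_reference_layers(temporal_id: Dict[str, int],
                           refs: Dict[str, List[str]]) -> List[str]:
    """List references that violate 'referenced layer <= own layer'."""
    errors = []
    for pic, ref_list in refs.items():
        for ref in ref_list:
            if temporal_id[ref] > temporal_id[pic]:
                errors.append(f"{pic} (layer {temporal_id[pic]}) references "
                              f"{ref} (layer {temporal_id[ref]})")
    return errors


# Hypothetical three-layer GOP: I0/P4 in layer 0, B2 in layer 1, b1/b3 in layer 2
tid = {"I0": 0, "P4": 0, "B2": 1, "b1": 2, "b3": 2}
refs = {"P4": ["I0"], "B2": ["I0", "P4"], "b1": ["I0", "B2"], "b3": ["B2", "P4"]}
print(check_reference_layers(tid, refs))  # []
```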

Layer identification information for identifying the layer to which each picture belongs is added, picture by picture, to the encoded image data of the pictures of each layer. In this embodiment, the layer identification information (“nuh_temporal_id_plus1”, which represents temporal_id) is placed in the header of the NAL unit (nal_unit) of each picture. Adding layer identification information in this way allows the receiving side to selectively extract and decode the encoded image data of the layers at or below a predetermined layer.
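Selective extraction by layer can be sketched as a filter over NAL units, keeping those whose TemporalId (nuh_temporal_id_plus1 - 1, the low three bits of the second header byte) does not exceed the target layer. This is a simplified version of HEVC sub-bitstream extraction that assumes start codes are already removed; the function name is hypothetical.

```python
from typing import Iterable, List


def extract_sub_bitstream(nal_units: Iterable[bytes], max_temporal_id: int) -> List[bytes]:
    """Keep NAL units whose TemporalId <= max_temporal_id."""
    kept = []
    for nal in nal_units:
        temporal_id = (nal[1] & 0x07) - 1  # nuh_temporal_id_plus1 - 1
        if temporal_id <= max_temporal_id:
            kept.append(nal)
    return kept


units = [bytes([0x40, 0x01]),  # VPS, TemporalId 0
         bytes([0x02, 0x01]),  # slice, TemporalId 0
         bytes([0x02, 0x02]),  # slice, TemporalId 1
         bytes([0x02, 0x03])]  # slice, TemporalId 2
print(len(extract_sub_bitstream(units, 1)))  # 3
```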

The transport stream TS includes either a single video stream having the encoded image data of the pictures of all layers, or a predetermined number of video streams obtained by dividing the plurality of layers into a predetermined number (two or more) of layer sets, each stream having the encoded image data of the pictures of one layer set. Layer information for the hierarchical coding and configuration information of the video streams are also inserted into the transport stream TS, in its transport layer. With this information, the receiving side can easily grasp the layer structure and the stream structure and perform appropriate decoding processing.

Further, as described above, the plurality of layers is divided into a predetermined number of layer sets, and a higher priority is set for TS packets (transport stream packets) carrying the encoded image data of the pictures of layer sets on the lower-layer side. With this priority, the receiving side can take into its buffer and process only the encoded image data of the pictures of the layer sets that match its own decoding capability.

Further, as described above, when the layers are divided into a predetermined number of layer sets, identification information identifying the layer set is added to the coded image data of the pictures in each layer set. For example, the bitstream level designation value (level_idc) is used as this identification information, with higher layer sets assigned higher values.
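Because each layer set's level value covers that set and all sets below it, a receiver can pick its layer sets by comparing level values with its own capability. This is a minimal sketch, assuming the signaled values are listed from the lowest layer set upward; `select_layer_sets` is a hypothetical name.

```python
from typing import List, Sequence


def select_layer_sets(layer_set_levels: Sequence[int],
                      decoder_level_idc: int) -> List[int]:
    """Return the indices of the layer sets (lowest first) that a decoder
    supporting decoder_level_idc can take in.

    layer_set_levels[i] is the level_idc signaled for layer set i; the values
    rise with the layer set, so selection stops at the first set the decoder
    cannot handle.
    """
    selected = []
    for index, level in enumerate(layer_set_levels):
        if level > decoder_level_idc:
            break
        selected.append(index)
    return selected
```

With HEVC's encoding of general_level_idc as 30 times the level number, a level 5.1 decoder (153) facing sets signaled as [153, 156] would take only the first set.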

The receiving device 200 receives the above-described transport stream TS, sent on broadcast waves from the transmitting device 100. According to its own decoding capability, the receiving device 200 selectively extracts and decodes the coded image data of the layers at or below a predetermined layer from the video streams included in the transport stream TS, obtains the image data of each picture, and reproduces the images.

For example, as described above, the transport stream TS may include a single video stream carrying the coded image data of pictures in a plurality of layers. In that case, the coded image data of the pictures in the predetermined layer sets, carried in TS packets whose priority is selected according to the decoding capability, is taken into the buffer and decoded.

Alternatively, as described above, the transport stream TS may include a predetermined number of video streams, each carrying the coded image data of the pictures in one of the two or more layer sets obtained by dividing the layers. In that case, the coded image data of the pictures in the layer sets carried by the video streams selected according to the decoding capability is taken into the buffer and decoded.

In addition, the receiving device 200 performs post-processing that matches the frame rate of the decoded image data to its display capability. This post-processing makes it possible, for example, to obtain image data at a frame rate suited to a high display capability even when the decoding capability is low.
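The post-processing method is not specified here. As one simple possibility, the sketch below assumes integer frame-rate ratios and uses frame repetition (e.g. 60P decoded, 120P display) or frame dropping; an actual receiver might interpolate instead. The function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def match_display_rate(frames: Sequence[T], decoded_fps: int,
                       display_fps: int) -> List[T]:
    """Repeat or drop decoded frames so their rate matches the display rate.

    Only integer ratios are handled: upconversion repeats each frame,
    downconversion keeps every n-th frame.
    """
    if decoded_fps <= 0 or display_fps <= 0:
        raise ValueError("frame rates must be positive")
    if display_fps >= decoded_fps:
        if display_fps % decoded_fps:
            raise ValueError("display rate must be an integer multiple of the decoded rate")
        factor = display_fps // decoded_fps
        return [frame for frame in frames for _ in range(factor)]
    if decoded_fps % display_fps:
        raise ValueError("decoded rate must be an integer multiple of the display rate")
    step = decoded_fps // display_fps
    return list(frames[::step])
```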

"Configuration of transmitting device"
FIG. 2 shows a configuration example of the transmitting device 100. The transmitting device 100 includes a CPU (Central Processing Unit) 101, an encoder 102, a compressed data buffer (cpb: coded picture buffer) 103, a multiplexer 104, and a transmission unit 105. The CPU 101 is a control unit and controls the operation of each unit of the transmitting device 100.

The encoder 102 receives uncompressed moving image data and performs hierarchical coding. The encoder 102 classifies the image data of each picture constituting the moving image data into a plurality of layers, encodes the image data of the pictures in each layer, and generates a video stream carrying the coded image data of the pictures in each layer. The encoder 102 performs coding such as H.264/AVC or H.265/HEVC. In doing so, it encodes each picture so that every picture it references (referenced picture) belongs to its own layer and/or a layer lower than its own.
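The reference constraint above (a picture may reference only its own layer or lower layers) can be checked mechanically. This is a minimal sketch, assuming a picture table that maps a picture number to its temporal_id and the numbers of the pictures it references; the table layout, the function name and the example values are hypothetical.

```python
from typing import Dict, List, Tuple

# picture number -> (temporal_id, list of referenced picture numbers)
PictureTable = Dict[int, Tuple[int, List[int]]]


def check_reference_layers(pictures: PictureTable) -> List[Tuple[int, int]]:
    """Return (picture, reference) pairs where the reference sits in a
    higher layer than the referencing picture, violating the constraint."""
    violations = []
    for pic, (tid, refs) in pictures.items():
        for ref in refs:
            if pictures[ref][0] > tid:
                violations.append((pic, ref))
    return violations
```

An empty result means every reference points to the same or a lower layer, so dropping higher layers never removes a picture that a kept picture depends on.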

FIG. 3 shows an example of hierarchical coding performed by the encoder 102. In this example, the pictures are classified into five layers, 0 to 4, and the image data of the pictures in each layer is encoded.

The vertical axis indicates the layer. The values 0 to 4 are set, respectively, as the temporal_id (layer identification information) placed in the header of the NAL units (nal_unit) constituting the coded image data of the pictures in layers 0 to 4. The horizontal axis indicates display order (POC: picture order count); pictures further left are displayed earlier and pictures further right are displayed later.

FIG. 4(a) shows an example structure (syntax) of the NAL unit header, and FIG. 4(b) shows the semantics of its main parameters. The 1-bit field "forbidden_zero_bit" must be 0. The 6-bit field "nal_unit_type" indicates the NAL unit type. The 6-bit field "nuh_layer_id" is assumed to be 0. The 3-bit field "nuh_temporal_id_plus1" indicates temporal_id and takes the value temporal_id plus 1 (1 to 7).
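The 16-bit header layout just described (1 + 6 + 6 + 3 bits) can be packed and unpacked with plain bit operations. This sketch follows the field semantics of FIG. 4(b), including the assumption nuh_layer_id = 0; the helper names are hypothetical.

```python
from typing import Dict


def pack_nal_header(nal_unit_type: int, temporal_id: int) -> bytes:
    """Build a 2-byte NAL unit header with forbidden_zero_bit = 0 and
    nuh_layer_id = 0, as assumed in this embodiment."""
    if not 0 <= nal_unit_type <= 63:
        raise ValueError("nal_unit_type is a 6-bit field")
    if not 0 <= temporal_id <= 6:
        raise ValueError("temporal_id + 1 must fit in 3 bits (1 to 7)")
    nuh_layer_id = 0
    value = (nal_unit_type << 9) | (nuh_layer_id << 3) | (temporal_id + 1)
    return value.to_bytes(2, "big")


def unpack_nal_header(header: bytes) -> Dict[str, int]:
    """Split a 2-byte NAL unit header back into its fields."""
    if len(header) < 2:
        raise ValueError("header needs 2 bytes")
    value = int.from_bytes(header[:2], "big")
    return {
        "forbidden_zero_bit": value >> 15,
        "nal_unit_type": (value >> 9) & 0x3F,
        "nuh_layer_id": (value >> 3) & 0x3F,
        "temporal_id": (value & 0x07) - 1,  # nuh_temporal_id_plus1 - 1
    }
```

For example, an SPS (nal_unit_type 33) in layer 0 packs to the familiar bytes 0x42 0x01.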

Returning to FIG. 3, each rectangular frame represents a picture, and the number in it indicates the order in which the pictures are encoded, that is, the encoding order (the decoding order on the receiving side). For example, the 16 pictures "2" to "17" form a sub group of pictures, and "2" is the first picture of that sub group of pictures. Picture "1" belongs to the preceding sub group of pictures. Several such sub groups of pictures together form a GOP (Group Of Pictures).

As shown in FIG. 5, the coded image data of the first picture of a GOP consists of the NAL units AUD, VPS, SPS, PPS, PSEI, SLICE, SSEI, and EOS. Pictures other than the first picture of the GOP consist of the NAL units AUD, PPS, PSEI, SLICE, SSEI, and EOS. The VPS, like the SPS, can be transmitted once per sequence (GOP), whereas the PPS can be transmitted with every picture.
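The access-unit composition of FIG. 5 can be checked from the nal_unit_type values of an access unit. This is a minimal sketch using the standard HEVC type codes (32 VPS, 33 SPS, 34 PPS, 35 AUD, 36 EOS, 39 prefix SEI, 40 suffix SEI, 0 to 31 VCL); the newly defined ESPS has no code given in the text, so it would show up as "OTHER". The function names are hypothetical.

```python
from typing import List, Sequence

# HEVC nal_unit_type codes used in FIG. 5; 0..31 are VCL (slice) NAL units.
NAL_NAMES = {32: "VPS", 33: "SPS", 34: "PPS", 35: "AUD",
             36: "EOS", 39: "PSEI", 40: "SSEI"}


def nal_label(nal_unit_type: int) -> str:
    if 0 <= nal_unit_type <= 31:
        return "SLICE"
    return NAL_NAMES.get(nal_unit_type, f"OTHER({nal_unit_type})")


def check_access_unit(nal_unit_types: Sequence[int], gop_start: bool) -> List[str]:
    """List problems with an access unit relative to FIG. 5: the AUD comes
    first, slice data is present, and a GOP-start picture carries VPS and SPS."""
    names = [nal_label(t) for t in nal_unit_types]
    problems = []
    if not names or names[0] != "AUD":
        problems.append("AUD must be the first NAL unit")
    if "SLICE" not in names:
        problems.append("no slice data")
    if gop_start:
        for required in ("VPS", "SPS"):
            if required not in names:
                problems.append(f"missing {required}")
    return problems
```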

Returning to FIG. 3, the solid arrows indicate the reference relationships between pictures in encoding. For example, picture "2" is a P picture and is encoded with reference to picture "1". Picture "3" is a B picture and is encoded with reference to pictures "1" and "2". Similarly, the other pictures are encoded with reference to nearby pictures in display order. Pictures in layer 4 are not referenced by any other picture.

The encoder 102 generates video streams carrying the coded image data of the pictures in each layer. For example, the encoder 102 divides the layers into a predetermined number (two or more) of layer sets and either generates a predetermined number of video streams, each containing the substream of one layer set, or generates a single video stream containing the substreams of all layer sets.

For example, in the hierarchical coding example of FIG. 3, when the layers are divided into two layer sets, with layers 0 to 3 as the lower layer set and layer 4 as the higher layer set, there are two substreams: one carrying the coded image data of the pictures in layers 0 to 3, and one carrying the coded image data of the pictures in layer 4. In this case, the encoder 102 generates either a single video stream containing both substreams or two video streams, each containing one of the substreams.
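Splitting coded pictures into per-layer-set substreams amounts to bucketing NAL units by temporal_id. This minimal sketch assumes each NAL unit is given as a (temporal_id, payload) pair and that layer sets are described by the highest temporal_id of every set except the last (for FIG. 3, `[3]`). It ignores where the ESPS goes; the function name is hypothetical.

```python
from bisect import bisect_left
from typing import Iterable, List, Sequence, Tuple


def split_into_layer_sets(nal_units: Iterable[Tuple[int, bytes]],
                          top_tids: Sequence[int]) -> List[List[bytes]]:
    """Return one substream (list of payloads) per layer set, lowest first.

    top_tids lists, in ascending order, the highest temporal_id of each
    layer set except the last one.
    """
    substreams: List[List[bytes]] = [[] for _ in range(len(top_tids) + 1)]
    for tid, payload in nal_units:
        # bisect_left puts tid == top_tids[i] into set i, as intended.
        substreams[bisect_left(top_tids, tid)].append(payload)
    return substreams
```

The resulting substreams can be carried in one video stream or as separate streams; the split itself is the same in both cases.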

Regardless of how many video streams it generates, the encoder 102 divides the layers into a predetermined number (two or more) of layer sets, as described above, and adds identification information identifying the layer set to the coded image data of the pictures in each layer set. In this case, for example, "general_level_idc", the bitstream level designation value contained in the SPS (Sequence Parameter Set) and the ESPS (Enhanced Sequence Parameter Set), is used as the identification information.

The SPS is a conventional, well-known NAL unit and is included once per sequence (GOP) in the substream of the lowest layer set, that is, the base substream. The ESPS, in contrast, is a newly defined NAL unit and is included once per sequence (GOP) in the substreams of the layer sets above the lowest, that is, the enhancement substreams. The value of "general_level_idc" in the SPS and ESPS is set higher for higher layer sets.

Since "sub_layer_level_idc" can also be sent for each sublayer in the SPS and ESPS, this "sub_layer_level_idc" can be used instead as the identification information for the layer set. This information is carried not only in the SPS but also in the VPS.

In this case, the value of "general_level_idc" inserted into the SPS or ESPS of each layer set's substream is a level value covering the pictures of all layers in that layer set and in the layer sets below it. For example, in the hierarchical coding example of FIG. 3, the "general_level_idc" inserted into the SPS of the substream for layers 0 to 3 is a level value covering only the pictures of layers 0 to 3; when that frame rate is 60P, it is set to "level 5.1". Likewise, the "general_level_idc" inserted into the ESPS of the substream for layer 4 is a level value covering all the pictures of layers 0 to 4; when that frame rate is 120P, it is set to "level 5.2".
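In HEVC, general_level_idc is coded as 30 times the level number, so "level 5.1" is 153 and "level 5.2" is 156. The sketch below converts level strings for each layer set (lowest first) and enforces the rule that higher layer sets signal higher values; the function names are hypothetical.

```python
from typing import List, Sequence


def general_level_idc(level: str) -> int:
    """Convert an HEVC level such as "5.1" to general_level_idc (30 x level)."""
    major, _, minor = level.partition(".")
    return 30 * int(major) + 3 * int(minor or 0)


def layer_set_level_idcs(levels_per_set: Sequence[str]) -> List[int]:
    """Return the general_level_idc for each layer set, lowest first,
    checking that the values strictly increase toward higher sets."""
    idcs = [general_level_idc(level) for level in levels_per_set]
    if any(higher <= lower for lower, higher in zip(idcs, idcs[1:])):
        raise ValueError("higher layer sets must signal higher levels")
    return idcs
```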

FIG. 6 shows an example of the encoding, decoding, and display order, and the associated delays, in hierarchical coding. This example corresponds to the hierarchical coding example of FIG. 3 and shows the case where all layers are hierarchically coded at full temporal resolution. FIG. 6(a) shows the encoder input. As shown in FIG. 6(b), each picture is encoded in encoding order with a delay of 16 pictures, yielding a coded stream. FIG. 6(b) also represents the decoder input, where each picture is decoded in decoding order. Then, as shown in FIG. 6(c), the decoded image data of each picture is obtained in display order with a delay of 4 pictures.

FIG. 7(a) shows a coded stream similar to that of FIG. 6(b), divided into three tiers: layers 0 to 2, layer 3, and layer 4. Here, "Tid" denotes temporal_id. FIG. 7(b) shows the expected display (display order) when the pictures of the partial hierarchy of layers 0 to 2, that is, Tid = 0 to 2, are selectively decoded. FIG. 7(c) shows the expected display when the pictures of the partial hierarchy of layers 0 to 3, that is, Tid = 0 to 3, are selectively decoded. FIG. 7(d) shows the expected display when the pictures of all layers 0 to 4, that is, Tid = 0 to 4, are decoded.

With the timing of FIG. 7(a), decoding the coded stream at any of these capability levels still requires full-rate decoding capability. However, decoding Tid = 0 to 2 should be possible for a decoder with 1/4 of the decoding capability needed for the full encoded temporal resolution, and decoding Tid = 0 to 3 should be possible for a decoder with 1/2 of that capability.
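The 1/4 and 1/2 figures follow from a dyadic hierarchy in which each layer above the decoded top halves the frame rate. A minimal sketch, assuming that dyadic structure (it reproduces the three cases named in the text for five layers; the function name is hypothetical):

```python
from fractions import Fraction


def required_decode_fraction(num_layers: int, max_tid: int) -> Fraction:
    """Fraction of full-rate decoding capability needed to decode
    Tid = 0..max_tid, assuming each omitted top layer halves the rate."""
    if not 0 <= max_tid < num_layers:
        raise ValueError("max_tid must lie in 0..num_layers-1")
    return Fraction(1, 2 ** (num_layers - 1 - max_tid))
```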

However, if pictures belonging to the lower, referenced layers of the hierarchical coding are consecutive and are encoded at full-temporal-resolution timing, a decoder performing partial decoding cannot keep up. Period A in FIG. 7(a) is such a case. A decoder for the partial hierarchy Tid = 0 to 2 or Tid = 0 to 3 decodes and displays at 1/4 or 1/2 of the full time-axis rate, as in the display examples, so it cannot decode the consecutive pictures encoded at full temporal resolution during period A. During that period, the cpb occupancy reaches a level the encoder did not anticipate.

Ta is the time required to decode each picture in a decoder that decodes Tid = 0 to 2. Tb is the time required per picture in a decoder that decodes Tid = 0 to 3. Tc is the time required per picture in a decoder that decodes Tid = 0 to 4 (all layers). These times satisfy Ta > Tb > Tc.

Therefore, in this embodiment, buffer control gives pictures in the lower layers of the hierarchical coding a long per-picture decoding interval, and the interval becomes shorter toward higher layers. For this purpose, a minimum decoding capability (target minimum decoder capability) is defined relative to the number of layers. For example, in the hierarchical coding example of FIG. 3, if the minimum decoding capability is the ability to decode up to layer 2, the encoding intervals are chosen so that the pictures of layers 0 to 2 can be decoded at 1/4 of the temporal resolution of all five layers, and when the multiplexer 104 (described later) performs multiplexing, this difference in decoding times is reflected in the decoding time stamp (DTS) values.

When there are five layers, 0 to 4, as in the hierarchical coding example of FIG. 3, the interval between pictures belonging to layers 0 to 2 is four times the full-resolution time interval, the interval between pictures belonging to layer 3 is twice the full-resolution time interval, and the interval between pictures belonging to layer 4 is the full-resolution time interval.

At the same time, the encoder 102 ensures that the encoding (= decoding) timings of pictures in different layers do not coincide. That is, when encoding each picture by the above method, if the encoding timing of a lower-layer picture and that of a higher-layer picture would coincide, the encoder 102 gives priority to the lower-layer picture, which is referenced by more pictures, and schedules the higher-layer picture accordingly. Since pictures in the highest layer are non-referenced B pictures, their timing can be controlled so that they are decoded and displayed immediately (that is, without being stored in the dpb (decoded picture buffer)).

FIG. 8(a) shows the encoder input order (the same as FIG. 6(a)). FIGS. 8(b) to 8(d) show the display order (corresponding to the PTS at the system layer), the same as FIGS. 7(b) to 7(d).

FIG. 9 shows an example of the picture encoding timing (which is also the decoding timing at decode time) in hierarchical coding. This example corresponds to the hierarchical coding example of FIG. 3 and assumes a minimum decoding capability of decoding up to layer 2. The portion underlined with a solid line indicates the pictures belonging to one SGP (Sub Group of Pictures), namely the 16 pictures "2" to "17". Pictures in solid rectangular frames belong to the current SGP; pictures in dashed rectangular frames do not, and they do not affect the prediction of pictures in the current SGP.

In this case, the interval between pictures belonging to layers 0 to 2, that is, pictures "2", "3", "4", "11", ..., is Ta, four times the full-resolution time interval. The interval between pictures belonging to layer 3, that is, "5", "8", "12", ..., is basically Tb, twice the full-resolution time interval.

However, to avoid coinciding with the timing of picture "11", picture "8" is encoded at the next interval position. Similarly, the timings of pictures "12" and "15" are adjusted to avoid coinciding with pictures belonging to layers 0 to 2. As a result, the pictures of layer 3 fall midway between the pictures of layers 0 to 2.

The interval between pictures belonging to layer 4, that is, "6", "7", "9", ..., is basically Tc, the full-resolution time interval. However, because their timing is adjusted to avoid coinciding with the pictures of layers 0 to 3, the pictures of layer 4 fall midway between the pictures of layers 0 to 3.

As illustrated, one SGP's worth of pictures (the 16 pictures "2" to "17") is encoded within one SGP period. This shows that real-time processing remains possible even when pictures in the lower layers are given long encoding intervals, as described above.

FIG. 10 shows an output example of the encoder 102 in which the encoder 102 outputs a single video stream (coded stream). This example corresponds to the hierarchical coding example of FIG. 3, with each picture encoded at the timing shown in FIG. 9.

In the video stream, the coded image data of the pictures belonging to layers 0 to 4 is arranged in encoding order. When this video stream is decoded on the receiving side, the referenced pictures (pictures of layers 0 to 3) belonging to the current SGP (pictures in thick frames) remain in the uncompressed data buffer (dpb: decoded picture buffer) after decoding, ready to be referenced by other pictures.

FIG. 11 shows an output example of the encoder 102 in which the encoder 102 outputs two video streams (coded streams): a base stream (B_str) and an enhancement stream (E_str). This example corresponds to the hierarchical coding example of FIG. 3, with each picture encoded at the timing shown in FIG. 9.

The base stream (B-stream) contains the coded image data of the pictures belonging to layers 0 to 3, arranged in encoding order. The enhancement stream (E-stream) contains the coded image data of the pictures belonging to layer 4, arranged in encoding order. When these video streams are decoded on the receiving side, the referenced pictures (pictures of layers 0 to 3) belonging to the current SGP (pictures in thick frames) remain in the uncompressed image data buffer (dpb: decoded picture buffer) after decoding, ready to be referenced by other pictures.

FIG. 12 shows a configuration example of the encoder 102. The encoder 102 includes a temporal ID generation unit 121, a buffer delay control unit 122, an HRD (Hypothetical Reference Decoder) setting unit 123, a parameter set/SEI encoding unit 124, a slice encoding unit 125, and a NAL packetizing unit 126.

The temporal ID generation unit 121 receives information on the number of layers from the CPU 101 and, based on it, generates the temporal_id values for that number of layers. For example, in the hierarchical coding example of FIG. 3, temporal_id = 0 to 4 are generated.

The buffer delay control unit 122 receives information on the minimum decoding capability (minimum_target_decoder_level_idc) from the CPU 101, together with the temporal_id generated by the temporal ID generation unit 121. For each layer, the buffer delay control unit 122 calculates the "cpb_removal_delay" and "dpb_output_delay" of each picture.

Here, specifying the minimum decoding capability of the target decoder for the given number of layers determines the encoding timing of the referenced lower-layer pictures and of the higher-layer pictures that are displayed immediately after decoding (see FIG. 9). This encoding timing corresponds to the decoding timing at which each picture is read from the compressed data buffer (cpb: coded picture buffer) on the receiving side.

"cpb_removal_delay" is determined to reflect the layer to which the picture belongs. For example, let the number of layers be N, with temporal_id (Tid) taking values from 0 to N-1, and let the minimum decoding capability be the ability to decode pictures up to the layer temporal_id = K. The buffer delay control unit 122 obtains the picture encoding interval D for each layer from equation (1) below and reflects it in "cpb_removal_delay" and "dpb_output_delay".

$$
D =
\begin{cases}
2^{\,N-1-K} & (\mathrm{Tid} \le K) \\
2^{\,N-1-\mathrm{Tid}} & (K < \mathrm{Tid} < N-1) \\
\text{input sequence interval} & (\mathrm{Tid} = N-1)
\end{cases}
\qquad (1)
$$
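Equation (1) can be evaluated directly. This sketch expresses D as a multiple of the input sequence (full-rate) interval, so the last case is simply 1; the function name is hypothetical. For the FIG. 3 example (N = 5, K = 2) it yields the intervals 4, 4, 4, 2, 1 stated in the text.

```python
def encode_interval(tid: int, num_layers: int, k: int) -> int:
    """Picture encoding interval D of equation (1), in units of the input
    sequence interval. tid is the picture's temporal_id, k the highest
    temporal_id the minimum-capability decoder handles."""
    n = num_layers
    if not (0 <= k <= n - 1 and 0 <= tid <= n - 1):
        raise ValueError("tid and k must lie in 0..N-1")
    if tid <= k:
        return 2 ** (n - 1 - k)    # layers 0..K share the longest interval
    if tid < n - 1:
        return 2 ** (n - 1 - tid)  # intermediate layers halve it per layer
    return 1                       # Tid = N-1: the input sequence interval
```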

If the encoding timings of different layers coincide, the lower layer is encoded first, and the higher layer is encoded in its next time slot as allotted by the above equation.

The HRD (Hypothetical Reference Decoder) setting unit 123 receives the "cpb_removal_delay" and "dpb_output_delay" of the pictures in each layer, as calculated by the buffer delay control unit 122, together with information on the number of streams from the CPU 101. The HRD setting unit 123 configures the HRD based on this information.

The parameter set/SEI encoding unit 124 receives temporal_id together with the HRD setting information. According to the number of streams to be encoded, it generates parameter sets such as the VPS, SPS (ESPS), and PPS, and SEI, for the pictures of each layer.

For example, a picture timing SEI containing "cpb_removal_delay" and "dpb_output_delay" is generated. Also, for example, a buffering period SEI containing "initial_cpb_removal_time" is generated. The buffering period SEI is generated for the first picture (access unit) of each GOP.

"initial_cpb_removal_time" indicates the time (initial time) at which the coded image data of the first picture of a GOP (Group Of Pictures) is removed from the compressed data buffer (cpb) for decoding. "cpb_removal_delay" indicates when the coded image data of each picture is removed from the cpb; together with "initial_cpb_removal_time", it determines the removal time. "dpb_output_delay" indicates the time from when a picture is decoded and placed in the uncompressed data buffer (dpb) until it is output.
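How these three values combine can be sketched in the simplified single-buffering-period form of the HEVC HRD: removal time = initial time + cpb_removal_delay x clock tick, and output time = removal time + dpb_output_delay x clock tick. This sketch assumes the initial value is carried on a 90 kHz clock (as HEVC's initial_cpb_removal_delay is) and that the clock tick is num_units_in_tick / time_scale; it ignores the additional cases of the full HRD model, and the function name is hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def hrd_times(initial_cpb_removal_delay_90k: int,
              cpb_removal_delays: Sequence[int],
              dpb_output_delays: Sequence[int],
              num_units_in_tick: int,
              time_scale: int) -> List[Tuple[Fraction, Fraction]]:
    """Return (cpb removal time, dpb output time) in seconds for each
    picture of one buffering period."""
    if len(cpb_removal_delays) != len(dpb_output_delays):
        raise ValueError("one delay pair per picture is required")
    clock_tick = Fraction(num_units_in_tick, time_scale)
    start = Fraction(initial_cpb_removal_delay_90k, 90000)  # 90 kHz clock
    times = []
    for removal_delay, output_delay in zip(cpb_removal_delays, dpb_output_delays):
        removal = start + removal_delay * clock_tick
        times.append((removal, removal + output_delay * clock_tick))
    return times
```

Using exact fractions avoids rounding drift when comparing timestamps derived from different clocks.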

The slice encoding unit 125 encodes the image data of the pictures in each layer to obtain slice data (slice segment header, slice segment data). Using the frame buffer, the slice encoding unit 125 inserts "ref_idx_l0_active" ("ref_idx_l1_active"), which indicates the index of the prediction-target picture of each "Prediction Unit", into the "slice segment header" as information representing the state of temporal prediction. As a result, the referenced picture is determined at decoding time together with the layer level indicated by temporal_id. The slice encoding unit 125 also inserts the index for the current slice into the "slice segment header" as "short_term_ref_pic_set_idx" or "lt_idx_sps".

The NAL packetizing unit 126 generates the coded image data of the pictures in each layer from the parameter sets and SEI generated by the parameter set/SEI encoding unit 124 and the slice data generated by the slice encoding unit 125, and outputs as many video streams (coded streams) as the specified number of streams.

その際、ピクチャごとに、その階層を示すtemporal_idがＮＡＬユニットヘッダに付される（図４参照）。また、temporal_idで示される階層に属するピクチャは、サブレイヤ（sub_layer）として括られ、サブレイヤごとのビットレートのレベル指定値「Level_idc」が「sublayer_level_idc」とされて、ＶＰＳやＳＰＳ（ＥＳＰＳ）に挿入される。 At that time, a temporal_id indicating the layer of each picture is added to the NAL unit header of that picture (see FIG. 4). In addition, the pictures belonging to the layer indicated by a temporal_id are grouped as a sublayer, and the bit rate level specification value "Level_idc" of each sublayer is set as "sublayer_level_idc" and inserted into the VPS or SPS (ESPS).
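
On the receiving side, temporal_id can be read from the 2-byte HEVC NAL unit header, where it is carried as nuh_temporal_id_plus1. The sketch below decodes that header and drops NAL units above a chosen layer. The function names are hypothetical; only the header bit layout follows ITU-T H.265.

```python
def parse_nal_unit_header(data: bytes):
    """Decode the 2-byte HEVC NAL unit header.

    Returns (nal_unit_type, nuh_layer_id, temporal_id).
    """
    if len(data) < 2:
        raise ValueError("NAL unit header needs 2 bytes")
    forbidden_zero_bit = data[0] >> 7
    nal_unit_type = (data[0] >> 1) & 0x3F
    nuh_layer_id = ((data[0] & 0x01) << 5) | (data[1] >> 3)
    nuh_temporal_id_plus1 = data[1] & 0x07
    if forbidden_zero_bit or nuh_temporal_id_plus1 == 0:
        raise ValueError("invalid NAL unit header")
    # temporal_id is signalled as nuh_temporal_id_plus1 - 1
    return nal_unit_type, nuh_layer_id, nuh_temporal_id_plus1 - 1


def keep_up_to(nal_units, max_temporal_id):
    """Keep only NAL units at or below the highest layer the decoder handles."""
    return [nal for nal in nal_units
            if parse_nal_unit_header(nal)[2] <= max_temporal_id]
```

For example, the header bytes 0x40 0x01 decode to a VPS (type 32) with temporal_id 0.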

図１３は、エンコーダ１０２の処理フローを示す。エンコーダ１０２は、ステップＳＴ１において、処理を開始し、その後に、ステップＳＴ２の処理に移る。このステップＳＴ２において、エンコーダ１０２は、階層符号化における階層数Ｎを設定する。次に、エンコーダ１０２は、ステップＳＴ３において、各階層のピクチャのtemporal_idを０〜（Ｎ−１）とする。 FIG. 13 shows the processing flow of the encoder 102. The encoder 102 starts processing in step ST1 and then moves to step ST2. In step ST2, the encoder 102 sets the number of layers N in the hierarchical coding. Next, in step ST3, the encoder 102 assigns temporal_id values 0 to (N-1) to the pictures of the respective layers.

次に、エンコーダ１０２は、ステップＳＴ４において、対象デコーダのうち、最小能力のデコーダがデコードできる階層レベルＫを、０〜Ｎ−１の範囲内に設定する。そして、エンコーダ１０２は、ステップＳＴ５において、バッファ遅延制御部１２２で、各階層におけるピクチャエンコード間隔Ｄを、上述の数式（１）で求める。 Next, in step ST4, the encoder 102 sets, within the range 0 to N-1, the layer level K that can be decoded by the lowest-capability decoder among the target decoders. Then, in step ST5, the buffer delay control unit 122 of the encoder 102 obtains the picture encoding interval D for each layer using Equation (1) above.

次に、エンコーダ１０２は、ステップＳＴ６において、階層間でピクチャのエンコードタイミングが時間的に重なるか否かを判断する。エンコードタイミングが重なるとき、エンコーダ１０２は、ステップＳＴ７において、低階層のピクチャを優先して符号化し、高階層のピクチャは、次のエンコード間隔Ｄのタイミングでエンコードする。その後、エンコーダ１０２は、ステップＳＴ８の処理に移る。 Next, in step ST6, the encoder 102 determines whether or not the encoding timings of the pictures overlap in time between the layers. When the encoding timings overlap, the encoder 102 preferentially encodes the lower layer picture in step ST7, and encodes the higher layer picture at the timing of the next encoding interval D. After that, the encoder 102 moves to the process of step ST8.

ステップＳＴ６でエンコードタイミングが重ならないとき、エンコーダ１０２は、直ちに、ステップＳＴ８の処理に移る。このステップＳＴ８において、エンコーダ１０２は、ステップＳＴ５で求めた各階層のピクチャのエンコード間隔Ｄを「cpb_removal_delay」、「dpb_output_delay」に反映し、ＨＲＤ設定、パラメータセット/ＳＥＩのエンコード、スライスエンコードを行い、ＮＡＬユニットとして多重化ブロックへ転送する。その後、エンコーダ１０２は、ステップＳＴ９において、処理を終了する。 When the encoding timings do not overlap in step ST6, the encoder 102 moves directly to step ST8. In step ST8, the encoder 102 reflects the encoding interval D of the pictures of each layer obtained in step ST5 in "cpb_removal_delay" and "dpb_output_delay", performs HRD setting, parameter set/SEI encoding, and slice encoding, and transfers the results to the multiplexing block as NAL units. After that, the encoder 102 ends the processing in step ST9.
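
The overlap rule of steps ST6 and ST7 (lower layer first, higher layer deferred to the next interval D) can be modelled as a priority queue over encoding slots. This sketch assumes a single common slot grid with period D and at most one picture encoded per slot; Equation (1), which actually determines D per layer, is not reproduced here, so each picture's requested slot is an input. All names are hypothetical.

```python
import heapq

def schedule_pictures(requests):
    """Assign each picture to an encoding slot (slot k starts at time k * D).

    requests: iterable of (requested_slot, layer, picture_id).
    When several pictures compete for a slot, the lowest layer wins and the
    others wait for the next slot.
    Returns {picture_id: assigned_slot}.
    """
    by_slot = {}
    for slot, layer, pic in requests:
        by_slot.setdefault(slot, []).append((layer, pic))
    pending = []  # min-heap ordered by layer, then picture_id
    assigned = {}
    t = min(by_slot, default=0)
    while by_slot or pending:
        for entry in by_slot.pop(t, []):
            heapq.heappush(pending, entry)
        if pending:
            _, pic = heapq.heappop(pending)
            assigned[pic] = t
        t += 1
    return assigned

# "a" (layer 0) and "b" (layer 4) collide in slot 0; "c" (layer 3) asks for slot 1.
print(schedule_pictures([(0, 0, "a"), (0, 4, "b"), (1, 3, "c")]))
```

Keeping deferred pictures in the same heap as new requests means a deferred high-layer picture never pre-empts a lower-layer picture that arrives at the next slot.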

図２に戻って、圧縮データバッファ(ｃｐｂ)１０３は、エンコーダ１０２で生成された、各階層のピクチャの符号化データを含むビデオストリームを、一時的に蓄積する。マルチプレクサ１０４は、圧縮データバッファ１０３に蓄積されているビデオストリームを読み出し、ＰＥＳパケット化し、さらにトランスポートパケット化して多重し、多重化ストリームとしてのトランスポートストリームＴＳを得る。 Returning to FIG. 2, the compressed data buffer (cpb) 103 temporarily stores a video stream containing encoded data of the pictures of each layer generated by the encoder 102. The multiplexer 104 reads the video stream stored in the compressed data buffer 103, converts it into a PES packet, further converts it into a transport packet and multiplexes it, and obtains a transport stream TS as a multiplexed stream.

このトランスポートストリームＴＳには、各階層のピクチャの符号化画像データを持つ単一のビデオストリーム、あるいは複数の階層が２以上の所定数の階層組に分割され、各階層組のピクチャの符号化画像データをそれぞれ持つ所定数のビデオストリームが含まれる。マルチプレクサ１０４は、トランスポートストリームＴＳに、階層情報、ストリーム構成情報を挿入する。 This transport stream TS includes either a single video stream having the encoded image data of the pictures of all layers, or, when the plurality of layers is divided into a predetermined number (two or more) of layer sets, a predetermined number of video streams each having the encoded image data of the pictures of one layer set. The multiplexer 104 inserts layer information and stream configuration information into the transport stream TS.

トランスポートストリームＴＳには、ＰＳＩ（Program Specific Information）として、ＰＭＴ（Program Map Table）が含まれている。このＰＭＴには、各ビデオストリームに関連した情報を持つビデオエレメンタリ・ループ（video ES1 loop）が存在する。このビデオエレメンタリ・ループには、各ビデオストリームに対応して、ストリームタイプ、パケット識別子（ＰＩＤ）等の情報が配置されると共に、そのビデオストリームに関連する情報を記述するデスクリプタも配置される。 The transport stream TS includes a PMT (Program Map Table) as PSI (Program Specific Information). The PMT has a video elementary loop (video ES1 loop) that holds information related to each video stream. In this video elementary loop, information such as the stream type and packet identifier (PID) is arranged for each video stream, together with descriptors that describe information related to that video stream.

マルチプレクサ１０４は、このデスクリプタの一つとして、ＨＥＶＣデスクリプタ（HEVC_descriptor）を挿入し、さらに、新たに定義するスケーラビリティ・エクステンション・デスクリプタ（scalability_extension_descriptor）を挿入する。 The multiplexer 104 inserts a HEVC descriptor (HEVC_descriptor) as one of the descriptors, and further inserts a newly defined scalability extension descriptor (scalability_extension_descriptor).

図１４は、ＨＥＶＣデスクリプタ（HEVC_descriptor）の構造例（Syntax）を示している。また、図１５は、その構造例における主要な情報の内容（Semantics）を示している。 FIG. 14 shows a structural example (Syntax) of a HEVC descriptor (HEVC_descriptor). Further, FIG. 15 shows the contents (Semantics) of the main information in the structural example.

「descriptor_tag」の８ビットフィールドは、デスクリプタタイプを示し、ここでは、ＨＥＶＣデスクリプタであることを示す。「descriptor_length」の８ビットフィールドは、デスクリプタの長さ（サイズ）を示し、デスクリプタの長さとして、以降のバイト数を示す。 The 8-bit field of "descriptor_tag" indicates the descriptor type, and here, it indicates that it is a HEVC descriptor. The 8-bit field of "descriptor_length" indicates the length (size) of the descriptor, and indicates the number of bytes thereafter as the length of the descriptor.

「level_idc」の８ビットフィールドは、ビットレートのレベル指定値を示す。また、「temporal_layer_subset_flag = 1」であるとき、「temporal_id_min」の５ビットフィールドと、「temporal_id_max」の５ビットフィールドが存在する。「temporal_id_min」は、対応するビデオストリームに含まれる階層符号化データの最も低い階層のtemporal_idの値を示す。「temporal_id_max」は、対応するビデオストリームが持つ階層符号化データの最も高い階層のtemporal_idの値を示す。 The 8-bit field "level_idc" indicates the bit rate level specification value. When "temporal_layer_subset_flag = 1", a 5-bit field "temporal_id_min" and a 5-bit field "temporal_id_max" are present. "temporal_id_min" indicates the temporal_id of the lowest layer of the hierarchically coded data contained in the corresponding video stream. "temporal_id_max" indicates the temporal_id of the highest layer of the hierarchically coded data of the corresponding video stream.

「level_constrained_flag」の１ビットフィールドは、新たに定義するものであり、該当サブストリーム（substream）にＳＰＳあるいはＥＳＰＳが存在し、その要素の“general_level_idc”は、そのサブストリームが含むtemporal_id（階層識別情報）以下のピクチャ（Picture）を含むレベル（Level）値をもつことを示す。“１”は、該当サブストリームにＳＰＳあるいはＥＳＰＳが存在し、その要素の“general_level_idc”は、そのサブストリームが含むtemporal_id 以下のピクチャを含むレベル値をもつ、ことを示す。“０”は、対象となるサービスを構成するサブストリーム群の中にはＳＰＳが１つ存在し、その“general_level_idc”は、当該サブストリームのみならず、同一サービスの下の他のサブストリームも含むレベル値を示す。 The 1-bit field "level_constrained_flag" is newly defined and indicates whether an SPS or ESPS exists in the corresponding substream and whether its element "general_level_idc" has a level value covering the pictures up to the temporal_id (layer identification information) contained in that substream. “1” indicates that an SPS or ESPS exists in the corresponding substream and that its “general_level_idc” has a level value covering the pictures at or below the temporal_id contained in that substream. “0” indicates that one SPS exists among the group of substreams constituting the target service, and that its “general_level_idc” indicates a level value covering not only that substream but also the other substreams of the same service.

「scalability_id」の３ビットフィールドは、新たに定義するものであり、複数のビデオストリームがスケーラブルなサービスを供給する際、個々のストリームに付されるスケーラビリティを示すＩＤである。“０”はベースストリームを示し、“１”〜“７”はベースストリームからのスケーラビリティの度合いによって増加するＩＤである。 The 3-bit field of "scalability_id" is newly defined and is an ID indicating the scalability assigned to each stream when a plurality of video streams provide a scalable service. “0” indicates a base stream, and “1” to “7” are IDs that increase depending on the degree of scalability from the base stream.

図１６は、スケーラビリティ・エクステンション・デスクリプタ（scalability_extension_descriptor）の構造例（Syntax）を示している。また、図１７は、その構造例における主要な情報の内容（Semantics）を示している。 FIG. 16 shows a structural example (Syntax) of a scalability extension descriptor (scalability_extension_descriptor). In addition, FIG. 17 shows the contents (Semantics) of the main information in the structural example.

「scalability_extension_descriptor_tag」の８ビットフィールドは、デスクリプタタイプを示し、ここでは、スケーラビリティ・エクステンション・デスクリプタであることを示す。「scalability_extension_descriptor_length」の８ビットフィールドは、デスクリプタの長さ（サイズ）を示し、デスクリプタの長さとして、以降のバイト数を示す。「extension_stream_existing_flag」の１ビットフィールドは、別ストリームによる拡張サービスがあることを示すフラグである。“１”は拡張ストリームがあることを示し、“０”は拡張ストリームがないことを示す。 The 8-bit field of "scalability_extension_descriptor_tag" indicates the descriptor type, and here, it indicates that it is a scalability extension descriptor. The 8-bit field of "scalability_extension_descriptor_length" indicates the length (size) of the descriptor, and indicates the number of bytes thereafter as the descriptor length. The 1-bit field of "extension_stream_existing_flag" is a flag indicating that there is an extension service by another stream. “1” indicates that there is an extended stream, and “0” indicates that there is no extended stream.

「extension_type」の３ビットフィールドは、拡張のタイプを示す。“００１”は、拡張が、時間方向スケーラブルであることを示す。“０１０”は、拡張が、空間方向スケーラブルであることを示す。“０１１”は、拡張が、ビットレートスケーラブルであることを示す。 The 3-bit field "extension_type" indicates the type of extension. “001” indicates that the extension is temporally scalable, “010” that it is spatially scalable, and “011” that it is bit rate scalable.

「number_of_streams」の４ビットフィールドは、配信サービスに関与するストリームの総数を示す。「scalability_id」の３ビットフィールドは、複数のビデオストリームがスケーラブルなサービスを供給する際、個々のストリームに付されるスケーラビリティを示すＩＤである。“０”はベースストリームを示し、“１”〜“７”はベースストリームからのスケーラビリティの度合いによって増加するＩＤである。「minimum_target_decoder_level_idc」の８ビットフィールドは、該当ストリームが対象とするデコーダの能力を示す。この情報は、受信機において、デコーダがストリームをデコードする前に符号化ピクチャの想定デコードタイミングがデコーダのpictureデコード処理能力の範囲を超えていないかどうかの判断に利用する。 The 4-bit field "number_of_streams" indicates the total number of streams involved in the distribution service. The 3-bit field "scalability_id" is an ID indicating the scalability assigned to each stream when a plurality of video streams provide a scalable service. “0” indicates the base stream, and “1” to “7” are IDs that increase with the degree of scalability relative to the base stream. The 8-bit field "minimum_target_decoder_level_idc" indicates the capability of the decoder targeted by the corresponding stream. The receiver uses this information, before the decoder decodes the stream, to determine whether the expected decoding timing of the coded pictures stays within the picture decoding capability of the decoder.
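
The fields listed above could be serialized as in the following sketch. The bit layout is an assumption, not taken from Fig. 16: fields in the order described, one scalability_id/minimum_target_decoder_level_idc pair per stream, and 5 reserved bits set to 1 after each scalability_id. The descriptor tag value is not given in the text, so it is a parameter (0xF0 in the example is an arbitrary placeholder). The function name is hypothetical.

```python
# extension_type codes as described in the text
EXTENSION_TYPE = {"temporal": 0b001, "spatial": 0b010, "bit_rate": 0b011}

def pack_scalability_extension_descriptor(tag, extension_stream_existing_flag,
                                          extension_type, streams):
    """Serialize with an ASSUMED layout.

    streams: list of (scalability_id, minimum_target_decoder_level_idc).
    """
    if not 1 <= len(streams) <= 15:
        raise ValueError("number_of_streams is a 4-bit field")
    body = bytearray()
    # extension_stream_existing_flag (1) | extension_type (3) | number_of_streams (4)
    body.append(((extension_stream_existing_flag & 0x1) << 7)
                | ((extension_type & 0x7) << 4)
                | (len(streams) & 0xF))
    for scalability_id, min_level_idc in streams:
        if not 0 <= scalability_id <= 7:
            raise ValueError("scalability_id is a 3-bit field")
        body.append((scalability_id << 5) | 0x1F)  # assumed 5 reserved bits '11111'
        body.append(min_level_idc & 0xFF)
    # descriptor_length counts the bytes that follow it
    return bytes([tag, len(body)]) + bytes(body)

d = pack_scalability_extension_descriptor(0xF0, 1, EXTENSION_TYPE["temporal"],
                                          [(0, 153), (1, 156)])
print(d.hex())
```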

上述したように、この実施の形態において、ＳＰＳ，ＥＳＰＳに含まれるビットレートのレベル指定値（general_level_idc）などは、複数の階層を２以上の所定数の階層組に分割した際の所属階層組の識別情報として利用される。各階層組のレベル指定値の値は、この階層組のピクチャと、この階層組より低階層側の全ての階層組のピクチャとからなるフレームレートに対応した値とされる。 As described above, in this embodiment, the bit rate level specification value (general_level_idc) and the like included in the SPS and ESPS are used as identification information of the layer set to which a picture belongs when the plurality of layers is divided into a predetermined number (two or more) of layer sets. The level specification value of each layer set corresponds to the frame rate of the pictures of that layer set combined with the pictures of all layer sets below it.
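
As a worked example of this cumulative frame rate, assume a dyadic temporal hierarchy: layers 0 and 1 each carry 1/2^(N-1) of the full rate and each higher layer doubles the rate. This structure is an assumption, since Fig. 3 is not reproduced here. With five layers at a full rate of 120 Hz, grouping layers 0 to 3 and layer 4 gives the 60P/120P split used in the configuration examples. Function names are hypothetical.

```python
from fractions import Fraction

def layer_rates(n_layers, full_rate):
    """Picture rate contributed by each layer in an ASSUMED dyadic hierarchy."""
    rates = [Fraction(full_rate, 2 ** (n_layers - 1))]  # layer 0
    for k in range(1, n_layers):
        rates.append(Fraction(full_rate, 2 ** (n_layers - k)))
    return rates

def layer_set_rates(n_layers, full_rate, set_top_layers):
    """Frame rate of each layer set including all lower layer sets.

    set_top_layers: highest temporal_id in each layer set, lowest set first.
    """
    rates = layer_rates(n_layers, full_rate)
    return [sum(rates[:top + 1]) for top in set_top_layers]

print(layer_set_rates(5, 120, [3, 4]))
```

The per-layer rates here are 7.5, 7.5, 15, 30 and 60 Hz, which sum to 60 Hz for layers 0 to 3 and 120 Hz for all five layers.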

マルチプレクサ１０４は、低階層側の階層組のピクチャの符号化画像データをコンテナするＴＳパケットの優先度ほど高く設定する。マルチプレクサ１０４は、例えば、複数の階層を低階層組と高階層組に二分する場合、ＴＳパケットヘッダの「transport_priority」の１ビットフィールドを利用する。 The multiplexer 104 sets a higher priority for TS packets that contain the encoded image data of pictures in lower layer sets. For example, when the plurality of layers is divided into a low layer set and a high layer set, the multiplexer 104 uses the 1-bit "transport_priority" field of the TS packet header.

図１８は、ＴＳパケットの構造例（Syntax）を示している。「transport_priority」の１ビットフィールドは、ベースレイヤ、つまり低階層側の階層組のピクチャの符号化画像データをコンテナするＴＳパケットの場合は“１”に設定され、ノンベースレイヤ、つまり高階層側の階層組のピクチャの符号化画像データをコンテナするＴＳパケットの場合は“０”に設定される。 FIG. 18 shows a structural example (Syntax) of a TS packet. The 1-bit field "transport_priority" is set to "1" for TS packets that contain the encoded image data of pictures in the base layer, that is, the lower layer set, and to "0" for TS packets that contain the encoded image data of pictures in the non-base layer, that is, the higher layer set.
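
The 4-byte MPEG-2 TS packet header (ISO/IEC 13818-1) carries transport_priority as the third bit of its second byte, next to the 13-bit PID. The sketch below builds such a header with the priority chosen from the layer set, as described above; function names are hypothetical and the example PID 0x100 is arbitrary.

```python
def ts_header(pid, transport_priority, payload_unit_start=False,
              continuity_counter=0, adaptation_field_control=0b01):
    """Build a 4-byte TS header (transport_error_indicator = 0, not scrambled)."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID is a 13-bit field")
    byte1 = ((int(payload_unit_start) << 6)
             | ((transport_priority & 0x1) << 5)
             | (pid >> 8))
    byte3 = ((adaptation_field_control & 0x3) << 4) | (continuity_counter & 0xF)
    return bytes([0x47, byte1, pid & 0xFF, byte3])

def priority_for(is_lower_layer_set):
    # Base layer (lower layer set) -> 1, non-base layer (higher layer set) -> 0.
    return 1 if is_lower_layer_set else 0

print(ts_header(0x100, priority_for(True), True, 3).hex())  # base-layer packet
```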

図１９は、ＳＰＳ，ＥＳＰＳのＮＡＬユニットに含まれるビットレートのレベル指定値（general_level_idc）と、ＴＳパケットヘッダの「transport_priority」の設定値との関係を示している。受信側では、これらの情報の一方あるいは双方を用いて、低階層側の階層組のピクチャの符号化画像データと、高階層側の階層組のピクチャの符号化画像データとを、分別することが可能となる。 FIG. 19 shows the relationship between the bit rate level specification value (general_level_idc) included in the SPS and ESPS NAL units and the value set in the "transport_priority" field of the TS packet header. On the receiving side, either or both of these pieces of information can be used to separate the encoded image data of pictures in the lower layer set from that of pictures in the higher layer set.
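
A receiver-side sketch using only the transport_priority bit: packets of one video PID are split into the lower and higher layer sets, and a low-capability receiver could forward only the first list to its decoder. The function name and the PID value are hypothetical; malformed packets are simply skipped.

```python
TS_PACKET_SIZE = 188

def split_by_priority(packets, video_pid):
    """Separate TS packets of one video PID by transport_priority.

    Returns (lower_set_packets, higher_set_packets).
    """
    lower, higher = [], []
    for pkt in packets:
        if len(pkt) != TS_PACKET_SIZE or pkt[0] != 0x47:
            continue  # not a well-formed TS packet
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        if pid != video_pid:
            continue
        if (pkt[1] >> 5) & 0x1:
            lower.append(pkt)   # transport_priority = 1
        else:
            higher.append(pkt)  # transport_priority = 0
    return lower, higher

base = bytes([0x47, 0x21, 0x00, 0x10]) + bytes(184)   # PID 0x100, priority 1
enh = bytes([0x47, 0x01, 0x00, 0x11]) + bytes(184)    # PID 0x100, priority 0
other = bytes([0x47, 0x21, 0x01, 0x10]) + bytes(184)  # PID 0x101
lo_set, hi_set = split_by_priority([base, enh, other], 0x100)
```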

図２０は、マルチプレクサ１０４の構成例を示している。ＴＳプライオリティ発生部１４１と、セクションコーディング部１４２と、ＰＥＳパケット化部１４３-1〜１４３-Nと、スイッチ部１４４と、トランスポートパケット化部１４５を有している。 FIG. 20 shows a configuration example of the multiplexer 104. The multiplexer 104 has a TS priority generation unit 141, a section coding unit 142, PES packetizing units 143-1 to 143-N, a switch unit 144, and a transport packetization unit 145.

ＰＥＳパケット化部１４３-1〜１４３-Nは、それぞれ、圧縮データバッファ１０３に蓄積されているビデオストリーム１〜Ｎを読み込み、ＰＥＳパケットを生成する。この際、ＰＥＳパケット化部１４３-1〜１４３-Nは、ビデオストリーム１〜ＮのＨＲＤ情報を元にＤＴＳ（Decoding Time Stamp）、ＰＴＳ（Presentation Time Stamp）のタイムスタンプをＰＥＳヘッダに付与する。この場合、各ピクチャの「cpb_removal_delay」、「dpb_output_delay」が参照され、ＳＴＣ（System Time Clock）時刻に同期した精度で、各々ＤＴＳ、ＰＴＳに変換され、ＰＥＳヘッダの所定位置に配置される。 The PES packetizing units 143-1 to 143-N read the video streams 1 to N, respectively, stored in the compressed data buffer 103, and generate PES packets. At this time, the PES packetizing units 143-1 to 143-N add DTS (Decoding Time Stamp) and PTS (Presentation Time Stamp) time stamps to the PES headers based on the HRD information of the video streams 1 to N. Specifically, the "cpb_removal_delay" and "dpb_output_delay" of each picture are referred to, converted into a DTS and a PTS, respectively, with an accuracy synchronized with the STC (System Time Clock), and placed at predetermined positions in the PES header.
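
Once a decode or presentation time is known in seconds, it is expressed on the 90 kHz system clock as a 33-bit value and packed into the 5-byte PTS/DTS form of the PES header defined in ISO/IEC 13818-1 (three marker bits interleaved). The sketch takes times in seconds as input rather than deriving them from the HRD parameters; function names are hypothetical.

```python
from fractions import Fraction

def to_90khz(seconds):
    """Convert a time in seconds to a 33-bit 90 kHz timestamp."""
    return round(Fraction(seconds) * 90000) % (1 << 33)

def encode_pes_timestamp(prefix, ts):
    """Pack a 33-bit PTS/DTS into 5 bytes.

    prefix: 0b0010 (PTS only), 0b0011 (PTS followed by DTS), 0b0001 (DTS).
    """
    return bytes([
        (prefix << 4) | (((ts >> 30) & 0x07) << 1) | 1,  # bits 32..30 + marker
        (ts >> 22) & 0xFF,                               # bits 29..22
        (((ts >> 15) & 0x7F) << 1) | 1,                  # bits 21..15 + marker
        (ts >> 7) & 0xFF,                                # bits 14..7
        ((ts & 0x7F) << 1) | 1,                          # bits 6..0 + marker
    ])

def decode_pes_timestamp(b):
    """Inverse of encode_pes_timestamp (prefix and marker bits are dropped)."""
    return ((((b[0] >> 1) & 0x07) << 30) | (b[1] << 22) | ((b[2] >> 1) << 15)
            | (b[3] << 7) | (b[4] >> 1))

dts = encode_pes_timestamp(0b0001, to_90khz(Fraction(1, 10)))
```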

スイッチ部１４４は、ＰＥＳパケット化部１４３-1〜１４３-Nで生成されたＰＥＳパケットを、パケット識別子（ＰＩＤ）に基づいて選択的に取り出し、トランスポートパケット化部１４５に送る。トランスポートパケット化部１４５は、ＰＥＳパケットをペイロードに含むＴＳパケットを生成し、トランスポートストリームを得る。 The switch unit 144 selectively takes out the PES packet generated by the PES packetizing units 143-1 to 143-N based on the packet identifier (PID) and sends it to the transport packetizing unit 145. The transport packetization unit 145 generates a TS packet containing a PES packet in the payload and obtains a transport stream.

ＴＳプライオリティ発生部１４１には、ＣＰＵ１０１から、階層数（Number of layers）とストリーム数（Number of streams）の情報が供給される。ＴＳプライオリティ発生部１４１は、階層数で示される複数の階層を２以上の所定数の階層組に分割した場合における、各階層組の優先度を発生する。例えば、２分割される場合には、ＴＳパケットヘッダの「transport_priority」の１ビットフィールドに挿入すべき値が発生される（図１９参照）。 Information on the number of layers and the number of streams is supplied from the CPU 101 to the TS priority generation unit 141. The TS priority generation unit 141 generates the priority of each layer set when a plurality of layers indicated by the number of layers are divided into two or more layer sets. For example, when it is divided into two, a value to be inserted in the 1-bit field of "transport_priority" of the TS packet header is generated (see FIG. 19).

ＴＳプライオリティ発生部１４１は、各階層組の優先度の情報を、トランスポートパケット化部１４５に送る。トランスポートパケット化部１４５は、この情報に基づいて、各ＴＳパケットの優先度を設定する。この場合、上述したように、低階層側の階層組のピクチャの符号化画像データをコンテナするＴＳパケットの優先度ほど高く設定する。 The TS priority generation unit 141 sends the priority information of each layer set to the transport packetization unit 145, which sets the priority of each TS packet based on this information. In this case, as described above, TS packets that contain the encoded image data of pictures in lower layer sets are given higher priority.

セクションコーディング部１４２には、ＣＰＵ１０１から、階層数（Number of layers）と、ストリーム数（Number of streams）と、最小ターゲットデコーダ・レベル(Minimum_target_decoder_level_idc)の情報が供給される。セクションコーディング部１４２は、この情報に基づいて、トランスポートストリームＴＳに挿入すべき各種のセクションデータ、例えば、上述したＨＥＶＣデスクリプタ（HEVC_descriptor）、スケーラビリティ・エクステンション・デスクリプタ（scalability_extension_descriptor）などを生成する。 Information on the number of layers, the number of streams, and the minimum target decoder level (Minimum_target_decoder_level_idc) is supplied to the section coding unit 142 from the CPU 101. Based on this information, the section coding unit 142 generates various section data to be inserted into the transport stream TS, for example, the above-mentioned HEVC descriptor (HEVC_descriptor), scalability extension descriptor (scalability_extension_descriptor), and the like.

セクションコーディング部１４２は、各種セクションデータを、トランスポートパケット化部１４５に送る。トランスポートパケット化部１４５は、このセクションデータを含むＴＳパケットを生成し、トランスポートストリームＴＳに挿入する。 The section coding unit 142 sends various section data to the transport packetization unit 145. The transport packetization unit 145 generates a TS packet containing this section data and inserts it into the transport stream TS.

図２１は、マルチプレクサ１０４の処理フローを示す。この例は、複数の階層を低階層組と高階層組の２つに分割する例である。マルチプレクサ１０４は、ステップＳＴ１１において、処理を開始し、その後に、ステップＳＴ１２の処理に移る。このステップＳＴ１２において、マルチプレクサ１０４は、ビデオストリーム（ビデオエレメンタリストリーム）の各ピクチャのtemporal_id、構成する符号化ストリーム数を設定する。 FIG. 21 shows the processing flow of the multiplexer 104. In this example, the plurality of layers is divided into two sets: a low layer set and a high layer set. The multiplexer 104 starts processing in step ST11 and then moves to step ST12. In step ST12, the multiplexer 104 sets the temporal_id of each picture of the video stream (video elementary stream) and the number of encoded streams to be formed.

次に、マルチプレクサ１０４は、ステップＳＴ１３において、低階層組のピクチャ、あるいは低階層組のピクチャを含むビデオストリームを多重化する際の「transport_priority」を“１”に設定する。また、マルチプレクサ１０４は、ステップＳＴ１４において、ＨＲＤ情報（cpb_removal_delay、dpb_output_delay）を参照して、ＤＴＳ、ＰＴＳを決め、ＰＥＳヘッダに挿入する。 Next, in step ST13, the multiplexer 104 sets "transport_priority" to "1" when multiplexing the pictures of the low layer set or the video stream containing them. Further, in step ST14, the multiplexer 104 determines the DTS and PTS with reference to the HRD information (cpb_removal_delay, dpb_output_delay) and inserts them into the PES header.

次に、マルチプレクサ１０４は、ステップＳＴ１５において、シングルストリーム（単一ビデオストリーム）か否かを判断する。シングルストリームであるとき、マルチプレクサ１０４は、ステップＳＴ１６において、１つのＰＩＤ（パケット識別子）で多重化処理を進めることとし、その後に、ステップＳＴ１７の処理に移る。一方、シングルストリームでないとき、マルチプレクサ１０４は、ステップＳＴ１８において、複数のパケットＰＩＤ（パケット識別子）で多重化処理を進めることとし、その後に、ステップＳＴ１７の処理に移る。 Next, the multiplexer 104 determines in step ST15 whether or not it is a single stream (single video stream). When it is a single stream, the multiplexer 104 decides to proceed with the multiplexing process with one PID (packet identifier) in step ST16, and then moves to the process of step ST17. On the other hand, when it is not a single stream, the multiplexer 104 decides to proceed with the multiplexing process with a plurality of packet PIDs (packet identifiers) in step ST18, and then proceeds to the process of step ST17.

このステップＳＴ１７において、マルチプレクサ１０４は、ＨＥＶＣデスクリプタ、スケーラビリティ・エクステンション・デスクリプタなどをコーディングする。そして、マルチプレクサ１０４は、ステップＳＴ１９において、ビデオストリームをＰＥＳペイロードに挿入してＰＥＳパケット化し、その後、ステップＳＴ２０において、トランスポートパケット化し、トランスポートストリームＴＳを得る。その後、マルチプレクサ１０４は、ステップＳＴ２１において、処理を終了する。 In step ST17, the multiplexer 104 encodes the HEVC descriptor, the scalability extension descriptor, and so on. Then, in step ST19, the multiplexer 104 inserts the video stream into PES payloads to form PES packets, and in step ST20 packetizes them into transport packets to obtain the transport stream TS. After that, the multiplexer 104 ends the processing in step ST21.
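
For the two-layer-set case, steps ST12 to ST18 can be summarized as a per-layer plan. In this sketch the PID values are hypothetical, and the stream types follow the 0x24 (base) / 0x25 (enhanced) assignment used in the configuration examples of this description; the function name is also hypothetical.

```python
def mux_plan(num_layers, base_top_layer, num_streams, first_pid=0x0100):
    """Return, per temporal_id, the transport_priority, PID and stream_type.

    Layers 0..base_top_layer form the low layer set; the rest form the high set.
    """
    if num_streams not in (1, 2):
        raise ValueError("this sketch handles one or two video streams")
    plan = {}
    for tid in range(num_layers):
        in_low_set = tid <= base_top_layer
        priority = 1 if in_low_set else 0          # ST13
        if num_streams == 1 or in_low_set:         # ST16 (single PID) or base stream
            pid, stream_type = first_pid, 0x24
        else:                                      # ST18: enhanced stream on its own PID
            pid, stream_type = first_pid + 1, 0x25
        plan[tid] = {"transport_priority": priority,
                     "pid": pid, "stream_type": stream_type}
    return plan

single = mux_plan(5, 3, 1)
dual = mux_plan(5, 3, 2)
```

In the single-stream case every layer shares one PID and only transport_priority distinguishes the layer sets; in the two-stream case the PID separates them as well.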

図２２は、単一ビデオストリームによる配信を行う場合のトランスポートストリームＴＳの構成例を示している。このトランスポートストリームＴＳには、単一ビデオストリームが含まれている。すなわち、この構成例では、複数の階層のピクチャの例えばＨＥＶＣによる符号化画像データを持つビデオストリームのＰＥＳパケット「video PES1」が存在すると共に、オーディオストリームのＰＥＳパケット「audio PES1」が存在する。 FIG. 22 shows a configuration example of the transport stream TS for distribution by a single video stream. This transport stream TS includes a single video stream. That is, in this configuration example, there is a PES packet "video PES1" of a video stream having the encoded image data (for example, HEVC) of the pictures of a plurality of layers, and a PES packet "audio PES1" of an audio stream.

この単一のビデオストリームには、階層符号化の複数の階層が２以上の所定数の階層組に分割されて得られた所定数のサブストリームが含まれる。ここで、最下位の階層組のサブストリーム（ベースサブストリーム）にはＳＰＳが含まれ、最下位より上位の階層組のサブストリーム（エンハンスサブストリーム）にはＥＳＰＳが含まれる。そして、ＳＰＳ，ＥＳＰＳの要素の“general_level_idc”の値は、自己の階層組以下の階層組に含まれる全ての階層のピクチャを含むレベル値とされる。 This single video stream includes a predetermined number of substreams obtained by dividing the plurality of layers of the hierarchical coding into a predetermined number (two or more) of layer sets. Here, the substream of the lowest layer set (base substream) includes the SPS, and the substreams of the layer sets above the lowest (enhanced substreams) include an ESPS. The value of the “general_level_idc” element of each SPS and ESPS is a level value covering the pictures of all layers included in its own layer set and in all layer sets below it.

各ピクチャの符号化画像データには、ＶＰＳ、ＳＰＳ、ＥＳＰＳ、ＳＥＩなどのＮＡＬユニットが存在する。上述したように、各ピクチャのＮＡＬユニットのヘッダには、そのピクチャの階層を示すtemporal_idが挿入されている。また、例えば、ＳＰＳ，ＥＳＰＳにはビットレートのレベル指定値（general_level_idc）が含まれている。また、例えば、ピクチャ・タイミング・ＳＥＩ（Picture timing SEI）には、「cpb_removal_delay」と「dpb_output_delay」が含まれている。 The encoded image data of each picture includes NAL units such as VPS, SPS, ESPS, and SEI. As described above, a temporal_id indicating the layer of the picture is inserted in the header of each NAL unit of the picture. Further, for example, the SPS and ESPS include the bit rate level specification value (general_level_idc), and the picture timing SEI includes "cpb_removal_delay" and "dpb_output_delay".

なお、各ピクチャの符号化画像データをコンテナするＴＳパケットのヘッダに「transport_priority」の１ビットの優先度を示すフィールドが存在する。この「transport_priority」により、コンテナする符号化画像データが、低階層組のピクチャのものか、あるいは高階層組のピクチャのものかが識別可能である。 The header of each TS packet that contains the encoded image data of a picture has a 1-bit priority field, "transport_priority". This field makes it possible to identify whether the encoded image data carried in the packet belongs to a picture of the low layer set or of the high layer set.

また、トランスポートストリームＴＳには、ＰＳＩ（Program Specific Information）として、ＰＭＴ（Program Map Table）が含まれている。このＰＳＩは、トランスポートストリームに含まれる各エレメンタリストリームがどのプログラムに属しているかを記した情報である。 Further, the transport stream TS includes PMT (Program Map Table) as PSI (Program Specific Information). This PSI is information describing which program each elementary stream included in the transport stream belongs to.

ＰＭＴには、プログラム全体に関連する情報を記述するプログラム・ループ（Program loop）が存在する。また、ＰＭＴには、各エレメンタリストリームに関連した情報を持つエレメンタリ・ループが存在する。この構成例では、ビデオエレメンタリ・ループ（video ES1 loop）が存在すると共に、オーディオエレメンタリ・ループ（audio ES1 loop）が存在する。 The PMT has a program loop that describes information related to the entire program. The PMT also has elementary loops, each holding information related to one elementary stream. In this configuration example, there is a video elementary loop (video ES1 loop) and an audio elementary loop (audio ES1 loop).

ビデオエレメンタリ・ループには、ビデオストリーム（video PES1）に対応して、ストリームタイプ、パケット識別子（PID）等の情報が配置されると共に、そのビデオストリームに関連する情報を記述するデスクリプタも配置される。このデスクリプタの一つとして、上述したＨＥＶＣデスクリプタ（HEVC_descriptor）、スケーラビリティ・エクステンション・デスクリプタ（scalability_extension_descriptor）が挿入される。 In the video elementary loop, information such as the stream type and packet identifier (PID) is arranged for the video stream (video PES1), together with descriptors that describe information related to that video stream. As these descriptors, the HEVC descriptor (HEVC_descriptor) and the scalability extension descriptor (scalability_extension_descriptor) described above are inserted.

図２３は、図３の階層符号化の例において、階層０〜３のピクチャでベースサブストリーム（B stream）が生成され、階層４のピクチャでエンハンスサブストリーム（E stream）が生成される場合を示している。この場合、ベースサブストリームに含まれる各ピクチャは６０Ｐを構成し、エンハンスサブストリーム（E stream）に含まれる各ピクチャは、ベースサブストリームに含まれる各ピクチャに追加されてＰＥＳ全体で１２０Ｐを構成する。 FIG. 23 shows the case where, in the hierarchical coding example of FIG. 3, the base substream (B stream) is generated from the pictures of layers 0 to 3 and the enhanced substream (E stream) is generated from the pictures of layer 4. In this case, the pictures included in the base substream constitute 60P, and the pictures included in the enhanced substream (E stream), added to those of the base substream, constitute 120P in the PES as a whole.

ベースサブストリームのピクチャは、「ＡＵＤ」、「ＶＰＳ」、「ＳＰＳ」、「ＰＰＳ」、「ＰＳＥＩ」、「ＳＬＩＣＥ」、「ＳＳＥＩ」、「ＥＯＳ」などのＮＡＬユニットにより構成される。「ＶＰＳ」、「ＳＰＳ」は、例えば、ＧＯＰの先頭ピクチャに挿入される。ＳＰＳの要素の“general_level_idc”の値は、“level5.1”とされる。なお、「ＥＯＳ」はなくてもよい。 The base substream picture is composed of NAL units such as "AUD", "VPS", "SPS", "PPS", "PSEI", "SLICE", "SSEI", and "EOS". The "VPS" and "SPS" are inserted into, for example, the first picture of the GOP. The value of "general_level_idc" of the SPS element is "level5.1". It should be noted that "EOS" may not be present.

一方、エンハンスサブストリームのピクチャは、「ＡＵＤ」、「ＥＳＰＳ」、「ＰＰＳ」、「ＰＳＥＩ」、「ＳＬＩＣＥ」、「ＳＳＥＩ」、「ＥＯＳ」などのＮＡＬユニットにより構成される。なお、「ＥＳＰＳ」は、例えば、ＧＯＰの先頭ピクチャに挿入される。ＥＳＰＳの要素の“general_level_idc”の値は、“level5.2”とされる。なお、「ＰＳＥＩ」、「ＳＳＥＩ」、「ＥＯＳ」はなくてもよい。 On the other hand, the pictures of the enhanced substream are composed of NAL units such as "AUD", "ESPS", "PPS", "PSEI", "SLICE", "SSEI", and "EOS". The "ESPS" is inserted, for example, in the first picture of the GOP. The value of the "general_level_idc" element of the ESPS is "level5.2". Note that "PSEI", "SSEI", and "EOS" may be omitted.
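
In HEVC, general_level_idc is coded as 30 times the level number, so "level5.1" is signalled as 153 and "level5.2" as 156. The check below uses the HEVC Annex A limits on luma picture size (MaxLumaPs) and luma sample rate (MaxLumaSr) for these levels, and ignores the other level constraints. The 3840x2160 picture size is an assumption, since this section does not state the resolution; under it, a 60P base substream fits level 5.1 while the combined 120P stream needs level 5.2. Names are hypothetical.

```python
# (MaxLumaPs, MaxLumaSr) for selected HEVC levels, keyed by (major, minor).
LEVEL_LIMITS = {
    (5, 0): (8_912_896, 267_386_880),
    (5, 1): (8_912_896, 534_773_760),
    (5, 2): (8_912_896, 1_069_547_520),
}

def general_level_idc(major, minor):
    """HEVC codes the level as 30 x level number, e.g. 5.1 -> 153."""
    return 30 * major + 3 * minor

def fits_level(level, width, height, fps):
    """Check only picture size and luma sample rate against the level limits."""
    max_luma_ps, max_luma_sr = LEVEL_LIMITS[level]
    luma_ps = width * height
    return luma_ps <= max_luma_ps and luma_ps * fps <= max_luma_sr

print(fits_level((5, 1), 3840, 2160, 60), fits_level((5, 1), 3840, 2160, 120))
```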

「video ES1 loop」には、ビデオストリーム（video PES1）に対応して、ストリームタイプ、パケット識別子（PID）等の情報が配置されると共に、そのビデオストリームに関連する情報を記述するデスクリプタも配置される。このストリームタイプは、ベースストリームを示す“０ｘ２４”とされる。また、デスクリプタの一つとして、上述したＨＥＶＣデスクリプタが挿入される。 In the "video ES1 loop", information such as the stream type and packet identifier (PID) is arranged for the video stream (video PES1), together with descriptors that describe information related to that video stream. The stream type is set to "0x24", indicating the base stream. Further, the HEVC descriptor described above is inserted as one of the descriptors.

「level_constrained_flag」の１ビットフィールドは、“１”とされる。これにより、「該当サブストリームにＳＰＳあるいはＥＳＰＳが存在し、その要素の“general_level_idc”は、そのサブストリームが含むtemporal_id 以下のピクチャを含むレベル値をもつ」ことが示される。また、「level_idc」の値は、ビデオストリーム（video PES1）の全体のレベル値を示す“level5.2”とされる。また、「temporal_id_min」は０とされ、「temporal_id_max」は４とされ、ビデオストリーム（video PES1）に階層０〜４のピクチャが含まれていることが示される。 The 1-bit field "level_constrained_flag" is set to "1". This indicates that an SPS or ESPS exists in each substream and that its "general_level_idc" has a level value covering the pictures at or below the temporal_id contained in that substream. The value of "level_idc" is "level5.2", indicating the level of the video stream (video PES1) as a whole. Further, "temporal_id_min" is set to 0 and "temporal_id_max" to 4, indicating that the video stream (video PES1) contains pictures of layers 0 to 4.

このような単一ビデオストリームによる配信が行われる場合、受信側では、「level_constrained_flag」、ＳＰＳ、ＥＳＰＳの要素の“general_level_idc”などに基づいて、各サブストリームが自身のデコーダ処理能力の範囲内にあるか否かが判断され、範囲内にあるサブストリームのデコードが行われる。 When distribution by such a single video stream is performed, the receiving side determines, based on "level_constrained_flag", the "general_level_idc" elements of the SPS and ESPS, and so on, whether each substream is within its own decoder processing capability, and decodes the substreams that are within that range.
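
With level_constrained_flag = 1, each substream's general_level_idc already covers all lower substreams, so a receiver can walk up from the base substream and stop at the first one whose level exceeds its own. The sketch below does that. How a receiver should proceed when the flag is 0 is not specified in this section, so the sketch simply refuses that case (an assumption). The function name is hypothetical.

```python
def substreams_to_decode(level_constrained_flag, substream_levels,
                         decoder_level_idc):
    """Count how many substreams, starting from the base, the decoder can handle.

    substream_levels: general_level_idc of each substream's SPS/ESPS, base first.
    """
    if not level_constrained_flag:
        # Assumption: with flag 0 only a service-wide level is signalled,
        # so per-substream selection is not possible from these values.
        raise ValueError("per-substream levels are not signalled")
    count = 0
    for level_idc in substream_levels:
        if level_idc > decoder_level_idc:
            break  # this substream (and all above it) exceeds the decoder
        count += 1
    return count

# Base substream at level 5.1 (153), enhanced substream at level 5.2 (156).
print(substreams_to_decode(1, [153, 156], 153))  # a level 5.1 decoder
```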

図２４は、複数ストリーム、ここでは２ストリームによる配信を行う場合のトランスポートストリームＴＳの構成例を示している。このトランスポートストリームＴＳには、２つのビデオストリームが含まれている。すなわち、この構成例では、複数の階層が低階層組と高階層組の２つの階層組に分割され、２つの階層組のピクチャの例えばＨＥＶＣによる符号化画像データを持つビデオストリームのＰＥＳパケット「video PES1」、「video PES2」が存在すると共に、オーディオストリームのＰＥＳパケット「audio PES1」が存在する。 FIG. 24 shows a configuration example of the transport stream TS for distribution by a plurality of streams, here two streams. This transport stream TS contains two video streams. That is, in this configuration example, the plurality of layers is divided into two layer sets, a low layer set and a high layer set, and there are PES packets "video PES1" and "video PES2" of the video streams carrying the encoded image data (for example, HEVC) of the pictures of the two layer sets, as well as a PES packet "audio PES1" of an audio stream.

この２つのビデオストリームには、階層符号化の複数の階層が２つの階層組に分割されて得られた２つのサブストリームのそれぞれが含まれる。ここで、下位側の階層組のサブストリーム（ベースサブストリーム）にはＳＰＳが含まれ、上位側の階層組のサブストリーム（エンハンスサブストリーム）にはＥＳＰＳが含まれる。そして、ＳＰＳ，ＥＳＰＳの要素の“general_level_idc”の値は、自己の階層組以下の階層組に含まれる全ての階層のピクチャを含むレベル値とされる。 The two video streams each contain one of the two substreams obtained by dividing the plurality of layers of the hierarchical coding into two layer sets. Here, the substream of the lower layer set (base substream) includes the SPS, and the substream of the upper layer set (enhanced substream) includes the ESPS. The value of the “general_level_idc” element of each SPS and ESPS is a level value covering the pictures of all layers included in its own layer set and the layer sets below it.

各ピクチャの符号化画像データには、ＳＰＳ、ＥＳＰＳなどのＮＡＬユニットが存在する。上述したように、各ピクチャのＮＡＬユニットのヘッダには、そのピクチャの階層を示すtemporal_idが挿入されている。また、例えば、ＳＰＳ、ＥＳＰＳにはビットレートのレベル指定値（general_level_idc）が含まれている。また、例えば、ピクチャ・タイミング・ＳＥＩ（Picture timing SEI）には、「cpb_removal_delay」と「dpb_output_delay」が含まれている。 The encoded image data of each picture includes NAL units such as SPS and ESPS. As described above, a temporal_id indicating the layer of the picture is inserted in the header of each NAL unit of the picture. Further, for example, the SPS and ESPS include the bit rate level specification value (general_level_idc), and the picture timing SEI includes "cpb_removal_delay" and "dpb_output_delay".

また、各ピクチャの符号化画像データをコンテナするＴＳパケットのヘッダに「transport_priority」の１ビットの優先度を示すフィールドが存在する。この「transport_priority」により、コンテナする符号化画像データが、低階層組のピクチャのものか、あるいは高階層組のピクチャのものかが識別可能である。 In addition, the header of each TS packet that contains the encoded image data of a picture has a 1-bit priority field, "transport_priority". This field makes it possible to identify whether the encoded image data carried in the packet belongs to a picture of the low layer set or of the high layer set.

ＰＭＴには、プログラム全体に関連する情報を記述するプログラム・ループ（Program loop）が存在する。また、ＰＭＴには、各エレメンタリストリームに関連した情報を持つエレメンタリ・ループが存在する。この構成例では、２つのビデオエレメンタリ・ループ（video ES1 loop, video ES2 loop ）が存在すると共に、オーディオエレメンタリ・ループ（audio ES1 loop）が存在する。 The PMT has a program loop that describes information related to the entire program. The PMT also has elementary loops, each holding information related to one elementary stream. In this configuration example, there are two video elementary loops (video ES1 loop, video ES2 loop) and an audio elementary loop (audio ES1 loop).

各ビデオエレメンタリ・ループには、ビデオストリーム（video PES1, video PES2）に対応して、ストリームタイプ、パケット識別子（PID）等の情報が配置されると共に、そのビデオストリームに関連する情報を記述するデスクリプタも配置される。このデスクリプタの一つとして、上述したＨＥＶＣデスクリプタ（HEVC_descriptor）、スケーラビリティ・エクステンション・デスクリプタ（scalability_extension_descriptor）が挿入される。 In each video elementary loop, information such as the stream type and packet identifier (PID) is arranged for the corresponding video stream (video PES1, video PES2), together with descriptors that describe information related to that video stream. As these descriptors, the HEVC descriptor (HEVC_descriptor) and the scalability extension descriptor (scalability_extension_descriptor) described above are inserted.

図２５は、図３の階層符号化の例において、階層０〜３のピクチャでベースサブストリーム（B stream）が生成され、階層４のピクチャでエンハンスサブストリーム（E stream）が生成される場合を示している。この場合、ベースサブストリームに含まれる各ピクチャは６０Ｐを構成し、エンハンスサブストリーム（E stream）に含まれる各ピクチャは、ベースサブストリームに含まれる各ピクチャに追加されてＰＥＳ全体で１２０Ｐを構成する。 FIG. 25 shows the case where, in the hierarchical coding example of FIG. 3, the base substream (B stream) is generated from the pictures of layers 0 to 3 and the enhanced substream (E stream) is generated from the pictures of layer 4. In this case, the pictures included in the base substream constitute 60P, and the pictures included in the enhanced substream (E stream), added to those of the base substream, constitute 120P across the PES as a whole.

A picture in the base substream is composed of NAL units such as "AUD", "VPS", "SPS", "PPS", "PSEI", "SLICE", "SSEI", and "EOS". "VPS" and "SPS" are inserted, for example, in the first picture of each GOP. The value of the SPS element "general_level_idc" is set to "level5.1". "EOS" may be omitted.

The 1-bit field "level_constrained_flag" is set to "1". This indicates that an SPS or ESPS is present in the substream, and that its element "general_level_idc" has a level value covering all pictures whose temporal_id is less than or equal to the temporal_id values contained in that substream. The value of "level_idc" is set to "level5.1", which is the level value of the base substream (B stream). "temporal_id_min" is set to 0 and "temporal_id_max" is set to 3, indicating that the base substream (B stream) contains the pictures of layers 0 to 3.

The "video ES2 loop" holds information such as the stream type and the packet identifier (PID) of the video stream (video PES2). It also holds descriptors that describe information related to that video stream. The stream type is set to "0x25", which indicates an enhanced stream. The HEVC descriptor described above is inserted as one of the descriptors.

The 1-bit field "level_constrained_flag" is set to "1". This indicates that an SPS or ESPS is present in the substream, and that its element "general_level_idc" has a level value covering all pictures whose temporal_id is less than or equal to the temporal_id values contained in that substream. The value of "level_idc" is set to "level5.2", which is the level value of the base substream (B stream) and the enhanced stream (E stream) taken together. "temporal_id_min" and "temporal_id_max" are both set to 4, indicating that the enhanced stream (E stream) contains the pictures of layer 4.

When content is distributed as multiple video streams in this way, the receiving side checks whether each substream is within its own decoder processing capability. The check is based on "level_constrained_flag", the "general_level_idc" element of the SPS and ESPS, and similar information. The receiver then decodes the substreams that are within that capability.
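The receiver-side decision can be sketched roughly as follows, assuming the substream levels have already been read from the SPS/ESPS. HEVC codes `general_level_idc` as 30 times the level number, so level 5.1 is 153 and level 5.2 is 156. With level_constrained_flag = 1, each substream's level also covers every lower layer set. The sketch therefore walks the substreams from the lowest set upward and stops at the first one that exceeds the decoder's level. `Substream`, `level_to_idc`, and `decodable_substreams` are hypothetical names, and the simple "less than or equal" comparison is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List

def level_to_idc(level: float) -> int:
    """HEVC general_level_idc is 30 x the level number (5.1 -> 153)."""
    return round(level * 30)

@dataclass
class Substream:
    name: str
    temporal_id_min: int
    temporal_id_max: int
    general_level_idc: int  # from SPS/ESPS; covers this set and all lower sets

def decodable_substreams(subs: List[Substream], decoder_level: float) -> List[Substream]:
    """Return the substreams, lowest set first, whose cumulative level fits the decoder."""
    limit = level_to_idc(decoder_level)
    selected: List[Substream] = []
    for s in sorted(subs, key=lambda s: s.temporal_id_min):
        if s.general_level_idc > limit:
            break  # higher sets build on this one, so nothing above it is usable
        selected.append(s)
    return selected
```

With the FIG. 25 values, a level-5.1 decoder keeps only the B stream, and a level-5.2 decoder keeps both the B and E streams.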

FIG. 26 shows another configuration example of the transport stream TS. It covers the case where, in the hierarchical coding example of FIG. 3, a base substream (B stream) is generated from the pictures of layers 0 to 3 and an enhanced substream (E stream) is generated from the pictures of layer 4. In this case, the pictures in the base substream form 60P. The pictures in the enhanced substream (E stream) are added to those of the base substream, so that the PES as a whole forms 120P.

A picture in the base substream is composed of NAL units such as "AUD", "VPS", "SPS", "PPS", "PSEI", "SLICE", "SSEI", and "EOS". "VPS" and "SPS" are inserted, for example, in the first picture of each GOP. The value of the SPS element "general_level_idc" is set to "level5.2". In this case, the SPS element "sub_layer_level_present_flag" is set to "1", and "sublayer_level_idc[3]" gives "level5.1", the level value of the base substream. "EOS" may be omitted.

A picture in the enhanced substream is composed of NAL units such as "AUD", "PPS", and "SLICE". Unlike FIG. 25, however, there is no "ESPS" NAL unit.

Unlike FIG. 25, there is no "level_constrained_flag". The value of "level_idc" is set to "level5.1", which is the level value of the base substream (B stream). "temporal_id_min" is set to 0 and "temporal_id_max" is set to 3, indicating that the base substream (B stream) contains the pictures of layers 0 to 3.

Unlike FIG. 25, there is no "level_constrained_flag". The value of "level_idc" is set to "level5.2", which is the level value of the base substream (B stream) and the enhanced stream (E stream) taken together. "temporal_id_min" and "temporal_id_max" are both set to 4, indicating that the enhanced stream (E stream) contains the pictures of layer 4.

When content is distributed as multiple video streams in this way, the receiving side checks whether each substream is within its own decoder processing capability. The check is based on the SPS elements "general_level_idc", "sublayer_level_idc", and similar information. The receiver then decodes the substreams that are within that capability.
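In the FIG. 26 arrangement, only the SPS carries level information. A receiver might proceed as sketched below, which is an illustration rather than the patent's stated procedure. If `general_level_idc` fits, every temporal layer is decodable. Otherwise the receiver looks for the highest sub-layer whose `sublayer_level_idc[i]` fits, since that value applies to the pictures with temporal_id up to i. The function name and the dictionary representation are assumptions.

```python
from typing import Dict, Optional

def max_decodable_temporal_id(general_level_idc: int,
                              max_temporal_id: int,
                              sublayer_level_idc: Dict[int, int],
                              decoder_level_idc: int) -> Optional[int]:
    """Return the highest temporal_id the decoder can handle, or None if none fits.

    sublayer_level_idc maps i -> level for pictures with temporal_id <= i; only
    sub-layers whose sub_layer_level_present_flag is 1 appear in the dict.
    """
    if general_level_idc <= decoder_level_idc:
        return max_temporal_id  # the whole stream (e.g. 120P) fits
    for tid in range(max_temporal_id - 1, -1, -1):
        level = sublayer_level_idc.get(tid)
        if level is not None and level <= decoder_level_idc:
            return tid          # e.g. temporal_id <= 3, the 60P base substream
    return None
```

With the FIG. 26 values (general level 5.2 = 156, sub-layer 3 at level 5.1 = 153), a level-5.1 decoder is limited to temporal_id 3.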

Returning to FIG. 2, the transmission unit 105 modulates the transport stream TS with a modulation scheme suitable for broadcasting, such as QPSK/OFDM, and transmits the resulting RF modulated signal from the transmission antenna.

The operation of the transmission device 100 shown in FIG. 2 is briefly described below. Uncompressed moving image data is input to the encoder 102, which performs hierarchical coding on it. The encoder 102 classifies the image data of the pictures that make up the moving image data into a plurality of layers and encodes them, generating a video stream that holds the coded image data of the pictures in each layer. Encoding is performed so that every referenced picture belongs to the referencing picture's own layer and/or a lower layer.
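The reference constraint is what lets a receiver discard upper layers safely: every picture that remains still has all of its references. A small check of that property might look like this. The picture labels and the dictionary inputs are hypothetical.

```python
from typing import Dict, List

def references_respect_hierarchy(layer_of: Dict[str, int],
                                 refs: Dict[str, List[str]]) -> bool:
    """True if every picture references only pictures in its own layer or a lower one."""
    return all(layer_of[ref] <= layer_of[pic]
               for pic, targets in refs.items()
               for ref in targets)
```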

The encoder 102 generates video streams holding the coded image data of the pictures in each layer. For example, the encoder 102 divides the layers into a predetermined number (two or more) of layer sets. It then generates either a predetermined number of video streams, each containing the substream for one layer set, or a single video stream containing the substreams for all layer sets.
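The following sketch shows one way to express the division into layer sets, assuming the sets are contiguous ranges of temporal layers. That assumption matches the FIG. 3 example, where layers 0 to 3 form one set and layer 4 the other. The function name and the `boundaries` parameter are illustrative.

```python
from typing import List

def split_into_layer_sets(num_layers: int, boundaries: List[int]) -> List[range]:
    """Split temporal layers 0..num_layers-1 into contiguous layer sets.

    `boundaries` lists the first layer of every set above the lowest one,
    so [4] on five layers gives {0..3} and {4}, as in the FIG. 3 example.
    """
    if not boundaries:
        raise ValueError("at least two layer sets are required")
    cuts = sorted(boundaries)
    starts = [0] + cuts
    ends = cuts + [num_layers]
    if any(s >= e for s, e in zip(starts, ends)):
        raise ValueError("every layer set must contain at least one layer")
    return [range(s, e) for s, e in zip(starts, ends)]
```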

The encoder 102 adds identification information to the coded image data of the pictures in each layer set, identifying the layer set they belong to. For example, "general_level_idc", an element of the SPS and ESPS, is used as this identification information. An SPS is included for each sequence (GOP) in the substream of the lowest layer set (the base substream). An ESPS is included for each sequence (GOP) in the substream of each layer set above the lowest (an enhanced substream). The higher the layer set, the higher the value of "general_level_idc" in its SPS or ESPS. For example, the "general_level_idc" value inserted into the SPS or ESPS of each layer set's substream is a level value that covers the pictures of every layer in that layer set and in all layer sets below it.

The video stream generated by the encoder 102, containing the coded data of the pictures in each layer, is supplied to the compressed data buffer (cpb) 103 and temporarily stored. The multiplexer 104 reads the video stream from the compressed data buffer 103 and packetizes it, first into PES packets and then into transport packets. It multiplexes these packets to obtain the transport stream TS as a multiplexed stream.

The transport stream TS contains either a single video stream holding the coded image data of the pictures in all layers, or a predetermined number (two or more) of video streams. The multiplexer 104 inserts layer information and stream configuration information into the transport stream TS. Specifically, it inserts the HEVC descriptor (HEVC_descriptor) and the scalability extension descriptor (scalability_extension_descriptor) into the video elementary loop for each video stream.

The multiplexer 104 also assigns a higher priority to TS packets that carry the coded image data of pictures in lower layer sets. For example, when the layers are split into a lower layer set and a higher layer set, the multiplexer 104 sets the priority using the 1-bit "transport_priority" field of the TS packet header.
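On the multiplexer side, setting the priority amounts to toggling one header bit. The following sketch assumes a mutable 188-byte packet and the two-set split described above. `set_transport_priority` is a hypothetical helper.

```python
def set_transport_priority(packet: bytearray, in_lower_layer_set: bool) -> None:
    """Set (lower layer set) or clear (higher layer set) transport_priority in place.

    The bit is bit 5 of header byte 1 (mask 0x20); the PID bits are left untouched.
    """
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("expected a 188-byte TS packet with sync byte 0x47")
    if in_lower_layer_set:
        packet[1] |= 0x20
    else:
        packet[1] &= ~0x20 & 0xFF
```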

The transport stream TS generated by the multiplexer 104 is sent to the transmission unit 105. The transmission unit 105 modulates it with a modulation scheme suitable for broadcasting, such as QPSK/OFDM, and transmits the resulting RF modulated signal from the transmission antenna.

"Receiver configuration"
FIG. 27 shows a configuration example of the receiving device 200. The receiving device 200 includes a CPU (Central Processing Unit) 201, a receiving unit 202, a demultiplexer 203, and a compressed data buffer (cpb: coded picture buffer) 204. It also includes a decoder 205, an uncompressed data buffer (dpb: decoded picture buffer) 206, and a post-processing unit 207. The CPU 201 acts as the control unit and controls the operation of each part of the receiving device 200.

The receiving unit 202 demodulates the RF modulated signal received by the receiving antenna to obtain the transport stream TS. From the transport stream TS, the demultiplexer 203 selectively extracts the coded image data of the pictures in the layer sets that match the decoding capability (Decoder temporal layer capability). It sends this data to the compressed data buffer (cpb: coded picture buffer) 204.

FIG. 28 shows a configuration example of the demultiplexer 203. The demultiplexer 203 includes a PCR extraction unit 231, a time stamp extraction unit 232, a section extraction unit 233, a TS priority extraction unit 234, a PES payload extraction unit 235, and a picture selection unit 236.

The PCR extraction unit 231 extracts the PCR (Program Clock Reference) from the TS packets that contain it and sends it to the CPU 201. The time stamp extraction unit 232 extracts the time stamps (DTS, PTS) inserted in the PES header of each picture and sends them to the CPU 201. The section extraction unit 233 extracts section data from the transport stream TS and sends it to the CPU 201. This section data includes the HEVC descriptor (HEVC_descriptor), the scalability extension descriptor (scalability_extension_descriptor), and other descriptors described above.
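The time values these units pass to the CPU 201 use standard MPEG-2 Systems encodings. A PTS or DTS is a 33-bit count of 90 kHz ticks spread over 5 bytes with marker bits. A PCR is a 33-bit base plus a 9-bit extension, which together give 27 MHz ticks. The following helpers decode both encodings, plus a timestamp encoder used only to show the round trip. The function names are illustrative, and the inputs are assumed to be the raw field bytes already located in the PES header or adaptation field.

```python
def decode_pes_timestamp(field: bytes) -> int:
    """Decode a 33-bit PTS/DTS (90 kHz units) from its 5-byte PES header encoding."""
    if len(field) != 5:
        raise ValueError("PTS/DTS field is 5 bytes")
    return ((((field[0] >> 1) & 0x07) << 30) | (field[1] << 22) |
            ((field[2] >> 1) << 15) | (field[3] << 7) | (field[4] >> 1))

def encode_pes_timestamp(value: int, prefix: int = 0b0010) -> bytes:
    """Inverse of decode_pes_timestamp, including the three marker bits."""
    return bytes([
        (prefix << 4) | (((value >> 30) & 0x07) << 1) | 1,
        (value >> 22) & 0xFF,
        (((value >> 15) & 0x7F) << 1) | 1,
        (value >> 7) & 0xFF,
        ((value & 0x7F) << 1) | 1,
    ])

def decode_pcr(field: bytes) -> int:
    """Decode a 6-byte adaptation-field PCR into 27 MHz ticks (base * 300 + ext)."""
    if len(field) != 6:
        raise ValueError("PCR field is 6 bytes")
    base = ((field[0] << 25) | (field[1] << 17) | (field[2] << 9) |
            (field[3] << 1) | (field[4] >> 7))
    ext = ((field[4] & 0x01) << 8) | field[5]  # 6 reserved bits sit between them
    return base * 300 + ext
```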

The TS priority extraction unit 234 extracts the priority information set in each TS packet. As described above, this is the priority of each layer set when the layers are divided into a predetermined number (two or more) of layer sets, and lower layer sets are given higher priority. For example, when the layers are split into a lower layer set and a higher layer set, the unit extracts the value of the 1-bit "transport_priority" field of the TS packet header. This value is "1" for the lower layer set and "0" for the higher layer set.

The PES payload extraction unit 235 extracts the PES payloads, that is, the coded image data of the pictures in each layer, from the transport stream TS. From that data, the picture selection unit 236 selectively takes out the coded image data of the pictures in the layer sets that match the decoding capability (Decoder temporal layer capability). It sends this data to the compressed data buffer (cpb: coded picture buffer) 204. To make the selection, the picture selection unit 236 refers to the layer information and stream configuration information obtained by the section extraction unit 233, and to the priority information extracted by the TS priority extraction unit 234.

For example, consider a case where the video stream (coded stream) in the transport stream TS has a frame rate of 120 fps. Suppose the layers are split into a lower layer set and a higher layer set, and the pictures of each set have a frame rate of 60 fps. In the hierarchical coding example of FIG. 3 described above, layers 0 to 3 form the lower layer set and layer 4 forms the higher layer set.

The 1-bit "transport_priority" field in the TS packet header is set to "1" in TS packets that carry coded image data of pictures in the base layer, that is, the lower layer set. It is set to "0" in TS packets that carry coded image data of pictures in the non-base layer, that is, the higher layer set.

In this case, the transport stream TS may contain a single video stream (coded stream) holding the coded data of the pictures in all layers (see FIG. 10). Alternatively, it may contain two video streams (coded streams) (see FIG. 11). The first is a base stream (B-stream) holding the coded image data of the pictures in the lower layer set. The second is an enhancement stream (E-stream) holding the coded image data of the pictures in the higher layer set.

For example, when the decoding capability supports 120P (120 fps), the picture selection unit 236 takes out the coded image data of the pictures in all layers and sends it to the compressed data buffer (cpb) 204. When the decoding capability supports 60P (60 fps) but not 120P, the picture selection unit 236 takes out only the coded image data of the pictures in the lower layer set and sends it to the compressed data buffer (cpb) 204.

FIG. 29 shows the case where the transport stream TS contains a single video stream (coded stream). Here, "High" denotes pictures of the higher layer set, "Low" denotes pictures of the lower layer set, and "P" denotes "transport_priority".

When the decoding capability supports 120P, the picture selection unit 236 takes out the coded image data of the pictures in all layers, sends it to the compressed data buffer (cpb) 204, and stores it in area 1 (cpb_1). When the decoding capability supports 60P but not 120P, the unit filters on "transport_priority" and takes out only the pictures of the lower layer set (P = 1). It sends these to the compressed data buffer (cpb) 204 and stores them in area 1 (cpb_1).

FIG. 30 shows the case where the transport stream TS contains two video streams (coded streams), a base stream and an extended stream. Here, "High" denotes pictures of the higher layer set, "Low" denotes pictures of the lower layer set, and "P" denotes "transport_priority". The packet identifier (PID) of the base stream is PID1, and that of the extended stream is PID2.

When the decoding capability supports 120P, the picture selection unit 236 takes out the coded image data of the pictures in all layers and sends it to the compressed data buffer (cpb) 204. The coded image data of the lower layer set is stored in area 1 (cpb_1), and the coded image data of the higher layer set is stored in area 2 (cpb_2).

When the decoding capability supports 60P but not 120P, the picture selection unit 236 filters on the packet identifier (PID) and takes out only the pictures of the lower layer set (PID1). It sends these to the compressed data buffer (cpb) 204 and stores them in area 1 (cpb_1). Filtering based on "transport_priority" may also be performed in this case.
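The PID-based routing of FIG. 30 can be sketched as follows. This is illustrative only: a real demultiplexer reassembles PES packets rather than collecting raw TS packets, and the function name and the dictionary keys standing in for cpb_1 and cpb_2 are assumptions.

```python
from typing import Dict, Iterable, List

def route_to_cpb(packets: Iterable[bytes], pid_base: int, pid_enh: int,
                 decode_all_layers: bool) -> Dict[str, List[bytes]]:
    """Sort video TS packets into cpb areas by PID, as in the FIG. 30 case."""
    areas: Dict[str, List[bytes]] = {"cpb_1": [], "cpb_2": []}
    for pkt in packets:
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        if pid == pid_base:
            areas["cpb_1"].append(pkt)  # lower layer set (PID1): always kept
        elif pid == pid_enh and decode_all_layers:
            areas["cpb_2"].append(pkt)  # higher layer set (PID2): 120P decoders only
    return areas
```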

FIG. 31 shows an example of the processing flow of the demultiplexer 203 when the transport stream TS contains a single video stream (coded stream).

The demultiplexer 203 starts processing in step ST31 and then proceeds to step ST32, in which the CPU 201 sets the decoding capability (Decoder temporal layer capability). Next, in step ST33, the demultiplexer 203 determines whether it has the capability to decode all layers.

If it can decode all layers, the demultiplexer 203, in step ST34, demultiplexes all TS packets that pass the relevant PID filter and performs section parsing. It then proceeds to step ST35.

If step ST33 determines that it cannot decode all layers, the demultiplexer 203, in step ST36, demultiplexes the TS packets whose "transport_priority" is "1" and performs section parsing. It then proceeds to step ST35.

In step ST35, the demultiplexer 203 reads the HEVC descriptor (HEVC_descriptor) and the scalability extension descriptor (scalability_extension_descriptor) in the section for the target PID. From them it obtains whether an extension stream is present, the scalable type, the number and IDs of the streams, the maximum and minimum temporal_id values, and the minimum target decoder level.

Next, in step ST37, the demultiplexer 203 transfers the coded stream of the target PID to the compressed data buffer (cpb) 204 and notifies the CPU 201 of the DTS and PTS. After step ST37, the demultiplexer 203 ends processing in step ST38.

FIG. 32 shows an example of the processing flow of the demultiplexer 203 when the transport stream TS contains two video streams (coded streams), a base stream and an extended stream.

The demultiplexer 203 starts processing in step ST41 and then proceeds to step ST42, in which the CPU 201 sets the decoding capability (Decoder temporal layer capability). Next, in step ST43, the demultiplexer 203 determines whether it has the capability to decode all layers.

If it can decode all layers, the demultiplexer 203, in step ST44, demultiplexes all TS packets that pass the relevant PID filter and performs section parsing. It then proceeds to step ST45.

If step ST43 determines that it cannot decode all layers, the demultiplexer 203, in step ST46, demultiplexes the TS packets with PID = PID1 and performs section parsing. It then proceeds to step ST45.

In step ST45, the demultiplexer 203 reads the HEVC descriptor (HEVC_descriptor) and the scalability extension descriptor (scalability_extension_descriptor) in the section for the target PID. From them it obtains whether an extension stream is present, the scalable type, the number and IDs of the streams, the maximum and minimum temporal_id values, and the minimum target decoder level.

Next, in step ST47, the demultiplexer 203 transfers the coded stream of the target PID to the compressed data buffer (cpb) 204 and notifies the CPU 201 of the DTS and PTS. After step ST47, the demultiplexer 203 ends processing in step ST48.
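The flows of FIGS. 31 and 32 differ only in the selection rule used when not all layers can be decoded: "transport_priority" = 1 in ST36, or PID = PID1 in ST46. A compact sketch can pass that rule in as a predicate. For brevity it works on reassembled PES packets rather than individual TS packets, and it omits the descriptor reading of ST35/ST45. `VideoPes`, `DemuxOutput`, and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

@dataclass
class VideoPes:  # hypothetical: one reassembled PES packet carrying one picture
    pid: int
    transport_priority: int
    dts: int
    pts: int
    data: bytes

@dataclass
class DemuxOutput:
    cpb: List[bytes] = field(default_factory=list)                   # sent to cpb 204
    timestamps: List[Tuple[int, int]] = field(default_factory=list)  # (DTS, PTS) for CPU 201

def lower_set_by_priority(pes: VideoPes) -> bool:
    return pes.transport_priority == 1  # FIG. 31, step ST36

def lower_set_by_pid(pid1: int) -> Callable[[VideoPes], bool]:
    return lambda pes: pes.pid == pid1  # FIG. 32, step ST46

def demux(pes_list: Iterable[VideoPes], can_decode_all_layers: bool,
          lower_set_only: Callable[[VideoPes], bool]) -> DemuxOutput:
    out = DemuxOutput()
    for pes in pes_list:
        # ST33/ST43: keep everything (ST34/ST44) or only the lower layer set (ST36/ST46)
        if can_decode_all_layers or lower_set_only(pes):
            out.cpb.append(pes.data)                   # ST37/ST47: transfer to cpb
            out.timestamps.append((pes.dts, pes.pts))  # ST37/ST47: notify the CPU
    return out
```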

Returning to FIG. 27, the compressed data buffer (cpb) 204 temporarily stores the video stream (coded stream) taken out by the demultiplexer 203. From the video stream stored in the compressed data buffer 204, the decoder 205 takes out the coded image data of the pictures in the layers designated for decoding. It decodes the coded image data of each extracted picture at that picture's decoding timing and sends the result to the uncompressed data buffer (dpb) 206.

Here, the CPU 201 designates the layers to be decoded to the decoder 205 by temporal_id. The designated layers are either all layers in the video stream (coded stream) taken out by the demultiplexer 203, or only some of the lower layers. The CPU 201 sets them either automatically or in response to a user operation. The CPU 201 also gives the decoder 205 the decoding timing, based on the DTS (Decoding Time Stamp). When decoding the coded image data of each picture, the decoder 205 reads the image data of referenced pictures from the uncompressed data buffer 206 as needed.

FIG. 33 shows a configuration example of the decoder 205. The decoder 205 includes a temporal ID analysis unit 251, a target layer selection unit 252, a stream integration unit 253, and a decoding unit 254. The temporal ID analysis unit 251 reads the video stream (coded stream) stored in the compressed data buffer 204. It analyzes the temporal_id inserted in the NAL unit header of each picture's coded image data.

The target layer selection unit 252 uses the analysis result of the temporal ID analysis unit 251 to take out, from the video stream read from the compressed data buffer 204, the coded image data of the pictures in the layers designated for decoding. It outputs one or more video streams (coded streams), depending on how many video streams were read from the compressed data buffer 204 and which layers are designated.
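The temporal_id that unit 251 analyzes comes from the 2-byte HEVC NAL unit header. That header holds forbidden_zero_bit (1 bit), nal_unit_type (6), nuh_layer_id (6), and nuh_temporal_id_plus1 (3). The following sketch parses that header and keeps only the NAL units within the designated layers, roughly as unit 252 is described. It assumes start codes have already been stripped, and the function names are illustrative.

```python
from typing import List, Tuple

def parse_nal_header(nal: bytes) -> Tuple[int, int, int]:
    """Return (nal_unit_type, nuh_layer_id, temporal_id) from an HEVC NAL unit header."""
    if len(nal) < 2 or nal[0] & 0x80:
        raise ValueError("truncated header or forbidden_zero_bit set")
    nal_unit_type = (nal[0] >> 1) & 0x3F
    nuh_layer_id = ((nal[0] & 0x01) << 5) | (nal[1] >> 3)
    tid_plus1 = nal[1] & 0x07
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return nal_unit_type, nuh_layer_id, tid_plus1 - 1

def select_target_layers(nal_units: List[bytes], max_temporal_id: int) -> List[bytes]:
    """Keep only NAL units whose temporal_id is within the designated layers."""
    return [n for n in nal_units if parse_nal_header(n)[2] <= max_temporal_id]
```

For example, a VPS header `40 01` parses as type 32 with temporal_id 0. A slice header `02 05` parses as type 1 with temporal_id 4, so a 60P target (max temporal_id 3) drops that slice.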

The stream integration unit 253 integrates the predetermined number of video streams (coded streams) output from the target layer selection unit 252 into one. The decoding unit 254 decodes the coded image data of each picture in the integrated video stream (coded stream) in sequence, each at its decoding timing, and sends the results to the uncompressed data buffer (dpb) 206.
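The text does not say how the streams are integrated. One plausible approach, assumed here, is to interleave access units by DTS. Each stream is already in decode order, so a k-way merge suffices. The `AccessUnit` tuple is a hypothetical representation.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical representation: (DTS in 90 kHz ticks, coded picture bytes)
AccessUnit = Tuple[int, bytes]

def integrate_streams(*streams: Iterable[AccessUnit]) -> List[AccessUnit]:
    """Interleave per-stream access units into one decode-order sequence.

    Each input must already be in decode order; heapq.merge then yields a
    single DTS-ordered sequence without re-sorting the whole streams.
    """
    return list(heapq.merge(*streams, key=lambda au: au[0]))
```

With a 60P base stream at DTS 0, 1500, ... and a 60P enhanced stream at DTS 750, 2250, ..., the merged sequence alternates base and enhanced pictures at the 120P interval.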

In this case, the decoding unit 254 uses the level_constrained_flag obtained from the demultiplexer 203 to guide its parsing of the SPS and ESPS. It obtains "general_level_idc", "sublayer_level_idc", and similar values, and checks whether the stream or substream can be decoded within its own decoder processing capability. The decoding unit 254 also parses the SEI to obtain, for example, "initial_cpb_removal_time" and "cpb_removal_delay". It uses these to check whether the decoding timing given by the CPU 201 is appropriate.

When decoding a slice, the decoding unit 254 obtains "ref_idx_l0_active" ("ref_idx_l1_active") from the slice header as information indicating the reference target for temporal prediction, and performs temporal prediction. Each decoded picture is then handled as a reference for other pictures, using "short_term_ref_pic_set_idx" or "lt_idx_sps" obtained from the slice header as an index.

The flowchart of FIG. 34 shows an example of the per-video-stream decoding procedure in the receiving device 200, taking the decoder processing capability into account. The receiving device 200 starts processing in step ST61 and reads the HEVC descriptor (HEVC_descriptor) in step ST62.

Next, in step ST63, the receiving device 200 determines whether “level_constrained_flag” is present in the HEVC descriptor. If it is present, the receiving device 200 determines in step ST64 whether “level_constrained_flag” is “1”. If it is “1”, the receiving device 200 proceeds to step ST65.

In step ST65, the receiving device 200 refers to the time stamp of the PES packet with the relevant PID and reads the SPS or ESPS of the video stream carried in the payload. Then, in step ST66, the receiving device 200 reads "general_level_idc", an element of the SPS or ESPS.

Next, in step ST67, the receiving device 200 determines whether "general_level_idc" is within its decoder processing capability. If it is, the receiving device 200 decodes the corresponding stream or substream in step ST68 and then ends the process in step ST69. If it is not, the receiving device 200 proceeds directly to step ST69 and ends the process.

If “level_constrained_flag” is not present in step ST63, or if “level_constrained_flag” is “0” in step ST64, the receiving device 200 proceeds to step ST70. In step ST70, the receiving device 200 refers to the time stamp of the PES packet with the relevant PID and reads the SPS of the video stream carried in the payload. If that video stream contains no SPS, the receiving device 200 instead refers to the SPS of the substream containing the pictures of the lower temporal layers (temporal_layer).

Next, in step ST71, the receiving device 200 reads "general_level_idc", an element of the SPS. Then, in step ST72, it determines whether "general_level_idc" is within its decoder processing capability. If it is, the receiving device 200 proceeds to step ST73.

If it is not, the receiving device 200 checks the "sublayer_level_idc" elements of the SPS in step ST74. Then, in step ST75, it determines whether there is a sublayer whose "sublayer_level_idc" is within its decoder processing capability. If there is none, the receiving device 200 proceeds directly to step ST69 and ends the process. If there is one, it proceeds to step ST73.

In step ST73, the receiving device 200 decodes either the entire stream or the relevant sublayer portion, referring to the temporal_id values. The receiving device 200 then ends the process in step ST69.
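The branching of FIG. 34 (steps ST61 to ST75) can be written as a small decision function. This Python version is a sketch under stated assumptions:

* The descriptor, SPS and ESPS have already been parsed into the hypothetical `Sps` type.
* The decoder capability is a single maximum level_idc.
* When several sublayers fit, the highest one is chosen. The flowchart does not spell out this policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sps:
    general_level_idc: int
    # Hypothetical layout: sublayer_level_idc[t] for temporal sublayer t (None if absent).
    sublayer_level_idc: List[Optional[int]] = field(default_factory=list)


@dataclass
class Decision:
    decode: bool
    max_temporal_id: Optional[int] = None  # None: decode the whole stream


def decide_decoding(level_constrained_flag: Optional[int], sps: Optional[Sps],
                    decoder_max_level_idc: int,
                    lower_layer_sps: Optional[Sps] = None) -> Decision:
    if level_constrained_flag == 1:                        # ST63, ST64
        if sps is None:
            raise ValueError("ST65 expects an SPS or ESPS in this stream")
        # ST66, ST67: the level value already covers all lower layer sets.
        return Decision(sps.general_level_idc <= decoder_max_level_idc)  # ST68 or ST69
    # ST70: flag absent or 0; fall back to a lower substream's SPS if needed.
    active = sps if sps is not None else lower_layer_sps
    if active is None:
        raise ValueError("no SPS available")
    if active.general_level_idc <= decoder_max_level_idc:  # ST71, ST72
        return Decision(True)                               # ST73: whole stream
    # ST74, ST75: highest sublayer whose sublayer_level_idc fits (assumed policy).
    best = None
    for tid, lvl in enumerate(active.sublayer_level_idc):
        if lvl is not None and lvl <= decoder_max_level_idc:
            best = tid
    return Decision(True, best) if best is not None else Decision(False)  # ST73 or ST69


# Level 5.2 stream (156) whose sublayers 0 and 1 fit a level 5.1 decoder (153).
print(decide_decoding(None, Sps(156, [123, 150, 156]), 153))
# Decision(decode=True, max_temporal_id=1)
```

With the flag set to 1, the function never consults sublayer_level_idc. This reflects the efficiency gain the flag is meant to provide.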

Returning to FIG. 27, the uncompressed data buffer (dpb) 206 temporarily stores the image data of each picture decoded by the decoder 205. The pictures are read out of the uncompressed data buffer (dpb) 206 sequentially at their display timings, and the post processing unit 207 converts the frame rate of their image data to match the display capability. The CPU 201 supplies the display timing on the basis of the PTS (Presentation Time Stamp).

For example, when the decoded image data has a frame rate of 120 fps and the display capability is 120 fps, the post processing unit 207 sends the decoded image data to the display as is. When the decoded frame rate is 120 fps and the display capability is 60 fps, the post processing unit 207 subsamples the decoded image data to halve its temporal resolution and sends it to the display as 60 fps image data.

When the decoded frame rate is 60 fps and the display capability is 120 fps, the post processing unit 207 interpolates the decoded image data to double its temporal resolution and sends it to the display as 120 fps image data. When the decoded frame rate is 60 fps and the display capability is 60 fps, the post processing unit 207 sends the decoded image data to the display as is.
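The two conversions can be sketched in a few lines of Python. Frames are modelled as flat lists of pixel values, which is a hypothetical simplification. Subsampling keeps every other picture. For interpolation the patent does not name a method, so this sketch uses a plain average of neighbouring pictures and repeats the last one. A real post processor might use motion-compensated interpolation instead.

```python
from typing import List

Frame = List[float]  # hypothetical: one frame as a flat list of pixel values


def subsample_half(frames: List[Frame]) -> List[Frame]:
    """120 fps -> 60 fps: keep every other picture, halving temporal resolution."""
    return frames[::2]


def interpolate_double(frames: List[Frame]) -> List[Frame]:
    """60 fps -> 120 fps: insert one picture after each input picture.

    The inserted picture is the average of the picture and its successor
    (a stand-in method); after the last picture it is simply a repeat.
    """
    out: List[Frame] = []
    for i, cur in enumerate(frames):
        out.append(cur)
        nxt = frames[i + 1] if i + 1 < len(frames) else cur
        out.append([(a + b) / 2 for a, b in zip(cur, nxt)])
    return out


print(interpolate_double([[0.0], [2.0]]))  # [[0.0], [1.0], [2.0], [2.0]]
```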

FIG. 35 shows a configuration example of the post processing unit 207. This example handles the cases described above, in which the decoded image data has a frame rate of 120 fps or 60 fps and the display capability is 120 fps or 60 fps.

The post processing unit 207 has an interpolation unit 271, a subsample unit 272, and a switch unit 273. The decoded image data of each picture from the uncompressed data buffer 206 reaches the switch unit 273 over three paths:

* directly;
* after the interpolation unit 271 has doubled its frame rate;
* after the subsample unit 272 has halved its frame rate.

The CPU 201 supplies selection information to the switch unit 273. The CPU 201 generates this information either automatically, with reference to the display capability, or in response to a user operation. Based on the selection information, the switch unit 273 outputs one of its inputs. As a result, the frame rate of the image data read out sequentially from the uncompressed data buffer (dpb) 206 at the display timings is matched to the display capability.
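A Python sketch of how the selection information might be derived for the 60/120 fps cases of FIG. 35 follows. The path labels and the `user_choice` override are hypothetical names. The override models the user-operation case without assuming anything about how that choice is entered.

```python
from typing import Optional


def select_path(decoded_fps: int, display_fps: int,
                user_choice: Optional[str] = None) -> str:
    """Return which input of switch unit 273 to pass to the display.
    Labels are hypothetical: "direct", "interpolate" (unit 271), "subsample" (unit 272)."""
    if user_choice is not None:
        return user_choice  # selection driven by a user operation
    if decoded_fps == display_fps:
        return "direct"
    if decoded_fps * 2 == display_fps:
        return "interpolate"
    if decoded_fps == display_fps * 2:
        return "subsample"
    raise ValueError(f"no path for {decoded_fps} fps -> {display_fps} fps")


print(select_path(120, 60))  # subsample
```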

FIG. 36 shows an example of the processing flow of the decoder 205 and the post processing unit 207. The decoder 205 and the post processing unit 207 start processing in step ST51 and then proceed to step ST52. In step ST52, the decoder 205 reads the video stream to be decoded from the compressed data buffer (cpb) 204. Based on temporal_id, it then selects the pictures in the layers that the CPU 201 has designated for decoding.

Next, in step ST53, the decoder 205 decodes the encoded image data of each selected picture in turn at its decoding timing. It transfers the decoded image data of each picture to the uncompressed data buffer (dpb) 206 for temporary storage. Next, in step ST54, the post processing unit 207 reads the image data of each picture from the uncompressed data buffer (dpb) 206 at its display timing.

Next, the post processing unit 207 determines whether the frame rate of the image data of the pictures it has read out matches the display capability. If it does not, in step ST56 the post processing unit 207 converts the frame rate to match the display capability and sends the image data to the display, then ends the process in step ST57. If it does, in step ST58 the post processing unit 207 sends the image data to the display at its existing frame rate, then ends the process in step ST57.
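The flow of FIG. 36 can be sketched as a small simulation, using the following hypothetical simplifications:

* Pictures carry invented names, temporal_id values and 90 kHz DTS/PTS values.
* Decoding itself is stubbed out, so the "dpb" holds the coded pictures.
* The frame rate passed to the rate check is derived from the PTS spacing.

The sketch shows the ordering: selection by temporal_id, decoding in DTS order, readout in PTS order.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

CLOCK_HZ = 90_000  # MPEG-2 system time stamps use a 90 kHz clock


@dataclass
class CodedPicture:
    name: str         # hypothetical label
    temporal_id: int
    dts: int          # decode time stamp, 90 kHz ticks
    pts: int          # presentation time stamp, 90 kHz ticks


def decode_and_present(cpb: List[CodedPicture],
                       max_temporal_id: int) -> Tuple[List[str], Optional[float]]:
    # ST52: select the designated layers by temporal_id.
    selected = [p for p in cpb if p.temporal_id <= max_temporal_id]
    # ST53: "decode" in DTS order into the dpb (decoding itself is stubbed out).
    dpb = sorted(selected, key=lambda p: p.dts)
    # ST54: read pictures out of the dpb in display (PTS) order.
    shown = sorted(dpb, key=lambda p: p.pts)
    # Frame rate from PTS spacing, used by the following rate check.
    gaps = [b.pts - a.pts for a, b in zip(shown, shown[1:])]
    fps = CLOCK_HZ / min(gaps) if gaps else None
    return [p.name for p in shown], fps


stream = [CodedPicture("A", 0, 0, 750), CodedPicture("C", 0, 750, 2250),
          CodedPicture("B", 1, 1500, 1500), CodedPicture("D", 1, 2250, 3000)]
print(decode_and_present(stream, 1))  # (['A', 'B', 'C', 'D'], 120.0)
print(decode_and_present(stream, 0))  # (['A', 'C'], 60.0)
```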

The operation of the receiving device 200 shown in FIG. 27 will now be briefly described. The receiving unit 202 demodulates the RF modulated signal received by the receiving antenna and obtains the transport stream TS, which is sent to the demultiplexer 203. From the transport stream TS, the demultiplexer 203 selectively extracts the encoded image data of the pictures in the layer sets that match the decoding capability (decoder temporal layer capability). It sends this data to the compressed data buffer (cpb) 204 for temporary storage.

The decoder 205 extracts, from the video stream stored in the compressed data buffer 204, the encoded image data of the pictures in the layers designated for decoding. It decodes the encoded image data of each extracted picture at that picture's decoding timing and sends the result to the uncompressed data buffer (dpb) 206 for temporary storage. When the encoded image data of each picture is decoded, the image data of reference pictures is read from the uncompressed data buffer 206 and used as needed.

The image data of each picture, read out sequentially from the uncompressed data buffer (dpb) 206 at its display timing, is sent to the post processing unit 207. The post processing unit 207 interpolates or subsamples the image data so that its frame rate matches the display capability. The processed image data is supplied to the display, which shows the moving image formed by the pictures.

As described above, in the transmission/reception system 10 shown in FIG. 1, the transmitting side calculates an encoding interval for each layer. It attaches to the encoded image data of the pictures in each layer a decode time stamp set so that the higher the layer, the shorter the decoding time interval of the encoded image data of each picture. This lets the receiving side, for example, decode in a way suited to its decoding capability. Even when the decoding capability is low, the receiving side can selectively decode the encoded image data of the lower-layer pictures without causing a failure of the compressed data buffer 204.
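One consequence of this time-stamp design can be checked with a small sketch. It asks whether the decode time stamps of the pictures a receiver keeps are spaced widely enough for a decoder that handles a given number of pictures per second. The example time stamps are invented, and the check looks only at DTS spacing. It does not model cpb occupancy or the patent's actual DTS assignment rule.

```python
from typing import List, Tuple

CLOCK_HZ = 90_000


def min_decode_interval(pictures: List[Tuple[int, int]], max_temporal_id: int) -> int:
    """pictures: (temporal_id, dts in 90 kHz ticks). Return the shortest DTS gap
    among the pictures a receiver keeps when decoding up to max_temporal_id."""
    dts = sorted(d for tid, d in pictures if tid <= max_temporal_id)
    if len(dts) < 2:
        raise ValueError("need at least two pictures")
    return min(b - a for a, b in zip(dts, dts[1:]))


def fits_decoder(pictures: List[Tuple[int, int]], max_temporal_id: int,
                 pictures_per_second: int) -> bool:
    # A decoder needing 1/pictures_per_second s per picture keeps up only if
    # no DTS gap is shorter than that (integer form avoids rounding).
    return min_decode_interval(pictures, max_temporal_id) * pictures_per_second >= CLOCK_HZ


# Illustrative 120 fps stream: layer 0 every 1500 ticks, layer 1 in between.
stream = [(0, 0), (1, 750), (0, 1500), (1, 2250), (0, 3000)]
print(fits_decoder(stream, 0, 60))  # True:  base layer alone needs 60 pictures/s
print(fits_decoder(stream, 1, 60))  # False: all layers need 120 pictures/s
```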

Also, in the transmission/reception system 10 shown in FIG. 1, the transmitting side inserts a scalability extension descriptor (scalability_extension_descriptor) and similar descriptors into the layer of the transport stream TS. The receiving side can therefore readily obtain, for example, the layer information of the hierarchical coding and the configuration of the video streams included in the transport stream TS, and can decode appropriately.

Also, in the transmission/reception system 10 shown in FIG. 1, the transmitting unit divides the plurality of layers into a predetermined number (two or more) of layer sets. It assigns a higher priority to the TS packets carrying the encoded image data of the pictures in lower layer sets. For example, with two layer sets, the 1-bit "transport_priority" field is set to "1" in TS packets carrying the encoded image data of the base layer, that is, the lower layer set. It is set to "0" in TS packets carrying the encoded image data of the non-base layer, that is, the higher layer set. Using the TS packet priority, the receiving side can load into the compressed data buffer (cpb) 204 only the encoded image data of the layer sets that match its own decoding capability, which makes buffer failure easy to avoid.
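The priority-based filtering in the two-set example can be sketched directly on MPEG-2 TS packet headers. In the 4-byte header, transport_priority is bit 5 of the second byte, and the 13-bit PID spans the low 5 bits of that byte plus the third byte. The function names and the assumption that the video uses a single known PID are illustrative.

```python
from typing import Iterable, Iterator

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def _check(packet: bytes) -> None:
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte MPEG-2 TS packet")


def transport_priority(packet: bytes) -> int:
    """1-bit transport_priority: bit 5 of the second header byte."""
    _check(packet)
    return (packet[1] >> 5) & 0x01


def packet_pid(packet: bytes) -> int:
    """13-bit PID: low 5 bits of byte 1 followed by all of byte 2."""
    _check(packet)
    return ((packet[1] & 0x1F) << 8) | packet[2]


def keep_low_layer_set(packets: Iterable[bytes], video_pid: int) -> Iterator[bytes]:
    """Drop video packets whose transport_priority is 0 (the higher layer set in
    the two-set example); packets on other PIDs pass through untouched."""
    for pkt in packets:
        if packet_pid(pkt) != video_pid or transport_priority(pkt) == 1:
            yield pkt
```

A receiver able to decode all layers would simply skip this filter and pass every video packet on to the compressed data buffer.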

Also, in the transmission/reception system 10 shown in FIG. 1, the transmitting side inserts a bitstream level designation value into each of the substreams corresponding to the layer sets. That value is a level covering the pictures of all layers included in the substream's own layer set and in all lower layer sets. The receiving side of the video stream can therefore easily determine, from the inserted bitstream level designation value, whether each substream can be decoded.
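Because each substream's level value is cumulative, a receiver can pick the highest decodable layer set with one comparison per set. The sketch below assumes the per-set level values are already available as general_level_idc integers, ordered from the lowest layer set upward. The example values 150 and 156 (levels 5 and 5.2) are illustrative, not taken from the patent.

```python
from typing import List, Optional


def highest_decodable_layer_set(set_level_idcs: List[int],
                                decoder_max_level_idc: int) -> Optional[int]:
    """set_level_idcs[i] is the level value signalled in the substream of layer
    set i (i = 0 is the lowest set). Because each value already covers set i and
    every lower set, a single comparison decides whether sets 0..i fit together.
    Returns the highest fitting index, or None if even the lowest set is too big."""
    best = None
    for i, level_idc in enumerate(set_level_idcs):
        if level_idc > decoder_max_level_idc:
            break  # cumulative values never decrease, so higher sets cannot fit
        best = i
    return best


# Illustrative values: level 5 (150) for the base set, level 5.2 (156) overall.
print(highest_decodable_layer_set([150, 156], 153))  # 0
```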

Also, in the transmission/reception system 10 shown in FIG. 1, the transmitting side inserts flag information (level_constrained_flag) into the layer of the transport stream TS (the container layer). The flag indicates that the bitstream level designation value inserted into the substream of each layer set is a level value covering the pictures of all layers included in that layer set and in all lower layer sets. From this flag the receiving side knows that each substream's level value already covers the lower layers. Confirmation using sublayer_level_idc therefore becomes unnecessary, and decoding can be made more efficient.

<2. Modifications>
The embodiment described above shows a transmission/reception system 10 consisting of the transmitting device 100 and the receiving device 200. However, the configuration of transmission/reception systems to which the present technology can be applied is not limited to this. For example, the receiving device 200 may be implemented as a set-top box and a monitor connected by a digital interface such as HDMI (High-Definition Multimedia Interface).

In the embodiment described above, the container is a transport stream (MPEG-2 TS). However, the present technology applies equally to systems that deliver content to receiving terminals over networks such as the Internet, where content is often delivered in MP4 or other container formats. In other words, the container may be any of various formats, such as the transport stream (MPEG-2 TS) adopted in digital broadcasting standards or MP4 as used in Internet delivery.

In addition, the present technology can also take the following configurations.
(1) A transmission device including:
an image encoding unit that classifies the image data of each picture constituting moving image data into a plurality of layers, encodes the image data of the pictures in each of the classified layers, and generates a video stream having the encoded image data of the pictures in each layer; and
a transmitting unit that transmits a container of a predetermined format including the generated video stream,
in which the image encoding unit adds decoding timing information to the encoded image data of the pictures in each layer. The decoding timing information is set so that the higher the layer, the shorter the decoding time interval of the encoded image data of each picture.
(2) The transmission device according to (1), in which the image encoding unit:
generates a single video stream having the encoded image data of the pictures in each layer; and
divides the plurality of layers into a predetermined number (two or more) of layer sets and adds, to the encoded image data of the pictures in each layer set, identification information for identifying the layer set to which they belong.
(3) The transmission device according to (2), in which the identification information is a bitstream level designation value that is set higher for layer sets on the higher-layer side.
(4) The transmission device according to (1), in which the image encoding unit:
divides the plurality of layers into a predetermined number (two or more) of layer sets; and
generates the predetermined number of video streams, each having the encoded image data of the pictures in one of the layer sets.
(5) The transmission device according to (4), in which the image encoding unit adds, to the encoded image data of the pictures in each layer set, identification information for identifying the layer set to which they belong.
(6) The transmission device according to (5), in which the identification information is a bitstream level designation value that is set higher for layer sets on the higher-layer side.
(7) The transmission device according to any one of (1) to (6), in which the image encoding unit
generates a single video stream having the encoded image data of the pictures in each layer, or divides the plurality of layers into a predetermined number (two or more) of layer sets and generates the predetermined number of video streams, each having the encoded image data of the pictures in one of the layer sets,
the transmission device further including an information insertion unit that inserts, into the layer of the container, configuration information of the video streams included in the container.
(8) The transmission device according to any one of (1) to (8), in which the transmitting unit divides the plurality of layers into a predetermined number (two or more) of layer sets and assigns a higher priority to the packets carrying the encoded image data of the pictures in lower layer sets.
(9) A transmission method including:
an image encoding step of classifying the image data of each picture constituting moving image data into a plurality of layers, encoding the image data of the pictures in each of the classified layers, and generating a video stream having the encoded image data of the pictures in each layer; and
a transmitting step of transmitting, by a transmitting unit, a container of a predetermined format including the generated video stream,
in which, in the image encoding step, decoding timing information is added to the encoded image data of the pictures in each layer. The decoding timing information is set so that the higher the layer, the shorter the decoding time interval of the encoded image data of each picture.
(10) A receiving device including
a receiving unit that receives a container of a predetermined format including a video stream having the encoded image data of the pictures in each layer, obtained by classifying the image data of each picture constituting moving image data into a plurality of layers and encoding it,
in which decoding timing information, set so that the higher the layer, the shorter the decoding time interval of the encoded image data of each picture, is added to the encoded image data of the pictures in each layer,
the receiving device further including a processing unit that obtains the image data of the pictures in the layers at or below a predetermined layer. It does so by decoding the encoded image data of those pictures, selected from the video stream included in the received container, at the decoding timings indicated by the decoding timing information.
(11) The receiving device according to (10), in which:
the received container includes a single video stream having the encoded image data of the pictures in each layer;
the plurality of layers are divided into a predetermined number (two or more) of layer sets, with a higher priority assigned to the packets carrying the encoded image data of the pictures in lower layer sets; and
the processing unit loads into a buffer, and decodes, the encoded image data of the pictures in the predetermined layer sets carried in packets whose priority is selected according to the decoding capability.
(12) The receiving device according to (10), in which:
the received container includes the predetermined number of video streams, each having the encoded image data of the pictures in one of a predetermined number (two or more) of layer sets obtained by dividing the plurality of layers; and
the processing unit loads into a buffer, and decodes, the encoded image data of the pictures in the predetermined layer set carried by the video stream selected according to the decoding capability.
(13) The receiving device according to any one of (10) to (12), further including a post processing unit that matches the frame rate of the image data of each picture obtained by the processing unit to the display capability.
(14) A receiving method including
a receiving step of receiving, by a receiving unit, a container of a predetermined format including a video stream having the encoded image data of the pictures in each layer, obtained by classifying the image data of each picture constituting moving image data into a plurality of layers and encoding it,
in which decoding timing information, set so that the higher the layer, the shorter the decoding time interval of the encoded image data of each picture, is added to the encoded image data of the pictures in each layer,
the receiving method further including a processing step of obtaining the image data of the pictures in the layers at or below a predetermined layer. This is done by decoding the encoded image data of those pictures, selected from the video stream included in the received container, at the decoding timings indicated by the decoding timing information.
(15) A transmission device including:
an image encoding unit that classifies the image data of each picture constituting moving image data into a plurality of layers, encodes the image data of the pictures in each of the classified layers, and generates a video stream having the encoded image data of the pictures in each layer; and
a transmitting unit that transmits a container of a predetermined format including the generated video stream,
in which the image encoding unit generates a single video stream having the encoded image data of the pictures in each layer, or divides the plurality of layers into a predetermined number (two or more) of layer sets and generates the predetermined number of video streams, each having the encoded image data of the pictures in one of the layer sets,
the transmission device further including an information insertion unit that inserts, into the layer of the container, configuration information of the video streams included in the container.
(16) A transmission device comprising: an image encoding unit that classifies the image data of each picture constituting moving image data into a plurality of layers, encodes the image data of the pictures of each classified layer, and generates a video stream having the coded image data of the pictures of each layer; and
a transmitting unit that transmits a container of a predetermined format including the generated video stream,
wherein the transmitting unit
divides the plurality of layers into a predetermined number (two or more) of layer sets and sets a higher priority for the packets that containerize the coded image data of the pictures of layer sets on the lower layer side.
(17) A coding device comprising an image encoding unit that classifies the image data of each picture constituting moving image data into a plurality of layers, encodes the image data of the pictures of each classified layer, and generates a video stream having the coded image data of the pictures of each layer,
wherein the image encoding unit divides the plurality of layers into a predetermined number (two or more) of layer sets and inserts a bitstream level specification value into each of the substreams corresponding to the layer sets, and
the level specification value inserted into the substream corresponding to each layer set is a level value that covers the pictures of all layers included in the layer sets at or below that layer set.
(18) The coding device according to (17), wherein the image encoding unit
either generates the predetermined number of video streams, each including one of the substreams corresponding to the layer sets, or generates a single video stream including all of the substreams corresponding to the layer sets.
(19) A coding method comprising an image encoding step of classifying the image data of each picture constituting moving image data into a plurality of layers, encoding the image data of the pictures of each classified layer, and generating a video stream having the coded image data of the pictures of each layer,
wherein, in the image encoding step, the plurality of layers is divided into a predetermined number (two or more) of layer sets and a bitstream level specification value is inserted into each of the substreams corresponding to the layer sets, and
the level specification value inserted into the substream corresponding to each layer set is a level value that covers the pictures of all layers included in the layer sets at or below that layer set.
(20) A transmission device comprising: an image encoding unit that classifies the image data of each picture constituting moving image data into a plurality of layers, encodes the image data of the pictures of each classified layer, and generates a video stream having the coded image data of the pictures of each layer,
wherein the image encoding unit divides the plurality of layers into a predetermined number (two or more) of layer sets and inserts a bitstream level specification value into each of the substreams corresponding to the layer sets,
the level specification value inserted into the substream corresponding to each layer set being a level value that covers the pictures of all layers included in the layer sets at or below that layer set;
a transmitting unit that transmits a container of a predetermined format including the generated video stream; and
an information insertion unit that inserts, into the container layer, flag information indicating that the level specification value of the bitstream inserted into the substream of each layer set is a level value covering the pictures of all layers included in the layer sets at or below that layer set.
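Clauses (17) to (20) give each layer set's substream a level value that already counts the pictures of every lower layer set. The Python sketch below illustrates that cumulative rule under stated assumptions: the level is chosen only from the HEVC maximum luma picture size (MaxLumaPs) and maximum luma sample rate (MaxLumaSr) of a few levels, ignoring bit-rate, CPB-size and other limits; the per-layer-set frame rates and the receiver rule in the hypothetical `choose_layer_sets` helper are illustrative, not taken from the source.

```python
from typing import List, Tuple

# Subset of HEVC level limits: (level, general_level_idc, MaxLumaPs, MaxLumaSr).
# Only picture size and luma sample rate are modelled; other limits are ignored.
HEVC_LEVELS = [
    ("4.1", 123, 2_228_224, 133_693_440),
    ("5", 150, 8_912_896, 267_386_880),
    ("5.1", 153, 8_912_896, 534_773_760),
    ("5.2", 156, 8_912_896, 1_069_547_520),
]


def min_level(width: int, height: int, fps: int) -> Tuple[str, int]:
    """Return the lowest listed level whose size and sample-rate limits fit."""
    picture_size = width * height
    sample_rate = picture_size * fps
    for name, idc, max_ps, max_sr in HEVC_LEVELS:
        if picture_size <= max_ps and sample_rate <= max_sr:
            return name, idc
    raise ValueError("format exceeds the levels modelled in this sketch")


def layer_set_levels(width: int, height: int,
                     set_rates: List[int]) -> List[Tuple[str, int]]:
    """Level for each layer set's substream, counting all lower layer sets too.

    set_rates[k] is the frame rate contributed by the pictures of layer set k
    alone (index 0 = lowest layer set).
    """
    levels = []
    cumulative_fps = 0
    for rate in set_rates:
        cumulative_fps += rate  # include every layer set at or below this one
        levels.append(min_level(width, height, cumulative_fps))
    return levels


def choose_layer_sets(decoder_level_idc: int,
                      set_levels: List[Tuple[str, int]]) -> int:
    """Hypothetical receiver rule: how many layer sets, from the bottom, to decode."""
    count = 0
    for _, idc in set_levels:
        if idc > decoder_level_idc:
            break
        count += 1
    return count


levels = layer_set_levels(3840, 2160, [60, 60])
print(levels)                          # [('5.1', 153), ('5.2', 156)]
print(choose_layer_sets(153, levels))  # 1
```

For a 3840x2160 service split into a 60 Hz lower set and a 60 Hz upper set, the upper substream is labelled for the full 120 Hz rate, so a decoder can compare a single number against its own capability instead of summing the layer sets itself.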

The main feature of the present technology is that the encoding interval is calculated for each layer, and a decoding time stamp, set so that the decoding time interval of the coded image data of each picture becomes smaller for higher layers, is added to the coded image data of the pictures in each layer; this enables the receiving side to perform good decoding processing matched to its decoding performance (see FIG. 9). Another main feature is that the plurality of layers is divided into a predetermined number (two or more) of layer sets, and a higher priority is set for the packets that containerize the coded image data of the pictures of layer sets on the lower layer side; as a result, the receiving side can use the priority to take into its buffer only the coded image data of the pictures of the layer sets that match its own decoding capability, avoiding buffer failure (see FIG. 19).
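The second feature above maps naturally onto the one-bit `transport_priority` flag in the MPEG-2 transport stream packet header. The Python sketch below assumes exactly two layer sets (a single bit cannot distinguish more), one PID, and payloads already cut to 184 bytes; the helper names are hypothetical, and it does not build adaptation fields, PES headers or continuity counters.

```python
from typing import Iterable, List, Tuple

TS_PACKET_SIZE = 188
TS_PAYLOAD_SIZE = 184
SYNC_BYTE = 0x47
TRANSPORT_PRIORITY = 0x20  # transport_priority bit in the second header byte


def make_ts_packet(pid: int, high_priority: bool, payload: bytes) -> bytes:
    """Build one payload-only TS packet (continuity counter left at 0)."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID must fit in 13 bits")
    if len(payload) != TS_PAYLOAD_SIZE:
        raise ValueError("this sketch expects exactly 184 payload bytes")
    header = bytes([
        SYNC_BYTE,
        (TRANSPORT_PRIORITY if high_priority else 0) | (pid >> 8),
        pid & 0xFF,
        0x10,  # adaptation_field_control = '01' (payload only)
    ])
    return header + payload


def has_high_priority(packet: bytes) -> bool:
    """Read the transport_priority bit of one TS packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a TS packet")
    return bool(packet[1] & TRANSPORT_PRIORITY)


def packetize(chunks: Iterable[Tuple[int, bytes]], pid: int) -> List[bytes]:
    """chunks are (layer_set, payload); layer set 0 (lower side) is prioritized."""
    return [make_ts_packet(pid, layer_set == 0, data) for layer_set, data in chunks]


def receiver_filter(packets: Iterable[bytes], lower_set_only: bool) -> List[bytes]:
    """Keep only high-priority packets when the decoder handles just the lower set."""
    return [p for p in packets if not lower_set_only or has_high_priority(p)]


stream = packetize([(0, bytes(184)), (1, bytes(184)), (0, bytes(184))], pid=0x100)
print(len(receiver_filter(stream, lower_set_only=True)))  # 2
```

Because the flag sits in the fixed 4-byte header, a receiver can discard upper-layer-set packets before they ever reach its compressed data buffer (cpb), which is one plausible way to realize the buffer-failure avoidance described above.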

10 ... Transmission/reception system
100 ... Transmission device
101 ... CPU
102 ... Encoder
103 ... Compressed data buffer (cpb)
104 ... Multiplexer
105 ... Transmitting unit
121 ... Temporal ID generation unit
122 ... Buffer delay control unit
123 ... HRD setting unit
124 ... Parameter set/SEI encoding unit
125 ... Slice encoding unit
126 ... NAL packetizing unit
141 ... TS priority generation unit
142 ... Section coding unit
143-1 to 143-N ... PES packetizing units
144 ... Transport packetizing unit
200 ... Receiving device
201 ... CPU
202 ... Receiving unit
203 ... Demultiplexer
204 ... Compressed data buffer (cpb)
205 ... Decoder
206 ... Uncompressed data buffer (dpb)
207 ... Post-processing unit
231 ... PCR extraction unit
232 ... Time stamp extraction unit
233 ... Section extraction unit
234 ... TS priority extraction unit
235 ... PES payload extraction unit
236 ... Picture selection unit
251 ... Temporal ID analysis unit
252 ... Target layer selection unit
253 ... Stream integration unit
254 ... Decoding unit
271 ... Interpolation unit
272 ... Subsampling unit
273 ... Switch unit

Claims

A receiving device comprising a receiving unit that receives first coded image data including the pictures on a lower layer side and second coded image data including the pictures on a higher layer side, the image data of the pictures constituting moving image data having been classified into a plurality of layers and encoded such that, when the image data of the classified pictures is decoded from the lowest layer toward the higher layers, the decoding order and the display order of the image data of the pictures differ, and that also receives a first level specification value, corresponding to the first coded image data, based on a frame rate of 60 Hz consisting only of the pictures on the lower layer side, and a second level specification value, corresponding to the second coded image data, based on a frame rate of 120 Hz including both the pictures on the lower layer side and the pictures on the higher layer side,
wherein the decoding time of each picture included in the first coded image data is specified, in additional information of the access unit of that picture, as a time at 60 Hz intervals; the decoding time of each picture included in the second coded image data is specified, in additional information of the access unit of that picture, as a time at 60 Hz intervals; and the data is encoded such that the decoding timing of each picture included in the second coded image data is an intermediate timing between the decoding timings of the pictures included in the first coded image data,
the receiving device further comprising a processing unit that calculates the specified decoding time of each picture included in the received first coded image data and second coded image data on the basis of the additional information of the access unit of each picture.

The receiving device according to claim 1, wherein, on the basis of the first level specification value and the second level specification value, the processing unit decodes either the first coded image data alone, or the first coded image data and the second coded image data, to obtain moving image data.

A receiving method comprising a receiving step of receiving first coded image data including the pictures on a lower layer side and second coded image data including the pictures on a higher layer side, the image data of the pictures constituting moving image data having been classified into a plurality of layers and encoded such that, when the image data of the classified pictures is decoded from the lowest layer toward the higher layers, the decoding order and the display order of the image data of the pictures differ, and of also receiving a first level specification value, corresponding to the first coded image data, based on a frame rate of 60 Hz consisting only of the pictures on the lower layer side, and a second level specification value, corresponding to the second coded image data, based on a frame rate of 120 Hz including both the pictures on the lower layer side and the pictures on the higher layer side,
wherein the decoding time of each picture included in the first coded image data is specified, in additional information of the access unit of that picture, as a time at 60 Hz intervals; the decoding time of each picture included in the second coded image data is specified, in additional information of the access unit of that picture, as a time at 60 Hz intervals; and the data is encoded such that the decoding timing of each picture included in the second coded image data is an intermediate timing between the decoding timings of the pictures included in the first coded image data,
the receiving method further comprising a processing step of calculating the specified decoding time of each picture included in the received first coded image data and second coded image data on the basis of the additional information of the access unit of each picture.
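Claims 1 and 3 describe two groups of pictures, each with decoding times at 60 Hz intervals, where the higher-layer pictures fall between the lower-layer ones so that the combined stream decodes at 120 Hz. The Python sketch below shows that timing under stated assumptions: decoding times are expressed in 90 kHz ticks (the MPEG-2 systems time-stamp clock), and the higher-layer offset is exactly half an interval, although the claims only say "intermediate" and do not specify how the access-unit additional information encodes the time. All function names are hypothetical.

```python
from typing import List, Tuple

CLOCK_HZ = 90_000               # MPEG-2 systems time-stamp clock
BASE_INTERVAL = CLOCK_HZ // 60  # 1500 ticks between pictures of each group


def decode_times(n: int, start: int = 0) -> Tuple[List[int], List[int]]:
    """Decoding times for n lower-layer and n higher-layer pictures.

    Both groups are spaced at 60 Hz; the higher-layer group is offset by half
    an interval (an assumption) so each picture sits between two lower ones.
    """
    lower = [start + i * BASE_INTERVAL for i in range(n)]
    higher = [t + BASE_INTERVAL // 2 for t in lower]
    return lower, higher


def schedule(lower: List[int], higher: List[int], decode_higher: bool) -> List[int]:
    """Decoding instants used by the receiver: 60 Hz alone, or 120 Hz merged."""
    return sorted(lower + higher) if decode_higher else list(lower)


def intervals(times: List[int]) -> List[int]:
    """Gaps between consecutive decoding instants, in 90 kHz ticks."""
    return [b - a for a, b in zip(times, times[1:])]


lower, higher = decode_times(3)
print(lower, higher)                              # [0, 1500, 3000] [750, 2250, 3750]
print(intervals(schedule(lower, higher, False)))  # [1500, 1500]
print(intervals(schedule(lower, higher, True)))   # [750, 750, 750, 750, 750]
```

A receiver limited to the 60 Hz level value would decode only the lower-layer times and keep the 1500-tick spacing, while one that admits the 120 Hz level value merges both groups and sees the interval halve, which matches the idea that higher layers are decoded at smaller time intervals.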