JP2020205642A

JP2020205642A - Transmission method and transmission apparatus

Info

Publication number: JP2020205642A
Application number: JP2020164375A
Authority: JP
Inventor: Ikuo Tsukagoshi (塚越 郁夫)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2019-12-19
Filing date: 2020-09-30
Publication date: 2020-12-24
Anticipated expiration: 2033-08-27
Also published as: JP6950802B2

Abstract

To enable favorable decoding processing according to the decoding capability on the receiving side. SOLUTION: The image data of each picture constituting moving image data is hierarchically encoded to generate a first video stream having the encoded image data of pictures on the lower-layer side and a second video stream having the encoded image data of pictures on the higher-layer side. The first video stream and the second video stream are transmitted. Identification information of the first video stream is transmitted in association with the first video stream, and identification information of the video stream obtained by combining the first and second video streams is transmitted in association with the second video stream. SELECTED DRAWING: Figure 12

Description

The present technology relates to a transmission device, a transmission method, a reception device, and a reception method. More specifically, the present technology relates to a transmission device and the like that hierarchically encodes and transmits the image data of each picture constituting moving image data.

When compressed video is offered as a service over broadcasting, the Internet, and the like, the maximum frame frequency that can be reproduced is limited by the decoding capability of the receiver. Service providers must therefore take into account the reproduction capability of receivers in widespread use, and either limit themselves to low frame frequency services or provide high and low frame frequency services simultaneously.

Supporting high frame frequency services makes receivers expensive, which hinders their early adoption. If only inexpensive receivers dedicated to low frame frequency services are in widespread use at first, and the service side later starts high frame frequency services, those services cannot be viewed at all without new receivers. This hinders the adoption of the new services.

For example, HEVC (High Efficiency Video Coding) provides temporal scalability by hierarchically encoding the image data of each picture constituting moving image data (see Non-Patent Document 1). On the receiving side, the layer of each picture can be identified from the temporal ID (temporal_id) inserted in the header of each NAL (Network Abstraction Layer) unit. This makes it possible to selectively decode only the layers up to the one that matches the receiver's decoding capability.
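The following Python sketch roughly illustrates this selective decoding. It keeps only the pictures whose temporal_id does not exceed the highest layer a receiver can handle. The `AccessUnit` type and the `select_up_to` function are hypothetical names introduced for this example; they are not part of HEVC or of any library. A real receiver would read temporal_id from each NAL unit header.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccessUnit:
    """Hypothetical container for one coded picture."""
    decode_order: int   # position in encoding/decoding order
    temporal_id: int    # layer, i.e. nuh_temporal_id_plus1 - 1


def select_up_to(units: List[AccessUnit], max_temporal_id: int) -> List[AccessUnit]:
    """Keep only pictures in layers 0..max_temporal_id, preserving decode order.

    Referenced pictures sit in the same or a lower layer, so dropping the
    higher layers never removes a picture that a kept picture depends on.
    """
    return [au for au in units if au.temporal_id <= max_temporal_id]


# Five layers (0-4); a receiver that can only handle layers 0-3.
stream = [AccessUnit(i, tid) for i, tid in enumerate([0, 0, 1, 2, 3, 4, 4, 3, 4, 4])]
kept = select_up_to(stream, 3)
print([au.decode_order for au in kept])  # [0, 1, 2, 3, 4, 7]
```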

Non-Patent Document 1: Gary J. Sullivan, Jens-Rainer Ohm, Woo-Jin Han, Thomas Wiegand, "Overview of the High Efficiency Video Coding (HEVC) Standard," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 22, No. 12, pp. 1649-1668, December 2012.

An object of the present technology is to enable favorable decoding processing according to the decoding capability on the receiving side.

The concept of the present technology is a transmission device including:
an image encoding unit that classifies the image data of each picture constituting moving image data into a plurality of layers, encodes the image data of the pictures in each classified layer, and generates video data having the encoded image data of the pictures in each layer;
a transmission unit that transmits a container of a predetermined format containing the generated video data; and
an identification information insertion unit that divides the plurality of layers into a predetermined number (two or more) of layer sets and inserts identification information into each packet containing the video data, the identification information indicating to which layer set the encoded image data of each picture included in the video data belongs.

In the present technology, the image encoding unit encodes the image data of each picture constituting the moving image data to generate video data. Specifically, the image data of each picture is classified into a plurality of layers and encoded, and the result is video data having the encoded image data of the pictures in each layer.

The transmission unit transmits a container of a predetermined format containing this video data. For example, the container may be a transport stream (MPEG-2 TS) as adopted in digital broadcasting standards. Alternatively, the container may be MP4, as used for Internet distribution and the like, or a container of another format.

The identification information insertion unit divides the plurality of layers into a predetermined number (two or more) of layer sets. It then inserts identification information into each packet containing the video data, indicating to which layer set the encoded image data of each picture included in the video data belongs. For example, the identification information may be priority information that is set higher for layer sets on the lower-layer side.
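Below is a minimal Python sketch of dividing layers into layer sets and deriving the priority. It assumes the split that this description later uses for FIG. 3: layers 0 to 3 form the lower set and layer 4 forms the higher set. It also assumes that the lowest set receives the highest numeric priority. The `LAYER_SETS` table and both functions are illustrative and are not defined by the source.

```python
from typing import List

# Illustrative split of five temporal layers into two layer sets
# (assumed, following the FIG. 3 example): set 0 = layers 0-3, set 1 = layer 4.
LAYER_SETS: List[range] = [range(0, 4), range(4, 5)]


def layer_set_of(temporal_id: int) -> int:
    """Return the index of the layer set containing temporal_id (0 = lowest set)."""
    for index, layers in enumerate(LAYER_SETS):
        if temporal_id in layers:
            return index
    raise ValueError(f"temporal_id {temporal_id} is not in any layer set")


def priority_for_layer(temporal_id: int) -> int:
    """Priority is higher for layer sets on the lower-layer side.

    With two sets this reduces to a single flag: 1 for the low set, 0 for the high set.
    """
    return len(LAYER_SETS) - 1 - layer_set_of(temporal_id)


print([priority_for_layer(tid) for tid in range(5)])  # [1, 1, 1, 1, 0]
```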

For example, the identification information may be inserted into the header of a PES packet whose payload contains the encoded image data of each picture. In this case, the identification information may be inserted using the PES priority field of the header. Alternatively, the identification information may be inserted into the adaptation field of a TS packet that has one. In this case, it may be inserted using the ES priority indicator field of the adaptation field. As a further alternative, the identification information may be inserted into a box in the header associated with the track of the corresponding picture.
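The following Python sketch shows one way a multiplexer might set the two flags named above. The bit positions follow the MPEG-2 systems syntax (ISO/IEC 13818-1):

- PES_priority is bit 3 of the byte that follows PES_packet_length.
- elementary_stream_priority_indicator is bit 5 of the adaptation-field flags byte.

The function names are hypothetical, and the buffers are assumed to be already packetized. Mapping the value 1 to the lower layer set is also an assumption.

```python
PES_PRIORITY_MASK = 0x08  # bit 3 of the byte after PES_packet_length
ES_PRIORITY_MASK = 0x20   # elementary_stream_priority_indicator in the adaptation-field flags


def set_pes_priority(pes: bytearray, high: bool) -> None:
    """Set or clear PES_priority in an MPEG-2 PES packet header, in place."""
    if pes[0:3] != b"\x00\x00\x01":
        raise ValueError("missing PES start code")
    if pes[6] & 0xC0 != 0x80:
        raise ValueError("no optional PES header ('10' marker bits missing)")
    if high:
        pes[6] |= PES_PRIORITY_MASK
    else:
        pes[6] &= ~PES_PRIORITY_MASK & 0xFF


def set_es_priority_indicator(ts: bytearray, high: bool) -> None:
    """Set or clear elementary_stream_priority_indicator in a 188-byte TS packet."""
    if len(ts) != 188 or ts[0] != 0x47:
        raise ValueError("not a TS packet")
    if not ts[3] & 0x20:  # adaptation_field_control says no adaptation field
        raise ValueError("packet has no adaptation field")
    if ts[4] == 0:        # adaptation_field_length too short to hold the flags byte
        raise ValueError("adaptation field carries no flags")
    if high:
        ts[5] |= ES_PRIORITY_MASK
    else:
        ts[5] &= ~ES_PRIORITY_MASK & 0xFF


# Usage: a minimal video PES header (stream_id 0xE0) and a TS packet with an adaptation field.
pes = bytearray(b"\x00\x00\x01\xe0\x00\x00\x80\x80\x05") + bytearray(5)
set_pes_priority(pes, True)
ts = bytearray(188)
ts[0], ts[3], ts[4] = 0x47, 0x30, 1
set_es_priority_indicator(ts, True)
print(hex(pes[6]), hex(ts[5]))  # 0x88 0x20
```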

In this way, the present technology inserts identification information into each packet containing the video data, indicating to which layer set the encoded image data of each picture belongs. Using this identification information, the receiving side can easily and selectively decode the encoded image data of the pictures in layers at or below a predetermined layer that matches its decoding capability.

In the present technology, the image encoding unit may, for example, generate either of the following:

- a single video stream having the encoded image data of the pictures in all layers; or
- a predetermined number of video streams, obtained by dividing the plurality of layers into a predetermined number (two or more) of layer sets, each stream having the encoded image data of the pictures of one layer set.

A configuration information insertion unit may further be provided to insert configuration information about the video streams included in the container into the container layer. In this case, the receiving side can easily grasp the configuration of the video streams from this information.

Another concept of the present technology is a reception device including:
a reception unit that receives a container of a predetermined format containing video data having the encoded image data of the pictures in each layer, obtained by classifying the image data of each picture constituting moving image data into a plurality of layers and encoding it; and
an image decoding unit that selectively takes into a buffer, from the video data contained in the received container, the encoded image data of the pictures in layers at or below a predetermined layer that matches the decoding capability, and decodes the encoded image data of each picture in the buffer to obtain the image data of the pictures in those layers.

In the present technology, the reception unit receives a container of a predetermined format. This container contains video data having the encoded image data of the pictures in each layer, obtained by classifying the image data of each picture constituting moving image data into a plurality of layers and encoding it.

From the video data contained in the received container, the image decoding unit selectively takes into a buffer the encoded image data of the pictures in layers at or below a predetermined layer that matches the decoding capability. It then decodes the encoded image data of each picture in the buffer to obtain the image data of the pictures in those layers.

For example, the plurality of layers may be divided into a predetermined number (two or more) of layer sets. Identification information indicating to which layer set the encoded image data of each picture belongs may be inserted into each packet containing the video data. Based on this identification information, the image decoding unit may take into the buffer, and decode, the encoded image data of the pictures of the layer sets that match its decoding capability.

In this case, the identification information may be inserted in any of the following places:

- the header of a PES packet whose payload contains the encoded image data of each picture;
- the adaptation field of a TS packet that has an adaptation field; or
- a box in the header associated with the track of the corresponding picture.
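On the receiving side, the selection can be as simple as a filter on the flag before the coded picture buffer. The following Python sketch assumes that each PES packet carries one picture and that a PES_priority of 1 marks the lower layer set. The `PesPicture` type and the `fill_buffer` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PesPicture:
    """Hypothetical view of one PES packet carrying one coded picture."""
    pes_priority: int  # 1 = lower layer set, 0 = higher layer set (assumed mapping)
    dts: int           # decoding time stamp in 90 kHz units
    payload: bytes


def fill_buffer(packets: Iterable[PesPicture], decode_all_sets: bool) -> List[PesPicture]:
    """Select which pictures enter the coded picture buffer.

    A receiver that can decode every layer keeps everything; a lower-capability
    receiver keeps only packets flagged as belonging to the lower layer set.
    """
    if decode_all_sets:
        return list(packets)
    return [p for p in packets if p.pes_priority == 1]


packets = [PesPicture(1, 0, b"a"), PesPicture(0, 750, b"b"),
           PesPicture(1, 1500, b"c"), PesPicture(0, 2250, b"d")]
low = fill_buffer(packets, decode_all_sets=False)
print([p.dts for p in low])  # [0, 1500]
```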

Alternatively, the plurality of layers may be divided into a predetermined number (two or more) of layer sets, and the received container may contain a predetermined number of video streams, each having the encoded image data of the pictures of one layer set. Based on stream identification information, the image decoding unit may take into the buffer, and decode, the encoded image data of the pictures of the layer sets that match its decoding capability. If the encoded image data of the selected layer sets is spread over a plurality of video streams, the image decoding unit may first combine the pictures into a single stream based on decoding timing information and then take that stream into the buffer.
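Combining several streams into one based on decoding timing amounts to a merge ordered by decoding time stamp. The following Python sketch assumes that each elementary stream is already in increasing DTS order and that pictures are reduced to `(dts, picture_id)` pairs. The picture identifiers and the `merge_by_dts` name are illustrative. The sketch uses the standard library's `heapq.merge`.

```python
import heapq
from typing import Iterable, List, Tuple

# (dts, picture_id) pairs; dts in 90 kHz units. Picture ids are illustrative.
Picture = Tuple[int, str]


def merge_by_dts(*streams: Iterable[Picture]) -> List[Picture]:
    """Interleave pictures from several elementary streams into one decode-order
    stream, using the decoding time stamp as the ordering key.

    Each input stream must already be sorted by DTS.
    """
    return list(heapq.merge(*streams, key=lambda pic: pic[0]))


base = [(0, "B0"), (1500, "B1"), (3000, "B2")]
extended = [(750, "E0"), (2250, "E1")]
merged = merge_by_dts(base, extended)
print([pid for _, pid in merged])  # ['B0', 'E0', 'B1', 'E1', 'B2']
```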

In this way, the present technology selectively takes into a buffer, from the received video data, the encoded image data of the pictures in layers at or below a predetermined layer that matches the decoding capability, and decodes it. Decoding processing appropriate to the decoding capability is therefore possible.

In the present technology, the image decoding unit may, for example, rewrite the decoding time stamps of the encoded image data of the pictures selectively taken into the buffer, so as to adjust the decoding interval of the lower-layer pictures. This allows even a decoder with low decoding capability to decode without strain.
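The text does not specify how the new decoding time stamps are chosen. One plausible approach, assumed here, is to re-space the selected pictures at a uniform interval that matches the decoder's rate, starting from the first selected DTS. The `respace_dts` function below is a hypothetical sketch of that idea. It does not check that each rewritten DTS still precedes the corresponding presentation time stamp.

```python
from typing import List


def respace_dts(dts_values: List[int], interval: int) -> List[int]:
    """Rewrite the DTS of the selected pictures so that consecutive pictures are
    exactly `interval` ticks apart, starting from the first original DTS.

    dts_values must be in decode order. Whether the new values still precede
    each picture's PTS is not checked here.
    """
    if not dts_values:
        return []
    start = dts_values[0]
    return [start + k * interval for k in range(len(dts_values))]


# 90 kHz clock: 750 ticks = 1/120 s, 1500 ticks = 1/60 s.
# Lower-layer pictures taken from a 120 Hz stream arrive at irregular gaps;
# a 60 Hz decoder needs at least 1500 ticks between them.
selected = [0, 750, 3000, 3750, 6000]
print(respace_dts(selected, 1500))  # [0, 1500, 3000, 4500, 6000]
```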

Further, the present technology may, for example, include a post-processing unit that matches the frame rate of the image data of each picture obtained by the image decoding unit to the display capability. In this case, image data at a frame rate suited to a high display capability can be obtained even when the decoding capability is low.
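The post-processing method is not specified here. The following Python sketch assumes the simplest options: repeating frames when the display is faster than the decoded rate, and dropping frames when it is slower, with only integer rate ratios supported. The `match_display_rate` function is hypothetical. A real post-processor might interpolate frames instead.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def match_display_rate(frames: Sequence[Frame], decoded_fps: int, display_fps: int) -> List[Frame]:
    """Convert decoded frames to the display rate by frame repetition
    (display faster) or decimation (display slower); integer ratios only."""
    if display_fps >= decoded_fps:
        if display_fps % decoded_fps:
            raise ValueError("display rate must be an integer multiple of the decoded rate")
        repeat = display_fps // decoded_fps
        return [f for f in frames for _ in range(repeat)]
    if decoded_fps % display_fps:
        raise ValueError("decoded rate must be an integer multiple of the display rate")
    return list(frames)[:: decoded_fps // display_fps]


# A decoder limited to 60 fps feeding a 120 Hz display.
print(match_display_rate(["a", "b", "c"], 60, 120))  # ['a', 'a', 'b', 'b', 'c', 'c']
```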

According to the present technology, favorable decoding processing according to the decoding capability is possible on the receiving side. The effects described here are not necessarily limiting; any of the effects described in the present disclosure may be obtained.

FIG. 1 is a block diagram showing a configuration example of a transmission/reception system as an embodiment.
FIG. 2 is a block diagram showing a configuration example of a transmission device.
FIG. 3 is a diagram showing an example of the hierarchical coding performed by an encoder.
FIG. 4 is a diagram showing a structure example (Syntax) of a NAL unit header and the contents (Semantics) of the main parameters in that structure example.
FIG. 5 is a diagram for explaining the configuration of the encoded image data of each picture in HEVC.
FIG. 6 is a diagram showing an example of the encoding, decoding, and display order and the delay in hierarchical coding.
FIG. 7 is a diagram showing an encoded stream of hierarchical coding and the expected display (display order) at a specified layer.
FIG. 8 is a diagram showing a structure example (Syntax) of the HEVC descriptor (HEVC_descriptor).
FIG. 9 is a diagram showing the contents (Semantics) of the main information in the structure example of the HEVC descriptor.
FIG. 10 is a diagram showing a structure example (Syntax) of the scalability extension descriptor (scalability_extension_descriptor).
FIG. 11 is a diagram showing the contents (Semantics) of the main information in the structure example of the scalability extension descriptor.
FIG. 12 is a block diagram showing a configuration example of a multiplexer.
FIG. 13 is a diagram showing an example of the processing flow of the multiplexer.
FIG. 14 is a diagram showing a configuration example of the transport stream TS for delivery as a single stream.
FIG. 15 is a block diagram showing a configuration example of a reception device.
FIG. 16 is a block diagram showing a configuration example of a demultiplexer.
FIG. 17 is a diagram showing a case in which the transport stream TS contains a single video stream (encoded stream).
FIG. 18 is a diagram showing a case in which the transport stream TS contains two video streams (encoded streams), a base stream and an extended stream.
FIG. 19 is a diagram for explaining the function of rewriting the decoding time stamps of the encoded image data of each picture to adjust the decoding interval of the lower-layer pictures.
FIG. 20 is a diagram showing an example of the processing flow (1 frame) of the demultiplexer.
FIG. 21 is a diagram showing an example of the processing flow (2 frames) of the demultiplexer.
FIG. 22 is a block diagram showing a configuration example of a decoder.
FIG. 23 is a diagram showing a configuration example of a post-processing unit.
FIG. 24 is a diagram showing an example of the processing flow of the decoder and the post-processing unit.
FIG. 25 is a diagram showing an example of the arrangement of the adaptation field.
FIG. 26 is a block diagram showing a configuration example of the multiplexer when the layer set identification information is inserted into the adaptation field.
FIG. 27 is a diagram showing a configuration example of the transport stream TS when the layer set identification information is inserted into the adaptation field.
FIG. 28 is a block diagram showing a configuration example of the demultiplexer when the layer set identification information is inserted into the adaptation field.
FIG. 29 is a diagram showing a configuration example of an MP4 stream.
FIG. 30 is a diagram showing a structure example of "SampleDependencyTypeBox".
FIG. 31 is a diagram showing the contents of the main information in the structure example of "SampleDependencyTypeBox".
FIG. 32 is a diagram showing a structure example of "SampleScalablePriorityBox".
FIG. 33 is a diagram showing the contents of the main information in the structure example of "SampleScalablePriorityBox".

Hereinafter, modes for carrying out the invention (hereinafter referred to as "embodiments") will be described. The description will be given in the following order.
1. Embodiment
2. Modification

<1. Embodiment>
[Transmission/Reception System]
FIG. 1 shows a configuration example of a transmission/reception system 10 as an embodiment. The transmission/reception system 10 includes a transmission device 100 and a reception device 200.

The transmission device 100 transmits a transport stream TS as a container on a broadcast wave. The transport stream TS contains a video stream having the encoded image data of the pictures in each layer, obtained by classifying the image data of each picture constituting moving image data into a plurality of layers. Encoding such as H.264/AVC or HEVC is applied so that each referenced picture belongs to the referencing picture's own layer and/or a layer lower than it.
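The reference constraint in the last sentence can be checked mechanically. In the following Python sketch, `layer_of` maps picture numbers to layers and `refs` lists what each picture references. The picture numbers echo the FIG. 3 example, but the layer values are illustrative assumptions rather than values read from the figure. The `references_are_valid` function is hypothetical.

```python
from typing import Dict, List


def references_are_valid(layer_of: Dict[int, int], refs: Dict[int, List[int]]) -> bool:
    """Return True if every picture references only pictures in its own layer
    or a lower one.

    layer_of: picture number -> layer (temporal_id)
    refs:     picture number -> referenced picture numbers
    """
    return all(
        layer_of[ref] <= layer_of[pic]
        for pic, targets in refs.items()
        for ref in targets
    )


# Illustrative layers; picture 3 (a B picture) references pictures 1 and 2.
layer_of = {1: 0, 2: 0, 3: 1, 4: 2}
refs = {2: [1], 3: [1, 2], 4: [1, 3]}
print(references_are_valid(layer_of, refs))  # True
```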

Layer identification information, identifying the layer to which each picture belongs, is added picture by picture to the encoded image data of the pictures in each layer. In this embodiment, "nuh_temporal_id_plus1", which carries the layer identification information (temporal_id), is placed in the header of the NAL unit (nal_unit) of each picture. This allows the receiving side to identify the layer of each picture at the NAL unit level, and to selectively extract and decode the encoded image data of the layers at or below a predetermined layer.

In this embodiment, the plurality of layers are divided into a predetermined number (two or more) of layer sets. Identification information indicating to which layer set the encoded image data of each picture in the video stream belongs is inserted into the layer of the video stream.

In this embodiment, this identification information is priority information that is set higher for layer sets on the lower-layer side. It is inserted into the header of the PES packet whose payload contains the encoded image data of each picture. With this identification information, the receiving side can take into the buffer, and process, only the encoded image data of the pictures of the layer sets that match its own decoding capability.

The transport stream TS contains either a single video stream having the encoded image data of the pictures in all layers, or a predetermined number of video streams, each having the encoded image data of the pictures of one of the layer sets described above. Layer information for the hierarchical coding and configuration information for the video streams are inserted into the transport stream TS. This information allows the receiving side to easily grasp the layer structure and the stream structure and to perform appropriate decoding processing.

The reception device 200 receives the transport stream TS described above, sent from the transmission device 100 on a broadcast wave. From the video streams contained in the transport stream TS, the reception device 200 selectively takes into a buffer the encoded image data of the pictures in layers at or below a predetermined layer, selected according to its decoding capability. It then decodes this data to obtain the image data of each picture and reproduces the images.

For example, as described above, the transport stream TS may contain a single video stream having the encoded image data of pictures in a plurality of layers. In that case, based on the identification information described above, the encoded image data of the pictures of the layer sets that match the decoding capability is taken into the buffer and processed.

Alternatively, as described above, the transport stream TS may contain a predetermined number of video streams, each having the encoded image data of the pictures of one of the layer sets obtained by dividing the plurality of layers into a predetermined number (two or more) of sets. In that case, based on stream identification information, the encoded image data of the pictures of the layer sets that match the decoding capability is taken into the buffer and processed.

The reception device 200 also rewrites the decoding time stamps of the encoded image data of the pictures selectively taken into the buffer, adjusting the decoding interval of the lower-layer pictures. This adjustment allows even a decoder with low decoding capability to decode without strain.

The reception device 200 further performs post-processing that matches the frame rate of the decoded image data of each picture to the display capability. With this post-processing, image data at a frame rate suited to a high display capability can be obtained even when, for example, the decoding capability is low.

"Configuration of Transmission Device"
FIG. 2 shows a configuration example of the transmission device 100. The transmission device 100 includes a CPU (Central Processing Unit) 101, an encoder 102, a compressed data buffer (cpb: coded picture buffer) 103, a multiplexer 104, and a transmission unit 105. The CPU 101 is a control unit and controls the operation of each unit of the transmission device 100.

The encoder 102 receives uncompressed moving image data and performs hierarchical coding. The encoder 102 classifies the image data of each picture constituting the moving image data into a plurality of layers. It then encodes the image data of the pictures in each classified layer and generates a video stream having the encoded image data of the pictures in each layer. The encoder 102 performs encoding such as H.264/AVC or HEVC. In doing so, the encoder 102 ensures that each referenced picture belongs to the referencing picture's own layer and/or a layer lower than it.

FIG. 3 shows an example of the hierarchical coding performed by the encoder 102. In this example, the pictures are classified into five layers, 0 to 4, and the image data of the pictures in each layer is encoded with, for example, HEVC.

The vertical axis indicates the layer. The values 0 to 4 are set, respectively, as the temporal_id (layer identification information) placed in the header of the NAL units (nal_unit) constituting the encoded image data of the pictures in layers 0 to 4. The horizontal axis indicates the display order (POC: picture order of composition). Pictures further to the left are displayed earlier, and pictures further to the right are displayed later.

FIG. 4(a) shows a structure example (Syntax) of the NAL unit header, and FIG. 4(b) shows the contents (Semantics) of the main parameters in that structure example. The fields are as follows:

- "Forbidden_zero_bit" (1 bit) must be 0.
- "Nal_unit_type" (6 bits) indicates the NAL unit type.
- "Nuh_layer_id" (6 bits) is assumed to be 0.
- "Nuh_temporal_id_plus1" (3 bits) indicates temporal_id and takes the value temporal_id plus 1 (1 to 7).
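The two-byte header described above can be unpacked with shifts and masks: 1 + 6 + 6 + 3 bits, most significant bit first. The following Python sketch uses the lowercase field names from the HEVC specification. `NalUnitHeader` and `parse_nal_unit_header` are names chosen for this example, not part of any library.

```python
from typing import NamedTuple


class NalUnitHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_unit_type: int
    nuh_layer_id: int
    temporal_id: int  # nuh_temporal_id_plus1 - 1


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Decode the two-byte HEVC NAL unit header (fields as in FIG. 4)."""
    if len(data) < 2:
        raise ValueError("HEVC NAL unit header is two bytes")
    word = (data[0] << 8) | data[1]
    forbidden = word >> 15             # bit 15
    nal_type = (word >> 9) & 0x3F      # bits 14-9
    layer_id = (word >> 3) & 0x3F      # bits 8-3
    tid_plus1 = word & 0x07            # bits 2-0
    if forbidden != 0 or tid_plus1 == 0:
        raise ValueError("malformed NAL unit header")
    return NalUnitHeader(forbidden, nal_type, layer_id, tid_plus1 - 1)


# A slice NAL unit (type 1) of a layer-4 picture: nuh_temporal_id_plus1 = 5.
print(parse_nal_unit_header(bytes([0x02, 0x05])))
# NalUnitHeader(forbidden_zero_bit=0, nal_unit_type=1, nuh_layer_id=0, temporal_id=4)
```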

Returning to FIG. 3, each rectangular frame represents a picture. The number in each frame indicates the order in which the pictures are encoded, that is, the encoding order (the decoding order on the receiving side). The 16 pictures from "1" to "17", excluding "2", form a sub-group of pictures (Sub group of pictures), and "1" is the first picture of that sub-group. "2" is the first picture of the next sub-group. Alternatively, the 16 pictures from "2" to "17", excluding "1", can be regarded as forming a sub-group of pictures, with "2" as its first picture.

「１」のピクチャは、ＧＯＰ（Group Of Pictures）の先頭のピクチャとなり得る。ＧＯＰの先頭ピクチャの符号化画像データは、図５に示すように、ＡＵＤ、ＶＰＳ、ＳＰＳ、ＰＰＳ、ＰＳＥＩ、ＳＬＩＣＥ、ＳＳＥＩ、ＥＯＳのＮＡＬユニットにより構成される。一方、ＧＯＰの先頭ピクチャ以外のピクチャは、ＡＵＤ、ＰＰＳ、ＰＳＥＩ、ＳＬＩＣＥ、ＳＳＥＩ、ＥＯＳのＮＡＬユニットにより構成される。ＶＰＳはＳＰＳと共に、シーケンス（ＧＯＰ）に一度、ＰＰＳは毎ピクチャで伝送可能とされている。 Picture "1" can be the first picture of a GOP (Group Of Pictures). As shown in FIG. 5, the encoded image data of the first picture of a GOP consists of the NAL units AUD, VPS, SPS, PPS, PSEI, SLICE, SSEI, and EOS, whereas every other picture consists of the NAL units AUD, PPS, PSEI, SLICE, SSEI, and EOS. The VPS, like the SPS, can be transmitted once per sequence (GOP), while the PPS can be transmitted with every picture.
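
The NAL unit composition above can be checked with a short sketch. It assumes the standard HEVC `nal_unit_type` codes (VPS = 32, SPS = 33, PPS = 34, AUD = 35, EOS = 36, prefix SEI = 39, suffix SEI = 40, and 0 to 31 for coded slices); the helper names `nal_name` and `is_gop_head` are illustrative only.

```python
# HEVC nal_unit_type codes for the non-VCL units named in FIG. 5
NON_VCL_NAMES = {32: "VPS", 33: "SPS", 34: "PPS", 35: "AUD",
                 36: "EOS", 39: "PSEI", 40: "SSEI"}


def nal_name(nal_unit_type: int) -> str:
    if nal_unit_type < 32:
        return "SLICE"  # types 0..31 are VCL (coded slice segment) NAL units
    return NON_VCL_NAMES.get(nal_unit_type, f"OTHER({nal_unit_type})")


def is_gop_head(nal_unit_types) -> bool:
    """True if an access unit carries VPS and SPS, as the first picture of a GOP does."""
    names = {nal_name(t) for t in nal_unit_types}
    return {"VPS", "SPS"} <= names


gop_head = [35, 32, 33, 34, 39, 19, 40, 36]   # AUD VPS SPS PPS PSEI SLICE SSEI EOS
other_pic = [35, 34, 39, 1, 40, 36]           # AUD PPS PSEI SLICE SSEI EOS
print(is_gop_head(gop_head), is_gop_head(other_pic))  # True False
```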

図３に戻って、実線矢印は、符号化におけるピクチャの参照関係を示している。例えば、「１」のピクチャは、Ｉピクチャであり、他のピクチャを参照しない。「２」のピクチャは、Ｐピクチャであり、「１」のピクチャを参照して符号化される。また、「３」のピクチャは、Ｂピクチャであり、「１」、「２」のピクチャを参照して符号化される。以下、同様、その他のピクチャは、表示順で近くのピクチャを参照して符号化される。なお、階層４のピクチャは、他のピクチャからの参照がない。 Returning to FIG. 3, the solid arrows show the reference relationships between pictures in encoding. For example, picture "1" is an I picture and does not reference any other picture. Picture "2" is a P picture and is encoded with reference to picture "1". Picture "3" is a B picture and is encoded with reference to pictures "1" and "2". The remaining pictures are likewise encoded with reference to nearby pictures in display order. Pictures in layer 4 are not referenced by any other picture.

エンコーダ１０２は、各階層のピクチャの符号化画像データを持つ単一のビデオストリーム（シングルストリーム）を生成するか、あるいは、複数の階層を２以上の所定数の階層組に分割し、各階層組のピクチャの符号化画像データをそれぞれ持つ所定数のビデオストリーム（マルチストリーム）を生成する。例えば、図３の階層符号化の例において、階層０から３を低階層の階層組とし、階層４を高階層の階層組として２つの階層組に分割されるとき、エンコーダ１０２は、各階層組のピクチャの符号化画像データをそれぞれ持つ２つのビデオストリーム（符号化ストリーム）を生成する。 The encoder 102 either generates a single video stream (single stream) containing the encoded image data of the pictures of all layers, or divides the layers into a predetermined number (two or more) of layer sets and generates that number of video streams (multi-stream), each containing the encoded image data of the pictures of one layer set. For example, in the hierarchical coding example of FIG. 3, when the layers are divided into two layer sets, with layers 0 to 3 as the low layer set and layer 4 as the high layer set, the encoder 102 generates two video streams (encoded streams), each containing the encoded image data of the pictures of one layer set.
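
A minimal sketch of the layer-set split, assuming each picture is represented by a hypothetical `(decode_order, temporal_id)` tuple and that pictures are simply routed by temporal_id. The sample temporal_id values are made up for illustration and are not read from FIG. 3.

```python
def split_into_layer_sets(pictures, boundaries):
    """Route (decode_order, temporal_id) pictures into one list per layer set.

    boundaries holds, in ascending order, the lowest temporal_id of each
    higher layer set; [4] gives layer sets {0..3} and {4} as in FIG. 3.
    Decode order is preserved inside each resulting stream.
    """
    streams = [[] for _ in range(len(boundaries) + 1)]
    for picture in pictures:
        _, temporal_id = picture
        set_index = sum(1 for b in boundaries if temporal_id >= b)
        streams[set_index].append(picture)
    return streams


pictures = [(1, 0), (2, 0), (3, 1), (4, 2), (5, 3), (6, 4), (7, 4)]
low, high = split_into_layer_sets(pictures, [4])
print(len(low), len(high))  # 5 2
```

Passing more boundaries, for example `[3, 4]`, would produce three layer sets in the same way.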

エンコーダ１０２は、生成するビデオストリームの数によらず、上述したように、複数の階層を２以上の所定数の階層組に分割し、各階層組のピクチャの符号化画像データに、所属階層組を識別するための識別情報を付加する。この場合、例えば、識別情報として、ＳＰＳに含まれるビットストリームのレベル指定値である「general_level_idc」が利用され、高階層側の階層組ほど高い値とされる。なお、サブレイヤ（sublayer）毎に「sub_layer_level_idc」をＳＰＳで送ることができるので、識別情報として、この「sub_layer_level_idc」を用いてもよい。以上はＳＰＳだけでなくＶＰＳにおいても供給される。 Regardless of how many video streams it generates, the encoder 102 divides the layers into a predetermined number (two or more) of layer sets, as described above, and adds to the encoded image data of the pictures of each layer set identification information indicating the layer set to which they belong. For example, "general_level_idc", the level designation value of the bitstream contained in the SPS, is used as this identification information, with higher layer sets given higher values. Because "sub_layer_level_idc" can be sent in the SPS for each sublayer, "sub_layer_level_idc" may be used as the identification information instead. This information is carried not only in the SPS but also in the VPS.

この場合、各階層組のレベル指定値の値は、この階層組のピクチャと、この階層組より低階層側の全ての階層組のピクチャとからなるフレームレートに対応した値とされる。例えば、図３の階層符号化の例において、階層０から３の階層組のレベル指定値は、階層０から３のピクチャのみからなるフレームレートに対応した値とされ、階層４の階層組のレベル指定値は、階層０から４の全ての階層のピクチャからなるフレームレートに対応した値とされる。 In this case, the level designation value of each layer set corresponds to the frame rate of the pictures of that layer set together with the pictures of all lower layer sets. For example, in the hierarchical coding example of FIG. 3, the level designation value of the layer set of layers 0 to 3 corresponds to the frame rate of the pictures of layers 0 to 3 only, and the level designation value of the layer set of layer 4 corresponds to the frame rate of the pictures of all layers 0 to 4.
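
To make the rule concrete, the sketch below picks, for each layer set, the lowest HEVC level whose luma picture-size and luma sample-rate limits cover the cumulative frame rate. This is an illustration under stated assumptions. The 1920x1080 resolution and the 60/120 fps rates are example values, not figures from this publication. A real level choice must also check bit rate, CPB size, and the other limits this sketch ignores.

```python
# (general_level_idc, MaxLumaPs, MaxLumaSr) from the HEVC general level limits;
# general_level_idc is 30 times the level number (level 4.1 -> 123).
HEVC_LEVEL_LIMITS = [
    (30, 36_864, 552_960),
    (60, 122_880, 3_686_400),
    (63, 245_760, 7_372_800),
    (90, 552_960, 16_588_800),
    (93, 983_040, 33_177_600),
    (120, 2_228_224, 66_846_720),
    (123, 2_228_224, 133_693_440),
    (150, 8_912_896, 267_386_880),
    (153, 8_912_896, 534_773_760),
    (156, 8_912_896, 1_069_547_520),
    (180, 35_651_584, 1_069_547_520),
    (183, 35_651_584, 2_139_095_040),
    (186, 35_651_584, 4_278_190_080),
]


def min_level_idc(width: int, height: int, fps: float) -> int:
    """Lowest general_level_idc whose picture-size and luma-sample-rate limits fit."""
    luma_ps = width * height
    luma_sr = luma_ps * fps
    for level_idc, max_luma_ps, max_luma_sr in HEVC_LEVEL_LIMITS:
        if luma_ps <= max_luma_ps and luma_sr <= max_luma_sr:
            return level_idc
    raise ValueError("exceeds the highest HEVC level")


def layer_set_level_idcs(width, height, cumulative_fps):
    """cumulative_fps[i]: frame rate of layer set i plus all lower layer sets."""
    return [min_level_idc(width, height, fps) for fps in cumulative_fps]


# Layers 0-3 at 60 fps, layers 0-4 at 120 fps (example resolution and rates)
print(layer_set_level_idcs(1920, 1080, [60, 120]))  # [123, 150]
```

The higher layer set always gets a value at least as high as the lower one, matching the rule that higher layer sets carry higher level designation values.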

図６は、階層符号化の際のエンコード、デコード、表示順序と遅延の一例を示している。この例は、上述の図３の階層符号化例に対応している。この例は、全階層（全レイヤ）を、フル時間解像度で階層符号化する場合を示している。図６（ａ）はエンコーダ入力を示す。図６（ｂ）に示すように、１６ピクチャ分の遅延をもって、各ピクチャがエンコード順にエンコードされて、符号化ストリームが得られる。また、図６（ｂ）はデコーダ入力を示し、各ピクチャがデコード順にデコードされる。そして、図６（ｃ）に示すように、４ピクチャの遅延をもって、各ピクチャの画像データが表示順に得られる。 FIG. 6 shows an example of the encoding order, decoding order, display order, and delays in hierarchical coding, corresponding to the hierarchical coding example of FIG. 3 above. In this example, all layers are hierarchically encoded at full temporal resolution. FIG. 6(a) shows the encoder input. As shown in FIG. 6(b), the pictures are encoded in encoding order with a delay of 16 pictures, producing the encoded stream. FIG. 6(b) also represents the decoder input, in which the pictures are decoded in decoding order. Then, as shown in FIG. 6(c), the image data of the pictures is obtained in display order with a delay of 4 pictures.

図７（ａ）は、上述の図６（ｂ）に示す符号化ストリームと同様の符号化ストリームを、階層０から２、階層３、階層４の３段階に分けて示している。ここで、「Ｔｉｄ」は、temporal_idを示している。図７（ｂ）は、階層０から２、つまりＴｉｄ＝０〜２の部分階層の各ピクチャを選択的にデコードする場合の表示期待（表示順）を示している。また、図７（ｃ）は、階層０から３、つまりＴｉｄ＝０〜３の部分階層の各ピクチャを選択的にデコードする場合の表示期待（表示順）を示している。さらに、図７（ｄ）は、階層０から４、つまりＴｉｄ＝０〜４の全階層の各ピクチャを選択的にデコードする場合の表示期待（表示順）を示している。 FIG. 7(a) shows an encoded stream equivalent to that of FIG. 6(b), split into three groups: layers 0 to 2, layer 3, and layer 4. Here, "Tid" denotes temporal_id. FIG. 7(b) shows the expected display (display order) when only the pictures of the partial layers 0 to 2 (Tid = 0 to 2) are selectively decoded. FIG. 7(c) shows the expected display when only the pictures of the partial layers 0 to 3 (Tid = 0 to 3) are selectively decoded. FIG. 7(d) shows the expected display when the pictures of all layers 0 to 4 (Tid = 0 to 4) are decoded.

図７（ａ）の符号化ストリームをデコード能力別にデコード処理するには、時間解像度がフルレートのデコード能力が必要となる。しかし、Ｔｉｄ＝０〜２のデコードを行う場合、符号化されたフルの時間解像度に対して、１/４のデコード能力をもつデコーダが処理可能とすべきである。また、Ｔｉｄ＝０〜３のデコードを行う場合、符号化されたフルの時間解像度に対して、１/２のデコード能力をもつデコーダが処理可能とすべきである。 As the stream of FIG. 7(a) is encoded, decoding it, even selectively according to decoding capability, requires a decoder capable of full temporal resolution. However, when only Tid = 0 to 2 are decoded, a decoder with 1/4 of the capability needed for the full encoded temporal resolution should be able to handle the stream. Likewise, when Tid = 0 to 3 are decoded, a decoder with 1/2 of that capability should be able to handle it.
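
The 1/4 and 1/2 ratios follow if the temporal_id values are assigned dyadically within a 16-picture sub group of pictures. The sketch below assumes that dyadic assignment, which is consistent with the ratios in the text but is not read from FIG. 3 itself. It counts how many of the 16 display positions a decoder keeps.

```python
from fractions import Fraction


def dyadic_temporal_id(poc: int, gop_size: int = 16) -> int:
    """temporal_id in an assumed dyadic hierarchy; gop_size must be a power of two.
    POC multiples of gop_size get 0 and every halving of the spacing adds one layer."""
    max_tid = gop_size.bit_length() - 1        # 16 -> layers 0..4
    pos = poc % gop_size
    if pos == 0:
        return 0
    trailing_zeros = (pos & -pos).bit_length() - 1
    return max_tid - trailing_zeros


def decode_fraction(max_tid: int, gop_size: int = 16) -> Fraction:
    """Share of full-rate pictures kept when decoding temporal_id 0..max_tid."""
    kept = sum(1 for poc in range(gop_size)
               if dyadic_temporal_id(poc, gop_size) <= max_tid)
    return Fraction(kept, gop_size)


for max_tid in (2, 3, 4):
    print(max_tid, decode_fraction(max_tid))  # prints "2 1/4", "3 1/2", "4 1"
```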

しかし、階層符号化において参照される低階層に属するピクチャが連続し、それらが時間解像度でフルなタイミングで符号化されると、部分デコードするデコーダの能力が追い付かないことになる。図７（ａ）のＡの期間がそれに該当する。Ｔｉｄ＝０〜２、あるいはＴｉｄ＝０〜３の部分的な階層をデコードするデコーダは、表示の例で示すような、時間軸が１/４あるいは１/２の能力でデコード・表示を行うため、Ａの期間符号化された時間解像度がフルで連続するピクチャのデコードはできない。 However, when pictures belonging to the referenced lower layers occur consecutively in hierarchical coding and are encoded at full-rate timing, a decoder performing partial decoding cannot keep up. Period A in FIG. 7(a) is such a case. A decoder that decodes only the partial layers Tid = 0 to 2 or Tid = 0 to 3 decodes and displays at 1/4 or 1/2 of the full rate along the time axis, as in the display examples, and therefore cannot decode the consecutive full-rate pictures encoded in period A.

ＴａはＴｉｄ＝０〜２をデコードするデコーダにおけるピクチャ毎のデコード処理に要する時間を示す。ＴｂはＴｉｄ＝０〜３をデコードするデコーダにおけるピクチャ毎のデコード処理に要する時間を示す。ＴｃはＴｉｄ＝０〜４（全階層）をデコードするデコーダにおけるピクチャ毎のデコード処理に要する時間を示す。これらの各時間の関係は、Ｔａ＞Ｔｂ＞Ｔｃとなる。 Ta indicates the time required for the decoding process for each picture in the decoder that decodes Tid = 0 to 2. Tb indicates the time required for the decoding process for each picture in the decoder that decodes Tid = 0 to 3. Tc indicates the time required for the decoding process for each picture in the decoder that decodes Tid = 0 to 4 (all layers). The relationship between these times is Ta> Tb> Tc.

この実施の形態においては、後述するように、受信装置２００は、デコード能力が低いデコーダを持ち、低階層ピクチャのデコードを選択的に行う場合、デコードタイムスタンプ（ＤＴＳ：decoding Time stamp）を書き換えて低階層ピクチャのデコード間隔を調整する機能を持つようにされる。これにより、デコード能力の低いデコーダでも、無理のないデコード処理が可能となる。 In this embodiment, as described later, when the receiving device 200 has a decoder with low decoding capability and selectively decodes low-layer pictures, it rewrites the decoding time stamps (DTS) to adjust the decoding interval of the low-layer pictures. This allows even a decoder with low decoding capability to decode without strain.
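
One simple way to picture this DTS rewriting is to push each selected picture's DTS later until it is at least one decoder period after the previous one. This is a hypothetical sketch, not the receiver procedure of this publication. It ignores 33-bit wraparound and does not check that a rewritten DTS stays at or before the picture's PTS.

```python
def respace_dts(dts_list, min_interval):
    """Push DTS values (90 kHz ticks) of the selected low-layer pictures later
    so that consecutive decodes are at least min_interval apart."""
    out = []
    for dts in dts_list:
        if out and dts < out[-1] + min_interval:
            dts = out[-1] + min_interval   # rewrite: the decoder is not ready yet
        out.append(dts)
    return out


# Full rate 120 fps -> 750 ticks per picture; a 1/4-capability decoder needs 3000.
print(respace_dts([0, 750, 1500, 9000], 3000))  # [0, 3000, 6000, 9000]
```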

図２に戻って、圧縮データバッファ(ｃｐｂ)１０３は、エンコーダ１０２で生成された、各階層のピクチャの符号化データを含むビデオストリームを、一時的に蓄積する。マルチプレクサ１０４は、圧縮データバッファ１０３に蓄積されているビデオストリームを読み出し、ＰＥＳパケット化し、さらにトランスポートパケット化して多重し、多重化ストリームとしてのトランスポートストリームＴＳを得る。 Returning to FIG. 2, the compressed data buffer (cpb) 103 temporarily stores a video stream containing encoded data of the pictures of each layer generated by the encoder 102. The multiplexer 104 reads the video stream stored in the compressed data buffer 103, converts it into a PES packet, further converts it into a transport packet and multiplexes it, and obtains a transport stream TS as a multiplexed stream.

この実施の形態においては、上述したように、複数の階層は２以上の所定数の階層組に分割される。マルチプレクサ１０４は、ＰＥＳパケットのヘッダ(ＰＥＳヘッダ)に、ビデオストリームが持つ各ピクチャの符号化画像データがそれぞれどの階層組に属するピクチャの符号化画像データであるかを識別する識別情報を挿入する。この識別情報により、受信側では、自身のデコード能力に応じた階層組のピクチャの符号化画像データのみをバッファに取り込んで処理することが可能となる。 In this embodiment, as described above, the layers are divided into a predetermined number (two or more) of layer sets. The multiplexer 104 inserts into the header of each PES packet (PES header) identification information indicating the layer set to which the encoded image data of each picture in the video stream belongs. With this identification information, the receiving side can load into its buffer, and process, only the encoded image data of the pictures in the layer sets that match its own decoding capability.

マルチプレクサ１０４は、例えば、複数の階層を低階層組と高階層組に二分する場合、ＰＥＳヘッダに存在する、周知のＰＥＳプライオリティ（PES_priority）の１ビットフィールドを利用する。この１ビットフィールドは、ＰＥＳペイロードに低階層側の階層組のピクチャの符号化画像データを含む場合は“１”、つまり優先度が高く設定される。一方、この１ビットフィールドは、ＰＥＳペイロードに高階層側の階層組のピクチャの符号化画像データを含む場合は“０”、つまり優先度が低く設定される。 For example, when the layers are divided into a low layer set and a high layer set, the multiplexer 104 uses the well-known 1-bit PES priority (PES_priority) field in the PES header. This field is set to "1" (high priority) when the PES payload contains encoded image data of pictures in the low layer set, and to "0" (low priority) when the PES payload contains encoded image data of pictures in the high layer set.
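
In an MPEG-2 PES packet, PES_priority is bit 3 of the first flags byte (byte 6, after the '10' marker bits and the 2-bit scrambling control). The sketch below sets and reads that bit. The helper names are hypothetical, and the example header is a minimal placeholder rather than a complete PES packet.

```python
PES_PRIORITY_MASK = 0x08  # '10' marker(2) scrambling(2) PES_priority(1) ... in byte 6


def _check_pes(pes) -> None:
    if len(pes) < 9 or pes[0:3] != b"\x00\x00\x01" or (pes[6] & 0xC0) != 0x80:
        raise ValueError("expected a PES packet with the optional PES header")


def set_pes_priority(pes: bytearray, low_layer_set: bool) -> None:
    """Write '1' for the low layer set and '0' for the high layer set."""
    _check_pes(pes)
    if low_layer_set:
        pes[6] |= PES_PRIORITY_MASK
    else:
        pes[6] &= ~PES_PRIORITY_MASK & 0xFF


def get_pes_priority(pes) -> int:
    _check_pes(pes)
    return (pes[6] & PES_PRIORITY_MASK) >> 3


# Minimal video PES header: start code, stream_id 0xE0, length 0, flags,
# header length 5, then 5 placeholder PTS bytes
pes = bytearray(b"\x00\x00\x01\xe0\x00\x00\x80\x80\x05" + bytes(5))
set_pes_priority(pes, low_layer_set=True)
print(get_pes_priority(pes))  # 1
```

A receiver can call the same `get_pes_priority` check to decide, before buffering, whether a PES packet belongs to the layer set it intends to decode.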

トランスポートストリームＴＳには、上述したように、各階層のピクチャの符号化画像データを持つ単一のビデオストリーム、あるいは上述の各階層組のピクチャの符号化画像データをそれぞれ持つ所定数のビデオストリームが含まれる。マルチプレクサ１０４は、トランスポートストリームＴＳに、階層情報、ストリーム構成情報を挿入する。 As described above, the transport stream TS contains either a single video stream carrying the encoded image data of the pictures of all layers, or a predetermined number of video streams, each carrying the encoded image data of the pictures of one layer set. The multiplexer 104 inserts layer information and stream configuration information into the transport stream TS.

トランスポートストリームＴＳには、ＰＳＩ（Program Specific Information）の一つとして、ＰＭＴ（Program Map Table）が含まれている。このＰＭＴには、各ビデオストリームに関連した情報を持つビデオエレメンタリ・ループ（video ES1 loop）が存在する。このビデオエレメンタリ・ループには、各ビデオストリームに対応して、ストリームタイプ、パケット識別子（ＰＩＤ）等の情報が配置されると共に、そのビデオストリームに関連する情報を記述するデスクリプタも配置される。 The transport stream TS includes a PMT (Program Map Table) as one piece of PSI (Program Specific Information). The PMT contains a video elementary loop (video ES1 loop) holding information about each video stream. In this loop, information such as the stream type and packet identifier (PID) is placed for each video stream, together with descriptors that describe information related to that video stream.

マルチプレクサ１０４は、このデスクリプタの一つとして、ＨＥＶＣデスクリプタ（HEVC_descriptor）を挿入し、さらに、新たに定義するスケーラビリティ・エクステンション・デスクリプタ（scalability_extension_descriptor）を挿入する。 The multiplexer 104 inserts a HEVC descriptor (HEVC_descriptor) as one of the descriptors, and further inserts a newly defined scalability extension descriptor (scalability_extension_descriptor).

図８は、ＨＥＶＣデスクリプタ（HEVC_descriptor）の構造例（Syntax）を示している。また、図９は、その構造例における主要な情報の内容（Semantics）を示している。 FIG. 8 shows a structural example (Syntax) of the HEVC descriptor (HEVC_descriptor). In addition, FIG. 9 shows the contents (Semantics) of the main information in the structural example.

「descriptor_tag」の８ビットフィールドは、デスクリプタタイプを示し、ここでは、ＨＥＶＣデスクリプタであることを示す。「descriptor_length」の８ビットフィールドは、デスクリプタの長さ（サイズ）を示し、デスクリプタの長さとして、以降のバイト数を示す。 The 8-bit field of "descriptor_tag" indicates the descriptor type, and here, it indicates that it is a HEVC descriptor. The 8-bit field of "descriptor_length" indicates the length (size) of the descriptor, and indicates the number of bytes thereafter as the length of the descriptor.

「level_idc」の８ビットフィールドは、ビットレートのレベル指定値を示す。また、「temporal_layer_subset_flag = 1」であるとき、「temporal_id_min」の５ビットフィールドと、「temporal_id_max」の５ビットフィールドが存在する。「temporal_id_min」は、対応するビデオストリームに含まれる階層符号化データの最も低い階層のtemporal_idの値を示す。「temporal_id_max」は、対応するビデオストリームが持つ階層符号化データの最も高い階層のtemporal_idの値を示す。 The 8-bit field "level_idc" indicates the level designation value of the bit rate. When "temporal_layer_subset_flag = 1", a 5-bit field "temporal_id_min" and a 5-bit field "temporal_id_max" are present. "temporal_id_min" indicates the temporal_id of the lowest layer of the hierarchically encoded data contained in the corresponding video stream, and "temporal_id_max" indicates the temporal_id of its highest layer.

「level_constrained_flag」の１ビットフィールドは、新たに定義するものであり、ＶＰＳのＮＡＬユニットに含まれるビットストリームのレベル指定値（general_level_idc）がピクチャ毎に変わり得ることを示す。“１”は変わり得ることを示し、“０”は変わらないことを示す。 The 1-bit field "level_constrained_flag" is newly defined and indicates whether the level designation value (general_level_idc) of the bitstream contained in the VPS NAL unit may change from picture to picture. "1" indicates that it may change, and "0" indicates that it does not.

上述したように、例えば、“general_level_idc”は、複数の階層を２以上の所定数の階層組に分割した際の所属階層組の識別情報として利用される。そのため、複数の階層組のピクチャの符号化画像データを持つビデオストリームの場合、“general_level_idc”がピクチャ毎に変わり得ることになる。一方、単一の階層組のピクチャの符号化画像データを持つビデオストリームの場合は、“general_level_idc”がピクチャ毎に変わるということはない。あるいは、sublayerごとに“sublayer_level_idc”が付され、デコーダはデコード可能な範囲のtemporal_idのパケットを読むことによって、対応する階層のデータを処理する。 As described above, "general_level_idc", for example, is used to identify the layer set to which a picture belongs when the layers are divided into a predetermined number (two or more) of layer sets. Therefore, in a video stream carrying encoded image data of pictures from multiple layer sets, "general_level_idc" may change from picture to picture. In a video stream carrying encoded image data of pictures from a single layer set, by contrast, "general_level_idc" does not change from picture to picture. Alternatively, "sublayer_level_idc" is attached to each sublayer, and the decoder processes the data of the corresponding layers by reading the packets whose temporal_id is within its decodable range.

「scalability_id」の３ビットフィールドは、新たに定義するものであり、複数のビデオストリームがスケーラブルなサービスを供給する際、個々のストリームに付されるスケーラビリティを示すＩＤである。“０”はベースストリームを示し、“１”〜“７”はベースストリームからのスケーラビリティの度合いによって増加するＩＤである。 The 3-bit field of "scalability_id" is newly defined and is an ID indicating the scalability assigned to each stream when a plurality of video streams provide a scalable service. “0” indicates a base stream, and “1” to “7” are IDs that increase depending on the degree of scalability from the base stream.

図１０は、スケーラビリティ・エクステンション・デスクリプタ（scalability_extension_descriptor）の構造例（Syntax）を示している。また、図１１は、その構造例における主要な情報の内容（Semantics）を示している。 FIG. 10 shows a structural example (Syntax) of a scalability extension descriptor (scalability_extension_descriptor). In addition, FIG. 11 shows the contents (Semantics) of the main information in the structural example.

「scalability_extension_descriptor_tag」の８ビットフィールドは、デスクリプタタイプを示し、ここでは、スケーラビリティ・エクステンション・デスクリプタであることを示す。「scalability_extension_descriptor_length」の８ビットフィールドは、デスクリプタの長さ（サイズ）を示し、デスクリプタの長さとして、以降のバイト数を示す。「extension_stream_existing_flag」の１ビットフィールドは、別ストリームによる拡張サービスがあることを示すフラグである。“１”は拡張ストリームがあることを示し、“０”は拡張ストリームがないことを示す。 The 8-bit field of "scalability_extension_descriptor_tag" indicates the descriptor type, and here, it indicates that it is a scalability extension descriptor. The 8-bit field of "scalability_extension_descriptor_length" indicates the length (size) of the descriptor, and indicates the number of bytes thereafter as the descriptor length. The 1-bit field of "extension_stream_existing_flag" is a flag indicating that there is an extension service by another stream. “1” indicates that there is an extended stream, and “0” indicates that there is no extended stream.

「extension_type」の３ビットフィールドは、拡張のタイプを示す。“００１”は、拡張が、時間方向スケーラブルであることを示す。“０１０”は、拡張が、空間方向スケーラブルであることを示す。“０１１”は、拡張が、ビットレートスケーラブルであることを示す。 The 3-bit field "extension_type" indicates the type of extension. "001" indicates that the extension is temporally scalable, "010" that it is spatially scalable, and "011" that it is bit-rate scalable.

「number_of_streams」の４ビットフィールドは、配信サービスに関与するストリームの総数を示す。「scalability_id」の３ビットフィールドは、複数のビデオストリームがスケーラブルなサービスを供給する際、個々のストリームに付されるスケーラビリティを示すＩＤである。“０”はベースストリームを示し、“１”〜“７”はベースストリームからのスケーラビリティの度合いによって増加するＩＤである。 The 4-bit field of "number_of_streams" indicates the total number of streams involved in the distribution service. The 3-bit field of "scalability_id" is an ID indicating the scalability attached to each stream when a plurality of video streams provide a scalable service. “0” indicates a base stream, and “1” to “7” are IDs that increase depending on the degree of scalability from the base stream.

「number_of_layers」の３ビットフィールドは、当該ストリームの総階層数を示す。「sublayer_level_idc」の８ビットフィールドは、temporal_idで示される該当サブレイヤが、それより下位のレイヤを含んで、デコーダが対応するlevel_idcの値を示す。「Number of layers」は、ＮＡＬユニットヘッダ（NAL unit header）の「Nuh_temporal_id_plus1」のすべての値を包含するものであり、デマルチプレクサ（demuxer）がこれを検知することで、所定のlevel_idcに対応するデコーダがどの階層までデコードできるかを、「sublayer_level_idc」により事前に認識することが可能となる。 The 3-bit field "number_of_layers" indicates the total number of layers in the stream. The 8-bit field "sublayer_level_idc" indicates the level_idc a decoder must support to decode the sublayer indicated by temporal_id together with all lower layers. "Number of layers" covers all values of "Nuh_temporal_id_plus1" in the NAL unit headers. By detecting it, the demultiplexer (demuxer) can determine in advance, from "sublayer_level_idc", up to which layer a decoder supporting a given level_idc can decode.
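
The demultiplexer's decision can be sketched as a scan over the per-sublayer values, stopping at the first sublayer whose level exceeds the decoder's. This assumes the values are non-decreasing with temporal_id, as they describe cumulative sublayer sets. The list of levels in the example is hypothetical.

```python
def highest_decodable_temporal_id(sublayer_level_idc, decoder_level_idc):
    """sublayer_level_idc[t] is the level needed for temporal_id 0..t together.
    Return the largest t the decoder supports, or None if not even t = 0."""
    best = None
    for temporal_id, level_idc in enumerate(sublayer_level_idc):
        if level_idc > decoder_level_idc:
            break   # values are cumulative, so higher sublayers need even more
        best = temporal_id
    return best


levels = [90, 93, 120, 123, 150]   # hypothetical values for temporal_id 0..4
print(highest_decodable_temporal_id(levels, 123))  # 3
```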

上述したように、この実施の形態において、ＳＰＳに含まれるビットレートのレベル指定値（general_level_idc）などは、複数の階層を２以上の所定数の階層組に分割した際の所属階層組の識別情報として利用される。各階層組のレベル指定値の値は、この階層組のピクチャと、この階層組より低階層側の全ての階層組のピクチャとからなるフレームレートに対応した値とされる。 As described above, in this embodiment, the level designation value of the bit rate (general_level_idc) and similar values contained in the SPS are used to identify the layer set to which a picture belongs when the layers are divided into a predetermined number (two or more) of layer sets. The level designation value of each layer set corresponds to the frame rate of the pictures of that layer set together with the pictures of all lower layer sets.

図１２は、マルチプレクサ１０４の構成例を示している。ＰＥＳプライオリティ発生部１４１と、セクションコーディング部１４２と、ＰＥＳパケット化部１４３-1〜１４３-Nと、スイッチ部１４４と、トランスポートパケット化部１４５を有している。 FIG. 12 shows a configuration example of the multiplexer 104. The multiplexer 104 has a PES priority generation unit 141, a section coding unit 142, PES packetization units 143-1 to 143-N, a switch unit 144, and a transport packetization unit 145.

ＰＥＳパケット化部１４３-1〜１４３-Nは、それぞれ、圧縮データバッファ１０３に蓄積されているビデオストリーム１〜Ｎを読み込み、ＰＥＳパケットを生成する。この際、ＰＥＳパケット化部１４３-1〜１４３-Nは、ビデオストリーム１〜ＮのＨＲＤ情報を元にＤＴＳ（Decoding Time Stamp）、ＰＴＳ（Presentation Time Stamp）のタイムスタンプをＰＥＳヘッダに付与する。この場合、各ピクチャの「cpb_removal_delay」、「dpb_output_delay」が参照され、ＳＴＣ（System Time Clock）時刻に同期した精度で、各々ＤＴＳ、ＰＴＳに変換され、ＰＥＳヘッダの所定位置に配置される。 The PES packetization units 143-1 to 143-N read the video streams 1 to N, respectively, stored in the compressed data buffer 103 and generate PES packets. In doing so, they add DTS (Decoding Time Stamp) and PTS (Presentation Time Stamp) time stamps to the PES headers based on the HRD information of the video streams 1 to N. Specifically, the "cpb_removal_delay" and "dpb_output_delay" of each picture are referenced, converted into a DTS and a PTS, respectively, with accuracy synchronized to the STC (System Time Clock), and placed at predetermined positions in the PES header.
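
As a simplified sketch of this conversion, assume the HRD clock tick is `num_units_in_tick / time_scale` seconds. In that case the CPB removal time is the buffering-period anchor plus `cpb_removal_delay` ticks, and the DPB output time is the removal time plus `dpb_output_delay` ticks, both expressed in the 90 kHz units used by PES time stamps. The function name and `anchor_dts` parameter are hypothetical. The sketch ignores sub-picture HRD parameters and buffering-period concatenation rules.

```python
from fractions import Fraction

PTS_DTS_MODULUS = 1 << 33  # PES timestamps are 33-bit counts of a 90 kHz clock


def hrd_to_pes_timestamps(anchor_dts, cpb_removal_delay, dpb_output_delay,
                          num_units_in_tick, time_scale):
    """Map picture-timing-SEI delays (in HRD clock ticks) to 90 kHz DTS/PTS.

    anchor_dts is the 90 kHz DTS of the buffering-period anchor picture that
    cpb_removal_delay counts from.
    """
    ticks_90k = Fraction(num_units_in_tick * 90_000, time_scale)
    dts = anchor_dts + round(cpb_removal_delay * ticks_90k)
    pts = dts + round(dpb_output_delay * ticks_90k)
    return dts % PTS_DTS_MODULUS, pts % PTS_DTS_MODULUS


# 120 Hz clock tick (num_units_in_tick = 1, time_scale = 120) = 750 ticks at 90 kHz
print(hrd_to_pes_timestamps(0, 4, 4, 1, 120))  # (3000, 6000)
```

Using `Fraction` keeps rates such as 1001/60000 exact until the final rounding to whole 90 kHz ticks.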

ＰＥＳプライオリティ発生部１４１には、ＣＰＵ１０１から、階層数（Number of layers）とストリーム数（Number of streams）の情報が供給される。ＰＥＳプライオリティ発生部１４１は、階層数で示される複数の階層を２以上の所定数の階層組に分割した場合における、各階層組の優先度情報を発生する。例えば、２分割される場合には、ＰＥＳパケットヘッダの「PES_priority」の１ビットフィールドに挿入すべき値（低階層組は“１”、高階層組は“０”）を発生する。 The CPU 101 supplies the PES priority generation unit 141 with information on the number of layers (Number of layers) and the number of streams (Number of streams). The PES priority generation unit 141 generates priority information for each layer set when the layers indicated by the number of layers are divided into a predetermined number (two or more) of layer sets. For example, when the layers are divided into two layer sets, it generates the value to be inserted into the 1-bit "PES_priority" field of the PES packet header ("1" for the low layer set, "0" for the high layer set).

ＰＥＳプライオリティ発生部１４１で発生される各階層組の優先度情報は、ＰＥＳパケット化部１４３-1〜１４３-Nに供給される。ＰＥＳパケット化部１４３-1〜１４３-Nは、この各階層組の優先度を、その階層組のピクチャの符号化画像データを含むＰＥＳパケットのヘッダに識別情報として挿入する。 The priority information of each layer set generated by the PES priority generation unit 141 is supplied to the PES packetization units 143-1 to 143-N. The PES packetizing units 143-1 to 143-N insert the priority of each layer set into the header of the PES packet including the coded image data of the picture of the layer set as identification information.

なお、このようにピクチャ毎にＰＥＳパケットのヘッダにそのピクチャが属する階層組の優先度をヘッダ情報として挿入する処理は、エンコーダ１０２で単一のビデオストリーム（シングルストリーム）が生成される場合に限ってもよい。この場合は、ＰＥＳパケット化部１４３-1でのみ処理が行われることとなる。 The process of inserting, for each picture, the priority of the layer set to which that picture belongs into the PES packet header may be limited to the case in which the encoder 102 generates a single video stream (single stream). In that case, only the PES packetization unit 143-1 performs this processing.

スイッチ部１４４は、ＰＥＳパケット化部１４３-1〜１４３-Nで生成されたＰＥＳパケットを、パケット識別子（ＰＩＤ）に基づいて選択的に取り出し、トランスポートパケット化部１４５に送る。トランスポートパケット化部１４５は、ＰＥＳパケットをペイロードに含むＴＳパケットを生成し、トランスポートストリームＴＳを得る。 The switch unit 144 selectively extracts the PES packets generated by the PES packetization units 143-1 to 143-N based on the packet identifier (PID) and sends them to the transport packetization unit 145. The transport packetization unit 145 generates a TS packet including a PES packet in the payload, and obtains a transport stream TS.
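
The transport packetization step can be sketched as splitting each PES packet into 188-byte TS packets on one PID. In this sketch, payload_unit_start_indicator is set on the first packet, the 4-bit continuity counter increments per packet, and the final short chunk is padded with adaptation-field stuffing. The function name and parameters are hypothetical, and PCR insertion and other adaptation-field options are omitted.

```python
TS_PACKET_SIZE = 188
TS_PAYLOAD_MAX = 184  # 188 bytes minus the 4-byte TS header


def packetize_pes(pes: bytes, pid: int, continuity_counter: int = 0) -> list:
    """Split one PES packet into TS packets on a single PID.

    payload_unit_start_indicator is set on the first packet; a short final
    chunk is padded with adaptation-field stuffing so every packet is 188 bytes.
    """
    packets = []
    cc = continuity_counter & 0x0F
    for offset in range(0, len(pes), TS_PAYLOAD_MAX):
        chunk = pes[offset:offset + TS_PAYLOAD_MAX]
        stuffing = TS_PAYLOAD_MAX - len(chunk)
        if stuffing == 0:
            afc, adaptation = 0x10, b""            # payload only
        elif stuffing == 1:
            afc, adaptation = 0x30, b"\x00"        # adaptation_field_length = 0
        else:
            # length byte, a flags byte of 0, then 0xFF stuffing bytes
            afc = 0x30
            adaptation = bytes([stuffing - 1, 0x00]) + b"\xff" * (stuffing - 2)
        pusi = 0x40 if offset == 0 else 0x00
        header = bytes([0x47, pusi | ((pid >> 8) & 0x1F), pid & 0xFF, afc | cc])
        packets.append(header + adaptation + chunk)
        cc = (cc + 1) & 0x0F
    return packets


packets = packetize_pes(bytes(400), pid=0x100)
print(len(packets), {len(p) for p in packets})  # 3 {188}
```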

セクションコーディング部１４２は、トランスポートストリームＴＳに挿入すべき各種のセクションデータを生成する。セクションコーディング部１４２には、ＣＰＵ１０１から、階層数（Number of layers）と、ストリーム数（Number of streams）の情報が供給される。セクションコーディング部１４２は、この情報に基づいて、上述したＨＥＶＣデスクリプタ（HEVC_descriptor）、スケーラビリティ・エクステンション・デスクリプタ（scalability_extension_descriptor）を生成する。 The section coding unit 142 generates various section data to be inserted into the transport stream TS. Information on the number of layers and the number of streams is supplied to the section coding unit 142 from the CPU 101. The section coding unit 142 generates the above-mentioned HEVC descriptor (HEVC_descriptor) and the scalability extension descriptor (scalability_extension_descriptor) based on this information.

セクションコーディング部１４２は、各種セクションデータを、トランスポートパケット化部１４５に送る。トランスポートパケット化部１４５は、このセクションデータを含むＴＳパケットを生成し、トランスポートストリームＴＳに挿入する。 The section coding unit 142 sends various section data to the transport packetization unit 145. The transport packetization unit 145 generates a TS packet containing this section data and inserts it into the transport stream TS.

図１３は、マルチプレクサ１０４の処理フローを示す。この例は、複数の階層を低階層組と高階層組の２つに分割する例である。マルチプレクサ１０４は、ステップＳＴ１において、処理を開始し、その後に、ステップＳＴ２の処理に移る。このステップＳＴ２において、マルチプレクサ１０４は、ビデオストリーム（ビデオエレメンタリストリーム）の各ピクチャのtemporal_idと、構成する符号化ストリーム数を設定する。 FIG. 13 shows the processing flow of the multiplexer 104. In this example, the layers are divided into two layer sets: a low layer set and a high layer set. The multiplexer 104 starts processing in step ST1 and then proceeds to step ST2. In step ST2, the multiplexer 104 sets the temporal_id of each picture of the video stream (video elementary stream) and the number of encoded streams to be generated.

次に、マルチプレクサ１０４は、ステップＳＴ３において、ＨＲＤ情報（cpb_removal_delay、dpb_output_delay）を参照して、ＤＴＳ、ＰＴＳを決め、ＰＥＳヘッダの所定位置に挿入する。 Next, in step ST3, the multiplexer 104 determines the DTS and PTS with reference to the HRD information (cpb_removal_delay, dpb_output_delay) and inserts them at predetermined positions in the PES header.
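
For reference, the MPEG-2 systems layer stores each 33-bit PTS or DTS as 5 bytes: a 4-bit prefix, the timestamp split into 3 + 15 + 15 bits, and a marker bit after each part. The sketch below packs and unpacks that form. The function names are hypothetical, and only the PTS_DTS_flags = '11' case is shown.

```python
def encode_timestamp(prefix: int, ts: int) -> bytes:
    """Pack a 33-bit PTS/DTS into the 5-byte PES header form with marker bits."""
    ts &= (1 << 33) - 1
    return bytes([
        (prefix << 4) | (((ts >> 30) & 0x07) << 1) | 1,   # prefix, ts[32..30], marker
        (ts >> 22) & 0xFF,                               # ts[29..22]
        (((ts >> 15) & 0x7F) << 1) | 1,                  # ts[21..15], marker
        (ts >> 7) & 0xFF,                                # ts[14..7]
        ((ts & 0x7F) << 1) | 1,                          # ts[6..0], marker
    ])


def decode_timestamp(field: bytes) -> int:
    return (((field[0] >> 1) & 0x07) << 30 | field[1] << 22
            | (field[2] >> 1) << 15 | field[3] << 7 | field[4] >> 1)


def pts_dts_field(pts: int, dts: int) -> bytes:
    """PTS_DTS_flags = '11': PTS with prefix '0011', then DTS with prefix '0001'."""
    return encode_timestamp(0b0011, pts) + encode_timestamp(0b0001, dts)


field = pts_dts_field(pts=6000, dts=3000)
print(len(field), decode_timestamp(field[:5]), decode_timestamp(field[5:]))  # 10 6000 3000
```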

次に、マルチプレクサ１０４は、ステップＳＴ４において、シングルストリーム（単一ビデオストリーム）か否かを判断する。シングルストリームであるとき、マルチプレクサ１０４は、ステップＳＴ５において、１つのＰＩＤ（パケット識別子）で多重化処理を進めることとし、その後に、ステップＳＴ７の処理に移る。 Next, the multiplexer 104 determines in step ST4 whether or not it is a single stream (single video stream). When it is a single stream, the multiplexer 104 decides to proceed with the multiplexing process with one PID (packet identifier) in step ST5, and then moves to the process of step ST7.

このステップＳＴ７において、マルチプレクサ１０４は、ピクチャのそれぞれについて低階層組のピクチャ（スライス）であるか判断する。低階層組のピクチャであるとき、マルチプレクサ１０４は、ステップＳＴ８において、ペイロードにそのピクチャの符号化画像データを含むＰＥＳパケットのヘッダの「PES_priority」を“１”に設定する。一方、高階層組（非低階層組）のピクチャであるとき、マルチプレクサ１０４は、ステップＳＴ９において、ペイロードにそのピクチャの符号化画像データを含むＰＥＳパケットのヘッダの「PES_priority」を“０”に設定する。マルチプレクサ１０４は、ステップＳＴ８、ステップＳＴ９の処理の後、ステップＳＴ１０の処理に移る。 In step ST7, the multiplexer 104 determines, for each picture, whether it is a picture (slice) of the low layer set. If it is, then in step ST8 the multiplexer 104 sets "PES_priority" to "1" in the header of the PES packet whose payload contains the encoded image data of that picture. If the picture belongs to the high layer set (not the low layer set), then in step ST9 the multiplexer 104 sets "PES_priority" to "0" in the header of the PES packet whose payload contains the encoded image data of that picture. After step ST8 or ST9, the multiplexer 104 proceeds to step ST10.

ここで、ピクチャ（picture）とスライス（slice）の関連付けについて説明する。ピクチャは、概念で、構造定義としてはスライスと同じである。１ピクチャは、複数のスライスに分けられるが、この複数のスライスがアクセスユニットとしては同じであることは、パラメータセット（parameter set）でわかるようになっている。 The relationship between pictures and slices is explained here. A picture is a conceptual unit whose structure is defined in the same way as a slice. A picture may be divided into multiple slices, and the parameter set makes it possible to tell that these slices belong to the same access unit.

上述のステップＳＴ４でシングルストリームでないとき、マルチプレクサ１０４は、ステップＳＴ６において、複数のパケットＰＩＤ（パケット識別子）で多重化処理を進めることとし、その後に、ステップＳＴ１０の処理に移る。このステップＳＴ１０において、マルチプレクサ１０４は、符号化ストリーム（ビデオエレメンタリストリーム）をＰＥＳペイロードに挿入してＰＥＳパケット化する。 When it is not a single stream in step ST4 described above, the multiplexer 104 decides to proceed with the multiplexing process with a plurality of packet PIDs (packet identifiers) in step ST6, and then proceeds to the process of step ST10. In this step ST10, the multiplexer 104 inserts a coded stream (video elementary stream) into the PES payload to form a PES packet.

次に、マルチプレクサ１０４は、ステップＳＴ１１において、ＨＥＶＣデスクリプタ、スケーラビリティ・エクステンション・デスクリプタなどをコーディングする。そして、マルチプレクサ１０４は、ステップＳＴ１２においてトランスポートパケット化し、トランスポートストリームＴＳを得る。その後、マルチプレクサ１０４は、ステップＳＴ１３において、処理を終了する。 Next, in step ST11, the multiplexer 104 encodes the HEVC descriptor, the scalability extension descriptor, and other descriptors. In step ST12, the multiplexer 104 performs transport packetization to obtain the transport stream TS. The multiplexer 104 then ends the processing in step ST13.
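
The branch structure of the FIG. 13 flow (steps ST4 to ST10) can be summarized in a short sketch that decides a PID and a PES_priority value for each picture. The inputs are hypothetical: `(temporal_id, stream_index)` pairs, the highest temporal_id of the low layer set, and one PID per stream. The multi-stream case reports `None` because the flow goes straight to ST10 without setting the priority.

```python
def plan_multiplex(pictures, single_stream, low_set_max_tid, pids):
    """Return (pid, PES_priority) per picture following the FIG. 13 branches.

    pictures: (temporal_id, stream_index) pairs; pids: one PID per stream.
    PES_priority is None where the flow does not set it (multi-stream case).
    """
    plan = []
    for temporal_id, stream_index in pictures:
        if single_stream:                                   # ST4 -> ST5: one PID
            pid = pids[0]
            priority = 1 if temporal_id <= low_set_max_tid else 0   # ST7-ST9
        else:                                               # ST4 -> ST6: several PIDs
            pid = pids[stream_index]
            priority = None                                 # goes straight to ST10
        plan.append((pid, priority))
    return plan


print(plan_multiplex([(0, 0), (3, 0), (4, 0)], True, 3, [0x100]))
# [(256, 1), (256, 1), (256, 0)]
```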

図１４は、単一ストリームによる配信を行う場合のトランスポートストリームＴＳの構成例を示している。このトランスポートストリームＴＳには、１つのビデオストリームが含まれている。すなわち、この構成例では、複数の階層のピクチャの例えばＨＥＶＣによる符号化画像データを持つビデオストリームのＰＥＳパケット「video PES1」が存在すると共に、オーディオストリームのＰＥＳパケット「audio PES1」が存在する。 FIG. 14 shows a configuration example of the transport stream TS for distribution as a single stream. This transport stream TS contains one video stream. That is, in this configuration example, there is a PES packet "video PES1" of a video stream carrying encoded image data (HEVC, for example) of pictures in multiple layers, and a PES packet "audio PES1" of an audio stream.

各ピクチャの符号化画像データには、ＶＰＳ、ＳＰＳ、ＳＥＩなどのＮＡＬユニットが存在する。上述したように、各ピクチャのＮＡＬユニットのヘッダには、そのピクチャの階層を示すtemporal_idが挿入されている。また、例えば、ＶＰＳにはビットレートのレベル指定値（general_level_idc）が含まれている。また、例えば、ピクチャ・タイミング・ＳＥＩ（Picture timing SEI）には、「cpb_removal_delay」と「dpb_output_delay」が含まれている。 The encoded image data of each picture contains NAL units such as VPS, SPS, and SEI. As described above, the temporal_id indicating the layer of each picture is inserted in the header of that picture's NAL units. For example, the VPS contains the level designation value of the bit rate (general_level_idc), and the picture timing SEI contains "cpb_removal_delay" and "dpb_output_delay".

また、ＰＥＳパケットのヘッダ（ＰＥＳヘッダ）に「PES_priority」の１ビットの優先度を示すフィールドが存在する。この「PES_priority」により、ＰＥＳペイロードに含まれるピクチャの符号化画像データが、低階層組のピクチャのものか、あるいは高階層組のピクチャのものかが識別可能である。 The header of each PES packet (PES header) also contains the 1-bit priority field "PES_priority". This "PES_priority" identifies whether the encoded image data carried in the PES payload belongs to a picture of the low layer set or of the high layer set.

また、トランスポートストリームＴＳには、ＰＳＩ（Program Specific Information）の一つとして、ＰＭＴ（Program Map Table）が含まれている。このＰＳＩは、トランスポートストリームに含まれる各エレメンタリストリームがどのプログラムに属しているかを記した情報である。 Further, the transport stream TS includes PMT (Program Map Table) as one of PSI (Program Specific Information). This PSI is information describing which program each elementary stream included in the transport stream belongs to.

ＰＭＴには、プログラム全体に関連する情報を記述するプログラム・ループ（Program loop）が存在する。また、ＰＭＴには、各エレメンタリストリームに関連した情報を持つエレメンタリ・ループが存在する。この構成例では、ビデオエレメンタリ・ループ（video ES1 loop）が存在すると共に、オーディオエレメンタリ・ループ（audio ES1 loop）が存在する。 The PMT has a program loop (Program loop) that describes information related to the entire program. The PMT also has elementary loops holding information related to each elementary stream. In this configuration example, there is a video elementary loop (video ES1 loop) and an audio elementary loop (audio ES1 loop).

ビデオエレメンタリ・ループには、ビデオストリーム（video PES1）に対応して、ストリームタイプ、パケット識別子（PID）等の情報が配置されると共に、そのビデオストリームに関連する情報を記述するデスクリプタも配置される。このデスクリプタの一つとして、上述したＨＥＶＣデスクリプタ（HEVC_descriptor）、スケーラビリティ・エクステンション・デスクリプタ（scalability_extension_descriptor）が挿入される。 In the video elementary loop, information such as the stream type and packet identifier (PID) is placed for the video stream (video PES1), together with descriptors that describe information related to that video stream. The HEVC descriptor (HEVC_descriptor) and the scalability extension descriptor (scalability_extension_descriptor) described above are inserted as such descriptors.

図２に戻って、送信部１０５は、トランスポートストリームＴＳを、例えば、ＱＰＳＫ／ＯＦＤＭ等の放送に適した変調方式で変調し、ＲＦ変調信号を送信アンテナから送信する。 Returning to FIG. 2, the transmission unit 105 modulates the transport stream TS by a modulation method suitable for broadcasting such as QPSK / OFDM, and transmits an RF modulated signal from the transmission antenna.

図２に示す送信装置１００の動作を簡単に説明する。エンコーダ１０２には、非圧縮の動画像データが入力される。エンコーダ１０２では、この動画像データに対して、階層符号化が行われる。すなわち、エンコーダ１０２では、この動画像データを構成する各ピクチャの画像データが複数の階層に分類されて符号化され、各階層のピクチャの符号化画像データを持つビデオストリームが生成される。この際、参照するピクチャが、自己階層および／または自己階層よりも下位の階層に所属するように、符号化される。 The operation of the transmission device 100 shown in FIG. 2 is briefly described. Uncompressed moving image data is input to the encoder 102, which performs hierarchical coding on it. That is, the encoder 102 classifies the image data of each picture of the moving image data into multiple layers, encodes it, and generates a video stream carrying the encoded image data of the pictures of each layer. Encoding is performed so that every referenced picture belongs to the same layer as the referencing picture and/or to a lower layer.

The encoder 102 either generates a single video stream containing the encoded image data of the pictures in all layers, or divides the plurality of layers into a predetermined number (two or more) of layer sets and generates that number of video streams, each containing the encoded image data of the pictures in one layer set.

The video stream(s) generated by the encoder 102, containing the encoded data of the pictures in each layer, are supplied to the compressed data buffer (cpb) 103 and stored temporarily. The multiplexer 104 reads the video stream(s) stored in the compressed data buffer 103, packetizes them into PES packets and then into transport packets, and multiplexes them to obtain the transport stream TS as a multiplexed stream.

In the case of a single video stream (single stream), for example, the multiplexer 104 inserts into the header of each PES packet (PES header) identification information indicating which layer set the encoded image data of each picture in the video stream belongs to. For example, when the plurality of layers is divided into a low layer set and a high layer set, the 1-bit PES_priority field of the PES header is used.
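The transmit-side marking can be sketched as follows. This is a minimal illustration, not the device's actual implementation: it assumes temporal_id 0 to 3 form the low layer set and temporal_id 4 the high layer set (the FIG. 3 split discussed later), and that low-set pictures carry PES_priority = 1, as the receiver filtering below expects. The bit positions follow the MPEG-2 systems PES header layout; the constant and function names are hypothetical.

```python
# Sketch: set PES_priority = 1 for pictures in the low layer set.
# ASSUMPTION: temporal_id 0-3 = low layer set, temporal_id 4 = high layer set.
LOW_SET_MAX_TEMPORAL_ID = 3

MARKER_BITS = 0x80         # fixed '10' prefix of the first PES flag byte
PES_PRIORITY_BIT = 0x08    # PES_priority position in that byte
DATA_ALIGNMENT_BIT = 0x04  # data_alignment_indicator position


def pes_priority_for(temporal_id: int) -> int:
    """Return 1 for a picture in the low layer set, 0 for the high set."""
    return 1 if temporal_id <= LOW_SET_MAX_TEMPORAL_ID else 0


def pes_flag_byte(temporal_id: int, data_aligned: bool = True) -> int:
    """Build the first flag byte of the PES optional header.

    Bit layout (MSB first): '10', PES_scrambling_control (2 bits, left 00),
    PES_priority, data_alignment_indicator, copyright, original_or_copy.
    """
    flags = MARKER_BITS
    if pes_priority_for(temporal_id):
        flags |= PES_PRIORITY_BIT
    if data_aligned:
        flags |= DATA_ALIGNMENT_BIT
    return flags
```

With these assumptions, pictures with temporal_id 0 to 3 get the flag byte 0x8C and layer-4 pictures get 0x84.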

The multiplexer 104 also inserts layer information and stream configuration information into the transport stream TS. That is, the multiplexer 104 inserts the HEVC descriptor (HEVC_descriptor) and the scalability extension descriptor (scalability_extension_descriptor) into the video elementary stream loop corresponding to each video stream.

The transport stream TS generated by the multiplexer 104 is sent to the transmission unit 105, which modulates it using a modulation scheme suitable for broadcasting, such as QPSK/OFDM, and transmits the RF modulated signal from the transmission antenna.

"Configuration of the receiving device"
FIG. 15 shows a configuration example of the receiving device 200. The receiving device 200 includes a CPU (Central Processing Unit) 201, a receiving unit 202, a demultiplexer 203, and a compressed data buffer (cpb: coded picture buffer) 204. The receiving device 200 further includes a decoder 205, an uncompressed data buffer (dpb: decoded picture buffer) 206, and a post-processing unit 207. The CPU 201 constitutes a control unit and controls the operation of each part of the receiving device 200.

The receiving unit 202 demodulates the RF modulated signal received by the receiving antenna to obtain the transport stream TS. The demultiplexer 203 selectively extracts from the transport stream TS the encoded image data of the pictures in the layer sets that match the decoding capability (Decoder temporal layer capability) and sends it to the compressed data buffer (cpb: coded picture buffer) 204.

FIG. 16 shows a configuration example of the demultiplexer 203. The demultiplexer 203 includes a TS adaptation field extraction unit 231, a clock information extraction unit 232, a TS payload extraction unit 233, a section extraction unit 234, a PSI table/descriptor extraction unit 235, and a PES packet extraction unit 236. It further includes a PES header extraction unit 237, a time stamp extraction unit 238, an identification information extraction unit 239, a PES payload extraction unit 240, and a stream configuration unit (stream composer) 241.

The TS adaptation field extraction unit 231 extracts the adaptation field from each TS packet of the transport stream TS that has one. The clock information extraction unit 232 extracts the PCR (Program Clock Reference) from each adaptation field that contains one and sends it to the CPU 201.
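The PCR extraction performed by units 231 and 232 can be sketched as below, based on the MPEG-2 transport stream packet and adaptation field layout. This is an illustrative sketch only; it handles one 188-byte packet and returns the PCR in 27 MHz units (33-bit base × 300 + 9-bit extension). The function name is hypothetical.

```python
from typing import Optional

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
PCR_FLAG = 0x10  # PCR_flag bit in the adaptation-field flag byte


def extract_pcr(packet: bytes) -> Optional[int]:
    """Return the PCR in 27 MHz units, or None if the packet carries none."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte TS packet")
    adaptation_field_control = (packet[3] >> 4) & 0x03
    if adaptation_field_control not in (0b10, 0b11):
        return None  # no adaptation field present
    af_length = packet[4]
    # The flag byte plus the 6-byte PCR must fit in the adaptation field.
    if af_length < 7 or not (packet[5] & PCR_FLAG):
        return None
    b = packet[6:12]
    # Layout: 33-bit base, 6 reserved bits, 9-bit extension.
    base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
    extension = ((b[4] & 0x01) << 8) | b[5]
    return base * 300 + extension
```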

The TS payload extraction unit 233 extracts the TS payload from each TS packet of the transport stream TS that has one. The section extraction unit 234 extracts section data from each TS payload that contains it. The PSI table/descriptor extraction unit 235 analyzes the section data extracted by the section extraction unit 234 and extracts PSI tables and descriptors. It then sends the minimum value (min) and maximum value (max) of temporal_id to both the CPU 201 and the stream configuration unit 241.
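Before temporal_id min/max can be read, the PSI table/descriptor extraction unit has to walk the descriptor loop of the video elementary stream loop. A generic walk over (descriptor_tag, descriptor_length, payload) triples can be sketched as follows. Parsing the fields inside the HEVC descriptor or the scalability extension descriptor is deliberately left out: their exact syntax is not given in this section, and the tag value used by scalability_extension_descriptor is not assumed here. The only tag constant used is 0x38, which ISO/IEC 13818-1 assigns to the HEVC video descriptor. The function names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

HEVC_VIDEO_DESCRIPTOR_TAG = 0x38  # ISO/IEC 13818-1 HEVC video descriptor


def iter_descriptors(loop: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (descriptor_tag, payload) pairs from a PMT descriptor loop."""
    pos = 0
    while pos < len(loop):
        if pos + 2 > len(loop):
            raise ValueError("truncated descriptor header")
        tag, length = loop[pos], loop[pos + 1]
        payload = loop[pos + 2:pos + 2 + length]
        if len(payload) != length:
            raise ValueError("truncated descriptor payload")
        yield tag, payload
        pos += 2 + length


def descriptors_by_tag(loop: bytes) -> Dict[int, List[bytes]]:
    """Group descriptor payloads by tag so a caller can look up one kind."""
    grouped: Dict[int, List[bytes]] = {}
    for tag, payload in iter_descriptors(loop):
        grouped.setdefault(tag, []).append(payload)
    return grouped
```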

The PES packet extraction unit 236 extracts PES packets from the TS payloads that contain them. The PES header extraction unit 237 extracts the PES header from each PES packet extracted by the PES packet extraction unit 236. The time stamp extraction unit 238 extracts the time stamps (DTS, PTS) inserted in the PES header for each picture and sends them to both the CPU 201 and the stream configuration unit 241.
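The time stamp extraction in unit 238 amounts to decoding the 5-byte PTS and DTS fields of the PES header. A sketch of that bit unpacking, with an inverse encoder used only to check round trips, follows. Both time stamps count a 90 kHz clock and are 33 bits wide. The function names are hypothetical.

```python
def decode_timestamp(field: bytes) -> int:
    """Decode a 5-byte PTS or DTS field into a 33-bit value (90 kHz units).

    Layout: 4-bit prefix, ts[32..30], marker, ts[29..15] plus a marker,
    ts[14..0] plus a marker.
    """
    if len(field) != 5:
        raise ValueError("PTS/DTS fields are 5 bytes long")
    return (((field[0] >> 1) & 0x07) << 30
            | field[1] << 22
            | ((field[2] >> 1) & 0x7F) << 15
            | field[3] << 7
            | field[4] >> 1)


def encode_timestamp(value: int, prefix: int) -> bytes:
    """Inverse of decode_timestamp; prefix is 0b0010, 0b0011 or 0b0001."""
    value &= (1 << 33) - 1
    return bytes([
        (prefix << 4) | (((value >> 30) & 0x07) << 1) | 1,
        (value >> 22) & 0xFF,
        (((value >> 15) & 0x7F) << 1) | 1,
        (value >> 7) & 0xFF,
        ((value & 0x7F) << 1) | 1,
    ])
```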

The identification information extraction unit 239 extracts the identification information that is inserted in the PES header for each picture and identifies the layer set to which the picture belongs, and sends it to the stream configuration unit 241. For example, when the plurality of layers is divided into a low layer set and a high layer set, the unit extracts the priority information in the 1-bit PES_priority field of the PES header and sends it to the stream configuration unit 241. The transmitting side always inserts this identification information when the transport stream TS contains a single video stream, but may omit it when the transport stream TS contains a plurality of video streams.

The PES payload extraction unit 240 extracts the PES payload, that is, the encoded image data of the pictures in each layer, from the PES packets extracted by the PES packet extraction unit 236. From the encoded image data of the pictures in each layer extracted by the PES payload extraction unit 240, the stream configuration unit 241 selectively takes out the encoded image data of the pictures in the layer sets that match the decoding capability (Decoder temporal layer capability) and sends it to the compressed data buffer (cpb: coded picture buffer) 204. In doing so, the stream configuration unit 241 refers to the layer information and stream configuration information obtained by the PSI table/descriptor extraction unit 235, the identification information (priority information) extracted by the identification information extraction unit 239, and so on.

Consider, for example, a case where the frame rate of the video stream (encoded stream) contained in the transport stream TS is 120 fps. Suppose the plurality of layers is divided into a lower layer set and a higher layer set, and the pictures in each layer set have a frame rate of 60 fps. In the hierarchical coding example of FIG. 3 described above, layers 0 to 3 form the lower layer set, which a decoder supporting the level_idc for 60 fps can decode. Layer 4 forms the higher layer set, which a decoder supporting the level_idc for 120 fps can decode.

In this case, the transport stream TS contains either a single video stream (encoded stream) holding the encoded data of the pictures in all layers, or two video streams (encoded streams): a base stream (B_str) holding the encoded image data of the pictures in the lower layer set, and an extended stream (E_str) holding the encoded image data of the pictures in the higher layer set.

When the decoding capability supports 120 fps, the stream configuration unit 241 takes out the encoded image data of the pictures in all layers and sends it to the compressed data buffer (cpb) 204. When the decoding capability supports 60 fps but not 120 fps, the stream configuration unit 241 takes out only the encoded image data of the pictures in the lower layer set and sends it to the compressed data buffer (cpb) 204.

FIG. 17 shows an example of picture (slice) selection by the stream configuration unit 241 when the transport stream TS contains a single video stream (encoded stream). Here, "High" denotes a picture in the higher layer set, "Low" denotes a picture in the lower layer set, and "P" denotes PES_priority.

When the decoding capability supports 120 fps, the stream configuration unit 241 takes out the encoded image data of the pictures in all layers and sends it to the compressed data buffer (cpb) 204. When the decoding capability supports 60 fps but not 120 fps, the stream configuration unit 241 filters on PES_priority, takes out only the pictures of the lower layer set, for which P = 1, and sends them to the compressed data buffer (cpb) 204.
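The single-stream selection of FIG. 17 can be sketched as a filter on the PES_priority bit. This is a sketch under stated assumptions: each input item is one complete PES packet that carries the optional PES header (as video PES packets do), and the capability check has already been reduced to a boolean by the caller. The function names are hypothetical.

```python
from typing import Iterable, List

PES_START_CODE = b"\x00\x00\x01"


def pes_priority(pes_packet: bytes) -> int:
    """Return the PES_priority bit of a PES packet with an optional header."""
    if len(pes_packet) < 7 or pes_packet[:3] != PES_START_CODE:
        raise ValueError("not a PES packet with an optional header")
    # Byte 6 is the first flag byte; PES_priority is its bit 3.
    return (pes_packet[6] >> 3) & 0x01


def select_by_capability(pes_packets: Iterable[bytes],
                         decodes_all_layers: bool) -> List[bytes]:
    """Keep every picture at 120 fps capability, else only P = 1 pictures."""
    if decodes_all_layers:
        return list(pes_packets)
    return [p for p in pes_packets if pes_priority(p) == 1]
```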

FIG. 18 shows an example of picture (slice) selection by the stream configuration unit 241 when the transport stream TS contains two video streams (encoded streams), a base stream and an extended stream. Here, "High" denotes a picture in the higher layer set and "Low" denotes a picture in the lower layer set. The packet identifier (PID) of the base stream is assumed to be PID A and that of the extended stream PID B.

When the decoding capability supports 120 fps, the stream configuration unit 241 takes out the encoded image data of the pictures in all layers and sends it to the compressed data buffer (cpb) 204. In this case, the stream configuration unit 241 combines the encoded image data of the pictures into a single stream based on decoding timing information before sending it to the compressed data buffer (cpb) 204.

In that case, the DTS value is used as the decoding timing, and the streams are combined into one so that the DTS increases monotonically from picture to picture. Alternatively, a separate compressed data buffer (cpb) 204 may be provided for each stream, and this combining process may be applied to the streams read out of those buffers so that they are decoded as a single stream.
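Combining the base and extended streams so that the DTS increases monotonically is a k-way merge of already-ordered sequences. The sketch below uses `heapq.merge` keyed on the DTS and, following the flow of FIG. 21, falls back to the PTS when no DTS is present. The `AccessUnit` record and the PID values in the usage are hypothetical stand-ins; each input stream is assumed to already be in decoding order.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class AccessUnit:
    """One picture's encoded data plus its time stamps (90 kHz units)."""
    pid: int
    pts: int
    dts: Optional[int] = None  # None when only a PTS was sent
    payload: bytes = b""


def decode_time(au: AccessUnit) -> int:
    # Use the DTS when present, otherwise fall back to the PTS.
    return au.dts if au.dts is not None else au.pts


def merge_by_decode_time(*streams: Iterable[AccessUnit]) -> List[AccessUnit]:
    """Interleave per-PID streams, each already in decode order, into one."""
    return list(heapq.merge(*streams, key=decode_time))
```

For example, base-stream pictures with DTS 0, 1500, 3000 and extended-stream pictures with DTS 750, 2250 merge into the order 0, 750, 1500, 2250, 3000.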

When the decoding capability supports 60 fps but not 120 fps, the stream configuration unit 241 filters on the packet identifier (PID), takes out only the pictures of the lower layer set, carried on PID A, and sends them to the compressed data buffer (cpb) 204.
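PID-based selection can be sketched as reading the 13-bit PID from each TS packet header and keeping only the wanted PIDs. The concrete values given to PID_A and PID_B below are hypothetical placeholders; the real values would come from the PMT.

```python
from typing import Collection, Iterable, List

SYNC_BYTE = 0x47

# Hypothetical PID values standing in for PID A (base) and PID B (extended).
PID_A, PID_B = 0x0100, 0x0101


def pid_of(packet: bytes) -> int:
    """Return the 13-bit PID from a TS packet header."""
    if len(packet) < 4 or packet[0] != SYNC_BYTE:
        raise ValueError("not a TS packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]


def pid_filter(packets: Iterable[bytes], wanted: Collection[int]) -> List[bytes]:
    """Keep only packets whose PID is in `wanted`."""
    return [p for p in packets if pid_of(p) in wanted]
```

A 60 fps receiver would call `pid_filter(packets, {PID_A})`. A 120 fps receiver would pass both PIDs and then merge the results by decoding time.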

The stream configuration unit 241 also has a function of rewriting the decoding time stamps of the encoded image data of the pictures it selectively sends to the compressed data buffer (cpb) 204, so as to adjust the decoding interval of the low-layer pictures. This allows the decoder 205 to decode without strain even when its decoding capability is low.

FIG. 19 shows, for the hierarchical coding example of FIG. 3 divided into a lower layer set and a higher layer set, the case where the stream configuration unit 241 selectively takes out the encoded image data of the pictures belonging to the lower layer set and sends it to the compressed data buffer (cpb) 204.

FIG. 19(a) shows the decoding timing before the decoding interval is adjusted. Here the decoding interval varies from picture to picture, and the shortest interval equals the decoding interval at the full temporal resolution of 120 fps. FIG. 19(b) shows the decoding timing after adjustment. Here the decoding intervals between pictures are made equal, and the decoding rate is half the full-resolution rate, that is, each interval is twice the 120 fps interval. In this way, the decoding interval is adjusted for each layer according to the capability of the target decoder.
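The DTS rewriting of FIG. 19 can be sketched as respacing the selected pictures' DTS values evenly on the 90 kHz clock, so 60 fps gives an interval of 1500 ticks against 750 at 120 fps. This is a simplified, hypothetical sketch: it anchors the sequence at the first original DTS. It does not check that each rewritten DTS still precedes the picture's PTS or respects the cpb constraints, which a real implementation would have to do.

```python
from typing import List

CLOCK_HZ = 90_000  # PES time stamps count a 90 kHz clock


def equalize_decode_times(dts: List[int], target_fps: int) -> List[int]:
    """Respace DTS values evenly at 1/target_fps, starting from the first one.

    Simplification: no check against PTS or buffer constraints.
    """
    if not dts:
        return []
    interval = CLOCK_HZ // target_fps
    return [dts[0] + i * interval for i in range(len(dts))]
```

For example, uneven low-layer DTS values 0, 750, 2250, 3000 (spaced on 120 fps slots) become 0, 1500, 3000, 4500 at 60 fps.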

FIG. 20 shows an example of the processing flow of the demultiplexer 203 when the transport stream TS contains a single video stream (encoded stream).

The demultiplexer 203 starts processing in step ST31 and then proceeds to step ST32. In step ST32, the CPU 201 sets the decoding capability (Decoder temporal layer capability). Next, in step ST33, the demultiplexer 203 determines whether it has the capability to decode all layers.

If it has the capability to decode all layers, in step ST34 the demultiplexer 203 demultiplexes all TS packets that pass the relevant PID filter and performs section parsing. The demultiplexer 203 then proceeds to step ST35.

If step ST33 finds that it does not have the capability to decode all layers, in step ST36 the demultiplexer 203 demultiplexes the TS packets whose PES_priority is 1 and performs section parsing. The demultiplexer 203 then proceeds to step ST35.

In step ST35, the demultiplexer 203 reads the HEVC descriptor (HEVC_descriptor) and the scalability extension descriptor (scalability_extension_descriptor) in the section of the target PID, and obtains the presence or absence of an extended stream, the scalable type, the number and IDs of the streams, the maximum and minimum values of temporal_id, and the decoder-supported Level of each layer.

Next, in step ST37, the demultiplexer 203 transfers the encoded stream of the target PID to the compressed data buffer (cpb) 204 and notifies the CPU 201 of the DTS and PTS. After step ST37, the demultiplexer 203 ends processing in step ST38.

FIG. 21 shows an example of the processing flow of the demultiplexer 203 when the transport stream TS contains two video streams (encoded streams), a base stream and an extended stream.

The demultiplexer 203 starts processing in step ST41 and then proceeds to step ST42. In step ST42, the CPU 201 sets the decoding capability (Decoder temporal layer capability). Next, in step ST43, the demultiplexer 203 determines whether it has the capability to decode all layers.

If it has the capability to decode all layers, in step ST44 the demultiplexer 203 uses the PID filter to demultiplex the plurality of streams that make up all layers and performs section parsing. The demultiplexer 203 then proceeds to step ST45.

If step ST43 finds that it does not have the capability to decode all layers, in step ST46 the demultiplexer 203 demultiplexes the stream with PID = PID A and performs section parsing. The demultiplexer 203 then proceeds to step ST45.

In step ST45, the demultiplexer 203 reads the HEVC descriptor (HEVC_descriptor) and the scalability extension descriptor (scalability_extension_descriptor) in the section of the target PID, and obtains the presence or absence of an extended stream, the scalable type, the number and IDs of the streams, the maximum and minimum values of temporal_id, and the decoder-supported Level of each layer.

Next, in step ST47, the demultiplexer 203 combines the encoded streams of the target PIDs into a single stream based on DTS information (or PTS information if no DTS is present), transfers it to the compressed data buffer (cpb) 204, and notifies the CPU 201 of the DTS and PTS. After step ST47, the demultiplexer 203 ends processing in step ST48.

Returning to FIG. 15, the compressed data buffer (cpb) 204 temporarily stores the video stream (encoded stream) taken out by the demultiplexer 203. The decoder 205 takes out, from the video stream stored in the compressed data buffer 204, the encoded image data of the pictures in the layers designated for decoding. The decoder 205 then decodes the encoded image data of each extracted picture at that picture's decoding timing and sends the result to the uncompressed data buffer (dpb) 206.

Here, the CPU 201 designates the layers to be decoded to the decoder 205 by temporal_id. The designated layers are either all the layers contained in the video stream (encoded stream) taken out by the demultiplexer 203 or only some of the lower layers, and the CPU 201 sets them either automatically or in response to a user operation. The CPU 201 also gives the decoder 205 decoding timings based on the DTS (Decoding Time Stamp). When decoding the encoded image data of each picture, the decoder 205 reads out and uses the image data of referenced pictures from the uncompressed data buffer 206 as needed.

FIG. 22 shows a configuration example of the decoder 205. The decoder 205 includes a temporal ID analysis unit 251, a target layer selection unit 252, and a decoding unit 253. The temporal ID analysis unit 251 reads the video stream (encoded stream) stored in the compressed data buffer 204 and analyzes the temporal_id inserted in the NAL unit header of the encoded image data of each picture.

Based on the analysis result of the temporal ID analysis unit 251, the target layer selection unit 252 takes out, from the video stream read from the compressed data buffer 204, the encoded image data of the pictures in the layers designated for decoding. The decoding unit 253 sequentially decodes the encoded image data of each picture taken out by the target layer selection unit 252 at its decoding timing and sends the result to the uncompressed data buffer (dpb) 206.
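The work of units 251 and 252 can be sketched with the two-byte HEVC NAL unit header. That header holds forbidden_zero_bit (1 bit), nal_unit_type (6), nuh_layer_id (6) and nuh_temporal_id_plus1 (3), so temporal_id is the low three bits of the second byte minus one. The sketch assumes each input item is one NAL unit with its start code already removed. The function names are hypothetical.

```python
from typing import Iterable, List


def nal_unit_type(nal: bytes) -> int:
    """nal_unit_type: the 6 bits after forbidden_zero_bit."""
    return (nal[0] >> 1) & 0x3F


def nal_temporal_id(nal: bytes) -> int:
    """TemporalId = nuh_temporal_id_plus1 - 1 (low 3 bits of byte 2)."""
    if len(nal) < 2:
        raise ValueError("HEVC NAL unit header is 2 bytes")
    plus1 = nal[1] & 0x07
    if plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return plus1 - 1


def select_target_layers(nals: Iterable[bytes], max_temporal_id: int) -> List[bytes]:
    """Keep NAL units whose temporal_id does not exceed the designated layer."""
    return [n for n in nals if nal_temporal_id(n) <= max_temporal_id]
```

With the FIG. 3 split, `select_target_layers(nals, 3)` keeps only the lower layer set.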

In doing so, the decoding unit 253 analyzes the VPS and SPS to obtain, for example, the bit-rate level designation value sublayer_level_idc for each sublayer, and confirms whether the stream can be decoded within its decoding capability. The decoding unit 253 also analyzes the SEI to obtain, for example, initial_cpb_removal_time and cpb_removal_delay, and confirms whether the decoding timing given by the CPU 201 is appropriate.
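The capability check against sublayer_level_idc can be sketched as finding the highest temporal_id whose required level does not exceed the decoder's level. This sketch takes a flat list in which entry i is assumed to be the level_idc needed to decode sublayers 0 through i. In HEVC, level_idc is 30 times the level number, so level 5.1 is 153 and level 5.2 is 156. The list values in the usage are hypothetical.

```python
from typing import Optional, Sequence


def highest_decodable_sublayer(sublayer_level_idc: Sequence[int],
                               decoder_level_idc: int) -> Optional[int]:
    """Return the highest temporal_id decodable within the decoder's level.

    ASSUMPTION: sublayer_level_idc[i] is the level_idc needed to decode
    sublayers 0..i, so the required level never decreases with i.
    """
    highest = None
    for temporal_id, level in enumerate(sublayer_level_idc):
        if level > decoder_level_idc:
            break  # this and all higher sublayers exceed the decoder
        highest = temporal_id
    return highest
```

With hypothetical values [153, 153, 153, 153, 156], a level 5.1 decoder (153) can handle temporal_id 0 to 3, and a level 5.2 decoder (156) can handle all five layers.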

When decoding a slice, the decoding unit 253 obtains ref_idx_l0_active (ref_idx_l1_active) from the slice header as information indicating the prediction destination in the temporal direction, and performs temporal prediction. A decoded picture is then handled as a reference for other pictures, using short_term_ref_pic_set_idx or lt_idx_sps obtained from the slice header as an index.

Returning to FIG. 15, the uncompressed data buffer (dpb) 206 temporarily stores the image data of each picture decoded by the decoder 205. The post-processing unit 207 adjusts the frame rate of the image data of the pictures, read out sequentially from the uncompressed data buffer (dpb) 206 at their display timings, to match the display capability. The CPU 201 gives the display timing based on the PTS (Presentation Time Stamp).

For example, when the frame rate of the decoded image data is 120 fps and the display capability is 120 fps, the post-processing unit 207 sends the decoded image data to the display as is. When the frame rate of the decoded image data is 120 fps and the display capability is 60 fps, the post-processing unit 207 subsamples the decoded image data to halve its temporal resolution and sends it to the display as 60 fps image data.

When the frame rate of the decoded image data is 60 fps and the display capability is 120 fps, the post-processing unit 207 interpolates the decoded image data to double its temporal resolution and sends it to the display as 120 fps image data. When the frame rate of the decoded image data is 60 fps and the display capability is 60 fps, the post-processing unit 207 sends the decoded image data to the display as is.
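The two conversions can be sketched on a toy sequence in which each picture is represented by a single number. Subsampling keeps every other picture. The interpolation inserts the average of each neighbouring pair and repeats the last picture so the output is exactly twice as long. This averaging is a hypothetical simplification: the text does not specify the interpolation method, and a real unit would more likely use motion-compensated interpolation.

```python
from typing import List, Sequence

Frame = float  # toy stand-in for a decoded picture


def subsample_half(frames: Sequence[Frame]) -> List[Frame]:
    """120 fps -> 60 fps: keep every other picture."""
    return list(frames[::2])


def interpolate_double(frames: Sequence[Frame]) -> List[Frame]:
    """60 fps -> 120 fps: insert the average of each neighbouring pair.

    The last picture is repeated so the output has exactly 2x the input.
    """
    out: List[Frame] = []
    for current, following in zip(frames, frames[1:]):
        out.append(current)
        out.append((current + following) / 2)
    if frames:
        out.extend([frames[-1], frames[-1]])
    return out
```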

FIG. 23 shows a configuration example of the post-processing unit 207. This example handles the cases described above, in which the frame rate of the decoded image data is 120 fps or 60 fps and the display capability is 120 fps or 60 fps.

The post-processing unit 207 includes an interpolation unit 271, a subsampling unit 272, and a switch unit 273. The decoded image data of each picture from the uncompressed data buffer 206 reaches the switch unit 273 by one of three paths: directly, after its frame rate is doubled by the interpolation unit 271, or after its frame rate is halved by the subsampling unit 272.

The CPU 201 supplies selection information to the switch unit 273, generating it either automatically with reference to the display capability or in response to a user operation. Based on the selection information, the switch unit 273 selectively outputs one of its inputs. As a result, the frame rate of the image data of the pictures read out sequentially from the uncompressed data buffer (dpb) 206 at their display timings is matched to the display capability.
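The choice made through the switch unit 273 can be sketched as a small decision function covering only the 120/60 fps combinations described for FIG. 23. The function name and the returned path labels are hypothetical. Any other rate pair is rejected rather than guessed.

```python
def select_path(decoded_fps: int, display_fps: int) -> str:
    """Choose which input of the switch unit to pass to the display."""
    if decoded_fps == display_fps:
        return "direct"
    if decoded_fps == 2 * display_fps:
        return "subsample"      # e.g. 120 fps decoded, 60 fps display
    if 2 * decoded_fps == display_fps:
        return "interpolate"    # e.g. 60 fps decoded, 120 fps display
    raise ValueError(f"no conversion path for {decoded_fps} -> {display_fps} fps")
```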

FIG. 24 shows an example of the processing flow of the decoder 205 and the post-processing unit 207. The decoder 205 and the post-processing unit 207 start processing in step ST51 and then proceed to step ST52. In step ST52, the decoder 205 reads the video stream to be decoded from the compressed data buffer (cpb) 204 and, based on temporal_id, selects the pictures in the layers that the CPU 201 has designated for decoding.

Next, in step ST53, the decoder 205 sequentially decodes the encoded image data of each selected picture at its decoding timing and transfers the decoded image data to the uncompressed data buffer (dpb) 206, where it is stored temporarily. Next, in step ST54, the post-processing unit 207 reads the image data of each picture from the uncompressed data buffer (dpb) 206 at its display timing.

Next, the post processing unit 207 determines whether the frame rate of the read image data of each picture matches the display capability. When it does not, in step ST56 the post processing unit 207 converts the frame rate to match the display capability and sends the image data to the display, and then ends the processing in step ST57. When it does, in step ST58 the post processing unit 207 sends the image data to the display at its original frame rate, and then ends the processing in step ST57.

The operation of the receiving device 200 shown in FIG. 15 will be briefly described. The receiving unit 202 demodulates the RF modulated signal received by the receiving antenna and obtains the transport stream TS, which is sent to the demultiplexer 203. The demultiplexer 203 selectively extracts from the transport stream TS the encoded image data of the pictures of the layer set that matches the decoding capability (Decoder temporal layer capability) and sends it to the compressed data buffer (cpb) 204, where it is temporarily stored.

The decoder 205 takes out, from the video stream stored in the compressed data buffer 204, the encoded image data of the pictures of the layers designated for decoding. The decoder 205 then decodes the encoded image data of each extracted picture at that picture's decoding timing and sends the result to the uncompressed data buffer (dpb) 206, where it is temporarily stored. When the encoded image data of each picture is decoded, the image data of any referenced pictures is read from the uncompressed data buffer 206 and used as needed.

The image data of each picture read sequentially from the uncompressed data buffer (dpb) 206 at display timing is sent to the post processing unit 207, which performs interpolation or subsampling so that its frame rate matches the display capability. The image data of each picture processed by the post processing unit 207 is supplied to the display, and a moving image based on that image data is displayed.

As described above, in the transmission/reception system 10 shown in FIG. 1, the transmitting side inserts into the layer of the video stream (the header of the PES packet) identification information that identifies which layer set the encoded image data of each picture included in the video stream belongs to. Therefore, for example, the receiving side can use this identification information to easily decode selectively the encoded image data of the pictures in the layers at or below a predetermined layer according to its decoding capability.
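Configuration (4) places this identification information in the PES priority field of the PES header. A minimal sketch of reading and setting that bit is shown below, assuming a PES packet with the optional header (flags byte at offset 6: '10', PES_scrambling_control(2), PES_priority(1), ...). The function names are hypothetical, and the choice of 1 for the low layer set follows configuration (2), where priority is set higher for lower layer sets.

```python
def read_pes_priority(pes: bytes) -> int:
    """Return the PES_priority bit of a PES packet with an optional header."""
    if len(pes) < 9 or pes[:3] != b"\x00\x00\x01":
        raise ValueError("missing packet_start_code_prefix or truncated header")
    flags = pes[6]  # '10' PES_scrambling_control(2) PES_priority(1) ...
    if flags >> 6 != 0b10:
        raise ValueError("PES packet has no optional header")
    return (flags >> 3) & 1


def with_pes_priority(pes: bytes, priority: int) -> bytes:
    """Return a copy with PES_priority set (1 = low layer set, 0 = high layer set)."""
    if priority not in (0, 1):
        raise ValueError("PES_priority is a single bit")
    out = bytearray(pes)
    out[6] = (out[6] & 0xF7) | (priority << 3)
    return bytes(out)
```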

Further, in the transmission/reception system 10 shown in FIG. 1, the transmitting side inserts a scalability extension descriptor (scalability_extension_descriptor) and the like into the layer of the transport stream TS. Therefore, for example, the receiving side can easily grasp the layer information of the hierarchical encoding, the configuration information of the video streams included in the transport stream TS, and so on, and can perform appropriate decoding processing.

Further, in the transmission/reception system 10 shown in FIG. 1, the receiving side selectively takes from the received video stream into the compressed data buffer 204, and decodes, the encoded image data of the pictures in the layers at or below a predetermined layer according to the decoding capability (Decoder temporal layer capability). Therefore, for example, decoding processing appropriate to the decoding capability is possible.

Further, in the transmission/reception system 10 shown in FIG. 1, the receiving side has a function of rewriting the decoding time stamps of the encoded image data of the pictures selectively taken into the compressed data buffer 204, so as to adjust the decoding interval of the low-layer pictures. Therefore, for example, even when the decoding capability of the decoder 205 is low, decoding can proceed without strain.
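One possible way to adjust the decoding interval is sketched below. This is a hypothetical rewrite rule, not the method of the embodiment: it keeps the first DTS and spaces the selected pictures evenly at the frame period of the selected layers. It accounts for the 33-bit, 90 kHz PTS/DTS counter, but it omits the check that a real receiver would need so that each rewritten DTS stays at or before the picture's PTS.

```python
from typing import List, Sequence

PTS_WRAP = 1 << 33  # PTS/DTS are 33-bit counts of a 90 kHz clock
CLOCK_HZ = 90_000


def respace_dts(dts: Sequence[int], frame_rate: float) -> List[int]:
    """Hypothetical DTS rewrite: even spacing at 1/frame_rate seconds.

    The DTS <= PTS constraint is not enforced in this sketch.
    """
    if not dts:
        return []
    start = dts[0]
    return [(start + round(i * CLOCK_HZ / frame_rate)) % PTS_WRAP
            for i in range(len(dts))]
```

For instance, low-layer pictures that arrive with irregular DTS values 0, 750, 3000, 4500 would be rewritten to 0, 1500, 3000, 4500 for a 60 fps decode.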

Further, in the transmission/reception system 10 shown in FIG. 1, the post processing unit 207 on the receiving side matches the frame rate of the decoded image data of each picture to the display capability. Therefore, for example, even when the decoding capability is low, image data with a frame rate suited to a high display capability can be obtained.

<2. Modifications>
In the above-described embodiment, an example was shown in which the identification information, which identifies which of the predetermined number of layer sets the encoded image data of each picture included in the video stream belongs to, is inserted into the header of the PES packet (PES header). However, the insertion position of this identification information is not limited to this.

For example, the multiplexer 104 (see FIG. 2) may insert this identification information into the adaptation field of a TS packet that has an adaptation field. When the plurality of layers is divided into a low layer set and a high layer set, for example, the multiplexer 104 uses the well-known 1-bit elementary stream priority indicator (elementary_stream_priority_indicator) field in the adaptation field.

When the payload of the subsequent TS packets contains a PES packet whose payload is the encoded image data of a picture of the low layer set, this 1-bit field is set to "1", that is, a high priority. Conversely, when the payload of the subsequent TS packets contains a PES packet whose payload is the encoded image data of a picture of the high layer set, this 1-bit field is set to "0", that is, a low priority.

FIG. 25 shows an example arrangement of adaptation fields. In this example, the plurality of layers is divided into a low layer set and a high layer set, and the 1-bit elementary stream priority indicator (elementary_stream_priority_indicator) field is used.

In the illustrated example, the PES packet whose payload is the encoded image data of one picture is divided across a predetermined number of TS packets, and a TS packet with an adaptation field is placed immediately before each such group. When that picture belongs to the low layer set, the 1-bit elementary stream priority indicator field is set to "1"; when it belongs to the high layer set, the field is set to "0".
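A TS packet that carries only an adaptation field, as placed before each picture above, can be built and read back as follows. The bit positions follow the MPEG-2 Systems TS header and adaptation field layout (adaptation_field_control '10', adaptation_field_length 183, elementary_stream_priority_indicator as bit 5 of the flags byte). Whether the embodiment uses adaptation-only packets or an adaptation field in the first payload packet is not specified here, so this is one assumed form; the function names are hypothetical.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
ES_PRIORITY_BIT = 0x20  # elementary_stream_priority_indicator in the flags byte


def build_priority_packet(pid: int, es_priority: int, continuity_counter: int) -> bytes:
    """Build an adaptation-field-only TS packet (adaptation_field_control '10').

    continuity_counter is not incremented for adaptation-only packets, so the
    caller should pass the value of the previous packet on this PID.
    """
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID is 13 bits")
    header = bytes([
        SYNC_BYTE,
        (pid >> 8) & 0x1F,                   # TEI=0, PUSI=0, transport_priority=0
        pid & 0xFF,
        0x20 | (continuity_counter & 0x0F),  # scrambling '00', AF control '10'
    ])
    af_length = TS_PACKET_SIZE - len(header) - 1  # 183: the AF fills the packet
    flags = ES_PRIORITY_BIT if es_priority else 0x00
    stuffing = b"\xff" * (af_length - 1)          # 182 stuffing bytes
    return header + bytes([af_length, flags]) + stuffing


def read_es_priority(packet: bytes):
    """Return the ES priority bit, or None if there is no AF flags byte."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a TS packet")
    afc = (packet[3] >> 4) & 0x03
    if afc not in (0b10, 0b11) or packet[4] == 0:
        return None
    return 1 if packet[5] & ES_PRIORITY_BIT else 0
```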

As shown in FIG. 25, placing TS packets with adaptation fields in this way allows the receiving side to easily identify, for the encoded image data of each picture included in the video stream, which layer set the picture belongs to. Although the arrangement example of FIG. 25 places a TS packet with an adaptation field for every picture, such a TS packet may instead be placed only immediately before each point where the layer set of the pictures changes.

FIG. 26 shows a configuration example of the multiplexer 104A of the transmission device 100 when the layer-set identification information is inserted into the adaptation field as described above. In FIG. 26, parts corresponding to those in FIG. 12 are given the same reference signs, and their detailed description is omitted. The multiplexer 104A has an adaptation field priority indicating unit 146 in place of the PES priority generation unit 141 of the multiplexer 104 in FIG. 12.

The CPU 101 supplies the priority indicating unit 146 with information on the number of layers (Number of layers) and the number of streams (Number of streams). The priority indicating unit 146 generates priority information for each layer set when the plurality of layers indicated by the number of layers is divided into a predetermined number (two or more) of layer sets. For example, with a two-way split, it generates the value to be inserted into the 1-bit elementary stream priority indicator field ("1" for the low layer set, "0" for the high layer set).

The priority information for each layer set generated by the priority indicating unit 146 is supplied to the transport packetizing unit 145. The transport packetizing unit 145 places a TS packet with an adaptation field immediately before each group of a predetermined number of TS packets that together carry the PES packet whose payload is the encoded image data of one picture. In doing so, the transport packetizing unit 145 inserts into the adaptation field, as the identification information, the priority information corresponding to the layer set to which the picture belongs.
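The interplay of the priority indicating unit 146 and the transport packetizing unit 145 can be sketched at the level of packet ordering. In this hypothetical model, each picture is described only by its temporal_id and the number of TS packets its PES packet occupies; the split point `low_set_max_tid` is an assumed parameter. The `on_change_only` option models the variant in which an adaptation-field packet is placed only where the layer set changes.

```python
from typing import List, Optional, Sequence, Tuple

Packet = Tuple[str, int]  # ("AF", priority) or ("TS", picture index); illustrative only


def es_priority_for(temporal_id: int, low_set_max_tid: int) -> int:
    """Two-way split: "1" for the low layer set, "0" for the high layer set."""
    return 1 if temporal_id <= low_set_max_tid else 0


def schedule(pictures: Sequence[Tuple[int, int]], low_set_max_tid: int,
             on_change_only: bool = False) -> List[Packet]:
    """pictures: (temporal_id, number of TS packets carrying its PES packet)."""
    out: List[Packet] = []
    last: Optional[int] = None
    for idx, (tid, n_packets) in enumerate(pictures):
        prio = es_priority_for(tid, low_set_max_tid)
        if not on_change_only or prio != last:
            out.append(("AF", prio))  # adaptation-field packet before the group
            last = prio
        out.extend(("TS", idx) for _ in range(n_packets))
    return out
```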

FIG. 27 shows a configuration example of the transport stream TS when the layer-set identification information is inserted into the adaptation field as described above. This configuration is almost the same as the configuration example shown in FIG. 14. Here, TS packets with adaptation fields exist, and identification information identifying the layer set to which each picture belongs is inserted into these adaptation fields. For example, when the plurality of layers is divided into a low layer set and a high layer set, the 1-bit elementary stream priority indicator (elementary_stream_priority_indicator) field is used.

FIG. 28 shows a configuration example of the demultiplexer 203A of the receiving device 200 when the layer-set identification information is inserted into the adaptation field as described above. In FIG. 28, parts corresponding to those in FIG. 16 are given the same reference signs, and their detailed description is omitted. The demultiplexer 203A has an identification information extraction unit 242 in place of the identification information extraction unit 239 of the demultiplexer 203 in FIG. 16.

The identification information extraction unit 242 extracts the identification information from the adaptation field and sends it to the stream configuration unit 241. For example, when the plurality of layers is divided into a low layer set and a high layer set, it extracts the priority information in the 1-bit "elementary_stream_priority_indicator" field of the adaptation field and sends it to the stream configuration unit 241.
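A receiver that can decode only the low layer set could combine the extraction of unit 242 with the selection of unit 241 roughly as follows. This is a sketch under assumptions: one video PID, the layer set of the following payload packets is taken from the most recent adaptation field on that PID, and payload packets seen before any adaptation field are dropped because their layer set is unknown. PES reassembly and PSI handling are omitted, and the function name is hypothetical.

```python
from typing import Iterable, List, Optional

TS_PACKET_SIZE = 188


def extract_low_set_payloads(packets: Iterable[bytes], pid: int) -> List[bytes]:
    """Keep payloads of `pid` that follow an adaptation field whose
    elementary_stream_priority_indicator is 1 (low layer set)."""
    current_priority: Optional[int] = None
    kept: List[bytes] = []
    for pkt in packets:
        if len(pkt) != TS_PACKET_SIZE or pkt[0] != 0x47:
            raise ValueError("not a TS packet")
        if (((pkt[1] & 0x1F) << 8) | pkt[2]) != pid:
            continue
        afc = (pkt[3] >> 4) & 0x03
        payload_start = 4
        if afc in (0b10, 0b11):
            af_length = pkt[4]
            if af_length > 0:
                current_priority = (pkt[5] >> 5) & 0x01  # ES priority bit
            payload_start = 5 + af_length
        if afc in (0b01, 0b11) and current_priority == 1:
            kept.append(pkt[payload_start:])
    return kept
```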

From the encoded image data of the pictures of each layer extracted by the PES payload extraction unit 240, the stream configuration unit 241 selectively extracts the encoded image data of the pictures of the layer set that matches the decoding capability (Decoder temporal layer capability) and sends it to the compressed data buffer (cpb) 204. In doing so, the stream configuration unit 241 refers to the layer information and stream configuration information obtained by the PSI table/descriptor extraction unit 235, the identification information (priority information) extracted by the identification information extraction unit 242, and so on.

Further, although the above-described embodiment shows the transmission/reception system 10 consisting of the transmission device 100 and the receiving device 200, the configuration of a transmission/reception system to which the present technology can be applied is not limited to this. For example, the receiving device 200 may be configured as a set-top box and a monitor connected by a digital interface such as HDMI (High-Definition Multimedia Interface). Note that "HDMI" is a registered trademark.

Further, the above-described embodiment shows an example in which the container is a transport stream (MPEG-2 TS). However, the present technology can equally be applied to a system that delivers content to receiving terminals over a network such as the Internet. Internet delivery often uses MP4 or other container formats. That is, the container may be of various formats, such as the transport stream (MPEG-2 TS) adopted in digital broadcasting standards and MP4 used in Internet delivery.

For example, FIG. 29 shows a configuration example of an MP4 stream. This MP4 stream contains boxes such as "moov", "moof", and "mdat". The "mdat" box contains, as tracks, the video elementary stream "track1:video ES1", which is the encoded video stream, and the audio elementary stream "track1:audio ES1", which is the encoded audio stream.

In the "moof" box, "mfhd" (movie fragment header) exists as the header part, and a "track fragment" corresponding to each track exists as the data part. The "track1 fragment(video)" corresponding to the video elementary stream "track1:video ES1" contains "Independent and disposal samples", in which a box called "SampleDependencyTypeBox" corresponding to each picture is inserted.

Identification information that identifies which layer set the encoded image data of each picture belongs to can be inserted into this box. For example, when the plurality of layers is divided into two layer sets, the top layer and the remaining lower layers, the identification information can be inserted using the 2-bit "sample_depends_on" field and the 2-bit "sample_is_depended_on" field.

FIG. 30 shows an example structure (Syntax) of the "SampleDependencyTypeBox", and FIG. 31 shows the contents (Semantics) of the main information in that structure. Here, setting "sample_depends_on" to "1" indicates that the picture references other pictures and is not an I picture, and setting "sample_is_depended_on" to "2" indicates that the picture is not referenced by other pictures; together these identify the picture as belonging to the top layer set. In any other state, the picture is identified as belonging to the lower layer set.
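The classification rule above can be applied to the raw bytes of an ISO BMFF "sdtp" box as sketched below. Each per-sample byte is laid out as is_leading(2), sample_depends_on(2), sample_is_depended_on(2), sample_has_redundancy(2), following the FullBox header (version and flags). The sketch assumes a compact 32-bit box size (no `largesize`), and the function name and the "top"/"lower" labels are illustrative.

```python
import struct
from typing import List


def classify_sdtp(box: bytes) -> List[str]:
    """Label each sample of a 'sdtp' box as 'top' or 'lower' layer set."""
    size, box_type = struct.unpack(">I4s", box[:8])
    if box_type != b"sdtp":
        raise ValueError("not a SampleDependencyTypeBox")
    if size < 12:
        raise ValueError("size 1 (largesize) or a truncated box is not handled here")
    entries = box[12:size]  # skip FullBox version (1 byte) and flags (3 bytes)
    labels = []
    for entry in entries:
        depends_on = (entry >> 4) & 0x03      # 1: references others (not I)
        is_depended_on = (entry >> 2) & 0x03  # 2: not referenced by others
        labels.append("top" if depends_on == 1 and is_depended_on == 2 else "lower")
    return labels
```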

Instead of the "SampleDependencyTypeBox", a newly defined box called "SampleScalablePriorityBox" could also be used. FIG. 32 shows an example structure (Syntax) of the "SampleScalablePriorityBox", and FIG. 33 shows the contents (Semantics) of the main information in that structure.

In this case, when the plurality of layers is divided into two layer sets, the lowest layer set and the higher layer set, the identification information is inserted using the 2-bit "base_and_priority" field. That is, setting "base_and_priority" to, for example, "1" indicates a low priority and identifies the picture as belonging to the higher layer set. Setting it to, for example, "2" indicates a high priority and identifies the picture as belonging to the lowest layer set.
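The value mapping for "base_and_priority" can be sketched as follows. Because the byte layout of FIG. 32 is not reproduced here, the sketch works only on the 2-bit field values and does not parse a box; it also leaves the codes 0 and 3 unassigned, since their meaning is not given in this passage. The split parameter `lowest_set_max_tid` and all function names are hypothetical.

```python
from typing import Dict, List, Optional

# Values taken from the example in the text; 0 and 3 are not covered here.
BASE_AND_PRIORITY_TO_SET: Dict[int, str] = {
    1: "higher layer set",  # low priority
    2: "lowest layer set",  # high priority
}


def encode_base_and_priority(temporal_id: int, lowest_set_max_tid: int) -> int:
    """Hypothetical writer-side mapping for a two-way split."""
    return 2 if temporal_id <= lowest_set_max_tid else 1


def decode_base_and_priority(value: int) -> Optional[str]:
    if not 0 <= value <= 3:
        raise ValueError("base_and_priority is a 2-bit field")
    return BASE_AND_PRIORITY_TO_SET.get(value)


def lowest_set_indices(values: List[int]) -> List[int]:
    """Indices of samples a low-capability receiver would keep."""
    return [i for i, v in enumerate(values) if v == 2]
```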

In addition, the present technology can also take the following configurations.
(1) A transmission device including:
an image encoding unit that classifies the image data of each picture constituting moving image data into a plurality of layers, encodes the image data of the pictures of each classified layer, and generates video data having the encoded image data of the pictures of each layer;
a transmitting unit that transmits a container in a predetermined format including the generated video data; and
an identification information insertion unit that divides the plurality of layers into a predetermined number (two or more) of layer sets and inserts, into the packets that contain the video data, identification information identifying which layer set the encoded image data of each picture included in the video data belongs to.
(2) The transmission device according to (1), wherein the identification information is priority information that is set higher for layer sets on the lower layer side.
(3) The transmission device according to (1), wherein the identification information is inserted into the header of a PES packet whose payload includes the encoded image data of each picture.
(4) The transmission device according to (3), wherein the identification information is inserted using the PES priority field of the header.
(5) The transmission device according to (1), wherein the identification information is inserted into the adaptation field of a TS packet having an adaptation field.
(6) The transmission device according to (5), wherein the identification information is inserted using the ES priority indicator field of the adaptation field.
(7) The transmission device according to (1), wherein the identification information is inserted into a header box related to the track of the corresponding picture.
(8) The transmission device according to any one of (1) to (7), wherein
the image encoding unit generates a single video stream having the encoded image data of the pictures of all the layers, or generates a predetermined number of video data items each having the encoded image data of the pictures of one of the layer sets,
the transmission device further including a configuration information insertion unit that inserts, into the layer of the container, configuration information of the video streams included in the container.
(9) A transmission method including:
an image encoding step of classifying the image data of each picture constituting moving image data into a plurality of layers, encoding the image data of the pictures of each classified layer, and generating video data having the encoded image data of the pictures of each layer;
a transmitting step of transmitting, by a transmitting unit, a container in a predetermined format including the generated video data; and
an identification information insertion step of dividing the plurality of layers into a predetermined number (two or more) of layer sets and inserting, into the packets that contain the video data, identification information identifying which layer set the encoded image data of each picture included in the video data belongs to.
(10) A receiving device including:
a receiving unit that receives a container in a predetermined format including video data having the encoded image data of the pictures of each layer, obtained by classifying the image data of each picture constituting moving image data into a plurality of layers and encoding it; and
an image decoding unit that selectively takes into a buffer, from the video data included in the received container, the encoded image data of the pictures in the layers at or below a predetermined layer according to decoding capability, and decodes the encoded image data of each picture taken into the buffer to obtain the image data of the pictures in the layers at or below the predetermined layer.
(11) The receiving device according to (10), wherein
the plurality of layers is divided into a predetermined number (two or more) of layer sets, and identification information identifying which layer set the encoded image data of each picture included in the video data belongs to is inserted into the packets that contain the video data, and
the image decoding unit, based on the identification information, takes the encoded image data of the pictures of a predetermined layer set according to the decoding capability into the buffer and decodes it.
(12) The receiving device according to (11), wherein the identification information is inserted in the header of a PES packet whose payload includes the encoded image data of each picture.
(13) The receiving device according to (11), wherein the identification information is inserted in the adaptation field of a TS packet having an adaptation field.
(14) The receiving device according to (11), wherein the identification information is inserted in a header box related to the track of the corresponding picture.
(15) The receiving device according to (10), wherein
the plurality of layers is divided into a predetermined number (two or more) of layer sets, and the received container includes the predetermined number of video streams each having the encoded image data of the pictures of one of the layer sets, and
the image decoding unit, based on stream identification information, takes the encoded image data of the pictures of a predetermined layer set according to the decoding capability into the buffer and decodes it.
(16) The receiving device according to (15), wherein
when the encoded image data of the pictures of the predetermined layer set is included in a plurality of video streams, the image decoding unit combines the encoded image data of each picture into one stream based on decoding timing information and takes it into the buffer.
(17) The receiving device according to any one of (10) to (16), wherein the image decoding unit has a function of rewriting the decoding time stamps of the encoded image data of each picture selectively taken into the buffer to adjust the decoding interval of low-layer pictures.
(18) The receiving device according to any one of (10) to (17), further including a post processing unit that matches the frame rate of the image data of each picture obtained by the image decoding unit to the display capability.
(19) A receiving method including:
a receiving step of receiving, by a receiving unit, a container in a predetermined format including video data having the encoded image data of the pictures of each layer, obtained by classifying the image data of each picture constituting moving image data into a plurality of layers and encoding it; and
an image decoding step of selectively taking into a buffer, from the video data included in the received container, the encoded image data of the pictures in the layers at or below a predetermined layer according to decoding capability, and decoding the encoded image data of each picture taken into the buffer to obtain the image data of the pictures in the layers at or below the predetermined layer.
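Configuration (16) combines pictures of one layer set that are spread over several video streams into a single stream based on decoding timing. A minimal sketch is a k-way merge on DTS, assuming each input stream is already in DTS order and ignoring 33-bit DTS wraparound; pictures are modeled as hypothetical `(dts, label)` tuples rather than real access units.

```python
import heapq
from typing import Iterable, List, Tuple

Picture = Tuple[int, str]  # (DTS in 90 kHz ticks, label); illustrative only


def merge_streams_by_dts(*streams: Iterable[Picture]) -> List[Picture]:
    """Combine per-stream pictures into one decode-order sequence.

    Each input must already be sorted by DTS; wraparound is not handled.
    """
    return list(heapq.merge(*streams, key=lambda pic: pic[0]))
```

For example, a base stream with DTS 0 and 3000 merged with an enhancement stream with DTS 1500 and 4500 yields the pictures in the order 0, 1500, 3000, 4500.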

The main feature of the present technology is that identification information identifying which layer set the encoded image data of each picture included in the video data belongs to is inserted into the packets that contain the video data, so that the receiving side can use this identification information to easily decode selectively the encoded image data of the pictures in the layers at or below a predetermined layer according to its decoding capability (see FIG. 12).

10 ... Transmission/reception system
100 ... Transmission device
101 ... CPU
102 ... Encoder
103 ... Compressed data buffer (cpb)
104, 104A ... Multiplexer
105 ... Transmitting unit
141 ... PES priority generation unit
142 ... Section coding unit
143-1 to 143-N ... PES packetizing unit
144 ... Switch unit
145 ... Transport packetizing unit
146 ... Adaptation field priority indicating unit
200 ... Receiving device
201 ... CPU
202 ... Receiving unit
203 ... Demultiplexer
204 ... Compressed data buffer (cpb)
205 ... Decoder
206 ... Uncompressed data buffer (dpb)
207 ... Post processing unit
231 ... TS adaptation field extraction unit
232 ... Clock information extraction unit
233 ... TS payload extraction unit
234 ... Section extraction unit
235 ... PSI table/descriptor extraction unit
236 ... PES packet extraction unit
237 ... PES header extraction unit
238 ... Time stamp extraction unit
239 ... Identification information extraction unit
240 ... PES payload extraction unit
241 ... Stream configuration unit
242 ... Identification information extraction unit
251 ... Temporal ID analysis unit
252 ... Target layer selection unit
253 ... Decoding unit
271 ... Interpolation unit
272 ... Subsample unit
273 ... Switch unit

The present technology relates to a transmission method and a transmission apparatus.

Claims

A transmission device comprising: an image encoding unit that hierarchically encodes the image data of each picture constituting moving image data and generates a first video stream having the coded image data of the pictures on the lower layer side and a second video stream having the coded image data of the pictures on the higher layer side, wherein
hierarchical identification information is added to the coded image data of each of the hierarchically coded pictures;
a multiplexed stream generation unit that generates a multiplexed stream including: the first video stream and the second video stream generated by the image encoding unit; first stream identification information, corresponding to the first video stream, indicating that the coded image data of each picture included in the first video stream belongs to the coded image data of the pictures on the lower layer side; second stream identification information, corresponding to the second video stream, indicating that the coded image data of each picture included in the second video stream belongs to the coded image data of the pictures on the higher layer side; a first descriptor, corresponding to the first video stream, into which a first value based on the frame rate of the pictures on the lower layer side alone is inserted; and a second descriptor, corresponding to the second video stream, into which a second value based on the frame rate of the pictures on the lower layer side and the higher layer side together is inserted; and
a transmission unit that transmits the multiplexed stream generated by the multiplexed stream generation unit.
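The two descriptor values in claim 1 follow from the temporal structure of the hierarchy. A sketch, assuming evenly spaced pictures, a TemporalId pattern that repeats every GOP, and a split into the first and second streams at a hypothetical threshold `split_tid`; `StreamDescriptor` is an illustrative container, not the actual descriptor syntax:

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Sequence


@dataclass
class StreamDescriptor:
    # Hypothetical stand-ins for the stream identification information
    # and the frame-rate-based value carried in each descriptor.
    stream_role: str      # "lower" (first stream) or "higher" (second stream)
    frame_rate: Fraction  # rate decodable with this stream and those below it


def build_descriptors(temporal_ids: Sequence[int], full_fps: int,
                      split_tid: int) -> List[StreamDescriptor]:
    """Derive the first and second descriptor values.

    temporal_ids: TemporalId of each picture over one repeating GOP pattern.
    Pictures with TemporalId <= split_tid go to the first (lower) stream.
    """
    n_low = sum(1 for t in temporal_ids if t <= split_tid)
    low_rate = Fraction(full_fps * n_low, len(temporal_ids))
    return [
        StreamDescriptor("lower", low_rate),            # first value: lower side only
        StreamDescriptor("higher", Fraction(full_fps)),  # second value: both sides
    ]
```

With a 120 Hz pattern of TemporalIds `[0, 2, 1, 2]` split at 1, half of the pictures fall on the lower side, giving a first value of 60 and a second value of 120.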

The transmission device according to claim 1, wherein the level specification value of the first video stream and the level specification value of the video stream obtained by combining the first video stream and the second video stream are inserted into the SPS NAL unit of the first video stream.
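With both level values present in the SPS, a receiver can decide from the first stream alone how much it can decode. In HEVC, `general_level_idc` equals 30 times the level number; which SPS fields carry the two values here is not specified, so this sketch takes them as plain arguments, and the function names are hypothetical:

```python
def level_idc(level: float) -> int:
    """HEVC level_idc is 30 times the level number (e.g. 5.1 -> 153)."""
    return round(level * 30)


def decodable_set(base_level_idc: int, combined_level_idc: int,
                  decoder_level_idc: int) -> str:
    """Choose what to decode by comparing the two SPS level values
    against the highest level the decoder supports."""
    if combined_level_idc <= decoder_level_idc:
        return "first+second"
    if base_level_idc <= decoder_level_idc:
        return "first"
    return "none"
```

For example, a first stream signalled at Level 5.1 (`153`) and a combined stream at Level 5.2 (`156`) would lead a Level 5.1 decoder to decode the first stream only.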

A transmission method comprising: an image encoding step in which an image encoding unit hierarchically encodes the image data of each picture constituting moving image data and generates a first video stream having the coded image data of the pictures on the lower layer side and a second video stream having the coded image data of the pictures on the higher layer side, wherein
hierarchical identification information is added to the coded image data of each of the hierarchically coded pictures;
a multiplexed stream generation step in which a multiplexed stream generation unit generates a multiplexed stream including: the first video stream and the second video stream generated in the image encoding step; first stream identification information, corresponding to the first video stream, indicating that the coded image data of each picture included in the first video stream belongs to the coded image data of the pictures on the lower layer side; second stream identification information, corresponding to the second video stream, indicating that the coded image data of each picture included in the second video stream belongs to the coded image data of the pictures on the higher layer side; a first descriptor, corresponding to the first video stream, into which a first value based on the frame rate of the pictures on the lower layer side alone is inserted; and a second descriptor, corresponding to the second video stream, into which a second value based on the frame rate of the pictures on the lower layer side and the higher layer side together is inserted; and
a transmission step in which a transmission unit transmits the multiplexed stream generated in the multiplexed stream generation step.

A receiving device comprising: a receiving unit that receives a multiplexed stream including: a first video stream having the coded image data of the pictures on the lower layer side and a second video stream having the coded image data of the pictures on the higher layer side, both generated by hierarchically encoding the image data of each picture constituting moving image data; first stream identification information, corresponding to the first video stream, indicating that the coded image data of each picture included in the first video stream belongs to the coded image data of the pictures on the lower layer side; second stream identification information, corresponding to the second video stream, indicating that the coded image data of each picture included in the second video stream belongs to the coded image data of the pictures on the higher layer side; a first descriptor, corresponding to the first video stream, into which a first value based on the frame rate of the pictures on the lower layer side alone is inserted; and a second descriptor, corresponding to the second video stream, into which a second value based on the frame rate of the pictures on the lower layer side and the higher layer side together is inserted, wherein
hierarchical identification information is added to the coded image data of each of the hierarchically coded pictures; and
a processing unit that, based on the first stream identification information, the second stream identification information, the first value, and the second value, extracts from the multiplexed stream either the first video stream alone or both the first video stream and the second video stream, and decodes the extracted stream or streams.
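The processing unit of claim 4 chooses between decoding the first stream alone and decoding both streams. A minimal sketch, assuming the descriptor values are plain frame rates and the receiver knows the highest frame rate it can decode; `StreamInfo`, its fields, and the PID values are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StreamInfo:
    pid: int             # hypothetical TS packet identifier of the stream
    is_lower_side: bool  # from the first/second stream identification info
    fps_value: float     # first or second value from the stream's descriptor


def streams_to_decode(streams: Sequence[StreamInfo],
                      max_decodable_fps: float) -> List[int]:
    """Return the PIDs to extract: both streams if the combined frame rate
    (second value) fits the decoder, else the lower-side stream only."""
    lower = [s for s in streams if s.is_lower_side]
    higher = [s for s in streams if not s.is_lower_side]
    if higher and all(s.fps_value <= max_decodable_fps for s in higher):
        return [s.pid for s in lower + higher]
    if lower and all(s.fps_value <= max_decodable_fps for s in lower):
        return [s.pid for s in lower]
    return []  # not even the lower-side stream is decodable
```

Checking the second value first lets a capable receiver take the full frame rate, while a less capable one falls back to the first stream without inspecting any NAL units.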

The receiving device according to claim 4, wherein the processing unit further adjusts the frame rate of the image data of each picture obtained by the decoding to match the display capability.

A receiving method comprising: a receiving step in which a receiving unit receives a multiplexed stream including: a first video stream having the coded image data of the pictures on the lower layer side and a second video stream having the coded image data of the pictures on the higher layer side, both generated by hierarchically encoding the image data of each picture constituting moving image data; first stream identification information, corresponding to the first video stream, indicating that the coded image data of each picture included in the first video stream belongs to the coded image data of the pictures on the lower layer side; second stream identification information, corresponding to the second video stream, indicating that the coded image data of each picture included in the second video stream belongs to the coded image data of the pictures on the higher layer side; a first descriptor, corresponding to the first video stream, into which a first value based on the frame rate of the pictures on the lower layer side alone is inserted; and a second descriptor, corresponding to the second video stream, into which a second value based on the frame rate of the pictures on the lower layer side and the higher layer side together is inserted, wherein
hierarchical identification information is added to the coded image data of each of the hierarchically coded pictures; and
a processing step in which a processing unit, based on the first stream identification information, the second stream identification information, the first value, and the second value, extracts from the multiplexed stream either the first video stream alone or both the first video stream and the second video stream, and performs decoding processing.