JP2015226305A

JP2015226305A - Encoding device

Info

Publication number: JP2015226305A
Application number: JP2014112114A
Authority: JP
Inventors: Yoshimi Moriya (守屋 芳美); Akira Minesawa (峯澤 彰); Kazuyuki Miyazawa (宮澤 一之); Akifumi Hattori (服部 亮史); Shunichi Sekiguchi (関口 俊一); Yukinari Matsuda (松田 幸成); Daiki Kudo (工藤 大樹)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2014-05-30
Filing date: 2014-05-30
Publication date: 2015-12-14

Abstract

PROBLEM TO BE SOLVED: To provide an encoding device that enables a decoding device to reconstruct and decode a temporally hierarchically coded bit stream on the basis of the decoding timing of each access unit. SOLUTION: An encoding device includes: control information encoding means for encoding, for each GOP, control information containing presentation time information representing the presentation time of the first access unit in presentation order and decoding time information representing the decoding time of the first access unit in encoding order, the GOP being a set of a plurality of access units from which all video signals of one or more access units encoded by an inter-frame predictive encoding scheme can be decoded; time information encoding means for encoding, for each access unit, the difference between the decoding time of that access unit and the decoding time of the first access unit in encoding order, and outputting encoded data of the time information; and multiplexing means for multiplexing the encoded data and outputting a bit stream corresponding to the multiplexed encoded data.

Description

The present invention relates to an encoding device and an encoding method for encoding a video signal and an audio signal to generate a bit stream, and to a decoding device and a decoding method for decoding encoded data multiplexed in the bit stream.

In Japanese digital broadcasting, as described in Non-Patent Document 1 below, a video stream and an audio stream, which are the encoded data of a video signal and an audio signal, are multiplexed and transmitted in the transport stream (TS) format, the system standard of MPEG-2 (Moving Picture Experts Group Phase-2). In doing so, the encoding device also multiplexes the encoded data of metadata related to the video stream and the audio stream together with those streams and transmits it.

In addition to the MPEG-2 transport stream (TS), MMT (MPEG Media Transport) is a new transport scheme being standardized in MPEG. When transmitting the one or more video components (video streams) and audio components (audio streams) that constitute one program, MMT allows each component to be transmitted in a different transmission form (for example, broadcast or communication).

HEVC/H.265 (hereinafter referred to as "HEVC") is a new video encoding scheme standardized by MPEG and the ITU (International Telecommunication Union).
HEVC introduces temporal hierarchical coding (coding that is scalable in the time direction), and a hierarchy level can be specified for each NAL (Network Abstraction Layer) unit, the coding unit that makes up an access unit (a unit containing the encoded data necessary for decoding one picture).

FIG. 9 is an explanatory diagram showing an example of temporal hierarchical coding in HEVC.
In FIG. 9, TemporalID is identification information indicating the hierarchy level of each access unit (AU).
IRAP denotes an IRAP (Intra Random Access Point) picture defined in HEVC. When decoding starts partway through a bit stream, the pictures that follow the IRAP picture in display order are guaranteed to be decoded correctly.
A GOP (Group Of Pictures) is a set of a plurality of access units (AUs) from which, when the video signals of one or more access units (AUs) have been encoded by an inter-frame predictive coding scheme, all of the video signals of those access units can be decoded. That is, it is the set consisting of an IRAP picture, which is the first access unit (AU) in coding order, and the access units (AUs) that follow that IRAP picture (pictures other than IRAP pictures).
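The GOP definition above can be illustrated with a minimal sketch that splits a coding-order sequence at each IRAP picture. The function name, the tuple layout, and the picture labels are hypothetical and chosen only for illustration; they do not come from HEVC or MMT.

```python
def split_into_gops(access_units):
    """Group access units, given in coding order, into GOPs.

    access_units: list of (label, is_irap) tuples (hypothetical layout).
    A new GOP starts at every IRAP picture.
    """
    gops = []
    for label, is_irap in access_units:
        # Open a new GOP at an IRAP picture (or at the very first AU).
        if is_irap or not gops:
            gops.append([])
        gops[-1].append(label)
    return gops


# Hypothetical coding-order sequence with two IRAP pictures.
sequence = [("I0", True), ("P1", False), ("B2", False),
            ("I3", True), ("P4", False)]
gops = split_into_gops(sequence)
```

Each inner list begins with the IRAP picture that makes the group decodable on its own.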

Since temporal hierarchical coding is well known, a detailed description is omitted. One of its constraints is that an access unit (AU) being encoded cannot reference any access unit (AU) whose hierarchy level is higher than its own.
Under this constraint, for example in the example of FIG. 11, access units (AUs) at hierarchy level 2 or lower (TemporalID ≤ 2) do not reference access units (AUs) at hierarchy level 3 (TemporalID = 3) during decoding. The access units at hierarchy level 2 or lower can therefore be decoded without decoding the access units at hierarchy level 3.
Note that HEVC permits temporal hierarchical coding with reference structures of up to a maximum of 6 hierarchy levels.
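Because no access unit references a higher hierarchy level, a lower-rate sub-stream can be obtained simply by discarding access units above a target level. The following is a minimal sketch of that filtering; the AccessUnit class and the sample sequence are hypothetical and do not reproduce the structure of any figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessUnit:
    label: str        # hypothetical picture label
    temporal_id: int  # hierarchy level (TemporalID)


def extract_sublayers(access_units, max_temporal_id):
    """Keep only access units whose TemporalID is at most max_temporal_id."""
    return [au for au in access_units if au.temporal_id <= max_temporal_id]


# Hypothetical coding-order sequence.
stream = [AccessUnit("P0", 0), AccessUnit("P1", 1), AccessUnit("P2", 2),
          AccessUnit("P3", 3), AccessUnit("P4", 3)]
base = extract_sublayers(stream, 2)
```

The retained units remain decodable because, under the constraint above, none of them references a unit that was dropped.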

FIG. 10 is an explanatory diagram showing the coding order and display order of the pictures encoded with the picture structure of FIG. 9.
As shown in FIG. 10, suppose the pictures are encoded so that access units at hierarchy level 3 and access units at hierarchy level 2 or lower alternate in display order. If the display frame rate is 2N (Hz) when all access units from hierarchy level 0 to hierarchy level 3 are decoded, then decoding only the access units at hierarchy level 2 or lower allows playback at a display frame rate of N (Hz). Accordingly, for playback on a decoding device that supports display frame rates of N (Hz) or lower, only the access units at hierarchy level 2 or lower need to be passed to the decoding device.
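The 2N to N relationship can be checked with a short calculation. This sketch assumes a hypothetical display-order pattern in which every other picture is at hierarchy level 3; the pattern and the 120 Hz figure are illustrative values, not taken from the figures.

```python
from fractions import Fraction


def decoded_frame_rate(display_order_levels, full_rate_hz, max_temporal_id):
    """Frame rate obtained when only levels <= max_temporal_id are decoded."""
    kept = sum(1 for level in display_order_levels if level <= max_temporal_id)
    return Fraction(full_rate_hz) * Fraction(kept, len(display_order_levels))


# Hypothetical pattern: level 3 alternates with levels 0 to 2 in display order.
pattern = [0, 3, 2, 3, 1, 3, 2, 3]
half_rate = decoded_frame_rate(pattern, 120, 2)   # 2N = 120, so N = 60
full_rate = decoded_frame_rate(pattern, 120, 3)
```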

In MMT, for example, when a video bit stream structured as in FIG. 10 is multiplexed and distributed, the set of access units at hierarchy level 2 or lower and the set of access units at hierarchy level 3 can each be given an identifier with a different value. In MMT, a set of access units given the same identifier is called an asset. With A_0 as the identifier of the asset made up of access units at hierarchy level 2 or lower and A_1 as the identifier of the asset made up of access units at hierarchy level 3, each asset can be transmitted in a different transmission form; for example, asset A_0 can be transmitted by broadcast and asset A_1 by communication.
To synchronize presentation times between assets, MMT provides a descriptor that describes, for each GOP, the presentation time of the first access unit in display order in NTP (Network Time Protocol) format, and this descriptor can be multiplexed and transmitted for each GOP. When the assets differ, the presentation time of the first access unit can be transmitted for each asset, so even when the receiving side receives multiple assets transmitted in different transmission forms, it can synchronize them by presentation time and play them back (present them).
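The division into assets A_0 and A_1 described above can be sketched as follows. The function name, the tuple layout, and the threshold parameter are assumptions made for illustration; real MMT asset identifiers are not plain strings like these.

```python
from collections import defaultdict


def split_into_assets(access_units, base_max_level=2):
    """Assign each access unit to asset A_0 or A_1 by hierarchy level.

    access_units: iterable of (label, temporal_id) tuples (hypothetical).
    """
    assets = defaultdict(list)
    for label, temporal_id in access_units:
        asset_id = "A_0" if temporal_id <= base_max_level else "A_1"
        assets[asset_id].append(label)
    return dict(assets)


# Hypothetical coding-order sequence.
assets = split_into_assets([("I0", 0), ("P1", 1), ("b2", 3),
                            ("B3", 2), ("b4", 3)])
```

Each resulting list could then be sent over its own transmission form, for example A_0 by broadcast and A_1 by communication.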

STD-B32 (a standard for digital broadcasting established by ARIB, the Association of Radio Industries and Businesses)

Since the conventional encoding device is configured as described above, the following problem arises. When a video bit stream that has been temporally hierarchically coded as in FIG. 9 is divided into different assets according to the hierarchy level of each access unit, and each asset is transmitted using a different transmission form, the decoding device must reconstruct a bit stream in the same order as the coding order shown in FIG. 10, based on the decoding timing of each access unit. MMT, however, cannot transmit the decoding time of each access unit, so the bit stream cannot be reconstructed.

The present invention has been made to solve the above problem. Its object is to obtain an encoding device, a decoding device, an encoding method, and a decoding method with which, even when a temporally hierarchically coded video bit stream is divided into different assets according to the hierarchy level of each access unit and transmitted, the decoding device can reconstruct and decode the temporally hierarchically coded bit stream based on the decoding timing of each access unit.

The encoding device according to the present invention comprises:
video encoding means for encoding a video signal for each video access unit and outputting encoded data of the video signal;
control information encoding means for encoding, for each GOP, control information that includes presentation time information indicating the presentation time of the first access unit in presentation order and decoding time information indicating the decoding time of the first access unit in coding order, and outputting encoded data of the control information for each GOP, the GOP being a set of a plurality of access units from which, when the video signals of one or more access units have been encoded by the video encoding means using an inter-frame predictive coding scheme, all of the video signals of those access units can be decoded;
time information encoding means for encoding, for each access unit of the encoded data of the video signal output from the video encoding means, the difference between the decoding time of that access unit and the decoding time of the first access unit in coding order, and outputting encoded data of the time information; and
multiplexing means for multiplexing the encoded data of the video signal output from the video encoding means, the encoded data of the time information output from the time information encoding means, and the encoded data of the control information output from the control information encoding means, and outputting a bit stream that is the multiplexed encoded data.
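The timing values produced by the control information encoding means and the time information encoding means can be sketched as below. This is a simplified illustration, assuming times are integer ticks and that each access unit is a dictionary with "dts" and "pts" keys; those names and the return layout are hypothetical.

```python
def encode_gop_timing(gop):
    """Compute per-GOP control values and per-AU decoding-time differences.

    gop: list of dicts in coding order, each with integer 'dts' and 'pts'
    (hypothetical layout, times in arbitrary ticks).
    """
    first_dts = gop[0]["dts"]                  # first AU in coding order
    first_pts = min(au["pts"] for au in gop)   # first AU in presentation order
    control = {"presentation_time": first_pts, "decoding_time": first_dts}
    # Per-AU time information: difference from the first AU's decoding time.
    offsets = [au["dts"] - first_dts for au in gop]
    return control, offsets


# Hypothetical GOP: coding order differs from presentation order.
gop = [{"dts": 100, "pts": 102}, {"dts": 101, "pts": 106},
       {"dts": 102, "pts": 104}]
control, offsets = encode_gop_timing(gop)
```

Sending one absolute decoding time per GOP plus small differences per access unit keeps the per-packet overhead low while still letting the receiver recover every decoding time.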

According to the present invention, even when a temporally hierarchically coded video bit stream is divided into different assets according to the hierarchy level of each access unit and transmitted, the decoding device can reconstruct and decode the temporally hierarchically coded bit stream based on the decoding timing of each access unit.
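Once every access unit carries a decoding time, the receiver can restore coding order by merging the assets on that time. The sketch below assumes each asset arrives as a list of (decoding time, label) pairs already sorted by decoding time; the function name and data layout are hypothetical.

```python
import heapq


def reconstruct_bitstream(asset_a0, asset_a1):
    """Merge two assets into one sequence ordered by decoding time.

    Each asset is a list of (dts, label) pairs sorted by dts (assumption).
    """
    merged = heapq.merge(asset_a0, asset_a1, key=lambda item: item[0])
    return [label for _, label in merged]


# Hypothetical assets received over different transmission forms.
a0 = [(0, "I0"), (1, "P1"), (3, "B2")]
a1 = [(2, "b3"), (4, "b4")]
order = reconstruct_bitstream(a0, a1)
```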

A block diagram showing an encoding device according to Embodiment 1 of the present invention.
A flowchart showing the processing (encoding method) of the encoding device according to Embodiment 1 of the present invention.
A block diagram showing a decoding device according to Embodiment 1 of the present invention.
A flowchart showing the processing (decoding method) of the decoding device according to Embodiment 1 of the present invention.
An explanatory diagram outlining the encoded data when a bit stream is transmitted by MMT.
An explanatory diagram showing a configuration example of an MPU.
An explanatory diagram showing the HEVC picture structure descriptor.
An explanatory diagram showing an example of temporal hierarchical coding in HEVC.
An explanatory diagram showing an example of a picture structure.
An explanatory diagram showing the coding order and presentation order of the pictures encoded with the picture structure of FIG. 9.
An explanatory diagram showing the structure of the PA message.
An explanatory diagram showing an example of a bit stream before separation and the bit streams after separation.

Embodiment 1.
FIG. 1 is a block diagram showing an encoding device according to Embodiment 1 of the present invention.
In FIG. 1, when given a digital audio signal, the audio encoding unit 1 encodes the audio signal for each audio access unit (AU) using a scheme such as MPEG-4 Audio, generates an audio stream that is the encoded data of the audio signal, and encodes metadata related to the audio stream. It also outputs the presentation time (PTS) of each encoded access unit to the audio MMTP packet generation unit 8.
The audio MMTP payload generation unit 2 generates an audio MMTP payload consisting of the metadata encoded by the audio encoding unit 1 and the encoded data of the audio signal for each access unit (AU).

When given a digital video signal, the HEVC encoding unit 3 encodes the video signal for each video access unit (AU) using the HEVC scheme, generates a video stream that is the encoded data of the video signal, and encodes metadata related to the video stream.
The video MMTP payload generation unit 4 generates a video MMTP payload consisting of the metadata encoded by the HEVC encoding unit 3 and the encoded data of the video signal for each access unit (AU). The HEVC encoding unit 3 and the video MMTP payload generation unit 4 constitute the video encoding means.

The control information encoding unit 5 encodes control information called a PA message, defined in MMT, as control information related to the audio stream generated by the audio encoding unit 1 and the video stream generated by the HEVC encoding unit 3.
FIG. 11 shows the structure of the PA message. A PA message consists of one or more tables.
Each table in a PA message describes information about the one or more video components (video streams) and audio components (audio streams) that constitute one program (called a package in MMT). In MMT, video components and audio components are called assets.

Specifically, the table contains the following for each asset constituting the package: an asset ID that identifies the asset; an asset type that identifies the kind of asset (for example, a video stream in HEVC format or an audio stream in MPEG-4 Audio format); information on where the asset can be obtained, such as the packet ID of the MMTP packets storing the asset's encoded data and metadata, or the IP address in the case of IP delivery; and various descriptors for describing meta-information about the asset.
The descriptors include those defined in the MMT standard, such as the MPU time stamp descriptor (presentation time information), which indicates the presentation time (display time) of the first access unit (AU) in presentation order (display order) among the access units (AUs) making up each asset's MPU (Media Processing Unit). In addition, users can define new descriptors of their own, and the MPU time information descriptor is included as such a user-defined descriptor.

An MPU consists of one or more access units (AUs) and is a unit for which video or audio can be decoded from the MPU alone. When the video signals of one or more access units (AUs) are encoded by an inter-frame predictive coding scheme, the MPU is the same unit as a GOP, that is, a set of a plurality of access units (AUs) from which all of the video signals of those access units can be decoded.

The MPU time information descriptor describes items such as: information indicating the unit in which time information such as the decoding time (DTS) and presentation time (PTS) is expressed (timescale); information for calculating the decoding time of the first access unit (AU) in coding order among the access units (AUs) making up the MPU (initial_presentation_time_delay); flags indicating whether the information for calculating the decoding time and presentation time of each access unit in the MPU is encoded (presentation_time_offset_present_flag, decoding_time_offset_present_flag); and information indicating the code length used when encoding the information for calculating the decoding time and presentation time of each access unit (time_offset_length_minus1).
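The fields named above can be gathered into a small data structure, as in the following sketch. The field names come from the description; the class name, the helper methods, and the interpretation of timescale as ticks per second are assumptions, since the text only says that timescale indicates the unit of the time information.

```python
from dataclasses import dataclass


@dataclass
class MpuTimeInfoDescriptor:
    timescale: int                          # assumed: ticks per second
    initial_presentation_time_delay: int    # in timescale ticks (assumption)
    presentation_time_offset_present_flag: bool
    decoding_time_offset_present_flag: bool
    time_offset_length_minus1: int

    @property
    def time_offset_length(self):
        """Code length of each offset field (the '_minus1' value plus one)."""
        return self.time_offset_length_minus1 + 1

    def ticks_to_seconds(self, ticks):
        """Convert ticks to seconds under the ticks-per-second assumption."""
        return ticks / self.timescale


# Hypothetical values for illustration only.
descriptor = MpuTimeInfoDescriptor(90000, 3000, True, True, 15)
```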

The control MMTP payload generation unit 6 generates a control MMTP payload consisting of the encoded data of the control information encoded by the control information encoding unit 5.
Part of the control information encoding unit 5 and the control MMTP payload generation unit 6 constitute the control information encoding means. The remaining part of the control information encoding unit 5 constitutes the time information encoding means.

The video MMTP packet generation unit 9 attaches a predetermined MMTP header to the video MMTP payload generated by the video MMTP payload generation unit 4 to generate the video MMTP packets that make up the bit stream. The MMTP header consists of a mandatory header, containing information that must be encoded, and an extension header, containing information that is optionally encoded. The mandatory header includes items such as the packet ID assigned according to the type of encoded data contained in the MMTP payload.
The extension header contains presentation time information (presentation_time_offset) and decoding time information (decoding_time_offset), depending on the values of the flags indicating whether the information for calculating the presentation time and decoding time (the presentation time information and decoding time information) is encoded for each access unit of the encoded data contained in the MMTP payload.
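The flag-dependent content of the extension header can be sketched as a packing routine. The field order, the treatment of the code length as a number of bits, the zero padding to a byte boundary, and the restriction to non-negative offsets are all assumptions made for this illustration; the description does not specify the byte layout.

```python
def pack_time_offsets(presentation_time_offset, decoding_time_offset,
                      presentation_flag, decoding_flag,
                      time_offset_length_minus1):
    """Pack the optional offsets into bytes (hypothetical layout)."""
    width = time_offset_length_minus1 + 1   # assumed to be a bit count
    bits = ""
    fields = ((presentation_flag, presentation_time_offset),
              (decoding_flag, decoding_time_offset))
    for present, value in fields:
        if not present:
            continue                        # field omitted when flag is off
        if not 0 <= value < (1 << width):
            raise ValueError("offset does not fit in the signalled length")
        bits += format(value, f"0{width}b")
    bits += "0" * (-len(bits) % 8)          # pad to a whole number of bytes
    return int(bits, 2).to_bytes(len(bits) // 8, "big") if bits else b""


both = pack_time_offsets(3, 1, True, True, 7)
decoding_only = pack_time_offsets(3, 1, False, True, 7)
```

When both flags are off, the routine returns an empty byte string, mirroring an extension header that carries no time information.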

The audio MMTP packet generation unit 8 attaches a predetermined MMTP header to the audio MMTP payload generated by the audio MMTP payload generation unit 2 to generate the audio MMTP packets that make up the bit stream. The MMTP header consists of a mandatory header, containing information that must be encoded, and an extension header, containing information that is optionally encoded. The content of the extension header is the same as that of the extension header encoded by the video MMTP packet generation unit 9.

The control MMTP packet generation unit 10 attaches a predetermined MMTP header to the control MMTP payload generated by the control MMTP payload generation unit 6 to generate the control MMTP packets that make up the bit stream.

The MMTP packet multiplexing unit 7 multiplexes the audio MMTP packets generated by the audio MMTP packet generation unit 8, the control MMTP packets generated by the control MMTP packet generation unit 10, and the video MMTP packets generated by the video MMTP packet generation unit 9 to form a bit stream.
The MMTP packet multiplexing unit 7 can also form a separate bit stream for each asset. For example, for a video bit stream that has been temporally hierarchically coded as shown in FIG. 9, it can form bit stream 1 from the MMTP packets of the asset containing the access units at hierarchy level 2 or lower and bit stream 2 from the MMTP packets of the asset containing the access units at hierarchy level 3, and each bit stream can be sent in a different transmission form.
The MMTP packet multiplexing unit 7 constitutes the multiplexing means.

In the example of FIG. 1, each component of the encoding device, namely the audio encoding unit 1, the audio MMTP payload generation unit 2, the HEVC encoding unit 3, the video MMTP payload generation unit 4, the control information encoding unit 5, the control MMTP payload generation unit 6, the control MMTP packet generation unit 10, and so on, is assumed to be implemented in dedicated hardware (for example, a semiconductor integrated circuit incorporating a CPU, or a one-chip microcomputer). The encoding device may, however, be implemented on a computer.
When the encoding device is implemented on a computer, a program describing the processing of the audio encoding unit 1, the audio MMTP payload generation unit 2, the HEVC encoding unit 3, the video MMTP payload generation unit 4, the control information encoding unit 5, the control MMTP payload generation unit 6, the control MMTP packet generation unit 10, and so on is stored in the computer's memory, and the computer's CPU executes the program stored in that memory.

FIG. 2 is a flowchart showing the processing (encoding method) of the encoding device according to Embodiment 1 of the present invention.

FIG. 3 is a block diagram showing a decoding device according to Embodiment 1 of the present invention.
In FIG. 3, the MMTP packet analysis unit 12 receives one or more bit streams, containing one or more assets, output from an encoding device (the encoding device of FIG. 1 or an equivalent encoding device). The MMTP packet analysis unit 12 analyzes the MMTP header of each MMTP packet in the bit stream and obtains the packet ID contained in that header. If the packet ID indicates that the encoded data in the MMTP payload is control information (a PA message), the unit outputs the MMTP payload of that packet, the control MMTP payload, to the control MMTP payload processing unit 13.

The control MMTP payload processing unit 13 decodes the encoded data contained in the control MMTP payload output from the MMTP packet analysis unit 12 to obtain the PA message, which is the control information.
From the tables described in the PA message, the control MMTP payload processing unit 13 also decodes the information about the assets constituting the package and the information on where each asset can be obtained, such as the packet ID of the MMTP packets storing each asset's encoded data and metadata, or the IP address in the case of IP delivery. The packet IDs and the asset acquisition information are output to the MMTP packet analysis unit 12.
In addition, the control MMTP payload processing unit 13 decodes, from the tables described in the PA message, the MPU time stamp descriptor and the MPU time information descriptor for the assets constituting the package.

The MMTP packet analysis unit 12 obtains the packet ID contained in the MMTP header and compares it with the packet ID of each asset output from the control MMTP payload processing unit 13. If the packet ID indicates that the encoded data in the MMTP payload is an audio signal or a video signal, the unit outputs that MMTP packet to the asset separation unit 14.
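The routing decision made by the packet analysis unit can be sketched as a lookup on the packet ID. The destination names, the set-based lookup, and the "discard" fallback for unknown IDs are assumptions for illustration; the description does not say what happens to packets that match neither case.

```python
def route_packet(packet_id, control_packet_ids, media_packet_ids):
    """Return the destination for an MMTP packet based on its packet ID."""
    if packet_id in control_packet_ids:
        return "control_payload_processor"   # carries a PA message
    if packet_id in media_packet_ids:
        return "asset_separator"             # carries audio or video data
    return "discard"                         # assumption: unknown IDs dropped


# Hypothetical packet ID assignments.
control_ids = {0x0000}
media_ids = {0x0010, 0x0011, 0x0020}
destinations = [route_packet(pid, control_ids, media_ids)
                for pid in (0x0000, 0x0011, 0x0099)]
```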

The MMTP packet analysis unit 12 also decodes the presentation time information (presentation_time_offset) and decoding time information (decoding_time_offset) from the MMTP extension header, according to the values of the flags (presentation_time_offset_present_flag, decoding_time_offset_present_flag) that are described in the MPU time information descriptor decoded by the control MMTP payload processing unit 13 and that indicate whether the information for calculating the decoding time and presentation time of each access unit in the MPU is encoded. It then calculates the presentation time and decoding time of each access unit (AU) from two values: the presentation time of the first access unit (AU) in presentation order, described in the MPU time stamp descriptor, and the decoding time of the first access unit (AU) in coding order, obtained by decoding the information for calculating that decoding time (initial_presentation_time_delay) described in the MPU time information descriptor. The calculated presentation time and decoding time are output to the audio MMTP payload processing unit 15 and the video MMTP payload processing unit 19 according to the type of encoded data contained in the access unit.
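The time calculation described above can be sketched as follows. The exact formulas are assumptions: this sketch takes the first decoding time to be the first presentation time minus initial_presentation_time_delay, adds decoding_time_offset to that first decoding time, and adds presentation_time_offset to the first presentation time, with all offsets in timescale ticks and timescale taken as ticks per second. The text here does not confirm these formulas.

```python
from fractions import Fraction


def access_unit_times(first_pts, initial_presentation_time_delay, timescale,
                      decoding_time_offsets, presentation_time_offsets):
    """Return a list of (dts, pts) pairs in seconds, in coding order."""
    # Assumed: first DTS = first PTS minus the signalled delay.
    first_dts = first_pts - Fraction(initial_presentation_time_delay, timescale)
    times = []
    for dts_off, pts_off in zip(decoding_time_offsets,
                                presentation_time_offsets):
        dts = first_dts + Fraction(dts_off, timescale)
        pts = first_pts + Fraction(pts_off, timescale)  # assumed reference
        times.append((dts, pts))
    return times


# Hypothetical values: 90 kHz timescale, three access units.
times = access_unit_times(Fraction(10), 3000, 90000,
                          [0, 1500, 3000], [3000, 0, 1500])
```

Exact fractions are used so that repeated additions of tick values do not accumulate floating-point error.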

The asset separation unit 14 refers to the asset ID, asset type, and packet ID described in the table of the PA message decoded by the control MMTP payload processing unit 13, and determines whether the MMTP payload included in the MMTP packet output from the MMTP packet analysis unit 12 is an audio MMTP payload or a video MMTP payload. If it is an audio MMTP payload, the unit extracts it from the MMTP packet and outputs it to the audio MMTP payload processing unit 15. If it is a video MMTP payload, the unit extracts it from the MMTP packet and outputs it to the video MMTP payload processing unit 19.

The audio MMTP payload processing unit 15 reconstructs the MFU (Media Fragment Unit) or MPU of the audio stream from the audio MMTP payload output from the asset separation unit 14, thereby generating an audio elementary stream (audio ES) in a format that the downstream audio stream decoding unit 17 can decode, and stores the audio ES in the audio ES buffer 16. The MFU is a smaller unit than the MPU; one access unit (AU) or one NAL unit can be defined as one MFU.
The audio MMTP payload processing unit 15 also extracts the metadata about the audio stream included in the audio MMTP payload output from the asset separation unit 14 and stores that metadata in the audio ES buffer 16. The audio ES buffer 16 is a memory that temporarily stores the audio ES and the metadata.

When the DTS (decoding time) of an access unit (AU) arrives, the audio stream decoding unit 17 takes the audio ES out of the audio ES buffer 16, decodes the audio signal of that access unit (AU), and stores the decoded audio signal and its PTS (presentation time) in the audio data buffer 18.
The audio data buffer 18 is a memory that temporarily stores the audio signal decoded by the audio stream decoding unit 17 and its PTS (presentation time).

The video MMTP payload processing unit 19 reconstructs the MFU or MPU of the video stream from the video MMTP payload output from the asset separation unit 14, thereby generating an HEVC elementary stream (HEVC ES) in a format that the downstream HEVC ES decoding unit 21 can decode, and stores the HEVC elementary stream in the HEVC ES buffer 20.
The video MMTP payload processing unit 19 also extracts the metadata about the video stream included in the video MMTP payload output from the asset separation unit 14 and stores that metadata in the HEVC ES buffer 20.
The HEVC ES buffer 20 is a memory that temporarily stores the HEVC elementary stream and the metadata.

When the DTS (decoding time) of an access unit (AU) arrives, the HEVC ES decoding unit 21 takes the HEVC elementary stream out of the HEVC ES buffer 20, decodes the video signal of that access unit (AU), and stores the decoded image (the decoded video signal) and its PTS (presentation time) in the decoded image buffer 22.
The decoded image buffer 22 is a memory that temporarily stores the decoded image and PTS (presentation time) of each access unit (AU) decoded by the HEVC ES decoding unit 21.
The video MMTP payload processing unit 19, the HEVC ES buffer 20, the HEVC ES decoding unit 21, and the decoded image buffer 22 constitute the video decoding means.

In the example of FIG. 3, each component of the decoding device (the MMTP packet analysis unit 12, the control MMTP payload processing unit 13, the asset separation unit 14, the audio MMTP payload processing unit 15, the audio ES buffer 16, the audio stream decoding unit 17, the audio data buffer 18, the video MMTP payload processing unit 19, the HEVC ES buffer 20, the HEVC ES decoding unit 21, and the decoded image buffer 22) is assumed to be implemented in dedicated hardware (for components other than the buffers, for example, a semiconductor integrated circuit on which a CPU is mounted, or a one-chip microcomputer). The decoding device may, however, be implemented by a computer.
When the decoding device is implemented by a computer, the audio ES buffer 16, the audio data buffer 18, the HEVC ES buffer 20, and the decoded image buffer 22 are configured in the internal or external memory of the computer. A program describing the processing of the MMTP packet analysis unit 12, the control MMTP payload processing unit 13, the asset separation unit 14, the audio MMTP payload processing unit 15, the audio stream decoding unit 17, the video MMTP payload processing unit 19, and the HEVC ES decoding unit 21 is stored in the memory of the computer, and the CPU of the computer executes the program stored in that memory.

FIG. 4 is a flowchart showing the processing (decoding method) of the decoding device according to Embodiment 1 of the present invention.

Next, the operation will be described.
First, the processing of the encoding device is described.
When a digital audio signal is given, the audio encoding unit 1 encodes the audio signal in units of audio access units (AU) using a method such as MPEG-4 Audio, generates an audio stream that is the encoded data of the audio signal, and encodes the metadata about that audio stream (step ST1 in FIG. 2).
When a digital video signal is given, the HEVC encoding unit 3 encodes the video signal in units of video access units (AU) using the HEVC method, generates a video stream that is the encoded data of the video signal, and encodes the metadata about that video stream (step ST2).

FIG. 5 is an explanatory diagram showing an outline of the encoded data when a bit stream is transmitted by MMT.
In FIG. 5, an access unit (AU) is, for video, a unit containing the encoded data needed to decode one picture and, for audio, a frame composed of one or more samples that serves as the encoding unit.
The NAL unit is the encoding unit of HEVC, and one access unit (AU) is composed of one or more NAL units.
An MPU is composed of one or more access units and is a unit from which video or audio can be decoded on its own. When the video signals of one or more access units (AU) are encoded by an inter-frame predictive coding method, the MPU is the same unit as a GOP, which is a set of access units (AU) from which all the video signals of those one or more access units (AU) can be decoded.
The MFU is a smaller unit than the MPU; one access unit (AU) or one NAL unit can be defined as one MFU.

FIG. 6 is an explanatory diagram showing a configuration example of the MPU.
In FIG. 6, the MPU metadata describes metadata related to the MPU. The MPU metadata need not be encoded.
The movie fragment metadata (MF metadata) describes metadata accompanying the encoded data (sample data) of one access unit (AU). For example, when the encoded data of the access units (AU) is stored in a file format, the metadata includes, for each access unit (AU), the address at which the encoded data is stored, the data length of the encoded data, and information about the duration of that access unit (AU). The movie fragment metadata need not be encoded.
The MPU metadata, the movie fragment metadata, the MFUs, and the MMT control information are packetized into MMTP packets and transmitted. An MMTP packet is composed of an MMTP header and an MMTP payload.

On receiving the encoded data of the metadata (MPU metadata, MF metadata, and so on) and the encoded data of the audio signal in units of access units (AU) from the audio encoding unit 1, the audio MMTP payload generation unit 2 generates an audio MMTP payload composed of the encoded data of the MPU metadata in units of MPUs, and the encoded data of the MF metadata and the encoded data (sample data) of the audio signal in units of access units (AU) (step ST3).
On receiving the encoded data of the metadata (MPU metadata, MF metadata, and so on) and the encoded data of the video signal in units of access units (AU) from the HEVC encoding unit 3, the video MMTP payload generation unit 4 generates a video MMTP payload composed of the encoded data of the MPU metadata in units of MPUs, and the encoded data of the MF metadata and the encoded data (sample data) of the video signal in units of access units (AU) (step ST4).

The control information encoding unit 5 encodes the control information about the audio stream generated by the audio encoding unit 1 and the video stream generated by the HEVC encoding unit 3 (step ST5).
As control information about the audio stream and the video stream, it encodes, for example, the PA message defined by MMT and the MPU time information descriptor.
As described above, the PA message describes information about the one or more video components (video streams) and audio components (audio streams) that constitute one program (called a package in MMT).
Specifically, the PA message describes: an asset ID identifying each asset (video stream or audio stream) generated by the audio encoding unit 1 and the HEVC encoding unit 3; an asset type identifying the kind of asset; an MPU timestamp descriptor indicating the presentation time of the access unit (AU) that comes first in presentation order among the access units (AU) constituting the MPU of each asset; and a packet ID indicating the MMTP packets that carry the encoded data and metadata of each asset.

FIG. 7 is an explanatory diagram showing the MPU time information descriptor.
As shown in FIG. 7, the MPU time information descriptor describes: a sequence number (mpu_sequence_number) identifying the MPU to which the information relates; the time difference (initial_presentation_time_delay) between the decoding time of the first access unit of the MPU in encoding order and the presentation time of the first access unit of the MPU in presentation order; the unit (timescale) in which the presentation time information and display time information encoded per access unit are expressed (1/timescale seconds); a flag (presentation_time_offset_present_flag) indicating whether presentation time information is encoded per access unit; a flag (decoding_time_offset_present_flag) indicating whether decoding time information is encoded per access unit; and the code length (time_offset_length_minus1) of the presentation time information and display time information encoded per access unit. If fixed values are always used for timescale, presentation_time_offset_present_flag, decoding_time_offset_present_flag, and time_offset_length_minus1, these fields need not be encoded.
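As a rough illustration of the fields listed for FIG. 7, the descriptor can be modeled as a plain record. This is only a sketch: the class name `MpuTimeInfoDescriptor`, the helper names, and the default values are assumptions made for illustration, and no bit layout or field width is implied because the text does not give one.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class MpuTimeInfoDescriptor:
    """Hypothetical container for the FIG. 7 fields (no wire format implied)."""
    mpu_sequence_number: int
    initial_presentation_time_delay: int  # assumed to be counted in timescale ticks
    timescale: int = 90_000               # ticks per second; default is an assumption
    presentation_time_offset_present_flag: bool = True
    decoding_time_offset_present_flag: bool = True
    time_offset_length_minus1: int = 31   # default is an assumption

    @property
    def time_offset_length(self) -> int:
        # A "_minus1" field stores the actual length minus one.
        return self.time_offset_length_minus1 + 1

    def ticks_to_seconds(self, ticks: int) -> Fraction:
        # One tick lasts 1/timescale seconds.
        return Fraction(ticks, self.timescale)
```

Using exact fractions rather than floats keeps tick-to-second conversions free of rounding error, which matters when timestamps from different assets are compared later.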

On receiving the encoded data of the control information from the control information encoding unit 5, the control MMTP payload generation unit 6 generates a control MMTP payload composed of that encoded data (step ST6).
The video MMTP packet generation unit 9 adds a predetermined MMTP header to the video MMTP payload generated by the video MMTP payload generation unit 4 and generates the video MMTP packets that constitute the bit stream. The MMTP header is composed of a mandatory header, containing information that is always encoded, and an extension header, containing information that is optionally encoded. The mandatory header includes, among other things, a packet ID assigned according to the type of encoded data included in the MMTP payload.

The extension header includes presentation time information (presentation_time_offset) and decoding time information (decoding_time_offset), depending on the values of the flags that indicate whether information for calculating the presentation time and decoding time (presentation time information and decoding time information) is encoded for each access unit of the encoded data included in the MMTP payload.
The presentation time information (presentation_time_offset) is the difference between the presentation time of the access unit whose encoded data is included in the MMTP payload and the presentation time of the first access unit of the MPU in presentation order.
The decoding time information (decoding_time_offset) is the difference between the decoding time of the access unit whose encoded data is included in the MMTP payload and the decoding time of the first access unit of the MPU in encoding order.
Alternatively, the presentation time information (presentation_time_offset) may be encoded as the difference from the decoding time of the same access unit, where that decoding time is calculated by decoding the decoding time information (decoding_time_offset).
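The two offset definitions, and the alternative in which the presentation offset is taken relative to the access unit's own decoding time, can be sketched on the encoder side as follows. The function name `time_offsets`, its parameters, and the choice to express all times as integer ticks of the timescale are assumptions made for illustration.

```python
def time_offsets(au_pts, au_dts, mpu_first_pts, mpu_first_dts,
                 pts_relative_to_dts=False):
    """Return (presentation_time_offset, decoding_time_offset) in ticks.

    au_pts / au_dts: presentation and decoding time of this access unit.
    mpu_first_pts:   presentation time of the MPU's first AU in presentation order.
    mpu_first_dts:   decoding time of the MPU's first AU in encoding order.
    """
    # Decoding offset: distance from the first AU in encoding order.
    decoding_time_offset = au_dts - mpu_first_dts
    if pts_relative_to_dts:
        # Alternative form: presentation time relative to this AU's own decoding time.
        presentation_time_offset = au_pts - au_dts
    else:
        # Default form: distance from the first AU in presentation order.
        presentation_time_offset = au_pts - mpu_first_pts
    return presentation_time_offset, decoding_time_offset
```

For example, an access unit decoded 3000 ticks after the first decoded access unit and presented 3000 ticks after the first presented one yields the pair (3000, 3000) in the default form.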

The audio MMTP packet generation unit 8 adds a predetermined MMTP header to the audio MMTP payload generated by the audio MMTP payload generation unit 2 and generates the audio MMTP packets that constitute the bit stream. The MMTP header is composed of a mandatory header, containing information that is always encoded, and an extension header, containing information that is optionally encoded.

The control MMTP packet generation unit 10 adds a predetermined MMTP header to the control MMTP payload generated by the control MMTP payload generation unit 6 and generates the control MMTP packets that constitute the bit stream.
The MMTP header added when this MMTP packet is generated includes a packet ID assigned according to the type of encoded data included in the MMTP payload.

The MMTP packet multiplexing unit 7 multiplexes the audio MMTP packets generated by the audio MMTP packet generation unit 8, the control MMTP packets generated by the control MMTP packet generation unit 10, and the video MMTP packets generated by the video MMTP packet generation unit 9 to form a bit stream (step ST7).

Next, the processing of the decoding device is described.
The MMTP packet analysis unit 12 receives one or more bit streams, containing one or more assets, output from an encoding device (the encoding device of FIG. 1 or an equivalent encoding device), analyzes the MMTP header of each MMTP packet constituting the bit stream, and acquires the packet ID included in that MMTP header.
If the packet ID indicates that the encoded data included in the MMTP payload is control information (PA message, HEVC picture structure descriptor), the MMTP packet analysis unit 12 outputs the MMTP payload of that packet, that is, the control MMTP payload, to the control MMTP payload processing unit 13.

If, on the other hand, the packet ID indicates that the encoded data included in the MMTP payload is an audio signal or a video signal, the unit outputs the MMTP packet to the asset separation unit 14.
The MMTP packet analysis unit 12 also decodes the presentation time information (presentation_time_offset) and the decoding time information (decoding_time_offset) from the MMTP extension header. Whether it does so depends on the values of the flags (presentation_time_offset_present_flag, decoding_time_offset_present_flag) described in the MPU time information descriptor decoded by the control MMTP payload processing unit 13; these flags indicate whether information for calculating the decoding time and presentation time of each access unit constituting the MPU is encoded. The unit then calculates the presentation time and decoding time of each access unit (AU) from the presentation time of the first access unit (AU) in presentation order, described in the MPU timestamp descriptor, and the decoding time of the first access unit (AU) in encoding order, described in the MPU time information descriptor. The calculated presentation time and decoding time are output to the audio MMTP payload processing unit 15 and the video MMTP payload processing unit 19 according to the type of encoded data included in the access unit.
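The timestamp calculation described here can be sketched as below. Several points are assumptions rather than statements from the text: the function names, the use of seconds for the MPU presentation time, and in particular the sign convention, namely that the decoding time of the first access unit in encoding order is the MPU presentation time minus initial_presentation_time_delay (decoding is assumed to precede presentation). Offsets are taken relative to the MPU's first access unit, not relative to the access unit's own decoding time.

```python
from fractions import Fraction


def first_au_decoding_time(mpu_presentation_time, initial_presentation_time_delay,
                           timescale):
    """Decoding time (seconds) of the first AU in encoding order.

    Assumption: the delay is subtracted from the presentation time of the
    first AU in presentation order.
    """
    return mpu_presentation_time - Fraction(initial_presentation_time_delay, timescale)


def au_times(mpu_presentation_time, initial_presentation_time_delay, timescale,
             presentation_time_offset, decoding_time_offset):
    """Return (PTS, DTS) in seconds for one access unit."""
    first_dts = first_au_decoding_time(
        mpu_presentation_time, initial_presentation_time_delay, timescale)
    # Each offset is a tick count; one tick lasts 1/timescale seconds.
    pts = mpu_presentation_time + Fraction(presentation_time_offset, timescale)
    dts = first_dts + Fraction(decoding_time_offset, timescale)
    return pts, dts
```

With a timescale of 90000, an MPU presentation time of 10 s, a delay of 3000 ticks, and offsets of 6000 (presentation) and 3000 (decoding), the access unit is decoded at exactly 10 s and presented 1/15 s later.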

Once the control MMTP payload processing unit 13 has decoded the PA message, the asset separation unit 14 refers to the asset ID, asset type, and packet ID described in the table of that PA message and determines whether the MMTP payload included in the MMTP packet output from the MMTP packet analysis unit 12 is an audio MMTP payload or a video MMTP payload.

If the MMTP payload included in the MMTP packet output from the MMTP packet analysis unit 12 is an audio MMTP payload, the asset separation unit 14 extracts the audio MMTP payload from the MMTP packet and outputs it to the audio MMTP payload processing unit 15.
If the MMTP payload included in the MMTP packet output from the MMTP packet analysis unit 12 is a video MMTP payload, the asset separation unit 14 extracts the video MMTP payload from the MMTP packet and outputs it to the video MMTP payload processing unit 19.
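The packet-ID-based routing performed by the MMTP packet analysis unit and the asset separation unit can be illustrated with a small dispatch function. The `MmtpPacket` type, the `route` function, the string labels, and the shape of the lookup tables are all hypothetical; the text only states that the packet ID is matched against the packet IDs and asset types listed in the PA message.

```python
from typing import NamedTuple


class MmtpPacket(NamedTuple):
    packet_id: int   # taken from the MMTP header
    payload: bytes   # the MMTP payload


def route(packet, control_ids, asset_types):
    """Name the destination of a packet.

    control_ids: set of packet IDs that carry control information (assumed known).
    asset_types: dict mapping packet ID -> "audio" or "video", as it might be
                 built from the PA message table.
    """
    if packet.packet_id in control_ids:
        return "control"          # to the control MMTP payload processing unit
    asset_type = asset_types.get(packet.packet_id)
    if asset_type in ("audio", "video"):
        return asset_type         # to the audio or video MMTP payload processing unit
    return "discard"              # packet ID not listed in the PA message
```

Returning a label instead of calling a handler keeps the sketch free of side effects; a real demultiplexer would hand the payload to the corresponding processing unit.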

On receiving an audio MMTP payload from the asset separation unit 14, the audio MMTP payload processing unit 15 reconstructs the MFU or MPU of the audio stream from that payload, thereby generating an audio elementary stream (audio ES) in a format that the downstream audio stream decoding unit 17 can decode, and stores the audio ES in the audio ES buffer 16.
The process of generating an audio ES from an audio MMTP payload is itself a known technique, so a detailed description is omitted.
The audio MMTP payload processing unit 15 also extracts the metadata about the audio stream included in the audio MMTP payload output from the asset separation unit 14 and stores that metadata in the audio ES buffer 16.

The audio stream decoding unit 17 refers to the DTS decoded by the MMTP packet analysis unit 12 to determine the decoding time of each access unit (AU). When the decoding time of an access unit (AU) arrives, it takes the audio ES out of the audio ES buffer 16, decodes the audio signal of that access unit (AU), and stores the decoded audio signal and the PTS (presentation time) decoded by the MMTP packet analysis unit 12 in the audio data buffer 18.
An external playback device (not shown) can then take the audio signal and PTS (presentation time) out of the audio data buffer 18 and reproduce the audio signal at that presentation time.
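The DTS-driven hand-off from an ES buffer to a decoder can be sketched with a priority queue. The class `EsBuffer` and its methods `store` and `pop_due` are hypothetical names, and modeling the clock as a plain number is an assumption; the sketch only shows the idea of releasing access units once their decoding time has arrived while keeping the PTS attached.

```python
import heapq


class EsBuffer:
    """Toy elementary-stream buffer ordered by decoding time (DTS)."""

    def __init__(self):
        self._heap = []

    def store(self, dts, pts, data):
        # Keep the AU together with both timestamps; the heap orders by DTS first.
        heapq.heappush(self._heap, (dts, pts, data))

    def pop_due(self, now):
        """Remove and return every stored AU whose DTS is <= now, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap))
        return due
```

A decoder loop would call `pop_due` with the current clock value, decode each returned access unit, and store the result with its PTS in the output buffer.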

On receiving a video MMTP payload from the asset separation unit 14, the video MMTP payload processing unit 19 reconstructs the MFU or MPU of the video stream from that payload, thereby generating an HEVC elementary stream (HEVC ES) in a format that the downstream HEVC ES decoding unit 21 can decode, and stores the HEVC elementary stream in the HEVC ES buffer 20.
The process of generating an HEVC elementary stream from a video MMTP payload is itself a known technique, so a detailed description is omitted.
The video MMTP payload processing unit 19 also extracts the metadata about the video stream included in the video MMTP payload output from the asset separation unit 14 and stores that metadata in the HEVC ES buffer 20.

The HEVC ES decoding unit 21 refers to the DTS decoded by the MMTP packet analysis unit 12 to determine the decoding time of each access unit (AU). When the decoding time of an access unit (AU) arrives, it takes the HEVC elementary stream out of the HEVC ES buffer 20, decodes the video signal of that access unit (AU), and stores the decoded image (the decoded video signal) and the PTS (presentation time) decoded by the MMTP packet analysis unit 12 in the decoded image buffer 22.
An external playback device (not shown) can then take the decoded image and PTS (presentation time) out of the decoded image buffer 22 and reproduce the decoded image at that presentation time.
The following describes the processing performed when a time-hierarchically coded video bit stream composed of access units with TemporalID 0 to M is separated into a bit stream composed of access units with TemporalID 0 to (M-1) and a bit stream composed of access units with TemporalID M, and the two bit streams are transmitted over different transmission paths.
FIG. 12 shows an example of the bit stream before separation and the bit streams after separation.
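The separation of a time-hierarchically coded stream into two sub-streams can be illustrated as a simple partition on TemporalID. The function name `split_by_temporal_id` and the dictionary representation of an access unit are assumptions made for illustration; the sketch does not reproduce the contents of FIG. 12.

```python
def split_by_temporal_id(access_units, max_temporal_id):
    """Split AUs (in encoding order) into two sub-streams.

    Returns (base, enhancement):
      base        - AUs with TemporalID 0 .. max_temporal_id - 1
      enhancement - AUs with TemporalID == max_temporal_id
    Encoding order is preserved inside each sub-stream.
    """
    base = [au for au in access_units if au["temporal_id"] < max_temporal_id]
    enhancement = [au for au in access_units if au["temporal_id"] == max_temporal_id]
    return base, enhancement
```

Because each sub-stream keeps only its own relative order, the receiver needs additional timing information, such as per-access-unit decoding times, to interleave the two streams again.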

The MMTP packet analysis unit 12 outputs the control MMTP payload included in the MMTP packets constituting the input bit stream to the control MMTP payload processing unit 13, and outputs the audio MMTP payload or video MMTP payload included in the MMTP packets constituting that bit stream to the asset separation unit 14.

On receiving the control MMTP payload from the MMTP packet analysis unit 12, the control MMTP payload processing unit 13 decodes the encoded data included in that payload to obtain the PA message, which is the control information (step ST15).
From the asset information described in the PA message, the control MMTP payload processing unit 13 learns that the video bit stream is time-hierarchically coded, that it has been separated into two or more assets (for example, asset 1 and asset 2) according to the time hierarchy level (TemporalID) with each asset acquired over a different transmission path, and what the dependency between the assets is (asset 2 depends on asset 1).

On receiving an MMTP packet from the MMTP packet analysis unit 12, the asset separation unit 14 refers to the packet ID, based on the information about the video assets described in the PA message, and outputs the video MMTP payload included in that MMTP packet to the video MMTP payload processing unit 19. For example, when the video consists of asset 1 and asset 2, the video MMTP payload of each asset is output to the video MMTP payload processing unit 19.

On receiving the video MMTP payloads of two or more video assets from the asset separation unit 14, the video MMTP payload processing unit 19 generates an HEVC elementary stream from each video MMTP payload and stores it in the HEVC ES buffer 20, together with the decoding time and presentation time of the access units included in each video MMTP payload (step ST19).

Based on, for example, the dependency between asset 1 and asset 2, the HEVC ES decoding unit 21 compares the DTS of the access units of asset 1 with the DTS of the access units of asset 2 and can thereby identify the encoding order of the HEVC elementary stream before separation. It can then take the HEVC elementary stream, which was input as separate bit streams, out of the HEVC ES buffer 20 at the correct decoding times and decode the video signal of each access unit (AU).
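Recovering the original encoding order from two separately delivered assets amounts to a merge by decoding time. The sketch below assumes each asset arrives as a list of `(dts, name)` tuples already sorted by DTS; the function name `merge_in_decoding_order` and this representation are hypothetical.

```python
import heapq


def merge_in_decoding_order(asset1, asset2):
    """Merge two DTS-sorted AU lists into one list in decoding order.

    Each AU is a (dts, name) tuple. When two AUs share a DTS, the AU from
    asset1 (the asset that asset2 depends on) is emitted first.
    """
    return list(heapq.merge(asset1, asset2, key=lambda au: au[0]))
```

`heapq.merge` is stable across its inputs, so placing the base asset first gives it priority on equal timestamps, which is a reasonable choice when the second asset depends on the first.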

As is clear from the above, according to Embodiment 1, when the video signals of one or more access units (AUs) are temporally hierarchically coded and different assets are formed according to the hierarchy level of each access unit, an MPU time information descriptor is encoded for each MPU constituting each asset. The descriptor contains a flag indicating whether information for calculating the presentation time and decoding time (presentation time information and decoding time information) is encoded for each access unit of the encoded data contained in the MMTP payload. According to the value of this flag, presentation time information (presentation_time_offset) and decoding time information (decoding_time_offset) are encoded for each access unit constituting the MPU. As a result, even when a temporally hierarchically coded video bitstream is split into different assets according to the hierarchy level of each access unit and then transmitted, the decoding device can reconstruct and decode the temporally hierarchically coded bitstream based on the decoding timing of each access unit. An encoding device and a decoding device with this capability are thus obtained.
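The per-access-unit time signalling can be sketched as a round trip between encoder and decoder. This is a simplified model, not the descriptor syntax itself: the `MpuTimeInfo` type, its field names, and both functions are hypothetical. The sketch assumes that decoding_time_offset is the difference from the decoding time of the first access unit in coding order, as stated in the claim. Presentation time information is left out because its exact reference point is not restated in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MpuTimeInfo:
    """Hypothetical stand-in for the per-MPU time information descriptor."""
    mpu_decoding_time: int            # decoding time of the first AU in coding order
    au_time_info_flag: bool           # whether per-AU offsets are carried
    decoding_time_offsets: List[int] = field(default_factory=list)


def build_mpu_time_info(au_decoding_times, au_time_info_flag=True):
    """Encoder side: express each AU's decoding time as an offset from the first AU."""
    base = au_decoding_times[0]
    offsets = [t - base for t in au_decoding_times] if au_time_info_flag else []
    return MpuTimeInfo(base, au_time_info_flag, offsets)


def recover_decoding_times(info):
    """Decoder side: rebuild absolute decoding times from the base and the offsets."""
    if not info.au_time_info_flag:
        return [info.mpu_decoding_time]   # only the first AU's time is known
    return [info.mpu_decoding_time + off for off in info.decoding_time_offsets]


# Illustrative decoding times in clock ticks (assumed values).
times = [9000, 10500, 12000]
info = build_mpu_time_info(times)
recovered = recover_decoding_times(info)
```

With the flag cleared, the sketch carries only the MPU-level decoding time, so per-access-unit times cannot be recovered from the descriptor alone.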

Within the scope of the invention, the embodiments may be freely combined, and any component of an embodiment may be modified or omitted.

Description of reference signs: 1 audio encoding unit; 2 audio MMTP payload generation unit; 3 HEVC encoding unit (video encoding means); 4 video MMTP payload generation unit (video encoding means); 5 control information encoding unit (control information encoding means, time information encoding means); 6 control MMTP payload generation unit (control information encoding means); 7 MMTP packet multiplexing unit (multiplexing means); 8 audio MMTP packet generation unit; 9 video MMTP packet generation unit; 10 control MMTP packet generation unit; 12 MMTP packet analysis unit; 13 control MMTP payload processing unit (presentation time calculation means); 14 asset separation unit; 15 audio MMTP payload processing unit; 16 audio ES buffer; 17 audio stream decoding unit; 18 audio data buffer; 19 video MMTP payload processing unit (video decoding means); 20 HEVC ES buffer (video decoding means); 21 HEVC ES decoding unit (video decoding means); 22 decoded picture buffer (video decoding means).

Claims

An encoding device comprising:
video encoding means for encoding a video signal in units of access units and outputting encoded data of the video signal;
control information encoding means for, when the video signals of one or more access units are encoded by the video encoding means using an inter-frame predictive coding scheme, encoding control information for each GOP, a GOP being a set of a plurality of access units from which all of the video signals of the one or more access units can be decoded, the control information including presentation time information indicating the presentation time of the first access unit in presentation order and decoding time information indicating the decoding time of the first access unit in coding order, and outputting encoded data of the control information in GOP units;
time information encoding means for encoding, for each access unit of the encoded data of the video signal output from the video encoding means, the difference between the decoding time of that access unit and the decoding time of the first access unit in coding order, and outputting encoded data of the time information; and
multiplexing means for multiplexing the encoded data of the video signal output from the video encoding means, the encoded data of the time information output from the time information encoding means, and the encoded data of the control information output from the control information encoding means, and outputting a bitstream that is the multiplexed encoded data.