JP6257448B2 - Encoding device, decoding device, encoding method, and decoding method

Info

Publication number: JP6257448B2
Application number: JP2014112682A
Authority: JP
Inventors: 守屋　芳美; 彰峯澤; 亮史服部; 一之宮澤; 幸成松田; 関口　俊一
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2014-05-30
Filing date: 2014-05-30
Publication date: 2018-01-10
Anticipated expiration: 2034-05-30
Also published as: JP2015228553A

Description

The present invention relates to an encoding device and an encoding method that encode a video signal and an audio signal to generate a bitstream, and to a decoding device and a decoding method that decode encoded data multiplexed in the bitstream.

In Japanese digital broadcasting, as described in Non-Patent Document 1 below, a video stream and an audio stream, which are the encoded data of a video signal and an audio signal, are multiplexed and transmitted in the transport stream (TS) format, which is the system standard of MPEG-2 (Moving Picture Experts Group Phase-2). At this time, the encoding device also multiplexes the encoded data of metadata related to the video stream and the audio stream together with those streams and transmits it.

In addition to the transport stream (TS) of MPEG-2, MMT (MPEG Media Transport) is a new transport scheme being standardized by MPEG. When transmitting the one or more video components (video streams) and audio components (audio streams) that constitute one program, MMT allows each component to be transmitted over a different transmission form (for example, broadcast or communication).

HEVC/H.265 (hereinafter referred to as "HEVC") is a new video coding scheme standardized by MPEG and ITU (International Telecommunication Union).
HEVC introduces temporal hierarchical coding (coding that is scalable in the time direction). A hierarchy level can be specified for each NAL (Network Abstraction Layer) unit, which is the coding unit constituting an access unit (a unit containing the encoded data necessary to decode one picture).

FIG. 8 is an explanatory diagram showing an example of temporal hierarchical coding in HEVC.
In FIG. 8, TemporalID is identification information indicating the hierarchy level of each access unit (AU).
In the case of TemporalID = L_0, only layer 0 exists, so temporal hierarchical coding is not performed.
In the case of TemporalID = L_1, the maximum layer is layer 1, and temporal hierarchical coding is performed.
Similarly, in the cases of TemporalID = L_2 and L_3, the maximum layers are layers 2 and 3, respectively, and temporal hierarchical coding is performed.

The details of temporal hierarchical coding are well known and are therefore not described here. One constraint of temporal hierarchical coding is that the access unit (AU) being encoded cannot reference any access unit (AU) whose hierarchy level is higher than its own.
Note that HEVC allows temporal hierarchical coding with reference structures having a maximum layer of up to 6.
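The referencing constraint can be checked mechanically. The following is a minimal sketch, not part of the patent: the `AccessUnit` class, the picture names, and the reference lists are hypothetical illustrations. It also shows why the constraint matters: dropping all layers above a chosen level leaves a stream whose references are still intact.

```python
from dataclasses import dataclass, field


@dataclass
class AccessUnit:
    name: str                 # hypothetical label, e.g. "B2"
    temporal_id: int          # hierarchy level of this access unit
    refs: list = field(default_factory=list)  # names of referenced AUs


def constraint_violations(aus):
    """Return (au, ref) name pairs where an AU references a higher layer."""
    by_name = {au.name: au for au in aus}
    return [
        (au.name, ref)
        for au in aus
        for ref in au.refs
        if by_name[ref].temporal_id > au.temporal_id
    ]


def drop_upper_layers(aus, max_temporal_id):
    """Keep only the AUs at or below the given hierarchy level."""
    return [au for au in aus if au.temporal_id <= max_temporal_id]


# Hypothetical three-layer structure, listed in coding order.
stream = [
    AccessUnit("I0", 0),
    AccessUnit("P4", 0, ["I0"]),
    AccessUnit("B2", 1, ["I0", "P4"]),
    AccessUnit("B1", 2, ["I0", "B2"]),
    AccessUnit("B3", 2, ["B2", "P4"]),
]

print(constraint_violations(stream))
print([au.name for au in drop_upper_layers(stream, 1)])
```

Because no access unit in `stream` references a higher layer, the reduced stream returned by `drop_upper_layers` can still be decoded without the removed pictures.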

FIG. 9 is an explanatory diagram showing an example of a picture structure.
In FIG. 9, IRAP denotes an IRAP (Intra Random Access Point) picture defined in HEVC. When decoding starts partway through a bitstream, the pictures from the IRAP picture onward in display order are guaranteed to be decoded correctly.
A GOP (Group Of Pictures) is a set of a plurality of access units (AU) from which, when the video signals of one or more access units (AU) are encoded by inter-frame predictive coding, all of the video signals of those access units can be decoded. That is, it is the set consisting of an IRAP picture, which is the first access unit (AU) in coding order, and the access units (AU) that follow that IRAP picture (pictures other than IRAP pictures).
An SOP is a set consisting of a picture that has hierarchy level 0 and is first in coding order, and the pictures of hierarchy level 1 or higher that follow it.
The bitstream of one sequence consists of one or more GOPs, and one GOP consists of one or more SOPs. In the example of FIG. 9, the GOP consists of an L_3 SOP and an L_2 SOP.

FIG. 10 is an explanatory diagram showing the coding order and display order of each picture encoded with the picture structure of FIG. 9.
In the first SOP, each picture that follows the IRAP picture, which is the first access unit (AU) in coding order, is encoded using pictures of the immediately preceding SOP as reference images.
Therefore, to decode each picture following the IRAP picture correctly, the pictures of the preceding SOP must already have been decoded.
Consequently, when decoding starts from the IRAP picture of the first SOP, the pictures of the preceding SOP cannot be referenced (they have not been decoded), so the pictures following that IRAP picture cannot be decoded correctly.
In HEVC, a picture that follows an IRAP picture in coding order but precedes it in display order is called an LP (Leading Picture). When decoding starts from an IRAP picture, the LPs cannot be decoded correctly.
Note that because the IRAP picture is encoded by intra-frame coding, it can be decoded correctly even if the pictures of the preceding SOP have not been decoded.
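The definition of an LP can be illustrated with a short sketch. The picture names and display positions below are hypothetical and do not reproduce the numbering of FIG. 10; the function simply applies the stated rule (after the IRAP picture in coding order, before it in display order) to a single GOP.

```python
def leading_pictures(coding_order, irap_index=0):
    """Return the names of the leading pictures (LPs) of one GOP.

    coding_order: list of (name, display_position) tuples in coding order.
    irap_index:   position of the IRAP picture in that list.
    """
    _, irap_display = coding_order[irap_index]
    return [
        name
        for name, display in coding_order[irap_index + 1:]
        if display < irap_display
    ]


# Hypothetical GOP: the IRAP picture is coded first but displayed fourth.
gop = [
    ("IRAP", 3), ("B1", 1), ("B0", 0), ("B2", 2),
    ("P7", 7), ("B5", 5), ("B4", 4), ("B6", 6),
]

print(leading_pictures(gop))
```

In this example the three pictures displayed before the IRAP picture are the ones a decoder cannot reconstruct when it starts decoding at the IRAP picture.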

For example, in digital broadcasting, a plurality of bitstreams are distributed simultaneously, and the bitstream to be displayed is switched when the user changes channels.
Because IRAP pictures are inserted periodically into each of the bitstreams, the displayed video can be switched at a channel change by starting decoding from one of the IRAP pictures.

STD-B32 (a standard for digital broadcasting established by ARIB (Association of Radio Industries and Businesses))

Because the conventional encoding device is configured as described above, when temporal hierarchical coding is performed, the number of pictures constituting an SOP increases as the maximum layer increases. Consequently, when the decoding device starts decoding from an IRAP picture, the number of LPs that cannot be decoded correctly increases. When the bitstream to be displayed is switched by a channel change, it therefore takes a long time before pictures from the new bitstream are displayed, and the period during which no picture is displayed becomes long.
That is, when the bitstream to be displayed is switched by a channel change, no picture is displayed from the display time of the first LP (picture B25 in the example of FIG. 10) until the display time of the IRAP picture. The more pictures constitute the SOP, the longer this period becomes.

The present invention has been made to solve the above problem. Its object is to obtain an encoding device, a decoding device, an encoding method, and a decoding method that eliminate the period during which no picture is displayed when the user switches channels, thereby achieving seamless channel switching.

The encoding device according to the present invention includes video encoding means that encodes a video signal in units of video access units and outputs encoded data of the video signal, and control information encoding means that encodes control information for each GOP and outputs encoded data of the control information in GOP units. Here, a GOP is a set of a plurality of access units from which, when the video signals of one or more access units are encoded by inter-frame predictive coding by the video encoding means, all of the video signals of those access units can be decoded. The control information includes presentation time information indicating the presentation time of the first access unit in presentation order, and number information indicating the number of access units that precede, in presentation order, the first access unit in coding order. Multiplexing means multiplexes the encoded data of the video signal output from the video encoding means with the encoded data of the control information output from the control information encoding means, and outputs a bitstream that is the multiplexed encoded data.

According to the present invention, control information encoding means is provided that encodes control information for each GOP and outputs encoded data of the control information in GOP units, where a GOP is a set of a plurality of access units from which, when the video signals of one or more access units are encoded by inter-frame predictive coding by the video encoding means, all of the video signals of those access units can be decoded. The control information includes presentation time information indicating the presentation time of the first access unit in presentation order, and number information indicating the number of access units that precede, in presentation order, the first access unit in coding order. Because the multiplexing means includes the encoded data of the control information output from the control information encoding means in the bitstream, the period during which no picture is displayed when the user switches channels is eliminated, and seamless channel switching can be achieved.

FIG. 1 is a block diagram showing an encoding device according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart showing the processing (encoding method) of the encoding device according to Embodiment 1 of the present invention.
FIG. 3 is a block diagram showing a decoding device according to Embodiment 1 of the present invention.
FIG. 4 is a flowchart showing the processing (decoding method) of the decoding device according to Embodiment 1 of the present invention.
FIG. 5 is an explanatory diagram showing an outline of the encoded data when a bitstream is transmitted by MMT.
FIG. 6 is an explanatory diagram showing a configuration example of an MPU.
FIG. 7 is an explanatory diagram showing the HEVC picture structure descriptor.
FIG. 8 is an explanatory diagram showing an example of temporal hierarchical coding in HEVC.
FIG. 9 is an explanatory diagram showing an example of a picture structure.
FIG. 10 is an explanatory diagram showing the coding order and presentation order of each picture encoded with the picture structure of FIG. 9.

Embodiment 1.
FIG. 1 is a block diagram showing an encoding device according to Embodiment 1 of the present invention.
In FIG. 1, when given a digital audio signal, the audio encoding unit 1 encodes the audio signal in units of audio access units (AU) using a scheme such as MPEG-4 Audio, generates an audio stream that is the encoded data of the audio signal, and encodes metadata related to the audio stream.
The audio MMTP payload generation unit 2 generates an audio MMTP payload consisting of the metadata encoded by the audio encoding unit 1 and the encoded data of the audio signal in units of access units (AU).

When given a digital video signal, the HEVC encoding unit 3 encodes the video signal in units of video access units (AU) using the HEVC scheme, generates a video stream that is the encoded data of the video signal, and encodes metadata related to the video stream.
The video MMTP payload generation unit 4 generates a video MMTP payload consisting of the metadata encoded by the HEVC encoding unit 3 and the encoded data of the video signal in units of access units (AU). The HEVC encoding unit 3 and the video MMTP payload generation unit 4 constitute the video encoding means.
As metadata related to the audio stream and the video stream, it is possible to describe, for example, time information indicating the DTS (decoding time) and PTS (presentation time) of each access unit (AU).

The control information encoding unit 5 encodes control information called a PA message, defined in MMT, as control information related to the audio stream generated by the audio encoding unit 1 and the video stream generated by the HEVC encoding unit 3.
The PA message describes information on the one or more video components (video streams) and audio components (audio streams) constituting one program (called a package in MMT). In MMT, video components and audio components are called assets.
Specifically, the PA message contains the following items for each of the assets constituting the package: an asset ID identifying the asset; an asset type identifying the kind of asset (for example, a video stream in HEVC format or an audio stream in MPEG-4 Audio format); an MPU timestamp descriptor (presentation time information) indicating the presentation time (display time) of the access unit (AU) that is first in presentation order (display order) among the access units (AU) constituting the MPU (Media Processing Unit) of the asset; an HEVC picture structure descriptor; and a packet ID indicating the MMTP packets that store the encoded data and metadata of the asset.
An MPU consists of one or more access units (AU) and is a unit whose video or audio can be decoded from the MPU alone. When the video signals of one or more access units (AU) are encoded by inter-frame predictive coding, the MPU is the same unit as the GOP, that is, the set of a plurality of access units (AU) from which all of the video signals of those access units (AU) can be decoded.

The HEVC picture structure descriptor describes, among other items: number information (num_of_leading_picture) indicating how many of the access units (AU) constituting the MPU precede, in presentation order, the first access unit (AU) in coding order (that is, the number of LPs); picture type information (rap_type) indicating the coding type of the NAL units constituting the first access unit (AU) in coding order; and picture type information (nal_unit_type_of_leading_picture) indicating the coding type of the NAL units constituting the LPs.
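As an illustration of how these three fields might be carried, the sketch below models the descriptor as a small record. The field names come from the text; the binary layout (one unsigned byte per field) and the example values are assumptions made only for this sketch and are not the syntax defined in FIG. 7.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class HevcPictureStructureDescriptor:
    num_of_leading_picture: int            # number of LPs in the MPU
    rap_type: int                          # type of the first AU in coding order
    nal_unit_type_of_leading_picture: int  # type of the NAL units of the LPs

    def pack(self) -> bytes:
        # Assumed layout for illustration only: one unsigned byte per field.
        return struct.pack(
            "BBB",
            self.num_of_leading_picture,
            self.rap_type,
            self.nal_unit_type_of_leading_picture,
        )

    @classmethod
    def unpack(cls, data: bytes) -> "HevcPictureStructureDescriptor":
        return cls(*struct.unpack("BBB", data[:3]))


# Placeholder values; the actual code points are not given in this section.
descriptor = HevcPictureStructureDescriptor(7, 21, 8)
encoded = descriptor.pack()
print(HevcPictureStructureDescriptor.unpack(encoded))
```

Keeping the record immutable (`frozen=True`) reflects that a decoder only reads these values after parsing the PA message.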

The control MMTP payload generation unit 6 generates a control MMTP payload consisting of the encoded data of the control information encoded by the control information encoding unit 5.
The control information encoding unit 5 and the control MMTP payload generation unit 6 constitute the control information encoding means.

The MMTP packet generation unit 7 multiplexes the audio MMTP payload generated by the audio MMTP payload generation unit 2, the video MMTP payload generated by the video MMTP payload generation unit 4, and the control MMTP payload generated by the control MMTP payload generation unit 6, and generates the MMTP packets that constitute the bitstream. When generating an MMTP packet, the unit adds a predetermined MMTP header, which contains a packet ID assigned according to the type of encoded data contained in the MMTP payload. The MMTP packet generation unit 7 constitutes the multiplexing means.
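The idea of tagging each payload with a packet ID and concatenating the packets into one bitstream can be sketched as follows. The header used here (a 16-bit packet ID followed by a 16-bit payload length) and the ID values are simplifications assumed for illustration; the real MMTP header contains further fields.

```python
import struct

# Hypothetical packet IDs for this sketch.
CONTROL_ID, VIDEO_ID, AUDIO_ID = 0x0000, 0x0100, 0x0101


def make_packet(packet_id: int, payload: bytes) -> bytes:
    """Prefix a payload with a simplified header (packet ID, length)."""
    return struct.pack(">HH", packet_id, len(payload)) + payload


def multiplex(payloads) -> bytes:
    """payloads: iterable of (packet_id, payload) in transmission order."""
    return b"".join(make_packet(pid, data) for pid, data in payloads)


def parse_packets(stream: bytes):
    """Split a multiplexed stream back into (packet_id, payload) pairs."""
    packets, offset = [], 0
    while offset < len(stream):
        packet_id, length = struct.unpack_from(">HH", stream, offset)
        offset += 4
        packets.append((packet_id, stream[offset:offset + length]))
        offset += length
    return packets


bitstream = multiplex([
    (CONTROL_ID, b"PA message"),
    (VIDEO_ID, b"video AU"),
    (AUDIO_ID, b"audio AU"),
])
print(parse_packets(bitstream))
```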

The example of FIG. 1 assumes that each component of the encoding device, namely the audio encoding unit 1, the audio MMTP payload generation unit 2, the HEVC encoding unit 3, the video MMTP payload generation unit 4, the control information encoding unit 5, the control MMTP payload generation unit 6, and the MMTP packet generation unit 7, is implemented by dedicated hardware (for example, a semiconductor integrated circuit on which a CPU is mounted, or a one-chip microcomputer). However, the encoding device may instead be implemented by a computer.
When the encoding device is implemented by a computer, a program describing the processing of the audio encoding unit 1, the audio MMTP payload generation unit 2, the HEVC encoding unit 3, the video MMTP payload generation unit 4, the control information encoding unit 5, the control MMTP payload generation unit 6, and the MMTP packet generation unit 7 is stored in the memory of the computer, and the CPU of the computer executes the program stored in the memory.
FIG. 2 is a flowchart showing the processing (encoding method) of the encoding device according to Embodiment 1 of the present invention.

FIG. 3 is a block diagram showing a decoding device according to Embodiment 1 of the present invention.
In FIG. 3, the stream selection unit 11 selects the bitstream to be presented from among the bitstreams (bitstreams consisting of MMTP packets) output from a plurality of encoding devices (the encoding device of FIG. 1 or encoding devices equivalent to it), and outputs that bitstream to the MMTP packet analysis unit 12.
When given a command to switch the bitstream to be presented, the stream selection unit 11 selects the post-switch bitstream from among the bitstreams output from the plurality of encoding devices and outputs it to the MMTP packet analysis unit 12. In addition, until the presentation time calculated by the control MMTP payload processing unit 13 is reached, it continues to output the pre-switch bitstream to the MMTP packet analysis unit 12 as well. The stream selection unit 11 constitutes the bitstream selection means.
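The switching behavior of the stream selection unit can be summarized as a function of time. This is a minimal sketch: the function name, the string labels, and the use of plain numbers for times are assumptions for illustration, and `handover_time` stands for the presentation time calculated by the control MMTP payload processing unit 13.

```python
def streams_to_forward(now, switch_time, handover_time, old, new):
    """Return the bitstreams forwarded to the packet analysis stage at `now`.

    switch_time:   when the switch command was given
    handover_time: calculated presentation time of the new stream's IRAP picture
    """
    if now < switch_time:
        return [old]           # before the command: only the current channel
    if now < handover_time:
        return [new, old]      # overlap: keep the old channel on screen
    return [new]               # from the handover time: only the new channel


for t in (0.0, 1.0, 1.05, 1.2):
    print(t, streams_to_forward(t, 1.0, 1.1, "channel A", "channel B"))
```

During the overlap interval both streams are decoded, so the old channel can remain on screen until the first correctly decodable picture of the new channel is due.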

The MMTP packet analysis unit 12 analyzes the MMTP header of each MMTP packet constituting the bitstream output from the stream selection unit 11 and obtains the packet ID contained in the MMTP header. If the packet ID indicates that the encoded data contained in the MMTP payload is control information (the PA message and the HEVC picture structure descriptor), the unit outputs the control MMTP payload, which is the MMTP payload contained in that MMTP packet, to the control MMTP payload processing unit 13. If the packet ID indicates that the encoded data contained in the MMTP payload is an audio signal or a video signal, the unit outputs the MMTP packet to the asset separation unit 14.

The control MMTP payload processing unit 13 decodes the encoded data contained in the control MMTP payload output from the MMTP packet analysis unit 12, thereby decoding the PA message, which is the control information, and the HEVC picture structure descriptor contained in the PA message.
The control MMTP payload processing unit 13 also calculates the presentation time of the first access unit (AU) in coding order from two values: the presentation time of the first access unit (AU) in presentation order, indicated by the MPU timestamp descriptor described in the PA message, and the number of access units (AU) that precede, in presentation order, the first access unit (AU) in coding order (the number of LPs), indicated by the number information (num_of_leading_picture) described in the HEVC picture structure descriptor. The first access unit (AU) in coding order is the IRAP picture of the first SOP. The control MMTP payload processing unit 13 constitutes the presentation time calculation means.
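One plausible way to perform this calculation is sketched below. It assumes a constant frame rate and that the LPs occupy consecutive presentation slots starting at the MPU timestamp, so the IRAP picture is presented `num_of_leading_picture` frame periods later. The passage above names only the two inputs, so the formula and the `frame_rate` parameter are assumptions of this sketch.

```python
from fractions import Fraction


def irap_presentation_time(mpu_timestamp, num_of_leading_picture, frame_rate):
    """Presentation time (in seconds) of the first AU in coding order.

    mpu_timestamp:          presentation time of the first AU in presentation order
    num_of_leading_picture: number of LPs presented before the IRAP picture
    frame_rate:             frames per second (assumed constant)
    """
    frame_period = Fraction(1) / Fraction(frame_rate)
    return Fraction(mpu_timestamp) + num_of_leading_picture * frame_period


# Example: MPU timestamp of 10 s, 7 LPs, 60 frames per second.
print(irap_presentation_time(10, 7, 60))
```

Exact rational arithmetic (`Fraction`) avoids rounding drift for frame rates such as 60000/1001.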

The asset separation unit 14 refers to the asset ID, asset type, and packet ID described in the PA message decoded by the control MMTP payload processing unit 13, and determines whether the MMTP payload contained in the MMTP packet output from the MMTP packet analysis unit 12 is an audio MMTP payload or a video MMTP payload. If it is an audio MMTP payload, the unit extracts the audio MMTP payload from the MMTP packet and outputs it to the audio MMTP payload processing unit 15. If it is a video MMTP payload, the unit extracts the video MMTP payload from the MMTP packet and outputs it to the video MMTP payload processing unit 19.
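The routing performed by the MMTP packet analysis unit 12 and the asset separation unit 14 can be sketched as a single dispatch on the packet ID. The packet ID values, the `asset_table` mapping (standing in for the asset information carried in the PA message), and the tuple representation of a packet are assumptions made for this illustration.

```python
CONTROL_PACKET_ID = 0x0000  # assumed ID of packets carrying control information


def demultiplex(packets, asset_table, control_packet_id=CONTROL_PACKET_ID):
    """Sort payloads into control, audio, and video queues by packet ID.

    packets:     iterable of (packet_id, payload)
    asset_table: packet_id -> "audio" or "video", as learned from the PA message
    """
    queues = {"control": [], "audio": [], "video": []}
    for packet_id, payload in packets:
        if packet_id == control_packet_id:
            queues["control"].append(payload)
        elif packet_id in asset_table:
            queues[asset_table[packet_id]].append(payload)
        # Packets with unknown IDs are ignored in this sketch.
    return queues


asset_table = {0x0100: "video", 0x0101: "audio"}
packets = [
    (0x0000, b"PA message"),
    (0x0100, b"video AU 0"),
    (0x0101, b"audio AU 0"),
    (0x0100, b"video AU 1"),
]
print(demultiplex(packets, asset_table))
```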

The audio MMTP payload processing unit 15 reconstructs the MFU (Media Fragment Unit) or MPU of the audio stream from the audio MMTP payload output from the asset separation unit 14, thereby generating an audio elementary stream (audio ES) in a format that the downstream audio stream decoding unit 17 can decode, and stores the audio ES in the audio ES buffer 16. An MFU is a unit smaller than an MPU; one access unit (AU) or one NAL unit can be defined as one MFU.
The audio MMTP payload processing unit 15 also extracts the metadata related to the audio stream contained in the audio MMTP payload output from the asset separation unit 14, and stores the metadata in the audio ES buffer 16.
The audio ES buffer 16 is a memory that temporarily stores the audio ES and the metadata.

The audio stream decoding unit 17 retrieves the metadata from the audio ES buffer 16 and decodes the time information described in the metadata (information indicating the DTS (decoding time) and PTS (presentation time) of each access unit (AU)).
When the DTS (decoding time) of an access unit (AU) is reached, the audio stream decoding unit 17 retrieves the audio ES from the audio ES buffer 16, decodes the audio signal of that access unit (AU), and stores the decoded audio signal and its PTS (presentation time) in the audio data buffer 18.
The audio data buffer 18 is a memory that temporarily stores the audio signal decoded by the audio stream decoding unit 17 and its PTS (presentation time).

The video MMTP payload processing unit 19 reconstructs the MFU or MPU of the video stream from the video MMTP payload output from the asset separation unit 14, thereby generating an HEVC elementary stream (HEVC ES) in a format that the downstream HEVC ES decoding unit 21 can decode, and stores the HEVC elementary stream in the HEVC ES buffer 20.
The video MMTP payload processing unit 19 also extracts the metadata related to the video stream contained in the video MMTP payload output from the asset separation unit 14, and stores the metadata in the HEVC ES buffer 20.
The HEVC ES buffer 20 is a memory that temporarily stores the HEVC elementary stream and the metadata.

The HEVC ES decoding unit 21 retrieves the metadata from the HEVC ES buffer 20 and decodes the time information described in the metadata (information indicating the DTS (decoding time) and PTS (presentation time) of each access unit (AU)).
When the DTS (decoding time) of an access unit (AU) is reached, the HEVC ES decoding unit 21 retrieves the HEVC elementary stream from the HEVC ES buffer 20, decodes the video signal of that access unit (AU), and stores the decoded image, which is the decoded video signal, and its PTS (presentation time) in the decoded image buffer 22.
The decoded image buffer 22 is a memory that temporarily stores the decoded image and PTS (presentation time) of each access unit (AU) decoded by the HEVC ES decoding unit 21.
The video MMTP payload processing unit 19, the HEVC ES buffer 20, the HEVC ES decoding unit 21, and the decoded image buffer 22 constitute the video decoding means.

図３の例では、復号装置の構成要素であるストリーム選択部１１、ＭＭＴＰパケット解析部１２、制御ＭＭＴＰペイロード処理部１３、アセット分離部１４、音声ＭＭＴＰペイロード処理部１５、音声ＥＳバッファ１６、音声ストリーム復号部１７、音声データバッファ１８、映像ＭＭＴＰペイロード処理部１９、ＨＥＶＣＥＳバッファ２０、ＨＥＶＣＥＳ復号部２１及び復号画像バッファ２２のそれぞれが専用のハードウェア（バッファ以外は、例えば、ＣＰＵを実装している半導体集積回路、あるいは、ワンチップマイコンなど）で構成されているものを想定しているが、復号装置がコンピュータで構成されていてもよい。
復号装置をコンピュータで構成する場合、音声ＥＳバッファ１６、音声データバッファ１８、ＨＥＶＣＥＳバッファ２０及び復号画像バッファ２２をコンピュータの内部メモリ又は外部メモリ上に構成するとともに、ストリーム選択部１１、ＭＭＴＰパケット解析部１２、制御ＭＭＴＰペイロード処理部１３、アセット分離部１４、音声ＭＭＴＰペイロード処理部１５、音声ストリーム復号部１７、映像ＭＭＴＰペイロード処理部１９及びＨＥＶＣＥＳ復号部２１の処理内容を記述しているプログラムをコンピュータのメモリに格納し、当該コンピュータのＣＰＵが当該メモリに格納されているプログラムを実行するようにすればよい。
図４はこの発明の実施の形態１による復号装置の処理内容（復号方法）を示すフローチャートである。 In the example of FIG. 3, it is assumed that each constituent element of the decoding device (the stream selection unit 11, the MMTP packet analysis unit 12, the control MMTP payload processing unit 13, the asset separation unit 14, the audio MMTP payload processing unit 15, the audio ES buffer 16, the audio stream decoding unit 17, the audio data buffer 18, the video MMTP payload processing unit 19, the HEVC ES buffer 20, the HEVC ES decoding unit 21, and the decoded image buffer 22) is implemented by dedicated hardware (for elements other than the buffers, for example, a semiconductor integrated circuit on which a CPU is mounted, or a one-chip microcomputer). However, the decoding device may instead be implemented by a computer.
When the decoding device is implemented by a computer, the audio ES buffer 16, the audio data buffer 18, the HEVC ES buffer 20, and the decoded image buffer 22 are configured in the internal or external memory of the computer; a program describing the processing of the stream selection unit 11, the MMTP packet analysis unit 12, the control MMTP payload processing unit 13, the asset separation unit 14, the audio MMTP payload processing unit 15, the audio stream decoding unit 17, the video MMTP payload processing unit 19, and the HEVC ES decoding unit 21 is stored in the memory of the computer; and the CPU of the computer executes the program stored in that memory.
FIG. 4 is a flowchart showing the processing contents (decoding method) of the decoding apparatus according to Embodiment 1 of the present invention.

次に動作について説明する。
最初の符号化装置の処理内容を説明する。
音声符号化部１は、ディジタルの音声信号が与えられると、音声のアクセスユニット（ＡＵ）単位に、例えば、ＭＰＥＧ−４オーディオなどの方式によって当該音声信号を符号化して、その音声信号の符号化データである音声ストリームを生成するとともに、その音声ストリームに関するメタデータを符号化する（図２のステップＳＴ１）。
ＨＥＶＣ符号化部３は、ディジタルの映像信号が与えられると、映像のアクセスユニット（ＡＵ）単位に、ＨＥＶＣ方式によって当該映像信号を符号化して、その映像信号の符号化データである映像ストリームを生成するとともに、その映像ストリームに関するメタデータを符号化する（ステップＳＴ２）。 Next, the operation will be described.
First, the processing performed by the encoding device is described.
When a digital audio signal is supplied, the audio encoding unit 1 encodes the audio signal in units of audio access units (AU), for example by a scheme such as MPEG-4 Audio, generates an audio stream that is the encoded data of the audio signal, and encodes the metadata relating to that audio stream (step ST1 in FIG. 2).
When a digital video signal is given, the HEVC encoding unit 3 encodes the video signal by the HEVC method for each video access unit (AU), and generates a video stream that is encoded data of the video signal. At the same time, the metadata about the video stream is encoded (step ST2).

ここで、図５はＭＭＴでビットストリームを伝送する場合の符号化データの概要を示す説明図である。
図５において、アクセスユニット（ＡＵ）は、映像であれば、１ピクチャを復号するために必要な符号化データを含む単位であり、音声であれば、符号化単位となる１以上のサンプルから構成されるフレームである。
ＮＡＬユニットはＨＥＶＣの符号化単位であり、１アクセスユニット（ＡＵ）は、１以上のＮＡＬユニットから構成される。
ＭＰＵは、１以上のアクセスユニットから構成され、ＭＰＵ単体で映像や音声の復号処理を行うことができる単位となる。また、ＭＰＵは、１以上のアクセスユニット（ＡＵ）の映像信号がフレーム間予測符号化方式で符号化される場合には、前記１以上のアクセスユニット（ＡＵ）の映像信号の全てを復号することが可能な複数のアクセスユニット（ＡＵ）の集合であるＧＯＰと同じ単位になる。
ＭＦＵは、ＭＰＵよりも小さな単位であり、１アクセスユニット（ＡＵ）又は１ＮＡＬユニットを１ＭＦＵと定義することができる。 Here, FIG. 5 is an explanatory diagram showing an outline of encoded data when a bit stream is transmitted by MMT.
In FIG. 5, an access unit (AU) is, for video, a unit containing the encoded data needed to decode one picture and, for audio, a frame consisting of one or more samples that serves as the coding unit.
A NAL unit is the coding unit of HEVC, and one access unit (AU) consists of one or more NAL units.
An MPU consists of one or more access units and is a unit for which video or audio can be decoded from the MPU alone. When the video signals of the one or more access units (AU) are encoded by inter-frame predictive coding, the MPU is the same unit as a GOP, that is, a set of access units (AU) from which all of the video signals of those access units can be decoded.
The MFU is a smaller unit than the MPU, and one access unit (AU) or one NAL unit can be defined as one MFU.
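The containment relationship among NAL units, access units, MPUs, and MFUs described above can be sketched as follows. This is only an illustrative model: the class names `NalUnit`, `AccessUnit`, and `Mpu`, and the helper `split_into_mfus`, are hypothetical and do not appear in the specification, and the payload bytes stand in for real encoded data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NalUnit:
    payload: bytes  # stand-in for the encoded bytes of one NAL unit


@dataclass
class AccessUnit:
    nal_units: List[NalUnit]  # one AU consists of one or more NAL units


@dataclass
class Mpu:
    access_units: List[AccessUnit]  # one MPU consists of one or more AUs


def split_into_mfus(mpu: Mpu, per_nal: bool = False) -> List[bytes]:
    """Return MFU payloads: one per AU, or one per NAL unit if per_nal is True."""
    mfus: List[bytes] = []
    for au in mpu.access_units:
        if per_nal:
            # One NAL unit is treated as one MFU.
            mfus.extend(nal.payload for nal in au.nal_units)
        else:
            # One access unit is treated as one MFU.
            mfus.append(b"".join(nal.payload for nal in au.nal_units))
    return mfus
```

Either granularity yields units smaller than the MPU, which matches the two MFU definitions permitted by the text.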

図６はＭＰＵの構成例を示す説明図である。
図６において、ＭＰＵメタデータは、ＭＰＵに関連するメタデータが記述されるものであり、ＭＰＵに含まれる各アクセスユニット（ＡＵ）のＤＴＳ（復号時刻）やＰＴＳ（提示時刻）を示す時刻情報などを記述することができる。
ムービーフラグメントメタデータ（ＭＦメタ）は、１アクセスユニット（ＡＵ）の符号化データ（サンプルデータ）に付随するメタデータが記述されるものである。例えば、アクセスユニット（ＡＵ）の符号化データがファイル形式で格納される場合、アクセスユニット（ＡＵ）毎に、符号化データが格納されているアドレスや符号化データのデータ長、当該アクセスユニット（ＡＵ）の提示時刻に関する情報が含まれる。
ＭＰＵメタデータ、ムービーフラグメントメタデータ、ＭＦＵ及びＭＭＴの制御情報は、ＭＭＴＰパケット化されて伝送される。ＭＭＴＰパケットはＭＭＴＰヘッダとＭＭＴＰペイロードから構成される。 FIG. 6 is an explanatory diagram showing a configuration example of the MPU.
In FIG. 6, the MPU metadata describes metadata related to the MPU; for example, it can describe time information indicating the DTS (decoding time) and PTS (presentation time) of each access unit (AU) included in the MPU.
Movie fragment metadata (MF meta) describes metadata accompanying the encoded data (sample data) of one access unit (AU). For example, when the encoded data of the access units (AU) is stored in a file format, it contains, for each access unit (AU), the address at which the encoded data is stored, the data length of the encoded data, and information on the presentation time of that access unit (AU).
MPU metadata, movie fragment metadata, MFU and MMT control information are transmitted as MMTP packets. The MMTP packet is composed of an MMTP header and an MMTP payload.

音声ＭＭＴＰペイロード生成部２は、音声符号化部１からメタデータ（ＭＰＵメタデータ、ＭＦメタなど）の符号化データと、アクセスユニット（ＡＵ）単位の音声信号の符号化データとを受けると、ＭＰＵ単位のＭＰＵメタデータの符号化データと、アクセスユニット（ＡＵ）単位のＭＦメタの符号化データ及び音声信号の符号化データ（サンプルデータ）からなる音声ＭＭＴＰペイロードを生成する（ステップＳＴ３）。
映像ＭＭＴＰペイロード生成部４は、ＨＥＶＣ符号化部３からメタデータ（ＭＰＵメタデータ、ＭＦメタなど）の符号化データと、アクセスユニット（ＡＵ）単位の映像信号の符号化データとを受けると、ＭＰＵ単位のＭＰＵメタデータの符号化データと、アクセスユニット（ＡＵ）単位のＭＦメタの符号化データ及び映像信号の符号化データ（サンプルデータ）からなる映像ＭＭＴＰペイロードを生成する（ステップＳＴ４）。 When the audio MMTP payload generation unit 2 receives, from the audio encoding unit 1, the encoded data of the metadata (MPU metadata, MF meta, etc.) and the encoded data of the audio signal in units of access units (AU), it generates an audio MMTP payload consisting of the encoded MPU metadata in units of MPUs and the encoded MF meta and encoded audio signal (sample data) in units of access units (AU) (step ST3).
When the video MMTP payload generation unit 4 receives, from the HEVC encoding unit 3, the encoded data of the metadata (MPU metadata, MF meta, etc.) and the encoded data of the video signal in units of access units (AU), it generates a video MMTP payload consisting of the encoded MPU metadata in units of MPUs and the encoded MF meta and encoded video signal (sample data) in units of access units (AU) (step ST4).

制御情報符号化部５は、音声符号化部１により生成された音声ストリーム及びＨＥＶＣ符号化部３により生成された映像ストリームに関する制御情報を符号化する（ステップＳＴ５）。
音声ストリーム及び映像ストリームに関する制御情報として、例えば、ＭＭＴで規定されているＰＡメッセージやＨＥＶＣピクチャ構造記述子などを符号化する。
ＰＡメッセージには、上述したように、１つのプログラム（ＭＭＴでは、パッケージと称する）を構成する１以上の映像コンポーネント（映像ストリーム）や音声コンポーネント（音声ストリーム）に関する情報が記述されている。
即ち、ＰＡメッセージには、音声符号化部１及びＨＥＶＣ符号化部３により生成されたアセット（映像ストリーム、音声ストリーム）を識別するアセットＩＤ、アセットの種類を識別するアセットタイプ、各アセットのＭＰＵを構成しているアクセスユニット（ＡＵ）の中で、提示順で先頭のアクセスユニット（ＡＵ）の提示時刻を示すＭＰＵタイムスタンプ記述子、各アセットの符号化データやメタデータを格納しているＭＭＴＰパケットを示すパケットＩＤなどが記述されている。 The control information encoding unit 5 encodes the control information regarding the audio stream generated by the audio encoding unit 1 and the video stream generated by the HEVC encoding unit 3 (step ST5).
As control information related to an audio stream and a video stream, for example, a PA message or HEVC picture structure descriptor defined by MMT is encoded.
As described above, in the PA message, information on one or more video components (video streams) and audio components (audio streams) constituting one program (referred to as a package in MMT) is described.
That is, the PA message describes, among other things: an asset ID identifying each asset (video stream or audio stream) generated by the audio encoding unit 1 and the HEVC encoding unit 3; an asset type identifying the kind of asset; an MPU timestamp descriptor indicating the presentation time of the access unit (AU) that is first in presentation order among the access units (AU) constituting the MPU of each asset; and a packet ID indicating the MMTP packets that carry the encoded data and metadata of each asset.

図７はＨＥＶＣピクチャ構造記述子を示す説明図である。
ＨＥＶＣピクチャ構造記述子には、図７に示すように、ＭＰＵを構成しているアクセスユニット（ＡＵ）の中で、符号化順で先頭のアクセスユニット（ＡＵ）より提示順が早いアクセスユニット（ＡＵ）の個数（ＬＰの枚数）を示す個数情報（ｎｕｍ＿ｏｆ＿ｌｅａｄｉｎｇ＿ｐｉｃｔｕｒｅ）が記述されている。
また、符号化順で先頭のアクセスユニット（ＡＵ）を構成しているＮＡＬユニットの符号化方式を示すピクチャタイプ情報（ｒａｐ＿ｔｙｐｅ）や、ＬＰを構成しているＮＡＬユニットの符号化方式を示すピクチャタイプ情報（ｎａｌ＿ｕｎｉｔ＿ｔｙｐｅ＿ｏｆ＿ｌｅａｄｉｎｇ＿ｐｉｃｔｕｒｅ）などが記述されている。 FIG. 7 is an explanatory diagram showing the HEVC picture structure descriptor.
As shown in FIG. 7, the HEVC picture structure descriptor describes count information (num_of_leading_picture) indicating the number of access units (AU), among those constituting the MPU, that precede in presentation order the access unit (AU) that is first in coding order (that is, the number of LPs).
It also describes, among other things, picture type information (rap_type) indicating the coding type of the NAL units constituting the access unit (AU) that is first in coding order, and picture type information (nal_unit_type_of_leading_picture) indicating the coding type of the NAL units constituting the LPs.

制御ＭＭＴＰペイロード生成部６は、制御情報符号化部５から制御情報の符号化データを受けると、その制御情報の符号化データからなる制御ＭＭＴＰペイロードを生成する（ステップＳＴ６）。
ＭＭＴＰパケット生成部７は、音声ＭＭＴＰペイロード生成部２により生成された音声ＭＭＴＰペイロードと、映像ＭＭＴＰペイロード生成部４により生成された映像ＭＭＴＰペイロードと、制御ＭＭＴＰペイロード生成部６により生成された制御ＭＭＴＰペイロードとを多重化して、ビットストリームを構成するＭＭＴＰパケットを生成する（ステップＳＴ７）。
このＭＭＴＰパケットを生成する際、所定のＭＭＴＰヘッダを付与するが、このＭＭＴＰヘッダには、ＭＭＴＰペイロードに含まれている符号化データの種別に応じて割り当てられるパケットＩＤが含まれる。 When receiving the encoded data of the control information from the control information encoding unit 5, the control MMTP payload generating unit 6 generates a control MMTP payload including the encoded data of the control information (step ST6).
The MMTP packet generation unit 7 multiplexes the audio MMTP payload generated by the audio MMTP payload generation unit 2, the video MMTP payload generated by the video MMTP payload generation unit 4, and the control MMTP payload generated by the control MMTP payload generation unit 6, and generates the MMTP packets constituting the bit stream (step ST7).
When this MMTP packet is generated, a predetermined MMTP header is added, and this MMTP header includes a packet ID assigned according to the type of encoded data included in the MMTP payload.
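As a rough illustration of step ST7, the sketch below wraps each payload in a packet whose header carries a packet ID chosen by payload type. The numeric IDs in `PACKET_IDS`, the `MmtpPacket` class, and the simple "control first, then audio, then video" ordering are all assumptions made for this example; the specification does not fix these values or the interleaving order, and a real MMTP header contains more fields than a packet ID.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical packet-ID assignment, one ID per kind of encoded data.
PACKET_IDS: Dict[str, int] = {"control": 0, "audio": 1, "video": 2}


@dataclass
class MmtpPacket:
    packet_id: int  # the only header field modelled here
    payload: bytes  # the MMTP payload


def packetize(kind: str, payloads: List[bytes]) -> List[MmtpPacket]:
    """Attach a header carrying the packet ID for this kind of payload."""
    return [MmtpPacket(PACKET_IDS[kind], p) for p in payloads]


def multiplex(audio: List[bytes], video: List[bytes],
              control: List[bytes]) -> List[MmtpPacket]:
    """Combine control, audio, and video payloads into one packet sequence."""
    return (packetize("control", control)
            + packetize("audio", audio)
            + packetize("video", video))
```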

次に復号装置の処理内容を説明する。
ストリーム選択部１１は、複数の符号化装置（図１の符号化装置、あるいは、図１の符号化装置に相当する符号化装置）から出力されたビットストリーム（ＭＭＴＰパケットからなるビットストリーム）が与えられる。
ストリーム選択部１１は、複数のビットストリームの中から、ユーザにより指定されたチャンネルのビットストリーム（提示対象のビットストリーム）を選択して、そのビットストリームをＭＭＴＰパケット解析部１２に出力する。 Next, processing contents of the decoding device will be described.
The stream selection unit 11 receives bit streams (bit streams composed of MMTP packets) output from a plurality of encoding devices (the encoding device of FIG. 1 or encoding devices equivalent to it).
The stream selection unit 11 selects a bit stream (bit stream to be presented) of a channel designated by the user from a plurality of bit streams, and outputs the bit stream to the MMTP packet analysis unit 12.

ＭＭＴＰパケット解析部１２は、ストリーム選択部１１からビットストリームを受けると、そのビットストリームを構成しているＭＭＴＰパケットのＭＭＴＰヘッダを解析して、そのＭＭＴＰヘッダに含まれているパケットＩＤを取得する。
ＭＭＴＰパケット解析部１２は、そのパケットＩＤがＭＭＴＰペイロードに含まれている符号化データが制御情報（ＰＡメッセージ）である旨を示していれば、そのＭＭＴＰパケットに含まれているＭＭＴＰペイロードである制御ＭＭＴＰペイロードを制御ＭＭＴＰペイロード処理部１３に出力する。
一方、そのパケットＩＤがＭＭＴＰペイロードに含まれている符号化データが音声信号又は映像信号である旨を示していれば、そのＭＭＴＰパケットをアセット分離部１４に出力する。 When receiving the bit stream from the stream selection unit 11, the MMTP packet analysis unit 12 analyzes the MMTP header of the MMTP packet constituting the bit stream, and acquires the packet ID included in the MMTP header.
If the packet ID indicates that the encoded data contained in the MMTP payload is control information (a PA message), the MMTP packet analysis unit 12 outputs the MMTP payload contained in that MMTP packet, namely the control MMTP payload, to the control MMTP payload processing unit 13.
On the other hand, if the packet ID indicates that the encoded data included in the MMTP payload is an audio signal or a video signal, the MMTP packet is output to the asset separation unit 14.

制御ＭＭＴＰペイロード処理部１３は、ＭＭＴＰパケット解析部１２から制御ＭＭＴＰペイロードを受けると、その制御ＭＭＴＰペイロードに含まれている符号化データの復号処理を実施して、制御情報であるＰＡメッセージ及びＰＡメッセージに含まれるＨＥＶＣピクチャ構造記述子を復号する。 When the control MMTP payload processing unit 13 receives the control MMTP payload from the MMTP packet analysis unit 12, it decodes the encoded data contained in that control MMTP payload, thereby decoding the PA message, which is the control information, and the HEVC picture structure descriptor contained in the PA message.

アセット分離部１４は、制御ＭＭＴＰペイロード処理部１３がＰＡメッセージを復号すると、そのＰＡメッセージに記述されているアセットＩＤ、アセットタイプ及びパケットＩＤを参照して、ＭＭＴＰパケット解析部１２から出力されたＭＭＴＰパケットに含まれているＭＭＴＰペイロードが音声ＭＭＴＰペイロードであるのか、映像ＭＭＴＰペイロードであるのかを特定する。
アセット分離部１４は、ＭＭＴＰパケット解析部１２から出力されたＭＭＴＰパケットに含まれているＭＭＴＰペイロードが音声ＭＭＴＰペイロードであれば、そのＭＭＴＰパケットに含まれている音声ＭＭＴＰペイロードを抽出して、その音声ＭＭＴＰペイロードを音声ＭＭＴＰペイロード処理部１５に出力する。
アセット分離部１４は、ＭＭＴＰパケット解析部１２から出力されたＭＭＴＰパケットに含まれているＭＭＴＰペイロードが映像ＭＭＴＰペイロードであれば、そのＭＭＴＰパケットに含まれている映像ＭＭＴＰペイロードを抽出して、その映像ＭＭＴＰペイロードを映像ＭＭＴＰペイロード処理部１９に出力する。 When the control MMTP payload processing unit 13 has decoded the PA message, the asset separation unit 14 refers to the asset ID, asset type, and packet ID described in the PA message and determines whether the MMTP payload contained in an MMTP packet output from the MMTP packet analysis unit 12 is an audio MMTP payload or a video MMTP payload.
If the MMTP payload contained in the MMTP packet output from the MMTP packet analysis unit 12 is an audio MMTP payload, the asset separation unit 14 extracts the audio MMTP payload from that MMTP packet and outputs it to the audio MMTP payload processing unit 15.
If the MMTP payload contained in the MMTP packet output from the MMTP packet analysis unit 12 is a video MMTP payload, the asset separation unit 14 extracts the video MMTP payload from that MMTP packet and outputs it to the video MMTP payload processing unit 19.
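The two-stage routing performed by the MMTP packet analysis unit 12 and the asset separation unit 14 can be sketched as below. The sketch assumes that control packets use a fixed, hypothetical packet ID (`CONTROL_PACKET_ID`) and that the decoded PA message has already been reduced to a mapping from packet ID to asset type (`asset_types`); both are simplifications introduced for illustration, not structures defined in the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class MmtpPacket:
    packet_id: int
    payload: bytes


CONTROL_PACKET_ID = 0  # assumed ID of packets that carry the PA message


def route(packets: Iterable[MmtpPacket],
          asset_types: Dict[int, str]) -> Dict[str, List[bytes]]:
    """Sort payloads into control, audio, and video queues by packet ID."""
    queues: Dict[str, List[bytes]] = {"control": [], "audio": [], "video": []}
    for pkt in packets:
        if pkt.packet_id == CONTROL_PACKET_ID:
            # Role of the packet analysis unit: control payloads go to the
            # control payload processing stage.
            queues["control"].append(pkt.payload)
            continue
        # Role of the asset separation unit: consult the PA-derived mapping.
        kind = asset_types.get(pkt.packet_id)
        if kind in ("audio", "video"):
            queues[kind].append(pkt.payload)
        # Packets whose ID is not listed in the PA message are ignored here.
    return queues
```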

音声ＭＭＴＰペイロード処理部１５は、アセット分離部１４から音声ＭＭＴＰペイロードを受けると、その音声ＭＭＴＰペイロードから音声ストリームのＭＦＵ又はＭＰＵを再構成することで、後段の音声ストリーム復号部１７で復号可能な形式の音声エレメンタリーストリーム（音声ＥＳ）を生成し、その音声ＥＳを音声ＥＳバッファ１６に格納する。
音声ＭＭＴＰペイロードから音声ＥＳを生成する処理自体は公知の技術であるため詳細な説明を省略する。
また、音声ＭＭＴＰペイロード処理部１５は、アセット分離部１４から出力された音声ＭＭＴＰペイロードに含まれている音声ストリームに関するメタデータを抽出し、そのメタデータを音声ＥＳバッファ１６に格納する。 When the audio MMTP payload processing unit 15 receives the audio MMTP payload from the asset separation unit 14, the audio MMTP payload processing unit 15 reconstructs the MFU or MPU of the audio stream from the audio MMTP payload, so that the audio stream decoding unit 17 can decode the audio stream. Audio elementary stream (audio ES) is generated, and the audio ES is stored in the audio ES buffer 16.
Since the process itself for generating the audio ES from the audio MMTP payload is a known technique, detailed description thereof is omitted.
In addition, the audio MMTP payload processing unit 15 extracts metadata about the audio stream included in the audio MMTP payload output from the asset separation unit 14 and stores the metadata in the audio ES buffer 16.

音声ストリーム復号部１７は、音声ＥＳバッファ１６からメタデータを取り出して、そのメタデータに記述されている時刻情報（各アクセスユニット（ＡＵ）のＤＴＳ（復号時刻）やＰＴＳ（提示時刻）を示す情報）を復号する。
音声ストリーム復号部１７は、復号したＤＴＳを参照して、各アクセスユニット（ＡＵ）の復号時刻を把握し、各アクセスユニット（ＡＵ）の復号時刻になると、音声ＥＳバッファ１６から音声ＥＳを取り出して、当該アクセスユニット（ＡＵ）の音声信号を復号し、その復号した音声信号とＰＴＳ（提示時刻）を音声データバッファ１８に格納する。
これにより、外部の再生装置（図示せず）は、音声データバッファ１８に格納されている音声信号とＰＴＳ（提示時刻）を取り出せば、その提示時刻に音声信号を再生することができる。 The audio stream decoding unit 17 retrieves the metadata from the audio ES buffer 16 and decodes the time information described in it (information indicating the DTS (decoding time) and PTS (presentation time) of each access unit (AU)).
The audio stream decoding unit 17 refers to the decoded DTS, grasps the decoding time of each access unit (AU), and extracts the audio ES from the audio ES buffer 16 when the decoding time of each access unit (AU) comes. The audio signal of the access unit (AU) is decoded, and the decoded audio signal and PTS (presentation time) are stored in the audio data buffer 18.
Thus, if an external playback device (not shown) takes out the audio signal and the PTS (presentation time) stored in the audio data buffer 18, the audio signal can be reproduced at the presentation time.

映像ＭＭＴＰペイロード処理部１９は、アセット分離部１４から映像ＭＭＴＰペイロードを受けると、その映像ＭＭＴＰペイロードから映像ストリームのＭＦＵ又はＭＰＵを再構成することで、後段のＨＥＶＣＥＳ復号部２１で復号可能な形式のＨＥＶＣエレメンタリーストリーム（ＨＥＶＣＥＳ）を生成し、そのＨＥＶＣエレメンタリーストリームをＨＥＶＣＥＳバッファ２０に格納する。
映像ＭＭＴＰペイロードからＨＥＶＣエレメンタリーストリームを生成する処理自体は公知の技術であるため詳細な説明を省略する。
また、映像ＭＭＴＰペイロード処理部１９は、アセット分離部１４から出力された映像ＭＭＴＰペイロードに含まれている映像ストリームに関するメタデータを抽出し、そのメタデータをＨＥＶＣＥＳバッファ２０に格納する。 When the video MMTP payload processing unit 19 receives the video MMTP payload from the asset separation unit 14, the video MMTP payload processing unit 19 reconstructs the MFU or MPU of the video stream from the video MMTP payload, so that the HEVCES decoding unit 21 in the subsequent stage can decode it. A HEVC elementary stream (HEVC ES) is generated, and the HEVC elementary stream is stored in the HEVCES buffer 20.
Since the process itself for generating the HEVC elementary stream from the video MMTP payload is a known technique, detailed description thereof is omitted.
In addition, the video MMTP payload processing unit 19 extracts metadata regarding the video stream included in the video MMTP payload output from the asset separation unit 14, and stores the metadata in the HEVCES buffer 20.

ＨＥＶＣＥＳ復号部２１は、ＨＥＶＣＥＳバッファ２０からメタデータを取り出して、そのメタデータに記述されている時刻情報（各アクセスユニット（ＡＵ）のＤＴＳ（復号時刻）やＰＴＳ（提示時刻）を示す情報）を復号する。
ＨＥＶＣＥＳ復号部２１は、復号したＤＴＳを参照して、各アクセスユニット（ＡＵ）の復号時刻を把握し、各アクセスユニット（ＡＵ）の復号時刻になると、ＨＥＶＣＥＳバッファ２０からＨＥＶＣエレメンタリーストリームを取り出して、当該アクセスユニット（ＡＵ）の映像信号を復号し、その復号した映像信号である復号画像とＰＴＳ（提示時刻）を復号画像バッファ２２に格納する。
これにより、外部の再生装置（図示せず）は、復号画像バッファ２２に格納されている復号画像とＰＴＳ（提示時刻）を取り出せば、その提示時刻に復号画像を再生することができる。 The HEVC ES decoding unit 21 retrieves the metadata from the HEVC ES buffer 20 and decodes the time information described in it (information indicating the DTS (decoding time) and PTS (presentation time) of each access unit (AU)).
The HEVCES decoding unit 21 refers to the decoded DTS, grasps the decoding time of each access unit (AU), and extracts the HEVC elementary stream from the HEVCES buffer 20 at the decoding time of each access unit (AU). Then, the video signal of the access unit (AU) is decoded, and the decoded image and the PTS (presentation time) as the decoded video signal are stored in the decoded image buffer 22.
Thus, if an external playback device (not shown) takes out the decoded image and the PTS (presentation time) stored in the decoded image buffer 22, the decoded image can be reproduced at the presentation time.
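The DTS-driven behaviour of the decoding units (wait until an access unit's decoding time, decode it, then hand the result and its PTS to the output buffer) might be modelled as follows. The names `TimedAu` and `decode_due`, the use of a heap as the ES buffer, and the `decode` callback are assumptions for this sketch; an actual HEVC or audio decoder is outside its scope.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(order=True)
class TimedAu:
    dts: float                        # decoding time (sort key)
    pts: float = field(compare=False)  # presentation time
    data: bytes = field(compare=False)  # encoded access unit


def decode_due(es_buffer: List[TimedAu], now: float,
               decode: Callable[[bytes], bytes]) -> List[Tuple[float, bytes]]:
    """Decode every buffered AU whose DTS has been reached.

    es_buffer must be a heap built with heapq. Returns (PTS, decoded data)
    pairs, which correspond to the entries stored in the output buffer.
    """
    decoded: List[Tuple[float, bytes]] = []
    while es_buffer and es_buffer[0].dts <= now:
        au = heapq.heappop(es_buffer)
        decoded.append((au.pts, decode(au.data)))
    return decoded
```

A playback device would then present each returned item when the current time reaches its PTS.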

外部の再生装置（図示せず）が復号画像と音声信号を再生しているとき、ユーザがリモコン等を用いて、チャンネルを切り替える操作を行うと、提示対象の映像ストリームを切り替える指令（この切替指令には、切替後のチャンネルを示す情報が含まれている）がストリーム選択部１１に与えられる。
ストリーム選択部１１は、外部からチャンネルの切替指令を受けると（図４のステップＳＴ１１：Ｙｅｓの場合）、複数のビットストリームの中から、その切替指令が示す切替後のチャンネルのビットストリームを選択して、そのビットストリームをＭＭＴＰパケット解析部１２に出力する（ステップＳＴ１２）。
また、ストリーム選択部１１は、ユーザによりチャンネルが切り替えられたときに何の復号画像も表示されない時間を無くして、シームレスなチャンネル切替を実現するために、制御ＭＭＴＰペイロード処理部１３からビットストリームの出力停止指令を受けるまでの間（ステップＳＴ１３：Ｎｏの場合）、切替前のチャンネルのビットストリームも引き続きＭＭＴＰパケット解析部１２に出力する（ステップＳＴ１４）。ビットストリームの出力停止指令は、後述するように、現在時刻が符号化順で先頭のアクセスユニット（ＡＵ）の提示時刻になると出力される。
ストリーム選択部１１は、制御ＭＭＴＰペイロード処理部１３からビットストリームの出力停止指令を受けると、切替前のチャンネルのビットストリームの出力を停止して、切替後のチャンネルのビットストリームだけをＭＭＴＰパケット解析部１２に出力する。 While an external playback device (not shown) is playing back the decoded images and audio signal, if the user performs a channel switching operation using a remote controller or the like, a command to switch the video stream to be presented (the switching command contains information indicating the post-switch channel) is given to the stream selection unit 11.
When receiving a channel switching command from the outside (in the case of step ST11: Yes in FIG. 4), the stream selection unit 11 selects a bit stream of the channel after switching indicated by the switching command from a plurality of bit streams. The bit stream is output to the MMTP packet analysis unit 12 (step ST12).
In addition, the stream selection unit 11 outputs a bit stream from the control MMTP payload processing unit 13 in order to eliminate the time during which no decoded image is displayed when the channel is switched by the user and to realize seamless channel switching. Until a stop command is received (step ST13: No), the bit stream of the channel before switching is continuously output to the MMTP packet analysis unit 12 (step ST14). The bitstream output stop command is output when the current time is the presentation time of the first access unit (AU) in the encoding order, as will be described later.
Upon receiving the bit stream output stop command from the control MMTP payload processing unit 13, the stream selection unit 11 stops outputting the bit stream of the pre-switch channel and outputs only the bit stream of the post-switch channel to the MMTP packet analysis unit 12.
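The behaviour of the stream selection unit 11 in steps ST11 to ST14 amounts to a small state machine: after a switch it forwards both channels, and it drops the old one only when the stop command arrives. A minimal sketch, in which the class `StreamSelector` and its method names are hypothetical:

```python
from typing import List, Optional


class StreamSelector:
    """Tracks which channels' bit streams are currently forwarded."""

    def __init__(self, channel: int) -> None:
        self.current = channel
        self.previous: Optional[int] = None  # pre-switch channel, if any

    def switch_to(self, channel: int) -> None:
        # Switching command received: start forwarding the new channel
        # while continuing to forward the pre-switch channel.
        if channel == self.current:
            return
        self.previous = self.current
        self.current = channel

    def stop_previous(self) -> None:
        # Output stop command received: forward only the new channel.
        self.previous = None

    def active_channels(self) -> List[int]:
        if self.previous is None:
            return [self.current]
        return [self.current, self.previous]
```

Keeping the old channel active until the stop command is what gives the decoder something to show while the new channel is not yet decodable.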

ＭＭＴＰパケット解析部１２は、ストリーム選択部１１から切替後のチャンネルのビットストリームを受けると、チャンネルの切替前と同様に、そのビットストリームを構成しているＭＭＴＰパケットに含まれている制御ＭＭＴＰペイロードを制御ＭＭＴＰペイロード処理部１３に出力し、そのビットストリームを構成しているＭＭＴＰパケットに含まれている音声ＭＭＴＰペイロード又は映像ＭＭＴＰペイロードをアセット分離部１４に出力する。
また、ＭＭＴＰパケット解析部１２は、ストリーム選択部１１から切替前のチャンネルのビットストリームを受けると、切替後のチャンネルのビットストリームに対する処理と並列の処理で、切替前のチャンネルのビットストリームを構成しているＭＭＴＰパケットに含まれている制御ＭＭＴＰペイロードを制御ＭＭＴＰペイロード処理部１３に出力し、そのビットストリームを構成しているＭＭＴＰパケットに含まれている音声ＭＭＴＰペイロード又は映像ＭＭＴＰペイロードをアセット分離部１４に出力する。 When the MMTP packet analysis unit 12 receives the bit stream of the post-switch channel from the stream selection unit 11, it outputs, as before the channel switch, the control MMTP payload contained in the MMTP packets constituting that bit stream to the control MMTP payload processing unit 13, and outputs the audio MMTP payload or video MMTP payload contained in those MMTP packets to the asset separation unit 14.
When the MMTP packet analysis unit 12 receives the bit stream of the pre-switch channel from the stream selection unit 11, it processes that bit stream in parallel with the bit stream of the post-switch channel: it outputs the control MMTP payload contained in the MMTP packets constituting the pre-switch bit stream to the control MMTP payload processing unit 13, and outputs the audio MMTP payload or video MMTP payload contained in those MMTP packets to the asset separation unit 14.

制御ＭＭＴＰペイロード処理部１３は、ＭＭＴＰパケット解析部１２から切替後のチャンネルのビットストリームに係る制御ＭＭＴＰペイロードを受けると、チャンネルの切替前と同様に、その制御ＭＭＴＰペイロードに含まれている符号化データの復号処理を実施して、制御情報であるＰＡメッセージ及びＰＡメッセージに含まれるＨＥＶＣピクチャ構造記述子を復号する（ステップＳＴ１５）。
制御ＭＭＴＰペイロード処理部１３は、ＰＡメッセージ及びＰＡメッセージに含まれるＨＥＶＣピクチャ構造記述子を復号すると、そのＰＡメッセージに記述されているＭＰＵタイムスタンプ記述子が示す提示順で先頭のアクセスユニット（ＡＵ）の提示時刻と、ＨＥＶＣピクチャ構造記述子に記述されている個数情報（ｎｕｍ＿ｏｆ＿ｌｅａｄｉｎｇ＿ｐｉｃｔｕｒｅ）が示す符号化順で先頭のアクセスユニット（ＡＵ）より提示順が早いアクセスユニット（ＡＵ）の個数（ＬＰの枚数）とから、符号化順で先頭のアクセスユニット（ＡＵ）の提示時刻を算出する（ステップＳＴ１６）。 When receiving the control MMTP payload related to the bit stream of the channel after switching from the MMTP packet analysis unit 12, the control MMTP payload processing unit 13 encodes the encoded data included in the control MMTP payload as before the channel switching. Is decoded to decode the PA message as the control information and the HEVC picture structure descriptor included in the PA message (step ST15).
Having decoded the PA message and the HEVC picture structure descriptor contained in it, the control MMTP payload processing unit 13 calculates the presentation time of the access unit (AU) that is first in coding order (step ST16). It does so from the presentation time of the access unit (AU) that is first in presentation order, indicated by the MPU timestamp descriptor in the PA message, and from the number of access units (AU) that precede the first-in-coding-order access unit in presentation order (the number of LPs), indicated by the count information (num_of_leading_picture) in the HEVC picture structure descriptor.

図１０の例では、符号化順で先頭のアクセスユニット（ＡＵ）はＩＲＡＰ３２であり、提示順で先頭のアクセスユニット（ＡＵ）はＢ２５である。また、ＩＲＡＰ３２より提示順が早いアクセスユニット（ＡＵ）の個数（ＬＰの枚数）は７個である。
したがって、提示順で先頭のアクセスユニット（ＡＵ）であるＢ２５の提示時刻が、例えば、１８時００分００秒であり、フレームレートが１２０枚／１秒であれば、ＩＲＡＰ３２の提示時刻は、Ｂ２５の提示時刻（１８時００分００秒）の５８ｍｓｅｃ（＝７／１２０）後になる。
制御ＭＭＴＰペイロード処理部１３は、現在時刻が符号化順で先頭のアクセスユニット（ＡＵ）の提示時刻になると（ステップＳＴ１７：Ｙｅｓの場合）、切替前のチャンネルのビットストリームの出力停止指令をストリーム選択部１１に出力する（ステップＳＴ１８）。 In the example of FIG. 10, the head access unit (AU) in the coding order is IRAP32, and the head access unit (AU) in the presentation order is B25. Also, the number of access units (AU) (number of LPs) that are presented earlier than IRAP 32 is seven.
Therefore, if the presentation time of B25, the access unit (AU) that is first in presentation order, is, for example, 18:00:00 and the frame rate is 120 frames per second, the presentation time of IRAP32 is about 58 msec (= 7/120 sec) after the presentation time of B25 (18:00:00).
When the current time reaches the presentation time of the access unit (AU) that is first in coding order (step ST17: Yes), the control MMTP payload processing unit 13 outputs a command to stop output of the pre-switch channel's bit stream to the stream selection unit 11 (step ST18).
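The calculation in step ST16 can be written out directly. The sketch below assumes, as the worked example does, a constant frame rate and leading pictures presented at consecutive frame intervals; the function name `irap_presentation_time` and the use of seconds as the time unit are choices made for this illustration.

```python
from fractions import Fraction


def irap_presentation_time(first_pts: Fraction,
                           num_of_leading_picture: int,
                           frame_rate: int) -> Fraction:
    """Presentation time (seconds) of the AU that is first in coding order.

    first_pts: presentation time of the AU that is first in presentation
        order (taken from the MPU timestamp descriptor).
    num_of_leading_picture: number of AUs presented before the IRAP.
    frame_rate: frames per second, assumed constant.
    """
    return first_pts + Fraction(num_of_leading_picture, frame_rate)


# Worked example from the text: 7 leading pictures at 120 frames per second.
offset = irap_presentation_time(Fraction(0), 7, 120)
offset_msec = round(float(offset) * 1000)  # about 58 msec
```

Using `Fraction` keeps the offset exact (7/120 s) until it is rounded for display.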

アセット分離部１４は、ＭＭＴＰパケット解析部１２から切替後のチャンネルに係るＭＭＴＰパケットを受けると、チャンネルの切替前と同様に、そのＭＭＴＰパケットに含まれている音声ＭＭＴＰペイロードを音声ＭＭＴＰペイロード処理部１５に出力し、そのＭＭＴＰパケットに含まれている映像ＭＭＴＰペイロードを映像ＭＭＴＰペイロード処理部１９に出力する。
また、アセット分離部１４は、ＭＭＴＰパケット解析部１２から切替前のチャンネルに係るＭＭＴＰパケットを受けると、切替後のチャンネルのビットストリームに対する処理と並列の処理で、切替前のチャンネルに係るＭＭＴＰパケットに含まれている音声ＭＭＴＰペイロードを音声ＭＭＴＰペイロード処理部１５に出力し、そのＭＭＴＰパケットに含まれている映像ＭＭＴＰペイロードを映像ＭＭＴＰペイロード処理部１９に出力する。 When the asset separation unit 14 receives an MMTP packet of the post-switch channel from the MMTP packet analysis unit 12, it outputs, as before the channel switch, the audio MMTP payload contained in that MMTP packet to the audio MMTP payload processing unit 15 and the video MMTP payload contained in that MMTP packet to the video MMTP payload processing unit 19.
When the asset separation unit 14 receives an MMTP packet of the pre-switch channel from the MMTP packet analysis unit 12, it processes that packet in parallel with the bit stream of the post-switch channel: it outputs the audio MMTP payload contained in the pre-switch MMTP packet to the audio MMTP payload processing unit 15 and the video MMTP payload contained in that MMTP packet to the video MMTP payload processing unit 19.

映像ＭＭＴＰペイロード処理部１９は、アセット分離部１４から切替前のチャンネルに係る映像ＭＭＴＰペイロードを受けると（現在時刻が提示順で先頭のアクセスユニット（ＡＵ）の提示時刻になる前）、チャンネルの切替前と同様に、その映像ＭＭＴＰペイロードからＨＥＶＣエレメンタリーストリームを生成して、そのＨＥＶＣエレメンタリーストリームをＨＥＶＣＥＳバッファ２０に格納するとともに、その映像ＭＭＴＰペイロードに含まれている映像ストリームに関するメタデータを抽出し、そのメタデータをＨＥＶＣＥＳバッファ２０に格納する（ステップＳＴ１９）。
また、映像ＭＭＴＰペイロード処理部１９は、アセット分離部１４から切替後のチャンネルに係る映像ＭＭＴＰペイロードを受けると、切替前のチャンネルのビットストリームに対する処理と並列の処理で、切替後のチャンネルに係る映像ＭＭＴＰペイロードからＨＥＶＣエレメンタリーストリームを生成して、そのＨＥＶＣエレメンタリーストリームをＨＥＶＣＥＳバッファ２０に格納するとともに、その映像ＭＭＴＰペイロードに含まれている映像ストリームに関するメタデータを抽出し、そのメタデータをＨＥＶＣＥＳバッファ２０に格納する（ステップＳＴ２０）。 When the video MMTP payload processing unit 19 receives a video MMTP payload of the pre-switch channel from the asset separation unit 14 (before the current time reaches the presentation time of the access unit (AU) that is first in presentation order), it generates, as before the channel switch, an HEVC elementary stream from that video MMTP payload and stores it in the HEVC ES buffer 20, and also extracts the metadata on the video stream contained in that video MMTP payload and stores it in the HEVC ES buffer 20 (step ST19).
When the video MMTP payload processing unit 19 receives a video MMTP payload of the post-switch channel from the asset separation unit 14, it processes that payload in parallel with the bit stream of the pre-switch channel: it generates an HEVC elementary stream from the post-switch video MMTP payload and stores it in the HEVC ES buffer 20, and also extracts the metadata on the video stream contained in that video MMTP payload and stores it in the HEVC ES buffer 20 (step ST20).

ＨＥＶＣＥＳ復号部２１は、チャンネルの切替前と同様に、ＨＥＶＣＥＳバッファ２０からメタデータを取り出して、そのメタデータに記述されている時刻情報（各アクセスユニット（ＡＵ）のＤＴＳ（復号時刻）やＰＴＳ（提示時刻）を示す情報）を復号する。
これにより、ＨＥＶＣＥＳ復号部２１は、復号したＤＴＳを参照して、各アクセスユニット（ＡＵ）の復号時刻を把握するが、切替後のチャンネルについては、ＩＲＡＰが最初に復号することが可能なアクセスユニット（ＡＵ）であり（図１０のＧＯＰ構成では、ＩＲＡＰ３２のアクセスユニット（ＡＵ））、ＩＲＡＰより提示順が早いＬＰのアクセスユニット（ＡＵ）の映像信号を復号することができない。このため、切替後のチャンネルについては、ＩＲＡＰの提示時刻になるまでの間、どのアクセスユニット（ＡＵ）の映像信号も復号して表示することができない。 As before the channel switch, the HEVC ES decoding unit 21 retrieves the metadata from the HEVC ES buffer 20 and decodes the time information described in it (information indicating the DTS (decoding time) and PTS (presentation time) of each access unit (AU)).
The HEVC ES decoding unit 21 thus refers to the decoded DTS to determine the decoding time of each access unit (AU). For the post-switch channel, however, the IRAP is the first access unit (AU) that can be decoded (IRAP32 in the GOP structure of FIG. 10), and the video signals of the LP access units (AU), which precede the IRAP in presentation order, cannot be decoded. For this reason, no access unit (AU) of the post-switch channel can be decoded and displayed until the presentation time of the IRAP is reached.

そこで、ＨＥＶＣＥＳ復号部２１は、切替後のチャンネルに係るＩＲＡＰの提示時刻になるまでの間、切替前のチャンネルに係るアクセスユニット（ＡＵ）の映像信号を復号して、その復号した映像信号である復号画像とＰＴＳ（提示時刻）を復号画像バッファ２２に格納する。
これにより、外部の再生装置（図示せず）は、切替後のチャンネルに係るＩＲＡＰの提示時刻になるまでの間、切替前のチャンネルに係る復号画像を再生することができる。
したがって、ユーザによりチャンネルが切り替えられたときに何の復号画像も表示されない時間を無くして、シームレスなチャンネル切替を実現することができる。 Therefore, the HEVCES decoding unit 21 decodes the video signal of the access unit (AU) related to the channel before switching until the IRAP presentation time related to the channel after switching, and is the decoded video signal. The decoded image and the PTS (presentation time) are stored in the decoded image buffer 22.
Accordingly, an external playback device (not shown) can play back the decoded image related to the channel before switching until the IRAP presentation time related to the channel after switching is reached.
Therefore, seamless channel switching can be realized by eliminating the time during which no decoded image is displayed when the channel is switched by the user.
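The selection rule that yields the seamless switch (present the pre-switch channel until the post-switch IRAP's presentation time, and the post-switch channel from then on) can be sketched as below. Representing each channel as a list of presentation times and the function name `select_for_decoding` are assumptions for this illustration only.

```python
from typing import List, Tuple


def select_for_decoding(old_pts: List[float], new_pts: List[float],
                        irap_pts: float) -> List[Tuple[str, float]]:
    """Choose which access units to decode around a channel switch.

    old_pts: presentation times of AUs on the pre-switch channel.
    new_pts: presentation times of AUs on the post-switch channel,
        including leading pictures that cannot be decoded.
    irap_pts: presentation time of the post-switch channel's IRAP.
    """
    # Pre-switch AUs fill the interval before the IRAP can be presented.
    chosen = [("old", pts) for pts in old_pts if pts < irap_pts]
    # Post-switch leading pictures (PTS earlier than the IRAP) are skipped.
    chosen += [("new", pts) for pts in new_pts if pts >= irap_pts]
    return sorted(chosen, key=lambda item: item[1])
```

Every presentation instant is covered by exactly one channel, so there is no interval in which nothing is displayed.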

音声ＭＭＴＰペイロード処理部１５は、アセット分離部１４から切替前のチャンネルに係る音声ＭＭＴＰペイロードを受けると（現在時刻が提示順で先頭のアクセスユニット（ＡＵ）の提示時刻になる前）、チャンネルの切替前と同様に、その音声ＭＭＴＰペイロードから音声ＥＳを生成して、その音声ＥＳを音声ＥＳバッファ１６に格納するとともに、その音声ＭＭＴＰペイロードに含まれている音声ストリームに関するメタデータを抽出し、そのメタデータを音声ＥＳバッファ１６に格納する。
また、音声ＭＭＴＰペイロード処理部１５は、アセット分離部１４から切替後のチャンネルに係る音声ＭＭＴＰペイロードを受けると、切替前のチャンネルのビットストリームに対する処理と並列の処理で、切替後のチャンネルに係る音声ＭＭＴＰペイロードから音声ＥＳを生成して、その音声ＥＳを音声ＥＳバッファ１６に格納するとともに、その音声ＭＭＴＰペイロードに含まれている音声ストリームに関するメタデータを抽出し、そのメタデータを音声ＥＳバッファ１６に格納する。 When the audio MMTP payload processing unit 15 receives an audio MMTP payload of the pre-switch channel from the asset separation unit 14 (before the current time reaches the presentation time of the access unit (AU) that is first in presentation order), it generates, as before the channel switch, an audio ES from that audio MMTP payload and stores it in the audio ES buffer 16, and also extracts the metadata on the audio stream contained in that audio MMTP payload and stores it in the audio ES buffer 16.
When the audio MMTP payload processing unit 15 receives an audio MMTP payload of the post-switch channel from the asset separation unit 14, it processes that payload in parallel with the bit stream of the pre-switch channel: it generates an audio ES from the post-switch audio MMTP payload and stores it in the audio ES buffer 16, and also extracts the metadata on the audio stream contained in that audio MMTP payload and stores it in the audio ES buffer 16.

As before the channel switch, the audio stream decoding unit 17 takes the metadata out of the audio ES buffer 16 and decodes the time information described in the metadata (information indicating the DTS (decoding time) and PTS (presentation time) of each access unit (AU)).
The audio stream decoding unit 17 refers to the decoded DTS to determine the decoding time of each access unit (AU). When the decoding time of an access unit (AU) arrives, it takes the audio ES out of the audio ES buffer 16, decodes the audio signal of that access unit (AU), and stores the decoded audio signal and its PTS (presentation time) in the audio data buffer 18.
Thus, by taking the audio signal and the PTS (presentation time) out of the audio data buffer 18, an external playback device (not shown) can play back the audio signal at its presentation time.
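
The DTS-driven behavior of the audio stream decoding unit 17 can be sketched as follows. This is only an illustration under stated assumptions: the ES buffer is modeled as a heap of hypothetical `(dts, pts, label)` tuples, times are plain integers, and actual audio decoding is omitted.

```python
import heapq


def due_access_units(es_buffer, now):
    """Pop the access units whose decoding time (DTS) has arrived.

    es_buffer is a heap of (dts, pts, label) tuples (an assumed layout).
    Returns (label, pts) pairs, i.e. what would be stored in the audio
    data buffer after decoding.
    """
    due = []
    while es_buffer and es_buffer[0][0] <= now:
        dts, pts, label = heapq.heappop(es_buffer)
        due.append((label, pts))
    return due


# Hypothetical access units: (DTS, PTS, label).
buffer = []
for au in [(10, 12, "AU0"), (20, 22, "AU1"), (30, 32, "AU2")]:
    heapq.heappush(buffer, au)

ready = due_access_units(buffer, now=20)
```

Here `ready` holds the two access units whose DTS is not later than 20, each paired with the PTS at which the playback device would present it; the third remains in the buffer.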

Here it is assumed that, like the HEVCES decoding unit 21, the audio stream decoding unit 17 decodes the audio signals of the access units (AUs) of the channel before switching until the presentation time of the IRAP of the channel after switching is reached, and stores the decoded audio signals and their PTS (presentation time) in the audio data buffer 18. However, unlike video signals, audio signals are not encoded by an inter-frame predictive coding method, so the audio signals of the LP access units (AUs) that precede the IRAP in presentation order can also be decoded. Therefore, even before the presentation time of the IRAP of the channel after switching is reached, the audio stream decoding unit 17 may decode the audio signals of the access units (AUs) of the channel after switching instead of those of the channel before switching.

The decoding device repeats the processing of steps ST11 to ST20 until the decoding process ends (step ST21).

As is apparent from the above, according to Embodiment 1, when the video signals of one or more access units (AUs) are encoded by an inter-frame predictive coding method, control information is encoded for each GOP, which is a set of a plurality of access units (AUs) from which all of the video signals of the one or more access units (AUs) can be decoded. The control information includes presentation time information indicating the presentation time of the first access unit (AU) in presentation order, and number information indicating the number of access units (AUs) that precede, in presentation order, the first access unit (AU) in encoding order. This has the effect of providing an encoding device that enables the decoding side to eliminate the period in which no decoded image is displayed when the user switches channels, and thus to realize seamless channel switching.
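
As a rough illustration of the two items of control information named above, the following Python sketch derives them from one GOP listed in encoding order. The function name, the dictionary keys, and the example GOP are hypothetical; the actual descriptor syntax is not reproduced here.

```python
def gop_control_info(gop_in_coding_order):
    """Derive per-GOP control information.

    gop_in_coding_order: list of (label, pts) tuples in encoding order
    (an assumed representation of the access units of one GOP).
    """
    # The first access unit in encoding order (the IRAP in the text).
    first_coded_pts = gop_in_coding_order[0][1]
    # Presentation time of the first access unit in presentation order.
    first_presented_pts = min(pts for _, pts in gop_in_coding_order)
    # Access units presented earlier than the first one in encoding order.
    num_leading = sum(1 for _, pts in gop_in_coding_order if pts < first_coded_pts)
    return {
        "first_presentation_time": first_presented_pts,
        "num_leading": num_leading,
    }


# Hypothetical GOP: the I picture is coded first but presented fourth.
info = gop_control_info([("I", 3), ("B", 0), ("B", 1), ("B", 2), ("P", 4)])
```

For this example GOP, three pictures precede the first-coded picture in presentation order, and the earliest presentation time is 0.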

Further, according to Embodiment 1, when a command to switch the bit stream to be presented is given, the stream selection unit 11 selects the bit stream after switching from among the bit streams output from the plurality of encoding devices and outputs that bit stream to the MMTP packet analysis unit 12, while also continuing to output the bit stream before switching to the MMTP packet analysis unit 12 until the presentation time calculated by the control MMTP payload processing unit 13 is reached (until a bit stream output stop command is received). The HEVCES decoding unit 21 decodes the video signal in units of access units from the encoded data of the video signal multiplexed in the bit stream after switching and, until the presentation time calculated by the control MMTP payload processing unit 13 is reached, also decodes the video signal in units of access units from the encoded data of the video signal multiplexed in the bit stream before switching. This has the effect of providing a decoding device that eliminates the period in which no decoded image is displayed when the user switches channels, and thus realizes seamless channel switching.
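
One plausible way for the control MMTP payload processing unit 13 to obtain the presentation time of the first access unit in encoding order is sketched below. The constant frame interval is an assumption made for this sketch, since this passage does not give the formula; the function names and tick values are likewise hypothetical.

```python
def irap_presentation_time(first_pts, num_leading, frame_interval):
    """Presentation time of the first access unit in encoding order.

    first_pts: presentation time of the first access unit in presentation
        order (from the presentation time information).
    num_leading: number of access units presented before the first access
        unit in encoding order (from the number information).
    frame_interval: assumed constant spacing between presentation times.
    """
    return first_pts + num_leading * frame_interval


def keep_forwarding_old_stream(now, irap_pts):
    """True while the bit stream before switching should still be output."""
    return now < irap_pts


# Hypothetical values: 7 leading pictures, 1500 ticks per frame.
irap_pts = irap_presentation_time(first_pts=90000, num_leading=7, frame_interval=1500)
```

With these values the stream selection step would keep forwarding the bit stream before switching up to, but not including, tick 100500.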

Embodiment 2.
In Embodiment 1 above, the HEVCES decoding unit 21 decodes the video signals of the access units (AUs) of the channel before switching until the presentation time of the IRAP of the channel after switching is reached, and stores each decoded image, which is the decoded video signal, together with its PTS (presentation time) in the decoded image buffer 22. However, depending on the encoding method of the NAL units constituting it, even an LP access unit (AU) of the channel after switching can in some cases be decoded before the IRAP is decoded.
Whether an LP access unit (AU) can be decoded before the IRAP is decoded can be determined by referring to the picture type information (nal_unit_type_of_leading_picture) described in the HEVC picture structure descriptor, because this information reveals the encoding method of the NAL units constituting the LP. For example, if the NAL units constituting the LP are encoded by an intra-frame coding method, the LP can be decoded even if the video signals of the access units (AUs) of the immediately preceding SOP have not been decoded.

If an LP access unit (AU) of the channel after switching can be decoded even before the IRAP is decoded, the HEVCES decoding unit 21 decodes the video signal of that access unit (AU) and stores the decoded image, which is the decoded video signal, together with its PTS (presentation time) in the decoded image buffer 22.
For example, in the GOP configuration of FIG. 10, if the LPs B25, B26, B27, and B28 cannot be decoded but B29, B30, and B31 can, the HEVCES decoding unit 21 decodes the video signals of B29, B30, and B31 and stores the decoded images, which are the decoded video signals, together with their PTS (presentation time) in the decoded image buffer 22.
As a result, until the presentation time of the IRAP of the channel after switching is reached, an external playback device (not shown) can play back the decoded images of the channel before switching at the presentation times of B25, B26, B27, and B28, and the decoded images of the channel after switching at the presentation times of B29, B30, and B31.
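
The per-picture selection of Embodiment 2 can be sketched as follows. The boolean decodability flags stand in for whatever the decoder derives from nal_unit_type_of_leading_picture; that mapping, the function name, and the "IRAP" label are assumptions of this sketch, while the B-picture labels follow the example in the text.

```python
def playback_plan(leading_pictures, irap_label):
    """Choose the source channel for each presentation slot up to the IRAP.

    leading_pictures: list of (label, decodable_before_irap) tuples in
        presentation order for the channel after switching.
    """
    plan = []
    for label, decodable in leading_pictures:
        # A leading picture that can be decoded before the IRAP is shown
        # from the new channel; otherwise the old channel fills the slot.
        source = "after_switching" if decodable else "before_switching"
        plan.append((label, source))
    # From the IRAP onward the new channel is always used.
    plan.append((irap_label, "after_switching"))
    return plan


# Example from the text: four undecodable LPs followed by three decodable LPs.
lps = [("B25", False), ("B26", False), ("B27", False), ("B28", False),
       ("B29", True), ("B30", True), ("B31", True)]
plan = playback_plan(lps, "IRAP")
```

Compared with Embodiment 1, the switch to the new channel becomes visible three pictures earlier in this example.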

Within the scope of the invention, the embodiments may be freely combined, any component of each embodiment may be modified, and any component of each embodiment may be omitted.

DESCRIPTION OF SYMBOLS: 1 audio encoding unit, 2 audio MMTP payload generation unit, 3 HEVC encoding unit (video encoding means), 4 video MMTP payload generation unit (video encoding means), 5 control information encoding unit (control information encoding means), 6 control MMTP payload generation unit (control information encoding means), 7 MMTP packet generation unit (multiplexing means), 11 stream selection unit (bit stream selection means), 12 MMTP packet analysis unit, 13 control MMTP payload processing unit (presentation time calculation means), 14 asset separation unit, 15 audio MMTP payload processing unit, 16 audio ES buffer, 17 audio stream decoding unit, 18 audio data buffer, 19 video MMTP payload processing unit (video decoding means), 20 HEVCES buffer (video decoding means), 21 HEVCES decoding unit (video decoding means), 22 decoded image buffer (video decoding means).

Claims

An encoding device comprising:
video encoding means for encoding a video signal in units of video access units and outputting encoded data of the video signal;
control information encoding means for, when the video signals of one or more access units are encoded by an inter-frame predictive coding method by the video encoding means, encoding control information for each GOP, which is a set of a plurality of access units from which all of the video signals of the one or more access units can be decoded, the control information including presentation time information indicating the presentation time of the first access unit in presentation order and number information indicating the number of access units that precede, in presentation order, the first access unit in encoding order, and outputting encoded data of the control information in units of GOPs; and
multiplexing means for multiplexing the encoded data of the video signal output from the video encoding means and the encoded data of the control information output from the control information encoding means, and outputting a bit stream that is the multiplexed encoded data.

The encoding device according to claim 1, wherein the control information encoding means encodes control information that includes, in addition to the presentation time information and the number information, picture type information indicating the encoding method of the NAL units, which are the one or more coding units constituting each access unit that precedes, in presentation order, the first access unit in encoding order, and outputs encoded data of the control information in units of GOPs.

A decoding device comprising:
stream selection means for selecting a bit stream to be presented from among bit streams output from a plurality of encoding devices;
video decoding means for decoding a video signal in units of access units from the encoded data of the video signal multiplexed in the bit stream selected by the stream selection means; and
presentation time calculation means for decoding encoded data of control information in units of GOPs, each GOP being a set of a plurality of access units, multiplexed in the bit stream selected by the stream selection means, thereby decoding presentation time information indicating the presentation time of the first access unit in presentation order and number information indicating the number of access units that precede, in presentation order, the first access unit in encoding order, and for calculating the presentation time of the first access unit in encoding order from the presentation time indicated by the presentation time information and the number indicated by the number information,
wherein, when a command to switch the bit stream to be presented is given, the stream selection means selects the bit stream after switching from among the bit streams output from the plurality of encoding devices and outputs that bit stream to the video decoding means, and also outputs the bit stream before switching to the video decoding means until the presentation time calculated by the presentation time calculation means is reached, and
the video decoding means decodes the video signal in units of access units from the encoded data of the video signal multiplexed in the bit stream after switching and, until the presentation time calculated by the presentation time calculation means is reached, decodes the video signal in units of access units from the encoded data of the video signal multiplexed in the bit stream before switching.

The decoding device according to claim 3, wherein, when the control information includes, in addition to the presentation time information and the number information, picture type information indicating the encoding method of the NAL units, which are the one or more coding units constituting each access unit that precedes, in presentation order, the first access unit in encoding order, the video decoding means decodes the picture type information included in the control information multiplexed in the bit stream after switching, refers to the encoding method indicated by the picture type information to identify, among the access units that precede, in presentation order, the first access unit in encoding order, the access units that can be decoded, and decodes the video signals of the decodable access units.

An encoding method comprising:
a video encoding processing step in which video encoding means encodes a video signal in units of video access units and outputs encoded data of the video signal;
a control information encoding processing step in which control information encoding means, when the video signals of one or more access units are encoded by an inter-frame predictive coding method in the video encoding processing step, encodes control information for each GOP, which is a set of a plurality of access units from which all of the video signals of the one or more access units can be decoded, the control information including presentation time information indicating the presentation time of the first access unit in presentation order and number information indicating the number of access units that precede, in presentation order, the first access unit in encoding order, and outputs encoded data of the control information in units of GOPs; and
a multiplexing processing step in which multiplexing means multiplexes the encoded data of the video signal output in the video encoding processing step and the encoded data of the control information output in the control information encoding processing step, and outputs a bit stream that is the multiplexed encoded data.

A decoding method comprising:
a stream selection processing step in which stream selection means selects a bit stream to be presented from among bit streams output from a plurality of encoding devices;
a video decoding processing step in which video decoding means decodes a video signal in units of access units from the encoded data of the video signal multiplexed in the bit stream selected in the stream selection processing step; and
a presentation time calculation processing step in which presentation time calculation means decodes encoded data of control information in units of GOPs, each GOP being a set of a plurality of access units, multiplexed in the bit stream selected in the stream selection processing step, thereby decoding presentation time information indicating the presentation time of the first access unit in presentation order and number information indicating the number of access units that precede, in presentation order, the first access unit in encoding order, and calculates the presentation time of the first access unit in encoding order from the presentation time indicated by the presentation time information and the number indicated by the number information,
wherein, in the stream selection processing step, when a command to switch the bit stream to be presented is given, the bit stream after switching is selected from among the bit streams output from the plurality of encoding devices and is output to the video decoding means, and the bit stream before switching is also output to the video decoding means until the presentation time calculated in the presentation time calculation processing step is reached, and
in the video decoding processing step, the video signal is decoded in units of access units from the encoded data of the video signal multiplexed in the bit stream after switching and, until the presentation time calculated in the presentation time calculation processing step is reached, the video signal is decoded in units of access units from the encoded data of the video signal multiplexed in the bit stream before switching.