JP2022084018A

JP2022084018A - Encoding apparatus

Info

Publication number: JP2022084018A
Application number: JP2021190776A
Authority: JP
Inventors: 秀一青木; Shuichi Aoki
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2020-11-25
Filing date: 2021-11-25
Publication date: 2022-06-06

Abstract

To provide an encoding apparatus implementing a configuration for appropriately processing a first packet including a bitstream and a second packet including control information.
SOLUTION: The encoding apparatus comprises a first processing unit that executes an encoding process for the first packet, which includes a bitstream of an audio signal and/or a video signal, and a second processing unit that processes the second packet, which includes control information regarding the encoding process. The second processing unit is provided independently of the first processing unit.
SELECTED DRAWING: Figure 2

Description

The present invention relates to a coding device.

Compression coding techniques for stream signals such as audio signals and video signals are known. The packets used in such techniques include a first packet, which contains the bitstream constituting the stream signal, and a second packet, which contains control information about the bitstream. For example, MMTP (MPEG Media Transport Protocol) packets can be used as these packets (see, for example, Non-Patent Documents 1 and 2).

ISO/IEC 23008-1
ISO/IEC 23008-3

As a result of diligent study, the inventors noted that the compression coding technique described above does not define how the first packet and the second packet are to be processed, and found it necessary to consider a configuration for appropriately processing the first packet and the second packet.

The present invention has therefore been made to solve the above problem, and its object is to provide a coding device that implements a configuration for appropriately processing a first packet including a bitstream and a second packet including control information.

A coding device according to one aspect of the disclosure includes a first processing unit that executes a coding process for a first packet including a bitstream of at least one of an audio signal and a video signal, and a second processing unit that processes a second packet including control information related to the coding process. The second processing unit is provided independently of the first processing unit.

According to the present invention, it is possible to provide a decoding device that implements a configuration for appropriately processing a first packet including a bitstream and a second packet including control information.

FIG. 1 is a diagram showing a transmission system 10 according to an embodiment.
FIG. 2 is a block diagram showing a transmission device 100 according to an embodiment.
FIG. 3 is a diagram showing a coding device according to an embodiment.
FIG. 4 is a block diagram showing a receiving device 200 according to an embodiment.
FIG. 5 is a diagram showing a decoding device according to an embodiment.
FIG. 6 is a diagram for explaining an MMTP packet according to an embodiment.
FIG. 7 is a diagram for explaining an MPT according to an embodiment.
FIG. 8 is a diagram for explaining an MPT according to an embodiment.
FIG. 9 is a diagram for explaining a header of an MMTP packet according to an embodiment.
FIG. 10 is a diagram for explaining a header of an MFU according to an embodiment.

Next, an embodiment of the present invention will be described. In the following description of the drawings, the same or similar parts are denoted by the same or similar reference numerals. Note, however, that the drawings are schematic and that the ratios between dimensions differ from the actual ones.

Specific dimensions should therefore be determined in light of the following description. The drawings also naturally include parts whose dimensional relationships and ratios differ from one drawing to another.

[Summary of disclosure]
A coding device according to the outline of the disclosure includes a first processing unit that executes a coding process for a first packet including a bitstream of at least one of an audio signal and a video signal, and a second processing unit that processes a second packet including control information related to the coding process. The second processing unit is provided independently of the first processing unit.

In the outline of the disclosure, the second processing unit, which processes the second packet including the control information, is provided independently of the first processing unit, which executes the coding process for the first packet including the bitstream. With this arrangement, a configuration for appropriately processing the first packet and the second packet can be implemented, for example by configuring the first processing unit with an FPGA (Field-Programmable Gate Array) and the second processing unit with a CPU (Central Processing Unit).

A decoding device according to the outline of the disclosure includes a first processing unit that executes a decoding process for a first packet including a bitstream of at least one of an audio signal and a video signal, and a second processing unit that processes a second packet including control information related to the decoding process. The second processing unit is provided independently of the first processing unit.

In the outline of the disclosure, the second processing unit, which processes the second packet including the control information, is provided independently of the first processing unit, which executes the decoding process for the first packet including the bitstream. With this arrangement, a configuration for appropriately processing the first packet and the second packet can be implemented, for example by configuring the first processing unit with an FPGA and the second processing unit with a CPU.

Here, "the first processing unit and the second processing unit are independent" means that the device constituting the first processing unit and the device constituting the second processing unit are physically separate. Each device may be one or more devices selected from an FPGA, a CPU, an ASIC (Application Specific Integrated Circuit), a CPLD (Complex Programmable Logic Device), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), and the like. These devices may simply be referred to as processors.
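As a minimal sketch of this definition, the check below treats two processing units as independent when they are hosted on different physical devices. The `ProcessingUnit` class, the `device_id` strings, and the function name are illustrative assumptions and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingUnit:
    name: str         # role of the unit, e.g. "first processing unit"
    device_id: str    # hypothetical identifier of the physical device
    device_kind: str  # e.g. "FPGA", "CPU", "ASIC"


def are_independent(first: ProcessingUnit, second: ProcessingUnit) -> bool:
    """Independent here means: hosted on physically separate devices."""
    return first.device_id != second.device_id


core = ProcessingUnit("first processing unit", "fpga-0", "FPGA")
system = ProcessingUnit("second processing unit", "cpu-0", "CPU")
shared = ProcessingUnit("second processing unit", "fpga-0", "FPGA")
```

Under this model, `core` and `system` are independent, while `core` and `shared` are not because they name the same device.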

[Embodiment]
(Transmission system)
The transmission system according to the embodiment is described below. FIG. 1 is a diagram showing a transmission system 10 according to the embodiment. As shown in FIG. 1, the transmission system 10 includes a transmission device 100 and a receiving device 200.

In the embodiment, the transmission system may be a digital wireless transmission system. The digital wireless transmission system may be a system used for 4K or 8K satellite broadcasting. As the packet format for transmission, MMTP packets conforming to MMTP (MPEG Media Transport Protocol) may be used (for example, ISO/IEC 23008-1). The MMTP packets may also have a format specified by ARIB STD-B60, ARIB TR-B39, or the like. MMTP packets may be used for video signals and for audio signals.

Transmission from the transmission device 100 to the receiving device 200 is not particularly limited. It may use satellite broadcasting, the Internet, or a mobile communication network.

(Overview of transmission device)
An overview of the transmission device according to the embodiment is given below. FIG. 2 is a block diagram showing the transmission device 100 according to the embodiment.

As shown in FIG. 2, the transmission device 100 includes an audio core encoder 101, an audio system encoder 103, a video core encoder 105, a video system encoder 107, a time server 109, a multiplexer 111, and a transmission unit 113.

The audio core encoder 101 is an example of a first processing unit and of a first audio processing unit that execute a coding process for a first packet (hereinafter, audio packet) including a bitstream of an audio signal. The audio core encoder 101 generates audio packets and outputs them to the audio system encoder 103. The audio core encoder 101 is composed of one or more processors.

The audio system encoder 103 is an example of a second processing unit and of a second audio processing unit that process a second packet (hereinafter, audio system packet) including control information related to the coding process. The audio system encoder 103 generates audio system packets and outputs the audio packets and the audio system packets to the multiplexer 111. The audio system encoder 103 is composed of one or more processors.

Here, the audio core encoder 101 and the audio system encoder 103 may constitute a coding device 130 for audio. The audio system encoder 103 is provided independently of the audio core encoder 101. For example, the audio core encoder 101 may be configured by an FPGA, and the audio system encoder 103 may be configured by a CPU. The coding device 130 may include the time server 109. The coding device 130 may include the multiplexer 111.

The video core encoder 105 is an example of a first processing unit and of a first video processing unit that execute a coding process for a first packet (hereinafter, video packet) including a bitstream of a video signal. The video core encoder 105 generates video packets and outputs them to the video system encoder 107. The video core encoder 105 is composed of one or more processors.

The video system encoder 107 is an example of a second processing unit and of a second video processing unit that process a second packet (hereinafter, video system packet) including control information related to the coding process. The video system encoder 107 generates video system packets and outputs the video packets and the video system packets to the multiplexer 111. The video system encoder 107 is composed of one or more processors.

Here, the video core encoder 105 and the video system encoder 107 may constitute a coding device 150 for video. The video system encoder 107 is provided independently of the video core encoder 105. For example, the video core encoder 105 may be configured by an FPGA, and the video system encoder 107 may be configured by a CPU. The coding device 150 may include the time server 109. The coding device 150 may include the multiplexer 111.

The coding device 130 and the coding device 150 may together constitute a single coding device.

The time server 109 is a server that manages time. The time server 109 may be an NTP (Network Time Protocol) server or a PTP (Precision Time Protocol) server. The time managed by the time server 109 may be synchronized with the time managed by the time server 213 of the receiving device 200, described later. UTC (Coordinated Universal Time) may be used as the time managed by the time server 109. The time managed by the time server 109 may be referred to as absolute time.

Note that in the embodiment, because the audio core encoder 101 and the audio system encoder 103 are independent, the time server 109 is connected to the audio system encoder 103 and not to the audio core encoder 101. Similarly, because the video core encoder 105 and the video system encoder 107 are independent, the time server 109 is connected to the video system encoder 107 and not to the video core encoder 105.

The multiplexer 111 multiplexes the audio packets, video packets, audio system packets, and video system packets. In doing so, the multiplexer 111 may merge an audio system packet and a video system packet into a single audio/video system packet. The audio system packet and video system packet that were input to the multiplexer 111 are then discarded. The multiplexer 111 outputs a transmission stream including the audio packets, the video packets, and the audio/video system packets to the transmission unit 113.

The transmission unit 113 transmits the transmission stream to the receiving device 200. The transmission method of the transmission stream is not particularly limited. The transmission unit 113 executes the processing required in the layers at or below the MMT/IP layer. For example, the transmission unit 113 may execute error correction coding, OFDM modulation, and the like.

(Details of coding device)
The coding device is described in detail below. FIG. 3 is a diagram for explaining the details of the coding device 130 and the coding device 150.

As shown in the upper part of FIG. 3, the audio core encoder 101 generates audio packets by compression-coding the audio signal. The audio core encoder 101 outputs the audio packets to the audio system encoder 103. The compression coding method may conform to ISO/IEC 23008-3 (MPEG-H Audio). The audio core encoder 101 stores MHAS (MPEG-H Audio Stream) packets in MMTP packets, and stores the MMTP packets in IP (Internet Protocol) packets. The MMTP packets may conform to ISO/IEC 23008-1. The IP packets may be transmitted using UDP (User Datagram Protocol) or TCP (Transmission Control Protocol). In the embodiment, an audio packet may mean an MMTP packet.

For example, the audio core encoder 101 generates MHAS packets, each containing one of the three types of information (setting information, frame data, and metadata) included in the bitstream obtained by compression-coding the audio signal. The setting information describes the compression coding settings. The frame data is the compression-coded audio signal. The metadata defines the attributes of the audio signal.

The audio signal input to the audio core encoder 101 may be a stereo audio signal, a 5.1-channel audio signal, or a 22.2-channel audio signal.

The audio system encoder 103 generates audio system packets containing control information. The audio system encoder 103 outputs the audio packets and the audio system packets to the multiplexer 111. The control information may be referred to as an MPT (MMT Package Table). The audio system encoder 103 stores the MPT in MMTP packets, and stores the MMTP packets in IP packets. The MMTP packets may conform to ISO/IEC 23008-1. The IP packets may be transmitted using UDP or TCP. In the embodiment, an audio system packet may mean an MMTP packet.

Note that because the audio core encoder 101 described above is specialized for compression coding of the audio signal, the audio packets do not contain control information (MPT).

The control information may include an information element indicating the time at which the bitstream obtained by decoding the audio packet is to be output. This output time is based on the time acquired from the time server 109. For example, the output time may be the acquired time itself, the acquired time plus a fixed interval, or the acquired time minus a fixed interval. Which of these is used is agreed in advance between the transmitting side and the receiving side. The information element indicating the time may be referred to as an MPU (Media Processing Unit) timestamp descriptor.
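The three options for the output time can be sketched as a single offset applied to the acquired time. The snippet below also encodes the result as a 64-bit NTP-style value (32-bit seconds since 1900 plus a 32-bit fraction); that encoding is an assumption made for illustration, since the text does not say how the time is represented. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2_208_988_800
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def output_time(acquired: datetime, offset: timedelta = timedelta(0)) -> datetime:
    """Acquired time itself (zero offset), or shifted by a fixed positive or negative interval."""
    return acquired + offset


def to_ntp64(t: datetime) -> int:
    """Encode a timezone-aware time as 32-bit seconds and a 32-bit binary fraction."""
    delta = t - UNIX_EPOCH
    seconds = delta.days * 86_400 + delta.seconds + NTP_UNIX_OFFSET
    fraction = (delta.microseconds << 32) // 1_000_000
    return ((seconds & 0xFFFFFFFF) << 32) | fraction


acquired = datetime(2021, 11, 25, 0, 0, 0, tzinfo=timezone.utc)
delayed = output_time(acquired, timedelta(milliseconds=500))
advanced = output_time(acquired, -timedelta(seconds=1))
```

Whatever offset is chosen, the sender and the receiver must apply the same convention, which is why the text requires it to be agreed in advance.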

The control information may include an information element indicating a sequence number corresponding to the audio packet. The sequence number may be a per-MPU sequence number. An MPU may be the unit of data processing in MMT.

The control information may include an information element indicating that the MMTP packet relates to audio, and may further include an information element indicating its compression coding method. Such an information element is either set in the audio system encoder 103 in advance or identified by analyzing the MHAS packets stored in the MMTP packets input to the audio system encoder 103. Such an information element may be the one stored in asset_type of the MPT (for example, mhm1).

Here, the audio system encoder 103 may remove at least part of the extension header of an audio packet (MMTP packet). For example, when the extension header contains an information element specifying the number of audio signal samples in one audio frame, the audio system encoder 103 may remove that information element. The audio system encoder 103 may output the audio packet (MMTP packet) with at least part of the extension header removed. Removing at least part of the extension header makes it possible, for example, to maintain compatibility with the MMTP packets specified in ARIB TR-B39. An audio frame is the minimum unit of the audio signal coding process and may contain, for example, 1024 audio samples.
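The removal step can be sketched on a simplified in-memory packet model. This is not the MMTP wire format: the `MmtpPacket` class, the dictionary used for the extension header, and the field name `samples_per_audio_frame` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterable


@dataclass
class MmtpPacket:
    packet_id: int
    payload: bytes
    # Simplified stand-in for the extension header: element name -> value.
    header_extension: Dict[str, int] = field(default_factory=dict)


def strip_extension(packet: MmtpPacket, names: Iterable[str]) -> MmtpPacket:
    """Return a copy of the packet without the named extension header elements."""
    drop = set(names)
    kept = {k: v for k, v in packet.header_extension.items() if k not in drop}
    return replace(packet, header_extension=kept)


incoming = MmtpPacket(
    packet_id=0x0100,
    payload=b"\x00\x01",
    header_extension={"samples_per_audio_frame": 1024, "other_element": 7},
)
outgoing = strip_extension(incoming, ["samples_per_audio_frame"])
```

The function returns a new packet and leaves the input untouched, so the payload and the remaining header elements pass through unchanged.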

As shown in the middle part of FIG. 3, the video core encoder 105 generates video packets by compression-coding the video signal. The video core encoder 105 outputs the video packets to the video system encoder 107. The compression coding method may conform to ISO/IEC 23008-2 (HEVC; High Efficiency Video Coding). The compression coding method may conform to ISO/IEC 23090-3 (VVC; Versatile Video Coding). The video core encoder 105 stores NAL (Network Abstraction Layer) units containing the VVC code in MMTP packets, and stores the MMTP packets in IP packets. The MMTP packets may conform to ISO/IEC 23008-1. The IP packets may be transmitted using UDP. In the embodiment, a video packet may mean an MMTP packet.

For example, the video core encoder 105 generates NAL units, each containing one of the three types of information (setting information, frame data, and metadata) included in the bitstream obtained by compression-coding the video signal. The setting information describes the compression coding settings. The frame data is the compression-coded video signal. The metadata defines the attributes of the video signal.

The video signal input to the video core encoder 105 may be a video signal of any of various resolutions (2K, 4K, 8K, and so on), or an omnidirectional video signal using a projection such as ERP (Equirectangular Projection).

The video system encoder 107 generates video system packets containing control information. The video system encoder 107 outputs the video packets and the video system packets to the multiplexer 111. The control information may be referred to as an MPT. The video system encoder 107 stores the MPT in MMTP packets, and stores the MMTP packets in IP packets. The MMTP packets may conform to ISO/IEC 23008-1. The IP packets may be transmitted using UDP. In the embodiment, a video system packet may mean an MMTP packet.

Note that because the video core encoder 105 described above is specialized for compression coding of the video signal, the video packets do not contain control information (MPT).

The control information may include an information element indicating the time at which the bitstream obtained by decoding the video packet is to be output. This output time is based on the time acquired from the time server 109. For example, the output time may be the acquired time itself, the acquired time plus a fixed interval, or the acquired time minus a fixed interval. Which of these is used is agreed in advance between the transmitting side and the receiving side. The information element indicating the time may be referred to as an MPU timestamp descriptor.

Note that because the output time of each bitstream is based on the time acquired from the time server 109, the video signal and the audio signal can be properly synchronized.

The control information may include an information element indicating a sequence number corresponding to the video packet. The sequence number may be a per-MPU sequence number. An MPU may be the unit of data processing in MMT.

The control information may include an information element indicating that the MMTP packet relates to video, and may further include an information element indicating its compression coding method. Such an information element is either set in the video system encoder 107 in advance or identified by analyzing the NAL units in the MMTP packets input to the video system encoder 107. Such an information element may be the one stored in asset_type of the MPT.

As shown in the lower part of FIG. 3, the multiplexer 111 outputs the audio packets, the video packets, and the audio/video system packets to the transmission unit 113. The transmission stream in which the audio packets, video packets, and audio/video system packets are multiplexed may be referred to as an MMTP flow or as a package.

Here, based on the information elements in the control information that indicate whether an MMTP packet relates to audio or to video, the multiplexer 111 newly generates an information element indicating that the MMTP packets relate to both audio and video, and generates new control information (MPT) containing the newly generated information element. The multiplexer 111 stores the information elements indicating the time (for example, the MPU timestamp descriptors) output from the audio system encoder 103 and the video system encoder 107 in the new control information (MPT).
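One way to picture this step is as a merge of two single-asset tables into one table that lists both assets and carries over their timestamp entries. The sketch below uses a simplified in-memory model rather than the MPT syntax of ISO/IEC 23008-1. The class names, the packet IDs, and the video `asset_type` value `"hev1"` are assumptions for illustration; only `mhm1` is named in the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Asset:
    asset_type: str  # "mhm1" for audio per the text; the video value is assumed
    packet_id: int   # hypothetical identifier of the packets carrying this asset
    # (MPU sequence number, output time) pairs from the MPU timestamp descriptor.
    mpu_timestamps: List[Tuple[int, int]] = field(default_factory=list)


@dataclass
class Mpt:
    assets: List[Asset]


def merge_mpts(audio_mpt: Mpt, video_mpt: Mpt) -> Mpt:
    """Build a new MPT that describes both the audio and the video assets."""
    return Mpt(assets=[*audio_mpt.assets, *video_mpt.assets])


audio_mpt = Mpt([Asset("mhm1", packet_id=0x0100, mpu_timestamps=[(7, 1000)])])
video_mpt = Mpt([Asset("hev1", packet_id=0x0200, mpu_timestamps=[(7, 1000)])])
merged = merge_mpts(audio_mpt, video_mpt)
```

The merged table keeps each asset's timestamp entries, matching the statement that the timestamp descriptors from both system encoders are stored in the new MPT.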

(Overview of receiving device)
An overview of the receiving device according to the embodiment is given below. FIG. 4 is a block diagram showing the receiving device 200 according to the embodiment.

As shown in FIG. 4, the receiving device 200 includes a receiving unit 201, a demultiplexer 203, an audio system decoder 205, an audio core decoder 207, a video system decoder 209, a video core decoder 211, and a time server 213.

The receiving unit 201 receives the transmission stream including the audio packets, the video packets, and the audio/video system packets. The transmission method of the transmission stream is not particularly limited. The receiving unit 201 executes the processing required in the layers at or below the MMT/IP layer. For example, the receiving unit 201 may execute OFDM demodulation, error correction decoding, and the like.

The demultiplexer 203 separates the audio packets and the audio/video system packets from the transmission stream. The demultiplexer 203 also separates the video packets and the audio/video system packets from the transmission stream. The demultiplexer 203 outputs the audio packets and the audio/video system packets to the audio system decoder 205, and outputs the video packets and the audio/video system packets to the video system decoder 209.

The audio system decoder 205 is an example of a second processing unit, and of a second audio processing unit, that processes a second packet (audio/video system packet) containing control information related to the encoding process. The audio system decoder 205 acquires the audio-related control information contained in the audio/video system packets and outputs the audio packets to the audio core decoder 207. The audio system decoder 205 is composed of one or more processors. The audio system decoder 205 may ignore the video-related control information contained in the audio/video system packets.

The audio core decoder 207 is an example of a first processing unit, and of a first audio processing unit, that executes decoding processing for a first packet (audio packet). The audio core decoder 207 is composed of one or more processors.

Here, the audio system decoder 205 and the audio core decoder 207 may constitute a decoding device 230 for audio. The audio system decoder 205 is provided independently of the audio core decoder 207. For example, the audio system decoder 205 may be implemented by a CPU, and the audio core decoder 207 may be implemented by an FPGA. The decoding device 230 may include the time server 213. The decoding device 230 may include the demultiplexer 203.

The video system decoder 209 is an example of a second processing unit, and of a second video processing unit, that processes a second packet (audio/video system packet) containing control information related to the encoding process. The video system decoder 209 acquires the video-related control information contained in the audio/video system packets and outputs the video packets to the video core decoder 211. The video system decoder 209 is composed of one or more processors. The video system decoder 209 may ignore the audio-related control information contained in the audio/video system packets.

The video core decoder 211 is an example of a first processing unit, and of a first video processing unit, that executes decoding processing for a first packet (video packet). The video core decoder 211 is composed of one or more processors.

Here, the video system decoder 209 and the video core decoder 211 may constitute a decoding device 250 for video. The video system decoder 209 is provided independently of the video core decoder 211. For example, the video system decoder 209 may be implemented by a CPU, and the video core decoder 211 may be implemented by an FPGA. The decoding device 250 may include the time server 213. The decoding device 250 may include the demultiplexer 203.

The decoding device 230 and the decoding device 250 may together constitute a single decoding device.

The time server 213 is a server that manages time. Like the time server 109, the time server 213 may be an NTP server or a PTP server. The time managed by the time server 213 may be synchronized with the time managed by the time server 109 of the transmitting device 100 described above. UTC may be used as the time managed by the time server 213. The time managed by the time server 213 may be referred to as absolute time.

It should be noted that, in the embodiment, because the audio system decoder 205 and the audio core decoder 207 are independent of each other, the time server 213 is connected to the audio system decoder 205 and not to the audio core decoder 207. Similarly, because the video system decoder 209 and the video core decoder 211 are independent of each other, the time server 213 is connected to the video system decoder 209 and not to the video core decoder 211.

(Details of the decoding device)
The details of the decoding device are described below. FIG. 5 is a diagram for explaining the details of the decoding device 230 and the decoding device 250.

As shown in the upper part of FIG. 5, the demultiplexer 203 acquires, from the receiving unit 201, the transmission stream containing the audio packets, the video packets, and the audio/video system packets. The demultiplexer 203 outputs the audio packets and the audio/video system packets to the audio system decoder 205. The demultiplexer 203 outputs the video packets and the audio/video system packets to the video system decoder 209.
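The routing performed by the demultiplexer 203 can be sketched as follows. This is a minimal illustration, assuming that each packet has already been classified as audio, video, or audio/video system; the `demultiplex` function, the packet-kind labels, and the output keys are hypothetical names introduced only for this example.

```python
AUDIO, VIDEO, AV_SYSTEM = "audio", "video", "av_system"


def demultiplex(stream):
    """Route (kind, payload) pairs to the two system decoders.

    Audio/video system packets are delivered to both decoders, because each
    decoder extracts only the control information relevant to it.
    """
    outputs = {"audio_system_decoder": [], "video_system_decoder": []}
    for kind, payload in stream:
        if kind == AUDIO:
            outputs["audio_system_decoder"].append((kind, payload))
        elif kind == VIDEO:
            outputs["video_system_decoder"].append((kind, payload))
        elif kind == AV_SYSTEM:
            outputs["audio_system_decoder"].append((kind, payload))
            outputs["video_system_decoder"].append((kind, payload))
        else:
            raise ValueError(f"unknown packet kind: {kind}")
    return outputs


routed = demultiplex([(AV_SYSTEM, b"mpt"), (AUDIO, b"a0"), (VIDEO, b"v0")])
```

Delivering the system packets to both sides keeps the two system decoders independent of each other, at the cost of each decoder ignoring the control information it does not need.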

As shown in the middle part of FIG. 5, the audio system decoder 205 processes the audio/video system packets containing control information related to audio. The audio system decoder 205 outputs the audio packets to the audio core decoder 207. As described above, the control information may be referred to as the MPT. The audio system decoder 205 acquires the MMTP packets stored in IP packets and acquires the MPT stored in the MMTP packets.

Here, the audio system decoder 205 may add at least part of the extension header of an audio packet (MMTP packet). For example, when the information element specifying the number of audio-signal samples contained in one audio frame has been deleted from the extension header by the transmitting device 100, the audio system decoder 205 may add that information element back to the extension header. The audio system decoder 205 may output the audio packet (MMTP packet) to which at least part of the extension header has been added. For example, when one audio frame contains 1024 audio samples, the audio system decoder 205 adds (stores) the value 1024 in the extension header.

The number of audio samples contained in one audio frame may be determined in advance for each broadcasting service. That is, the audio system decoder 205 can identify the number of audio samples from the broadcasting service and may add (store) the number of audio samples corresponding to that service. Alternatively, the audio system decoder 205 may count the audio samples in the MHAS packets corresponding to the audio samples contained in an audio frame, and add (store) the counted number.
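The two ways of restoring the sample count described above can be sketched as follows. The sketch is illustrative only: the dictionary representation of the extension header, the key name `samples_per_frame`, the service table, and the function name are assumptions and do not reflect the actual extension header syntax.

```python
# Hypothetical table of per-service values known in advance.
SAMPLES_PER_FRAME_BY_SERVICE = {"service-a": 1024}


def restore_samples_per_frame(extension_header, service_id=None, frame_samples=None):
    """Return a copy of the extension header with the sample count restored.

    The count is taken from the per-service table when the service is known;
    otherwise it is obtained by counting the samples of one audio frame.
    """
    restored = dict(extension_header)
    if "samples_per_frame" in restored:
        return restored  # nothing was deleted on the transmitting side
    if service_id in SAMPLES_PER_FRAME_BY_SERVICE:
        restored["samples_per_frame"] = SAMPLES_PER_FRAME_BY_SERVICE[service_id]
    elif frame_samples is not None:
        restored["samples_per_frame"] = len(frame_samples)
    else:
        raise ValueError("sample count cannot be determined")
    return restored


by_service = restore_samples_per_frame({}, service_id="service-a")
by_counting = restore_samples_per_frame({}, frame_samples=[0] * 1024)
```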

Further, the audio system decoder 205 may adjust the timing at which it outputs the audio packets to the audio core decoder 207 based on the "information element indicating a time" (for example, the MPU timestamp descriptor) contained in the audio/video system packets. For example, the audio system decoder 205 may adjust this timing by comparing the time acquired from the time server 213 with the time specified by the MPU timestamp descriptor. Specifically, where T is the delay from the input of an MHAS packet to the audio core decoder 207 until the audio is output after that MHAS packet has been decoded, the audio system decoder 205 outputs the MHAS packet to the audio core decoder 207 at a time that precedes the time specified by the MPU timestamp descriptor by T. Alternatively, the audio system decoder 205 may adjust the timing by comparing the time (T1) specified by the MPU timestamp descriptor corresponding to the immediately preceding MPU with the time (T2) specified by the MPU timestamp descriptor corresponding to the current MPU. Specifically, after outputting the first MHAS packet corresponding to the immediately preceding MPU to the audio core decoder 207, the audio system decoder 205 outputs the first MHAS packet corresponding to the subsequent MPU to the audio core decoder 207 once the time T2-T1 has elapsed.
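The first timing method, in which the absolute time is compared with the MPU timestamp, can be sketched as follows. The function names are hypothetical, times are expressed in seconds on a common clock, and the rule of releasing a late packet immediately is an assumption of this sketch rather than something stated in the embodiment.

```python
def release_time(presentation_time: float, decoder_delay: float) -> float:
    """Time at which the MHAS packet should be handed to the core decoder.

    presentation_time: time specified by the MPU timestamp descriptor
    decoder_delay:     delay T from input to the core decoder until audio output
    """
    return presentation_time - decoder_delay


def wait_before_release(now: float, presentation_time: float, decoder_delay: float) -> float:
    """Seconds to wait from `now`; 0.0 means the packet is released immediately."""
    return max(0.0, release_time(presentation_time, decoder_delay) - now)


# Current time 10.0 s, presentation at 10.5 s, decoder delay T = 0.2 s.
wait = wait_before_release(10.0, 10.5, 0.2)
```

With this arrangement only the system decoder needs access to the absolute time, which is consistent with the time server 213 being connected to the system decoder and not to the core decoder.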

The audio core decoder 207 acquires the audio signal by executing decoding processing on the audio packets. The decoding scheme may conform to ISO/IEC 23008-3 (MPEG-H Audio). The audio core decoder 207 acquires the MMTP packets stored in IP packets and acquires the MHAS packets stored in the MMTP packets.

As shown in the lower part of FIG. 5, the video system decoder 209 processes the audio/video system packets containing control information related to video. The video system decoder 209 outputs the video packets to the video core decoder 211. As described above, the control information may be referred to as the MPT. The video system decoder 209 acquires the MMTP packets stored in IP packets and acquires the MPT stored in the MMTP packets.

Further, the video system decoder 209 may adjust the timing at which it outputs the video packets to the video core decoder 211 based on the "information element indicating a time" (for example, the MPU timestamp descriptor) contained in the audio/video system packets. For example, the video system decoder 209 may adjust this timing by comparing the time acquired from the time server 213 with the time specified by the MPU timestamp descriptor. Specifically, where T' is the delay from the input of the first NAL unit of an MPU to the video core decoder 211 until a video frame belonging to that MPU is output, the video system decoder 209 outputs the NAL unit to the video core decoder 211 at a time that precedes the time specified by the MPU timestamp descriptor by T'. Alternatively, the video system decoder 209 may adjust the timing by comparing the time (T1') specified by the MPU timestamp descriptor corresponding to the immediately preceding MPU with the time (T2') specified by the MPU timestamp descriptor corresponding to the current MPU. Specifically, after outputting the first NAL unit corresponding to the immediately preceding MPU to the video core decoder 211, the video system decoder 209 outputs the first NAL unit corresponding to the subsequent MPU to the video core decoder 211 once the time T2'-T1' has elapsed.
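The second timing method, which relies only on the difference between consecutive MPU timestamps, can be sketched as follows. The function name and the use of a local clock value for the first release are assumptions made for illustration.

```python
def relative_release_times(first_release: float, mpu_timestamps: list[float]) -> list[float]:
    """Release time of the first NAL unit of each MPU.

    The first MPU is released at `first_release` (a local clock value). Each
    later MPU is released T2' - T1' after the previous one, where T1' and T2'
    are the timestamps of the previous and the current MPU.
    """
    releases = [first_release]
    for previous, current in zip(mpu_timestamps, mpu_timestamps[1:]):
        releases.append(releases[-1] + (current - previous))
    return releases


# Three MPUs spaced 0.5 s apart, the first one released at local time 5.0 s.
schedule = relative_release_times(5.0, [100.0, 100.5, 101.0])
```

Because only timestamp differences are used, this variant does not require the local clock to be synchronized with the clock of the transmitting side.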

The video core decoder 211 acquires the video signal by executing decoding processing on the video packets. The decoding scheme may conform to ISO/IEC 23008-2 (HEVC; High Efficiency Video Coding). The decoding scheme may conform to ISO/IEC 23090-3 (VVC; Versatile Video Coding). The video core decoder 211 acquires the MMTP packets stored in IP packets and acquires the NAL units stored in the MMTP packets.

(MMTP packet)
The MMTP packet is described below. As described above, an MMTP packet may be an example of an audio packet or an example of a video packet. Here, an audio packet is taken as the example.

As shown in FIG. 6, MMT defines units such as the audio frame, the MPU (Media Processing Unit), and the MFU (Media Fragment Unit).

An audio frame is the minimum unit of encoding processing for an audio signal. An audio frame may typically contain 1024 audio samples.

The MPU is the unit of data processing in MMT. An MPU may contain audio frames amounting to a predetermined duration (for example, about 500 ms). The predetermined duration may be comparable to the duration of the processing unit for the video signal (Movie Fragment).

The MFU is the unit of data carried by MMTP packets. An MFU may be generated by dividing an MPU. An MFU may also be generated directly from an AU (Access Unit) or a NAL unit, omitting generation of the MPU. The MHAS packet described above may be handled as an MFU.

An MMTP packet contains one or more MFUs. When an MFU is large, two or more MMTP packets may be generated from one MFU. When MFUs are small, one MMTP packet may be generated from two or more MFUs.
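The relationship between MFUs and MMTP packets described above (fragmenting a large MFU, aggregating small MFUs) can be sketched as follows. This is a simplified illustration that ignores header overhead; the `packetize` function and the `max_payload` parameter are hypothetical.

```python
def packetize(mfus: list[bytes], max_payload: int) -> list[list[bytes]]:
    """Group MFUs into packet payloads of at most `max_payload` bytes.

    Each returned packet is a list of payload pieces: one piece for a
    fragment of a large MFU, or several pieces for aggregated small MFUs.
    """
    packets: list[list[bytes]] = []
    pending: list[bytes] = []
    pending_size = 0
    for mfu in mfus:
        if len(mfu) > max_payload:
            # Large MFU: flush any aggregate, then split into fragments.
            if pending:
                packets.append(pending)
                pending, pending_size = [], 0
            for offset in range(0, len(mfu), max_payload):
                packets.append([mfu[offset:offset + max_payload]])
            continue
        if pending_size + len(mfu) > max_payload:
            packets.append(pending)
            pending, pending_size = [], 0
        pending.append(mfu)
        pending_size += len(mfu)
    if pending:
        packets.append(pending)
    return packets


packets = packetize([b"a" * 20, b"b" * 3, b"c" * 3], max_payload=8)
```

In the example, the 20-byte MFU is split across three packets, while the two 3-byte MFUs share a single packet.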

(MPT)
The MPT is described below. As described above, the MPT is an example of the control information. Here, an MPT for an audio signal is taken as the example.

As shown in FIG. 7, the MPT may include table_id, version, MPT_mode, MMT_package_id_length, and MMT_package_id.

table_id is a field that stores an identifier representing the structure of the MPT. version is a field that stores the version of the MPT. MPT_mode is a field that stores the operation applied when the MPT is divided into subsets. MMT_package_id_length is a field that stores the length of MMT_package_id. MMT_package_id is a field that stores a value identifying the package.

Further, the MPT includes fields that store asset information (hereinafter, the asset information loop). As shown in FIG. 8, the asset information loop includes identifier_type, asset_id_scheme, asset_id_length, asset_id, asset_type, asset_clock_relation_flag, and location_count.

identifier_type is a field that stores the ID scheme of the MMTP packet flow. asset_id_scheme is a field that stores the format of the asset ID. asset_id_length is a field that stores the length of asset_id. asset_id is a field that stores the asset ID. asset_type is a field that stores the type of the asset. asset_clock_relation_flag is a field that stores whether the clock information field of the asset is present. location_count is a field that stores the number of locations of the asset.
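The MPT fields and the asset information loop listed above can be modeled as a simple data structure. The sketch below is illustrative only: the field names follow the description, but the Python types, the class names, and the derivation of the length fields from the ID values are assumptions and do not represent the bit-level syntax of the table.

```python
from dataclasses import dataclass, field


@dataclass
class AssetInfo:
    """One entry of the asset information loop (types are assumed)."""
    identifier_type: int
    asset_id_scheme: int
    asset_id: bytes
    asset_type: str
    asset_clock_relation_flag: bool
    location_count: int

    @property
    def asset_id_length(self) -> int:
        # Derived here from asset_id instead of being stored separately.
        return len(self.asset_id)


@dataclass
class MptTable:
    """Top-level MPT fields plus the asset information loop."""
    table_id: int
    version: int
    mpt_mode: int
    mmt_package_id: bytes
    assets: list[AssetInfo] = field(default_factory=list)

    @property
    def mmt_package_id_length(self) -> int:
        return len(self.mmt_package_id)


# All values below are arbitrary sample values.
mpt = MptTable(table_id=1, version=0, mpt_mode=0, mmt_package_id=b"pkg-1")
mpt.assets.append(AssetInfo(0, 0, b"audio-1", "mhm1", False, 1))
```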

In addition to the fields shown in FIG. 8, the asset information loop includes an area (asset_descriptors_byte) for storing asset descriptors. This area can store the MPU timestamp descriptor described above. The MPU timestamp descriptor is updated whenever the MPU sequence number changes.

Further, the asset information loop may include an MPU presentation area designation descriptor that provides the position at which the MPU is presented. The per-MPU sequence number described above may be contained in the MPU presentation area designation descriptor.

(Header of the MMTP packet)
The header of the MMTP packet is described below. As shown in FIG. 10, the header of the MMTP packet includes Version, packet_counter_flag, FEC_Type, extension_flag, RAP_flag, packet_id, timestamp, packet_sequence_number, and extension header.

Version is a field that stores the version number of MMTP. packet_counter_flag is a field that stores a flag indicating whether a packet counter is present. FEC_Type is a field that stores information on error correction of the MMTP packet. extension_flag is a field that stores a flag indicating whether the header of the MMTP packet is extended. RAP_flag is a field that stores a flag indicating whether the MMT payload contains the start of a random access point. packet_id is a field that stores an identifier identifying the payload data of the MMTP packet. timestamp is a field that stores the time at which the first byte of the MMTP packet is sent from the sending entity. For example, the time is expressed as an NTP timestamp. packet_sequence_number is a field that stores the order of MMTP packets having the same packet identifier. extension header is a field that stores the type of header extension of the MMTP packet.
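Since the time is expressed as an NTP timestamp, a receiver typically needs to convert it to a calendar time before comparing it with the local clock. The sketch below assumes the 64-bit NTP timestamp format (32-bit seconds since 1 January 1900 and a 32-bit fraction) and NTP era 0; the width actually used by a given field is not stated in this description, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# NTP timestamps count seconds from 1900-01-01 00:00:00 UTC (era 0 assumed).
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp64_to_datetime(ntp: int) -> datetime:
    """Convert a 64-bit NTP timestamp to a timezone-aware UTC datetime."""
    seconds = ntp >> 32            # upper 32 bits: whole seconds
    fraction = ntp & 0xFFFFFFFF    # lower 32 bits: fraction of a second
    microseconds = round(fraction * 1_000_000 / 2**32)
    return NTP_EPOCH + timedelta(seconds=seconds, microseconds=microseconds)


# 2208988800 seconds separate the NTP epoch (1900) from the Unix epoch (1970).
unix_epoch = ntp64_to_datetime(2208988800 << 32)
```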

(Header of the MFU)
The header of the MFU is described below. As shown in FIG. 10, the header of the MFU includes MPU Fragment Type, Timed Flag, Fragmentation Indicator, aggregation_flag, fragment_counter, MPU_sequence_number, movie_fragment_sequence_number, offset, priority, and dep_counter.

MPU Fragment Type is a field that stores the type of fragment of the information stored in the MMTP payload. Timed Flag is a field that stores a flag indicating whether the data stored in the MMTP payload specifies a presentation time. Fragmentation Indicator is a field that stores the fragmentation state of the data stored in the MMTP payload. aggregation_flag is a field that stores a flag indicating whether two or more pieces of data are stored in the MMTP payload. fragment_counter is a field that stores the number of data fragments that follow the data stored in the payload of the MMTP packet. MPU_sequence_number is a field that, when MPU metadata, movie fragment metadata, or an MFU is stored in the MMTP payload, stores the sequence number of the MPU to which that data belongs. movie_fragment_sequence_number is a field that stores the sequence number of the movie fragment to which the MFU belongs. offset is a field that stores the offset of the MFU within the MPU to which it belongs. priority is a field that stores the relative importance of the MFU within the MPU to which it belongs. dep_counter is a field that stores the number of MFUs that cannot be decoded unless this MFU is decoded.
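The use of Fragmentation Indicator and fragment_counter on the receiving side can be sketched as follows. The indicator values used here (0 for complete data, 1 for the first fragment, 2 for a middle fragment, 3 for the last fragment) and the policy of discarding a partially received unit when a fragment is missing are assumptions of this sketch; the function name is hypothetical.

```python
COMPLETE, FIRST, MIDDLE, LAST = 0b00, 0b01, 0b10, 0b11  # assumed indicator values


def reassemble(fragments):
    """Rebuild data units from (indicator, fragment_counter, payload) tuples.

    fragment_counter is the number of fragments that still follow, so it is
    expected to decrease by one with every fragment of the same data unit.
    """
    units = []
    buffer = None
    expected_remaining = None
    for indicator, counter, payload in fragments:
        if indicator == COMPLETE:
            units.append(payload)
            buffer = None
        elif indicator == FIRST:
            buffer = bytearray(payload)
            expected_remaining = counter
        elif buffer is not None and counter == expected_remaining - 1:
            buffer += payload
            expected_remaining = counter
            if indicator == LAST:
                units.append(bytes(buffer))
                buffer = None
        else:
            buffer = None  # a fragment was lost: drop the partial data unit
    return units


units = reassemble([(FIRST, 2, b"ab"), (MIDDLE, 1, b"cd"), (LAST, 0, b"ef"), (COMPLETE, 0, b"x")])
```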

(Operation and effects)
In the embodiment, in the encoding device 130 or the encoding device 150, the second processing unit (audio system encoder 103 or video system encoder 107), which processes the second packet (audio/video system packet) containing the control information, is provided independently of the first processing unit (audio core encoder 101 or video core encoder 105), which executes encoding processing for the first packet containing the bitstream. With this configuration, a structure for appropriately processing the first packet and the second packet can be implemented, for example by implementing the first processing unit with an FPGA and the second processing unit with a CPU.

In the embodiment, in the decoding device 230 or the decoding device 250, the second processing unit (audio system decoder 205 or video system decoder 209), which processes the second packet (audio/video system packet) containing the control information, is provided independently of the first processing unit (audio core decoder 207 or video core decoder 211), which executes decoding processing for the first packet containing the bitstream. With this configuration, a structure for appropriately processing the first packet and the second packet can be implemented, for example by implementing the first processing unit with an FPGA and the second processing unit with a CPU.

In the embodiment, the control information includes an information element indicating the time at which the decoded bitstream of the audio signal or the video signal is output. With this configuration, on the premise that the second processing unit is provided independently of the first processing unit, the second processing unit (audio system decoder 205 or video system decoder 209) can adjust the timing at which it outputs the audio packets or the video packets. Accordingly, the first processing unit (audio core decoder 207 or video core decoder 211) can decode the audio packets or the video packets immediately without regard to synchronization, and the configuration of the decoding device 230 or the decoding device 250 can be simplified as a whole.

[Other embodiments]
Although the present invention has been described by way of the above disclosure, the statements and drawings forming part of this disclosure should not be understood as limiting the invention. Various alternative embodiments, examples, and operational techniques will be apparent to those skilled in the art from this disclosure.

Although not specifically mentioned in the above disclosure, terms relating to MMT may be interpreted on the basis of what is specified in ISO/IEC 23008-1, ARIB STD-B60, ARIB TR-B39, and the like.

In the above disclosure, the MPU timestamp descriptor was given as an example of the information element indicating the time at which the decoded bitstream of the audio signal or the video signal is output. However, the above disclosure is not limited to this. The information element indicating the time at which the bitstream is output may be the MPU extended timestamp descriptor.

In the above disclosure, the case where the audio system packet and the video system packet are merged into one audio/video system packet was given as an example. However, the above disclosure is not limited to this. The audio system packet and the video system packet may be transmitted in the transmission stream without being merged into one audio/video system packet. In such a case, the multiplexer 111 may multiplex the audio packets, the video packets, the audio system packets, and the video system packets. Similarly, the demultiplexer 203 may separate the audio packets, the video packets, the audio system packets, and the video system packets.

In the above disclosure, the case where the time at which the decoded bitstream of the audio signal or the video signal is output is acquired from the time server 109 and the time server 213 was given as an example. However, the embodiment is not limited to this. The time at which the decoded bitstream of the audio signal or the video signal is output may be acquired by transmitting NTP packets over broadcast waves, as specified in ARIB STD-B60, ARIB TR-B39, and the like.

Although not specifically mentioned in the above disclosure, a program that causes a computer to execute each process performed by the transmitting device 100 and the receiving device 200 may be provided. The program may be recorded on a computer-readable medium. Using the computer-readable medium, the program can be installed on a computer. Here, the computer-readable medium on which the program is recorded may be a non-transitory recording medium. The non-transitory recording medium is not particularly limited, and may be, for example, a recording medium such as a CD-ROM or a DVD-ROM.

Alternatively, a chip may be provided that is composed of a memory storing a program for executing each process performed by the transmitting device 100 and the receiving device 200, and a processor that executes the program stored in the memory.

10 ... transmission system, 100 ... transmitting device, 101 ... audio core encoder, 103 ... audio system encoder, 105 ... video core encoder, 107 ... video system encoder, 109 ... time server, 111 ... multiplexer, 113 ... transmitting unit, 200 ... receiving device, 201 ... receiving unit, 203 ... demultiplexer, 205 ... audio system decoder, 207 ... audio core decoder, 209 ... video system decoder, 211 ... video core decoder, 213 ... time server

Claims

An encoding device comprising:
a first processing unit that executes encoding processing for a first packet including a bitstream of at least one of an audio signal and a video signal; and
a second processing unit that processes a second packet including control information related to the encoding processing,
wherein the second processing unit is provided independently of the first processing unit.

The encoding device according to claim 1, wherein the first processing unit outputs the first packet to the second processing unit, and
the second processing unit outputs the first packet and the second packet.

The encoding device according to claim 1 or 2, wherein the control information includes an information element indicating a time at which the decoded bitstream of the first packet is output.

The encoding device according to any one of claims 1 to 3, wherein the control information includes an information element indicating a sequence number corresponding to the first packet.

The encoding device according to claim 2, or according to claim 3 or claim 4 as dependent on claim 2, wherein the second processing unit deletes at least part of the extension header of the first packet and outputs the first packet.

The encoding device according to any one of claims 1 to 5, wherein the first processing unit includes a first audio processing unit corresponding to the audio signal and a first video processing unit corresponding to the video signal, and
the second processing unit includes a second audio processing unit corresponding to the audio signal and a second video processing unit corresponding to the video signal.