CN111614976B

CN111614976B - Transmission device, reception device, transmission method, and reception method

Info

Publication number: CN111614976B
Application number: CN202010428465.2A
Authority: CN
Inventors: 井口贺敬; 远间正真
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2014-09-12
Filing date: 2015-09-02
Publication date: 2022-05-24
Anticipated expiration: 2035-09-02
Also published as: JP7361288B2; WO2016038851A1; JP7054828B2; CN111614976A; JP2022069536A; JP2021122110A; JP2023164690A

Abstract

A transmitting apparatus, a receiving apparatus, a transmitting method and a receiving method. The present invention is a wireless communication device including a generation unit that generates a coded stream by performing rate control so as to satisfy a specification of a reception buffer model, and a transmission unit that transmits a transport packet obtained by packetizing the coded stream, wherein the reception buffer model includes: a first buffer for converting a packet having a variable-length header stored in a received transport packet into a first packet having a fixed-length header expanded by a header; a second buffer converting the first packet into a second packet having a variable-length payload; a third buffer converting the second packet into a NAL unit; and a fourth buffer for outputting, to the decoder, an access unit generated from the plurality of accumulated NAL units at a timing of a decoding time corresponding to the access unit.

Description

Transmission device, reception device, transmission method, and reception method

This application is a divisional application of the original application entitled "transmitting apparatus, receiving apparatus, transmitting method, and receiving method" with an application date of 2015, 9/2 and an application number of 201580047734.8.

Technical Field

The application relates to a transmitting device, a receiving device, a transmitting method and a receiving method.

Background

With the advanced development of broadcasting and communication services, ultra-high-definition moving image content such as 8K (7680 × 4320 pixels: hereinafter also referred to as "8K 4K") and 4K (3840 × 2160 pixels: hereinafter also referred to as "4K 2K") has been studied and introduced. The receiving apparatus needs to decode and display encoded data of a received ultra-high definition moving picture in real time, and in particular, the processing load at the time of decoding a moving picture with a resolution of 8K or the like is large, and it is difficult to decode such a moving picture in real time by 1 decoder. Therefore, a method of performing decoding processing in parallel by using a plurality of decoders to reduce the processing load per 1 decoder and achieve real-time processing has been studied.

The encoded data is multiplexed based on a multiplexing scheme such as MPEG-2TS (Transport Stream) or MMT (MPEG Media Transport) and then transmitted. For example, non-patent document 1 discloses a technique for transmitting encoded media data packet by packet in accordance with MMT.

Prior art documents

Non-patent document

Non-patent document 1: information technology-High efficiency coding and media delivery in heterologous environment-Part 1: MPEG Media Transport (MMT), ISO/IEC DIS 23008-1

Disclosure of Invention

A transmission device according to an aspect of the present application includes: a generation unit that generates a coded stream by performing rate control so as to satisfy a specification of a reception buffer model predetermined to guarantee a buffering operation of a reception device; and a transmitting unit that packetizes the generated encoded stream and transmits a transport packet obtained by the packetization; the transport packet is composed of a fixed-length header and a variable-length payload, and the reception buffer model includes: a first buffer that receives the transport packet, converts a packet, which is stored in the received transport packet and is composed of a variable-length packet header and a variable-length payload, into a first packet having a fixed-length packet header expanded by a header, and outputs the first packet obtained by the conversion at a fixed bit rate; a second buffer which converts the first packet to be output into a second packet including a header and a variable-length payload, and outputs the second packet obtained by the conversion at a fixed bit rate; a third buffer converting the outputted second packet into a NAL unit, and outputting the NAL unit obtained by the conversion at a fixed bit rate; and a fourth buffer for sequentially accumulating the outputted NAL units, generating an access unit from the accumulated NAL units, and outputting the generated access unit to a decoder at a timing of a decoding time corresponding to the access unit.

These general and specific aspects can be realized by a system, an apparatus, an integrated circuit, a computer program, or a computer-readable recording medium such as a CD-ROM, or any combination of a system, an apparatus, an integrated circuit, a computer program, and a recording medium.

Drawings

Fig. 1 is a diagram showing an example of dividing a picture into slice segments (slice segments).

Fig. 2 is a diagram showing an example of a PES packet sequence in which picture data is stored.

Fig. 3 is a diagram showing an example of dividing a picture according to embodiment 1.

Fig. 4 is a diagram showing an example of dividing a picture according to a comparative example of embodiment 1.

Fig. 5 is a diagram showing an example of data of an access unit according to embodiment 1.

Fig. 6 is a block diagram of a transmitting apparatus according to embodiment 1.

Fig. 7 is a block diagram of a receiving apparatus according to embodiment 1.

Fig. 8 is a diagram showing an example of an MMT packet according to embodiment 1.

Fig. 9 is a diagram showing another example of the MMT packet according to embodiment 1.

Fig. 10 is a diagram showing an example of data input to each decoding unit according to embodiment 1.

Fig. 11 is a diagram showing an example of an MMT packet and header information according to embodiment 1.

Fig. 12 is a diagram showing another example of data input to each decoding unit according to embodiment 1.

Fig. 13 is a diagram showing an example of dividing a picture according to embodiment 1.

Fig. 14 is a flowchart of a transmission method according to embodiment 1.

Fig. 15 is a block diagram of a receiving apparatus according to embodiment 1.

Fig. 16 is a flowchart of a reception method according to embodiment 1.

Fig. 17 is a diagram showing an example of an MMT packet and header information according to embodiment 1.

Fig. 18 is a diagram showing an example of an MMT packet and header information according to embodiment 1.

Fig. 19 is a diagram showing a configuration of an MPU.

Fig. 20 is a diagram showing a structure of MF metadata.

Fig. 21 is a diagram for explaining a data transmission sequence.

Fig. 22 is a diagram showing an example of a method for decoding without using header information.

Fig. 23 is a block diagram of a transmitting apparatus according to embodiment 2.

Fig. 24 is a flowchart of a transmission method according to embodiment 2.

Fig. 25 is a block diagram of a receiving apparatus according to embodiment 2.

Fig. 26 is a flowchart of an operation for specifying the MPU start position and NAL unit position.

Fig. 27 is a flowchart of an operation of acquiring initialization information based on a transmission order type and decoding media data based on the initialization information.

Fig. 28 is a flowchart of the operation of the receiving apparatus when the low delay presentation mode is set.

Fig. 29 is a diagram showing an example of the transmission sequence of the MMT packet when transmitting the auxiliary data.

Fig. 30 is a diagram for explaining an example in which the transmission apparatus generates the auxiliary data based on the structure of moof.

Fig. 31 is a diagram for explaining reception of assistance data.

Fig. 32 is a flowchart of the receiving operation using the auxiliary data.

Fig. 33 is a diagram showing a configuration of an MPU configured by a plurality of movie fragments.

Fig. 34 is a diagram for explaining a transmission procedure of the MMT packet when the MPU having the configuration of fig. 33 is transmitted.

Fig. 35 is a diagram 1 for explaining an operation example of the receiving apparatus when 1 MPU is configured by a plurality of movie fragments.

Fig. 36 is a diagram 2 for explaining an operation example of the receiving apparatus when 1 MPU is configured by a plurality of movie fragments.

Fig. 37 is a flowchart illustrating the operation of the receiving method described with reference to fig. 35 and 36.

Fig. 38 is a diagram showing a case where non-VCL NAL units are aggregated individually as data units.

Fig. 39 is a diagram showing a case where non-VCL NAL units are collectively defined as data units.

Fig. 40 is a flowchart of the operation of the receiving apparatus when packet loss occurs.

Fig. 41 is a flowchart of a receiving operation when the MPU is divided into a plurality of movie fragments.

Fig. 42 is a diagram showing an example of a prediction structure of a picture at each temporalld when temporal adaptability is realized.

Fig. 43 is a diagram showing a relationship between the Decoding Time (DTS) and the display time (PTS) of each picture in fig. 42.

Fig. 44 is a diagram showing an example of a prediction structure of a picture that requires a picture delay process and a picture reordering process.

Fig. 45 is a diagram showing an example in which an MPU configured by an MP4 format is divided into a plurality of movie fragments and stored in an MMTP payload or an MMTP packet.

Fig. 46 is a diagram for explaining a method and problem of calculating PTS and DTS.

Fig. 47 is a flowchart of a reception operation when calculating DTS using information for calculating DTS.

Fig. 48 is a diagram for explaining a method of depositing a data unit in MMT into a payload.

Fig. 49 is a flowchart of the operation of the transmission device according to embodiment 3.

Fig. 50 is a flowchart of the operation of the receiving apparatus according to embodiment 3.

Fig. 51 is a diagram showing an example of a specific configuration of a transmission device according to embodiment 3.

Fig. 52 is a diagram showing an example of a specific configuration of a receiving apparatus according to embodiment 3.

FIG. 53 shows a method of storing non-timed media in an MPU and a method of transferring non-timed media in an MMTP packet.

Fig. 54 is a diagram showing an example in which a plurality of pieces of divided data obtained by dividing a file are packed and transmitted for each piece of divided data.

Fig. 55 is a diagram showing another example in which a plurality of pieces of divided data obtained by dividing a file are packed and transferred for each piece of divided data.

Fig. 56 is a diagram showing the syntax of a loop for each file in the resource management table.

Fig. 57 is a flowchart of an operation of determining a divided data number in the receiving apparatus.

Fig. 58 is a flowchart of an operation of determining the number of divided data in the receiving apparatus.

Fig. 59 is a flowchart of an operation for determining whether to operate a slice counter in the transmission apparatus.

Fig. 60 is a diagram for explaining a method of determining the number of pieces of divided data and the number of pieces of divided data (in the case of using a slice counter).

Fig. 61 is a flowchart of the operation of the transmitting apparatus in the case of using the slice counter.

Fig. 62 is a flowchart of the operation of the receiving apparatus in the case of using the slice counter.

Fig. 63 is a diagram showing a service configuration in a case where the same program is transmitted by a plurality of IP data streams.

Fig. 64 is a diagram showing an example of a specific configuration of a transmitting apparatus.

Fig. 65 is a diagram showing an example of a specific configuration of a receiving apparatus.

Fig. 66 is a flowchart of the operation of the transmission device.

Fig. 67 is a flowchart of the operation of the receiving apparatus.

Fig. 68 is a diagram showing a reception buffer model defined by ARIB STD B-60, particularly a reception buffer model in the case of using only the broadcast transmission path.

Fig. 69 is a diagram showing an example in which a plurality of data units are collectively stored in one payload.

Fig. 70 is a diagram showing an example of a case where a plurality of data units are collectively stored in one payload and a video signal in NAL size format is regarded as one data unit.

Fig. 71 is a diagram showing the structure of the payload of an MMTP packet whose data unit length is not shown.

Fig. 72 is a diagram showing an extended region given to a packet unit.

Fig. 73 is a diagram showing an operation flow of the receiving apparatus.

Fig. 74 is a diagram showing an example of a specific configuration of a transmitting apparatus.

Fig. 75 is a diagram showing an example of a specific configuration of a receiving apparatus.

Fig. 76 is a diagram showing an operation flow of the transmission device.

Fig. 77 is a diagram showing an operation flow of the receiving apparatus.

Description of reference numerals:

15. 100, 300, 500, 700 transmitter

16. 101, 301 encoding unit

17. 102 multiplexing part

18. 104 transmitting part

20. 200, 400, 600, 800 receiving device

21 pack filter part

22 transmission order type discriminating section

23 random access unit

24. 212 control information acquisition unit

25 data acquisition part

26 calculating unit

27 initialization information acquisition unit

28. 206 decoding command part

29. 204A, 204B, 204C, 204D, 402 decoding unit

30 presentation part

201 tuner

202 demodulation section

203 inverse multiplexing part

205 display part

211 type discriminating section

213 slice information acquiring part

214 decoded data generating part

302 imparting part

303. 503, 702 sending part

401. 601, 801 receiving unit

501 division part

502. 603 structural part

602 determination unit

701 generation part

802 first buffer

803 second buffer

804 a third buffer

805 fourth buffer

806 decoding unit

Detailed Description

A transmission device according to an aspect of the present application includes: a generation unit that generates a coded stream by performing rate control so as to satisfy a specification of a reception buffer model predetermined to guarantee a buffering operation of a reception device; and a transmitting unit that packetizes the generated encoded stream and transmits a transport packet obtained by the packetization; the transport packet is composed of a fixed-length header and a variable-length payload, and the reception buffer model includes: a first buffer that receives the transport packet, converts a packet, which is stored in the received transport packet and is composed of a variable-length packet header and a variable-length payload, into a first packet having a fixed-length packet header expanded by a header, and outputs the first packet obtained by the conversion at a fixed bit rate; a second buffer which converts the first packet to be output into a second packet including a header and a variable-length payload, and outputs the second packet obtained by the conversion at a fixed bit rate; a third buffer converting the second packet to be output into a NAL unit, and outputting the NAL unit obtained by the conversion at a fixed bit rate; and a fourth buffer for sequentially accumulating the outputted NAL units, generating an access unit from the accumulated NAL units, and outputting the generated access unit to a decoder at a timing of a decoding time corresponding to the access unit.

When the transmitting device performs data transfer using a system such as MMT, the buffering operation of the receiving device can be ensured.

For example, the bit rate of the first buffer in the reception buffer model at the time of outputting the first packet may be set in accordance with the transmission rate after the packet header of the first packet is extended.

For example, the coded stream may be a stream obtained by storing NAL units in a NAL size format in which a size region of 4 bytes is added to the beginning of the NAL unit in a multiplex layer.

A receiving apparatus according to an aspect of the present application includes: a receiving unit configured to receive a transport packet including a fixed-length header and a variable-length payload; a first buffer that converts a packet, which is stored in the received transport packet and is composed of a variable-length header and a variable-length payload, into a first packet having a fixed-length header expanded by a header, and outputs the first packet obtained by the conversion at a fixed bit rate; a second buffer which converts the first packet to be output into a second packet including a header and a variable-length payload, and outputs the second packet obtained by the conversion at a fixed bit rate; a third buffer converting the outputted second packet into a NAL unit, and outputting the NAL unit obtained by the conversion at a fixed bit rate; a fourth buffer for sequentially accumulating the outputted NAL units, generating an access unit from the accumulated NAL units, and outputting the generated access unit at a timing of a decoding time corresponding to the access unit; and a decoder that decodes the access units output by the fourth buffer.

This receiver can perform decoding operation without underflow or overflow.

A transmission method according to an aspect of the present application includes: a generation step of generating a coded stream by performing rate control so as to satisfy a specification of a reception buffer model predetermined to guarantee a buffering operation of a reception device; and a transmission step of packetizing the generated encoded stream, and transmitting a transport packet obtained by the packetizing; the transport packet is composed of a fixed-length header and a variable-length payload, and the reception buffer model includes: a first buffer that receives the transport packet, converts a packet, which is stored in the received transport packet and is composed of a variable-length packet header and a variable-length payload, into a first packet having a fixed-length packet header expanded by a header, and outputs the first packet obtained by the conversion at a fixed bit rate; a second buffer which converts the first packet to be output into a second packet including a header and a variable-length payload, and outputs the second packet obtained by the conversion at a fixed bit rate; a third buffer converting the outputted second packet into a NAL unit, and outputting the NAL unit obtained by the conversion at a fixed bit rate; and a fourth buffer for sequentially accumulating the outputted NAL units, generating an access unit from the accumulated NAL units, and outputting the generated access unit to a decoder at a timing of a decoding time corresponding to the access unit.

A reception method according to an embodiment of the present application includes: a reception step of receiving a transport packet including a fixed-length header and a variable-length payload; a first conversion step of converting a packet, which is stored in the received transport packet and is composed of a variable-length header and a variable-length payload, into a first packet having a fixed-length header expanded by a header, and outputting the first packet obtained by the conversion at a fixed bit rate; a second conversion step of converting the first packet to be output into a second packet including a fixed-length header and a variable-length payload, and outputting the second packet obtained by the conversion at a fixed bit rate; a third conversion step of converting the second packet to be output into an NAL unit, and outputting the NAL unit obtained by the conversion at a fixed bit rate; a generation step of sequentially accumulating the output NAL units, generating an access unit from the accumulated NAL units, and outputting the generated access unit to a decoder at a timing of a decoding time corresponding to the access unit; and a decoding step of decoding the access unit output from the fourth buffer.

These general and specific aspects may be realized by a system, an apparatus, an integrated circuit, a computer program, or a computer-readable recording medium such as a CD-ROM, or any combination of a system, an apparatus, an integrated circuit, a computer program, or a recording medium.

Hereinafter, embodiments will be described in detail with reference to the drawings.

The embodiments described below are all general or specific examples. The numerical values, shapes, materials, constituent elements, arrangement positions and connection modes of the constituent elements, steps, order of the steps, and the like shown in the following embodiments are merely examples, and are not intended to limit the present application. Among the components of the following embodiments, components that are not recited in the independent claims indicating the highest concept will be described as arbitrary components.

(knowledge forming the basis of the present application)

Recently, high resolution of a display of a television, a smart mobile phone, a tablet terminal, or the like has been increasingly advanced. In particular, the broadcast in japan is scheduled for a service of 8K4K (resolution 8K × 4K) in 2020. In the case of a super-high resolution video such as 8K4K, it is difficult to decode the video in real time by a single decoder, and therefore, a method of performing decoding processing in parallel by a plurality of decoders has been studied.

Since the encoded data is multiplexed based on a multiplexing system such as MPEG-2TS or MMT and transmitted, the receiving apparatus needs to separate the encoded data of the moving image from the multiplexed data before decoding. Hereinafter, the process of separating encoded data from multiplexed data is referred to as "inverse multiplexing".

When decoding processing is performed in parallel, it is necessary to allocate encoded data to be decoded to each decoder. When distributing the encoded data, the encoded data itself needs to be analyzed, and in particular, in the content of 8K or the like, the bit rate is very high, and thus the processing load due to the analysis is heavy. Therefore, there is a problem that the inverse multiplexing portion becomes a bottleneck and real-time reproduction cannot be performed.

In the moving picture coding schemes such as h.264 and h.265 standardized by MPEG and ITU, a transmission apparatus codes a picture by dividing the picture into a plurality of regions called "slices" or "slice segments" and independently decoding each of the divided regions. Therefore, for example, in h.265, a receiving apparatus that receives a broadcast can achieve parallelism of decoding processes by separating data of each slice from received data and outputting the data of each slice to each decoder.

Fig. 1 is a diagram showing an example of dividing 1 picture into 4 slices in HEVC. For example, the receiving apparatus includes 4 decoders, and each decoder decodes any one of the 4 slice segments.

In conventional broadcasting, a transmission apparatus stores 1 picture (access unit in MPEG system standard) in 1 PES packet, and multiplexes the PES packets into a TS packet sequence. Therefore, the receiving device needs to separate the respective clips by parsing the data of the access unit stored in the payload after separating the payload of the PES packet, and output the data of the respective separated clips to the decoder.

However, the present inventors have found that, since the processing amount when analyzing data of an access unit to separate fragments is large, it is difficult to perform the processing in real time.

Fig. 2 is a diagram showing an example in which data of a picture divided into slices is stored in a payload of a PES packet.

As shown in FIG. 2, for example, data of a plurality of slices (slices 1-4) is stored in the payload of 1 PES packet. Further, the PES packets are multiplexed into a TS packet train.

(embodiment mode 1)

Hereinafter, a case where h.265 is used as an example of a coding method for a moving picture will be described, but the present embodiment can be applied to a case where another coding method such as h.264 is used.

Fig. 3 is a diagram showing an example of dividing an access unit (picture) in units of division in the present embodiment. The access unit is divided into 2 equal parts in the horizontal and vertical directions by a function called "tile (tile)" imported by h.265, and divided into 4 tiles in total. Further, the sliced segments create associations with tiles in a one-to-one correspondence.

The reason why the sample is equally divided into 2 parts in the horizontal and vertical directions will be described. First, in decoding, a line memory for storing data of horizontal 1 line is generally required, but if the resolution is ultrahigh, such as 8K4K, the size of the line memory increases because the size in the horizontal direction increases. In the installation of the receiving apparatus, it is preferable to be able to reduce the size of the line memory. In order to reduce the size of the line memory, division in the vertical direction is required. A data structure such as a tile is required in the vertical direction division. Tiles are utilized for these reasons.

On the other hand, since an image generally has a high correlation in the horizontal direction, the encoding efficiency is further improved if a large range in the horizontal direction can be referred to. Therefore, from the viewpoint of coding efficiency, it is preferable to divide the access unit in the horizontal direction.

By equally dividing the access unit by 2 in the horizontal and vertical directions, these 2 characteristics are taken into consideration, and both the installation aspect and the encoding efficiency can be taken into consideration. When a single decoder can decode a 4K2K moving image in real time, the receiving apparatus can decode an 8K4K image in real time by dividing an 8K4K image 4 into equal parts and dividing each of the divided parts into 4K 2K.

Next, the reason why tiles obtained by dividing an access unit in the horizontal and vertical directions are associated with slice segments in a one-to-one correspondence will be described. In h.265, an access unit is composed of a plurality of units called "NAL (Network Adaptation Layer) units".

The payload of a NAL unit stores any one of an access unit delimiter indicating the start position of an access unit, an SPS (Sequence Parameter Set) as initialization Information at the time of decoding commonly used in Sequence units, a PPS (Picture Parameter Set) as initialization Information at the time of decoding commonly used in pictures, an SEI (Supplemental Enhancement Information) that is not necessary for the decoding process itself but is necessary for processing and display of the decoding result, and encoded data of slices. The header of the NAL unit contains type information for identifying data held in the payload.

Here, the transmitting apparatus can set the basic unit as a NAL unit when multiplexing encoded data in a multiplexing format such as MPEG-2TS, MMT (MPEG Media Transport: moving picture experts group Media Transport), MPEG DASH (Dynamic Adaptive Streaming over HTTP), or RTP (Real-time Transport Protocol). In order to store 1 slice segment in 1 NAL unit, when an access unit is divided into areas, it is preferable to divide the access unit in slice segment units. For this reason, the transmitting apparatus creates an association of tiles with slice segments in one-to-one correspondence.

As shown in fig. 4, the transmission apparatus can set 1 tile segment to tile 4 in a group. However, in this case, since all tiles are held in 1 NAL unit, it is difficult for the receiving apparatus to separate the tiles from the multiplexing layer.

In addition, although there are an independent slice that can be independently decoded and a reference slice that refers to the independent slice in the slice segment, a case where the independent slice is used will be described here.

Fig. 5 is a diagram showing an example of data of an access unit divided so that tiles match the boundaries of slice segments as shown in fig. 3. The data of the access unit includes NAL units storing access unit delimiters arranged at the head, NAL units storing SPS, PPS, and SEI arranged at the rear, and data of slices storing data of tiles 1 to 4 arranged at the rear. In addition, the data of the access unit may not contain part or all of the NAL units of SPS, PPS, and SEI.

Next, the configuration of the transmission device 100 according to the present embodiment will be described. Fig. 6 is a block diagram showing an example of the configuration of the transmission device 100 according to the present embodiment. The transmission device 100 includes an encoding unit 101, a multiplexing unit 102, a modulation unit 103, and a transmission unit 104.

The encoding unit 101 encodes the input image, for example, in accordance with h.265, to generate encoded data. Further, for example, as shown in fig. 3, the encoding unit 101 divides an access unit into 4 slices (tiles) and encodes each slice.

The multiplexing unit 102 multiplexes the encoded data generated by the encoding unit 101. The modulation unit 103 modulates the data obtained by multiplexing. The transmitter 104 transmits the modulated data as a broadcast signal.

Next, the configuration of the receiving apparatus 200 according to the present embodiment will be described. Fig. 7 is a block diagram showing an example of the configuration of the receiving apparatus 200 according to the present embodiment. The reception device 200 includes a tuner 201, a demodulation unit 202, an inverse multiplexing unit 203, a plurality of decoding units 204A to 204D, and a display unit 205.

The tuner 201 receives a broadcast signal. The demodulation unit 202 demodulates the received broadcast signal. The demodulated data is input to the inverse multiplexing unit 203.

The inverse multiplexing unit 203 separates the demodulated data into division units, and outputs the data of each division unit to the decoding units 204A to 204D. Here, the division unit means a division region obtained by dividing an access unit, for example, a slice in h.265. Here, the 8K4K image is divided into 4 images of 4K 2K. Therefore, there are 4 decoding units 204A to 204D.

The plurality of decoding units 204A to 204D operate in synchronization with each other based on a predetermined reference clock. Each Decoding unit decodes the coded data of the division unit in accordance with the DTS (Decoding Time Stamp) of the access unit, and outputs the Decoding result to the display unit 205.

The display unit 205 combines the plurality of decoding results output from the plurality of decoding units 204A to 204D to generate an output image of 8K 4K. The display unit 205 displays the generated output image according to a PTS (Presentation Time Stamp) of the access unit acquired separately. In addition, when merging the decoding results, the display unit 205 may perform filtering processing such as deblocking filtering on boundary regions of adjacent division units such as boundaries between tiles so that the boundaries are visually inconspicuous.

In the above description, the transmission device 100 and the reception device 200 that transmit or receive a broadcast are described as an example, but the content may be transmitted and received via a communication network. When the reception device 200 receives the content via the communication network, the reception device 200 separates multiplexed data from an IP packet received via a network such as ethernet.

In broadcasting, the transmission path delay from the transmission of a broadcast signal to the arrival at the receiving apparatus 200 is fixed. On the other hand, in a communication network such as the internet, a transmission path delay from data transmitted from a server to the reception device 200 is not constant due to the influence of congestion. Therefore, the receiving apparatus 200 often does not perform strict synchronous playback of a reference clock based on PCR in the broadcast MPEG-2 TS. Therefore, the receiving apparatus 200 may display the output image of 8K4K on the display unit according to the PTS without strictly synchronizing the decoding units.

Further, decoding processing for all the division units may not be completed at the time indicated by the PTS of the access unit due to congestion of the communication network or the like. In this case, the reception apparatus 200 skips the display of the access unit, or delays the display until the decoding of at least 4 division units is completed and the generation of the image of 8K4K is completed.

In addition, the content may be transmitted and received by using broadcasting and communication in combination. The method can also be applied to the reproduction of multiplexed data stored in a recording medium such as a hard disk or a memory.

Next, a multiplexing method of access units divided into slices when MMT is used as a multiplexing method will be described.

Fig. 8 is a diagram showing an example of packing data of an access unit of HEVC into an MMT. SPS, PPS, SEI, and the like are not necessarily included in the access unit, but their existence is illustrated in this example.

NAL units such as an access unit delimiter, SPS, PPS, and SEI arranged in an access unit before the first slice segment are grouped together and stored in the MMT packet # 1. The subsequent clips are stored in different MMT packets for each clip.

As shown in fig. 9, the NAL unit placed before the top slice in the access unit may be stored in the same MMT packet as the top slice.

Furthermore, if NAL units of End-of-Sequence or End-of-Bitstream, etc. indicating the End of a Sequence or stream are appended after the last slice segment, these are kept in the same MMT packet as the last slice segment. However, since NAL units such as End-of-Sequence and End-of-Bitstream may be inserted into the End point of the decoding process or the connection point of 2 streams, it is preferable that the receiving apparatus 200 be able to easily acquire these NAL units from the multiplex layer. In this case, these NAL units may also be stored in MMT packets different from slice segments. Accordingly, the receiving apparatus 200 can easily separate these NAL units from the multiplex layer.

Further, TS, DASH, RTP, or the like may be used as the multiplexing method. In these systems, the transmission apparatus 100 also stores different slice segments in different packets. This ensures that the receiving apparatus 200 can separate the slice from the multiplex layer.

For example, when TS is used, PES packetizing is performed on encoded data in units of slice segments. When the RTP is adopted, the encoded data is RTP packetized in units of slice segments. In these cases, as in MMT packet #1 shown in fig. 8, NAL units and slice segments arranged before the slice segments may be packetized separately.

When the TS is adopted, the transmission device 100 indicates a unit of data stored in a PES packet by using a data alignment descriptor or the like. Note that DASH is a system for downloading MP 4-format data units called "segments" by HTTP or the like, and therefore the transmission apparatus 100 does not packetize encoded data at the time of transmission. Therefore, the transmission device 100 may create a sub-sample in units of slice segments and store information indicating the storage location of the sub-sample in the header of the MP4 so that the reception device 200 can detect the slice segment in the multiplex layer under the MP 4.

The MMT packing of the snippets is described in detail below.

As shown in fig. 8, by packetizing encoded data, data commonly referred to in decoding of all slices in an access unit such as SPS and PPS is stored in an MMT packet # 1. In this case, the reception device 200 concatenates the payload data of the MMT packet #1 and the data of each slice, and outputs the resulting data to the decoding unit. In this way, the reception device 200 can easily generate input data to the decoding unit by concatenating the payloads of the plurality of MMT packets.

Fig. 10 is a diagram showing an example of generating input data to the decoding units 204A to 204D from the MMT packet shown in fig. 8. The inverse multiplexer 203 concatenates the payload data of the MMT packet #1 and the MMT packet #2 to generate data necessary for the decoder 204A to decode the slice 1. The inverse multiplexer 203 generates input data in the same manner for the decoders 204B to 204D. That is, the inverse multiplexer 203 concatenates the MMT packet #1 and the payload data of the MMT packet #3 to generate input data for the decoder 204B. The inverse multiplexer 203 concatenates the MMT packet #1 and the payload data of the MMT packet #4 to generate input data for the decoder 204C. The inverse multiplexer 203 concatenates the MMT packet #1 and the payload data of the MMT packet #5 to generate input data for the decoder 204D.

The inverse multiplexing unit 203 may remove NAL units necessary for non-decoding processing such as access unit delimiters and SEI from the payload data of the MMT packet #1, separate only NAL units of SPS and PPS necessary for decoding processing, and add the separated NAL units to the sliced data.

When the encoded data is packetized as shown in fig. 9, the inverse multiplexing unit 203 outputs an MMT packet #1 including the head data of the access unit in the multiplexing layer to the 1 st decoding unit 204A. The inverse multiplexing unit 203 analyzes the MMT packet including the header data of the access unit in the multiplexing layer, separates NAL units of the SPS and PPS, and adds the separated NAL units of the SPS and PPS to each data of the 2 nd and subsequent slices, thereby generating input data for each of the 2 nd and subsequent decoding units.

Further, it is preferable that the reception apparatus 200 be able to identify the type of data held in the MMT payload and the index number of the slice segment in the access unit when the slice segment is held in the payload, by using information contained in the header of the MMT packet. Here, the type of data refers to either data before a slice segment (NAL units arranged before the leading slice segment in an access unit are collectively referred to as this) or data of a slice segment. When a Unit obtained by slicing an MPU such as a slice segment is stored in an MMT packet, a mode for storing an MFU (Media Fragment Unit) is adopted. When the transmission device 100 adopts this mode, for example, data unit (data unit) which is a basic unit of data in the MFU can be set as a sample (data unit in MMT, corresponding to an access unit) or a sub-sample (unit into which a sample is divided).

At this time, the header of the MMT packet contains a field called "Fragmentation indicator" and a field called "Fragmentation counter".

The Fragmentation indicator indicates whether or not the data stored in the payload of the MMT packet is data obtained by fragmenting a data unit, and when the data is data obtained by fragmenting, indicates that the fragment is the beginning or the last fragment of the data unit, or neither the beginning nor the last fragment. In other words, the Fragmentation indicator included in the header of a certain packet indicates (1) that the packet is included only in the data unit as the basic data unit; (2) the data unit is divided into a plurality of packets for storage, and the packet is the packet at the head of the data unit; (3) dividing the data unit into a plurality of packets for storage, wherein the packets are packets except the head and the last of the data unit; and (4) the data unit is divided into a plurality of packets and stored, and the packet is the identification information of any one of 4 items of the last packet of the data unit.

The Fragment counter is an index number indicating that data stored in the MMT packet is equivalent to the second Fragment in the data unit.

Therefore, by setting the sample in the MMT as a data unit by the transmission device 100 and setting the data before the slice and each slice as a slice unit of the data unit, the reception device 200 can identify the type of data stored in the payload using the information included in the header of the MMT packet. That is, the inverse multiplexer 203 can generate input data to the decoders 204A to 204D with reference to the header of the MMT packet.

Fig. 11 is a diagram showing an example when a sample is set as a data unit and data before a slice and the slice are packed as a slice of the data unit.

The pre-slice segment data and the slice segments are divided into 5 slices of slice #1 to slice # 5. Each fragment is saved into an individual MMT packet. At this time, the values of the Fragmentation indicator and Fragmentation counter included in the header of the MMT packet are as shown in the figure.

For example, a Fragment indicator is a binary 2-bit value. The Fragment indicator at the beginning of the data unit, i.e., the MMT packet #1, the Fragment indicator at the end of the data unit, i.e., the MMT packet #5, and the Fragment indicators of the MMT packets #2 to #4, i.e., the packets therebetween, are set to different values, respectively. Specifically, the Fragment indicator of the MMT packet #1, which is the head of the data unit, is set to 01, the Fragment indicator of the MMT packet #5, which is the last data unit, is set to 11, and the Fragment indicators of the MMT packets #2 to #4, which are the packets therebetween, are set to 10. When the data unit includes only 1 MMT packet, the Fragment indicator is set to 00.

The Fragment counter is 4, which is a value obtained by subtracting 1 from 5, which is the total number of fragments, in the MMT packet #1, and is sequentially decreased one by one in the subsequent packets, and is 0 in the last MMT packet # 5.

Therefore, the reception apparatus 200 can identify the MMT packet holding the data before the slice segment by using either the Fragment indicator or the Fragment counter. Further, the reception apparatus 200 can identify the MMT packet holding the nth Fragment by referring to the Fragment counter.

The header of the MMT packet additionally contains the sequence number of the Movie Fragment (Movie Fragment) to which the data unit belongs in the MPU, the sequence number of the MPU itself, and the sequence number of the sample to which the data unit belongs in the Movie Fragment. The inverse multiplexer 203 can uniquely determine the sample to which the data unit belongs by referring to these.

Furthermore, since the inverse multiplexing unit 203 can determine the index number of the segment in the data unit from the Fragment counter or the like, even when a packet loss occurs, the segment stored in the segment can be uniquely specified. For example, even when the slice #4 shown in fig. 11 cannot be acquired due to a packet loss, the inverse multiplexing unit 203 knows that the slice received immediately after the slice #3 is the slice #5, and therefore can accurately output the slice segment 4 stored in the slice #5 to the decoding unit 204D, not the decoding unit 204C.

When a transmission path that ensures that no packet loss occurs is used, the inverse multiplexing unit 203 may periodically process the arriving packets without determining the type of data stored in the MMT packet or the index number of the slice with reference to the header of the MMT packet. For example, when an access unit is transmitted with 5 MMT packets in total of pre-slice data and 4 slice segments, the reception apparatus 200 can sequentially acquire pre-slice data and 4-slice data by determining pre-slice data of the access unit for which decoding is started and then sequentially processing the received MMT packets.

A modified example of the packing will be described below.

The slice does not necessarily have to be divided in both the horizontal direction and the vertical direction within the plane of the access unit, and the access unit may be divided only in the horizontal direction or only in the vertical direction as shown in fig. 1.

Furthermore, when the access unit is divided only in the horizontal direction, tiles need not be employed.

The number of divisions in the plane of the access unit is arbitrary and is not limited to 4. However, the area size of the slice segment and the tile needs to be equal to or larger than the lower limit of the coding standard such as h.265.

The transmission apparatus 100 may store identification information indicating the intra-plane division method of the access unit in the MMT message, the descriptor of the TS, or the like. For example, information indicating the number of divisions in the horizontal direction and the vertical direction in the plane may be stored. Alternatively, the unique identification information may be assigned to a dividing method such as dividing equally in the horizontal direction and the vertical direction 2 as shown in fig. 3, or dividing equally in the horizontal direction 4 as shown in fig. 1. For example, when the access unit is divided as shown in fig. 3, the identification information indicates pattern 1, and when the access unit is divided as shown in fig. 1, the identification information indicates pattern 1.

Further, information indicating restrictions on coding conditions associated with the in-plane division method may be included in the multiplex layer. For example, information indicating that 1 slice is composed of 1 tile may be used. Alternatively, information indicating that a reference block in the case of performing motion compensation at the time of decoding a slice or a tile is limited to a slice or a tile at the same position in a picture, or to a block in a predetermined range in an adjacent slice segment, or the like may be used.

Further, the transmission apparatus 100 may switch whether or not to divide the access unit into a plurality of slices according to the resolution of the moving image. For example, the transmission apparatus 100 may divide the access unit into 4 units when the moving image to be processed is 8K4K, and may not perform in-plane division when the moving image to be processed has a resolution of 4K 2K. By defining the division method in the case of the moving image of 8K4K in advance, the reception apparatus 200 can determine whether or not there is an in-plane division and the division method by acquiring the resolution of the received moving image, and can switch the decoding operation.

The reception device 200 can detect the presence or absence of in-plane division by referring to the header of the MMT packet. For example, when the access unit is not divided, if the data unit of the MMT is set as a sample, the data unit is not fragmented. Therefore, the reception apparatus 200 can determine that an access unit is not divided when the value of the Fragment counter included in the header of the MMT packet is always zero. Alternatively, the reception apparatus 200 may detect whether or not the value of the Fragmentation indicator is always 01. The reception apparatus 200 can determine that an access unit is not divided even when the value of the Fragmentation indicator is always 01.

The receiving apparatus 200 can also cope with a case where the number of divisions in the plane of the access unit does not match the number of decoding units. For example, when the reception device 200 includes 2

decoding units

204A and 204B capable of decoding 8K2K encoded data in real time, the inverse multiplexing unit 203 outputs 2 of 4 slices constituting 8K4K encoded data to the decoding unit 204A.

Fig. 12 is a diagram showing an operation example when data subjected to MMT packetization as shown in fig. 8 is input to the 2

decoding units

204A and 204B. Here, it is preferable that the receiving apparatus 200 be able to directly combine and output the decoding results of the

decoding units

204A and 204B. Therefore, the inverse multiplexer 203 selects the slice segment to be output to each of the

decoders

204A and 204B so that the decoding results of the

decoders

204A and 204B are spatially continuous.

The inverse multiplexing unit 203 may select a decoding unit to be used, based on the resolution, frame rate, or the like of the encoded data of the moving image. For example, when the reception apparatus 200 includes 4 decoding units of 4K2K, if the resolution of the input image is 8K4K, the reception apparatus 200 performs decoding processing using all 4 decoding units. Further, as long as the resolution of the input image is 4K2K, the reception apparatus 200 performs the decoding process using only 1 decoding unit. Alternatively, even if the intra-plane division is 4, when 8K4K can be decoded in real time by a single decoding unit, the inverse multiplexing unit 203 merges all the division units and outputs the result to 1 decoding unit.

The receiving apparatus 200 may determine the decoding unit to be used in consideration of the frame rate. For example, there are cases where: when the receiving apparatus 200 includes 2 decoding units having an upper limit of 60fps of frame rate that can be decoded in real time at a resolution of 8K4K, 120fps of encoded data is input at 8K 4K. In this case, if the intra-plane is configured by 4 division units, slice 1 and slice 2 are input to the decoding unit 204A, and slice 3 and slice 4 are input to the decoding unit 204B, as in the example of fig. 12. Since each of the

decoding units

204A and 204B can decode 120fps in real time as long as it is 8K2K (with a resolution of half of 8K 4K), the decoding processing can be performed by these 2

decoding units

204A and 204B.

Even if the resolution and the frame rate are the same, the amount of processing differs depending on the encoding system itself, such as the level (profile) or the level (level) in the encoding system, or h.264 or h.265. Therefore, the receiving apparatus 200 may select a decoding unit to be used based on the information. Further, when all the encoded data received through broadcasting or communication cannot be decoded, or when all the slices or tiles constituting the area selected by the user cannot be decoded, the reception apparatus 200 may automatically determine the slices or tiles that are decodable within the processing range of the decoding unit. Alternatively, the reception apparatus 200 may also provide a user interface for the user to select the decoded region. In this case, the reception apparatus 200 may display a warning message indicating that the entire area cannot be decoded, or may display information indicating the number of decodable areas, slices, or tiles.

The above method can also be applied to a case where MMT packets storing the same encoded data in the form of fragments are transmitted and received through a plurality of transmission paths such as broadcasting and communication.

The transmission device 100 may perform coding such that the regions of the respective slices overlap each other in order to make the boundary of the division unit inconspicuous. In the example shown in FIG. 13, the 8K4K picture is divided into 4 slices 1-4. For example, the cut pieces 1 to 3 are 8K × 1.1K, and the cut piece 4 is 8K × 1K. Further, adjacent cut pieces overlap each other. In this way, motion compensation at the time of encoding can be efficiently performed at the boundary at the time of 4-division indicated by a dotted line, and therefore the image quality at the boundary portion can be improved. Thus, the image quality deterioration at the boundary portion is reduced.

In this case, the display unit 205 cuts out an 8K × 1K region from among the 8K × 1.1K regions, and combines the obtained regions. The transmission device 100 may encode the slice segments so as to be overlapped with each other and transmit the encoded slice segments separately while including information indicating the overlapping range in the multiplex layer or the encoded data.

In addition, the same method can be applied to the use of tiles.

The following describes a flow of operations of the transmission device 100. Fig. 14 is a flowchart showing an example of the operation of the transmission apparatus 100.

First, the encoding unit 101 divides a picture (access unit) into a plurality of slices (tiles) that are a plurality of areas (S101). Next, the encoding unit 101 encodes each of the plurality of slices so that each of the plurality of slices can be independently decoded, thereby generating encoded data corresponding to each of the plurality of slices (S102). The encoding unit 101 may encode a plurality of slices by a single encoding unit, or may perform parallel processing by a plurality of encoding units.

Next, the multiplexing unit 102 stores the plurality of encoded data generated by the encoding unit 101 in a plurality of MMT packets, thereby multiplexing the plurality of encoded data (S103). Specifically, as shown in fig. 8 and 9, the multiplexer 102 stores a plurality of encoded data in a plurality of MMT packets so that the encoded data corresponding to different slice segments is not stored in 1 MMT packet. As shown in fig. 8, the multiplexer 102 stores control information that is used in common for all decoding units within a picture in an MMT packet #1 that is different from the MMT packets #2 to #5 in which a plurality of encoded data are stored. Here, the control information includes at least one of an access unit delimiter, an SPS, a PPS, and an SEI.

The multiplexer 102 may store the control information in the same MMT packet as any one of the plurality of MMT packets in which the plurality of encoded data are stored. For example, as shown in fig. 9, the multiplexer 102 may store the control information in the top MMT packet (MMT packet #1 in fig. 9) among a plurality of MMT packets in which a plurality of encoded data are stored.

Finally, the transmission apparatus 100 transmits a plurality of MMT packets. Specifically, the modulation unit 103 modulates the multiplexed data, and the transmission unit 104 transmits the modulated data (S104).

Fig. 15 is a block diagram showing an example of the configuration of the receiving apparatus 200, and is a diagram showing in detail the inverse multiplexing unit 203 shown in fig. 7 and the configuration of the subsequent stage thereof. As shown in fig. 15, the receiving apparatus 200 further includes a decode command unit 206. The inverse multiplexing unit 203 further includes a type discrimination unit 211, a control information acquisition unit 212, a slice information acquisition unit 213, and a decoded data generation unit 214.

The following describes the flow of the operation of the receiving apparatus 200. Fig. 16 is a flowchart showing an example of the operation of the receiving apparatus 200. Here, the operation for 1 access unit is shown. When the decoding processing of a plurality of access units is executed, the processing of the present flowchart is repeated.

First, the reception device 200 receives a plurality of packets (MMT packets) generated by the transmission device 100, for example (S201).

Next, the type discrimination unit 211 analyzes the header of the received packet to acquire the type of encoded data stored in the received packet (S202).

Next, the type discrimination unit 211 determines whether the data stored in the received packet is data before slicing or sliced based on the type of the acquired encoded data (S203).

When the data stored in the received packet is pre-sliced data (yes in S203), the control information acquisition unit 212 acquires pre-sliced data of the access unit to be processed from the payload of the received packet and stores the pre-sliced data in the memory (S204).

On the other hand, if the data stored in the received packet is fragmented data (no in S203), the receiving apparatus 200 determines whether or not the data stored in the received packet is encoded data of any one of the plurality of areas, using the header information of the received packet. Specifically, the slice information acquiring unit 213 analyzes the header of the received packet to acquire the index Idx of the slice stored in the received packet (S205). Specifically, the index number Idx is an index number within Movie Fragment of an access unit (sample in MMT).

The processing of step S205 may be performed in step S202.

Next, the decoded data generation unit 214 determines a decoding unit that decodes the slice segment (S206). Specifically, the index Idx is associated with a plurality of decoding units in advance, and the decoded data generator 214 determines the decoding unit corresponding to the index Idx acquired in step S205 as the decoding unit for decoding the slice segment.

As described in the example of fig. 12, the decoded data generation unit 214 may determine a decoding unit that decodes a slice segment based on at least one of the resolution of an access unit (picture), the method of dividing the access unit into a plurality of slices (tiles), and the processing capacity of a plurality of decoding units provided in the reception apparatus 200. For example, the decoded data generation unit 214 determines the access unit division method based on the identification information in the descriptor such as the MMT message and the section (section) of the TS.

Next, the decoded data generating unit 214 combines the control information, which is included in any one of the plurality of packets and is used in common for all the decoding units within the picture, with each of the plurality of pieces of encoded data of the plurality of slices, thereby generating a plurality of pieces of input data (combined data) to be input to the plurality of decoding units. Specifically, the decoded data generator 214 acquires fragmented data from the payload of the received packet. The decoded data generation unit 214 combines the data before the slice segment stored in the memory in step S204 with the acquired data of the slice segment, thereby generating input data to the decoding unit determined in step S206 (S207).

After step S204 or S207, if the data of the received packet is not the final data of the access unit (no in S208), the processing from step S201 onward is performed again. That is, the above-described processing is repeated until input data to the plurality of decoding units 204A to 204D corresponding to all the slices included in the access unit is generated.

The timing at which the packet is received is not limited to the timing shown in fig. 16, and a plurality of packets may be received in advance or sequentially and stored in a memory or the like.

On the other hand, if the data of the received packet is the final data of the access unit (yes in S208), the decode command unit 206 outputs the plurality of input data generated in step S207 to the corresponding decoding units 204A to 204D (S209).

Next, the plurality of decoding units 204A to 204D decode the plurality of input data in parallel according to the DTS of the access means, thereby generating a plurality of decoded images (S210).

Finally, the display unit 205 generates a display image by combining the plurality of decoded images generated by the plurality of decoding units 204A to 204D, and displays the display image according to the PTS of the access unit (S211).

The receiving apparatus 200 analyzes the payload data of the MMT packet storing header information of the MPU or header information of the Movie Fragment to acquire the DTS and PTS of the access unit. When the TS is used as the multiplexing scheme, the receiving apparatus 200 acquires the DTS and PTS of the access unit from the header of the PES packet. When the receiving apparatus 200 uses RTP as the multiplexing method, the DTS and PTS of the access unit are acquired from the header of the RTP packet.

When the decoding results of the plurality of decoding units are merged, the display unit 205 may perform filtering processing such as deblocking filtering on a boundary between adjacent division units. In addition, since the filtering process is not required when the decoding result of a single decoding unit is displayed, the display unit 205 may switch the process depending on whether or not the filtering process is performed on the boundary of the decoding results of a plurality of decoding units. Whether or not the filtering process is necessary may be determined in advance depending on the presence or absence of division. Alternatively, information indicating whether or not the filtering process is necessary may be separately stored in the multiplexing layer. In addition, information necessary for filtering processing such as filter coefficients is sometimes stored in SPS, PPS, SEI, or slice segments. The decoding units 204A to 204D or the inverse multiplexing unit 203 acquires these pieces of information by analyzing the SEI, and outputs the acquired pieces of information to the display unit 205. The display unit 205 performs filtering processing using these pieces of information. When these pieces of information are stored in the slice, the decoding units 204A to 204D preferably acquire these pieces of information.

In the above description, an example in which the types of data stored in a slice are 2 types, that is, pre-slice data and slice segment data, has been described, but the types of data may be 3 or more. In this case, discrimination according to the type is performed in step S203.

When the data size of the clip is large, the transmission device 100 may fragment the clip and store the fragment in the MMT packet. That is, the transmission device 100 may fragment the data before the fragmentation and the fragmentation. In this case, if the access unit is set to be equal to the data unit as in the example of the packing shown in fig. 11, the following problem occurs.

For example, when the slice segment 1 is divided into 3 slices, the slice segment 1 is divided into 3 packets of Fragment counter values 1 to 3 to transmit. In addition, after the Fragment segment 2, the Fragment counter value becomes 4 or more, and the correlation between the Fragment counter value and the data stored in the payload cannot be obtained. Therefore, the reception apparatus 200 cannot specify the packet storing the head data of the slice from the header information of the MMT packet.

In this case, the reception apparatus 200 may parse the data of the payload of the MMT packet and determine the start position of the slice. Here, as a format for storing NAL units in a multiplex layer, there are 2 types of formats called "byte stream format" in which a start code composed of a specific bit string is added immediately before a NAL unit header, and "NAL size format" in which a field indicating the size of a NAL unit is added.

The byte stream format is used in MPEG-2 systems, RTP, and the like. NAL size formats are utilized in MP4, and DASH and MMT using MP4, and the like.

When the byte stream format is used, the reception apparatus 200 analyzes whether or not the head data of the packet matches the start code. If the start data of a packet matches the start code, the receiving apparatus 200 can detect whether or not the data included in the packet is fragmented data by acquiring the type of NAL unit from the NAL unit header immediately following the NAL unit header.

On the other hand, in the NAL size format, the reception apparatus 200 cannot detect the start position of the NAL unit based on the bit string. Therefore, in order to acquire the start position of the NAL unit, the reception device 200 needs to sequentially read data corresponding to the size of the NAL unit from the leading NAL unit of the access unit and shift the pointer.

However, if the header of the MPU or Movie Fragment of the MMT indicates the size of a unit of a sub-sample and the sub-sample corresponds to the pre-slicing data or the slice segment, the reception apparatus 200 can determine the start position of each NAL unit based on the size information of the sub-sample. Therefore, the transmission apparatus 100 may include information indicating whether or not the information in units of subsamples exists in the MPU or the Movie Fragment in information acquired by the reception apparatus 200 at the start of data reception, such as the MPT in the MMT.

The data of the MPU is data expanded based on the MP4 format. MP4 has a mode in which parameter sets such as SPS and PPS of h.264 or h.265 can be stored as sample data and a mode in which they cannot be stored. Further, information for determining the mode is represented as an entry name of SampleEntry (sample entry). When the parameter set is included in the sample using the mode that can be saved, the reception apparatus 200 acquires the parameter set by the method described above.

On the other hand, when the mode that cannot be stored is used, the parameter set is stored as Decoder Specific Information (Decoder characteristic Information) in the SampleEntry or is stored using a stream for the parameter set. Here, since a stream for parameter sets is not generally used, it is preferable that the transmission apparatus 100 stores parameter sets in the Decoder Specific Information. In this case, the receiving apparatus 200 analyzes the SampleEntry transmitted as the metadata of the MPU or the metadata of the Movie Fragment in the MMT packet, and acquires the parameter set referred to by the access unit.

When storing a parameter set as sample data, the reception apparatus 200 can acquire a parameter set necessary for decoding by referring to only the sample data without referring to SampleEntry. At this time, the transmission apparatus 100 may not save the parameter set in SampleEntry. In this way, since the same SampleEntry can be used by the transmission device 100 in different MPUs, the processing load of the transmission device 100 at the time of MPU creation can be reduced. Further, there is an advantage that the reception apparatus 200 does not need to refer to the parameter set in SampleEntry.

Alternatively, the transmission apparatus 100 may store 1 default parameter set in SampleEntry and store the parameter set referred to by the access unit in the sample data. In the conventional MP4, since the parameter set is generally stored in the SampleEntry, there is a possibility that the reception device may stop reproduction when the parameter set is not present in the SampleEntry. This problem can be solved by using the above method.

Alternatively, the transmission apparatus 100 may save the parameter set to the sample data only when the parameter set different from the default parameter set is used.

In addition, since both modes can save parameter sets into the SampleEntry, the transmitting device 100 can also always save parameter sets into the visual SampleEntry, from which the receiving device 200 always obtains parameter sets.

In the MMT standard, header information of MP4 such as Moov and Moof is transmitted as MPU metadata or movie clip metadata, but the transmission device 100 may not necessarily transmit MPU metadata and movie clip metadata. The receiving apparatus 200 may determine whether or not SPS and PPS are stored in the sample data based on services of ARIB (Association of Radio Industries and Businesses) standards, a type of resource, or whether or not MPU element transmission is performed.

Fig. 17 is a diagram showing an example of the case where data before a slice and data units are set to different data units.

In the example shown in FIG. 17, the data before the section and the data sizes of the sections 1 to 4 are Length #1 to Length #5, respectively. The values of the Fragmentation indicator, the Fragmentation counter, and Offset fields contained in the header of the MMT packet are shown in the figure.

Here, Offset is Offset information indicating a bit length (Offset) from the beginning of encoded data of a sample (access unit or picture) to which payload data belongs to the beginning byte of payload data (encoded data) included in the MMT packet. The value of Fragment counter is described as starting from a value obtained by subtracting 1 from the total number of fragments, but may start from another value.

Fig. 18 is a diagram showing an example of slicing data units. In the example shown in fig. 18, the slice 1 is divided into 3 slices and stored in the MMT packet #2 to the MMT packet #4, respectively. At this time, the data sizes of the respective slices are set to Length #2_1 to Length #2_3, respectively, and the values of the respective fields are as shown in the figure.

As described above, when a data unit such as a slice is set as a data unit, the start of an access unit and the start of a slice can be determined as follows based on the field value of the MMT header.

The beginning of the payload in a packet whose Offset value is 0 is the beginning of an access unit.

The value of Offset is a value other than 0, and the beginning of the payload of a packet whose Fragmentation index value is 00 or 01 is the beginning of a slice.

When neither data unit fragmentation nor packet loss occurs, the receiving apparatus 200 can determine the index number of the fragment stored in the MMT packet based on the number of fragments acquired after the start of the access unit is detected.

In addition, when the data unit of the data before the slice is fragmented, the reception apparatus 200 can detect the access unit and the beginning of the slice in the same manner.

When a packet loss occurs or when the SPS, PPS, and SEI included in the data before the slice are set to different data units, the reception apparatus 200 can also determine the start position of the slice or tile in the picture (access unit) by determining the MMT packet that stores the header data of the slice based on the analysis result of the MMT header and then analyzing the header of the slice. The amount of processing involved in the analysis of slice headers is small, and the processing load does not become a problem.

As described above, each of the plurality of encoded data of the plurality of slices is associated with a basic data unit (data unit), which is a unit of data stored in 1 or more packets, in a one-to-one manner. Further, each of the plurality of encoded data is stored in 1 or more MMT packets.

The header information of each MMT packet includes Fragmentation indicator (identification information) and Offset (Offset information).

The reception apparatus 200 determines the start of payload data included in a packet having header information including a Fragmentation indicator having a value of 00 or 01 as the start of encoded data of each slice. Specifically, the receiving apparatus 200 determines the start of payload data included in a packet having header information including Offset having a value other than 0 and a Fragmentation indicator having a value of 00 or 01 as the start of encoded data of each slice.

In the example of fig. 17, the start of the data unit is either the start of the access unit or the start of the slice, and the Fragmentation indicator has a value of 00 or 01. The receiving apparatus 200 can also determine which of the access unit delimiter and the slice the start of the data unit is by referring to the type of NAL unit, and can detect the start of the access unit or the start of the slice without referring to Offset.

As described above, when the transmitting apparatus 100 divides the data before the slice into a plurality of data units by packetizing the start of the NAL unit so as to start from the start of the payload of the MMT packet, the receiving apparatus 200 can detect the start of the access unit or the slice by analyzing the Fragmentation indicator and the NAL unit header. The type of NAL unit exists in the first byte of the NAL unit header. Therefore, the receiving apparatus 200 can acquire the type of NAL unit by additionally analyzing 1 byte data when analyzing the MMT packet header. In the case of audio, the receiving apparatus 200 may detect the beginning of the access unit, and may determine whether the value of the Fragmentation indicator is 00 or 01.

As described above, when the encoded data encoded so as to be divisionally decodable is stored in the PES packet of the MPEG-2TS, the transmission device 100 can use the data alignment descriptor. Hereinafter, an example of a method of storing encoded data in a PES packet will be described in detail.

For example, in HEVC, the transmission device 100 can indicate which of an access unit, a slice segment, and a tile is data stored in a PES packet by using a data alignment descriptor. The type of alignment in HEVC is specified as follows.

The type of alignment 8 represents a slice of HEVC. The type of alignment 9 represents a slice or access unit of HEVC. The type of alignment 12 represents a slice or tile of HEVC.

Therefore, the transmission device 100 can indicate that the data of the PES packet is either the slice or the pre-slice data by using, for example, type 9. Since a type indicating that the slice is not a slice is separately specified, the transmission device 100 may use a type indicating that the slice is not a slice.

The DTS and PTS included in the header of the PES packet are set only in the PES packet including the leading data of the access unit. Therefore, if the type is 9 and a field of DTS or PTS exists in the PES packet, the receiving apparatus 200 can determine that the entire access unit or the leading partition unit in the access unit is stored in the PES packet.

The transmitting apparatus 100 may use a field such as transport _ priority indicating the priority of a TS packet storing a PES packet containing the head data of an access unit so that the receiving apparatus 200 can distinguish the data included in the packet. The receiving apparatus 200 may determine the data included in the PES packet by analyzing whether or not the payload of the PES packet is an access unit delimiter. In addition, the data _ alignment _ indicator of the PES packet header indicates whether data is stored in the PES packet by these types. As long as the flag (data _ alignment _ indicator) is set to 1, the data held in the PES packet is guaranteed to be of the type indicated by the data alignment descriptor.

The transmission device 100 may use the data alignment descriptor only when PES packetizing is performed in units of divisible decoding such as slices. Accordingly, the receiving device 200 can determine that encoded data is PES-packetized in divisionally decodable units when the data alignment descriptor exists, and can determine that encoded data is PES-packetized in access unit units when the data alignment descriptor does not exist. In addition, when the data _ alignment _ indicator is set to 1 and no data alignment descriptor exists, the content of the PES packet in units of access units is specified in the MPEG-2TS standard.

If the data alignment descriptor is included in the PMT, the receiving device 200 can determine that the PES packet is performed in units capable of being divided and decoded, and can generate input data to each decoding unit based on the packetized units. When it is determined that parallel decoding of encoded data is necessary based on program information or information of another descriptor that the data alignment descriptor is not included in the PMT, the receiving apparatus 200 analyzes slice headers of slices and the like to generate input data to each decoding unit. When the encoded data can be decoded by the single decoding unit, the receiving apparatus 200 decodes the data of the entire access unit by the decoding unit. When information indicating whether or not the encoded data is composed of divisible decoding units such as slices and tiles is separately indicated by a descriptor of the PMT, the reception device 200 may determine whether or not the encoded data can be decoded in parallel based on the analysis result of the descriptor.

Since the DTS and PTS included in the header of the PES packet are set only in the PES packet including the leading data of the access unit, when the divided access units perform PES packetization, the 2 nd and subsequent PES packets do not include information indicating the DTS and PTS of the access unit. Therefore, when performing decoding processing in parallel, the decoding units 204A to 204D and the display unit 205 use the DTS and PTS stored in the header of the PES packet including the leading data of the access unit.

(embodiment mode 2)

In embodiment 2, a method of storing data in NAL size format in an MPU conforming to MP4 format in MMT will be described. In the following, a storage method to an MPU for MMT will be described as an example, but such a storage method can also be applied to DASH based on the MP4 format.

[ method of storing in MPU ]

In the MP4 format, a plurality of access units are collectively saved in 1 MP4 file. The MPU for MMT can save data of each media, including an arbitrary number of access units, into 1 MP4 file. Since the MPU is a unit that can decode independently, for example, an access unit in units of GOPs is stored in the MPU.

Fig. 19 is a diagram showing a configuration of an MPU. The MPU starts with ftyp, mmpu, and moov, which are collectively defined as MPU metadata. The moov stores initialization information and MMT track (hint track) common to files.

In the moof, initialization information and size of each sample or sub-sample, information (sample _ duration, sample _ size, sample _ composition _ time _ offset) that can specify Presentation Time (PTS) and Decoding Time (DTS), data _ offset indicating the position of data, and the like are stored.

Further, the plurality of access units are respectively saved as samples in mdat (mdat box). Data other than the samples among moof and mdat is defined as movie fragment metadata (hereinafter referred to as MF metadata), and sample data of mdat is defined as media data.

Fig. 20 is a diagram showing the structure of MF metadata. As shown in fig. 20, the MF metadata is more specifically composed of type (type), length (length), and data (data) of moof box (moof), and type (type) and length (length) of mdat box (mdat).

When storing an access unit in MP4 data, there are a mode in which parameter sets such as SPS and PPS of h.264 or h.265 can be stored as sample data and a mode in which the parameter sets cannot be stored.

Here, in the above-described mode that cannot be saved, the parameter set is saved to Decoder Specific Information of SampleEntry of moov. In the above-described mode capable of saving, the parameter set is included in the sample.

MPU metadata, MF metadata, and media data are stored in the MMT payload, and a slice type (FT) is stored in the header of the MMT payload as an identifier that can identify these data. FT-0 denotes MPU metadata, FT-1 denotes MF metadata, and FT-2 denotes media data.

In addition, although fig. 19 shows an example in which MPU metadata units and MF metadata units are stored as data units in the MMT payload, units such as ftyp, mmpu, moov, and moof may be stored as data units in the MMT payload. Likewise, in fig. 19, an example in which a sample unit is saved as a data unit into the MMT payload is illustrated. Alternatively, the data unit may be configured in units of samples or NAL units, and stored in the MMT payload in units of data units. Such data unit may be further stored in the MMT payload in units after fragmentation.

[ conventional Transmission method and problems ]

Conventionally, when a plurality of access units are packaged in MP4 format, moov and moof are created at the time when all samples stored in MP4 are ready.

When the MP4 format is transmitted in real time by broadcasting or the like, for example, if the samples stored in 1 MP4 file are GOP units, moov and moof are created after storing time samples of GOP units, and therefore, a delay associated with encapsulation occurs. Due to such encapsulation on the transmitting side, the end-to-end delay always extends the GOP unit amount of time. Accordingly, it is difficult to provide a service in real time, and particularly when live content is transmitted, deterioration of the service for viewers is caused.

Fig. 21 is a diagram for explaining a data transmission sequence. When the MMT is applied to broadcasting, as shown in fig. 21 (a), if the MMT packets are placed in the MMT packet in the order of the constituent MPUs and transmitted (the MMT packets are transmitted in the order of #1, #2, #3, #4, #5, and # 6), a delay due to encapsulation occurs in the transmission of the MMT packet.

In order to prevent this delay due to encapsulation, as shown in fig. 21 (b), MPU header information such as MPU metadata and MF metadata is not transmitted (packets #1 and #2 are not transmitted, and packets #3 to #6 are transmitted in this order). Further, the following method is considered: as shown in fig. 20 (c), the media data is transmitted without waiting for the MPU header information to be created, and the MPU header information is transmitted after the media data is transmitted (transmission is performed in the order of #3 to #6, #1, and # 2).

When the MPU header information is not transmitted, the receiving apparatus does not decode using the MPU header information, and when the MPU header information is transmitted after the comparison with the media data, the receiving apparatus waits for the acquisition of the MPU header information and then decodes.

However, in the conventional reception device conforming to MP4, decoding without MPU header information cannot be guaranteed. Further, when the receiving apparatus performs decoding by a special process without using the MPU header, the decoding process becomes complicated by the conventional transmission rule, and it is highly likely that real-time decoding becomes difficult. When the receiving apparatus waits for the acquisition of MPU header information and then performs decoding, the buffering of media data is necessary until the receiving apparatus acquires the header information, but the buffer model is not defined and decoding cannot be guaranteed.

Then, as shown in fig. 20 (d), the transmission device according to embodiment 2 stores only common information in the MPU metadata, thereby transmitting the MPU metadata prior to the media data. The transmission device according to embodiment 2 transmits the MF metadata delayed in generation to the media data. Accordingly, a transmission method or a reception method capable of guaranteeing decoding of media data is provided.

The following describes a reception method when each of the transmission methods shown in fig. 21 (a) to (d) is used.

In each transmission method shown in fig. 21, MPU metadata, MFU metadata, and media data are first structured in the order of MPU metadata, MFU metadata, and media data.

After the MPU data is configured, when the transmitting apparatus transmits data in the order of MPU metadata, MF metadata, and media data as shown in (a) of fig. 21, the receiving apparatus can decode by any one of the following methods (a-1) and (a-2).

(A-1) the receiving device acquires MPU header information (MPU metadata and MF metadata), and then decodes the media data using the MPU header information.

(A-2) the reception apparatus decodes the media data without using MPU header information.

Such methods all have the advantage that although a delay due to encapsulation occurs on the transmitting side, there is no need to buffer media data in order to acquire an MPU header in the receiving device. When buffering is not performed, a memory for buffering does not need to be mounted, and buffering delay does not occur. The method (a-1) is also applicable to a conventional receiving apparatus because it decodes using MPU header information.

When the transmitting apparatus transmits only media data as shown in fig. 21 (B), the receiving apparatus can perform decoding by the method (B-1) described below.

(B-1) the reception apparatus decodes the media data without using the MPU header information.

Note that, although not shown, when MPU metadata is transmitted prior to the transmission of the media data in fig. 21 (B), decoding can be performed by the following method (B-2).

(B-2) the reception apparatus decoding the media data using the MPU metadata.

Both the methods (B-1) and (B-2) have the advantages that no delay due to packaging occurs on the transmission side, and there is no need to buffer media data in order to acquire an MPU header. However, since the methods (B-1) and (B-2) do not decode the header information using the MPU, there is a possibility that special processing is required for the decoding.

When the transmitting apparatus transmits data in the order of media data, MPU metadata, and MF metadata as shown in fig. 21 (C), the receiving apparatus can decode by any one of the following methods (C-1) and (C-2).

(C-1) the reception device decodes the media data after acquiring MPU header information (MPU metadata and MF metadata).

(C-2) the receiving apparatus decodes the media data without using MPU header information.

When the method (C-1) is used, the media data needs to be buffered in order to acquire MPU header information. In contrast, when the method (C-2) is used, there is no need to perform buffering for acquiring MPU header information.

In addition, neither of the methods (C-1) and (C-2) described above causes a delay due to encapsulation on the transmission side. Further, since the method of (C-2) does not use MPU header information, there is a possibility that special processing is required.

When the transmitting apparatus transmits data in the order of MPU metadata, media data, and MF metadata as shown in fig. 21 (D), the receiving apparatus can decode by any one of the following methods (D-1) and (D-2).

(D-1) the receiving apparatus acquires the MF metadata after acquiring the MPU metadata, and then decodes the media data.

(D-2) the reception device decodes the media data without using the MF metadata after acquiring the MPU metadata.

In the case of the method (D-1), media data needs to be buffered in order to acquire MF metadata, and in the case of the method (D-2), buffering for acquiring MF metadata is not required.

Since the method of (D-2) does not perform decoding using MF metadata, there is a possibility that special processing is required.

As described above, there is an advantage that decoding can be performed even in the conventional MP4 reception device when decoding can be performed using the MPU metadata and the MF metadata.

In fig. 21, MPU data is composed in the order of MPU metadata, MFU metadata, and media data, and in moof, position information (offset) for each sample or subsample is determined based on the composition. The MF metadata also includes data (size or type of frame) other than the media data in the mdat frame.

Therefore, when the receiving apparatus determines the media data based on the MF metadata, the receiving apparatus reconstructs the data in the order when the MPU data is configured regardless of the order in which the data is transmitted, and then decodes the data using moov of the MPU metadata or moof of the MF metadata.

In fig. 21, the MPU data is configured in the order of the MPU metadata, MFU metadata, and media data, but the MPU data may be configured in a different order from that of fig. 21 to determine the position information (offset).

For example, the MPU data may be composed in the order of MPU metadata, media data, and MF metadata, and negative position information (offset) may be indicated in the MF metadata. In this case, regardless of the order of transmitting data, the receiving apparatus reconstructs data in the order in which the MPU data was configured on the transmitting side, and then decodes the data using moov or moof.

In addition, the transmitting apparatus may transmit, as signaling, information indicating an order in which the MPU data is configured, and the receiving apparatus may reconstruct the data based on the information transmitted as the signaling.

As described above, the receiving apparatus receives the packaged MPU metadata, the packaged media data (sample data), and the packaged MF metadata in this order as shown in fig. 21 (d). Here, MPU metadata is an example of the first metadata, and MF metadata is an example of the second metadata.

Next, the receiving device reconstructs MPU data (file in MP4 format) including the received MPU metadata, the received MF metadata, and the received sample data. Then, the sample data included in the reconstructed MPU data is decoded using the MPU metadata and the MF metadata. The MF metadata is metadata including data (for example, length stored in mbox) that can be generated only after sample data generation on the transmitting side.

More specifically, the operation of the receiving apparatus is performed by each component constituting the receiving apparatus. For example, the receiving apparatus includes a receiving unit that receives the data, a reconstruction unit that reconstructs the MPU data, and a decoding unit that decodes the MPU data. The receiving unit, the generating unit, and the decoding unit are each realized by a microcomputer, a processor, a dedicated circuit, or the like.

[ method of decoding without using header information ]

Next, a method of decoding without using header information will be described. Here, a method of decoding without using header information in a receiving apparatus regardless of whether or not header information is transmitted in a transmitting side will be described. That is, this method can be applied to any of the transmission methods described with reference to fig. 21. However, some decoding methods are applicable only to a specific transmission method.

Fig. 22 is a diagram showing an example of a method for decoding without using header information. In fig. 22, only the MMT payload containing only media data and the MMT packet are illustrated, and the MMT payload containing MPU metadata or MF metadata and the MMT packet are not illustrated. In the following description of fig. 22, it is assumed that media data belonging to the same MPU are continuously transmitted. Note that, although the case where a sample is stored in the payload as media data is described as an example, it is needless to say that in the following description of fig. 22, NAL units may be stored, or fragmented NAL units may be stored.

In order to decode media data, a receiving device first has to acquire initialization information required for decoding. In addition, if the medium is a video, the receiving apparatus must acquire initialization information for each sample, specify the starting position of the MPU which is a random access unit, and acquire the starting positions of the sample and NAL unit. In addition, the receiving apparatus needs to determine the Decoding Time (DTS) and Presentation Time (PTS) of the sample, respectively.

Thus, the receiving apparatus can perform decoding without using header information, for example, by using the following method. In addition, when NAL unit units or units obtained by slicing NAL units are stored in the payload, the following description may be given with "sample" replaced with "sample NAL unit".

< random Access (═ initial samples of a specific MPU) >

When the header information is not transmitted, the receiving apparatus has the following

methods

1 and 2 in order to specify the top sample of the MPU. In addition, method 3 can be used when header information is transmitted.

Method 1 the receiving device takes samples contained in an MMT packet whose RAP _ flag is 1' in the MMT packet header.

Method 2 the receiving device takes samples of 'sample number 0' in the MMT payload header.

[ method 3] when at least either one of MPU metadata and MF metadata is transmitted before and after media data, a receiving device acquires a sample included in an MMT payload in which a slice type (FT) in an MMT payload header has been switched to the media data.

In

methods

1 and 2, if a plurality of samples belonging to different MPUs are mixed in 1 payload, it is impossible to determine which NAL unit is a random access point (RAP _ flag is 1 or sample number is 0). Therefore, it is necessary to limit the samples of different MPUs to not coexist in 1 payload, or to limit the RAP _ flag to 1 when the last (or first) sample is a random access point when the samples of different MPUs coexist in 1 payload.

In order to acquire the start position of the NAL unit by the receiving apparatus, it is necessary to sequentially shift the read pointer of the data by the size amount of the NAL unit from the leading NAL unit of the sample.

When data is fragmented, the receiving apparatus can specify a data unit by referring to the fragment _ indicator or the fragment _ number.

< decision of DTS of sample >

The DTS of the sample is determined by

methods

1 and 2.

[ method 1] A reception device determines the DTS of a start sample based on a prediction structure. However, since this method requires analysis of encoded data and may be difficult to decode in real time, the following method 2 is preferable.

[ method 2] separately transmitting the DTS of the start sample, and the reception device acquires the DTS of the transmitted start sample. As a method of transmitting the DTS of the top sample, for example, a method of transmitting the DTS of the MPU top sample using the MMT-SI, a method of transmitting the DTS of each sample using the MMT header extension area, or the like is available. The DTS may be an absolute value or a relative value with respect to the PTS. In addition, the transmitting side may transmit, as a signaling, whether or not the DTS of the start sample is included.

In addition, in both method 1 and method 2, the DTS of the subsequent samples is calculated as a fixed frame rate.

As a method of storing the DTS of each sample in the packet header, there is a method of storing the DTS of the sample included in the MMT packet in the 32-bit NTP timestamp field in the MMT packet header, in addition to using the extension region. When the DTS cannot be expressed by the number of bits of 1 packet header (32 bits), the DTS may be expressed by a plurality of packet headers, or the DTS may be expressed by combining the NTP timestamp field of the packet header and the extension field. When DTS information is not included, it is regarded as a known value (e.g., ALL 0).

< PTS decision of sample >

The receiving apparatus obtains the PTS of the top sample from the MPU time stamp descriptor of each asset included in the MPU. The receiving apparatus calculates the subsequent sample PTS as a fixed frame rate using a parameter indicating the display order of the samples such as POC. In this way, in order to calculate the DTS and PTS without using header information, it is necessary to perform transmission based on a fixed frame rate.

When the MF metadata is transmitted, the receiving apparatus can calculate the absolute values of the DTS and PTS from the relative time information of the DTS or PTS with respect to the header sample indicated by the MF metadata and the absolute value of the time stamp of the MPU header sample indicated by the MPU time stamp descriptor.

When the DTS and PTS are calculated by encoded data analysis, the receiving device may calculate them by using SEI information included in the access unit.

Initialization information (parameter set) >

[ case of video ]

In the case of video, parameter sets are saved into the sample data. In addition, when MPU metadata and MF metadata are not transmitted, it is ensured that a parameter set necessary for decoding can be acquired only by referring to sample data.

As shown in fig. 21 (a) and (d), when MPU metadata is transmitted prior to media data, it may be defined that a parameter set is not stored in SampleEntry. In this case, the receiving device does not refer to the parameter set of SampleEntry but refers to only the parameter set within the sample.

When MPU metadata is transmitted prior to media data, a parameter set common to MPUs or a default parameter set may be stored in SampleEntry, and the receiving apparatus may refer to the parameter set of SampleEntry and the parameter set in a sample. By storing the parameter set in the SampleEntry, a conventional receiving apparatus that cannot reproduce the parameter set if the parameter set is not present in the SampleEntry can also perform decoding.

[ case of Audio ]

In the case of audio, the LATM header is required for decoding, and in MP4, the LATM header must be included in the Sample Entry (Sample Entry). However, when header information is not transmitted, it is difficult for the receiving apparatus to acquire the LATM header, and therefore the LATM header is separately included in control information such as SI. Additionally, the LATM header may also be incorporated into a message, table, or descriptor. In addition, the LATM head may be included in the sample.

The receiving apparatus acquires the LATM header from the SI or the like before decoding starts, and starts decoding of the audio. Alternatively, as shown in fig. 21 (a) and 21 (d), when transmitting MPU metadata prior to media data, the reception apparatus can receive the LATM header prior to the media data. Therefore, even when MPU metadata is transmitted prior to media data, decoding can be performed by a conventional receiving apparatus.

< other >

The transmission order or the type of transmission order may also be notified as control information of an MMT header or payload header, or an MPT or other table, message, descriptor, etc. The type of transmission sequence here is, for example, 4 types of transmission sequences shown in fig. 21 (a) to (d), and identifiers for identifying the types may be stored in positions that can be obtained before decoding starts.

The type of the transmission sequence may be different between audio and video, or may be common between audio and video. Specifically, for example, audio may be transmitted in the order of MPU metadata, MF metadata, and media data as shown in fig. 21 (a), and video may be transmitted in the order of MPU metadata, media data, and MF metadata as shown in fig. 21 (d).

By the above method, the receiving apparatus can perform decoding without using header information. When MPU metadata is transmitted prior to media data (fig. 21 (a) and 21 (d)), the conventional receiving apparatus can perform decoding.

In particular, by transmitting the MF metadata later than the media data ((d) of fig. 21), it is possible to decode the media data by the conventional receiving apparatus without delay due to the encapsulation.

[ constitution and operation of Transmission apparatus ]

Next, the configuration and operation of the transmission device will be described. Fig. 23 is a block diagram of a transmitting apparatus according to embodiment 2, and fig. 24 is a flowchart of a transmitting method according to embodiment 2.

As shown in fig. 23, the transmission device 15 includes an encoding unit 16, a multiplexing unit 17, and a transmission unit 18.

The encoding unit 16 encodes the video or audio to be encoded, for example, in accordance with h.265, to generate encoded data (S10).

The multiplexing unit 17 multiplexes (packetizes) the encoded data generated by the encoding unit 16 (S11). Specifically, the multiplexing unit 17 packages sample data, MPU metadata, and MF metadata constituting a file in the MP4 format. The sample data is data obtained by encoding a video signal or an audio signal, the MPU metadata is an example of the first metadata, and the MF metadata is an example of the second metadata. The first metadata and the second metadata are both metadata used for decoding sample data, but the difference is that the second metadata includes data that can be generated only after the sample data is generated.

Here, the data that can be generated only after the generation of sample data is, for example, data other than the sample data stored in mdat in the MP4 format (data in the header of mdat, i.e., type and length shown in fig. 20). Here, the second metadata may include a length as at least a part of the data.

The transmitting section 18 transmits the packaged MP4 format file (S12). The transmission unit 18 transmits the file in the MP4 format by the method shown in fig. 21 (d), for example. That is, the packaged MPU metadata, the packaged sample data, and the packaged MF metadata are transmitted in this order.

The encoding unit 16, the multiplexing unit 17, and the transmission unit 18 are each realized by a microcomputer, a processor, a dedicated circuit, or the like.

[ constitution of the receiving apparatus ]

Next, the configuration and operation of the receiving apparatus will be described. Fig. 25 is a block diagram of a receiving apparatus according to embodiment 2.

As shown in fig. 25, the reception device 20 includes a packet filtering unit 21, a transmission order type determining unit 22, a random access unit 23, a control information acquiring unit 24, a data acquiring unit 25, a PTS/DTS calculating unit 26, an initialization information acquiring unit 27, a decode command unit 28, a decoding unit 29, and a presentation unit 30.

[ action 1 of the receiving apparatus ]

First, an operation of the reception device 20 for specifying the MPU start position and NAL unit position when the medium is a video will be described. Fig. 26 is a flowchart of such an operation of the reception device 20. Here, it is assumed that the transmission order type of MPU data is stored in the SI information by the transmission device 15 (multiplexing unit 17).

First, the packet filtering unit 21 performs packet filtering on a received file. The transmission order type determination unit 22 analyzes the packet-filtered SI information and acquires the transmission order type of MPU data (S21).

Next, the transmission order type determination unit 22 determines (determines) whether or not the MPU header information (at least one of the MPU metadata and the MF metadata) is included in the packet-filtered data (S22). When the MPU header information is included (yes in S22), the random access unit 23 detects switching of the slice type of the MMT payload header to the media data, and specifies an MPU start sample (S23).

On the other hand, when the MPU header information is not included (no in S22), the random access unit 23 specifies an MPU start sample based on the RAP _ flag of the MMT header or the sample number of the MMT payload header (S24).

The transmission order type determination unit 22 determines whether or not the MF metadata is included in the packet-filtered data (S25). When determining that the MF metadata is included (yes at S25), the data acquisition unit 25 acquires a NAL unit by reading the NAL unit based on the sample included in the MF metadata, the offset of the sub-sample, and the size information (S26). On the other hand, when determining that the MF metadata is not included (no in S25), the data acquisition unit 25 acquires NAL units by sequentially reading data of the size of the NAL units from the leading NAL unit of the sample (S27).

If the receiving apparatus 20 determines in step S22 that the MPU header information is included, the MPU head sample may be determined by the process of step S24 instead of step S23. When it is determined that the MPU header information is included, the process of step S23 and the process of step S24 may be used in combination.

If the reception device 20 determines in step S25 that MF metadata is included, it may acquire a NAL unit by the processing of step S27 without the processing of step S26. When it is determined that the MF metadata is included, the process of step S23 and the process of step S24 may be used in combination.

It is assumed that the MF metadata is included in the media data and the MF data is transmitted later in step S25. In this case, the receiving apparatus 20 may buffer the media data, wait until the MF metadata is acquired, and then perform the process of step S26, or the receiving apparatus 20 may determine whether or not to perform the process of step S27 without waiting for the MF metadata to be acquired.

For example, the receiving apparatus 20 may determine whether or not to wait for the MF metadata to be acquired based on whether or not a buffer having a buffer size capable of buffering the media data is held. The reception device 20 may determine whether or not to wait for the MF metadata to be acquired based on whether or not the end-to-end delay is small. The receiving apparatus 20 may perform the decoding process mainly by the process of step S26, and may use the process of step S27 when a process pattern such as a packet loss occurs.

When the transmission order type is determined in advance, step S22 and step S26 may be omitted, and in this case, the reception device 20 may determine a method of determining the MPU start sample and a method of determining the NAL unit, taking into account the buffer size and the end-to-end delay.

When the transmission sequence type is known in advance, the transmission sequence type determination unit 22 is not required in the reception device 20.

Further, although not described in fig. 26, the decode command unit 28 outputs the data acquired by the data acquisition unit to the decoding unit 29 based on the PTS and DTS calculated by the PTS/DTS calculation unit 26 and the initialization information acquired by the initialization information acquisition unit 27. The decoding unit 29 decodes the data, and the presentation unit 30 presents the decoded data.

[ action 2 of the receiving apparatus ]

Next, an operation in which the reception device 20 acquires initialization information based on the transmission order type and decodes media data based on the initialization information will be described. Fig. 27 is a flowchart of such an operation.

First, the packet filtering unit 21 performs packet filtering on a received file. The transmission sequence type determination unit 22 analyzes the packet-filtered SI information to acquire a transmission sequence type (S301).

Next, the transmission order type determination unit 22 determines whether MPU metadata is transmitted (S302). When it is determined that the MPU metadata is transmitted (yes in S302), the transmission order type determination unit 22 determines whether the MPU metadata is transmitted prior to the media data based on the result of the analysis in step S301 (S303). When the MPU metadata is transmitted prior to the media data (yes in S303), the initialization information acquisition unit 27 decodes the media data based on the common initialization information included in the MPU metadata and the initialization information of the sample data (S304).

On the other hand, when it is determined that the MPU metadata is transmitted after the media data (no in S303), the data acquisition unit 25 buffers the media data until the MPU metadata is acquired (S305), and performs the process of step S304 after acquiring the MPU metadata.

If it is determined in step S302 that MPU metadata has not been transmitted (no in S302), the initialization information acquisition unit 27 decodes the media data based on only the initialization information of the sample data (S306).

If the decoding of the media data is guaranteed only when the transmission side is based on the initialization information of the sample data, the process of step S306 is used without performing the process based on the determination of step S302 and step S303.

In addition, the receiving apparatus 20 may determine whether or not to buffer the media data before step S305. In this case, the reception device 20 transitions to the process of step S305 when determining to buffer the media data, and transitions to the process of step S306 when determining not to buffer the media data. The determination of whether or not to buffer media data may be made based on the buffer size and the occupancy of the receiving apparatus 20, or may be made in consideration of the end-to-end delay, for example, by selecting a small end-to-end delay.

[ action 3 of the receiving apparatus ]

Here, details of a transmission method or a reception method when MF metadata is transmitted later than media data (fig. 21 (c) and fig. 21 (d)) will be described. Hereinafter, the case of fig. 21 (d) will be described as an example. Note that, only the method of (d) in fig. 21 is used for transmission, and the signaling of the transmission order type is not performed.

As described above, as shown in fig. 21 (d), when data is transmitted in the order of MPU metadata, media data, and MF metadata, the following 2 decoding methods can be performed:

(D-1) the reception device 20 acquires the MF metadata after acquiring the MPU metadata, and then decodes the media data.

(D-2) the reception device 20, after acquiring the MPU metadata, decodes the media data without using the MF metadata.

Here, in D-1, buffering of media data for obtaining MF metadata is required, but decoding can be performed by a receiving apparatus conforming to MP4 in the related art because decoding can be performed using MPU header information. In D-2, although it is not necessary to buffer media data for obtaining MF metadata, decoding cannot be performed using MF metadata, and therefore special processing is required for decoding.

In addition, in the method of fig. 21 (d), the MF metadata is transmitted after the media data, and thus there is an advantage in that a delay due to encapsulation does not occur, thereby reducing an end-to-end delay.

The reception device 20 can select the above 2 decoding methods according to the capability of the reception device 20 and the quality of service provided by the reception device 20.

The transmitting device 15 must ensure that decoding can be performed with reduced occurrence of buffer overflow or underflow during the decoding operation of the receiving device 20. As an element for defining a decoder model when decoding is performed by the method of D-1, for example, the following parameters can be used.

Buffer size for reconstitution of MPU (MPU buffer)

For example, for the buffer size ═ maximum rate × maximum MPU time × α, maximum rate (rate) means the rank of encoded data, the upper limit rate of the rank + overhead of the MPU header. The maximum MPU time is the maximum time length of a GOP when 1MPU is 1GOP (video).

Here, the audio may be in units of GOP common to the video or in units of other units. α is a margin for not causing overflow, and may be multiplied by the maximum rate × the maximum MPU time or added. When multiplied, alpha is more than or equal to 1, and when added, alpha is more than or equal to 0.

The upper limit of the decode delay time from the input of data to the MPU buffer until the decoding is performed. (TSTD _ delay in STD of MPEG-TS)

For example, in transmission, the DTS is set so that the acquisition completion time of MPU data of the receiver is not more than the DTS, taking into account the upper limit values of the maximum MPU time and the decoding delay time.

The transmitter 15 may also assign a DTS and a PTS in accordance with a decoder model when decoding is performed by the method of D-1. Accordingly, the transmitting device 15 can transmit auxiliary information necessary for decoding by the D-2 method while ensuring that the receiving device performing decoding by the D-1 method performs the decoding.

For example, the transmitting device 15 can guarantee the operation of the receiving device that performs decoding by the D-2 method by transmitting, as signaling, the pre-buffering time of the decoder buffer when performing decoding by the D-2 method.

The pre-buffering time may be included in SI control information such as a message, a table, a descriptor, etc., or may be included in a header of the MMT packet or the MMT payload. Furthermore, the SEI within the encoded data may also be covered. The DTS and PTS for decoding by the D-1 method may be stored in the MPU time stamp descriptor and the SampleEntry, and the DTS and PTS or the pre-buffering time for decoding by the D-2 method may be described in the SEI.

The reception device 20 may select the decoding method D-1 when the reception device 20 corresponds to only the decoding operation conforming to MP4 using the MPU header, and may select either one of the methods when the reception device 20 corresponds to both of D-1 and D-2.

The transmitter 15 may assign a DTS and a PTS to ensure a decoding operation of one side (D-1 in the present description), and may transmit auxiliary information for assisting the decoding operation of the other side.

Further, comparing the case of the method using D-2 with the case of the method using D-1, there is a high possibility that the end-to-end delay becomes large due to the delay due to the pre-buffering of the MF metadata. Therefore, the receiving apparatus 20 may select the method of D-2 for decoding when it is desired to reduce the end-to-end delay. For example, the receiving apparatus 20 may always use the D-2 method when it is desired to always reduce the end-to-end delay. The receiving apparatus 20 may use the method of D-2 only when it operates in a low-delay presentation mode in which presentation is performed with low delay, such as live content, channel selection, and channel switching (zapping).

Fig. 28 is a flow chart of such a receiving method.

First, the reception device 20 receives the MMT packet and acquires MPU data (S401). Then, the receiving apparatus 20 (transmission order type discriminating unit 22) determines whether or not to present the program in the low-delay presentation mode (S402).

When the program is not presented in the low-latency presentation mode (no in S402), the receiving apparatus 20 (the random access unit 23 and the initialization information acquisition unit 27) acquires random access and initialization information using header information (S405). The reception device 20(PTS/DTS calculating unit 26, decode command unit 28, decoding unit 29, presentation unit 30) performs decoding and presentation processing based on the PTS and DTS provided from the transmission side (S406).

On the other hand, when presenting the program in the low-latency presentation mode (yes in S402), the receiving apparatus 20 (the random access unit 23 and the initialization information acquisition unit 27) acquires random access and initialization information by a decoding method not using header information (S403). The reception device 20 also performs decoding and presentation processing based on auxiliary information for decoding without using the PTS, DTS, and header information provided by the transmission side (S404). In step S403 and step S404, the processing may be performed using MPU metadata.

[ transmitting/receiving method using auxiliary data ]

The above describes the transmission/reception operation when MF metadata is transmitted later than media data ((c) of fig. 21 and (d) of fig. 21). Next, a method will be described in which the transmitting device 15 transmits auxiliary data having a function of part of the MF metadata, so that decoding can be started earlier and end-to-end delay can be reduced. Here, an example in which the auxiliary data is further transmitted based on the transmission method shown in fig. 21 (d) will be described, but the method using the auxiliary data can also be applied to the transmission methods shown in fig. 21 (a) to (c).

Fig. 29 (a) is a diagram showing an MMT packet transmitted by the method shown in fig. 21 (d). That is, data is transmitted in the order of MPU metadata, media data, and MF metadata.

Here, sample #1, sample #2, sample #3, and sample #4 are samples included in the media data. Although the media data is stored in the MMT packet in sample units, the media data may be stored in the MMT packet in NAL unit units or in units obtained by dividing NAL units. In addition, sometimes multiple NAL units are aggregated and stored into MMT packets.

As described in D-1 above, in the method shown in (D) of fig. 21, that is, when data is transmitted in the order of MPU metadata, media data, and MF metadata, there is a method of acquiring the MPU metadata, then acquiring the MF metadata, and then decoding the media data. In the method of D-1, although it is necessary to buffer media data for obtaining MF metadata, the method of D-1 has an advantage that it can be applied to a receiving apparatus conforming to MP4 in the related art because decoding is performed using MPU header information. On the other hand, there is a disadvantage that the reception apparatus 20 must wait until the MF metadata is acquired to restart decoding.

In contrast, in the method using the auxiliary data, as shown in fig. 29 (b), the auxiliary data is transmitted prior to the MF metadata.

The MF metadata includes information indicating the DTS, PTS, offset, and size of all samples included in the movie fragment. On the other hand, the auxiliary data includes information indicating the DTS, PTS, offset, and size of some of the samples included in the movie fragment.

For example, the MF metadata includes information of all samples (sample #1 to sample #4), whereas the auxiliary data includes information of a part of samples (sample #1 to sample # 2).

In the case shown in fig. 29 (b), since the samples #1 and #2 can be decoded by using the auxiliary data, the End-to-End delay is reduced compared to the transmission method of D-1. In addition, the auxiliary data may be included by combining the information of the samples, or the auxiliary data may be repeatedly transmitted.

For example, in fig. 29 (c), the transmission device 15 includes the information of sample #1 in the auxiliary information when the auxiliary information is transmitted at the timing of a, and includes the information of sample #1 and sample #2 in the auxiliary information when the auxiliary information is transmitted at the timing of B. When the transmission device 15 transmits the auxiliary information at the timing of C, the auxiliary information includes information of sample #1, sample #2, and sample # 3.

The MF metadata includes information about sample #1, sample #2, sample #3, and sample #4 (information about all samples in a movie fragment).

The assistance data does not necessarily need to be sent immediately after generation.

In addition, in the header of the MMT packet or MMT payload, a type indicating that auxiliary data is stored is specified.

For example, when auxiliary data is saved in the MMT payload using the MPU mode, a data type indicating that it is auxiliary data is specified as a fragment _ type field value (for example, FT ═ 3). The auxiliary data may be data based on the structure of moof, or may have another structure.

When the auxiliary data is saved as a control signal (descriptor, table, message) in the MMT payload, a descriptor tag, table ID, message ID, and the like indicating that the auxiliary data is specified.

In addition, PTS or DTS may be stored in the header of the MMT packet or MMT payload.

[ example of generating auxiliary data ]

An example in which the transmitting apparatus generates the auxiliary data based on the structure of moof will be described below. Fig. 30 is a diagram for explaining an example in which the transmission apparatus generates the auxiliary data based on the structure of moof.

In a general MP4, moof is produced for movie clips as shown in fig. 20. The moof includes information indicating the DTS, PTS, offset, and size of samples included in the movie fragment.

Here, the transmission device 15 constructs an MP4(MP4 file) using only a part of the sample data constituting the MPU, and generates auxiliary data.

For example, as shown in fig. 30 (a), the transmission device 15 generates MP4 using only sample #1 among samples #1 to #4 constituting the MPU, with the header of moof + mdat as auxiliary data.

Next, as shown in fig. 30 (b), the transmission device 15 generates MP4 using the sample #1 and the sample #2 among the samples #1 to #4 constituting the MPU, with the head of moof + mdat as the next auxiliary data.

Next, as shown in fig. 30 (c), the transmission device 15 generates MP4 using the sample #1, the sample #2, and the sample #3 among the samples #1 to #4 constituting the MPU, with the head of moof + mdat as the next auxiliary data.

Next, as shown in fig. 30 (d), the transmission device 15 generates all MPs 4 in samples #1 to #4 constituting the MPU, with the head of moof + mdat being movie fragment metadata.

In addition, although the transmission device 15 generates the auxiliary data for each sample, the auxiliary data may be generated for each N samples. The value of N is an arbitrary number, and for example, when transmitting auxiliary data M times when transmitting 1 MPU, N may be equal to full sample/M.

In addition, the information indicating the shift of the sample in moof may be a shift value after ensuring that the sample entry area of the subsequent number of samples is a NULL area.

In addition, the auxiliary data may be generated so as to be a structure in which the MF metadata is sliced.

Example of receiving operation using auxiliary data

Reception of the assistance data generated as described in fig. 30 will be described. Fig. 31 is a diagram for explaining reception of assistance data. In fig. 31 (a), the number of samples constituting the MPU is 30, and the auxiliary data is generated and transmitted for each 10 samples.

In fig. 30 (a), the auxiliary data #1 includes sample information of samples #1 to #10, the auxiliary data #2 includes sample information of samples #1 to #20, and the MF metadata includes sample information of samples #1 to # 30.

Samples #1 to #10, samples #11 to #20, and samples #21 to #30 are stored in 1 MMT payload, but may be stored in sample units or NAL units, or may be stored in units of fragmentation or aggregation.

The reception device 20 receives packets of MPU elements, samples, MF elements, and auxiliary data, respectively.

The receiving device 20 concatenates the sample data in the order of reception (backward), and updates the current auxiliary data after receiving the latest auxiliary data. In addition, the receiving apparatus 20 can constitute a complete MPU by replacing the auxiliary data with the MF metadata at the end.

When receiving the auxiliary data #1, the receiving device 20 concatenates the data as shown in the upper stage of fig. 31 (b), thereby configuring an MP 4. Accordingly, the receiving apparatus 20 can analyze the samples #1 to #10 using the MPU metadata and the information of the auxiliary data #1, and can perform decoding based on the information of the PTS, DTS, offset, and size included in the auxiliary data.

When receiving the auxiliary data #2, the receiving apparatus 20 concatenates the data as in the middle of fig. 31 (b), thereby configuring the MP 4. Accordingly, the receiving apparatus 20 can analyze the samples #1 to #20 using the MPU metadata and the information of the auxiliary data #2, and can perform decoding based on the information of PTS, DTS, offset, and size included in the auxiliary data.

When receiving the MF metadata, the receiving apparatus 20 concatenates the data as shown in the lower stage of fig. 31 (b), thereby configuring the MP 4. Accordingly, the receiving apparatus 20 can analyze the samples #1 to #30 using the MPU metadata and the MF metadata, and can decode based on information of PTS, DTS, offset, and size included in the MF metadata.

When there is no auxiliary data, the reception device 20 can start acquiring information of the sample only after receiving the MF metadata, and therefore, it is necessary to start decoding after receiving the MF metadata. However, since the transmission device 15 generates and transmits the auxiliary data, the reception device 20 can acquire the information of the sample using the auxiliary data without waiting for the reception of the MF metadata, and thus can advance the decoding start time. Note that, by the transmitter 15 generating the moof-based auxiliary data described with reference to fig. 30, the receiver 20 can directly analyze the auxiliary data by the analyzer (parser) of the conventional MP 4.

Further, the newly generated auxiliary data or MF metadata contains information of a sample that is duplicated with the auxiliary data transmitted in the past. Therefore, even when the past auxiliary data cannot be acquired due to a packet loss or the like, the MP4 can be reconstructed by using the newly acquired auxiliary data or MF metadata, and the information (PTS, DTS, size, and offset) of the sample can be acquired.

In addition, the auxiliary data does not necessarily need to include information of past sample data. For example, auxiliary data #1 may correspond to sample data #1- #10, and auxiliary data #2 may correspond to sample data #11- # 20. For example, as shown in fig. 31 (c), the transmission device 15 may sequentially transmit the full MF metadata as data unit and the unit fragmented from the data unit as auxiliary data.

In order to cope with packet loss, the transmission device 15 may repeatedly transmit the auxiliary data or the MF metadata.

The MMT packet and MMT payload storing the auxiliary data include an MPU serial number and a resource ID, as in the case of MPU metadata, MF metadata, and sample data.

The above-described reception operation using the auxiliary data will be described with reference to the flowchart of fig. 32. Fig. 32 is a flowchart of the receiving operation using the auxiliary data.

First, the receiving apparatus 20 receives the MMT packet and parses the packet header or the payload header (S501). Next, the reception device 20 analyzes whether the fragmentation type is auxiliary data or MF metadata (S502), and if the fragmentation type is auxiliary data, updates the auxiliary data over the past (S503). At this time, if there is no past auxiliary data of the same MPU, the reception device 20 directly sets the received auxiliary data as new auxiliary data. The receiving device 20 then acquires and decodes a sample based on the MPU metadata, auxiliary data, and sample data (S507).

On the other hand, if the slice type is MF metadata, the receiving apparatus 20 overwrites the past auxiliary data with MF metadata in step S505 (S505). The receiving device 20 then acquires and decodes a sample in the form of a complete MPU based on the MPU metadata, MF metadata, and sample data (S506).

Although not shown in fig. 32, in step S502, the receiving apparatus 20 stores data in the buffer when the fragmentation type is MPU metadata, and stores data concatenated backward for each sample in the buffer when the fragmentation type is sample data.

When the auxiliary data cannot be acquired due to a packet loss, the reception device 20 can decode the sample by overwriting the latest auxiliary data or by using the past auxiliary data.

The transmission cycle and the number of times of transmission of the auxiliary data may be predetermined values. Information on the transmission cycle or number (count, countdown) may be transmitted together with the data. For example, the data unit header may store a time stamp such as a transmission cycle, a transmission count, and initial _ cpb _ removal _ delay.

The auxiliary data including information of the first sample of the MPU is transmitted 1 or more times before initial _ CPB _ removal _ delay, and thus the CPB buffer model can be satisfied. At this time, in the MPU timestamp descriptor, a value based on picture timing SEI is saved.

The transmission method using such an auxiliary data reception operation is not limited to the MMT method, and can be applied to a case where a packet having an ISOBMFF file format is streamed, such as MPEG-DASH.

[ Transmission method when 1 MPU is constituted by a plurality of movie fragments ]

In the above description of fig. 19 and the following, 1 MPU is configured by 1 movie segment, and here, a case where 1 MPU is configured by a plurality of movie segments is described. Fig. 33 is a diagram showing a configuration of an MPU configured by a plurality of movie fragments.

In fig. 33, samples (#1- #6) stored in 1 MPU are stored in 2 movie segments. The 1 st movie fragment is generated based on samples #1- #3, generating the corresponding moof box. Movie No. 2 slices are generated based on samples #4- #6, generating the corresponding moof boxes.

The headers of moof box and mdat box in the 1 st movie fragment are saved as movie fragment metadata #1 into the MMT payload and MMT package. On the other hand, the headers of moof box and mdat box in the 2 nd movie fragment are saved as movie fragment metadata #2 in the MMT payload and MMT package. In fig. 33, the MMT payload storing the movie fragment metadata is shaded.

The number of samples constituting the MPU and the number of samples constituting the movie clip are arbitrary. For example, 2 movie slices may be configured by setting the number of samples constituting the MPU to the number of GOP-unit samples and setting one-half of the number of GOP-unit samples as movie slices.

Note that, although an example is shown in which 2 movie fragments (moof box and mdat box) are included in 1 MPU, 3 or more movie fragments included in 1 MPU may be included instead of 2. The samples stored in the movie fragment may be divided into an arbitrary number of samples, instead of the number of equally divided samples.

In fig. 33, the MPU metadata unit and the MF metadata unit are stored as data units in the MMT payload, respectively. However, the transmitting device 15 may store the units of ftyp, mmpu, moov, moof, and the like as data units in the MMT payload in units of data units, or may store the data units in units of data unit fragments in the MMT payload. The transmission device 15 may store the MMT payload in units of aggregated data units.

Further, in fig. 33, samples are held in MMT payload in sample units. However, the transmission device 15 may constitute a data unit not by sample unit but by NAL unit or by a unit obtained by combining a plurality of NAL units, and store the data unit in the MMT payload. The transmission device 15 may store the data units in units of fragments in the MMT payload, or may store the data units in units of aggregates in the MMT payload.

In fig. 33, the MPU is configured by moof #1, mdat #1, moof #2, and mdat #2 in this order, and offset is given to moof #1 as mdat #1 to which the corresponding values are added. However, the offset may be assigned as mdat #1 before moof # 1. However, in this case, movie fragment metadata cannot be generated in the form of moof + mdat, and the headers of moof and mdat are separately transmitted.

Next, a transmission procedure of the MMT packet when transmitting the MPU having the configuration described in fig. 33 will be described. Fig. 34 is a diagram for explaining the transmission sequence of MMT packets.

Fig. 34 (a) shows a transmission sequence when the MMT packet is transmitted in the order of the constituent MPUs shown in fig. 33. Specifically, (a) of fig. 34 shows an example in which MPU elements, MF elements #1, media data #1 (samples #1 to #3), MF elements #2, and media data #2 (samples #4 to #6) are transmitted in this order.

Fig. 34 (b) shows an example in which the MPU element, media data #1 (samples #1 to #3), MF element #1, media data #2 (samples #4 to #6), and MF element #2 are transmitted in this order.

Fig. 34 (c) shows an example in which media data #1 (samples #1 to #3), MPU elements, MF elements #1, media data #2 (samples #4 to #6), and MF elements #2 are transmitted in this order.

MF tuple #1 is generated with samples #1- #3, and MF tuple #2 is generated with samples #4- # 6. Therefore, when the transmission method of fig. 34 (a) is used, a delay due to encapsulation occurs in transmission of sample data.

In contrast, when the transmission methods of fig. 34 (b) and 34 (c) are used, samples can be transmitted without waiting for the generation of an MF element, and thus the end-to-end delay can be reduced without causing a delay due to encapsulation.

In the transmission sequence in fig. 34 (a), since 1 MPU is divided into a plurality of movie fragments, and the number of samples stored in the MF element is reduced compared to the case of fig. 19, the delay amount due to packaging can be reduced compared to the case of fig. 19.

In addition to the method described here, for example, the transmission device 15 may connect the MF element #1 and MF element #2 and transmit them together at the end of the MPU. In this case, the MF elements of different movie fragments may be aggregated and stored in 1 MMT payload. Alternatively, the MF elements of different MPUs may be aggregated and stored in the MMT payload.

[ receiving method when 1 MPU is constituted by a plurality of movie fragments ]

Here, an operation example of the reception device 20 that receives and decodes the MMT packet transmitted in the transmission order described in fig. 34 (b) will be described. Fig. 35 and 36 are diagrams for explaining such an operation example.

The reception device 20 receives the MMT packet including the MPU element, the sample, and the MF element, which is transmitted in the transmission order shown in fig. 35, respectively. The sample data is concatenated in the order of reception.

The receiving apparatus 20 concatenates the data at T1, which is the time when the MF element #1 is received, as shown in (1) of fig. 36, and configures an MP 4. Accordingly, the receiving apparatus 20 can acquire samples #1 to #3 based on the MPU metadata and the information of the MF element #1, and can decode based on the information of PTS, DTS, offset, and size included in the MF element.

Further, the reception device 20 concatenates the data as shown in (2) of fig. 36 at T2, which is the time when the MF element #2 is received, and configures the MP 4. Accordingly, the receiving apparatus 20 can acquire samples #4 to #6 based on the MPU metadata and the information of the MF element #2, and can decode based on the information of the PTS, DTS, offset, and size of the MF element. The reception device 20 may also acquire samples #1 to #6 based on the information of MF metadata #1 and MF metadata #2 by concatenating data to form MP4 as shown in fig. 36 (3).

By dividing 1 MPU into a plurality of movie fragments, the time until the first MF element among the MPUs is acquired is shortened, and thus the decoding start time can be advanced. Further, the buffer size for accumulating samples before decoding can be reduced.

The transmission device 15 may set the division unit of the movie fragment so that the time from the transmission (or reception) of the first sample in the movie fragment to the transmission (or reception) of the MF element corresponding to the movie fragment is shorter than the time of initial _ cpb _ removal _ delay specified by the encoder. By setting in this manner, the reception buffer can be matched to the cpb buffer, and low-delay decoding can be achieved. In this case, the absolute time based on initial _ cpb _ remove _ delay can be used for PTS and DTS.

The transmission device 15 may divide movie fragments at equal intervals or may divide subsequent movie fragments at intervals shorter than the previous movie fragment. Accordingly, the receiving apparatus 20 can receive the MF element including the information of the sample before the sample is decoded without fail, and can perform continuous decoding.

The following 2 methods can be used to calculate the absolute time of PTS and DTS.

(1) The absolute time of the PTS and DTS is determined based on the reception time (T1 or T2) of MF element #1 or MF element #2 and the relative time of the PTS and DTS included in the MF element.

(2) The absolute time of the PTS and DTS is determined based on the absolute time of signaling transmission from the transmitting side such as an MPU timestamp descriptor, and the relative time of the PTS and DTS included in the MF element.

The absolute time at which the (2-a) transmission device 15 transmits the signaling may be an absolute time calculated based on initial _ cpb _ removal _ delay specified by the encoder.

The absolute time at which the (2-B) transmission device 15 transmits the signaling may be an absolute time calculated based on a predicted value of the reception time of the MF element.

Further, MF stream #1 and MF stream #2 may be repeatedly transmitted. By repeatedly transmitting MF element #1 and MF element #2, the receiving apparatus 20 can acquire an MF element again even when the MF element cannot be acquired due to a packet loss or the like.

An identifier indicating the order of movie fragments can be saved in the payload header of the MFU containing the samples constituting the movie fragment. On the other hand, an identifier indicating the order of MF elements constituting a movie fragment is not included in the MMT payload. Accordingly, the reception apparatus 20 recognizes the order of MF elements through packet _ sequence _ number. Alternatively, the transmitting device 15 may store the identifier indicating that the MF element belongs to the first movie fragment in the control information (message, table, descriptor), MMT header, MMT payload header, or data unit header as a signaling.

The transmitting device 15 may transmit the MPU element, the MF element, and the sample in a predetermined transmission order determined in advance, and the receiving device 20 may perform the receiving process based on the predetermined transmission order determined in advance. The transmission device 15 may transmit the transmission order as signaling, and the reception device 20 may select (determine) the reception process based on the signaling information.

The reception method described above will be described with reference to fig. 37. Fig. 37 is a flowchart illustrating the operation of the receiving method described with reference to fig. 35 and 36.

First, the reception device 20 determines (identifies) whether the data included in the payload is MPU metadata, MF metadata, or sample data (MFU) based on the slice type indicated by the MMT payload (S601, S602). When the data is sample data, the receiving apparatus 20 buffers the sample, and waits for the MF metadata corresponding to the sample to be received and decoding to be started (S603).

On the other hand, if the data is MF metadata in step S602, the reception device 20 acquires sample information (PTS, DTS, position information, and size) from the MF metadata, acquires a sample based on the acquired sample information, and decodes and presents the sample based on the PTS and DTS (S604).

In addition, although not shown, if the data is MPU metadata, initialization information necessary for decoding is included in the MPU metadata. Therefore, the reception device 20 accumulates the initialization information and uses it for decoding sample data in step S604.

When the receiving apparatus 20 stores the received data (MPU metadata, MF metadata, and sample data) of the MPUs in the storing apparatus, the data is rearranged into the configuration of the MPUs described with reference to fig. 19 or 33 and then stored.

In the transmitting side, the MMT packet is given a packet sequence number to a packet having the same packet ID. In this case, the MMT packet including MPU metadata, MF metadata, and sample data may be rearranged in the transmission order and then given a packet sequence number, or may be given a packet sequence number in the order before rearrangement.

When the packet sequence numbers are given in the order before the rearrangement, the receiving apparatus 20 can rearrange the data into the constituent order of the MPUs based on the packet sequence numbers, and accumulation becomes easy.

[ method of detecting the beginning of an Access Unit and the beginning of a slice ]

A method of detecting the start of an access unit or the start of a slice segment based on information of an MMT header and an MMT payload header will be described.

Here, 2 examples are shown of a case where non-VCL NAL units (access unit delimiter, VPS, SPS, PPS, SEI, and the like) are collectively stored as data units in the MMT payload, and a case where non-VCL NAL units are stored as data units in 1 MMT payload by aggregating the data units.

In the case of fig. 38, the beginning of an access unit is an MMT packet whose fragment _ type value is MFU, and is the beginning data of an MMT payload containing a data unit whose aggregation _ flag value is 1 and whose offset value is 0. At this time, the Fragmentation _ indicator value is 0.

In the case of fig. 38, the start of a slice segment is an MMT packet whose fragment _ type value is MFU, and is the start data of an MMT payload whose aggregation _ flag value is 0 and whose fragmentation _ indicator value is 00 or 01.

Fig. 39 is a diagram showing a case where non-VCL NAL units are grouped as data units. In addition, the field value of the packet header is as shown in fig. 17 (or fig. 18).

In the case of fig. 39, with respect to the head of an access unit, the head data of the payload in a packet whose Offset value is 0 is the head of the access unit.

In the case of fig. 39, regarding the beginning of the clip, the beginning data of the payload of the packet whose Offset value is a value other than 0 and whose Fragmentation indicator value is 00 or 01 is the beginning of the clip.

[ receiving process when packet loss occurs ]

In general, when transmitting data in the MP4 format in an environment where packet loss occurs, the reception apparatus 20 recovers the packet by means of Application Layer FEC (Application Layer FEC) or packet retransmission control.

However, when packet loss occurs when AL-FEC is not used in a stream such as broadcast, the packet cannot be recovered.

The receiving apparatus 20 needs to restart decoding of video and audio after data is lost due to packet loss. For this reason, the receiving apparatus 20 needs to detect the beginning of an access unit or NAL unit and start decoding from the beginning of the access unit or NAL unit.

However, since the start of the NAL unit of MP4 format does not include a start code, the reception apparatus 20 cannot detect the start of an access unit or NAL unit even when parsing the stream.

Fig. 40 is a flowchart of the operation of the receiving apparatus 20 when a packet loss occurs.

The reception apparatus 20 detects packet loss by using a packet sequence number, a packet counter, or a Fragment counter of a header of the MMT packet or the MMT payload (S701), and determines which packet is lost according to a context (S702).

When the reception device 20 determines that packet loss has not occurred (no in S702), it constructs an MP4 file and decodes an access unit or NAL unit (S703).

When determining that packet loss has occurred (yes in S702), the reception device 20 generates NAL units corresponding to the NAL units for which packet loss occurs from dummy (dummy) data, and constructs an MP4 file (S704). The receiving apparatus 20 indicates the type of the NAL unit as dummy data when the dummy data is added to the NAL unit.

The receiving apparatus 20 detects the head of the next access unit or NAL unit by the method described with reference to fig. 17, 18, 38, and 39, inputs the detected head data to the decoder from the head data, and can restart decoding (S705).

When a packet loss occurs, the receiving apparatus 20 may restart decoding from the beginning of the access unit and the NAL unit based on the information detected from the packet header, or may restart decoding from the beginning of the access unit and the NAL unit based on the header information of the reconstructed MP4 file including the NAL unit of the dummy data.

When the reception device 20 accumulates the MP4 file (MPU), packet data (NAL unit or the like) lost due to packet loss may be acquired separately by broadcasting or communication and accumulated (replaced).

At this time, when the reception device 20 acquires a lost packet by communication, it notifies the server of information (packet ID, MPU serial number, packet serial number, IP stream number, IP address, and the like) of the lost packet, and acquires the packet. The reception device 20 may acquire packet groups before and after the lost packet at the same time, not limited to the lost packet.

[ method of composing movie fragments ]

Here, the method of forming a movie fragment will be described in detail.

As explained with reference to fig. 33, the number of samples constituting a movie fragment and the number of movie fragments constituting 1 MPU are arbitrary. For example, the number of samples constituting a movie fragment and the number of movie fragments constituting 1 MPU may be determined to be fixed to predetermined numbers or may be determined dynamically.

Here, by configuring movie fragments so as to satisfy the following conditions on the transmitting side (transmitting device 15), it is possible to ensure low-delay decoding by the receiving device 20.

The conditions are as follows.

The transmitting apparatus 15 generates and transmits MF elements by dividing sample data into units as movie fragments so that the receiving apparatus 20 can receive MF elements including information of an arbitrary sample (i)) before the decoding time (dts (i)) of the sample is determined.

Specifically, the transmitting device 15 constructs a movie fragment using samples (including the i-th sample) that have been encoded prior to the dts (i).

In order to ensure low-latency decoding, the number of samples constituting a movie fragment or the number of movie fragments constituting 1 MPU is dynamically determined, for example, by the following method.

(1) When decoding is started, the decoding time DTS (0) of Sample (0) at the beginning of the GOP is a time based on initial _ cpb _ removal _ delay. The transmitting apparatus constructs 1 st movie fragment using already coded samples at a time prior to DTS (0). The transmission device 15 generates MF metadata corresponding to the 1 st movie fragment and transmits the MF metadata at a timing earlier than DTS (0).

(2) The transmission device 15 constructs movie fragments so as to satisfy the above conditions in the subsequent samples.

For example, when the first sample of a movie fragment is assumed to be the kth sample, the MF element of the movie fragment including the kth sample is transmitted before the decoding time dts (k) of the kth sample. The transmitting device 15 uses the kth sample to the I-th sample to form the movie fragment when the encoding completion time of the I-th sample precedes dts (k) and the encoding completion time of the (I +1) -th sample succeeds dts (k).

In addition, the transmitting device 15 may also use the kth sample to less than the ith sample to form a movie fragment.

(3) After the last sample of the MPU is encoded, the transmitting device 15 constructs a movie fragment from the remaining samples, and generates and transmits MF metadata corresponding to the movie fragment.

In addition, the transmitting apparatus 15 may not use all the samples that have been already encoded to constitute the movie fragment, but use some of the samples that have already been already encoded to constitute the movie fragment.

In addition, the above description shows an example in which the number of samples constituting a movie fragment and the number of movie fragments constituting 1 MPU are dynamically determined based on the above conditions in order to ensure low-latency decoding. However, the method of determining the number of samples and the number of movie clips is not limited to this method. For example, the number of movie clips constituting 1 MPU may be fixed to a predetermined value, and the number of samples may be determined so as to satisfy the above conditions. The number of movie fragments constituting 1 MPU and the time at which the movie fragments are divided (or the code amount of the movie fragments) may be fixed to predetermined values, and the number of samples may be determined so as to satisfy the above conditions.

In addition, when the MPU is divided into a plurality of movie fragments, information indicating whether or not the MPU is divided into a plurality of movie fragments, an attribute of the divided movie fragments, or an attribute of an MF element corresponding to the divided movie fragments may be transmitted.

Here, the attribute of a movie fragment is information indicating whether the movie fragment is the first movie fragment of the MPU, the last movie fragment of the MPU, or other movie fragments.

The attribute of the MF element is information indicating whether the MF element is an MF element corresponding to the first movie fragment of the MPU, an MF element corresponding to the last movie fragment of the MPU, an MF element corresponding to another movie fragment, or the like.

The transmission device 15 may store and transmit the number of samples constituting a movie fragment and the number of movie fragments constituting 1 MPU as control information.

[ operation of the receiving apparatus ]

The operation of the receiving apparatus 20 based on the movie fragment configured as described above will be described.

The receiving apparatus 20 determines the absolute time of each of the PTS and DTS based on the absolute time transmitted as a signaling from the transmitting side, such as an MPU timestamp descriptor, and the relative time of the PTS and DTS included in the MF element.

When the MPU is divided based on the information on whether the MPU is divided into a plurality of movie fragments, the reception apparatus 20 performs processing as follows based on the attributes of the divided movie fragments.

(1) When a movie fragment is the first movie fragment of the MPU, the reception device 20 generates the absolute time of the PTS and DTS using the absolute time of the PTS of the first sample included in the MPU timestamp descriptor and the relative time of the PTS and DTS included in the MF element.

(2) When the movie fragment is not the first movie fragment of the MPU, the reception device 20 generates the absolute time of the PTS and DTS using the relative time of the PTS and DTS included in the MF element without using the information of the MPU timestamp descriptor.

(3) When the movie segment is the last movie segment of the MPU, the reception device 20 calculates the absolute time of the PTS and DTS of all samples, and then resets the calculation process of the PTS and DTS (addition process of relative time). The reset process may be performed in the movie clip at the beginning of the MPU.

The receiving apparatus 20 may determine whether or not a movie fragment is divided as follows. The receiving apparatus 20 may acquire attribute information of the movie fragment as follows.

For example, the reception apparatus 20 may also determine whether or not to be split based on the identifier movie _ fragment _ sequence _ number field value indicating the order of movie fragments shown in the MMTP payload header.

Specifically, the receiving apparatus 20 may determine that the MPU is divided into a plurality of movie fragments when the number of movie fragments included in 1 MPU is 1, the movie _ fragment _ sequence _ number field value is 1, and a value equal to or greater than 2 is present in the field value.

The receiving apparatus 20 may determine that the MPU is divided into a plurality of movie fragments when the number of movie fragments included in 1 MPU is 1, the movie _ fragment _ sequence _ number field value is 0, and values other than 0 exist in the field value.

The attribute information of the movie fragment can be determined based on movie _ fragment _ sequence _ number in the same manner.

Note that instead of using movie _ fragment _ sequence _ number, it is also possible to determine whether or not a movie fragment is divided, or attribute information of a movie fragment, by counting the transmission of movie fragments or MF elements included in 1 MPU.

With the above-described configuration of the transmitting apparatus 15 and the receiving apparatus 20, the receiving apparatus 20 can receive movie fragment metadata at intervals shorter than the MPU, and can start decoding with low delay. Further, decoding with low delay is possible by the decoding process based on the MP4 analysis method.

The reception operation when the MPU is divided into a plurality of movie fragments as described above will be described with a flowchart. Fig. 41 is a flowchart of a receiving operation when the MPU is divided into a plurality of movie fragments. In addition, the flowchart illustrates the operation of step S604 of fig. 37 in more detail.

First, the receiving apparatus 20 acquires MF metadata when the data type is MF metadata, based on the data type indicated by the MMTP payload header (S801).

Next, the reception device 20 determines whether or not the MPU is divided into a plurality of movie fragments (S802), and when the MPU is divided into a plurality of movie fragments (yes in S802), determines whether or not the received MF metadata is metadata at the head of the MPU (S803). When the received MF metadata is MF metadata at the head of the MPU (yes in S803), the reception apparatus 20 calculates absolute times of PTS and DTS from the absolute time of PTS indicated by the MPU timestamp descriptor and the relative times of PTS and DTS indicated by the MF metadata (S804), and determines whether the received MF metadata is last metadata of the MPU (S805).

On the other hand, when the received MF metadata is not the MF metadata at the beginning of the MPU (no in S803), the reception apparatus 20 calculates the absolute times of the PTS and DTS using the relative times of the PTS and DTS shown in the MF metadata without using the information of the MPU timestamp descriptor (S808), and proceeds to the process of step S805.

If it is determined in step S805 that the MF metadata is the last MF metadata of the MPU (yes in S805), the reception device 20 calculates the absolute times of the PTSs and DTSs of all samples, and then resets the calculation processes of the PTSs and DTSs. If it is determined in step S805 that the MF metadata is not the last MF metadata of the MPU (no in S805), the reception apparatus 20 ends the process.

When it is determined in step S802 that the MPU has not been divided into a plurality of movie fragments (no in S802), the reception apparatus 20 acquires sample data based on MF metadata transmitted after the MPU, and determines PTS and DTS (S807).

Although not shown, the reception device 20 finally performs decoding processing and presentation processing based on the determined PTS and DTS.

[ problem generated when movie fragments are divided and solving strategy thereof ]

Thus, a method of reducing end-to-end delay by splitting movie slices is described. The newly created problem and its solution when splitting a movie fragment will be described later.

First, as a background, a picture structure in encoded data will be described. Fig. 42 is a diagram showing an example of a prediction structure of each picture of temporalld (time ID) when temporal adaptability is realized.

In a Coding scheme such as MPEG-4AVC or HEVC (High Efficiency Video Coding), temporal adaptability (temporal adaptability) can be achieved by using B pictures (bidirectional reference prediction pictures) that can be referred to from other pictures.

The temporalld shown in fig. 42 (a) is an identifier of a hierarchy of the coding structure, and a larger value of temporalld indicates a deeper hierarchy. The square blocks represent pictures, Ix in each block represents an I picture (intra-picture prediction picture), Px represents a P picture (forward reference prediction picture), and Bx represent B pictures (bidirectional reference prediction pictures). The x in Ix/Px/Bx shows the display order, representing the order in which the pictures are displayed. Arrows between pictures indicate a reference relationship, and for example, a picture indicating B4 generates a predicted image using I0 and B8 as reference images. Here, one picture is prohibited from using another picture having a temporalld larger than its own temporalld as a reference image. The predetermined le level is for temporal adaptability, and for example, in fig. 42, if all pictures are decoded, a 120fps (frame per second) video is obtained, and if only a level having a temporalld of 0 to 3 is decoded, a 60fps video is obtained.

Fig. 43 is a diagram showing a relationship between the Decoding Time (DTS) and the display time (PTS) of each picture in fig. 42. For example, picture I0 shown in fig. 43 is displayed after the decoding of B4 is completed, so that no gap is generated in the decoding and display.

As shown in fig. 43, when B pictures are included in the prediction structure, the decoding order differs from the display order, and therefore, it is necessary to perform a picture delay process and a picture rearrangement (reordering) process after decoding the pictures in the receiving apparatus 20.

Although the above describes an example of a prediction structure of a picture in temporal adaptability, in some cases, when temporal adaptability is not used, a delay process and a reordering process of pictures may be required by the prediction structure. Fig. 44 is a diagram showing an example of a prediction structure of a picture that requires a picture delay process and a picture reordering process. In addition, the numbers in fig. 44 indicate the decoding order.

As shown in fig. 44, according to the prediction structure, the sample at the head in the decoding order may be different from the sample at the head in the presentation order, and in fig. 44, the sample at the head in the presentation order is the 4 th sample in the decoding order. Fig. 44 shows an example of the prediction structure, but the prediction structure is not limited to this. In other prediction structures, the sample that starts in the decoding order may be different from the sample that starts in the presentation order.

Fig. 45 is a diagram showing an example in which an MPU configured by an MP4 format is divided into a plurality of movie fragments and stored in an MMTP payload or an MMTP packet, as in fig. 33. The number of samples constituting the MPU or the number of samples constituting the movie segment is arbitrary. For example, 2 movie slices may be configured by setting the number of samples constituting the MPU to the number of samples in GOP units and setting the number of samples that is one-half of the GOP units as movie slices. The 1 sample may be regarded as 1 movie clip, or the samples constituting the MPU may not be divided.

Fig. 45 shows an example in which 1 MPU includes 2 movie fragments (moof box and mdat box), but the number of movie fragments included in 1 MPU may be different from 2. The number of movie fragments included in 1 MPU may be 3 or more, or may be the number of samples included in the MPU. Further, the samples stored in the movie fragment may be divided into an arbitrary number of samples, instead of the number of equally divided samples.

The movie fragment metadata (MF metadata) includes information on the PTS, DTS, offset, and size of a sample included in a movie fragment, and when decoding a sample, the receiving apparatus 20 extracts the PTS and DTS from the MF element including the information on the sample, and determines the decoding timing or presentation timing.

Hereinafter, for the sake of detailed description, the absolute value of the decoding time of the i sample is referred to as dts (i), and the absolute value of the presentation time is referred to as pts (i).

The information of the ith sample among the time stamp information stored in the moof of the MF element is specifically a relative value between the decoding time of the ith sample and the (i +1) th sample, and a relative value between the decoding time of the ith sample and the presentation time, and hereinafter, these are referred to as dt (i) and ct (i).

In the movie fragment metadata #1, the DT (i) and CT (i) of samples #1- #3 are included, and in the movie fragment metadata #2, the DT (i) and CT (i) of samples #4- #6 are included.

The PTS absolute value of the access unit at the beginning of the MPU is stored in the MPU timestamp descriptor, and the reception device 20 calculates the PTS and DTs based on the PTS _ MPU, CT, and DT of the access unit at the beginning of the MPU.

Fig. 46 is a diagram for explaining a method of calculating PTS and DTS and problems when an MPU is configured with samples #1 to # 10.

Fig. 46 (a) shows an example in which the MPU is not divided into movie fragments, fig. 46 (b) shows an example in which the MPU is divided into 2 movie fragments of 5 sample units, and fig. 46 (c) shows an example in which the MPU is divided into 10 movie fragments per sample unit.

As described with reference to fig. 45, when PTS and DTs are calculated using the MPU timestamp descriptor and the timestamp information (CT and DT) in the MP4, the sample that is the top in the presentation order of fig. 44 is the 4 th sample in the decoding order. Therefore, the PTS held in the MPU timestamp descriptor is the PTS (absolute value) of the sample that is 4 th in decoding order. Hereinafter, this sample will be referred to as "a sample". The sample at the beginning of the decoding order is referred to as "B sample".

Since the absolute time information relating to the time stamp is only information of the MPU time stamp descriptor, the reception apparatus 20 cannot calculate the PTS (absolute time) and DTS (absolute time) of other samples until the a sample arrives. The receiving apparatus 20 cannot calculate the PTS and DTS of the B sample.

In the example of fig. 46 (a), the a samples are included in the same movie fragment as the B samples and stored in 1 MF element. Therefore, the reception device 20 can determine the DTS of the B sample immediately after receiving the MF element.

In the example of fig. 46 (B), the a samples are contained in the same movie fragment as the B samples and stored in 1 MF element. Therefore, the reception device 20 can determine the DTS of the B sample immediately after receiving the MF element.

In the example of fig. 46 (c), the a samples are contained in different movie fragments from the B samples. Therefore, the reception apparatus 20 cannot determine the DTs of the B sample unless it receives the MF element including the CT and DT of the movie fragment including the a sample.

Therefore, in the case of the example in fig. 46 (c), the reception apparatus 20 cannot start decoding immediately after the B sample arrives.

As described above, if a samples are not included in movie fragments including B samples, the reception apparatus 20 cannot start decoding of B samples unless the MF element involved in the movie fragment including a samples is received.

This problem occurs when the first sample in the cue order does not coincide with the first sample in the decoding order, because the movie fragment is divided to such an extent that the a sample and the B sample are not stored in the same movie fragment. This problem occurs regardless of whether the MF element is sent later or earlier.

As described above, when the first sample in the presentation order does not match the first sample in the decoding order, if the a sample and the B sample are not stored in the same movie fragment, the DTS cannot be determined immediately after the B sample is received. Then, the transmission device 15 separately transmits information enabling the reception side to calculate the DTS (absolute value) of the B sample or the DTS (absolute value) of the B sample. Such information may also be transmitted using control information, a header, or the like.

The reception device 20 calculates the DTS (absolute value) of the B sample using such information. Fig. 47 is a flowchart of a receiving operation when calculating DTS using such information.

The receiving apparatus 20 receives the movie fragment at the beginning of the MPU (S901), and determines whether or not the a sample and the B sample are stored in the same movie fragment (S902). When the information is stored in the same movie fragment (yes in S902), the reception apparatus 20 calculates the DTS using only the information of the MF element without using the DTS (absolute time) of the B sample, and starts decoding (S904). In step S904, the reception device 20 may determine the DTS using the DTS of the B sample.

On the other hand, if the a sample and the B sample are not stored in the same movie fragment in step S902 (no in S902), the receiving apparatus 20 acquires the DTS (absolute time) of the B sample, determines the DTS, and starts decoding (S903).

In the above description, an example has been described in which the absolute value of the decoding time and the absolute value of the presentation time of each sample are calculated using the MF element (time stamp information stored in moof of the MP4 format) in the MMT standard, but it is needless to say that the MF element may be replaced with any control information that can be used to calculate the absolute value of the decoding time and the absolute value of the presentation time of each sample. Examples of such control information include control information obtained by replacing the relative value ct (i) of the decoding time of the i-th sample and the (i +1) -th sample with the relative value ct (i) of the presentation time of the i-th sample and the (i +1) -th sample, or control information including both the relative value ct (i) of the decoding time of the i-th sample and the (i +1) -th sample and the relative value ct (i) of the presentation time of the (i +1) -th sample of the i-th sample.

(embodiment mode 3)

[ outline ]

In embodiment 3, a description will be given of a content transmission method and a data structure in the case of transmitting a content such as a video, audio, subtitle, and data broadcast by broadcasting. That is, a description will be given of a content transmission method and a data structure specialized for reproduction of a broadcast stream.

In embodiment 3, an example in which an MMT scheme (hereinafter, also simply referred to as MMT) is used as the multiplexing scheme is described, but other multiplexing schemes such as MPEG-DASH and RTP may be used.

First, details of a method of storing a payload in a Data Unit (DU) in an MMT will be described. Fig. 48 is a diagram for explaining a method of depositing a data unit in MMT into a payload.

In MMT, a transmitting apparatus stores a part of data constituting an MPU as a data unit in an MMTP payload, adds a header, and transmits the data. The MMTP payload header and the MMTP header are contained in the header. The unit of a data unit may be NAL unit or sample unit. In the case where the MMTP packet is scrambled, the payload becomes the object of the scrambling.

Fig. 48 (a) shows an example in which the transmitting apparatus stores a plurality of data units in a single payload in a lump. In the example of fig. 48 (a), a Data Unit Header (DUH) and a Data Unit Length (DUL) are assigned to the head of each of a plurality of Data units, and a plurality of Data units to which the Data Unit Header and the Data Unit Length are assigned are collected and stored in the payload.

Fig. 48 (b) shows an example of storing one data unit in one payload. In the example of fig. 48 (b), a data unit header is assigned to the head of a data unit and stored in the payload. Fig. 48 (c) shows an example in which one data unit is divided, and a data unit header is added to the divided data unit and stored in the payload.

The data units are classified into the following types: a time-MFU, which is a medium including information related to synchronization such as video, audio, and subtitles, a non-time-MFU, which is a medium such as a file not including information related to synchronization, MPU metadata, MF metadata, and the like, and a data unit header is determined according to the type of a data unit. In addition, no data unit header exists in the MPU metadata and the MF metadata.

The transmitting apparatus cannot group different types of data units as a principle, but may be specified to group different types of data units. For example, when the size of the MF metadata is small, such as when the MF metadata is divided into movie fragments for each sample, the number of packets can be reduced and the transmission capacity can be reduced by integrating the MF metadata with the media data.

When the data unit is an MFU, a part of information of the MPU, such as information configuring the MPU (MP4), is stored as a header.

For example, the header of the timed-MFU includes movie _ fragment _ sequence _ number, sample _ number, offset, priority, and dependency _ counter, and the header of the non-timed-MFU includes item _ iD. The meaning of the fields is shown in standards such as ISO/IEC23008-1 or ARIB STD-B60. The meaning of each field defined in such a standard is described below.

movie _ fragment _ sequence _ number represents the sequence number of the movie fragment to which the MFU belongs, and is also shown in ISO/IEC 14496-12.

sample _ number represents the sample number to which the MFU belongs, and is also shown in ISO/IEC 14496-12.

The offset represents an offset amount of the MFU in the sample to which the MFU belongs in byte units.

The priority indicates the relative importance of the MFU in the MPU to which the MFU belongs, and an MFU with a higher priority number is more important than an MFU with a lower priority number.

The dependency _ counter indicates the number of MFUs on which the decoding process depends (i.e., the number of MFUs that cannot perform the decoding process if the decoding process is not performed on the MFUs). For example, when a B picture or a P picture refers to an I picture when the MFU is HEVC, the B picture or the P picture cannot be decoded unless the I picture is decoded.

Therefore, in the case where the MFU is a sample unit, in the dependency _ counter in the MFU of the I picture, the number of pictures referring to the I picture is shown. In the case where the MFU is an NAL unit, the dependency _ counter in the MFU belonging to the I picture indicates the number of NAL units belonging to a picture referring to the I picture. Furthermore, in the case of a video signal subjected to temporal hierarchical coding, since the MFU of the extension layer depends on the MFU of the base layer, the number of MFUs of the extension layer is indicated in the dependency _ counter of the MFUs of the base layer. The cost field cannot be generated if not after the number of MFUs that depend is decided.

item _ iD represents an identifier uniquely determining an item.

[ MP4 non-support mode ]

As described in fig. 19 and 21, as a method for transmitting an MPU in the MMT by the transmission apparatus, there are a method for transmitting MPU metadata or MF metadata before or after media data, and a method for transmitting only media data. In addition, there are a method of decoding using a reception apparatus or a reception method conforming to MP4 and a method of decoding without using a header.

As a method of transmitting data specialized for broadcast stream reproduction, for example, there is a method of transmitting data that does not support MP4 reconstruction in a receiving apparatus.

As a transmission method that does not support MP4 reconstruction in the receiving apparatus, for example, as shown in fig. 21 (b), a method is used in which metadata (MPU metadata and MF metadata) is not transmitted. In this case, the field value of the fragmentation type (information indicating the kind of the data unit) included in the MMTP packet is fixed to 2(═ MFU).

When metadata is not transmitted, as described above, in a receiving device or the like conforming to MP4, the received data cannot be decoded into MP4, but can be decoded without using metadata (header).

Therefore, the metadata is not information that is necessarily required for decoding and reproducing the broadcast stream. Also, the information of the data unit header in the timed-MFU illustrated in fig. 48 is information for reconstructing the MP4 in the receiving apparatus. Since the MP4 does not need to be reconstructed in the broadcast stream reproduction, information of a data unit header (hereinafter also referred to as a timed-MFU header) in the timed-MFU is not necessarily required for the broadcast stream reproduction.

The receiving apparatus can easily reconstruct the MP4 by using the metadata and information for reconstructing the MP4 (hereinafter also referred to as MP4 configuration information) in the data unit header. However, even if only one of the metadata and the MP4 configuration information in the data unit header is transmitted, the receiving apparatus cannot easily reconstruct the MP 4. Only one of the transmission metadata and the information for reconstructing the MP4 provides little advantage, and the generation and transmission of unnecessary information increases the processing and reduces the transmission efficiency.

Therefore, the transmitting apparatus controls the data structure and transmission of the MP4 configuration information by the following method. The transmitting apparatus determines whether or not the MP4 configuration information is indicated in the data unit header based on whether or not the metadata is transmitted. Specifically, the transmitting apparatus indicates MP4 configuration information in the data unit header when metadata is transferred, and does not indicate MP4 configuration information in the data unit header when metadata is not transferred.

As a method of not indicating the MP4 configuration information in the data unit header, for example, the following method can be used.

1. The transmitting apparatus sets the MP4 configuration information as reserved (reserved) and does not operate. This can reduce the amount of processing on the transmitting side (the amount of processing of the transmitting device) for generating the MP4 configuration information.

2. The transmitting apparatus deletes the MP4 configuration information and performs header compression. This reduces the amount of processing on the transmitting side for generating the MP4 configuration information, and reduces the transmission capacity.

In addition, when deleting the MP4 configuration information and performing header compression, the transmission device may indicate a flag indicating that the MP4 configuration information is deleted (compressed). The flag is shown in a header (MMTP header, MMTP payload header, data unit header) or control information, etc.

The information indicating whether or not to transmit the metadata may be determined in advance, or may be separately transmitted to the receiving apparatus by signaling (signaling) in the header or the control information.

For example, information indicating whether or not to transmit metadata corresponding to the MFU may be stored in the MFU header.

On the other hand, the receiving apparatus can determine whether or not the MP4 configuration information is indicated based on whether or not the metadata is transmitted.

Here, when the order of data transmission (for example, the order of MPU metadata, MF metadata, and media data) is determined, the receiving apparatus may determine whether or not the metadata is received before the media data.

In the case where the MP4 configuration information is represented, the reception apparatus uses the MP4 configuration information for the reconstruction of the MP 4. Alternatively, the receiving apparatus can use MP4 configuration information when detecting the head of another access unit or NAL unit, or the like.

The MP4 configuration information may be all or part of the timed-MFU header.

In the same manner, the transmitting apparatus may determine whether the item id is represented in the non-timed-MFU header based on whether or not the metadata is transmitted.

The transmitting apparatus may indicate the MP4 configuration information only in one of the time-MFU and non-time-MFU. When only one of the two parties represents the MP4 configuration information, the transmitting device determines whether to represent the MP4 configuration information based on not only whether to transmit the metadata but also on whether to represent the time-MFU or the non-time-MFU. In the receiving apparatus, it can be determined whether or not the MP4 configuration information is represented based on whether or not the metadata is transmitted and the timed/non-timed flag.

In the above description, the transmitting apparatus determines whether or not to indicate MP4 configuration information based on whether or not metadata (both MPU metadata and MF metadata) is transmitted. However, the transmission device may be: when a part of the metadata (either MPU metadata or MF metadata) is not transmitted, the MP4 configuration information is not indicated.

The transmitting apparatus may determine whether or not to indicate the MP4 configuration information based on information other than the metadata.

For example, it is also possible to provide: modes such as MP 4-supported mode/MP 4-unsupported mode are defined, and the transmitting apparatus indicates MP4 configuration information in the data unit header when the MP4 supports the mode, and does not indicate MP4 configuration information in the data unit header when the MP 4-unsupported mode. Further, it is also possible to: the transmitting apparatus transmits metadata and indicates MP4 configuration information in the data unit header when the MP4 supports the mode, and does not transmit metadata and does not indicate MP4 configuration information in the data unit header when the MP4 does not support the mode.

[ operation flow of Transmission device ]

Next, the operation flow of the transmission device will be described. Fig. 49 is an operation flow of the transmission device.

The transmitting apparatus first determines whether or not to transmit metadata (S1001). When the transmitting apparatus determines that the metadata is to be transferred (yes in S1002), the flow proceeds to step S1003, and MP4 configuration information is generated, stored in the header, and transferred (S1003). In this case, the transmitting apparatus also generates and transmits metadata.

On the other hand, when the transmitting apparatus determines that metadata is not to be transferred (no in S1002), the transmitting apparatus transfers the metadata without generating MP4 configuration information and storing the configuration information in a header (S1004). In this case, the transmitting apparatus does not generate metadata and does not transmit it.

Whether or not to transfer metadata in step S1001 may be determined in advance, or may be determined based on whether or not metadata is generated inside the transmission device and whether or not the metadata is transferred inside the transmission device.

[ operation flow of receiver ]

Next, the operation flow of the receiving apparatus will be described. Fig. 50 is an operation flow of the receiving apparatus.

The receiving apparatus first determines whether or not metadata is transmitted (S1101). By monitoring the slice type in the MMTP packet payload, it can be determined whether the metadata is transferred. In addition, whether to transmit or not may be determined in advance.

When the receiving apparatus determines that the metadata is transferred (yes in S1102), the receiving apparatus reconstructs the MP4 and performs decoding processing using the MP4 configuration information (S1103). On the other hand, if it is determined that the metadata is not transferred (no in S1102), the reconstruction processing of the MP4 is not performed, and the decoding processing is performed without using the MP4 configuration information (S1104).

Further, the receiving apparatus can perform detection of a random access point, detection of an access unit head, detection of a NAL unit head, and the like without using the MP4 configuration information by using the method described above, and can perform decoding processing, detection of packet loss, and recovery processing from packet loss.

For example, when non-VCL NAL units are grouped individually as data units as shown in fig. 38, the access unit header is the header data of the MMT payload with the aggregation _ flag value of 1. At this time, the Fragmentation _ indicator value is 0.

In addition, the beginning of the slice is the beginning data of the MMT payload whose aggregation _ flag value is 0 and whose fragmentation _ indicator value is 00 or 01.

The receiving apparatus can detect the beginning of the access unit and the slice based on the above information.

The receiving apparatus may analyze the NAL unit header in a packet including the head of a data unit whose fragmentation _ indicator value is 00 or 01, and detect whether the type of the NAL unit is an AU delimiter and whether the type of the NAL unit is a slice segment.

[ simple mode of broadcasting ]

Although the above description has been made of a method of transmitting data specialized for broadcast stream reproduction, in which MP4 configuration information is not supported in a receiving apparatus, the method of transmitting data specialized for broadcast stream reproduction is not limited to this.

As a method of transmitting data specialized for broadcast stream reproduction, for example, the following method can be used.

The transmitting apparatus may not use the AL-FEC in the fixed reception environment of broadcasting. In case of not using AL-FEC, FEC _ type in MMTP header is always fixed to 0.

The transmitting apparatus may always use AL-FEC in the mobile broadcast reception environment and the UDP transmission mode of communication. In case of using AL-FEC, FEC _ type in MMTP header is always 0 or 1.

The transmitting apparatus may not perform bulk transfer of resources. When the bulk transfer of the resource is not performed, the location _ information indicating the number of transfer positions of the resource in the MPT may be fixed to 1.

The transmission device may not perform the mixed transmission of the resource, the program, and the message.

For example, a broadcast simple mode may be defined, and the transmitting apparatus may set the MP4 non-support mode in the broadcast simple mode, or may use the above-described data transmission method specialized for broadcast stream playback. Whether or not the broadcast mode is set may be determined in advance, or the transmitting apparatus may store a flag indicating that the broadcast mode is set as control information and transmit the control information to the receiving apparatus.

The transmitting apparatus may use the above-described data transmission method specialized for broadcast stream playback as the broadcast simple mode based on whether the MP4 unsupported mode (whether metadata is transmitted) is described in fig. 49, and when the MP4 unsupported mode is used.

In the broadcast simple mode, the receiving apparatus is set to the MP4 non-support mode, and can perform decoding processing without reconstructing the MP 4.

In the broadcast simple mode, the receiving apparatus determines the function specified for the broadcast, and can perform the receiving process specified for the broadcast.

Thus, in the broadcast simple mode, by using only the function specialized for the broadcast, it is possible to reduce not only processing unnecessary for the transmitting apparatus and the receiving apparatus but also transmission overhead by compressing unnecessary information without transmission.

In the case of using the MP4 non-support mode, presentation information for supporting an accumulation method other than the MP4 configuration may be shown.

Examples of accumulation methods other than the MP4 configuration include a method of directly accumulating MMT packets or IP packets, and a method of converting MMT packets into MPEG-2TS packets.

In addition, in the case of the MP4 non-supported mode, a format that does not comply with the MP4 configuration may be used.

For example, when the MP4 non-support mode is used as the data stored in the MFU, the data may be in a format in which a byte header is added instead of a format in which the size of an NAL unit is added to the beginning of an NAL unit in the MP4 format.

In MMT, a resource type indicating the type of the resource is described in 4CC registered in MP4REG (http:// www.mp4ra.org), and when HEVC is used as a video signal, "HEV 1" or "HVC 1" is used. "HVC 1" is a form that may also contain parameter sets among samples, "HEV 1" is a form that does not contain parameter sets among samples, but contains parameter sets in sample entries in MPU metadata.

In the case of the broadcast simple mode or the MP4 non-support mode, it can be also specified that the parameter set must be included in the sample without transmitting the MPU metadata and the MF metadata. Further, it can be defined that: whichever asset type represents "HEV 1" or "HVC 1" must take the form of "HVC 1".

[ supplement 1: transmitting device

As described above, a transmission apparatus that does not operate with MP4 configuration information set as reserved (reserved) when metadata is not transmitted can be configured as shown in fig. 51. Fig. 51 is a diagram showing an example of a specific configuration of a transmission device.

The transmission device 300 includes an encoding unit 301, an adding unit 302, and a transmission unit 303. The encoding unit 301, the giving unit 302, and the transmitting unit 303 are each realized by, for example, a microcomputer, a processor, a dedicated circuit, or the like.

The encoding unit 301 encodes a video signal or an audio signal to generate sample data. The sample data is specifically a data unit.

The adding unit 302 adds header information including MP4 configuration information to sample data, which is data obtained by encoding a video signal or an audio signal. The MP4 configuration information is information for reconstructing the sample data into a file in the MP4 format on the receiving side, and is information whose content differs depending on whether or not the presentation time of the sample data is decided.

As described above, the assigning unit 302 includes MP4 configuration information such as motion _ fragment _ sequence _ number, sample _ number, offset, priority, and dependency _ counter in the header (header information) of the time-MFU, which is an example of sample data (sample data including information related to synchronization) whose presentation time is determined.

On the other hand, the assigning unit 302 includes MP4 configuration information such as item _ id in the header (header information) of a timed-MFU, which is an example of sample data (sample data not including information related to synchronization) whose presentation time is not decided.

In addition, when the transmission unit 303 does not transmit the metadata corresponding to the sample data (for example, in the case of (b) of fig. 21), the assigning unit 302 assigns header information not including the MP4 configuration information to the sample data, depending on whether or not the presentation time of the sample data is determined.

Specifically, the assigning unit 302 assigns header information not including the first MP4 configuration information to the sample data when the presentation time of the sample data is determined, and assigns header information including the second MP4 configuration information to the sample data when the presentation time of the sample data is not determined.

For example, as shown in step S1004 of fig. 49, when the transmission unit 303 does not transmit metadata corresponding to sample data, the addition unit 302 essentially does not generate MP4 configuration information and essentially does not store the MP 3578 configuration information in a header (header information) by setting the MP4 configuration information to a reserved (fixed value). The metadata includes MPU metadata and movie fragment metadata.

The transmission unit 303 transmits the sample data to which the header information is added. More specifically, the transmission unit 303 packetizes and transmits the sample data to which the header information is added by the MMT scheme.

As described above, in the transmission method and the reception method specialized for reproduction of a broadcast stream, the reception apparatus side does not need to reconstruct a data unit to the MP 4. When the receiving apparatus does not need to be reconfigured to the MP4, unnecessary information such as MP4 configuration information is not generated, and processing by the transmitting apparatus is reduced.

On the other hand, the transmitting apparatus needs to maintain the compatibility with the standard so that necessary information needs to be transmitted, but does not need to separately transmit extra additional information or the like.

With the configuration of the transmission device 300, by setting the area in which the MP4 configuration information is stored to a fixed value or the like, it is possible to transmit only necessary information based on the standard without transmitting the MP4 configuration information, and thus it is possible to obtain an effect that redundant additional information is not transmitted. That is, the configuration of the transmitting apparatus and the processing amount of the receiving apparatus can be reduced. In addition, since useless data is not transmitted, the transmission efficiency can be improved.

[ supplement 2: receiving device

The receiving apparatus corresponding to the transmitting apparatus 300 may be configured as shown in fig. 52, for example. Fig. 52 is a diagram showing another example of the configuration of the receiving apparatus.

The reception device 400 includes a reception unit 401 and a decoding unit 402. The receiving unit 401 and the decoding unit 402 are realized by, for example, a microcomputer, a processor, a dedicated circuit, or the like.

The receiving unit 401 receives sample data, which is encoded in a video signal or an audio signal and to which header data including MP4 configuration information for reconstructing the sample data into a file in MP4 format is added.

When the receiving unit does not receive the metadata corresponding to the sample data and the presentation time of the sample data is determined, the decoding unit 402 decodes the sample data without using the MP4 configuration information.

For example, as shown in step S1104 in fig. 50, when the metadata corresponding to the sample data is not received by the receiving unit 401, the decoding unit 402 executes the decoding process without using the MP4 configuration information.

This can reduce the configuration of the reception apparatus 400 and the amount of processing in the reception apparatus 400.

(embodiment mode 4)

[ outline ]

In embodiment 4, a method of storing an asynchronous (non-timed) medium such as a file, which does not include information related to synchronization, in an MPU and a method of transmitting the medium in an MMTP packet are described. In embodiment 4, an MPU in the MMT is described as an example, but the present invention is also applicable to DASH based on MP 4.

First, details of a method of storing non-timed media (hereinafter, also referred to as "asynchronous media data") in the MPU will be described with reference to fig. 53. FIG. 53 shows a method of storing non-timed media in an MPU and a method of transferring non-timed media in an MMT package.

The MPU for storing non-timed media is composed of boxes (box) such as ftyp, mmpu, moov, meta, etc., and stores information related to files stored in the MPU. A plurality of idat boxes can be stored in the meta box, and one file is stored as item in the idat box.

Part of ftyp, mmpu, moov and meta boxes form a data unit as MPU metadata, and item or idat boxes form a data unit as MFU.

After being summarized or fragmented, the data units are given a data unit header, an MMTP payload header, and an MMTP header and transmitted as MMTP packets.

Fig. 53 shows an example in which File #1 and File #2 are stored in one MPU. The MPU metadata is not partitioned, and the MFU is partitioned and stored in the MMTP packet, but is not limited thereto, and may be aggregated or fragmented according to the size of the data unit. In addition, MPU metadata may not be transmitted, in which case only the MFU is transmitted.

Header information such as a data unit header shows itemID (an identifier that uniquely identifies an item), and an MMTP payload header or an MMTP header contains a packet sequence number (sequence number for each packet) and an MPU sequence number (sequence number of an MPU, a number unique within a resource).

The data structure of the MMTP payload header or MMTP header other than the data unit header includes an aggregation _ flag, a fragmentation _ indicator, a fragmentation _ counter, and the like, as in the case of the timed media (hereinafter, also referred to as "synchronous media data") described above.

Next, a specific example of header information in the case of dividing and packing a file (Item or MFU) will be described with reference to fig. 54 and 55.

Fig. 54 and 55 are diagrams showing an example in which a plurality of pieces of divided data obtained by dividing a file are packaged and transmitted for each piece of divided data. Fig. 54 and 55 specifically show information (packet sequence number, fragment counter, fragment identifier, MPU sequence number, and item ID) included in any one of the data unit header, MMTP payload header, and MMTP packet header, which is header information of each of the MMTP packets after the division. Fig. 54 is a diagram showing an example in which File #1 is divided into M (M < ═ 256), and fig. 55 is a diagram showing an example in which File #2 is divided into N (256< N).

The divided data number indicates an index of the divided data with respect to the beginning of the file, and this information is not transmitted. That is, the divided data number is not included in the header information. The divided data number is a number added to each packet corresponding to a plurality of divided data obtained by dividing a file, and is a number given by adding 1 to the first packet in ascending order.

The packet sequence number is the sequence number of a packet having the same packet ID, and in fig. 54 and 55, the divided data at the beginning of the file is a, and the divided data up to the end of the file is given consecutive numbers. The packet sequence number is a number given by adding 1 to the divided data from the beginning of the file in ascending order, and is a number corresponding to the divided data number.

The slice counter indicates the number of pieces of divided data that are later than the divided data among the pieces of divided data obtained by dividing one file. When the number of pieces of divided data obtained by dividing one file, that is, the number of pieces of divided data exceeds 256, the slice counter indicates the remainder obtained by dividing the number of pieces of divided data by 256. In the example of fig. 54, since the number of divided data is 256 or less, the field value of the slice counter is (M-divided data number). On the other hand, in the example of fig. 55, since the number of divided data exceeds 256, the value is obtained by dividing (N-divided data number) by 256 ((N-divided data number)% 256).

The fragment identifier indicates a state of division of data stored in the MMTP packet, and indicates a value indicating that the data is the first divided data, the last divided data, the other divided data, or one or more data units that are not divided. Specifically, the slice identifier is "01" in the first divided data, "11" in the last divided data, "10" in the remaining divided data, and "00" in the undivided data unit.

In the present embodiment, the remainder obtained by dividing the number of divided data by 256 is shown when the number of divided data exceeds 256, but the number of divided data is not limited to 256, and may be other numbers (predetermined numbers).

In the case where a file is divided as shown in fig. 54 and 55, and a plurality of pieces of divided data obtained by dividing the file are transmitted while being given conventional header information, the following information is not present in the receiving apparatus: the data stored in the received MMTP packet is the number of divided data (divided data number) in the original file, the number of divided data of the file, or information from which the divided data number and the number of divided data can be derived. Therefore, in the conventional transmission method, even if the MMTP packet is received, the divided data number or the number of divided data of the data stored in the received MMTP packet cannot be uniquely detected.

For example, as shown in fig. 54, when the number of divided data is 256 or less and the number of divided data is known in advance to be 256 or less, the number of divided data or the number of divided data can be determined by referring to the slice counter. However, when the number of divided data is 256 or more, the number of divided data or the number of divided data cannot be determined.

When the number of divided data of a file is limited to 256 or less, if the data size transferable in one packet is x bytes, the maximum size of the transferable file is limited to x × 256 bytes. For example, in the broadcast, it is assumed that x is 4 kbytes, and in this case, the maximum size of a file that can be transferred is limited to 4k 256 to 1 mbyte. Therefore, when a file of more than 1 mbyte is desired to be transferred, the number of divided data of the file cannot be limited to 256 or less.

Further, for example, since the first divided data or the last divided data of a file can be detected by referring to the slice identifier, the number of MMTP packets can be counted until the MMTP packet including the last divided data of the file is received, or the number of divided data can be calculated by combining the MMTP packet including the last divided data of the file with the packet sequence number after the MMTP packet including the last divided data of the file is received. However, when receiving data from an MMTP packet including split data in the middle of a file (i.e., split data that is neither the first split data nor the last split data of the file), the number of split data or the number of split data of the split data cannot be determined. The split data number or the split data number of the split data can be determined only after receiving the MMTP packet including the last split data of the file.

The problem described in fig. 54 and 55 is to uniquely identify the divided data number and the number of divided data of the file at the time of starting reception of the packet including the divided data of the file from the middle, and the following method is used.

First, the divided data number is explained.

As for the split data number, a packet sequence number in the split data at the head of the file (item) is signaled.

The method for signaling transmission is stored in the control information of the management file. Specifically, the packet sequence number a of the divided data at the head of the file in fig. 54 and 55 is stored in the control information. The receiving device acquires the value of a from the control information, and calculates the divided data number from the packet sequence number indicated in the packet header.

The divided data number of the divided data is obtained by subtracting the packet sequence number a of the first divided data from the packet sequence number of the divided data.

The control information for managing the file is, for example, a resource management table defined in ARIB STD-B60. The resource management table shows file size, version information, and the like for each file, and stores and transmits the file in a data transmission message. Fig. 56 is a diagram showing the syntax of a loop for each file in the resource management table.

When the area of the existing file management table cannot be expanded, signaling may be performed using a 32-bit area of a part of the item _ info _ byte field indicating information of an item. In a part of the area of the item _ info _ byte, a flag indicating whether or not the packet sequence number in the divided data at the beginning of the file (item) is indicated may be included in, for example, a reserved _ future _ use field of the control information.

When a file is repeatedly transmitted in a data carousel or the like, a plurality of packet sequence numbers may be indicated, or a packet sequence number at the head of the file to be transmitted immediately thereafter may be indicated.

The packet sequence number of the divided data at the beginning of the file is not limited, and may be information associating the divided data number of the file with the packet sequence number.

Next, the number of divided data will be described.

The order of the loop for each file included in the resource management table may be defined as the file transfer order. Thus, since the packet sequence numbers at the head of two consecutive files in the transfer order are known, the number of divided data of a file transferred before can be determined by subtracting the packet sequence number at the head of the file transferred before from the packet sequence number at the head of the file transferred after. That is, for example, in the case where File #1 shown in fig. 54 and File #2 shown in fig. 55 are files that are consecutive in this order, the last packet sequence number of File #1 and the first packet sequence number of File #2 are given consecutive numbers.

Further, the number of divided data of a file may be specified by specifying a file dividing method. For example, when the number of pieces of divided data is N, the number of pieces of divided data can be inversely calculated from item _ size indicated in the resource management table by setting the size of each of the 1 st to (N-1) th pieces of divided data to L and defining the size of the nth piece of divided data as a mantissa (item _ size-L (N-1)). In this case, the number of pieces of divided data is the integer value obtained by carrying the mantissa (item _ size/L). In addition, the file division method is not limited thereto.

Alternatively, the number of pieces of divided data may be stored in the resource management table as it is.

In the receiving apparatus, by using the above-described method, control information is received, and the number of divided data is calculated based on the control information. In addition, the packet sequence number corresponding to the divided data number of the file can be calculated based on the control information. In addition, when the reception timing of the packet of the divided data is earlier than the reception timing of the control information, the divided data number or the number of the divided data may be calculated at the timing when the control information is received.

In addition, when the divided data number or the number of divided data is signaled by using the above method, the divided data number or the number of divided data is not determined based on the slice counter, and the slice counter becomes useless data. Therefore, when information for specifying the number of divided data and the number of divided data is signaled by using the above-described method or the like in the asynchronous media transmission, the fragmentation counter is not used, or header compression may be performed. This can reduce the processing amount of the transmitting device and the receiving device, and can improve the transmission efficiency. That is, when asynchronous media is transmitted, the segment counter may be set to be reserved (invalidated). Specifically, the value of the slice counter may be a fixed value such as "0". In addition, the slice counter may be ignored when receiving asynchronous media.

When a synchronous medium such as video or audio is stored, the transmission order of the MMTP packets in the transmitting device matches the arrival order of the MMTP packets in the receiving device, and the packets are not retransmitted. In this case, if it is not necessary to detect a packet loss and reconstruct the packet, the fragmentation counter may not be used. In other words, in this case, the slice counter may be set to be reserved (invalidated).

Further, without using the slice counter, detection of a random access point, detection of the head of an access unit, detection of the head of a NAL unit, and the like can be performed, and decoding processing, detection of packet loss, recovery processing from packet loss, and the like can be performed.

Further, in the transmission of real-time content such as live broadcasting, transmission with lower delay is required, and it is required to sequentially package and transmit encoded data. However, in the real-time content transmission, since the conventional slice counter cannot determine the number of pieces of divided data when the first piece of divided data is transmitted, a delay occurs after the transmission of the first piece of divided data is completed until the number of pieces of divided data is determined after all the encoding of the data unit is completed. In this case, by not using the slice counter using the above method, the delay can be reduced.

Fig. 57 is an operation flow of determining a divided data number in the receiving apparatus.

The receiving apparatus acquires control information describing information of a file (S1201). The reception device determines whether or not the control information indicates the packet sequence number at the beginning of the file (S1202), and if the control information indicates the packet sequence number at the beginning of the file (yes in S1202), calculates the packet sequence number corresponding to the divided data number of the divided data of the file (S1203). Then, the receiving apparatus acquires the MMTP packet storing the divided data, and then specifies the divided data number of the file based on the packet sequence number stored in the packet header of the acquired MMTP packet (S1204). On the other hand, when the control information does not include the packet sequence number indicating the file head (no in S1202), the receiving apparatus acquires the MMTP packet including the last fragmented data of the file, and then specifies the fragmented data number using the fragment identifier and the packet sequence number stored in the packet header of the acquired MMTP packet (S1205).

Fig. 58 is an operation flow of determining the number of divided data in the receiving apparatus.

The receiving apparatus acquires control information in which information of a file is described (S1301). The receiving device determines whether or not information capable of calculating the number of divided data of the file is included in the control information (S1302), and if it is determined that information capable of calculating the number of divided data is included (yes in S1302), the number of divided data is calculated based on the information included in the control information (S1303). On the other hand, when the receiving apparatus determines that the number of divided data cannot be calculated (no in S1302), after acquiring the MMTP packet including the last divided data of the file, the receiving apparatus specifies the number of divided data using the fragment identifier and the packet sequence number stored in the header of the acquired MMTP packet (S1304).

Fig. 59 is an operation flow for determining whether or not to use the slice counter in the transmitting apparatus.

First, the transmitting apparatus determines whether the medium to be transmitted (hereinafter also referred to as "media data") is a synchronous medium or an asynchronous medium (S1401).

If the result of the determination in step S1401 is a synchronized medium (synchronized medium in S1402), the transmitting apparatus determines whether the MMTP packets transmitted and received in the environment in which the synchronized medium is transmitted are in the same order and the reconstruction of the packet is not necessary when the packet is lost (S1403). When the transmitting apparatus determines that it is unnecessary (yes in S1403), it does not use the slice counter (S1404). On the other hand, when the transmitting apparatus determines that it is not unnecessary (no in S1403), it operates the slice counter (S1405).

If the result of the determination in step S1401 is an asynchronous medium (asynchronous medium in S1402), the transmission apparatus determines whether or not to use the fragmentation counter based on whether or not the number of divided data and the number of divided data are signaled using the method described above. Specifically, when the transmission device signals the number of divided data and the number of divided data (yes in S1406), the transmission device does not use the slice counter (S1404). On the other hand, when the transmission device does not signal the number of divided data or the number of divided data (no in S1406), the transmission device uses the slice counter (S1405).

In addition, when the transmitting apparatus does not use the slice counter, the transmitting apparatus may retain the value of the slice counter, or may perform header compression.

The transmitting apparatus may determine whether to perform signaling on the divided data number and the number of divided data based on whether or not the slice counter is used.

In addition, when the synchronous medium does not use the segment counter, the transmitting apparatus may signal the segment number and the number of segments in the asynchronous medium by using the above-described method. Conversely, the operation of the synchronized media may be determined based on whether the segment counter is used by the unsynchronized media. In this case, whether or not the slice is used can be the same operation in the synchronous media and the asynchronous media.

Next, a method of determining the number of pieces of divided data and the number of pieces of divided data (in the case of using a slice counter) will be described. Fig. 60 is a diagram for explaining a method of determining the number of pieces of divided data and the number of pieces of divided data (in the case of using a slice counter).

As described with reference to fig. 54, when the number of pieces of divided data is 256 or less and the number of pieces of divided data is known in advance to be 256 or less, the number of pieces of divided data or the number of pieces of divided data can be specified by referring to the slice counter.

When the number of divided data of a file is limited to 256 or less, the maximum size of the file that can be transferred is limited to x × 256 bytes when the data size that can be transferred in one packet is x bytes. For example, in the broadcast, x is assumed to be 4 kbytes, and in this case, the maximum size of a file that can be transmitted is limited to 4k 256 kbytes to 1 mbyte.

When the file size exceeds the maximum size of the transferable file, the file is divided in advance so that the size of the divided file becomes x × 256 bytes or less. Each of a plurality of divided files obtained by dividing a file is processed as one file (item), and is further divided into 256 or less, and divided data obtained by further dividing is stored in an MMTP packet and transferred.

Further, information indicating that the item is a divided file, the number of divided files, and the serial numbers of the divided files may be stored in the control information and transmitted to the receiving apparatus. These pieces of information may be stored in the resource management table, or may be represented by a part of the existing field item _ info _ byte.

When the item is one of a plurality of divided files obtained by dividing one file, the receiving apparatus identifies the other divided files and can reconstruct the original file. In addition, in the receiving apparatus, the number of divided data and the number of divided data can be uniquely determined by using the number of divided files of the divided file, the index of the divided file, and the slice counter in the control information. In addition, the number of pieces of divided data and the number of pieces of divided data can be uniquely determined without using a packet sequence number or the like.

Here, it is preferable that item _ ids of a plurality of divided files obtained by dividing one file are the same. When another item _ id is assigned, the item _ id of the first divided file may be indicated in order to uniquely refer to the file based on other control information and the like.

In addition, a plurality of divided files may necessarily belong to the same MPU. When a plurality of files are stored in the MPU, it is necessary to store a file obtained by dividing one file, instead of storing a plurality of types of files. The receiving apparatus can detect the update of the file by confirming the version information of each MPU even without confirming the version information of each item.

Fig. 61 is an operation flow of the transmitting apparatus in the case of using the slice counter.

First, the transmitting apparatus confirms the size of a file to be transferred (S1501). Next, the transmitting apparatus determines whether or not the file size exceeds x × 256 bytes (x is a data size that can be transferred for one packet, for example, MTU size) (S1502), and when the file size exceeds x × 256 bytes (yes in S1502), the transmitting apparatus divides the file so that the size of the divided file is smaller than x × 256 bytes (S1503). Then, the divided files are transferred as items, and information related to the divided files (for example, the divided files, serial numbers in the divided files, and the like) is stored in the control information and transferred (S1504). On the other hand, when the file size is smaller than x × 256 bytes (no in S1502), the file is transferred as an item as usual (S1505).

Fig. 62 is an operation flow of the receiving apparatus in the case of using the slice counter.

First, the receiving apparatus acquires and analyzes control information related to file transfer such as a resource management table (S1601). Next, the receiving apparatus determines whether or not the desired item is a divided file (S1602). When the receiving apparatus determines that the desired file is a divided file (yes in S1602), the receiving apparatus acquires information for reconstructing the file, such as the divided file or an index of the divided file, from the control information (S1603). Then, the receiving apparatus acquires the items constituting the divided file, and reconstructs the original file (S1604). On the other hand, when the receiving apparatus determines that the desired file is not a divided file (no in S1602), the receiving apparatus acquires the file as in the normal case (S1605).

In short, the transmitting apparatus signals the packet sequence number of the divided data at the beginning of the file. In addition, the transmitting apparatus signals information that can determine the number of divided data. Alternatively, the transmission apparatus defines a division rule capable of determining the number of divided data. In addition, the transmitting apparatus does not use the slice counter and performs reservation or header compression.

When the packet sequence number of the data at the beginning of the file is transmitted by signaling, the reception apparatus determines the split data number and the split data number based on the packet sequence number of the split data at the beginning of the file and the packet sequence number of the MMTP packet.

From another viewpoint, the transmitting apparatus divides a file, and divides and transmits data for each divided file. The information (sequence number, number of splits, etc.) associated with the split file is signaled.

The receiving device determines the number of the divided data and the number of the divided data according to the fragment counter and the serial number of the divided file.

This makes it possible to uniquely identify the divided data number and the divided data. Further, the divided data number of the divided data can be specified at the time of receiving the divided data in the middle, so that the waiting time can be reduced and the memory can be reduced.

Further, by not using the slice counter, the configuration of the transmitter/receiver apparatus can reduce the amount of processing and improve the transmission efficiency.

Fig. 63 is a diagram showing a service configuration in a case where the same program is transmitted by a plurality of IP data streams. The following examples are shown here: data of a part (video/audio) of a program having a service ID of 2 is transmitted by an IP data stream using the MMT system, and data having the same service ID but different from the part of the program is transmitted by an IP data stream using the high BS data transmission system (in this example, the file transmission protocol is different, but the same protocol may be used).

The transmitting device multiplexes the IP data so that the receiving device can ensure that the data composed of a plurality of IP data streams is ready before the decoding time.

The receiving apparatus performs processing based on the decoding time using data composed of a plurality of IP data streams, thereby realizing guaranteed receiver operation.

[ supplement: transmitting apparatus and receiving apparatus

As described above, the transmission device that transmits data without using the slice counter can be configured as shown in fig. 64. Further, the receiving apparatus for receiving data without using the slice counter can be configured as shown in fig. 65. Fig. 64 is a diagram showing an example of a specific configuration of a transmitting apparatus. Fig. 65 is a diagram showing an example of a specific configuration of a receiving apparatus.

The transmission device 500 includes a dividing unit 501, a configuration unit 502, and a transmission unit 503. The dividing unit 501, the constituting unit 502, and the transmitting unit 503 are each realized by, for example, a microcomputer, a processor, a dedicated circuit, or the like.

The reception device 600 includes a reception unit 601, a determination unit 602, and a configuration unit 603. The receiving unit 601, the determining unit 602, and the configuring unit 603 are each realized by, for example, a microcomputer, a processor, a dedicated circuit, or the like.

The components of the transmission apparatus 500 and the reception apparatus 600 will be described in detail in the description of the transmission method and the reception method, respectively.

First, a transmission method will be described with reference to fig. 66. Fig. 66 is an operation flow (transmission method) of the transmission device.

First, the dividing unit 501 of the transmission device 500 divides data into a plurality of pieces of divided data (S1701).

Next, the configuration unit 502 of the transmission device 500 configures a plurality of packets by giving header information to each of the plurality of divided data and packetizing the plurality of divided data (S1702).

Then, the transmission unit 503 of the transmission device 500 transmits the plurality of packets thus configured (S1703). The transmission unit 503 transmits the divided data information and the invalidated slice counter value. The divided data information is information for specifying the number of divided data and the number of divided data. The divided data number is a number indicating that the divided data is the second divided data among a plurality of divided data. The divided data number is the number of a plurality of divided data.

This can reduce the processing amount of the transmission device 500.

Next, a reception method will be described with reference to fig. 67. Fig. 67 shows an operation flow of the receiving apparatus (receiving method).

First, the reception unit 601 of the reception apparatus 600 receives a plurality of packets (S1801).

Next, the determination unit 602 of the reception apparatus 600 determines whether or not the divided data information is acquired from the plurality of received packets (S1802).

When the determination unit 602 determines that the divided data information is acquired (yes in S1802), the configuration unit 603 of the reception device 600 configures data from the received packets without using the value of the slice counter included in the header information (S1803).

On the other hand, when the determination unit 602 determines that the divided data information has not been acquired (no in S1802), the configuration unit 603 may configure data from a plurality of received packets using a slice counter included in the header information (S1804).

This can reduce the processing amount of the receiving apparatus 600.

(embodiment 5)

[ summary ]

In embodiment 5, a transmission method of transport packets (TLV packets) in the case of storing NAL units in a NAL size format in a multiplex layer is described.

As described in embodiment 1, when NAL units of h.264 and h.265 are stored in a multiplex layer, there are two types of storage. One is a form called "byte stream format" in which a start code composed of a specific bit string is attached immediately before the NAL unit header. The other is a form called "NAL size format" in which a field indicating the size of a NAL unit is attached. The byte stream format is used in MPEG-2 systems, RTP, and the like, and the NAL size format is used in MP4, DASH using MP4, MMT, and the like.

In the byte stream format, the start code is composed of 3 bytes, and an arbitrary byte (a byte having a value of 0) can be added.

On the other hand, in the NAL size format in the general MP4, size information is represented by any one of 1 byte, 2 bytes, and 4 bytes. This size information is represented by the length sizeminusone field in the HEVC sample entry. The case where the value of this field is "0" indicates 1 byte, the case where the value of this field is "1" indicates 2 bytes, and the case where the value of this field is "3" indicates 4 bytes.

Here, in ARIB STD-B60 "MMT-based media transport stream scheme in digital broadcasting" standardized in 7 months 2014, when a NAL unit is stored in a multiplex layer and the output of an HEVC encoder is a byte stream, a byte start code is removed and the size of a byte NAL unit represented by 32 bits (unsigned integer) is added as length information immediately before the NAL unit. MPU metadata including the HEVC sample entry is not transmitted, and the size information is fixed to 32 bits (4 bytes).

In the ARIB STD-B60 "MMT-based media transport stream scheme in digital broadcasting", a pre-decoding buffer of a video signal is defined as a CPB in a reception buffer model that a transmitting device considers at the time of transmission in order to guarantee a buffering operation in a receiving device.

However, there are the following problems. In CPB in the MPEG-2 system and HRD in HEVC, it is prespecified that a video signal is in a byte stream format. Therefore, for example, when the rate control of transport packets is performed on the premise of the byte stream format to which a start code of 3 bytes is added, there is a possibility that a receiving apparatus that receives transport packets of NAL size format to which a size region of 4 bytes is added cannot satisfy the reception buffer model in ARIB STD-B60. In addition, the reception buffer model in ARIB STD-B60 does not indicate a specific buffer size and extraction rate, and therefore it is difficult to guarantee a buffering operation in the reception apparatus.

In order to solve the above problem, a reception buffer model for ensuring a buffering operation in a receiver is defined as follows.

Fig. 68 shows a reception buffer model defined by ARIB STD B-60, particularly in the case of using only the broadcast transmission path.

The receive buffer model includes a TLV packet buffer (first buffer), an IP packet buffer (second buffer), an MMTP buffer (third buffer), and a pre-decode buffer (fourth buffer). Here, in the broadcast transmission path, a jitter cancellation buffer or a buffer for FEC is not required, and therefore, is omitted.

The TLV packet buffer receives TLV packets (transport packets) from the broadcast transmission path, converts IP packets composed of variable-length headers (IP packet header, full header at the time of IP packet compression, and compressed header at the time of IP packet compression) and variable-length payloads stored in the received TLV packets into IP packets (first packets) having fixed-length IP packet headers obtained by header expansion, and outputs the converted IP packets at a fixed bit rate.

The IP packet buffer converts the IP packet into an MMTP packet (second packet) having a header and a variable-length payload, and outputs the MMTP packet obtained by the conversion at a fixed bit rate. The IP packet buffer may also be merged with the MMTP buffer.

The MMTP buffer converts the output MMTP packet into NAL units, and outputs the NAL units obtained by the conversion at a fixed bit rate.

The pre-decoding buffer sequentially accumulates the output NAL units, generates an access unit from the accumulated NAL units, and outputs the generated access unit to the decoder at a timing of a decoding time corresponding to the access unit.

In the reception buffer model shown in fig. 68, the MMTP buffer and the pre-decode buffer, which are buffers other than the TLV packet buffer and the IP packet buffer at the previous stage, are characterized by following the reception buffer model in the MPEG-2 TS.

For example, the MMTP buffer for video (MMTP B1) is composed of buffers corresponding to a Transport Buffer (TB) and a Multiplexing Buffer (MB) in MPEG-2 TS. The MMTP buffer for audio (MMTP Bn) is constituted by a buffer corresponding to the Transmission Buffer (TB) in the MPEG-2 TS.

The buffer size of the transmission buffer is set to a fixed value as in the MPEG-2 TS. For example, the MTU size is n times (n may be a decimal number or an integer and is 1 or more).

In addition, the MMTP packet size is specified such that the overhead rate of the MMTP packet header is smaller than the overhead rate of the PES packet header. Thus, the extraction rates RX1, RXn, RXs of the transmission buffer in the MPEG-2TS can be applied as they are from the extraction rate of the transmission buffer.

The size of the multiplexing buffer and the extraction rate are set to the MB size and RBX1 in the MPEG-2TS, respectively.

In addition to the above reception buffer model, the following restriction is also provided to solve the problem.

The HRD specification for HEVC presupposes the byte stream format, and MMT is NAL size format with a size region of 4 bytes appended to the beginning of the NAL unit. Therefore, in NAL size form at the time of encoding, rate control is performed to satisfy HDR.

That is, the transmitting apparatus performs rate control of the transport packet based on the reception buffer model and the restriction.

In the receiving apparatus, by performing the reception processing using the signal, the decoding operation can be performed without underflow or overflow.

Even if the size region at the beginning of the NAL unit is not 4 bytes, the rate control is performed so as to satisfy the HRD in consideration of the size region at the beginning of the NAL unit.

The extraction rate of the TLV packet buffer (bit rate when the TLV packet buffer outputs the IP packet) is set in consideration of the transfer rate after the IP header extension.

That is, TLV packets having variable data sizes are input, and after removing the TLV header and expanding (restoring) the IP header, the transfer rate of the IP packet to be output is considered. In other words, the amount of increase and decrease of the header is considered for the input transfer rate.

Specifically, since the data size is variable, packets subjected to IP header compression and packets not subjected to IP header compression coexist, and the size of the IP header differs depending on the packet type such as IPv4 or IPv6, the transfer rate of the output IP packet is not unique. Therefore, the average packet length of the variable-length data size is determined, and the transfer rate of the IP packet output according to the TLV packet is determined.

Here, in order to specify the maximum transmission rate after the IP header is expanded, the transmission rate is determined assuming that the IP header is always compressed.

When IPv4 and IPv6 are mixed in packet types or when the packet types are not defined separately, the transmission rate is determined assuming that the header size is large and the increase rate of the header after expansion is large for IPv6 packets.

For example, when all the IP packets stored in the TLV packet buffer having an average packet length of S, TLV are IPv6 packets and header compression is performed, the maximum output transfer rate after TLV header removal and IP header expansion is:

input rate × { S/(S + IPv6 header compression amount) }.

More specifically, the average packet length S of the TLV packets is set with reference to S being 0.75 × 1500(1500 is assumed to be the maximum MTU size), and the IPv6 header compression amount is set to TLV header length-IPv 6 header length-UDP header length

The maximum output transfer rate after removal of TLV header and extension of IP header in case of 3-40-8 is:

input rate × 1.0417 ≈ input rate × 1.05.

In the MMT scheme, when data units are grouped, as shown in fig. 69, a data unit length and a data unit header are added before the data units.

However, for example, when a video signal in the NAL size format is stored as one data unit, as shown in fig. 70, 2 fields indicating the size are provided for one data unit, and the fields are repeated as information. Fig. 70 is a diagram showing an example of a case where a plurality of data units are collectively stored in one payload and a video signal in NAL size format is regarded as one data unit. Specifically, the size area at the head of the NAL size format (hereinafter referred to as "size area") and the data unit length field located before the data unit header in the MMTP payload header are both fields indicating the size and are duplicated as information. For example, in the case where the length of the NAL unit is L bytes, L bytes are indicated in the size region, and L bytes + "length of the size region" (bytes) are indicated in the data unit length field. Although the values indicated in the size field and the data unit length field do not completely match, the values can be said to be overlapping because one value can easily calculate the other value.

As described above, when data including size information of data therein is stored as data units and a plurality of the data units are collectively stored in one payload, there is a problem that the overhead rate is high and the transmission efficiency is low because the size information is duplicated.

Therefore, when the transmitting apparatus stores data including size information of the data as data units and stores a plurality of the data units in one payload in a lump, it is conceivable to store the data units as shown in fig. 71 and 72.

As shown in fig. 71, it is conceivable that NAL units including size fields are stored as data units, and the MMTP payload header does not indicate the data unit length included in the past. Fig. 71 is a diagram showing the structure of the payload of an MMTP packet whose data unit length is not shown.

As shown in fig. 72, a flag indicating whether or not the data unit length is indicated, and information indicating the length of the size area may be newly stored in the header. The position where the flag and the information indicating the length of the size area are stored may be indicated in units of data cells by a data cell header or the like, or may be indicated in units (packet units) in which a plurality of data cells are grouped. Fig. 72 shows an example of an extended area given to a packet unit. The storage location of the newly indicated information is not limited to this, and may be an MMTP payload header, an MMTP header, or control information.

On the receiving side, when the flag indicating whether or not the data unit length is compressed indicates that the data unit length is compressed, the length information of the size area inside the data unit is acquired, and the size area is acquired based on the length information of the size area, whereby the data unit length can be calculated using the acquired length information of the size area and the size area.

With the above method, the data amount can be reduced on the transmitting side, and the transmission efficiency can be improved.

However, the overhead may be reduced by reducing the size area without reducing the data unit length. In the case of reducing the size area, information indicating whether the size area is reduced or not and information indicating the length of the data unit length field may be stored.

Wherein the MMTP payload header also contains length information.

When NAL units including size fields are stored as data units, the payload size field in the MMTP payload header may be reduced regardless of whether they are grouped together.

In addition, when storing data not including a size area as a data unit, if the data unit is summarized and the data unit length is expressed, the payload size area in the MMTP payload header can be reduced.

In the case of reducing the payload size region, a flag indicating whether to reduce, length information of a reduced size field, or length information of an unreduced size field may be indicated in the same manner as described above.

Fig. 73 shows an operation flow of the receiving apparatus.

As described above, the transmitting apparatus stores NAL units including size fields as data units, and the data unit length included in the MMTP payload header is not indicated in the MMTP packet.

Hereinafter, a case will be described as an example where a flag indicating whether or not the data unit length is indicated and length information of the size area are indicated in the MMTP packet.

The receiving apparatus determines whether the data unit includes a size region and the data unit length is reduced based on the information transmitted from the transmitting side (S1901).

When it is determined that the data cell length is reduced (yes in S1902), the length information of the size region inside the data cell is acquired, and then the size region inside the data cell is analyzed to acquire the data cell length by calculation (S1903).

On the other hand, when it is determined that the data cell length has not been reduced (no in S1902), the data cell length is calculated from either the data cell length or the size region inside the data cell as usual (S1904).

However, when the flag indicating whether the data unit length is reduced or not and the length information of the size area are known in advance by the receiving apparatus, the flag and the length information may not be transmitted. In this case, the receiving apparatus performs the processing shown in fig. 73 based on predetermined information.

[ supplement: transmitting apparatus and receiving apparatus

As described above, the transmitting apparatus that performs rate control so as to satisfy the predetermined reception buffer model at the time of encoding can also be configured as shown in fig. 74. Further, the receiving apparatus that receives and decodes the transport packet transmitted from the transmitting apparatus can also be configured as shown in fig. 75. Fig. 74 is a diagram showing an example of a specific configuration of a transmitting apparatus. Fig. 75 is a diagram showing an example of a specific configuration of a receiving apparatus.

The transmission device 700 includes a generation unit 701 and a transmission unit 702. The generation unit 701 and the transmission unit 702 are each realized by, for example, a microcomputer, a processor, a dedicated circuit, or the like.

The reception device 800 includes a reception unit 801, a first buffer 802, a second buffer 803, a third buffer 804, a fourth buffer 805, and a decoding unit 806. The receiving unit 801, the first buffer 802, the second buffer 803, the third buffer 804, the fourth buffer 805, and the decoding unit 806 are each realized by, for example, a microcomputer, a processor, a dedicated circuit, or the like.

The components of the transmission apparatus 700 and the reception apparatus 800 are explained in detail in the explanation of the transmission method and the reception method, respectively.

First, a transmission method will be described with reference to fig. 76. Fig. 76 shows an operation flow (transmission method) of the transmission device.

First, the generating unit 701 of the transmitting apparatus 700 generates a coded stream by performing rate control so as to satisfy the specification of a reception buffer model predetermined to guarantee the buffering operation of the receiving apparatus (S2001).

Next, the transmission unit 702 of the transmission device 700 packetizes the generated bit stream, and transmits a transport packet obtained by the packetization (S2002).

The reception buffer model used in the transmission device 700 has the configurations of the first to fourth buffers 802 to 805 having the configuration of the reception device 800, and therefore, the description thereof is omitted.

Thus, when data transfer is performed using a scheme such as MMT, the transmission device 700 can ensure the buffering operation of the reception device 800.

Next, a reception method will be described with reference to fig. 77. Fig. 77 shows an operation flow of the receiving apparatus (receiving method).

First, the reception unit 801 of the reception device 800 receives a transport packet including a fixed-length packet header and a variable-length payload (S2101).

Next, the first buffer 802 of the receiving apparatus 800 converts the packet composed of the variable-length header and the variable-length payload stored in the received transport packet into a first packet having a fixed-length header expanded by the header, and outputs the first packet obtained by the conversion at a fixed bit rate (S2102).

Next, the second buffer 803 of the reception apparatus 800 converts the first packet obtained by the conversion into a second packet composed of a header and a variable-length payload, and outputs the second packet obtained by the conversion at a fixed bit rate (S2103).

Next, the third buffer 804 of the reception apparatus 800 converts the output second packet into an NAL unit, and outputs the NAL unit obtained by the conversion at a fixed bit rate (S2104).

Next, the fourth buffer 805 of the reception apparatus 800 sequentially accumulates the outputted NAL units, generates an access unit from the accumulated NAL units, and outputs the generated access unit to the decoder at the timing of the decoding time corresponding to the access unit (S2105).

Then, the decoding unit 806 of the reception apparatus 800 decodes the access unit outputted from the fourth buffer (S2106).

In this way, the receiving apparatus 800 can perform decoding operation without underflow or overflow.

(other embodiments)

As described above, the transmission device, the reception device, the transmission method, and the reception method according to the embodiment have been described, but the present application is not limited to this embodiment.

Each processing unit included in the transmission device and the reception device according to the above-described embodiments is typically realized as an LSI (large scale integrated circuit) which is an integrated circuit. They may be independently formed into a single chip, or may be partially or entirely formed into a single chip.

The integration is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. An FPGA (Field Programmable Gate Array) that can be programmed after LSI manufacture, or a reconfigurable processor that can reconfigure connection and setting of circuit cells within an LSI may be used.

In the above embodiments, each component may be configured by dedicated hardware, or may be realized by executing a software program suitable for each component. Each component may be realized by a program execution unit such as a CPU or a processor reading out and executing a software program recorded in a recording medium such as a hard disk or a semiconductor memory.

In other words, the transmission device and the reception device include a processing circuit (processing circuit) and a storage device (storage) electrically connected to the processing circuit (accessible from the control circuit). The processing circuit includes at least one of dedicated hardware and a program execution unit. In addition, in the case where the processing circuit includes a program execution unit, the storage device stores a software program executed by the program execution unit. The processing circuit uses the storage device to execute the transmission method or the reception method according to the above-described embodiments.

Further, the present application may be the software program or a nonvolatile computer-readable recording medium on which the program is recorded. It is obvious that the program can be circulated via a transmission medium such as the internet.

The numbers used in the above description are all numbers exemplified for the purpose of specifically describing the present application, and the present application is not limited to the exemplified numbers.

Note that, the division of the functional blocks in the block diagrams is an example, and a plurality of functional blocks may be implemented as 1 functional block, or 1 functional block may be divided into a plurality of functional blocks, or a part of the functions may be transferred to another functional block. Further, the functions of a plurality of functional blocks having similar functions may be processed in parallel or in time division by a single piece of hardware or software.

The order of executing the steps included in the transmission method or the reception method is an example order for specifically describing the present application, and may be an order other than the above. Further, a part of the above steps may be executed simultaneously (in parallel) with other steps.

The above describes the transmission device, the reception device, the transmission method, and the reception method according to one or more embodiments of the present application based on the embodiments, but the present application is not limited to the embodiments. The present invention is not limited to the embodiments described above, and various modifications and variations can be made without departing from the spirit and scope of the present invention.

Industrial applicability

The present application can be applied to a device or apparatus that transmits media such as video data and audio data.

Claims

1. A transmission device is characterized by comprising:

a generating unit configured to generate a coded stream by rate control satisfying a predetermined rule predetermined to guarantee a buffering operation of a receiving apparatus; and

a transmitting unit configured to sequentially transmit transport packets obtained by packing the encoded streams;

The transport packet includes a fixed length header and a variable length payload,

the predetermined specification is determined by the following buffers:

a first buffer that converts an IP packet, which is stored in the transport packet sequentially received by the receiving apparatus and includes a variable-length header and a variable-length payload, into a first packet having a fixed-length header expanded by a header, and outputs the first packet at a fixed first bit rate;

a second buffer which converts the first packet into a second packet including a header and a variable-length payload and outputs the second packet at a fixed second bit rate;

a third buffer for converting the second packet into a NAL unit, i.e. a network adaptation layer unit, and outputting the NAL unit at a fixed third bit rate; and

a fourth buffer for sequentially accumulating the NAL units, generating an access unit from the accumulated NAL units, and outputting the access unit to a decoder of a receiving apparatus at a timing of a decoding time corresponding to the access unit,

the extraction rate of the first buffer is equal to or lower than the maximum transmission rate of the first packet having the fixed-length packet header obtained by header-expanding the variable-length packet header.

2. A reception device is characterized by comprising:

a receiving unit configured to sequentially receive a transport packet including a fixed-length packet header and a variable-length payload; and

a first buffer that converts an IP packet, which is stored in the transport packet and includes a variable-length header and a variable-length payload, into a first packet having a fixed-length header expanded by a header, and outputs the first packet at a fixed first bit rate;

a third buffer, for converting the second packet into NAL unit, i.e. network adaptation layer unit, and outputting the NAL unit at a fixed third bit rate;

a fourth buffer for sequentially accumulating the NAL units, generating an access unit from the accumulated NAL units, and outputting the access unit at a timing of a decoding time corresponding to the access unit,

3. A transmission method, characterized in that, in the method,

generating a coded stream by rate control satisfying a predetermined rule predetermined to guarantee a buffering operation of a receiving apparatus;

sequentially transmitting a transmission packet formed by packaging the coding stream;

the predetermined specification is determined by the following buffers:

4. A method of reception, characterized in that,

sequentially receiving a transport packet including a fixed-length header and a variable-length payload;

converting an IP packet stored in the transport packet and including a variable-length header and a variable-length payload into a first packet having a fixed-length header expanded by a header, and outputting the first packet at a fixed first bit rate, wherein an extraction rate is equal to or less than a maximum transport rate of the first packet having the fixed-length header obtained by header-expanding the variable-length header;

converting the first packet into a second packet comprising a header and a variable-length payload, the second packet being output at a fixed second bit rate;

converting the second packet into a NAL unit, i.e., a network adaptation layer unit, and outputting the NAL unit at a fixed third bit rate;

the NAL units are sequentially accumulated, an access unit is generated from the accumulated NAL units, and the access unit is output at a timing of a decoding time corresponding to the access unit.