WO2017138470A1 - Transmission device, transmission method, reception device, and reception method
- Publication number
- WO2017138470A1 (PCT/JP2017/004146)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image data
- encoded
- frame rate
- information
- video stream
- Prior art date
Links
- 230000005540 biological transmission Effects 0.000 title claims description 198
- 238000000034 method Methods 0.000 title claims description 54
- 238000012545 processing Methods 0.000 claims abstract description 96
- 238000006243 chemical reaction Methods 0.000 claims description 288
- 238000003780 insertion Methods 0.000 claims description 29
- 230000037431 insertion Effects 0.000 claims description 29
- 239000000284 extract Substances 0.000 claims description 12
- 238000004148 unit process Methods 0.000 claims description 3
- 239000010410 layer Substances 0.000 description 91
- mdat 0.000 description 44
- SEI 0.000 description 40
- 238000005516 engineering process Methods 0.000 description 21
- 238000012546 transfer Methods 0.000 description 21
- 239000012634 fragment Substances 0.000 description 20
- 230000006870 function Effects 0.000 description 18
- 230000002123 temporal effect Effects 0.000 description 16
- 230000006978 adaptation Effects 0.000 description 6
- 238000004891 communication Methods 0.000 description 6
- 238000010586 diagram Methods 0.000 description 3
- 239000011229 interlayer Substances 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 101150002258 HDR1 gene Proteins 0.000 description 1
- 230000002730 additional effect Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000006866 deterioration Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000005520 electrodynamics Effects 0.000 description 1
- 238000005401 electroluminescence Methods 0.000 description 1
- 238000007726 management method Methods 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000013139 quantization Methods 0.000 description 1
- 238000012384 transportation and delivery Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
- H04N21/2365—Multiplexing of several video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/33—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234327—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into layers, e.g. base layer and one or more enhancement layers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234363—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234381—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/23439—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements for generating different versions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
- H04N21/23605—Creation or processing of packetized elementary streams [PES]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/266—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
- H04N21/2662—Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/434—Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
- H04N21/4345—Extraction or processing of SI, e.g. extracting service information from an MPEG stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/434—Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
- H04N21/4347—Demultiplexing of several video streams
Definitions
- the present technology relates to a transmission device, a transmission method, a reception device, and a reception method, and more particularly, to a transmission device that transmits ultra-high resolution image data at a high frame rate.
- Patent Literature 1 describes performing scalable media encoding to generate a base-layer stream for a low-resolution image service and an enhancement-layer stream for a high-resolution image service, and transmitting a broadcast signal containing these streams.
- an object of the present technology is to facilitate processing on the receiving side according to the decoding capability.
- a concept of the present technology is a transmission device comprising an image processing unit that processes ultra-high-resolution image data at a high frame rate to obtain: first image data for obtaining a high-resolution image at the basic frame rate; second image data used together with the first image data to obtain a high-resolution image at the high frame rate; third image data used together with the first image data to obtain an ultra-high-resolution image at the basic frame rate; and fourth image data used together with the first to third image data to obtain an ultra-high-resolution image at the high frame rate.
- the transmission device includes an information insertion unit that inserts into the container information corresponding to the information related to the image data of the video stream that is inserted in each of the predetermined number of video streams.
- the image processing unit processes the ultrahigh resolution image data at a high frame rate, and obtains first to fourth image data.
- the first image data is image data for obtaining a high-resolution image at the basic frame rate.
- the second image data is image data for obtaining a high-resolution image at a high frame rate by using it together with the first image data.
- the third image data is image data that is used together with the first image data to obtain an ultra-high resolution image at the basic frame rate.
- the fourth image data is image data for obtaining an ultra-high resolution image at a high frame rate by being used together with the first to third image data.
- the container including a predetermined number of video streams having encoded image data of the first to fourth image data is transmitted by the transmission unit.
- Information corresponding to the information related to the image data of the video stream inserted into each of the predetermined number of video streams is inserted into the container by the information insertion unit.
- for example, the container transmitted by the transmission unit may include a first video stream having the encoded image data of the first image data and the encoded image data of the second image data, and a second video stream having the encoded image data of the third image data and the encoded image data of the fourth image data, and the information insertion unit may insert the information into the container in a state where each of the first and second video streams is managed by one track.
- the container is MP4 (ISOBMFF)
- information related to the encoded image data of the two image data included in the video stream is arranged in the “moof” block corresponding to the track.
- the number of video streams (files) is two, which is simple.
- in this case, the container analysis unit (demultiplexer) of a basic-frame-rate receiver, for example a 60P receiver, needs to read the 120P stream and skip the unnecessary pictures.
- a high-frame-rate receiver, for example a 120P receiver, on the other hand, does nothing extra and simply decodes the pictures of the 120P stream.
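The picture skipping described above can be sketched as follows. This is an illustrative sketch, not code from the patent: the `Picture` structure, the `temporal_id` convention (0 for the base 60P layer, 1 for the enhancement layer), and the function name are all hypothetical.

```python
# Hypothetical sketch: a 60P receiver reading a 120P stream in which base
# (60P) and enhancement pictures alternate in decoding order.
from dataclasses import dataclass

@dataclass
class Picture:
    decode_order: int
    temporal_id: int  # 0 = base (60P) layer, 1 = enhancement (+60P) layer
    payload: bytes

def select_for_capability(stream, max_temporal_id):
    """Keep only the pictures the decoder can handle; skip the rest."""
    return [p for p in stream if p.temporal_id <= max_temporal_id]

# A 120P stream: pictures of the base and enhancement layers alternate.
stream = [Picture(i, i % 2, b"") for i in range(8)]

base_only = select_for_capability(stream, max_temporal_id=0)  # 60P receiver
full = select_for_capability(stream, max_temporal_id=1)       # 120P receiver
```

A 120P receiver keeps every picture; a 60P receiver keeps only every other one and decodes at the basic frame rate.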
- when inserting the information into the container, the information insertion unit may insert, for the first video stream, the information related to the encoded image data of the first image data and the information related to the encoded image data of the second image data in groups, and, for the second video stream, the information related to the encoded image data of the third image data and the information related to the encoded image data of the fourth image data in groups. Grouping the information in this way allows the reception side to easily determine which encoded image data each piece of information relates to.
- in the first video stream, the pictures of the first image data and the pictures of the second image data may be encoded alternately, that is, alternately in time order, and in the second video stream, the pictures of the third image data and the pictures of the fourth image data may be encoded alternately, that is, alternately in time order. Encoding in this way enables each picture to be decoded smoothly on the receiving side.
- this guarantees that a receiver that decodes only the first image data, or only the first and third image data, can perform the decoding process within the range of its decoding capability.
- alternatively, the container transmitted by the transmission unit may include a first video stream having the encoded image data of the first image data and the encoded image data of the second image data, and a second video stream having the encoded image data of the third image data and the encoded image data of the fourth image data, and the information insertion unit may insert the information into the container in a state where each of the first and second video streams is managed by two tracks.
- the number of video streams (files) is two, which is simple.
- in this case as well, the container analysis unit (demultiplexer) of a basic-frame-rate receiver, for example a 60P receiver, needs to read the 120P stream and skip the unnecessary pictures.
- a high-frame-rate receiver, for example a 120P receiver, on the other hand, does nothing extra and simply decodes the pictures of the 120P stream.
- in this case, in the first video stream, the pictures of the first image data and the pictures of the second image data may be encoded alternately, that is, alternately in time order, and in the second video stream, the pictures of the third image data and the pictures of the fourth image data may be encoded alternately, that is, alternately in time order. Encoding in this way enables each picture to be decoded smoothly on the receiving side.
- this guarantees that a receiver that decodes only the first image data, or only the first and third image data, can perform the decoding process within the range of its decoding capability.
- alternatively, the container transmitted by the transmission unit may include a first video stream having the encoded image data of the first image data, a second video stream having the encoded image data of the second image data, a third video stream having the encoded image data of the third image data, and a fourth video stream having the encoded image data of the fourth image data, and the information insertion unit may insert the information into the container in a state where each of the first to fourth video streams is managed by one track.
- the container is MP4 (ISOBMFF)
- information related to encoded image data of one image data included in the video stream is arranged in a “moof” block corresponding to the track.
- the number of video streams (files) is four.
- in this case, for a basic-frame-rate receiver, for example a 60P receiver, the so-called backward compatibility of reading the 60P stream and passing it to the decoder without any extra processing is guaranteed.
- a high-frame-rate receiver, for example a 120P receiver, needs to combine the two streams into one stream in decoding order and transfer it to the decoder.
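The combining step described above can be sketched as follows. This is an illustrative sketch, not code from the patent: the `(decode_ts, payload)` tuple representation and the function name are hypothetical.

```python
# Hypothetical sketch: a 120P receiver merging a base (60P) stream and an
# enhancement stream into a single decoding-order sequence before handing
# it to the decoder.
import heapq

def merge_in_decoding_order(base_stream, enh_stream):
    """Merge two streams, each sorted by decode timestamp, into one."""
    return list(heapq.merge(base_stream, enh_stream, key=lambda p: p[0]))

base = [(0, "B0"), (2, "B1"), (4, "B2")]   # base-layer pictures
enh = [(1, "E0"), (3, "E1"), (5, "E2")]    # enhancement-layer pictures
merged = merge_in_decoding_order(base, enh)
# merged alternates base and enhancement pictures in decode order
```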
- in this way, information corresponding to the information regarding the image data included in each of the predetermined number of video streams is inserted into the container. Therefore, on the receiving side, based on this information, it is easy to extract predetermined encoded image data from the first to fourth image data included in the predetermined number of streams according to the decoding capability, and to perform the decoding process.
- for example, the ultra-high-resolution image data at the high frame rate may be transmission image data given high-dynamic-range photoelectric conversion characteristics by performing photoelectric conversion on high-dynamic-range image data in accordance with those characteristics.
- in this case, the information insertion unit may further insert conversion characteristic information indicating the high-dynamic-range photoelectric conversion characteristic, or the electro-optical conversion characteristic corresponding to it, into the video stream having the encoded image data of the first image data.
- for example, the high-dynamic-range photoelectric conversion characteristic may be a hybrid log-gamma (HLG) characteristic.
- alternatively, the high-dynamic-range photoelectric conversion characteristic may be a PQ curve characteristic. Since the conversion characteristic information is inserted in this way, the receiving side can easily perform appropriate electro-optical conversion based on it.
- when the high-dynamic-range photoelectric conversion characteristic is the PQ curve characteristic, the information insertion unit may further insert, into the video stream having the encoded image data of the first image data, conversion information for converting values of conversion data based on the PQ curve characteristic into values of conversion data based on the normal-dynamic-range photoelectric conversion characteristic. Inserting the conversion information in this way allows the receiving side to obtain display image data satisfactorily when performing normal-dynamic-range display.
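The PQ curve referred to here is standardized as the SMPTE ST 2084 perceptual quantizer. As an illustrative sketch (not code from the patent), its electro-optical transfer function, which a receiver would apply when displaying PQ-coded image data, can be written as:

```python
# Sketch of the PQ (SMPTE ST 2084) electro-optical transfer function,
# mapping a normalized code value E (0..1) to luminance in cd/m^2.
# The constants are the published ST 2084 values.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_eotf(e: float) -> float:
    """PQ EOTF: normalized signal value -> luminance in cd/m^2 (nits)."""
    ep = e ** (1 / M2)
    num = max(ep - C1, 0.0)
    den = C2 - C3 * ep
    return (num / den) ** (1 / M1) * 10000.0
```

A code value of 1.0 maps to the 10000 cd/m^2 peak of the PQ system; converting such values for normal-dynamic-range display is what the conversion information above enables.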
- another concept of the present technology is a reception device comprising a receiving unit for receiving a container including a predetermined number of video streams, the predetermined number of video streams having encoded image data of first to fourth image data obtained by processing ultra-high-resolution image data at a high frame rate: first image data for obtaining a high-resolution image at the basic frame rate; second image data used together with the first image data to obtain a high-resolution image at the high frame rate; third image data used together with the first image data to obtain an ultra-high-resolution image at the basic frame rate; and fourth image data used together with the first to third image data to obtain an ultra-high-resolution image at the high frame rate. Information corresponding to the information related to the image data of the video stream inserted in each of the predetermined number of video streams is inserted into the container.
- the reception device further comprises a processing unit that, based on the information inserted in the container, selectively extracts predetermined encoded image data from the encoded image data of the first to fourth image data according to the decoding capability, and performs a decoding process to obtain image data.
- a container including a predetermined number of video streams is received by the receiving unit.
- the predetermined number of video streams have encoded image data of first to fourth image data obtained by processing ultrahigh resolution image data at a high frame rate.
- the first image data is image data for obtaining a high-resolution image at the basic frame rate.
- the second image data is image data for obtaining a high-resolution image at a high frame rate by using it together with the first image data.
- the third image data is image data that is used together with the first image data to obtain an ultra-high resolution image at the basic frame rate.
- the fourth image data is image data for obtaining an ultra-high resolution image at a high frame rate by being used together with the first to third image data.
- based on the information inserted in the container, the processing unit selectively extracts predetermined encoded image data from the encoded image data of the first to fourth image data according to the decoding capability, and performs the decoding process to obtain image data.
- in this way, information corresponding to the information on the image data included in each of the predetermined number of video streams is inserted into the container, and, based on the information inserted in the container, predetermined encoded image data is selectively extracted from the encoded image data of the first to fourth image data according to the decoding capability and subjected to the decoding process. Therefore, a decoding process according to the decoding capability can be performed easily.
- for example, the ultra-high-resolution image data at the high frame rate may be transmission image data given high-dynamic-range photoelectric conversion characteristics by performing photoelectric conversion on high-dynamic-range image data in accordance with those characteristics.
- conversion characteristic information indicating the high-dynamic-range photoelectric conversion characteristic, or the electro-optical conversion characteristic corresponding to it, is inserted into the video stream having the encoded image data of the first image data.
- the processing unit may perform electro-optical conversion on the image data obtained by the decoding process based on this conversion characteristic information to obtain display image data.
- for example, the high-dynamic-range photoelectric conversion characteristic given to the transmission image data may be the PQ curve characteristic, and conversion information for converting values of conversion data based on the PQ curve characteristic into values of conversion data based on the normal-dynamic-range photoelectric conversion characteristic may be inserted into the video stream having the encoded image data of the first image data. When performing normal-dynamic-range display, the processing unit may perform dynamic range conversion on the image data obtained by the decoding process based on this conversion information to obtain normal-dynamic-range transmission image data, and perform electro-optical conversion on it using the normal-dynamic-range electro-optical conversion characteristic to obtain display image data. As a result, display image data can be obtained satisfactorily when performing normal-dynamic-range display.
- in another concept of the present technology, the transmission device includes an information insertion unit that inserts into the container, in correspondence with the encoded image data of the first image data, the level designation value of a video stream corresponding to the encoded image data of the first image data, and, in correspondence with the encoded image data of the second image data, the level designation value of a video stream combining the encoded image data of the first and second image data.
- in this case, the image processing unit processes high-frame-rate image data to obtain first image data for obtaining an image at the basic frame rate, and second image data used together with the first image data to obtain the high-frame-rate image data.
- the transmission unit transmits a container including one or more video streams having encoded image data of the first and second image data.
- the information insertion unit inserts into the container, in correspondence with the encoded image data of the first image data, the level designation value of the video stream corresponding to the encoded image data of the first image data, and, in correspondence with the encoded image data of the second image data, the level designation value of the video stream combining the encoded image data of the first and second image data.
- in this way, the level designation values of the video streams are inserted into the container, so that the reception side can, based on this information, selectively send to the decoder the data corresponding to its decoding capability from among the encoded image data of the first and second image data.
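The selection described above can be sketched as follows. This is an illustrative sketch, not code from the patent: the numeric values follow the HEVC convention of `general_level_idc` = level number x 30 (e.g. Level 5.1 = 153, Level 5.2 = 156), and the stream labels are hypothetical.

```python
# Hypothetical sketch: using level designation values signaled in the
# container to decide which encoded image data to pass to the decoder.
LEVEL_BASE_ONLY = 153  # e.g. Level 5.1: enough for the base frame rate
LEVEL_COMBINED = 156   # e.g. Level 5.2: base + enhancement combined

def streams_to_decode(decoder_level_idc):
    """Select which encoded image data to send to the decoder."""
    if decoder_level_idc >= LEVEL_COMBINED:
        return ["first(base)", "second(enhancement)"]  # high frame rate
    if decoder_level_idc >= LEVEL_BASE_ONLY:
        return ["first(base)"]                         # basic frame rate
    return []                                          # cannot decode
```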
- another concept of the present technology is a reception device comprising a receiving unit for receiving a container including one or more video streams, the one or more video streams having first image data for obtaining an image at the basic frame rate and second image data used together with the first image data to obtain high-frame-rate image data, wherein the level designation value of the video stream corresponding to the encoded image data of the first image data is inserted into the container in correspondence with the encoded image data of the first image data, and the level designation value of the video stream combining the encoded image data of the first and second image data is inserted in correspondence with the encoded image data of the second image data.
- the reception device further comprises a processing unit that, based on the level designation values of the video streams inserted in the container, selectively extracts one or more pieces of encoded image data from the encoded image data of the first and second image data according to the decoding capability, and performs a decoding process to obtain image data.
- the receiving unit receives a container including one or more video streams.
- the one or more video streams include first image data for obtaining an image at the basic frame rate, and second image data used together with the first image data to obtain high-frame-rate image data.
- the level designation value of the video stream corresponding to the encoded image data of the first image data is inserted into the container in correspondence with the encoded image data of the first image data, and the level designation value of the video stream combining the encoded image data of the first and second image data is inserted in correspondence with the encoded image data of the second image data.
- based on the level designation values of the video streams inserted in the container, the processing unit selectively extracts one or more pieces of encoded image data from the encoded image data of the first and second image data according to the decoding capability, and performs the decoding process to obtain image data.
- in this way, from among the encoded image data of the first and second image data, the data corresponding to the decoding capability is selectively sent to the decoder for processing, so that the processing in the decoder can be performed efficiently.
- FIG. 1 is a block diagram illustrating a configuration example of an MPEG-DASH-based stream distribution system.
- FIG. 2 is a diagram showing an example of the relationship among the structures arranged hierarchically in the MPD file.
- <1. Embodiment> [Outline of MPEG-DASH-based stream distribution system] First, an outline of an MPEG-DASH-based stream distribution system to which the present technology can be applied will be described.
- FIG. 1A shows a configuration example of an MPEG-DASH-based stream distribution system 30A.
- the media stream and the MPD file are transmitted through a communication network transmission line (communication transmission line).
- N service receivers 33-1, 33-2, ..., 33-N are connected to the DASH stream file server 31 and the DASH MPD server 32 via a CDN (Content Delivery Network) 34.
- the DASH stream file server 31 generates DASH specification stream segments (hereinafter referred to as "DASH segments" as appropriate) based on media data (video data, audio data, caption data, etc.) of predetermined content, and sends segments in response to HTTP requests from the service receivers.
- the DASH stream file server 31 may be a dedicated streaming server or may be used as a web server.
- the DASH stream file server 31 responds to a request for a segment of a predetermined stream sent from the service receiver 33 (33-1, 33-2,..., 33-N) via the CDN 34, The segment of the stream is transmitted to the requesting receiver via the CDN 34.
- the service receiver 33 refers to the rate values described in the MPD (Media Presentation Description) file, selects the stream with the optimum rate according to the state of the network environment in which the client is placed, and makes the request.
- the DASH MPD server 32 is a server that generates an MPD file for acquiring a DASH segment generated in the DASH stream file server 31.
- An MPD file is generated based on content metadata from a content management server (not shown) and the segment address (url) generated in the DASH stream file server 31.
- the DASH stream file server 31 and the DASH MPD server 32 may be physically the same.
- In the MPD file, attributes are described using elements called "Representation" for each stream such as video and audio. For example, for each of a plurality of video data streams having different rates, the rate is described in its own representation. As described above, the service receiver 33 can refer to these rate values and select an optimum stream according to the state of the network environment in which it is placed.
- FIG. 1B shows a configuration example of an MPEG-DASH-based stream distribution system 30B.
- the media stream and the MPD file are transmitted through an RF transmission path (broadcast transmission path).
- This stream distribution system 30B has a configuration including a broadcast transmission system 36, to which the DASH stream file server 31 and the DASH MPD server 32 are connected, and M service receivers 35-1, 35-2, ..., 35-M.
- the broadcast transmission system 36 puts the DASH specification stream segments (DASH segments) generated by the DASH stream file server 31 and the MPD file generated by the DASH MPD server 32 on a broadcast wave and sends them.
- FIG. 2 shows an example of the relationship between the structures arranged hierarchically in the MPD file.
- a media presentation includes a plurality of periods (Periods) separated by time intervals. For example, the first period starts from 0 seconds, the next period starts from 100 seconds, and so on.
- there are a plurality of adaptation sets (AdaptationSet) in the period.
- Each adaptation set depends on differences in media types such as video and audio, differences in language and viewpoints even with the same media type.
- as shown in FIG. 2(c), there are a plurality of representations (Representation) in the adaptation set.
- Each representation depends on stream attributes, such as differences in rates.
- as shown in FIG. 2(d), the representation includes segment info (SegmentInfo).
- as shown in FIG. 2(e), the segment info includes an initialization segment (Initialization Segment) and a plurality of media segments (Media Segment) in which information for each segment (Segment) obtained by further dividing the period is described.
- the media segment includes address (url) information and the like for actually acquiring segment data such as video and audio.
- stream switching can be freely performed between a plurality of representations included in the adaptation set.
- an optimal rate stream can be selected according to the state of the network environment on the receiving side, and video distribution without interruption is possible.
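As an illustrative sketch (not part of the described system; the representation list and function name are hypothetical), the rate-based stream selection a DASH client performs against the MPD can be expressed as:

```python
def select_representation(representations, available_bps):
    """Return the highest-rate representation not exceeding the available
    bandwidth, or the lowest-rate one as a fallback."""
    candidates = [r for r in representations if r["bandwidth"] <= available_bps]
    if candidates:
        return max(candidates, key=lambda r: r["bandwidth"])
    return min(representations, key=lambda r: r["bandwidth"])

# hypothetical representations from one adaptation set of an MPD
reps = [
    {"id": "video-2M", "bandwidth": 2_000_000},
    {"id": "video-5M", "bandwidth": 5_000_000},
    {"id": "video-9M", "bandwidth": 9_000_000},
]
```

Because all representations in one adaptation set carry the same content, the receiver can re-run this selection at segment boundaries and switch streams without interruption.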
- FIG. 3 shows a configuration example of the transmission / reception system 10 as an embodiment.
- the transmission / reception system 10 includes a service transmission system 100 and a service receiver 200.
- the service transmission system 100 corresponds to the DASH stream file server 31 and the DASH MPD server 32 of the stream distribution system 30A shown in FIG.
- the service transmission system 100 corresponds to the DASH stream file server 31, the DASH MPD server 32, and the broadcast transmission system 36 of the stream distribution system 30B shown in FIG.
- the service receiver 200 corresponds to the service receivers 33 (33-1, 33-2, ..., 33-N) of the stream distribution system 30A shown in FIG. 1(a).
- the service receiver 200 likewise corresponds to the service receivers 35 (35-1, 35-2, ..., 35-M) of the stream distribution system 30B shown in FIG. 1(b).
- the service transmission system 100 transmits DASH/MP4, that is, an MPD file as a metafile and MP4 as a container including media streams (media segments) such as video and audio, through the communication network transmission path (see FIG. 1(a)) or the RF transmission path (see FIG. 1(b)).
- FIG. 4 shows an example of an MP4 stream transmitted through a communication network transmission line or an RF transmission line.
- the entire service stream is fragmented and transmitted so that images and sound can be output from partway through the transmission, as in general broadcasting.
- the MP4 stream starts with an initialization segment (IS: initialization segment).
- the initialization segment has a box structure based on ISOBMFF (ISO Base Media File Format).
- a “ftyp” box indicating a file type (File type) is arranged at the top, followed by a “moov” box for control. Although the detailed description is omitted, the “moov” box includes various boxes including the illustrated “mvex” box.
- a “leva” box is arranged in the “mvex” box. In this “leva” box, the assignment of levels defined by “temporal_layerID” is defined; pictures are grouped by level, or individual tracks are assigned to levels.
- control information is entered in the “moof” box.
- the “mdat” box contains the actual signal (transmission media) such as video and audio.
- a movie fragment (Movie Fragment) is configured by the “moof” box and the “mdat” box. Since the “mdat” box of one movie fragment contains a fragment obtained by fragmenting the transmission medium, the control information entering the “moof” box becomes control information related to that fragment.
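As an illustrative aid (not part of the described system), the ISOBMFF box sequence of a movie fragment can be walked with a minimal parser; the toy fragment built here assumes 32-bit box sizes only, with no "largesize" or "uuid" handling:

```python
# Toy walk of an ISOBMFF movie fragment ("moof" + "mdat").
import struct

def box(btype, payload=b""):
    """Serialize one box: 32-bit size, 4-char type, payload."""
    return struct.pack(">I4s", 8 + len(payload), btype) + payload

def parse_boxes(data):
    """Yield (type, payload) pairs from a flat sequence of boxes."""
    off = 0
    while off + 8 <= len(data):
        size, btype = struct.unpack_from(">I4s", data, off)
        yield btype.decode("ascii"), data[off + 8:off + size]
        off += size

# a movie fragment: control information in "moof", media data in "mdat"
frag = box(b"moof", box(b"mfhd", b"\x00" * 8)) + box(b"mdat", b"\x01\x02")
```

Parsing `frag` yields the "moof" box (whose payload is itself a box sequence) followed by the "mdat" box, mirroring the fragment structure described above.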
- the media stream consists of a predetermined number of video streams obtained by processing ultra-high-resolution (UHD: Ultra High Definition) image data (moving image data) at a high frame rate (HFR: High Frame Rate).
- the image data having a high frame rate and ultra-high resolution is, for example, 120P 4K/8K image data.
- the predetermined number of video streams have encoded image data of the first to fourth image data.
- the first image data is base layer image data for obtaining a high-resolution image at a basic frame rate (normal frame rate).
- the second image data is base layer image data used together with the first image data to obtain a high resolution image at a high frame rate.
- the third image data is scalable layer image data used together with the first image data to obtain an ultra-high resolution image at the basic frame rate.
- the fourth image data is scalable layer image data used together with the first to third image data to obtain an ultra-high resolution image at a high frame rate.
- the first to fourth image data are obtained as follows. That is, the first image data is obtained by applying downscale processing to fifth image data, which is obtained by extracting the first picture of each unit of two temporally consecutive pictures in the high-frame-rate, ultra-high-resolution image data by downsampling. Each first picture extracted here may be mixed with the second picture at a predetermined ratio.
- the second image data is obtained by applying downscale processing to sixth image data, which is obtained by extracting the second picture of each unit of two temporally consecutive pictures in the high-frame-rate, ultra-high-resolution image data by downsampling. Each second picture extracted here may be mixed with the first picture at a predetermined ratio.
- the third image data is obtained by taking the difference between seventh image data, obtained by applying upscale processing to the first image data, and the fifth image data.
- the fourth image data is obtained by taking the difference between eighth image data, obtained by applying upscale processing to the second image data, and the sixth image data.
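The derivation just described can be modeled schematically as follows, assuming simple 2:1 pixel subsampling for the downscale processing and pixel repetition for the upscale processing (both are illustrative choices, and the optional picture mixing is omitted):

```python
# Schematic model of the first-to-fourth image data derivation.
# Pictures are plain 2-D lists of pixel values.

def downscale(pic):
    """UHD -> HD: keep every other row and column."""
    return [row[::2] for row in pic[::2]]

def upscale(pic):
    """HD -> UHD: pixel repetition."""
    return [[v for v in row for _ in (0, 1)] for row in pic for _ in (0, 1)]

def diff(a, b):
    """Per-pixel difference a - b (the scalable-layer residue)."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def derive_layers(hfr_uhd):
    """Split high-frame-rate UHD pictures into the first to fourth image data."""
    fifth = hfr_uhd[0::2]                      # first picture of each pair
    sixth = hfr_uhd[1::2]                      # second picture of each pair
    first = [downscale(p) for p in fifth]      # base layer, basic 60P
    second = [downscale(p) for p in sixth]     # base layer, extension 60P
    third = [diff(p, upscale(b)) for p, b in zip(fifth, first)]    # scalable, basic 60P
    fourth = [diff(p, upscale(b)) for p, b in zip(sixth, second)]  # scalable, extension 60P
    return first, second, third, fourth
```

Adding the upscaled base-layer picture back to the residue recovers the original ultra-high-resolution picture, which is the receiver-side reconstruction path.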
- information corresponding to the information related to the image data of the video stream, which is inserted into each of the predetermined number of video streams, is inserted into the MP4 serving as the container.
- the information related to the image data included in the video stream is information such as “general_level_idc”, “general_profile_idc”, “sublayer_level_idc”, and “sublayer_profile_idc” included in the SPS (Sequence Parameter Set), and information corresponding to these is arranged in the “moof” block.
- for example, the following cases, which differ in the number of video streams (video files) and the number of tracks managing each video stream, can be considered.
- “Case 1” The MP4 includes a first video stream having the encoded image data of the first and second image data, which are image data of the base layer, and a second video stream having the encoded image data of the third and fourth image data, which are image data of the scalable layer, and each of the first and second video streams is managed by one track.
- in this case, the pictures of the first image data and the pictures of the second image data are encoded alternately in the first video stream, and the pictures of the third image data and the pictures of the fourth image data are encoded alternately in the second video stream.
- in a “moof” block corresponding to each track, information corresponding to the information related to the encoded image data of the two image data included in the video stream is arranged. That is, the information is arranged in a state in which each of the first and second video streams is managed by one track.
- in this case, for the first video stream, the information related to the encoded image data of the first image data and the information related to the encoded image data of the second image data are grouped and inserted, and for the second video stream, the information related to the encoded image data of the third image data and the information related to the encoded image data of the fourth image data are grouped and inserted.
- “Case 2” The MP4 includes a first video stream having the encoded image data of the first and second image data, which are image data of the base layer, and a second video stream having the encoded image data of the third and fourth image data, which are image data of the scalable layer, and each of the first and second video streams is managed by two tracks.
- in this case as well, the pictures of the first image data and the pictures of the second image data are encoded alternately in the first video stream, and the pictures of the third image data and the pictures of the fourth image data are encoded alternately in the second video stream.
- in this case, there is a “moof” block for each track, in which information related to one of the two encoded image data included in the video stream is arranged. That is, the information is arranged in a state in which each of the first and second video streams is managed by two tracks.
- “Case 3” The MP4 includes a first video stream having the encoded image data of the first image data, which is image data of the base layer, a second video stream having the encoded image data of the second image data, which is image data of the base layer, a third video stream having the encoded image data of the third image data, which is image data of the scalable layer, and a fourth video stream having the encoded image data of the fourth image data, which is image data of the scalable layer, and the first to fourth video streams are managed by separate tracks.
- in a “moof” block corresponding to each track, information corresponding to the information related to the encoded image data of the one image data included in the video stream is arranged. That is, the information is arranged in a state in which each of the first to fourth video streams is managed by one track.
- the high-frame-rate, ultra-high-resolution image data on which the first to fourth image data are based is, for example, transmission image data having high-dynamic-range photoelectric conversion characteristics, obtained by subjecting high-dynamic-range image data to photoelectric conversion based on the high-dynamic-range photoelectric conversion characteristics.
- Conversion characteristic information indicating a high dynamic range photoelectric conversion characteristic or an electro-optical conversion characteristic corresponding to this characteristic is inserted into a video stream having encoded image data of the first image data.
- High dynamic range photoelectric conversion characteristics include hybrid log gamma characteristics and PQ curve characteristics.
- when the high-dynamic-range photoelectric conversion characteristic is the characteristic of a PQ curve, conversion information for converting values of conversion data based on the characteristic of the PQ curve into values of conversion data based on a normal-dynamic-range photoelectric conversion characteristic is inserted into the video stream having the encoded image data of the first image data.
- the service receiver 200 receives the MP4 as the above-mentioned container sent from the service transmission system 100 through a communication network transmission line (see FIG. 1A) or an RF transmission line (see FIG. 1B).
- the MP4 includes a predetermined number of video streams having encoded image data of the first to fourth image data. Further, as described above, information corresponding to the information related to the image data of the video stream inserted into each of the predetermined number of video streams is inserted into the MP4.
- the service receiver 200 selectively extracts predetermined encoded image data from the encoded image data of the first to fourth image data according to the decoding capability, based on the information inserted in the MP4, and subjects it to decoding processing to obtain image data.
- for example, in the case of a receiver having a decoding capability capable of processing high-resolution images at the basic frame rate, the encoded image data of the first image data is selectively subjected to decoding processing, and image data for displaying a high-resolution image at the basic frame rate is obtained.
- in the case of a receiver having a decoding capability capable of processing high-resolution images at the high frame rate, the encoded image data of the first and second image data is selectively subjected to decoding processing, and image data for displaying a high-resolution image at the high frame rate is obtained.
- in the case of a receiver having a decoding capability capable of processing ultra-high-resolution images at the basic frame rate, the encoded image data of the first and third image data is selectively subjected to decoding processing, and image data for displaying an ultra-high-resolution image at the basic frame rate is obtained.
- in the case of a receiver having a decoding capability capable of processing ultra-high-resolution images at the high frame rate, the encoded image data of all of the first to fourth image data is subjected to decoding processing, and image data for displaying an ultra-high-resolution image at the high frame rate is obtained.
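The four reception patterns above can be summarized in a small sketch (the capability flags and function name are illustrative, not from this description):

```python
def streams_to_decode(uhd_capable, hfr_capable):
    """Which of the first to fourth encoded image data are sent to the
    decoder for each receiver decoding capability."""
    if uhd_capable and hfr_capable:
        return ["first", "second", "third", "fourth"]  # UHD at high frame rate
    if uhd_capable:
        return ["first", "third"]                      # UHD at basic frame rate
    if hfr_capable:
        return ["first", "second"]                     # HD at high frame rate
    return ["first"]                                   # HD at basic frame rate
```

The first image data is decoded in every case, since all other image data are scalable extensions of it.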
- when the service receiver 200 performs high-dynamic-range display, it subjects the image data obtained by the decoding processing to high-dynamic-range electro-optical conversion based on the conversion characteristic information inserted in the video stream having the first image data or in the MP4, and obtains display image data with a high dynamic range.
- when the service receiver 200 performs normal-dynamic-range display and the high-dynamic-range photoelectric conversion characteristic indicated by the conversion characteristic information is the hybrid log gamma curve characteristic, it subjects the image data obtained by the decoding processing, as it is, to electro-optical conversion based on the normal-dynamic-range electro-optical conversion characteristic, and obtains display image data with a normal dynamic range.
- when the service receiver 200 performs normal-dynamic-range display and the high-dynamic-range photoelectric conversion characteristic indicated by the conversion characteristic information is the characteristic of the PQ curve, it performs dynamic range conversion on the image data obtained by the decoding processing based on the conversion information inserted in the video stream having the first image data to obtain normal-dynamic-range transmission image data, subjects this transmission image data to electro-optical conversion based on the normal-dynamic-range electro-optical conversion characteristic, and obtains display image data with a normal dynamic range.
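The dynamic-range branches just described can be summarized as a decision sketch (the step names are illustrative, not from this description):

```python
def display_processing_steps(display_is_hdr, hdr_characteristic):
    """Return the processing chain the receiver applies to decoded image
    data. hdr_characteristic is 'HLG' (hybrid log gamma) or 'PQ'."""
    if display_is_hdr:
        # HDR display: apply the high-dynamic-range electro-optical
        # conversion based on the conversion characteristic information
        return ["hdr_electro_optical_conversion"]
    if hdr_characteristic == "HLG":
        # hybrid log gamma data can be shown on a normal-dynamic-range
        # display as it is
        return ["sdr_electro_optical_conversion"]
    # PQ: first convert to normal-dynamic-range transmission data using
    # the conversion information inserted in the first video stream
    return ["dynamic_range_conversion", "sdr_electro_optical_conversion"]
```

Only the PQ case needs the extra dynamic range conversion step, which is why the conversion information is inserted into the stream only for the PQ curve characteristic.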
- FIG. 5 shows an outline of encoding / decoding processing in the service transmission system 100 and the service receiver 200.
- the video encoder 104 of the service transmission system 100 receives image data “HFR / UHD video” having a high frame rate (HFR) and ultra-high resolution (UHD).
- the video encoder 104 processes the image data “HFR/UHD video” to obtain two video streams (cases 1 and 2) or four video streams (case 3) having the encoded image data of the first to fourth image data, and transmits them.
- the video decoder 204A performs a decoding process on the encoded image data of all the first to fourth image data.
- image data “HFR / UHD video” for displaying an ultra-high resolution image at a high frame rate is obtained.
- the video decoder 204B selectively decodes the encoded image data of the first and third image data. As a result, image data “LFR / UHD video” for displaying an ultra-high resolution image at the basic frame rate is obtained.
- the video decoder 204C selectively decodes the encoded image data of the first and second image data.
- image data “HFR / HD video” for displaying a high-resolution image at a high frame rate is obtained.
- the video decoder 204D selectively decodes the encoded image data of the first image data.
- the image data “LFR / HD video” for displaying a high-resolution image at the basic frame rate is obtained.
- FIG. 6 hierarchically shows the first to fourth image data described above.
- a case where the high frame rate is 120P is shown.
- the horizontal axis shows the display order (POC: picture order of composition), with earlier display times on the left and later display times on the right.
- the first image data is “HD 60P”, which is the image data of the base layer, and its group ID (group_id) is set to “0”.
- the first image data is image data constituting the basic 60P, and the temporal layer ID (TemporalLayerId) is set to “0”.
- This second image data is image data constituting an extension 60P for making 120P image data, and the temporal layer ID (TemporalLayerId) is set to “1”.
- the second image data has scalability in the time direction with respect to the first image data “HD 60P”.
- the first and second image data are transmitted as the same video stream (video file).
- when only the basic 60P is decoded, this group ID can be used as a guideline for determining which packets to send to the video decoder.
- the basic 60P and extended 60P packets may be alternately sent to the video decoder.
- the third image data is image data constituting the basic 60P, and the temporal layer ID (TemporalLayerId) is set to “0”.
- the third image data is spatially scalable with respect to the first image data “HD 60P”.
- the fourth image data is image data constituting an extension 60P for making 120P image data, and a temporal layer ID (TemporalLayerId) is set to “1”.
- the fourth image data is temporally scalable with respect to the third image data “Sc-UHD 60P” and spatially scalable with respect to the second image data “HD +60P HFR”.
- the third and fourth image data are transmitted as the same video stream (video file).
- by grouping with the group ID, when only the basic 60P is decoded, it can be used as a guideline for determining which group ID's packets should be sent to the decoder.
- the basic 60P and extended 60P packets may be alternately sent to the video decoder.
- based on the first image data “HD 60P”, it is possible to reproduce a high-resolution (HD) image at the basic frame rate (60P HD image). Further, based on the first image data “HD 60P” and the second image data “HD +60P HFR”, it is possible to reproduce a high-resolution (HD) image at the high frame rate (120P HD image).
- based on the first image data “HD 60P” and the third image data “Sc-UHD 60P”, it is possible to reproduce an ultra-high-resolution (UHD) image at the basic frame rate (60P UHD image).
- based on the first image data “HD 60P”, the second image data “HD +60P HFR”, the third image data “Sc-UHD 60P”, and the fourth image data “Sc-UHD +60P HFR”, it is possible to reproduce an ultra-high-resolution (UHD) image at the high frame rate (120P UHD image).
- the numbers attached to the rectangular frames indicating the pictures indicate the encoding order and thus the decoding order.
- when the decoding process is performed only on the encoded image data of the first image data, decoding is performed in the order of 0 → 4 → 8 → ....
- when the decoding process is performed on the encoded image data of the first and second image data, decoding is performed in the order of 0 → 2 → 4 → 6 → ....
- when the decoding process is performed on the encoded image data of the first and third image data, decoding is performed in the order of 0 → 1 → 4 → 5 → ....
- when the decoding process is performed on the encoded image data of the first to fourth image data, decoding is performed in the order of 0 → 1 → 2 → 3 → 4 → 5 → ....
- in any of these cases, the images are encoded in the order of 0 → 1 → 2 → 3 → 4 → 5 → .... By doing so, the delay from reception to display can be minimized.
- as a result, in the first video stream, the pictures of the first image data and the pictures of the second image data are encoded alternately.
- the pictures of the third image data and the pictures of the fourth image data are encoded alternately.
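The decoding orders quoted above follow from one fixed per-pair coding order. A small sketch that reproduces them, assuming a mapping of image data to order numbers inferred from FIG. 6 (first, third, second, fourth within each pair of 120P frame times):

```python
# Inferred coding order within each pair of 120P frame times: the HD basic
# picture (first), the UHD basic picture (third), the HD extended picture
# (second), then the UHD extended picture (fourth).
ORDER_IN_PAIR = {"first": 0, "third": 1, "second": 2, "fourth": 3}

def decode_order(selected, n_pairs=2):
    """Decoding-order numbers for the selected image data."""
    return [4 * pair + ORDER_IN_PAIR[name]
            for pair in range(n_pairs)
            for name in ("first", "third", "second", "fourth")
            if name in selected]
```

This single numbering yields each of the selective decoding sequences described above, so no renumbering is needed when a receiver skips layers.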
- FIG. 7 shows a configuration example of an MP4 stream (file) in Case 1 (Case 1).
- the illustration of the initialization segment (IS) and the boxes “styp”, “sidx”, and “ssix”, which are surrounded by broken-line frames in FIG. 4, is omitted.
- the example shown is an example of fragmented MP4 (Fragmented MP4).
- a predetermined number of movie fragments (Movie Fragment) configured by a “moof” box containing control information and an “mdat” box containing media data itself are arranged. Since the “mdat” box contains a fragment obtained by fragmenting the track data, the control information entering the “moof” box is control information related to the fragment.
- in the “mdat” box, the encoded image data (access units) of the first and second image data are arranged for a predetermined number of pictures, for example, 1 GOP.
- the first image data access unit (AU: Access Unit) and the second image data access unit are alternately arranged.
- the location of each access unit is indicated by information in the “SIDX” box and “SSIX” box.
- Each access unit includes NAL units such as “VPS”, “SPS”, “PPS”, “SEI”, and “SLC”. Note that “VPS” and “SPS” are inserted into the head access unit of the GOP, for example.
- FIG. 8 shows an example of SPS (VPS) elements.
- the SPS (or VPS) elements of the first to fourth image data are configured, for example, as shown in FIG. 8.
- the value of “general_level_idc” is “156”, indicating that the overall level of the encoded image data of the first to fourth image data (the complexity corresponding to the pixel rate of the scalable coding) is “level 5.2”.
- the value of “general_profile_idc” is “7”, indicating that the overall profile (scalable coding type) of the encoded image data of the first to fourth image data is “Scalable Main 10 Profile”.
- “sublayer_level_present_flag [j-1]” is “1”, the value of “sublayer_level_idc [j-1]” is “153”, and “sublayer_profile_idc [j-1]” is “7”. This indicates that the overall level of the encoded image data of the third and first image data is “level 5.1”, and that the profile is “Scalable Main 10 Profile”.
- “sublayer_level_present_flag [j-2]” is “1”, the value of “sublayer_level_idc [j-2]” is “150”, and “sublayer_profile_idc [j-2]” is “2”. This indicates that the overall level of the encoded image data of the second and first image data is “level 5”, and that the profile is “Main 10 Profile”.
- “sublayer_level_present_flag [j-3]” is “1”, the value of “sublayer_level_idc [j-3]” is “123”, and “sublayer_profile_idc [j-3]” is “2”. This indicates that the level of the encoded image data of the first image data is “level 4.1”, and that the profile is “Main 10 Profile”.
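Since the HEVC level designation value is 30 times the level number, the values above can be decoded and used for receiver-side selection as sketched here (the selection helper is an illustrative assumption, not part of this description):

```python
def idc_to_level(idc):
    """HEVC general/sublayer level_idc is 30 x the level number."""
    return idc / 30

# (level_idc, image data decoded together), highest capability first,
# mirroring the SPS (VPS) values of FIG. 8
SUBLAYER_TABLE = [
    (156, ("first", "second", "third", "fourth")),  # level 5.2, 120P UHD
    (153, ("first", "third")),                      # level 5.1,  60P UHD
    (150, ("first", "second")),                     # level 5,   120P HD
    (123, ("first",)),                              # level 4.1,  60P HD
]

def select_by_level(decoder_level_idc):
    """Largest combination whose overall level fits the decoder."""
    for idc, combo in SUBLAYER_TABLE:
        if idc <= decoder_level_idc:
            return combo
    return None
```

A receiver supporting level 5.1, for example, selects the first and third image data and obtains the 60P UHD service.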
- the first video stream is managed by one track.
- control information for managing the encoded image data of the first image data in the “mdat” block and control information for managing the encoded image data of the second image data in the “mdat” block are stored in the “moof (moof 0)” box.
- in the “moof (moof 0)” box, there exists a “tscl” box corresponding to the encoded image data of the first image data in the “mdat” block, and there also exists a “tscl” box corresponding to the encoded image data of the second image data in the “mdat” block.
- the “mdat” box contains the encoded image data (access units) of the third and fourth image data, arranged for a predetermined number of pictures, for example, 1 GOP. In this case, the access units (AU: Access Unit) of the third image data and the access units of the fourth image data are alternately arranged.
- the location of each access unit is indicated by information in the “SIDX” box and “SSIX” box.
- Each access unit is composed of NAL units such as “PPS”, “SEI”, and “SLC”.
- extractor NAL units are arranged immediately before all access units.
- the numerical values shown in the rectangular frames indicating the access units indicate the decoding order. The same applies to the similar figures below. For example, when decoding the access unit of “1”, it is necessary to refer to the access unit of “0”; in this case, the decoding result of the access unit of “0” is copied into the extractor arranged immediately before the access unit of “1” and used.
- decoding time stamps are added so that the decoding order of the 120P base layer is 0 → 2 → 4 → 6 → ..., and the decoding order of the basic 60P is 0 → 4 → .... That is, the basic 60P and the extended 60P are set so that their time stamp values alternate in both the display order and the decoding order.
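The alternating time stamp arrangement can be sketched as follows (assuming the 120P frame interval as the sample duration and no decode/display reordering; the field names are illustrative):

```python
def base_layer_samples(n_samples, dur):
    """Time stamps for the case 1 base-layer track: decode-order numbers
    follow the 0, 2, 4, 6, ... pattern, and samples alternate between the
    basic 60P and extended 60P groups."""
    return [{"decode_order": 2 * i,
             "group": "basic60P" if i % 2 == 0 else "extended60P",
             "dts": i * dur}
            for i in range(n_samples)]
```

Filtering the list to the basic-60P group alone yields the 0 → 4 → ... decoding order, so a basic-frame-rate receiver can drop the extended samples without recomputing any time stamps.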
- the second video stream is managed by one track.
- control information for managing the encoded image data of the third image data in the “mdat” block and control information for managing the encoded image data of the fourth image data in the “mdat” block are stored in the “moof (moof 1)” box.
- in the “moof (moof 1)” box, there exists a “tscl” box corresponding to the encoded image data of the third image data in the “mdat” block, and there also exists a “tscl” box corresponding to the encoded image data of the fourth image data in the “mdat” block.
- the transmission order of the samples (pictures) is 0 → 1 → 2 → 3 → 4 → 5 → .... By doing so, the delay from reception to display can be minimized.
- FIG. 9 schematically shows an example of control information in the “moof (moof 0)” box.
- how the scalable layers are mapped in the MP4 stream is indicated by the “leva (level assignment)” box of the initialization segment (IS) corresponding to this “moof (moof 0)” box.
- a loop is made by the number of levels (level), and “Track_id”, “assignment_type”, and “grouping_type” are designated for each.
- “level_count = 2” is described, which indicates that one track “TR0” has two levels, “level0” and “level1”.
- a “traf” box exists in the “moof (moof 0)” box, and a “trun” box exists in that box.
- in the “trun” box, the parameters “sample_count” and “sample_composition_time_offset” are described. With these parameters, time stamp values indicating the decoding order and display order of the basic 60P and the extended 60P are set.
- a “tfdt” box exists in the “moof (moof 0)” box, and two “sgpd” boxes exist in the box.
- in the first “sgpd” box, information relating to the first image data is arranged. In this “sgpd” box, the parameter “grouping_type” is described; here, “grouping_type = 1” is set, indicating that the grouping type is a temporal layer group.
- a “scif” box exists under the “sgpd” box, and a parameter “group_id” is described in the “scif” box.
- “primary_groupID” is described together with “group_id”. The same applies to each description of “group_id” below. A group in which the value of “group_id” matches the value of “primary_groupID” is thereby identified as a basic 60P group.
- here, “group_id = 0” is set, and since it matches the value of “primary_groupID”, this group is identified as the basic 60P group.
- a “tscl” box exists in the “sgpd” box. In this “tscl” box, the four parameters “temporalLayerId”, “tllevel_idc”, “Tlprofile”, and “tlConstantFrameRate” are described.
- “TemporalLayerId” is set to “0” to indicate that the first image data corresponds to a picture (sample) included in the basic 60P.
- “TlConstantFrameRate” is set to “1” to indicate that the frame rate is constant.
- “Tllevel_idc” indicates the level of the encoded image data of the first image data, and is matched with “sublayer_level_idc [j-3]” of the SPS (or VPS) element.
- “tllevel_idc” is set to “123”.
- “Tlprofile” indicates the profile of the encoded image data of the first image data, and is matched with “sublayer_profile_idc [j-3]” of the SPS (or VPS) element described above.
- “Tlprofile” is set to “2”.
- in the second “sgpd” box, information relating to the second image data is arranged. Under this “sgpd” box there is a “scif” box, and the parameter “group_id” is described in this “scif” box.
- primary_groupID is described together with “group_id”.
- the “sgpd” box includes a “tscl” box. In this “tscl” box, four parameters “temporalLayerId”, “tllevel_idc”, “Tlprofile”, and “tlConstantFrameRate” are described.
- TemporalLayerId is set to “1” to indicate that the second image data corresponds to a picture (sample) included in the extension 60P.
- TlConstantFrameRate is set to “1” to indicate that the frame rate is constant.
- Tllevel_idc indicates the overall level of the encoded image data of the second and first image data, and is matched with “sublayer_level_idc [j-2]” of the SPS (or VPS) element described above. Here, “tllevel_idc” is set to “150”.
- Tlprofile indicates the profile of the encoded image data of the second and first image data, and is matched with “sublayer_profile_idc [j-2]” of the SPS (or VPS) element.
- Tlprofile is set to “2”.
- FIG. 10 schematically shows an example of control information in the “moof (moof 1)” box.
- a “traf” box exists in the “moof (moof 1)” box, and a “trun” box exists in that box.
- in the “trun” box, the parameters “sample_count” and “sample_composition_time_offset” are described. With these parameters, time stamp values indicating the display order and decoding order of the basic 60P and the extended 60P are set.
- a “tfdt” box exists in the “moof (moof 1)” box, and two “sgpd” boxes exist in succession in the box.
- in the first “sgpd” box, information relating to the third image data is arranged. In this “sgpd” box, the parameter “grouping_type” is described; here, “grouping_type = 1” is set, indicating that the grouping type is a temporal layer group.
- under this “sgpd” box there is a “scif” box, and the parameter “group_id” is described in this “scif” box.
- primary_groupID is described together with “group_id”.
- the “sgpd” box includes a “tscl” box. In this “tscl” box, four parameters “temporalLayerId”, “tllevel_idc”, “Tlprofile”, and “tlConstantFrameRate” are described.
- “TemporalLayerId” is set to “0” to indicate that the third image data corresponds to a picture (sample) included in the basic 60P.
- “TlConstantFrameRate” is set to “1” to indicate that the frame rate is constant.
- “Tllevel_idc” indicates the overall level of the encoded image data of the third and first image data, and is matched with “sublayer_level_idc [j-1]” of the SPS (or VPS) element described above.
- tllevel_idc is set to “153”.
- “Tlprofile” indicates the overall profile of the encoded image data of the third and first image data, and is matched with “sublayer_profile_idc [j-1]” of the SPS (or VPS) element described above.
- Tlprofile is set to “7”.
- in the second “sgpd” box, information relating to the fourth image data is arranged. Under this “sgpd” box there is a “scif” box, and the parameter “group_id” is described in this “scif” box.
- primary_groupID is described together with “group_id”.
- the “sgpd” box includes a “tscl” box. In this “tscl” box, four parameters “temporalLayerId”, “tllevel_idc”, “Tlprofile”, and “tlConstantFrameRate” are described.
- TemporalLayerId is set to “1” to indicate that the fourth image data corresponds to a picture (sample) included in the extension 60P.
- TlConstantFrameRate is set to “1” to indicate that the frame rate is constant.
- “Tllevel_idc” indicates the overall level of the encoded image data of the fourth to first image data, and is matched with “general_level_idc” of the SPS (or VPS) element described above. Here, “tllevel_idc” is set to “156”.
- “Tlprofile” indicates the overall profile of the encoded image data of the fourth to first image data, and is matched with “general_profile_idc” of the SPS (or VPS) element described above.
- “Tlprofile” is set to “7”.
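The level/profile signaling just described is cumulative: each “tscl” box carries the level and profile a receiver needs to decode that group together with the groups it depends on. A minimal sketch assuming the example values quoted above; the table layout and helper name are illustrative only, not part of the described format:

```python
# Illustrative summary of the "tscl" signaling quoted in the text.
# Values are the example numbers from the description
# (levels 123/150/153/156, profiles 2/7).
TSCL_EXAMPLES = {
    #        (temporalLayerId, tllevel_idc, Tlprofile, tlConstantFrameRate)
    "first":  (0, 123, 2, 1),  # base layer, basic 60P
    "second": (1, 150, 2, 1),  # base layer, extended 60P
    "third":  (0, 153, 7, 1),  # scalable layer, basic 60P
    "fourth": (1, 156, 7, 1),  # scalable layer, extended 60P
}

def required_level(groups):
    """Level a receiver needs to decode the given groups: since the
    signaling is cumulative, it is the maximum tllevel_idc among them."""
    return max(TSCL_EXAMPLES[g][1] for g in groups)
```

In HEVC, “level_idc” is 30 times the level number, so the example values 123, 150, 153, and 156 correspond to levels 4.1, 5, 5.1, and 5.2 respectively.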
- FIG. 11 shows a configuration example of an MP4 stream (file) in Case 2 (Case 2).
- Here, the illustration of the initialization segment (IS) and the boxes “styp”, “sidx”, and “ssix”, which are surrounded by a broken-line frame in FIG. 4, is omitted.
- the example shown is an example of fragmented MP4 (Fragmented MP4).
- a predetermined number of movie fragments (Movie Fragment) configured by a “moof” box containing control information and an “mdat” box containing media data itself are arranged. Since the “mdat” box contains a fragment obtained by fragmenting the track data, the control information entering the “moof” box is control information related to the fragment.
- In the “mdat” box, the encoded image data (access units) of the first and second image data are arranged for a predetermined number of pictures, for example, 1 GOP. Here, the access units (AU: Access Unit) of the first image data and the access units of the second image data are alternately arranged.
- the location of each access unit is indicated by information in the “SIDX” box and “SSIX” box.
- Each access unit includes NAL units such as “VPS”, “SPS”, “PPS”, “SEI”, and “SLC”. Note that “VPS” and “SPS” are inserted into the head access unit of the GOP, for example.
- an extractor NAL unit is arranged immediately before the second image data access unit for reference from the second image data access unit to the first image data access unit of another track.
- For example, when the access unit “2” is decoded, it is necessary to refer to the access unit “0”. In this case, the decoding result of the access unit “0” pointed to by the extractor arranged immediately before the access unit “2” is copied and used.
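The extractor behavior described above can be modeled schematically. In real ISO/IEC 14496-15 streams an extractor NAL unit references a track and a byte range; this sketch simplifies that to sample indices, and all names are illustrative:

```python
# Schematic model of extractor resolution: an extractor placed
# immediately before an access unit is replaced by the decoding
# result (here: the raw payload) of the referenced access unit in
# another track.
def resolve_track(track_samples, extractors, ref_track_samples):
    """track_samples: list of AU payloads of this track.
    extractors: {sample_index: referenced_sample_index_in_ref_track}.
    Returns the sequence actually fed to the decoder."""
    out = []
    for i, au in enumerate(track_samples):
        if i in extractors:
            # copy the referenced AU of the other track first
            out.append(ref_track_samples[extractors[i]])
        out.append(au)
    return out
```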
- the first video stream is managed by two tracks. There are two “moof” boxes (moof 0, moof 1) corresponding to the “mdat” block. Control information for managing the encoded image data of the first image data in the “mdat” block exists in the “moof (moof 0)” box.
- A “tscl” box exists in the “moof (moof 0)” box. Although the details in the “moof (moof 1)” box will be described later, a “tscl” box also exists in the “moof (moof 1)” box.
- In the “mdat” box, the encoded image data access units of the third and fourth image data are arranged. Here, the access units (AU: Access Unit) of the third image data and the access units of the fourth image data are alternately arranged.
- the location of each access unit is indicated by information in the “SIDX” box and “SSIX” box.
- Each access unit is composed of NAL units such as “PPS”, “SEI”, and “SLC”.
- An extractor NAL unit is arranged immediately before each access unit for reference to the access unit of another track. For example, when decoding the access unit “1”, it is necessary to refer to the access unit “0”. In this case, the decoding result of the access unit “0” pointed to by the extractor arranged immediately before the access unit “1” is copied and used.
- The decoding time stamps are added so that the decoding order of 120P in the base layer is 0 → 2 → 4 → 6 → ..., and the decoding order of the basic 60P is 0 → 4 → 8 → .... That is, the basic 60P and the extended 60P are set so that the time stamp values alternate in both the display order and the decoding order.
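The alternating arrangement can be checked numerically. A sketch assuming the sample numbering of the figures (even samples in the base layer, of which every second one belongs to the basic 60P; odd samples in the scalable layer) and a 120 Hz tick; both helper names are illustrative:

```python
def classify(n_samples):
    """Classify sample numbers as in the figures: even samples form
    the base-layer 120P (basic 60P = 0, 4, 8, ...; extended 60P =
    2, 6, 10, ...), and odd samples belong to the scalable layer."""
    basic = [i for i in range(n_samples) if i % 4 == 0]
    extended = [i for i in range(n_samples) if i % 4 == 2]
    scalable = [i for i in range(n_samples) if i % 2 == 1]
    return basic, extended, scalable

def dts_120hz(sample, base_media_decode_time=0):
    """Decode timestamp in 1/120 s ticks: basic and extended 60P
    timestamps alternate, and a scalable-layer sample shares the
    timestamp of the base-layer sample it references."""
    return base_media_decode_time + sample // 2
```

With this numbering, the decode time of extended 60P sample “2” (tick 1) falls between those of basic 60P samples “0” and “4” (ticks 0 and 2), matching the interleave described above.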
- the second video stream is managed by two tracks.
- Control information for managing the encoded image data of the third image data in the “mdat” block exists in the “moof (moof 2)” box.
- A “tscl” box exists in the “moof (moof 2)” box.
- A “tscl” box exists in the “moof (moof 3)” box.
- The transmission order of each sample (picture) is 0 → 1 → 2 → 3 → 4 → 5 → .... In this way, the delay from reception to display can be minimized.
- FIG. 12 schematically shows an example of control information in the “moof (moof 0)” box and the “moof (moof 1)” box.
- In the “leva (level assignment)” box of the initialization segment (IS) corresponding to these “moof” boxes, the layers having scalability are mapped.
- a loop is made by the number of levels (level), and “Track_id”, “grouping_type”, and “assignment_type” are designated for each.
- A “traf” box exists in the “moof (moof 0)” box, and a “trun” box exists in that box. In this “trun” box, parameters “sample_count” and “sample_composition_time_offset” are described. With these parameters, a time stamp value indicating the display order and decoding order of the basic 60P is set.
- a “tfdt” box exists in the “moof (moof 0)” box, and an “sgpd” box exists in the box.
- Information relating to the first image data is arranged in the “sgpd” box.
- a parameter of “grouping_type” is described.
- Here, “grouping_type” is set to “1”, indicating that the grouping type is a temporal layer group.
- Under the “sgpd” box, there is a “scif” box, and a parameter “group_id” is described in this “scif” box. In addition, “primary_groupID” is described together with “group_id”. Here, “group_id” is set to “0”, and since it matches the value of “primary_groupID”, this group is identified as the basic 60P group.
- the “sgpd” box includes a “tscl” box. In this “tscl” box, four parameters “temporalLayerId”, “tllevel_idc”, “Tlprofile”, and “tlConstantFrameRate” are described.
- “TemporalLayerId” is set to “0” to indicate that the first image data corresponds to a picture (sample) included in the basic 60P.
- “TlConstantFrameRate” is set to “1” to indicate that the frame rate is constant.
- “Tllevel_idc” indicates the level of the encoded image data of the first image data, and is matched with “sublayer_level_idc [j-3]” of the SPS (or VPS) element.
- “tllevel_idc” is set to “123”.
- “Tlprofile” indicates the profile of the encoded image data of the first image data, and is matched with “sublayer_profile_idc [j-3]” of the SPS (or VPS) element described above.
- “Tlprofile” is set to “2”.
- a “traf” box exists in the “moof (moof 1)” box, and a “tfhd” box exists in the box.
- This “tfhd” box has a track ID “track_id”, which indicates that the track is “TR1”.
- a “traf” box exists in the “moof (moof 1)” box, and a “tfdt” box exists in the box.
- In this “tfdt” box, there is a description of the decode time “baseMediaDecodeTime” of the first access unit after the “moof (moof 1)” box.
- the decode time “baseMediaDecodeTime” is the same value as the decode time “baseMediaDecodeTime” of the track TR0 pointed to by the extractor.
- A “traf” box exists in the “moof (moof 1)” box, and a “trun” box exists in that box. In this “trun” box, parameters “sample_count” and “sample_composition_time_offset” are described. With these parameters, a time stamp value indicating the display order and decoding order of the extended 60P is set.
- a “tfdt” box exists in the “moof (moof 1)” box, and an “sgpd” box exists in the box.
- In this “sgpd” box, information related to the second image data is arranged.
- a parameter of “grouping_type” is described.
- Here, “grouping_type” is set to “1”, indicating that the grouping type is a temporal layer group.
- Under the “sgpd” box, there is a “scif” box, and a parameter “group_id” is described in this “scif” box. In addition, “primary_groupID” is described together with “group_id”.
- the “sgpd” box includes a “tscl” box. In this “tscl” box, four parameters “temporalLayerId”, “tllevel_idc”, “Tlprofile”, and “tlConstantFrameRate” are described.
- TemporalLayerId is set to “1” to indicate that the second image data corresponds to a picture (sample) included in the extension 60P.
- TlConstantFrameRate is set to “1” to indicate that the frame rate is constant.
- Tllevel_idc indicates the overall level of the encoded image data of the second and first image data, and is matched with “sublayer_level_idc [j-2]” of the SPS (or VPS) element described above. Here, “tllevel_idc” is set to “150”.
- Tlprofile indicates the overall profile of the encoded image data of the second and first image data, and is matched with “sublayer_profile_idc [j-2]” of the SPS (or VPS) element described above.
- Tlprofile is set to “2”.
- FIG. 13 schematically shows an example of control information in the “moof (moof 2)” box and the “moof (moof 3)” box.
- A “traf” box exists in the “moof (moof 2)” box, and a “trun” box exists in that box. In this “trun” box, parameters “sample_count” and “sample_composition_time_offset” are described. With these parameters, a time stamp value indicating the display order and decoding order of the basic 60P is set.
- a “tfdt” box exists in the “moof (moof 2)” box, and an “sgpd” box exists in the box.
- In this “sgpd” box, information relating to the third image data is arranged.
- a parameter of “grouping_type” is described.
- Here, “grouping_type” is set to “1”, indicating that the grouping type is a temporal layer group.
- Under the “sgpd” box, there is a “scif” box, and a parameter “group_id” is described in this “scif” box. In addition, “primary_groupID” is described together with “group_id”. Here, since the value of “group_id” does not match the value of “primary_groupID”, this group is identified as not being the basic 60P group.
- the “sgpd” box includes a “tscl” box. In this “tscl” box, four parameters “temporalLayerId”, “tllevel_idc”, “Tlprofile”, and “tlConstantFrameRate” are described.
- “TemporalLayerId” is set to “0” to indicate that the third image data corresponds to a picture (sample) included in the basic 60P.
- “TlConstantFrameRate” is set to “1” to indicate that the frame rate is constant.
- “Tllevel_idc” indicates the overall level of the encoded image data of the third and first image data, and is matched with “sublayer_level_idc [j-1]” of the SPS (or VPS) element. Here, “tllevel_idc” is set to “153”.
- “Tlprofile” indicates the overall profile of the encoded image data of the third and first image data, and is matched with “sublayer_profile_idc [j-1]” of the SPS (or VPS) element described above. Here, “Tlprofile” is set to “7”.
- a “traf” box exists in the “moof (moof 3)” box, and a “tfhd” box exists in the box.
- In this “tfhd” box, there is a description of the track ID “track_id”, which indicates that the track is “TR3”.
- a “traf” box exists in the “moof (moof 3)” box, and a “tfdt” box exists in the box.
- In this “tfdt” box, there is a description of the decode time “baseMediaDecodeTime” of the first access unit after the “moof (moof 3)” box.
- This decode time “baseMediaDecodeTime” is set to the same value as the decode time “baseMediaDecodeTime” of the track TR2 pointed to by the extractor, and hence to the same value as the decode time “baseMediaDecodeTime” of the track TR0.
- A “traf” box exists in the “moof (moof 3)” box, and a “trun” box exists in that box. In this “trun” box, parameters “sample_count” and “sample_composition_time_offset” are described. With these parameters, a time stamp value indicating the display order and decoding order of the extended 60P is set.
- a “tfdt” box exists in the “moof (moof 3)” box, and an “sgpd” box exists in the box.
- In this “sgpd” box, information on the fourth image data is arranged.
- a parameter of “grouping_type” is described.
- Under the “sgpd” box, there is a “scif” box, and a parameter “group_id” is described in this “scif” box. In addition, “primary_groupID” is described together with “group_id”.
- the “sgpd” box includes a “tscl” box. In this “tscl” box, four parameters “temporalLayerId”, “tllevel_idc”, “Tlprofile”, and “tlConstantFrameRate” are described.
- “TemporalLayerId” is set to “1” to indicate that the fourth image data corresponds to a picture (sample) included in the extension 60P.
- “TlConstantFrameRate” is set to “1” to indicate that the frame rate is constant.
- “Tllevel_idc” indicates the overall level of the encoded image data of the fourth to first image data, and is matched with “general_level_idc” of the SPS (or VPS) element described above.
- “tllevel_idc” is set to “156”.
- “Tlprofile” indicates the overall profile of the encoded image data of the fourth to first image data, and is matched with “general_profile_idc” of the SPS (or VPS) element described above.
- “Tlprofile” is set to “7”.
- FIG. 14 shows a configuration example of an MP4 stream (file) in Case 3 (Case 3).
- Here, the illustration of the initialization segment (IS) and the boxes “styp”, “sidx”, and “ssix”, which are surrounded by a broken-line frame in FIG. 4, is omitted.
- the example shown is an example of fragmented MP4 (Fragmented MP4).
- a predetermined number of movie fragments (Movie Fragment) configured by a “moof” box containing control information and an “mdat” box containing media data itself are arranged. Since the “mdat” box contains a fragment obtained by fragmenting the track data, the control information entering the “moof” box is control information related to the fragment.
- the “mdat” box has encoded image data (access unit) of the first image data.
- the location of each access unit is indicated by information in the “SIDX” box and “SSIX” box.
- Each access unit includes NAL units such as “VPS”, “SPS”, “PPS”, “SEI”, and “SLC”. Note that “VPS” and “SPS” are inserted into the head access unit of the GOP, for example.
- The first video stream is managed by one track, and one “moof” box (moof 0) exists corresponding to the “mdat” block.
- Control information for managing the encoded image data of the first image data in the “mdat” block exists in the “moof (moof 0)” box.
- In the “mdat” box, the encoded image data (access units) of the second image data are arranged for a predetermined number of pictures, for example, 1 GOP.
- the location of each access unit is indicated by information in the “SIDX” box and “SSIX” box.
- Each access unit is composed of NAL units such as “PPS”, “SEI”, and “SLC”.
- Extractor NAL units are arranged immediately before all access units for reference from the second image data access unit to the first image data access unit of another track. For example, when the access unit “2” is decoded, it is necessary to refer to the access unit “0”. In this case, the decoding result of the access unit “0” pointed to by the extractor arranged immediately before the access unit “2” is copied and used.
- The second video stream is managed by one track, and one “moof” box (moof 1) exists corresponding to the “mdat” block.
- Control information for managing the encoded image data of the second image data in the “mdat” block exists in the “moof (moof 1)” box.
- The decoding time stamps are added so that the decoding order of 120P in the base layer is 0 → 2 → 4 → 6 → ..., and the decoding order of the basic 60P is 0 → 4 → 8 → .... That is, the basic 60P and the extended 60P are set so that the time stamp values alternate in both the display order and the decoding order.
- In the “mdat” box, the encoded image data (access units) of the third image data are arranged for a predetermined number of pictures, for example, 1 GOP.
- the location of each access unit is indicated by information in the “SIDX” box and “SSIX” box.
- Each access unit is composed of NAL units such as “PPS”, “SEI”, and “SLC”.
- Extractor NAL units are arranged immediately before all access units. For example, when decoding the access unit “1”, it is necessary to refer to the access unit “0”. In this case, the decoding result of the access unit “0” pointed to by the extractor arranged immediately before the access unit “1” is copied and used.
- The third video stream is managed by one track, and one “moof” box (moof 2) exists corresponding to the “mdat” block.
- Control information for managing the encoded image data of the third image data in the “mdat” block exists in the “moof (moof 2)” box.
- In the “mdat” box, the encoded image data (access units) of the fourth image data are arranged for a predetermined number of pictures, for example, 1 GOP. The location of each access unit is indicated by information in the “SIDX” box and “SSIX” box. Each access unit is composed of NAL units such as “PPS”, “SEI”, and “SLC”.
- Extractor NAL units are arranged immediately before each access unit. For example, when decoding the access unit “3”, it is necessary to refer to the access units “2” and “1”. In this case, two extractors are arranged immediately before the access unit “3”, and the decoding results of the access units “2” and “1” are copied and used.
- The fourth video stream is managed by one track, and one “moof” box (moof 3) exists corresponding to the “mdat” block.
- Control information for managing the encoded image data of the fourth image data in the “mdat” block exists in the “moof (moof 3)” box.
- The transmission order of each sample (picture) is 0 → 1 → 2 → 3 → 4 → 5 → .... In this way, the delay from reception to display can be minimized.
- sample_count and “sample_composition_time_offset” for setting the time stamp value indicating the display order and decoding order of the extended 60P will be further described.
- “BaseMediaDecodeTime” in the “tfdt” box represents the decoding time stamp of the first sample (picture) of the fragment.
- The decoding time of each subsequent sample is described by “sample_count” in the “trun” box.
- The display time stamp of each sample is represented by “sample_composition_time_offset”, which indicates an offset from “sample_count”.
- “sample_count” of “0” matches “baseMediaDecodeTime”, and “sample_count” of “2” and “4” are values incremented one by one in units of 120 Hz. This indicates that the decoding time of the sample of “2”, which is an extended 60P sample, is sandwiched between the decoding times of the samples of “0” and “4”, which are basic 60P samples.
- “Sample_count” of “1” is the same value as the previous extractor and indicates that there is no time offset.
- the extractor “3” is arranged when referring to “2”, and “sample_count” takes the same value as “2”.
- the reference destination of the sample “3” is “1”
- the value increased by 1 to “sample_count” of “1” is set to the value of “sample_count” of “3”.
- In this way, “sample_count” corresponding to the decoding time is added with an accuracy of 120 Hz.
- A receiver that performs basic 60P decoding of the base layer transfers only the samples belonging to the basic 60P group to the decoder one by one.
- “sample_count” of the extractor of “2” in the base layer is the same as “sample_count” of “0”.
- “sample_count” of “2” is a value increased by 1 from “sample_count” of the immediately preceding extractor.
- The value of “sample_count” of “4” is a value further increased by 1 from “sample_count” of “2”.
- The same applies thereafter. In this way, “sample_count” corresponding to the decoding time is added with an accuracy of 120 Hz.
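A base-layer-only receiver can therefore select the basic 60P group by a simple filter. A minimal sketch under the same sample numbering as above; the modular criterion stands in for the real selection, which a receiver would make via the “sgpd”/“scif” group identification:

```python
def basic_60p_only(transmission_order):
    """Keep only samples of the basic 60P group (0, 4, 8, ... in the
    numbering used above). Extended 60P samples and their extractors
    are skipped, so the surviving samples advance uniformly (every
    2 ticks at 120 Hz), giving a clean 60P decode timeline."""
    return [s for s in transmission_order if s % 4 == 0]
```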
- In the scalable layer, the extractor of “1” represents an inter-layer reference; its “sample_count” is the same value as that of “0”, and “sample_count” of “1” is the same value as that of the preceding extractor.
- When the extractor of “3” refers to another track in the scalable layer (scalable layer), its “sample_count” is the same value as that of “1”; when it refers to “2” of the base layer (base layer), its “sample_count” is the same value as that of “2”.
- the value of “sample_count” of “3” is the same value as “2”.
- “Sample_count” of “5” has the same value as that of “4”.
- That is, the decoding time of the sample of “3”, which is an extended 60P sample, is sandwiched between the decoding times of the samples of “1” and “5”, which are basic 60P samples.
- A receiver that performs 60P decoding of the scalable layer transfers the samples belonging to the basic 60P group and the samples in the layer to the decoder sample by sample.
- FIG. 15 shows a description example of the MPD file in the case of transmission with a two-stream configuration (case 1 and case 2).
- FIG. 16 shows the “Value” semantics of “SupplementaryDescriptor”.
- In this MPD file, there is an adaptation set (AdaptationSet) for the video stream.
- The video stream is supplied in an MP4 file structure, and it is shown that HEVC encoded image data of level 150 and level 156 exist.
- In this MPD file, there are a first representation (Representation) corresponding to the first video stream having the encoded image data of the first and second image data, and a second representation corresponding to the second video stream having the encoded image data of the third and fourth image data.
- The location of the second video stream is indicated as “video-bitstreamScalable.mp4” by the description “<BaseURL>video-bitstreamScalable.mp4</BaseURL>”.
- FIG. 17 shows a description example of the MPD file in the case of transmission with a 4-stream configuration (Case 3).
- In this MPD file, there are first, second, third, and fourth representations corresponding to the first, second, third, and fourth video streams respectively having the encoded image data of the first, second, third, and fourth image data.
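The representation-per-stream layout described above can be sketched programmatically. This is an illustrative skeleton only, assuming the two-stream case: the second BaseURL is the one quoted in the text, while the first BaseURL, the representation ids, and all other attributes are hypothetical placeholders:

```python
import xml.etree.ElementTree as ET

def build_adaptation_set():
    """Build a skeletal AdaptationSet with one Representation per
    video stream. Only "video-bitstreamScalable.mp4" comes from the
    description; every other name here is a placeholder."""
    aset = ET.Element("AdaptationSet", mimeType="video/mp4")
    reps = [
        ("rep-base", "video-bitstreamBase.mp4"),          # hypothetical name
        ("rep-scalable", "video-bitstreamScalable.mp4"),  # quoted in the text
    ]
    for rep_id, url in reps:
        rep = ET.SubElement(aset, "Representation", id=rep_id)
        ET.SubElement(rep, "BaseURL").text = url
    return ET.tostring(aset, encoding="unicode")
```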
- FIG. 18 shows a configuration example of the service transmission system 100.
- the service transmission system 100 includes a control unit 101, an HDR (High Dynamic Range) photoelectric conversion unit 102, an RGB / YCbCr conversion unit 103, a video encoder 104, a container encoder 105, and a transmission unit 106.
- the control unit 101 includes a CPU (Central Processing Unit) and controls the operation of each unit of the service transmission system 100 based on a control program.
- the HDR photoelectric conversion unit 102 applies the HDR photoelectric conversion characteristic to the image data (video data) Vh having a high frame rate and an ultra-high resolution (for example, 4K 120P) and a high dynamic range (HDR) to perform photoelectric conversion. Then, HDR transmission image data V1 is obtained.
- This HDR transmission video data V1 is a video material produced by HDR OETF.
- For example, the characteristics of STD-B67 (HLG: Hybrid Log-Gamma) or ST2084 (PQ: Perceptual Quantizer curve) are applied as the HDR photoelectric conversion characteristics.
- FIG. 19 shows an example of photoelectric conversion characteristics of SDR (normal dynamic range) and HDR (high dynamic range).
- the horizontal axis indicates the input luminance level
- the vertical axis indicates the transmission code value.
- The broken line a indicates the SDR photoelectric conversion characteristic (BT.709: gamma characteristic). When the input luminance level is the SDR characteristic maximum level SL, the transmission code value becomes the peak level MP. Here, SL is, for example, 100 cd/m².
- the solid line b indicates the characteristics of STD-B67 (HLG) as the HDR photoelectric conversion characteristics.
- A one-dot chain line c indicates the characteristic of ST2084 (PQ curve) as the HDR photoelectric conversion characteristic.
- The characteristics of STD-B67 (HLG) include a compatible area with the SDR photoelectric conversion characteristic (BT.709: gamma characteristic). That is, the curves of the two characteristics match from the input luminance level of zero to the compatibility limit value of both characteristics. When the input luminance level is the compatibility limit value, the transmission code value becomes the compatibility level SP.
- the characteristic of ST2084 (PQ curve) corresponds to a high luminance and is a curve of a quantization step that is said to be suitable for human visual characteristics.
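The two HDR OETFs named here are published formulas, so the curves of FIG. 19 can be reproduced. The sketch below implements the ARIB STD-B67 (HLG) OETF and the SMPTE ST 2084 (PQ) inverse EOTF on normalized input, using the constants from those specifications; the function names themselves are illustrative:

```python
import math

# ARIB STD-B67 (HLG) OETF constants
A, B, C = 0.17883277, 0.28466892, 0.55991073

def hlg_oetf(e):
    """STD-B67 OETF: e is normalized scene light in [0, 1].
    Square-root segment up to 1/12 (the SDR-compatible part),
    logarithmic segment above it."""
    return math.sqrt(3.0 * e) if e <= 1.0 / 12.0 else A * math.log(12.0 * e - B) + C

# SMPTE ST 2084 (PQ) constants
M1, M2 = 2610.0 / 16384.0, 2523.0 / 4096.0 * 128.0
C1, C2, C3 = 3424.0 / 4096.0, 2413.0 / 4096.0 * 32.0, 2392.0 / 4096.0 * 32.0

def pq_oetf(y):
    """ST 2084 inverse EOTF: y is display luminance normalized to
    10000 cd/m^2; returns the non-linear code value in [0, 1]."""
    p = y ** M1
    return ((C1 + C2 * p) / (1.0 + C3 * p)) ** M2
```

At e = 1/12 the HLG curve reaches code value 0.5, the point at which it leaves the SDR-compatible region, while the PQ curve maps the 10000 cd/m² maximum to code value 1.0.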
- The RGB/YCbCr converter 103 converts the HDR transmission video data V1 obtained by the HDR photoelectric converter 102 from the RGB domain to the YCbCr (luminance/color difference) domain. Note that the luminance/color difference domain is not limited to YCbCr.
- The video encoder 104 performs encoding such as MPEG4-AVC or HEVC on the HDR transmission video data V1 converted into the YCbCr domain to obtain encoded image data, and generates a predetermined number of video streams including the encoded image data.
- In this case, the first video stream having the encoded image data of the first and second image data and the second video stream having the encoded image data of the third and fourth image data are generated (see FIGS. 6, 7, and 11).
- Alternatively, a first video stream having the encoded image data of the first image data, a second video stream having the encoded image data of the second image data, a third video stream having the encoded image data of the third image data, and a fourth video stream having the encoded image data of the fourth image data are generated (see FIGS. 6 and 14).
- The video encoder 104 inserts conversion characteristic information (transferfunction) indicating the photoelectric conversion characteristic of the HDR transmission image data V1 or the electro-optical conversion characteristic corresponding to the characteristic into the VUI (video usability information) area of the SPS NAL unit of the access unit (AU).
- When the photoelectric conversion characteristic of the HDR transmission image data V1 is STD-B67 (HLG), conversion characteristic information indicating the BT.709 gamma characteristic is inserted in this VUI area.
- In this case, the conversion characteristic information indicating STD-B67 (HLG) is arranged in a newly defined transfer function SEI message (transfer_function SEI message), described later, which is inserted into the “SEIs” portion of the access unit (AU).
- In addition, the video encoder 104 inserts, into the “SEIs” portion of the access unit (AU), a newly defined dynamic range conversion SEI message (Dynamic_range_conv SEI message), described later, having conversion information for dynamic range conversion.
- This conversion information is conversion information for converting the values of conversion data based on the characteristics of ST2084 (PQ curve) into the values of conversion data based on the SDR photoelectric conversion characteristics.
- In FIG. 20, a solid line a indicates an example of an SDR-OETF curve indicating the SDR photoelectric conversion characteristics.
- a solid line b shows an example of the characteristics of ST2084 (PQ curve) as an HDR-OETF curve.
- the horizontal axis indicates the input luminance level
- P1 indicates the input luminance level corresponding to the SDR peak level
- P2 indicates the input luminance level corresponding to the HDR maximum level.
- the vertical axis represents the transmission code value or the relative value of the normalized coding level.
- the relative maximum level M indicates the HDR maximum level and the SDR maximum level.
- The reference level G indicates the transmission level of the HDR OETF at the input luminance level P1 corresponding to the SDR maximum level. This means a so-called reference white level, and it shows that a range higher than this level is used for HDR-specific glitter expression.
- A branch level B indicates a level at which the SDR OETF curve and the HDR OETF curve branch from the same trajectory.
- Pf indicates an input luminance level corresponding to the branch level.
- The branch level B can be an arbitrary value of 0 or more. If the branch level is not specified, it is approximated on the receiving side by a distribution operation method corresponding to the branch level or by a ratio from the whole.
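The role of the branch level B can be illustrated with a minimal conversion sketch: code values at or below B lie on the shared SDR-compatible trajectory and pass through, while values above B are compressed toward the SDR range with the transmitted conversion coefficient. This is an illustrative simplification of the “simple conversion” case; the parameter names follow the SEI fields described below, but the exact arithmetic is an assumption of this sketch:

```python
def convert_to_sdr(code_value, branch_level, level_conversion_ratio):
    """Illustrative 'simple conversion' using the SEI parameters:
    values up to branch_level (B) are on the SDR-compatible part of
    the curve and are passed through unchanged; values above B are
    scaled down by the conversion coefficient so that HDR-only
    levels fit within the SDR range."""
    if code_value <= branch_level:
        return code_value
    return branch_level + (code_value - branch_level) / level_conversion_ratio
```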
- FIG. 21 shows a top access unit of GOP (Group Of Pictures) when the encoding method is HEVC.
- A decoding SEI message group “Prefix_SEIs” is arranged before the slices in which the pixel data are encoded, and a display SEI message group “Suffix_SEIs” is arranged after the slices.
- the transfer function SEI message and the dynamic range conversion SEI message are arranged as an SEI message group “Suffix_SEIs”, for example, as illustrated.
- FIG. 22A shows a structure example (Syntax) of a transfer function SEI message.
- FIG. 22B shows the contents (Semantics) of main information in the structural example.
- the 8-bit field of “transferfunction” indicates the photoelectric conversion characteristic of the transmission video data V1 or the electro-optical conversion characteristic corresponding to the characteristic. If the value of this element is different from the value of “transferfunction” of VUI, it is replaced with the value of this element.
- the 16-bit field “peak_luminance” indicates the maximum luminance level. This maximum luminance level indicates the maximum luminance level of the content, for example, in a program or a scene. On the receiving side, this value can be used as a reference value when creating a display image suitable for the display capability.
- An 8-bit field of “color_space” indicates color space information.
- FIG. 23 shows a structural example (Syntax) of a dynamic range conversion SEI message.
- FIG. 24 shows the contents (Semantics) of main information in the structural example.
- the 1-bit flag information “Dynamic_range_conv_cancel_flag” indicates whether the message “Dynamic_range_conv” is to be refreshed. “0” indicates that the message “Dynamic_range_conv” is refreshed. “1” indicates that the message “Dynamic_range_conv” is not refreshed, that is, the previous message is maintained as it is.
- the 8-bit field of “coded_data_bit_depth” indicates the number of encoded pixel bits (the number of bits of the transmission code value).
- a 14-bit field of “reference_level” indicates a reference luminance level value, that is, a reference level G (see FIG. 20).
- the 1-bit flag information of “ratio_conversion_flag” indicates that simple conversion is performed, that is, a conversion coefficient exists.
- the 1-bit flag information of “conversion_table_flag” indicates that a conversion table is used, that is, conversion table information exists.
- a 16-bit field of “branch_level” indicates a branch level B (see FIG. 20).
- When “ratio_conversion_flag” is “1”, an 8-bit field of “level_conversion_ratio” exists. This field indicates a conversion coefficient (level conversion ratio). When “conversion_table_flag” is “1”, an 8-bit field of “table_size” exists. This field indicates the number of inputs of the conversion table. Then, there are as many 16-bit fields of “level_R [i]”, “level_G [i]”, and “level_B [i]” as the number of inputs. The field of “level_R [i]” indicates a value after conversion of the red component (Red component). The field of “level_G [i]” indicates a value after conversion of the green component (Green component). The field of “level_B [i]” indicates a value after conversion of the blue component (Blue component).
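Read in order, the fields listed above can be parsed with a small bit reader. The field order and bit widths follow the structure described here (FIG. 23); byte alignment, padding, and the omission of the conversion-table entries are assumptions of this sketch:

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object."""
    def __init__(self, data):
        self.data, self.pos = data, 0
    def read(self, nbits):
        v = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            v = (v << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return v

def parse_dynamic_range_conv(payload):
    """Parse the Dynamic_range_conv SEI fields in the order and bit
    widths given in the description (cancel flag 1, bit depth 8,
    reference level 14, two 1-bit flags, branch level 16, then the
    optional coefficient). Conversion-table parsing is omitted."""
    br = BitReader(payload)
    msg = {
        "cancel_flag": br.read(1),
        "coded_data_bit_depth": br.read(8),
        "reference_level": br.read(14),
        "ratio_conversion_flag": br.read(1),
        "conversion_table_flag": br.read(1),
        "branch_level": br.read(16),
    }
    if msg["ratio_conversion_flag"]:
        msg["level_conversion_ratio"] = br.read(8)
    return msg
```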
- the container encoder 105 generates a container including a predetermined number of video streams VS generated by the video encoder 104, here an MP4 stream, as a distribution stream STM.
- In this case, the MP4 stream including the first video stream having the encoded image data of the first and second image data, and the MP4 stream including the second video stream having the encoded image data of the third and fourth image data are generated (see FIGS. 6, 7, and 11).
- Alternatively, MP4 streams respectively including the first, second, third, and fourth video streams having the encoded image data of the first, second, third, and fourth image data are generated (see FIGS. 6 and 14).
- The transmission unit 106 places the MP4 distribution stream STM obtained by the container encoder 105 on a broadcast wave or a network packet and transmits it to the service receiver 200.
- Image data (video data) Vh having a high frame rate and an extremely high resolution (for example, 4K 120P) and a high dynamic range (HDR) is supplied to the HDR photoelectric conversion unit 102.
- the HDR video data Vh is subjected to photoelectric conversion with the HDR photoelectric conversion characteristic, and HDR transmission video data as a video material produced by HDR-OETF is obtained.
- the characteristics of STD-B67 (HLG) or the characteristics of ST2084 (PQ curve) are applied as the HDR photoelectric conversion characteristics.
- the HDR transmission video data V1 obtained by the HDR photoelectric conversion unit 102 is converted from the RGB domain to the YCbCr domain by the RGB / YCbCr conversion unit 103, and then supplied to the video encoder 104.
- In the video encoder 104, encoding such as MPEG4-AVC or HEVC is performed on the HDR transmission video data V1 converted into the YCbCr domain to obtain encoded image data, and a predetermined number of video streams including the encoded image data are generated.
- In the two-stream case, the first video stream having the encoded image data of the first and second image data and the second video stream having the encoded image data of the third and fourth image data are generated (see FIGS. 6, 7, and 11).
- In the four-stream case, a first video stream having the encoded image data of the first image data, a second video stream having the encoded image data of the second image data, a third video stream having the encoded image data of the third image data, and a fourth video stream having the encoded image data of the fourth image data are generated (see FIGS. 6 and 14).
- At this time, conversion characteristic information ("transferfunction") indicating the photoelectric conversion characteristic of the HDR transmission video data V1, or the electro-optic conversion characteristic corresponding to that characteristic, is inserted into the VUI area of the SPS NAL unit.
- When the photoelectric conversion characteristic of the HDR transmission video data V1 is STD-B67 (HLG), conversion characteristic information indicating the BT.709 gamma characteristic is placed in the VUI, and the conversion characteristic information indicating STD-B67 (HLG) is arranged in a transfer function SEI message (see FIG. 22) inserted in the "SEIs" portion of the access unit (AU).
- When the photoelectric conversion characteristic of the HDR transmission video data V1 is ST 2084 (PQ curve), a dynamic range conversion SEI message (see FIG. 23) having conversion information is inserted in the "SEIs" portion of the access unit (AU). This conversion information is for converting the values of the conversion data based on the ST 2084 (PQ curve) characteristic into the values of the conversion data based on the SDR photoelectric conversion characteristic.
- The predetermined number of video streams VS generated by the video encoder 104 are supplied to the container encoder 105. In the container encoder 105, a container including the predetermined number of video streams VS, here an MP4 stream, is generated as the distribution stream STM.
- An MP4 stream including the first video stream having the encoded image data of the first and second image data and an MP4 stream including the second video stream having the encoded image data of the third and fourth image data are generated (see FIGS. 6, 7, and 11).
- Alternatively, MP4 streams are generated that include the first video stream having the encoded image data of the first image data, the second video stream having the encoded image data of the second image data, the third video stream having the encoded image data of the third image data, and the fourth video stream having the encoded image data of the fourth image data (see FIGS. 6 and 14).
- The MP4 stream generated as the distribution stream STM by the container encoder 105 is supplied to the transmission unit 106. In the transmission unit 106, the MP4 distribution stream STM is sent to the service receiver 200 on a broadcast wave or in network packets.
- FIG. 25 shows a configuration example of the service receiver 200.
- The service receiver 200 includes a control unit 201, a reception unit 202, a container decoder 203, a video decoder 204, a YCbCr/RGB conversion unit 205, an HDR electro-optic conversion unit 206, and an SDR electro-optic conversion unit 207.
- the control unit 201 includes a CPU (Central Processing Unit) and controls the operation of each unit of the service receiver 200 based on a control program.
- The reception unit 202 receives the MP4 distribution stream STM transmitted from the service transmission system 100 on a broadcast wave or in network packets.
- The container decoder (demultiplexer) 203, under the control of the control unit 201, selectively extracts the encoded image data of the required image data from the MP4 distribution stream STM received by the reception unit 202, based on the "moof" block information and according to the decoding capability of the receiver 200, and sends it to the video decoder 204.
- For example, when the receiver 200 has a decoding capability capable of processing ultra-high-resolution image data at the high frame rate, the container decoder 203 extracts the encoded image data of all of the first to fourth image data and sends it to the video decoder 204. When the receiver 200 has a decoding capability capable of processing ultra-high-resolution image data at the basic frame rate, the container decoder 203 extracts the encoded image data of the first and third image data and sends it to the video decoder 204.
- When the receiver 200 has a decoding capability capable of processing high-resolution image data at the high frame rate, the container decoder 203 extracts the encoded image data of the first and second image data and sends it to the video decoder 204. When the receiver 200 has a decoding capability capable of processing high-resolution image data at the basic frame rate, the container decoder 203 extracts the encoded image data of the first image data and sends it to the video decoder 204.
- In this case, the container decoder 203 checks the level value ("tlevel_idc") inserted in the container, compares it with the decoding capability of the video decoder 204, and determines whether reception is possible. At this time, a value corresponding to the complexity of the entire received video stream ("general_level_idc") is detected from "tlevel_idc" in the "moof" block. When this value exceeds the decoding capability of the receiver, the container decoder 203 checks "tlevel_idc" in the "moof" block corresponding to the values of the other elements in the video stream ("sublayer_level_idc"), determines whether decoding is possible within the corresponding range, and transfers the encoded image data of the corresponding image data to the video decoder 204.
- On the other hand, when the value detected from "tlevel_idc" in the "moof" block, corresponding to the complexity of the entire received video stream ("general_level_idc"), matches the decoding capability of the receiver, the encoded image data of all the image data included in the received video stream is transferred to the video decoder 204 in the order of the decoding time stamps.
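The capability-driven selection described above can be sketched as follows; the function name, flags, and layer labels are illustrative assumptions, not identifiers from the patent:

```python
def select_image_data(ultra_high_resolution: bool, high_frame_rate: bool):
    """Return which image-data layers the container decoder extracts
    and hands to the video decoder, per the four cases in the text."""
    if ultra_high_resolution and high_frame_rate:
        return ["first", "second", "third", "fourth"]  # everything
    if ultra_high_resolution:
        return ["first", "third"]   # ultra-high resolution at the basic frame rate
    if high_frame_rate:
        return ["first", "second"]  # high resolution at the high frame rate
    return ["first"]                # high resolution at the basic frame rate
```

In the actual receiver this decision is driven by comparing the "tlevel_idc" level values signaled in the "moof" block against the video decoder's capability, rather than by boolean flags.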
- The video decoder 204 decodes the encoded image data selectively extracted by the container decoder 203 to obtain the HDR transmission video data V1′. Depending on which encoded image data was extracted according to the decoding capability, the HDR transmission video data V1′ is image data for displaying an ultra-high-resolution image at the high frame rate, an ultra-high-resolution image at the basic frame rate, a high-resolution image at the high frame rate, or a high-resolution image at the basic frame rate.
- The video decoder 204 also extracts the parameter sets and SEI messages inserted in the encoded image data selectively extracted by the container decoder 203 and sends them to the control unit 201. The extracted information includes the conversion characteristic information ("transferfunction") inserted in the VUI area of the SPS NAL unit of the access unit described above, indicating the photoelectric conversion characteristic of the transmission video data V1 or the electro-optic conversion characteristic corresponding to that characteristic, as well as the transfer function SEI message (see FIG. 22).
- The extracted information also includes the dynamic range conversion SEI message (see FIG. 23) when the HDR photoelectric conversion characteristic applied to the HDR transmission video data V1′ is the ST 2084 (PQ curve) characteristic. From this SEI message, the control unit 201 recognizes the dynamic range conversion information (conversion table, conversion coefficient).
- The YCbCr/RGB conversion unit 205 converts the HDR transmission video data V1′ obtained by the video decoder 204 from the YCbCr (luminance/chrominance) domain to the RGB domain.
- The HDR electro-optic conversion unit 206 applies the HDR electro-optic conversion characteristic to the HDR transmission video data V1′ converted into the RGB domain to obtain display video data Vhd for displaying an HDR image.
- In this case, the control unit 201 sets, in the HDR electro-optic conversion unit 206, the HDR electro-optic conversion characteristic recognized from the VUI or the transfer function SEI message, that is, the HDR electro-optic conversion characteristic corresponding to the HDR photoelectric conversion characteristic applied on the transmission side.
- The SDR electro-optic conversion unit 207 applies the SDR electro-optic conversion characteristic to the HDR transmission video data V1′ converted into the RGB domain to obtain display video data Vsd for displaying an SDR image.
- When the HDR photoelectric conversion characteristic applied to the HDR transmission video data V1′ is the STD-B67 (HLG) characteristic, the SDR electro-optic conversion unit 207 applies the SDR electro-optic conversion characteristic to the HDR transmission video data V1′ as it is, and display video data Vsd for displaying the SDR image is obtained.
- On the other hand, when the HDR photoelectric conversion characteristic applied to the HDR transmission video data V1′ is the ST 2084 (PQ curve) characteristic, the SDR electro-optic conversion unit 207 performs dynamic range conversion based on the dynamic range conversion information (conversion table, conversion coefficient) to obtain SDR transmission image data, and applies the SDR electro-optic conversion characteristic to this SDR transmission image data to obtain display video data Vsd for displaying the SDR image.
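The two SDR output branches described above (HLG data used as-is versus PQ data routed through dynamic range conversion first) can be sketched as follows; the function and its arguments are illustrative placeholders, not APIs from this document:

```python
def sdr_display_path(samples, hdr_oetf, dr_convert, sdr_eotf):
    """Sketch of the SDR electro-optic conversion branch.
    `dr_convert` stands in for the dynamic range conversion driven by
    the conversion table / coefficient from the SEI message."""
    if hdr_oetf == "HLG":
        # STD-B67 data is SDR backward compatible: apply the SDR EOTF as-is.
        transmission = samples
    elif hdr_oetf == "PQ":
        # ST 2084 data must first be dynamic-range converted to SDR.
        transmission = [dr_convert(v) for v in samples]
    else:
        raise ValueError("unknown HDR OETF: " + hdr_oetf)
    return [sdr_eotf(v) for v in transmission]
```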
- the vertical axis represents the output luminance level and corresponds to the horizontal axis of FIG.
- the horizontal axis indicates the transmission code value, and corresponds to the vertical axis in FIG.
- a solid line a is an SDR-EOTF curve indicating SDR electro-optic conversion characteristics.
- the SDR EOTF curve corresponds to the SDR OETF curve indicated by the solid line a in FIG.
- a solid line b is an HDR-EOTF curve showing the HDR electro-optic conversion characteristics.
- This HDR-EOTF curve corresponds to the characteristics of ST2084 (PQ curve) as the HDR-OETF curve indicated by the solid line b in FIG.
- P1′ indicates the output luminance level corresponding to a predetermined level H lower than the reference level G.
- In the dynamic range conversion, input data of the HDR transmission video data V1′ up to the predetermined level H lower than the reference level G is converted so as to coincide with the values of the conversion data based on the SDR photoelectric conversion characteristic. Input data below the branch level B is used as output data as it is.
- For input data from level H to level M, dynamic range level conversion is performed based on the tone mapping characteristic TM indicated by the dash-dot line: level H is converted to level H′, the reference level G is converted to level G′, and level M is left at level M as it is.
- Performing this level conversion based on the tone mapping characteristic TM on the input data from level H to level M makes it possible to reduce the image quality deterioration caused by level saturation from the reference level G up to the relative maximum level M.
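The tone mapping characteristic TM is constrained by only three points in this excerpt (H to H′, G to G′, M to M). As an illustration, assuming a single linear segment over [H, M], which is not specified by the text:

```python
def tone_map_tm(v: float, H: float, H_prime: float, M: float) -> float:
    """Piecewise-linear sketch of the tone-mapping characteristic TM.
    Maps the input range [H, M] onto [H_prime, M]: H lands on H_prime,
    M stays at M, and intermediate levels (including the reference
    level G) are compressed proportionally, avoiding hard saturation."""
    if not (H <= v <= M):
        raise ValueError("TM applies to input from level H to level M")
    return H_prime + (v - H) * (M - H_prime) / (M - H)
```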
- The reception unit 202 receives the MP4 distribution stream STM transmitted from the service transmission system 100 on a broadcast wave or in network packets. This distribution stream STM is supplied to the container decoder 203.
- In the container decoder 203, the encoded image data of the required image data is selectively extracted from the MP4 distribution stream STM received by the reception unit 202, based on the "moof" block information and according to the decoding capability of the receiver 200, and supplied to the video decoder 204.
- For example, when the receiver 200 has a decoding capability capable of processing ultra-high-resolution image data at the high frame rate, the container decoder 203 extracts the encoded image data of all of the first to fourth image data and supplies it to the video decoder 204. When the receiver 200 has a decoding capability capable of processing ultra-high-resolution image data at the basic frame rate, the container decoder 203 extracts the encoded image data of the first and third image data and supplies it to the video decoder 204.
- When the receiver 200 has a decoding capability capable of processing high-resolution image data at the high frame rate, the container decoder 203 extracts the encoded image data of the first and second image data and supplies it to the video decoder 204. When the receiver 200 has a decoding capability capable of processing high-resolution image data at the basic frame rate, the container decoder 203 extracts the encoded image data of the first image data and supplies it to the video decoder 204.
- In the video decoder 204, the encoded image data selectively extracted by the container decoder 203 is subjected to decoding processing to obtain the HDR transmission video data V1′. Depending on which encoded image data was extracted according to the decoding capability, the HDR transmission video data V1′ is image data for displaying an ultra-high-resolution image at the high frame rate, an ultra-high-resolution image at the basic frame rate, a high-resolution image at the high frame rate, or a high-resolution image at the basic frame rate.
- The video decoder 204 also extracts the parameter sets and SEI messages inserted in the encoded image data selectively extracted by the container decoder 203 and sends them to the control unit 201.
- In the control unit 201, the HDR photoelectric conversion characteristic applied to the HDR transmission video data V1′ is recognized based on the conversion characteristic information ("transfer function") inserted in the VUI area of the SPS NAL unit, which indicates the photoelectric conversion characteristic of the transmission video data V1 or the electro-optic conversion characteristic corresponding to that characteristic, and on the transfer function SEI message (see FIG. 22). The control unit 201 also recognizes the dynamic range conversion information (conversion table, conversion coefficient) based on the dynamic range conversion SEI message (see FIG. 23).
- The HDR transmission video data V1′ obtained by the video decoder 204 is converted from the YCbCr domain to the RGB domain by the YCbCr/RGB conversion unit 205 and then supplied to the HDR electro-optic conversion unit 206 or the SDR electro-optic conversion unit 207.
- In the HDR electro-optic conversion unit 206, the HDR electro-optic conversion characteristic is applied to the HDR transmission video data V1′ converted into the RGB domain, and display video data Vhd for displaying an HDR image is obtained. In this case, the control unit 201 sets, in the HDR electro-optic conversion unit 206, the HDR electro-optic conversion characteristic recognized from the VUI or the transfer function SEI message, that is, the HDR electro-optic conversion characteristic corresponding to the HDR photoelectric conversion characteristic applied on the transmission side.
- In the SDR electro-optic conversion unit 207, the SDR electro-optic conversion characteristic is applied to the HDR transmission video data V1′ converted into the RGB domain, and display video data Vsd for displaying an SDR image is obtained.
- Here, when the HDR photoelectric conversion characteristic applied to the HDR transmission video data V1′ is the STD-B67 (HLG) characteristic, the SDR electro-optic conversion characteristic is applied to the HDR transmission video data V1′ as it is. When the HDR photoelectric conversion characteristic applied to the HDR transmission video data V1′ is the ST 2084 (PQ curve) characteristic, dynamic range conversion is performed on the HDR transmission video data V1′ based on the dynamic range conversion information (conversion table, conversion coefficient) to obtain SDR transmission image data (see FIG. 26), and the SDR electro-optic conversion characteristic is applied to this SDR transmission image data.
- As described above, information corresponding to the information (SPS information) related to the image data included in each of the predetermined number of video streams is inserted into the container (into the "moof" block of the MP4 stream). Therefore, on the receiving side, it is easy, according to the decoding capability, to extract predetermined encoded image data from the first to fourth image data included in the predetermined number of streams based on this information and perform the decoding process.
- Also, conversion characteristic information indicating the HDR photoelectric conversion characteristic, or the electro-optic conversion characteristic corresponding to that characteristic, is inserted into the video stream having the encoded image data of the first image data. Therefore, on the receiving side, appropriate electro-optic conversion can easily be performed based on this conversion characteristic information.
- Further, when the high-dynamic-range photoelectric conversion characteristic is the PQ curve characteristic, conversion information for converting the values of the conversion data based on the PQ curve characteristic into the values of the conversion data based on the normal-dynamic-range photoelectric conversion characteristic is inserted into the video stream having the encoded image data of the first image data. Therefore, on the receiving side, when the high-dynamic-range photoelectric conversion characteristic is the PQ curve characteristic, display image data can be obtained satisfactorily when performing normal dynamic range display.
- In that case, for at least the track including the extension stream, the decode time ("tfdt") of the track fragment of the "moof" block can be used as offset information.
- In the above-described embodiment, an example in which the container is MP4 (ISOBMFF) has been shown.
- However, the present technology is not limited to the MP4 container, and can be similarly applied to containers of other formats such as MPEG-2 TS and MMT.
- The present technology can also take the following configurations.
- A transmission apparatus comprising an information insertion unit that inserts, into the container, information corresponding to the information related to the image data of the video stream inserted into each of the predetermined number of video streams.
- (2) The transmission apparatus according to (1), wherein the container of the predetermined format transmitted by the transmission unit includes the first video stream having the encoded image data of the first image data and the encoded image data of the second image data, and the second video stream having the encoded image data of the third image data and the encoded image data of the fourth image data, and the information insertion unit inserts the information into the container in a state where each of the first and second video streams is managed by one track.
- (3) The transmission apparatus according to (2), wherein, when inserting the information into the container, the information insertion unit inserts, for the first video stream, the information related to the encoded image data of the first image data and the information related to the encoded image data of the second image data as a group, and inserts, for the second video stream, the information related to the encoded image data of the third image data and the information related to the encoded image data of the fourth image data as a group.
- (4) The transmission apparatus according to (2) or (3), wherein, in the first video stream, pictures of the first image data and pictures of the second image data are encoded alternately, and, in the second video stream, pictures of the third image data and pictures of the fourth image data are encoded alternately.
- (5) The transmission apparatus according to (1), wherein the container of the predetermined format transmitted by the transmission unit includes the first video stream having the encoded image data of the first image data and the encoded image data of the second image data, and the second video stream having the encoded image data of the third image data and the encoded image data of the fourth image data, and the information insertion unit inserts the information into the container in a state where the first and second video streams are each managed by two tracks.
- (6) The transmission apparatus according to (5), wherein, in the first video stream, pictures of the first image data and pictures of the second image data are encoded alternately, and, in the second video stream, pictures of the third image data and pictures of the fourth image data are encoded alternately.
- (7) The transmission apparatus according to (1), wherein the container of the predetermined format transmitted by the transmission unit includes the first video stream having the encoded image data of the first image data, the second video stream having the encoded image data of the second image data, the third video stream having the encoded image data of the third image data, and the fourth video stream having the encoded image data of the fourth image data, and the information insertion unit inserts the information in a state where the first to fourth video streams are each managed by one track.
- (8) The transmission apparatus according to any one of (1) to (7), wherein the high-frame-rate, ultra-high-resolution image data is transmission image data obtained by performing photoelectric conversion based on a high-dynamic-range photoelectric conversion characteristic on high-dynamic-range image data so as to have the high-dynamic-range photoelectric conversion characteristic, and the information insertion unit further inserts conversion characteristic information indicating the high-dynamic-range photoelectric conversion characteristic, or an electro-optic conversion characteristic corresponding to that characteristic, into the video stream having the encoded image data of the first image data.
- The information insertion unit further inserts, into the video stream having the encoded image data of the first image data, conversion information for converting the values of the conversion data based on the PQ curve characteristic into the values of the conversion data based on a normal-dynamic-range photoelectric conversion characteristic.
- (13) A reception apparatus comprising: a reception unit that receives a container of a predetermined format including a predetermined number of video streams, wherein the predetermined number of video streams have first image data, obtained by processing high-frame-rate, ultra-high-resolution image data, for obtaining a high-resolution image at a basic frame rate, second image data used together with the first image data to obtain a high-resolution image at a high frame rate, third image data used together with the first image data to obtain an ultra-high-resolution image at the basic frame rate, and fourth image data used together with the first to third image data to obtain an ultra-high-resolution image at the high frame rate, and information corresponding to the information related to the image data of the video stream inserted into each of the predetermined number of video streams is inserted in the container; and a processing unit that, according to the decoding capability, selectively extracts predetermined encoded image data from the encoded image data of the first to fourth image data based on the information inserted in the container and performs decoding processing to obtain image data.
- (14) The reception apparatus according to (13), wherein the ultra-high-resolution image data at the high frame rate is transmission image data obtained by performing photoelectric conversion based on a high-dynamic-range photoelectric conversion characteristic on high-dynamic-range image data so as to have the high-dynamic-range photoelectric conversion characteristic; conversion characteristic information indicating the high-dynamic-range photoelectric conversion characteristic, or an electro-optic conversion characteristic corresponding to that characteristic, is inserted in the video stream having the encoded image data of the first image data; and the processing unit performs electro-optic conversion on the image data obtained by the decoding processing based on the conversion characteristic information to obtain display image data.
- (15) The reception apparatus according to (13), wherein the ultra-high-resolution image data at the high frame rate is transmission image data obtained by performing photoelectric conversion based on a high-dynamic-range photoelectric conversion characteristic on high-dynamic-range image data so as to have the high-dynamic-range photoelectric conversion characteristic; the high-dynamic-range photoelectric conversion characteristic is a PQ curve characteristic; conversion information for converting the values of the conversion data based on the PQ curve characteristic into the values of the conversion data based on a normal-dynamic-range photoelectric conversion characteristic is inserted in the video stream having the encoded image data of the first image data; and, when performing normal-dynamic-range display, the processing unit performs dynamic range conversion on the image data obtained by the decoding processing based on the conversion information to obtain normal-dynamic-range transmission image data, and performs electro-optic conversion on the normal-dynamic-range transmission image data based on a normal-dynamic-range electro-optic conversion characteristic to obtain display image data.
- (16) A reception method comprising: a reception step of receiving, by a reception unit, a container of a predetermined format including a predetermined number of video streams, wherein the predetermined number of video streams have first image data, obtained by processing high-frame-rate, ultra-high-resolution image data, for obtaining a high-resolution image at a basic frame rate, second image data used together with the first image data to obtain a high-resolution image at a high frame rate, third image data used together with the first image data to obtain an ultra-high-resolution image at the basic frame rate, and fourth image data used together with the first to third image data to obtain an ultra-high-resolution image at the high frame rate, and information corresponding to the information related to the image data of the video stream inserted into each of the predetermined number of video streams is inserted in the container; and a processing step of, according to the decoding capability, selectively extracting predetermined encoded image data from the encoded image data of the first to fourth image data based on the information inserted in the container and performing decoding processing to obtain image data.
- The main feature of the present technology is that, when transmitting a container including a predetermined number of video streams related to spatio-temporal scalability, information corresponding to the information related to the image data (the SPS information) included in each of the predetermined number of video streams is inserted into the container (the "moof" block of the MP4 stream). This makes it easy, on the receiving side, to extract predetermined encoded image data from the first to fourth image data included in the predetermined number of streams according to the decoding capability, based on this information, and to perform the decoding process (see FIGS. 7, 11, and 14).
Abstract
Description
The present technology resides in a transmission apparatus comprising: an image processing unit that processes high-frame-rate, ultra-high-resolution image data to obtain first image data for obtaining a high-resolution image at a basic frame rate, second image data used together with the first image data to obtain a high-resolution image at a high frame rate, third image data used together with the first image data to obtain an ultra-high-resolution image at the basic frame rate, and fourth image data used together with the first to third image data to obtain an ultra-high-resolution image at the high frame rate;
a transmission unit that transmits a container including a predetermined number of video streams having the encoded image data of the first to fourth image data; and
an information insertion unit that inserts, into the container, information corresponding to the information related to the image data of the video stream inserted into each of the predetermined number of video streams.
The present technology also resides in a reception apparatus comprising a reception unit that receives a container including a predetermined number of video streams,
wherein the predetermined number of video streams have first image data, obtained by processing high-frame-rate, ultra-high-resolution image data, for obtaining a high-resolution image at a basic frame rate, second image data used together with the first image data to obtain a high-resolution image at a high frame rate, third image data used together with the first image data to obtain an ultra-high-resolution image at the basic frame rate, and fourth image data used together with the first to third image data to obtain an ultra-high-resolution image at the high frame rate,
information corresponding to the information related to the image data of the video stream inserted into each of the predetermined number of video streams is inserted in the container, and
the reception apparatus further comprises a processing unit that, according to the decoding capability, selectively extracts predetermined encoded image data from the encoded image data of the first to fourth image data based on the information inserted in the container and performs decoding processing to obtain image data.
The present technology also resides in a transmission apparatus comprising: an image processing unit that processes high-frame-rate image data to obtain first image data for obtaining a basic-frame-rate image and second image data used together with the first image data to obtain high-frame-rate image data;
a transmission unit that transmits a container including one or more video streams having the encoded image data of the first and second image data; and
an information insertion unit that inserts into the container, in correspondence with the encoded image data of the first image data, a level designation value of the video stream corresponding to the encoded image data of the first image data, and, in correspondence with the encoded image data of the second image data, a level designation value of the video stream combining the encoded image data of the first and second image data.
The present technology also resides in a reception apparatus comprising a reception unit that receives a container including one or more video streams,
wherein the one or more video streams have first image data for obtaining a basic-frame-rate image and second image data used together with the first image data to obtain high-frame-rate image data,
a level designation value of the video stream corresponding to the encoded image data of the first image data is inserted in the container in correspondence with the encoded image data of the first image data, and a level designation value of the video stream combining the encoded image data of the first and second image data is inserted in correspondence with the encoded image data of the second image data, and
the reception apparatus further comprises a processing unit that, according to the decoding capability, selectively extracts one or more pieces of encoded image data from the encoded image data of the first and second image data based on the level designation values of the video streams inserted in the container and performs decoding processing to obtain image data.
1. Embodiment
2. Modification
[Overview of an MPEG-DASH-based stream distribution system]
First, an overview of an MPEG-DASH-based stream distribution system to which the present technology can be applied will be described.
FIG. 3 shows a configuration example of a transmission/reception system 10 as an embodiment. This transmission/reception system 10 is composed of a service transmission system 100 and a service receiver 200. In this transmission/reception system 10, the service transmission system 100 corresponds to the DASH stream file server 31 and the DASH MPD server 32 of the stream distribution system 30A shown in FIG. 1(a) described above. Also, in this transmission/reception system 10, the service transmission system 100 corresponds to the DASH stream file server 31, the DASH MPD server 32, and the broadcast sending system 36 of the stream distribution system 30B shown in FIG. 1(b) described above.
The MP4 includes a first video stream having the encoded image data of the first and second image data, which are image data of the base layer, and a second video stream having the encoded image data of the third and fourth image data, which are image data of the scalable layer; the first and second video streams are each managed by one track.
Alternatively, the MP4 includes a first video stream having the encoded image data of the first and second image data, which are image data of the base layer, and a second video stream having the encoded image data of the third and fourth image data, which are image data of the scalable layer; the first and second video streams are each managed by two tracks.
Alternatively, the MP4 includes a first video stream having the encoded image data of the first image data, which is image data of the base layer, a second video stream having the encoded image data of the second image data, which is image data of the base layer, a third video stream having the encoded image data of the third image data, which is image data of the scalable layer, and a fourth video stream having the encoded image data of the fourth image data, which is image data of the scalable layer; the first to fourth video streams are each managed by a separate track.
FIG. 18 shows a configuration example of the service transmission system 100. This service transmission system 100 has a control unit 101, an HDR (High Dynamic Range) photoelectric conversion unit 102, an RGB/YCbCr conversion unit 103, a video encoder 104, a container encoder 105, and a transmission unit 106.
Output data = branch level B + (input data - branch level B) * C ... (1)
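Formula (1) above can be checked with a small sketch; the function name is illustrative, and the pass-through of input at or below the branch level B follows the surrounding description:

```python
def branch_level_convert(input_data: float, B: float, C: float) -> float:
    """Formula (1): output = B + (input - B) * C.
    Input at or below the branch level B is used as output as it is;
    above B, the excess over B is scaled by the conversion coefficient C."""
    if input_data <= B:
        return input_data
    return B + (input_data - B) * C
```

For example, with B = 200 and C = 0.5, an input of 400 is compressed to 300, while an input of 100 passes through unchanged.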
FIG. 25 shows a configuration example of the service receiver 200. This service receiver 200 has a control unit 201, a reception unit 202, a container decoder 203, a video decoder 204, a YCbCr/RGB conversion unit 205, an HDR electro-optic conversion unit 206, and an SDR electro-optic conversion unit 207.
In the above-described embodiment, the description assumes a configuration in which, when the base stream and the extension stream are transmitted in separate tracks, the extension stream depends on an extractor. However, this is merely an example; in practice, it is possible to manage the decode timing of the extension stream even without an extractor.
(1) A transmission apparatus comprising: an image processing unit that processes high-frame-rate, ultra-high-resolution image data to obtain first image data for obtaining a high-resolution image at a basic frame rate, second image data used together with the first image data to obtain a high-resolution image at a high frame rate, third image data used together with the first image data to obtain an ultra-high-resolution image at the basic frame rate, and fourth image data used together with the first to third image data to obtain an ultra-high-resolution image at the high frame rate;
a transmission unit that transmits a container of a predetermined format including a predetermined number of video streams having the encoded image data of the first to fourth image data; and
an information insertion unit that inserts, into the container, information corresponding to the information related to the image data of the video stream inserted into each of the predetermined number of video streams.
(2) The transmission apparatus according to (1), wherein the container of the predetermined format transmitted by the transmission unit includes a first video stream having the encoded image data of the first image data and the encoded image data of the second image data, and a second video stream having the encoded image data of the third image data and the encoded image data of the fourth image data, and
the information insertion unit inserts the information into the container in a state where the first and second video streams are each managed by one track.
(3) The transmission apparatus according to (2), wherein, when inserting the information into the container, the information insertion unit inserts, for the first video stream, the information related to the encoded image data of the first image data and the information related to the encoded image data of the second image data in groups, and inserts, for the second video stream, the information related to the encoded image data of the third image data and the information related to the encoded image data of the fourth image data in groups.
(4) The transmission apparatus according to (2) or (3), wherein, in the first video stream, pictures of the first image data and pictures of the second image data are encoded alternately, and, in the second video stream, pictures of the third image data and pictures of the fourth image data are encoded alternately.
(5) The transmission apparatus according to (1), wherein the container of the predetermined format transmitted by the transmission unit includes a first video stream having the encoded image data of the first image data and the encoded image data of the second image data, and a second video stream having the encoded image data of the third image data and the encoded image data of the fourth image data, and
the information insertion unit inserts the information into the container in a state where the first and second video streams are each managed by two tracks.
(6) The transmission apparatus according to (5), wherein, in the first video stream, pictures of the first image data and pictures of the second image data are encoded alternately, and, in the second video stream, pictures of the third image data and pictures of the fourth image data are encoded alternately.
(7) The transmission apparatus according to (1), wherein the container of the predetermined format transmitted by the transmission unit includes a first video stream having the encoded image data of the first image data, a second video stream having the encoded image data of the second image data, a third video stream having the encoded image data of the third image data, and a fourth video stream having the encoded image data of the fourth image data, and
the information insertion unit inserts the information in a state where the first to fourth video streams are each managed by one track.
(8) The transmission apparatus according to any one of (1) to (7), wherein the high-frame-rate, ultra-high-resolution image data is transmission image data obtained by performing photoelectric conversion based on a high-dynamic-range photoelectric conversion characteristic on high-dynamic-range image data so as to have the high-dynamic-range photoelectric conversion characteristic, and
the information insertion unit further inserts conversion characteristic information indicating the high-dynamic-range photoelectric conversion characteristic, or an electro-optic conversion characteristic corresponding to that characteristic, into the video stream having the encoded image data of the first image data.
(9) The transmission apparatus according to (8), wherein the high-dynamic-range photoelectric conversion characteristic is a Hybrid Log-Gamma characteristic.
(10) The transmission apparatus according to (8), wherein the high-dynamic-range photoelectric conversion characteristic is a PQ curve characteristic.
(11) The transmission apparatus according to (10), wherein the information insertion unit further inserts, into the video stream having the encoded image data of the first image data, conversion information for converting the values of the conversion data based on the PQ curve characteristic into the values of the conversion data based on a normal-dynamic-range photoelectric conversion characteristic.
(12) A transmission method comprising: an image processing step of processing high-frame-rate, ultra-high-resolution image data to obtain first image data for obtaining a high-resolution image at a basic frame rate, second image data used together with the first image data to obtain a high-resolution image at a high frame rate, third image data used together with the first image data to obtain an ultra-high-resolution image at the basic frame rate, and fourth image data used together with the first to third image data to obtain an ultra-high-resolution image at the high frame rate;
a transmission step of transmitting, by a transmission unit, a container of a predetermined format including a predetermined number of video streams having the encoded image data of the first to fourth image data; and
an information insertion step of inserting, into the container, information corresponding to the information related to the image data of the video stream inserted into each of the predetermined number of video streams.
(13) A reception device comprising a reception unit that receives a container of a predetermined format including a predetermined number of video streams, wherein
the predetermined number of video streams carry first image data, obtained by processing ultra-high-definition image data at a high frame rate, for obtaining a high-definition image at a basic frame rate, second image data to be used together with the first image data to obtain a high-definition image at the high frame rate, third image data to be used together with the first image data to obtain an ultra-high-definition image at the basic frame rate, and fourth image data to be used together with the first to third image data to obtain an ultra-high-definition image at the high frame rate,
information corresponding to the information on the image data of each video stream that has been inserted into each of the predetermined number of video streams is inserted into the container, and
the reception device further comprises a processing unit that, according to decoding capability and on the basis of the information inserted into the container, selectively extracts predetermined encoded image data from the encoded image data of the first to fourth image data and performs decoding processing to obtain image data.
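The capability-dependent selection performed by the processing unit of (13) can be pictured as a lookup from the decoder's limits to the subset of the four image data that must be extracted and decoded. The table below is a hypothetical sketch; the combinations follow directly from how the four image data are defined, but all names are illustrative:

```python
# Map (max resolution, max frame rate) a decoder supports to the image-data
# components it extracts from the container, in decode-dependency order.
LAYERS: dict[tuple[str, str], list[str]] = {
    ("HD",  "base"): ["first"],                                # 2K at basic rate
    ("HD",  "high"): ["first", "second"],                      # 2K at high rate
    ("UHD", "base"): ["first", "third"],                       # 4K at basic rate
    ("UHD", "high"): ["first", "second", "third", "fourth"],   # 4K at high rate
}

def select_streams(max_resolution: str, max_frame_rate: str) -> list[str]:
    """Return the image-data components a decoder of the given capability needs."""
    return LAYERS[(max_resolution, max_frame_rate)]
```

A basic 60 Hz HD receiver thus decodes only the first image data, while a full-capability receiver decodes all four.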
(14) The ultra-high-definition image data at the high frame rate is transmission image data obtained by applying, to high-dynamic-range image data, photoelectric conversion with a high-dynamic-range photoelectric conversion characteristic so as to give the data that characteristic,
conversion characteristic information indicating the high-dynamic-range photoelectric conversion characteristic or an electro-optical conversion characteristic corresponding to that characteristic is inserted into the video stream having the encoded image data of the first image data, and
the processing unit
applies electro-optical conversion to the image data obtained by the decoding processing on the basis of the conversion characteristic information to obtain image data for display.
The reception device according to (13) above.
(15) The ultra-high-definition image data at the high frame rate is transmission image data obtained by applying, to high-dynamic-range image data, photoelectric conversion with a high-dynamic-range photoelectric conversion characteristic so as to give the data that characteristic,
the high-dynamic-range photoelectric conversion characteristic is a PQ curve characteristic,
conversion information for converting values of conversion data based on the PQ curve characteristic into values of conversion data based on a normal-dynamic-range photoelectric conversion characteristic is inserted into the video stream having the encoded image data of the first image data, and
the processing unit,
when performing normal-dynamic-range display,
applies dynamic range conversion to the image data obtained by the decoding processing on the basis of the conversion information to obtain normal-dynamic-range transmission image data, and applies electro-optical conversion with a normal-dynamic-range electro-optical conversion characteristic to the normal-dynamic-range transmission image data to obtain image data for display.
The reception device according to (13) above.
(16) A reception method comprising a reception step of receiving, by a reception unit, a container of a predetermined format including a predetermined number of video streams, wherein
the predetermined number of video streams carry first image data, obtained by processing ultra-high-definition image data at a high frame rate, for obtaining a high-definition image at a basic frame rate, second image data to be used together with the first image data to obtain a high-definition image at the high frame rate, third image data to be used together with the first image data to obtain an ultra-high-definition image at the basic frame rate, and fourth image data to be used together with the first to third image data to obtain an ultra-high-definition image at the high frame rate,
information corresponding to the information on the image data of each video stream that has been inserted into each of the predetermined number of video streams is inserted into the container, and
the reception method further comprises a processing step of, according to decoding capability and on the basis of the information inserted into the container, selectively extracting predetermined encoded image data from the encoded image data of the first to fourth image data and performing decoding processing to obtain image data.
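The four-way layering used throughout (1) to (16) can be pictured with a toy decomposition: a temporal split of the high-frame-rate source into basic-rate pictures and the extra pictures, each half carried at a base (HD) and an enhancement (UHD) representation. This is purely an illustrative sketch under those assumptions, not the encoder's actual signal processing (real streams would carry downscaled pictures and prediction residuals, not tagged strings):

```python
def decompose(frames: list[str]) -> dict[str, list[str]]:
    """Split a high-frame-rate picture sequence into the four image data of the claims."""
    base_rate = frames[0::2]   # pictures kept at the basic frame rate
    high_rate = frames[1::2]   # extra pictures that restore the high frame rate
    return {
        "first":  [f + "/HD"  for f in base_rate],   # basic rate, high definition
        "second": [f + "/HD"  for f in high_rate],   # adds the high frame rate at HD
        "third":  [f + "/UHD" for f in base_rate],   # adds UHD at the basic rate
        "fourth": [f + "/UHD" for f in high_rate],   # adds UHD at the high rate
    }
```

Decoding "first" alone yields a 60 Hz HD service from a 120 Hz UHD source; each further component upgrades frame rate, resolution, or both.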
30A, 30B ... MPEG-DASH-based stream distribution system
31 ... DASH stream file server
32 ... DASH MPD server
33, 33-1 to 33-N ... service receivers
34 ... CDN
35, 35-1 to 35-M ... service receivers
36 ... broadcast transmission system
100 ... service transmission system
101 ... control unit
102 ... HDR photoelectric conversion unit
103 ... RGB/YCbCr conversion unit
104 ... video encoder
105 ... container encoder
106 ... transmission unit
200, 200A, 200B, 200C, 200D ... service receivers
201 ... control unit
202 ... reception unit
203 ... container decoder
204, 204A, 204B, 204C, 204D ... video decoders
205 ... YCbCr/RGB conversion unit
206 ... HDR electro-optical conversion unit
207 ... SDR electro-optical conversion unit
Claims (20)
- A transmission device comprising: an image processing unit that processes ultra-high-definition image data at a high frame rate to obtain first image data for obtaining a high-definition image at a basic frame rate, second image data to be used together with the first image data to obtain a high-definition image at the high frame rate, third image data to be used together with the first image data to obtain an ultra-high-definition image at the basic frame rate, and fourth image data to be used together with the first to third image data to obtain an ultra-high-definition image at the high frame rate; a transmission unit that transmits a container including a predetermined number of video streams having the encoded image data of the first to fourth image data; and an information insertion unit that inserts, into the container, information corresponding to the information on the image data of each video stream that has been inserted into each of the predetermined number of video streams.
- The transmission device according to claim 1, wherein the container transmitted by the transmission unit includes a first video stream having the encoded image data of the first image data and the encoded image data of the second image data, and a second video stream having the encoded image data of the third image data and the encoded image data of the fourth image data, and the information insertion unit inserts the information into the container in a state in which each of the first and second video streams is managed with one track.
- The transmission device according to claim 2, wherein, when inserting the information into the container, the information insertion unit inserts, for the first video stream, the information on the encoded image data of the first image data and the information on the encoded image data of the second image data in a grouped manner, and inserts, for the second video stream, the information on the encoded image data of the third image data and the information on the encoded image data of the fourth image data in a grouped manner.
- The transmission device according to claim 2, wherein, in the first video stream, pictures of the first image data and pictures of the second image data are encoded alternately, and, in the second video stream, pictures of the third image data and pictures of the fourth image data are encoded alternately.
- The transmission device according to claim 1, wherein the container transmitted by the transmission unit includes a first video stream having the encoded image data of the first image data and the encoded image data of the second image data, and a second video stream having the encoded image data of the third image data and the encoded image data of the fourth image data, and the information insertion unit inserts the information into the container in a state in which each of the first and second video streams is managed with two tracks.
- The transmission device according to claim 5, wherein, in the first video stream, pictures of the first image data and pictures of the second image data are encoded alternately, and, in the second video stream, pictures of the third image data and pictures of the fourth image data are encoded alternately.
- The transmission device according to claim 1, wherein the container transmitted by the transmission unit includes a first video stream having the encoded image data of the first image data, a second video stream having the encoded image data of the second image data, a third video stream having the encoded image data of the third image data, and a fourth video stream having the encoded image data of the fourth image data, and the information insertion unit inserts the information in a state in which each of the first to fourth video streams is managed with one track.
- The transmission device according to claim 1, wherein the ultra-high-definition image data at the high frame rate is transmission image data obtained by applying, to high-dynamic-range image data, photoelectric conversion with a high-dynamic-range photoelectric conversion characteristic so as to give the data that characteristic, and the information insertion unit further inserts, into the video stream having the encoded image data of the first image data, conversion characteristic information indicating the high-dynamic-range photoelectric conversion characteristic or an electro-optical conversion characteristic corresponding to that characteristic.
- The transmission device according to claim 8, wherein the high-dynamic-range photoelectric conversion characteristic is a Hybrid Log-Gamma characteristic.
- The transmission device according to claim 8, wherein the high-dynamic-range photoelectric conversion characteristic is a PQ curve characteristic.
- The transmission device according to claim 10, wherein the information insertion unit further inserts, into the video stream having the encoded image data of the first image data, conversion information for converting values of conversion data based on the PQ curve characteristic into values of conversion data based on a normal-dynamic-range photoelectric conversion characteristic.
- A transmission method comprising: an image processing step in which an image processing unit processes ultra-high-definition image data at a high frame rate to obtain first image data for obtaining a high-definition image at a basic frame rate, second image data to be used together with the first image data to obtain a high-definition image at the high frame rate, third image data to be used together with the first image data to obtain an ultra-high-definition image at the basic frame rate, and fourth image data to be used together with the first to third image data to obtain an ultra-high-definition image at the high frame rate; a transmission step in which a transmission unit transmits a container including a predetermined number of video streams having the encoded image data of the first to fourth image data; and an information insertion step in which an information insertion unit inserts, into the container, information corresponding to the information on the image data of each video stream that has been inserted into each of the predetermined number of video streams.
- A reception device comprising a reception unit that receives a container including a predetermined number of video streams, wherein the predetermined number of video streams carry first image data, obtained by processing ultra-high-definition image data at a high frame rate, for obtaining a high-definition image at a basic frame rate, second image data to be used together with the first image data to obtain a high-definition image at the high frame rate, third image data to be used together with the first image data to obtain an ultra-high-definition image at the basic frame rate, and fourth image data to be used together with the first to third image data to obtain an ultra-high-definition image at the high frame rate; information corresponding to the information on the image data of each video stream that has been inserted into each of the predetermined number of video streams is inserted into the container; and the reception device further comprises a processing unit that, according to decoding capability and on the basis of the information inserted into the container, selectively extracts predetermined encoded image data from the encoded image data of the first to fourth image data and performs decoding processing to obtain image data.
- The reception device according to claim 13, wherein the ultra-high-definition image data at the high frame rate is transmission image data obtained by applying, to high-dynamic-range image data, photoelectric conversion with a high-dynamic-range photoelectric conversion characteristic so as to give the data that characteristic; conversion characteristic information indicating the high-dynamic-range photoelectric conversion characteristic or an electro-optical conversion characteristic corresponding to that characteristic is inserted into the video stream having the encoded image data of the first image data; and the processing unit applies electro-optical conversion to the image data obtained by the decoding processing on the basis of the conversion characteristic information to obtain image data for display.
- The reception device according to claim 13, wherein the ultra-high-definition image data at the high frame rate is transmission image data obtained by applying, to high-dynamic-range image data, photoelectric conversion with a high-dynamic-range photoelectric conversion characteristic so as to give the data that characteristic; the high-dynamic-range photoelectric conversion characteristic is a PQ curve characteristic; conversion information for converting values of conversion data based on the PQ curve characteristic into values of conversion data based on a normal-dynamic-range photoelectric conversion characteristic is inserted into the video stream having the encoded image data of the first image data; and the processing unit, when performing normal-dynamic-range display, applies dynamic range conversion to the image data obtained by the decoding processing on the basis of the conversion information to obtain normal-dynamic-range transmission image data, and applies electro-optical conversion with a normal-dynamic-range electro-optical conversion characteristic to the normal-dynamic-range transmission image data to obtain image data for display.
- A reception method comprising a reception step in which a reception unit receives a container including a predetermined number of video streams, wherein the predetermined number of video streams carry first image data, obtained by processing ultra-high-definition image data at a high frame rate, for obtaining a high-definition image at a basic frame rate, second image data to be used together with the first image data to obtain a high-definition image at the high frame rate, third image data to be used together with the first image data to obtain an ultra-high-definition image at the basic frame rate, and fourth image data to be used together with the first to third image data to obtain an ultra-high-definition image at the high frame rate; information corresponding to the information on the image data of each video stream that has been inserted into each of the predetermined number of video streams is inserted into the container; and the reception method further comprises a processing step in which a processing unit, according to decoding capability and on the basis of the information inserted into the container, selectively extracts predetermined encoded image data from the encoded image data of the first to fourth image data and performs decoding processing to obtain image data.
- A transmission device comprising: an image processing unit that processes image data at a high frame rate to obtain first image data for obtaining a basic-frame-rate image and second image data to be used together with the first image data to obtain the image data at the high frame rate; a transmission unit that transmits a container including one or more video streams having the encoded image data of the first and second image data; and an information insertion unit that inserts into the container, in correspondence with the encoded image data of the first image data, a level designation value of a video stream corresponding to the encoded image data of the first image data, and, in correspondence with the encoded image data of the second image data, a level designation value of a video stream combining the encoded image data of the first and second image data.
- A transmission method comprising: an image processing step in which an image processing unit processes image data at a high frame rate to obtain first image data for obtaining a basic-frame-rate image and second image data to be used together with the first image data to obtain the image data at the high frame rate; a transmission step in which a transmission unit transmits a container including one or more video streams having the encoded image data of the first and second image data; and an information insertion step in which an information insertion unit inserts into the container, in correspondence with the encoded image data of the first image data, a level designation value of a video stream corresponding to the encoded image data of the first image data, and, in correspondence with the encoded image data of the second image data, a level designation value of a video stream combining the encoded image data of the first and second image data.
- A reception device comprising a reception unit that receives a container including one or more video streams, wherein the one or more video streams carry first image data for obtaining a basic-frame-rate image and second image data to be used together with the first image data to obtain image data at a high frame rate; a level designation value of a video stream corresponding to the encoded image data of the first image data is inserted into the container in correspondence with the encoded image data of the first image data, and a level designation value of a video stream combining the encoded image data of the first and second image data is inserted in correspondence with the encoded image data of the second image data; and the reception device further comprises a processing unit that, according to decoding capability and on the basis of the level designation values of the video streams inserted in the container, selectively extracts one or more pieces of encoded image data from the encoded image data of the first and second image data and performs decoding processing to obtain image data.
- A reception method comprising a reception step in which a reception unit receives a container including one or more video streams, wherein the one or more video streams carry first image data for obtaining a basic-frame-rate image and second image data to be used together with the first image data to obtain image data at a high frame rate; a level designation value of a video stream corresponding to the encoded image data of the first image data is inserted into the container in correspondence with the encoded image data of the first image data, and a level designation value of a video stream combining the encoded image data of the first and second image data is inserted in correspondence with the encoded image data of the second image data; and the reception method further comprises a processing step in which a processing unit, according to decoding capability and on the basis of the level designation values of the video streams inserted in the container, selectively extracts one or more pieces of encoded image data from the encoded image data of the first and second image data and performs decoding processing to obtain image data.
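The "level designation value" of claims 17 to 20 corresponds to a codec level indicator such as HEVC's `general_level_idc`, which is thirty times the level number (Level 5.1 → 153). A hedged sketch of the two values such a container would carry, assuming for illustration a 2160p60 base stream (Level 5.1) and a 2160p120 combined stream (Level 5.2); the function name and example levels are our assumptions:

```python
def level_idc(level: str) -> int:
    """Convert a level number like '5.1' to an HEVC-style level_idc (30 x level)."""
    major, minor = level.split(".")
    return (int(major) * 10 + int(minor)) * 3  # e.g. "5.1" -> 51 * 3 = 153

# Values the container would signal for each encoded-image-data component:
SIGNALLING = {
    "base_only": level_idc("5.1"),             # stream with the first image data alone
    "base_plus_enhancement": level_idc("5.2"), # first and second image data combined
}
```

A receiver compares its own supported `level_idc` against these values to decide, before decoding, whether it can take the combined high-frame-rate stream or only the base.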
Priority Applications (11)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA3009777A CA3009777C (en) | 2016-02-09 | 2017-02-06 | Transmission device, transmission method, reception device and reception method |
JP2017566926A JP6947039B2 (ja) | 2016-02-09 | 2017-02-06 | 送信装置、送信方法、受信装置および受信方法 |
MX2018009410A MX2018009410A (es) | 2016-02-09 | 2017-02-06 | Dispositivo de transmision, metodo de transmision, dispositivo de recepcion y metodo de recepcion. |
KR1020247003573A KR20240017138A (ko) | 2016-02-09 | 2017-02-06 | 송신 장치, 송신 방법, 수신 장치 및 수신 방법 |
CN201780009645.3A CN108605152B (zh) | 2016-02-09 | 2017-02-06 | 发送装置、发送方法、接收装置和接收方法 |
EP17750193.9A EP3416393B1 (en) | 2016-02-09 | 2017-02-06 | Transmission device, transmission method, reception device and reception method |
KR1020187021191A KR20180109889A (ko) | 2016-02-09 | 2017-02-06 | 송신 장치, 송신 방법, 수신 장치 및 수신 방법 |
US16/072,542 US10764615B2 (en) | 2016-02-09 | 2017-02-06 | Transmission device, transmission method, reception device and reception method |
US16/930,011 US11223859B2 (en) | 2016-02-09 | 2020-07-15 | Transmission device, transmission method, reception device and reception method |
US17/457,848 US11792452B2 (en) | 2016-02-09 | 2021-12-06 | Transmission device, transmission method, reception device and reception method |
US18/459,666 US20230412859A1 (en) | 2016-02-09 | 2023-09-01 | Transmission device, transmission method, reception device and reception method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016023185 | 2016-02-09 | ||
JP2016-023185 | 2016-02-09 |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/072,542 A-371-Of-International US10764615B2 (en) | 2016-02-09 | 2017-02-06 | Transmission device, transmission method, reception device and reception method |
US16/930,011 Continuation US11223859B2 (en) | 2016-02-09 | 2020-07-15 | Transmission device, transmission method, reception device and reception method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017138470A1 true WO2017138470A1 (ja) | 2017-08-17 |
Family
ID=59563534
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2017/004146 WO2017138470A1 (ja) | 2016-02-09 | 2017-02-06 | 送信装置、送信方法、受信装置および受信方法 |
Country Status (8)
Country | Link |
---|---|
US (4) | US10764615B2 (ja) |
EP (1) | EP3416393B1 (ja) |
JP (1) | JP6947039B2 (ja) |
KR (2) | KR20180109889A (ja) |
CN (1) | CN108605152B (ja) |
CA (1) | CA3009777C (ja) |
MX (1) | MX2018009410A (ja) |
WO (1) | WO2017138470A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020177297A (ja) * | 2019-04-15 | 2020-10-29 | キヤノン株式会社 | 画像処理装置、撮像装置、画像処理方法、及びプログラム |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102477041B1 (ko) * | 2015-03-24 | 2022-12-14 | 소니그룹주식회사 | 송신 장치, 송신 방법, 수신 장치 및 수신 방법 |
JP6843655B2 (ja) * | 2017-03-09 | 2021-03-17 | キヤノン株式会社 | 送信装置、受信装置、情報処理方法及びプログラム |
KR20200095651A (ko) * | 2019-02-01 | 2020-08-11 | 삼성전자주식회사 | 고 동적 범위 콘텐트를 재생하는 전자 장치 및 그 방법 |
CN115769586A (zh) * | 2020-05-28 | 2023-03-07 | 抖音视界有限公司 | 视频编解码中的参考图片列表信令通知 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11266457A (ja) * | 1998-01-14 | 1999-09-28 | Canon Inc | 画像処理装置、方法、及び記録媒体 |
JP2008543142A (ja) | 2005-05-24 | 2008-11-27 | ノキア コーポレイション | デジタル放送における階層的な送受信のための方法および装置 |
JP2015005976A (ja) * | 2013-06-18 | 2015-01-08 | パナソニック インテレクチュアル プロパティ コーポレーション オブアメリカPanasonic Intellectual Property Corporation of America | 送信方法 |
JP2015057875A (ja) * | 2013-08-09 | 2015-03-26 | ソニー株式会社 | 送信装置、送信方法、受信装置、受信方法、符号化装置および符号化方法 |
JP2015065530A (ja) * | 2013-09-24 | 2015-04-09 | ソニー株式会社 | 符号化装置、符号化方法、送信装置および受信装置 |
EP2947560A1 (en) * | 2014-05-20 | 2015-11-25 | LG Electronics Inc. | Video data processing method and device for display adaptive video playback |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6650783B2 (en) * | 1998-01-14 | 2003-11-18 | Canon Kabushiki Kaisha | Image processing apparatus and method for processing images with different scalabilites |
US6118820A (en) * | 1998-01-16 | 2000-09-12 | Sarnoff Corporation | Region-based information compaction as for digital images |
US20060156363A1 (en) * | 2005-01-07 | 2006-07-13 | Microsoft Corporation | File storage for scalable media |
US7725593B2 (en) | 2005-07-15 | 2010-05-25 | Sony Corporation | Scalable video coding (SVC) file format |
US9769230B2 (en) * | 2010-07-20 | 2017-09-19 | Nokia Technologies Oy | Media streaming apparatus |
US20130243391A1 (en) * | 2010-11-23 | 2013-09-19 | Samsung Electronics Co., Ltd. | Method and apparatus for creating a media file for multilayer images in a multimedia system, and media-file-reproducing apparatus using same |
IT1403450B1 (it) * | 2011-01-19 | 2013-10-17 | Sisvel S P A | Flusso video costituito da frame video combinati, e procedimento e dispositivi per la sua generazione, trasmissione, ricezione e riproduzione |
US8447674B2 (en) | 2011-07-21 | 2013-05-21 | Bank Of America Corporation | Multi-stage filtering for fraud detection with customer history filters |
WO2013030458A1 (en) * | 2011-08-31 | 2013-03-07 | Nokia Corporation | Multiview video coding and decoding |
JP6407717B2 (ja) * | 2011-09-27 | 2018-10-17 | コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. | 画像のダイナミックレンジ変換のための装置及び方法 |
US9648317B2 (en) * | 2012-01-30 | 2017-05-09 | Qualcomm Incorporated | Method of coding video and storing video content |
RU2612577C2 (ru) * | 2012-07-02 | 2017-03-09 | Нокиа Текнолоджиз Ой | Способ и устройство для кодирования видеоинформации |
DE112013003531T5 (de) * | 2012-08-10 | 2015-04-23 | Lg Electronics Inc. | Signal-Sende-/Empfangsvorrichtung und Signal-Sende/-Empfangsverfahren |
WO2014034463A1 (ja) * | 2012-08-27 | 2014-03-06 | ソニー株式会社 | 送信装置、送信方法、受信装置および受信方法 |
JP2015013726A (ja) | 2013-07-05 | 2015-01-22 | キヤノンファインテック株式会社 | シート収納装置と画像形成装置 |
JP5774652B2 (ja) | 2013-08-27 | 2015-09-09 | ソニー株式会社 | 送信装置、送信方法、受信装置および受信方法 |
2017
- 2017-02-06 KR KR1020187021191A patent/KR20180109889A/ko not_active IP Right Cessation
- 2017-02-06 JP JP2017566926A patent/JP6947039B2/ja active Active
- 2017-02-06 EP EP17750193.9A patent/EP3416393B1/en active Active
- 2017-02-06 MX MX2018009410A patent/MX2018009410A/es unknown
- 2017-02-06 US US16/072,542 patent/US10764615B2/en active Active
- 2017-02-06 KR KR1020247003573A patent/KR20240017138A/ko not_active Application Discontinuation
- 2017-02-06 CN CN201780009645.3A patent/CN108605152B/zh active Active
- 2017-02-06 WO PCT/JP2017/004146 patent/WO2017138470A1/ja active Application Filing
- 2017-02-06 CA CA3009777A patent/CA3009777C/en active Active
2020
- 2020-07-15 US US16/930,011 patent/US11223859B2/en active Active
2021
- 2021-12-06 US US17/457,848 patent/US11792452B2/en active Active
2023
- 2023-09-01 US US18/459,666 patent/US20230412859A1/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11266457A (ja) * | 1998-01-14 | 1999-09-28 | Canon Inc | 画像処理装置、方法、及び記録媒体 |
JP2008543142A (ja) | 2005-05-24 | 2008-11-27 | ノキア コーポレイション | デジタル放送における階層的な送受信のための方法および装置 |
JP2015005976A (ja) * | 2013-06-18 | 2015-01-08 | パナソニック インテレクチュアル プロパティ コーポレーション オブアメリカPanasonic Intellectual Property Corporation of America | 送信方法 |
JP2015057875A (ja) * | 2013-08-09 | 2015-03-26 | ソニー株式会社 | 送信装置、送信方法、受信装置、受信方法、符号化装置および符号化方法 |
JP2015065530A (ja) * | 2013-09-24 | 2015-04-09 | ソニー株式会社 | 符号化装置、符号化方法、送信装置および受信装置 |
EP2947560A1 (en) * | 2014-05-20 | 2015-11-25 | LG Electronics Inc. | Video data processing method and device for display adaptive video playback |
Non-Patent Citations (2)
Title |
---|
AMONOU, I. ET AL.: "On the high level syntax for SVC", DOCUMENT: JVT-P032, JOINT VIDEO TEAM (JVT) OF ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 AND ITU-T SG 16 Q.6), 21 July 2005 (2005-07-21), XP030040992, Retrieved from the Internet <URL:http://wftp3.itu.int/av-arch/jvt-site/2005_07_Poznan/JVT-P03> [retrieved on 20170220] * |
NACCARI, M. ET AL.: "High dynamic range compatibility information SEI message", DOCUMENT: JCTVC-U0033 (VERSION 5), JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, 29 June 2015 (2015-06-29), XP030117446, Retrieved from the Internet <URL:http://phenix.it-sudparis.eu/jct/doc_end_user/documents/21_Warsaw/wg11/JCTVC-U0033-v5.zip> [retrieved on 20170220] * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020177297A (ja) * | 2019-04-15 | 2020-10-29 | キヤノン株式会社 | 画像処理装置、撮像装置、画像処理方法、及びプログラム |
JP7332325B2 (ja) | 2019-04-15 | 2023-08-23 | キヤノン株式会社 | 画像処理装置、撮像装置、画像処理方法、及びプログラム |
Also Published As
Publication number | Publication date |
---|---|
US20190037250A1 (en) | 2019-01-31 |
KR20240017138A (ko) | 2024-02-06 |
JPWO2017138470A1 (ja) | 2018-11-29 |
CN108605152A (zh) | 2018-09-28 |
EP3416393B1 (en) | 2024-05-08 |
MX2018009410A (es) | 2018-09-21 |
US11792452B2 (en) | 2023-10-17 |
KR20180109889A (ko) | 2018-10-08 |
US10764615B2 (en) | 2020-09-01 |
JP6947039B2 (ja) | 2021-10-13 |
CA3009777A1 (en) | 2017-08-17 |
CA3009777C (en) | 2024-04-16 |
EP3416393A4 (en) | 2018-12-19 |
US20220094993A1 (en) | 2022-03-24 |
US20230412859A1 (en) | 2023-12-21 |
US11223859B2 (en) | 2022-01-11 |
CN108605152B (zh) | 2021-07-16 |
EP3416393A1 (en) | 2018-12-19 |
US20200351529A1 (en) | 2020-11-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11838564B2 (en) | Transmission apparatus, transmission method, reception apparatus, and reception method | |
US11792452B2 (en) | Transmission device, transmission method, reception device and reception method | |
US10999605B2 (en) | Signaling of important video information in file formats | |
CN110915221B (zh) | 发送装置、发送方法、接收装置、以及接收方法 | |
US20220201308A1 (en) | Media file processing method and device therefor | |
JP2017069978A (ja) | 送信装置、送信方法、受信装置および受信方法 | |
US20240040169A1 (en) | Media file processing method and device therefor | |
US20240089518A1 (en) | Media file processing method and device | |
US20230336751A1 (en) | Method and apparatus for generating/receiving media file which signals output layer set information, and computer-readable recording medium storing media file | |
US20230379481A1 (en) | Media file generation/reception method and device for signaling operating point information and output layer set information, and computer-readable recording medium in which media file is stored | |
US20240056618A1 (en) | Method and device for generating/receiving media file including nal unit array information, and method for transmitting media file |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17750193 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2017566926 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 3009777 Country of ref document: CA |
|
ENP | Entry into the national phase |
Ref document number: 20187021191 Country of ref document: KR Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: MX/A/2018/009410 Country of ref document: MX |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2017750193 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2017750193 Country of ref document: EP Effective date: 20180910 |