WO2016002493A1 - File generation device and method, and content reproduction device and method - Google Patents
File generation device and method, and content reproduction device and method
- Publication number
- WO2016002493A1 (PCT/JP2015/067231)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sample
- subsample
- information
- sub
- sample group
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/238—Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
- H04N21/2387—Stream processing in response to a playback request from an end-user, e.g. for trick-play
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/233—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/238—Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/262—Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
- H04N21/26258—Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists for generating a list of items to be played back in a given order, e.g. playlist, or scheduling item distribution according to such list
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/438—Interfacing the downstream path of the transmission network originating from a server, e.g. retrieving MPEG packets from an IP network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47217—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/4728—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/816—Monomedia components thereof involving special video data, e.g 3D video
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/85406—Content authoring involving a specific file format, e.g. MP4 format
Definitions
- the present disclosure relates to a file generation apparatus and method, and a content reproduction apparatus and method, and more particularly, to a file generation apparatus and method, and a content reproduction apparatus and method capable of efficiently performing access based on data types in a segment.
- MPEG-DASH Dynamic Adaptive Streaming over HTTP
- HTTP hypertext transfer protocol
- MPEG-DASH information for acquiring a segment of arbitrary time is described in MPD.
- the access information of a subsegment in a segment is described in sidx at the beginning of the segment file.
- the dependency hierarchy information of I/P/B pictures and its size information are described in ssix, following sidx, at the beginning of the segment file.
- sidx and ssix are general-purpose access information that does not require interpretation of the structure of MP4 (moof), and can also be used as access information for streams such as MPEG-2 TS (see Non-Patent Document 1).
- Non-Patent Document 1: MPEG-DASH (Dynamic Adaptive Streaming over HTTP)
- URL: http://mpeg.chiariglione.org/standards/mpeg-dash/media-presentation-description-and-segment-formats/text-isoiec-23009-12012-dam-1
- HEVC uses the Tile structure, and it is possible to decode only the area that needs to be decoded by the application.
- Tile is handled as a subsample.
- I/B/P access information in picture units can be described.
- This disclosure has been made in view of such a situation, and is intended to efficiently access an arbitrary subsample in a sample.
- the file generation device of the present disclosure includes a subsample information generation unit that, in a file including a bitstream generated by encoding content in which an image is divided into a plurality of subsamples, defines a sample group of subsamples using a definition flag for defining the subsamples,
- and generates subsample access information for accessing the subsamples,
- and a file generation unit configured to multiplex the subsample access information generated by the subsample information generation unit and generate the file.
- the subsample information generation unit can generate the subsample access information by mapping a sample group of defined subsamples to a level.
- the sub-sample information generation unit can generate the sub-sample access information by collecting the sample group expressing the position information of the sub-sample and the sample group of the sub-sample, and then mapping them to a level.
- the sub-sample information generation unit can generate the sub-sample access information by defining a sample group that stores position information of the sub-samples and a sample group that stores reference information to the sample groups of the sub-samples, and then mapping the sample group that stores the reference information to a level.
- the sub-sample information generation unit can generate the sub-sample access information by defining a sample group that bundles the sample group expressing the position information of the sub-samples and the sample groups of the sub-samples,
- and then mapping that bundling sample group to a level.
- the sub-sample information generation unit collects the sample group expressing the position information of the sub-sample and the sample group of the sub-sample, maps each sample group to a level, and sets the sub-sample access information Can be generated.
- the subsample is a tile.
- the subsample is 3D audio.
- It may further include an encoding unit that encodes content obtained by dividing the image into a plurality of subsamples and generates the bitstream.
- the file generation method of the present disclosure is performed by a file generation device, for a file including a bitstream generated by encoding content in which an image is divided into a plurality of subsamples.
- a sample group of subsamples is defined using a definition flag for defining the subsamples, subsample access information for accessing the subsamples is generated, and the generated subsample access information is multiplexed to generate the file.
- in the content reproduction device of the present disclosure, a file including a bitstream generated by encoding content in which an image is divided into a plurality of subsamples has a sample group of subsamples defined using a definition flag for defining the subsamples,
- and subsample access information for accessing the subsamples generated and multiplexed into it; the device includes an information acquisition unit that acquires the subsample access information from the multiplexed file,
- a sample acquisition unit that acquires an arbitrary subsample using the subsample access information acquired by the information acquisition unit, and a reproduction unit that reproduces the arbitrary subsample acquired by the sample acquisition unit.
- the sub-sample access information is generated by mapping a sample group of defined sub-samples to a level.
- the sub-sample access information is generated by grouping a sample group expressing the position information of the sub-sample and the sample group of the sub-sample, and then mapping them to a level.
- the sub-sample access information is generated by defining a sample group that expresses position information of the sub-samples and a sample group that stores reference information to the sample groups of the sub-samples, and then mapping the sample group that stores the reference information to a level.
- the sub-sample access information is generated by defining a sample group that bundles the sample group representing the position information of the sub-samples and the sample groups of the sub-samples, and then mapping that bundling sample group to a level.
- the sub-sample access information is generated by grouping a sample group expressing sub-sample position information and the sample group of the sub-sample, and then mapping each sample group to a level.
- the subsample is a tile.
- the subsample is 3D audio.
- the content is stored in a server connected via a network.
- the content playback method of the present disclosure is performed by a content playback device, for a file including a bitstream generated by encoding content in which an image is divided into a plurality of subsamples.
- in the file, a sample group of subsamples is defined using a definition flag for defining the subsamples,
- and subsample access information for accessing the subsamples is generated.
- the generated subsample access information is multiplexed to generate the file.
- the content playback device acquires the subsample access information from the multiplexed file.
- then, an arbitrary subsample is acquired using the acquired subsample access information, and the acquired arbitrary subsample is reproduced.
- the file generation device and the content reproduction device may be independent devices, or may be internal blocks constituting a single device.
- according to one aspect of the present technology, a file can be generated, and access by data type within a segment can be performed efficiently.
- according to another aspect of the present technology, content can be played back, and access by data type within a segment can be performed efficiently.
- 1. First embodiment (information processing system)
- 2. Second embodiment (computer)
- FIG. 1 is a diagram illustrating a configuration example of an information processing system to which the present disclosure is applied.
- the information processing system 10 in FIG. 1 is configured by connecting the Web server 12, which is connected to the file generation device 11, and the moving image playback terminal 14 via the Internet 13.
- the Web server 12 distributes the image data of the moving image content to the moving image playback terminal 14 in tile units (tiled streaming) in accordance with MPEG-DASH.
- the file generation device 11 acquires image data of moving image content, encodes it in units of tiles, and generates a video stream.
- the file generation device 11 converts the video stream of each tile into a file for each time unit of about several seconds to about 10 seconds called a segment.
- the file generation device 11 uploads the image file of each tile obtained as a result to the Web server 12.
- the file generation device 11 acquires the audio data of the moving image content for each object (details will be described later) and encodes the data in units of objects to generate an audio stream.
- the file generation device 11 converts the audio stream of each object into a file in segments, and uploads the audio file of each object obtained as a result to the Web server 12.
- an object is a sound source, and audio data of each object is acquired by a microphone or the like attached to the object.
- the object may be an object such as a fixed microphone stand or a moving object such as a person.
- the file generation device 11 encodes audio metadata including object position information (audio position information) indicating the position of each object (acquisition position of audio data), an object ID that is an ID unique to the object, and the like.
- the file generation device 11 converts the encoded audio metadata into a file in segments, and uploads the resulting audio metafile to the Web server 12.
- the file generation device 11 generates an MPD (Media Presentation Description) file (control information) that manages the image files and audio files, including image frame size information indicating the image frame size of the moving image content image, tile position information indicating the position of each tile on the image, and the like.
- the file generation device 11 uploads the MPD file to the Web server 12.
- the Web server 12 stores the image file, audio file, audio metafile, and MPD file uploaded from the file generation device 11.
- the Web server 12 stores a segment group composed of image files of a plurality of segments of the tile with the tile ID “1” and a segment group composed of image files of a plurality of segments of the tile with the tile ID “2”. The Web server 12 also stores a segment group composed of audio files of a plurality of segments of the object with the object ID “1” and a segment group composed of audio files of a plurality of segments of the object with the object ID “2”. Although not illustrated, segment groups of audio metafiles are stored in the same manner.
- hereinafter, the tile with the tile ID i is referred to as tile #i,
- and the object with the object ID i is referred to as object #i.
- the Web server 12 functions as a transmission unit, and transmits stored image files, audio files, audio metafiles, MPD files, and the like to the video playback terminal 14 in response to a request from the video playback terminal 14.
- the video playback terminal 14 is configured to include a streaming control unit 21, a playback unit 22, and an HTTP access unit 23.
- the playback unit 22 includes an audio playback unit 31 and a video playback unit 32.
- the streaming control unit 21 is software for controlling data to be streamed from the Web server 12 and is executed by the video playback terminal 14 and functions on the video playback terminal 14.
- the streaming control unit 21 causes the video playback terminal 14 to acquire the MPD file from the Web server 12.
- the streaming control unit 21 identifies the tiles in the display area based on the display area, which is the area to be displayed in the image of the moving image content instructed by the moving image playback unit 32, and the tile position information included in the MPD file. Then, the streaming control unit 21 instructs the HTTP access unit 23 to send a transmission request for the image files of those tiles.
- the streaming control unit 21 instructs the HTTP access unit 23 to send a transmission request for the audio metafile. Then, the streaming control unit 21 specifies the objects corresponding to the image in the display area based on the display area, the image frame size information included in the MPD file, and the object position information included in the audio metafile. Then, the streaming control unit 21 instructs the HTTP access unit 23 to request transmission of the audio files of those objects.
- the audio playback unit 31 is software that plays back an audio file acquired from the Web server 12, is executed by the video playback terminal 14, and functions on the video playback terminal 14.
- the video playback unit 32 is software that plays back an image file acquired from the Web server 12, and is executed by the video playback terminal 14 and functions on the video playback terminal 14.
- the moving image reproduction unit 32 instructs the streaming control unit 21 on the display area.
- the moving image reproduction unit 32 decodes and outputs the image file acquired from the Web server 12 in response to the instruction.
- the audio playback unit 31 decodes and outputs the audio file acquired from the Web server 12 in response to the instruction.
- the HTTP access unit 23 is software that controls communication with the Web server 12 via the Internet 13 using HTTP, and is executed by the video playback terminal 14 and functions on the video playback terminal 14.
- the HTTP access unit 23 causes the video playback terminal 14 to transmit a transmission request for an image file, an audio file, and an audio metafile in response to an instruction from the streaming control unit 21.
- the HTTP access unit 23 causes the video playback terminal 14 to receive the image file, the audio file, and the audio metafile transmitted from the Web server 12 in response to the transmission request.
- FIG. 2 is a diagram illustrating an example of a tile.
- the image of the moving image content is divided into a plurality of tiles, and a tile ID is assigned to each tile in order from 1.
- the moving image content image is divided into four tiles # 1 to # 4.
- FIG. 3 is a diagram for explaining an object.
- the sound of eight objects in the image is acquired as the sound of the moving image content, and object IDs are assigned to the objects in order from 1.
- objects #1 to #5 are moving objects, and objects #6 to #8 are fixed objects.
- the image of the moving image content is divided into 5 (vertical) × 7 (horizontal) tiles.
- the display area 51 is composed of 2 (vertical) × 3 (horizontal) tiles.
- in this case, the display area 51 includes only object #1, object #2, and object #6. Therefore, for example, the moving image playback terminal 14 acquires and plays back only the audio files of object #1, object #2, and object #6 from the Web server 12.
- the object in the display area 51 can be specified based on the image frame size information and the object position information as described below.
- FIG. 4 is a diagram for explaining the object position information.
- the object position information includes the horizontal angle θA (−180° ≤ θA ≤ 180°), the vertical angle γA (−90° ≤ γA ≤ 90°), and the distance rA (0 < rA) of the object.
- the horizontal angle θA is the horizontal angle between the YZ plane and the straight line connecting the object 60 and the origin O, where the shooting position at the center of the image is the origin (base point) O, the horizontal direction of the image is the X direction, the vertical direction is the Y direction, and the depth direction perpendicular to the XY plane is the Z direction.
- the vertical angle γA is the vertical angle between the XZ plane and the straight line connecting the object 60 and the origin O, and the distance rA is the distance between the object 60 and the origin O.
- angles of leftward rotation and upward rotation are positive angles,
- and angles of rightward rotation and downward rotation are negative angles.
- FIG. 5 is a diagram for explaining image frame size information.
- the image frame size information is composed of the horizontal angle θv1 at the left end of the image frame, the horizontal angle θv2 at the right end, the vertical angle γv1 at the upper end of the image frame, the vertical angle γv2 at the lower end, and the distance rv.
- the horizontal angle θv1 is the horizontal angle between the YZ plane and the straight line connecting the left end of the image frame and the origin O, where the shooting position at the center of the image is the origin O, the horizontal direction of the image is the X direction, the vertical direction is the Y direction, and the depth direction perpendicular to the XY plane is the Z direction.
- the horizontal angle θv2 is the horizontal angle between the YZ plane and the straight line connecting the right end of the image frame and the origin O. Accordingly, the angle obtained by combining the horizontal angles θv1 and θv2 is the horizontal angle of view.
- the vertical angles γv1 and γv2 are the angles between the XZ plane and the straight lines connecting the upper and lower ends of the image frame with the origin O, respectively, and the angle obtained by combining the vertical angles γv1 and γv2 is the vertical angle of view.
- the distance rv is the distance between the origin O and the plane of the image.
- the object position information and the image frame size information each represent a positional relationship with the origin O, of the object 60 and of the image frame, respectively. Therefore, the position of each object on the image can be detected (recognized) based on the object position information and the image frame size information, and as a result, the objects in the display area 51 can be specified.
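The check described above can be sketched as a simple angular comparison; the function below is an illustration only (the function name and the inclusive comparisons are assumptions, not taken from the patent):

```python
# Illustrative sketch: deciding whether an object lies inside the image
# frame from the angles defined above. Sign convention follows the text:
# left/up rotations are positive, right/down rotations are negative.
def object_in_frame(theta_a: float, gamma_a: float,
                    theta_v1: float, theta_v2: float,
                    gamma_v1: float, gamma_v2: float) -> bool:
    """theta_v1/gamma_v1 are the (positive) left/top frame angles,
    theta_v2/gamma_v2 the (negative) right/bottom frame angles."""
    return theta_v2 <= theta_a <= theta_v1 and gamma_v2 <= gamma_a <= gamma_v1

# An object at (10°, 5°) lies inside a frame spanning ±30° by ±20°.
inside = object_in_frame(10, 5, 30, -30, 20, -20)
```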
- in MPEG-DASH, information for acquiring a segment of an arbitrary time is described in an MPD (Media Presentation Description) file.
- MPD Media Presentation Description
- the access information of the subsegment (subsegment) in the segment is described in sidx (Segment index box) at the head of the segment file.
- sidx Segment index box
- the dependency hierarchy (level) information of I/P/B pictures and its size information are described in ssix (Subsegment index box), which follows the first sidx of the segment file.
- sidx and ssix are general-purpose access information that does not require interpreting the structure of MP4 (moof), and can also be used as access information for streams such as MPEG-2 TS.
- FIG. 6 is a diagram illustrating a configuration example of an MP4 file conforming to MPEG-DASH including sidx and ssix.
- the MP4 file conforming to MPEG-DASH is composed of an Initial segment file storing encoding initialization information and a plurality of media segment files storing samples.
- the Initial segment file is composed of ftyp and moov, which includes stbl (sample table box) and mvex.
- in the stbl of the Initial segment file, the type of the byte range pointed to by ssix can be defined in sgpd (sample group description box).
- sgpd sample group description box
- in mvex, the type defined in sgpd can be mapped to a level in leva (level assignment box). The entries of sgpd and leva are linked in index order, and by using them, the level, which is one of the pieces of information stored in ssix, can be defined.
- a media segment (hereinafter also simply referred to as a segment) file includes multiple moofs and mdats that store pictures, and styp, sidx, and ssix are placed before the first moof, that is, at the head of the segment file.
- each moof and mdat pair included in the segment file is called a sub-segment.
- Sidx and ssix store access information to the sub-segments that make up the segment file obtained from MPD information (time, URL).
- sidx stores a table of sub-segment (moof + mdat) sizes (referenced_size). Therefore, it is possible to acquire only the sub-segment of an arbitrary time from the sidx information, that is, random access is possible.
- ssix stores byte ranges using the level values mapped in leva. Therefore, the byte range of an arbitrary level within a sub-segment can be accessed from the ssix information.
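As a rough sketch of the sidx-based random access described above (only the summation logic follows the description; the size values are illustrative data, not from the patent):

```python
# sidx stores referenced_size per sub-segment (moof + mdat), so the
# byte offset of sub-segment `index` is the sum of the sizes of the
# sub-segments before it; a Range request can then fetch only that
# sub-segment. Function name is hypothetical.
def subsegment_byte_offset(referenced_sizes, index, first_offset=0):
    """Return the byte offset of the sub-segment at `index`,
    relative to first_offset (the start of the first sub-segment)."""
    return first_offset + sum(referenced_sizes[:index])

sizes = [15000, 14200, 16100]              # referenced_size table (illustrative)
offset = subsegment_byte_offset(sizes, 2)  # start of the third sub-segment
```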
- the definition of the subsample is determined for each codec. For example, when one picture is composed of a plurality of tiles in HEVC, the tiles are managed as subsamples constituting the samples in the MP4 file.
- FIG. 7 shows an example of the subsample information box.
- the subsample information box holds only the sizes of the subsamples inside a sample. What each subsample represents is indicated by the flags field on the second line from the top; by setting this definition flag, the type of the subsample can be identified.
- for example, when the definition flag is set to 0 in HEVC, the subsample is a subsample at a NAL unit boundary, as shown in FIG. 8.
- when the definition flag is set to 2 in HEVC, the subsample is a tile subsample.
- FIG. 8 is a diagram showing an example of the definition of the HEVC subsample.
- a sub-sample is defined on the basis of the value of the flags field of the sub-sample information box as specified below.
- the presence of this box is optional; however, if present in a track containing HEVC data, it shall have the semantics defined here.
- flags specifies the type of sub-sample information given in this box as follows:
- 0: NAL-unit-based sub-samples. A sub-sample contains one or more contiguous NAL units.
- 1: Decoding-unit-based sub-samples. A sub-sample contains exactly one decoding unit.
- 2: Tile-based sub-samples. A sub-sample either contains one tile and the associated non-VCL NAL units, if any, of the VCL NAL unit(s) containing the tile, or contains one or more non-VCL NAL units.
- 3: CTU-row-based sub-samples. A sub-sample either contains one CTU row within a slice and the associated non-VCL NAL units, if any, of the VCL NAL unit(s) containing the CTU row, or contains one or more non-VCL NAL units. This type of sub-sample information shall not be used when entropy_coding_sync_enabled_flag is equal to 0.
- 4: Slice-based sub-samples. A sub-sample either contains one slice (where each slice may contain one or more slice segments, each of which is a NAL unit) and the associated non-VCL NAL units, if any, or contains one or more non-VCL NAL units.
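As an illustration of the flag values listed above, a minimal sketch of interpreting the flags field of an HEVC subsample information box follows; the function and table names are hypothetical, not from the patent or any library:

```python
# Illustrative table mirroring the HEVC subsample definition flags above.
HEVC_SUBSAMPLE_FLAGS = {
    0: "NAL-unit-based",
    1: "decoding-unit-based",
    2: "tile-based",
    3: "CTU-row-based",
    4: "slice-based",
}

def describe_subs_flags(version_and_flags: int) -> str:
    """The 'subs' box header packs version (upper 8 bits) and flags
    (lower 24 bits) into one 32-bit field; the flags select the
    subsample type."""
    return HEVC_SUBSAMPLE_FLAGS.get(version_and_flags & 0xFFFFFF, "reserved")
```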
- in this way, HEVC has a subsample definition flag for individually defining subsamples.
- a definition flag for subsamples for individually defining subsamples is also provided in 3D audio.
- FIG. 9 is a diagram showing an example of the definition of a 3D audio subsample.
- a sub-sample is defined on the basis of the value of the flags field of the sub-sample information box as specified below.
- the presence of this box is optional; however, if present in a track containing 3D audio data, it shall have the semantics defined here.
- flags specifies the type of sub-sample information given in this box as follows:
- 0: channel audio decoding frame sub-sample
- 1: HOA audio decoding frame sub-sample
- 2: Object-based sub-samples.
- Definition flag 2 indicates that it is a subsample of object audio.
- as described above, sidx and ssix are general-purpose access information without the need to interpret the structure of MP4 (moof), and can also be used as access information for streams such as MPEG-2 TS.
- however, since the subsample information box is stored in every moof, more processing was required to acquire the actual data of a subsample.
- 3D audio is a standard that can encode the sounds of a plurality of objects, which are the sounds in an image, as independent streams. Therefore, also in 3D audio, it is assumed that there is a demand for accessing only one object, like the HEVC tile described above.
- in the following, examples of subsample access information are described in which tiles or 3D audio objects are used as subsamples.
- however, the present technology is not limited to tiles or 3D audio as subsamples; it is a general-purpose mechanism with an extended definition that can describe any element constituting a sample.
- in this specification, information for accessing subsamples, such as sgpd, leva, and ssix, is collectively referred to as subsample access information.
- FIG. 10 is a diagram illustrating an example of the definition of the sub-sample sample group of the present technology. That is, in the example of FIG. 10, a sample group of subsamples is defined.
- Codec_parameter is identification information indicating codec information
- “Flags” is a definition flag that defines a subsample for each codec (the subsample definition flag described above). The definition flag can also be said to be a flag for identifying a subsample.
- Next, mapping 3D audio sub-samples to levels will be described with reference to FIG. 11. That is, the example of FIG. 11 shows the definition of FIG. 10 mapped to levels using sgpd (sample group description box) and leva (level assignment box).
- sgpd sample (group description box)
- leva level assignment box
- sgpd sample group description box
- mha1 is identification information indicating 3d audio.
- entry_count 3
- sgpd includes three entries. The three entries are defined as mha1/0, mha1/2, and mha1/3. The definition flag 0 of mha1 indicates the channel audio of 3d audio. The definition flag 2 of mha1 indicates the object audio of 3d audio. The definition flag 3 of mha1 indicates 3d audio metadata.
- sgpd sample group description box
- leva level assignment box
- Level1 is channel audio
- Level2 is object audio
- Level3 is metadata
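The sgpd/leva mapping of FIG. 11 can be sketched as plain data structures. This is a minimal illustration with hypothetical class and variable names, not the on-disk box syntax: each sgpd 'sgss' entry pairs a codec identifier with a definition flag, and each leva level is assigned one entry index.

```python
from dataclasses import dataclass

@dataclass
class SgssEntry:            # one entry of sgpd with grouping_type 'sgss'
    codec_parameter: str    # e.g. 'mha1' for 3D audio
    flags: int              # codec-specific sub-sample definition flag

# FIG. 11: three entries, one per 3D audio sub-sample type
sgpd_sgss = [SgssEntry("mha1", 0),   # channel audio
             SgssEntry("mha1", 2),   # object audio
             SgssEntry("mha1", 3)]   # 3d audio metadata

# leva: level -> 1-based index into the sgpd 'sgss' entries
leva = {1: 1, 2: 2, 3: 3}

def flag_for_level(level: int) -> int:
    """Resolve a level to the sub-sample definition flag it maps to."""
    return sgpd_sgss[leva[level] - 1].flags
```
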
- mapping HEVC tile sub-samples to levels will be described with reference to FIG. That is, in the example of FIG. 12, an example in which the definition of FIG. 10 is mapped to a level using sgpd (sample group description box) and leva (level assignment box) is shown.
- sgpd sample group description box
- leva level assignment box
- sgpd sample group description box
- grouping_type 'sgss'
- hvc1 is identification information indicating HEVC.
- entry_count 4
- sgpd includes four entries. The four entries are all defined as hvc1/2, and the definition flag 2 of hvc1 indicates an HEVC tile.
- sgpd sample group description box
- leva level assignment box
- Level1 is HEVC Tile1
- Level2 is Tile2
- Level3 is Tile3
- Level4 is Tile4.
- As described above, a new general-purpose sample group is defined by using the definition flags for sub-samples that are individually defined for each codec. Therefore, the sub-samples of all codecs can be mapped to levels using the existing sgpd and ssix. This makes it possible to efficiently access an arbitrary sub-sample in a sample.
- grouping_type 'aoif'
- syntax of the audio object sample group is composed of objectTheta, objectGamma, objectLength, maxObjectTheta1, maxObjectTheta2, objectGamma1, objectGamma2, objectLength1, objectLength2
- objectTheta is a horizontal angle indicating the position of the object.
- objectGamma is a vertical angle indicating the position of the object.
- objectLength is a distance indicating the position of the object.
- maxObjectTheta1 is the leftmost horizontal angle indicating the position of the object.
- maxObjectTheta2 is the rightmost horizontal angle indicating the position of the object.
- objectGamma1 is the lowest vertical angle indicating the position of the object.
- objectGamma2 is the vertical angle at the top indicating the position of the object.
- objectLength1 is the nearest distance indicating the position of the object. objectLength2 is the farthest distance indicating the position of the object.
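The 'aoif' entry fields listed above can be sketched as a record. The class name, the is_null helper, and the convention that an all-zero entry means "no position information" are illustrative assumptions based on the null entries used elsewhere in this description:

```python
from dataclasses import dataclass

@dataclass
class AudioObjectSampleGroupEntry:
    # grouping_type 'aoif'; field names follow the syntax listed above
    objectTheta: float      # horizontal angle of the object position
    objectGamma: float      # vertical angle
    objectLength: float     # distance
    maxObjectTheta1: float  # leftmost horizontal angle
    maxObjectTheta2: float  # rightmost horizontal angle
    objectGamma1: float     # lowest vertical angle
    objectGamma2: float     # highest vertical angle
    objectLength1: float    # nearest distance
    objectLength2: float    # farthest distance

    def is_null(self) -> bool:
        # an all-zero entry is taken to mean "no position information"
        return not any(vars(self).values())

null_entry = AudioObjectSampleGroupEntry(0, 0, 0, 0, 0, 0, 0, 0, 0)
```
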
- grouping_type 'sgss'.
- num_of_sample_group number of sample groups
- The index of the sample group entry that is to be used, that is, referred to, is stored.
- sgpd sample group description box
- grouping_type is 'aoif'.
- mha1 is identification information indicating 3d audio.
- entry_count 2
- sgpd includes two entries. The entry of index 1 is the position information 0, 0, 0, 0, 0, 0, 0, 0, 0 (meaning null), and the entry of index 2 is the position information θx, γx, lx, θ1x, θ2x, γ1x, γ2x, l1x, l2x.
- The definition flag 0 of mha1 means the channel audio of 3d audio, and refers to index 1 (the null position information) of the audio object sample group in FIG. 15.
- The definition flag 2 of mha1 means the object audio of 3d audio, and refers to index 2 (the position information θx, γx, lx, θ1x, θ2x, γ1x, γ2x, l1x, l2x) of the audio object sample group in FIG. 15.
- The definition flag 3 of mha1 means the metadata of 3d audio, and refers to index 1 (the null position information) of the audio object sample group in FIG. 15.
- sgpd sample group description box
- leva level assignment box
- The information that can be read from the entry of leva in FIG. 15 is that Level1 is channel audio, Level2 is object audio, and Level3 is metadata.
- The byte range can be stored in ssix using the level values mapped in leva.
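Given the level mapped in leva, the byte range of that level within a subsegment can be derived from the ssix ranges. A minimal sketch, simplifying the ssix layout to (level, range_size) pairs laid out in file order for one subsegment:

```python
def byte_range_for_level(ssix_ranges, target_level):
    """ssix_ranges: list of (level, range_size) pairs in file order
    for one subsegment.  Returns (offset, size) of the first
    contiguous run belonging to target_level."""
    offset = 0
    for level, size in ssix_ranges:
        if level == target_level:
            return offset, size
        offset += size
    raise KeyError(f"level {target_level} not present")

# e.g. FIG. 11 levels: 1=channel audio, 2=object audio, 3=metadata
# (the sizes are made-up example values)
ranges = [(1, 4000), (2, 1500), (3, 200)]
```
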
- a method 2 for grouping a plurality of sample groups will be described.
- a sample group that bundles a plurality of sub-sample groups is defined.
- the sub-sample group set stores a sample group entry that refers to this sample group set and a sample group index.
- sgpd sample group description box
- grouping_type is 'aoif'.
- entry_count 2
- sgpd includes two entries. The entry of index 1 is the position information 0, 0, 0, 0, 0, 0, 0, 0, 0 (null), and the entry of index 2 is the position information θx, γx, lx, θ1x, θ2x, γ1x, γ2x, l1x, l2x.
- The sample groups are collected into a sample group set as shown on the left side of FIG. 18, and levels are assigned as shown on the right side of FIG. 18.
- sgpd sample group description box
- grouping_type 'sgsg'
- entry_count 3
- sgpd includes three entries. Among the three entries, the entry of index 1 is 'sgss', 1 and 'aoif', 1.
- The entry of index 2 is 'sgss', 2 and 'aoif', 2.
- The entry of index 3 is 'sgss', 3 and 'aoif', 1.
- sgpd sample group description box
- leva level assignment box
- Level 1 is channel audio, and the object information is null.
- Level2 is object audio, and the object information is θx, γx, lx, θ1x, θ2x, γ1x, γ2x, l1x, l2x.
- Level3 is metadata, and the object information is null.
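Resolving a level under method 2 follows leva to an entry of the bundling sample group, which in turn references both the sub-sample group entry and the position-information entry. An illustrative sketch with hypothetical structures (the real entries are binary box payloads, simplified here to strings):

```python
# Each bundling entry is a list of (grouping_type, 1-based index) pairs
sgsg = {
    1: [("sgss", 1), ("aoif", 1)],   # channel audio, null position
    2: [("sgss", 2), ("aoif", 2)],   # object audio, position info
    3: [("sgss", 3), ("aoif", 1)],   # metadata, null position
}

# Simplified sgpd entry tables per grouping_type
sgpd = {
    "sgss": ["channel audio", "object audio", "metadata"],
    "aoif": ["null", "(tx, gx, lx, t1x, t2x, g1x, g2x, l1x, l2x)"],
}

def resolve_level(level):
    """Follow leva -> bundling entry -> referenced sample group entries."""
    return {gt: sgpd[gt][idx - 1] for gt, idx in sgsg[level]}
```
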
- byte range can be stored in ssix using the level value mapped in leva.
- a method 3 for grouping a plurality of sample groups will be described.
- A plurality of sample groups set to the same level are defined in the level assignment. That is, in method 3, in addition to the leva/sgss level assignment in FIG. 11, the leva/aoif level assignment shown in FIG. 19 is performed.
- sgpd sample group description box
- grouping_type 'aoif'
- entry_count 3
- sgpd includes three entries.
- The entry of index 1 is the position information 0, 0, 0, 0, 0, 0, 0, 0, 0 (null), and the entry of index 2 is the position information θx, γx, lx, θ1x, θ2x, γ1x, γ2x, l1x, l2x.
- The entry of index 3 is the position information 0, 0, 0, 0, 0, 0, 0, 0, 0 (null).
- sgpd sample group description box
- leva level assignment box
- Level 1 is channel audio, and the object information is null.
- Level2 is object audio, and the object information is θx, γx, lx, θ1x, θ2x, γ1x, γ2x, l1x, l2x.
- Level3 is metadata, and the object information is null.
- The level of the target object audio is analyzed from the analysis result read from the entry of leva in FIG. 19 and the analysis result read from the entry of leva in FIG. 11.
- FIG. 20 is a diagram comparing methods 1 to 3 for grouping a plurality of sample groups.
- The advantage of method 1 is that the cost of extension is low, because the newly defined sgss merely refers to separately defined sample groups.
- The disadvantage of method 1 is that the extension is not general-purpose, so each similar requirement must be handled individually.
- Method 2 defines a sample group that groups multiple sample groups, so it is highly versatile and can be implemented in any combination.
- the disadvantage of Method 2 is that it requires a new extension to group sample groups.
- The advantage of method 3 is that, since a level assignment box sets levels for each sample group, no additional definition is needed, and additions to the operation rules can be accommodated.
- The disadvantage of method 3 is that as many level assignment boxes as there are sample groups are required, making the data structure in the file redundant.
- a new general-purpose sample group is defined by using the definition flag for subsamples individually defined by the codec.
- The sub-samples of all codecs can thus be mapped to levels using the existing sgpd and ssix. This makes it possible to efficiently access an arbitrary sub-sample in a sample.
- FIG. 21 is a block diagram illustrating a configuration example of the file generation device 11 of FIG. 1.
- the file generation device 11 encodes content data, and generates a plurality of MP4 files having the same content and different bit rates, and the MPD file described above.
- the file generation device 11 is configured to include an encoding unit 151, a subsample information generation unit 152, an MP4 file multiplexer 153, and a file transmission unit 154.
- The encoding unit 151 encodes the content data using, for example, HEVC, generates a bit stream, and supplies the generated bit stream to the MP4 file multiplexer 153. The encoding unit 151 also supplies the position information of the object audio and the subsample information to the subsample information generation unit 152. At the time of encoding, the encoding unit 151 performs encoding by dividing a picture into a plurality of tiles. In the case of tiles, information about the tiles, such as tile position information, is also supplied to the subsample information generation unit 152.
- The subsample information generation unit 152 generates sample group information based on the position information of the audio object from the encoding unit 151. At this time, level information is also generated. Further, based on the generated sample group information, the subsample information generation unit 152 generates the ssix information of the subsample information included in the moof of the MPEG-DASH-based MP4 file. The subsample information generation unit 152 supplies the generated sample group information of the audio object position information, the level information, and the ssix information of the subsample information to the MP4 file multiplexer 153.
- The MP4 file multiplexer 153 generates an MPEG-DASH-compliant MP4 file by multiplexing the bit stream from the encoding unit 151 with the sample group information of the audio object position information, the level information, and the ssix information of the subsample information from the subsample information generation unit 152. That is, an MP4 file in which the subsample information and the ssix information are multiplexed is generated. Specifically, the subsample information is stored in a subsample information box in moof.
- the MP4 file generated by being multiplexed by the MP4 file multiplexer 153 is supplied to the file transmission unit 154.
- the file transmission unit 154 transmits the MP4 file to the Web server 12 and stores it in a storage unit (not shown).
- the file generation device 11 is also configured with an MPD file generation unit, and an MPD file is generated there.
- the generated MPD file is stored in a storage unit (not shown) of the Web server 12 by the file transmission unit 154.
- In step S101, the encoding unit 151 encodes the content data using, for example, HEVC and generates a bit stream.
- the encoding unit 151 supplies the generated bit stream to the MP4 file multiplexer 153.
- the encoding unit 151 supplies the position information and subsample information of the object audio to the subsample information generation unit 152.
- In step S102, the subsample information generation unit 152 acquires the position information and subsample information of the object audio.
- In step S103, the sub-sample information generation unit 152 generates the sample group information of the object audio position information based on the position information of the audio object. That is, in step S103, sub-sample access information such as aoif, leva, sgss, and sgsg is generated according to methods 1 to 3 described above with reference to FIGS. 14 to 19.
- In step S104, the subsample information generation unit 152 generates the ssix of the subsample information.
- the subsample information generation unit 152 supplies the generated sample group information of the position information of the audio object, the level information, and the ssix information of the subsample information to the MP4 file multiplexer 153.
- The MP4 file multiplexer 153 generates an MPEG-DASH-compliant MP4 file by multiplexing the HEVC bit stream from the encoding unit 151 with the sample group information of the audio object position information, the level information, and the ssix information of the subsample information from the subsample information generation unit 152. That is, an MP4 file in which the sample group information of the audio object position information, the level information, and the ssix information of the subsample information are multiplexed is generated. Specifically, the subsample information is stored in a subsample information box of moof.
- the MP4 file generated by being multiplexed by the MP4 file multiplexer 153 is supplied to the file transmission unit 154.
- the file transmission unit 154 transmits the MP4 file to the Web server 12 and stores it in a storage unit (not shown).
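The flow of steps S101 to S104 can be sketched as a pipeline. Every callable below is a hypothetical stand-in for the corresponding unit of the file generation device 11, not an actual API:

```python
def generate_mp4_file(content, encode, make_sample_groups,
                      make_ssix, multiplex):
    """Sketch of steps S101-S104 of the file generation processing."""
    # S101: encode the content (e.g. with HEVC) into a bitstream,
    # obtaining object position info and sub-sample info as side data
    bitstream, position_info, subsample_info = encode(content)
    # S102/S103: generate sample group info (aoif/leva/sgss/sgsg)
    # from the object position information
    groups = make_sample_groups(position_info)
    # S104: generate ssix from the sub-sample information
    ssix = make_ssix(subsample_info, groups)
    # multiplex everything into an MPEG-DASH compliant MP4 file
    return multiplex(bitstream, groups, ssix)
```

With stub callables, the data flow between the four steps can be traced end to end before any real encoder or muxer is plugged in.
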
- In step S121, the streaming control unit 21 analyzes the MPD file in the storage unit (not shown) of the Web server 12 and acquires the URL (access) information of the segment file to be acquired. That is, based on the analyzed MPD file, the streaming control unit 21 takes into account the screen size and the state of the transmission path, selects the optimum image size, tile, and encoding speed, and acquires the URL (access) information of the segment file to acquire. This access information is supplied to the HTTP access unit 23.
- In step S122, the HTTP access unit 23 uses the access information from the streaming control unit 21 to acquire the Initial segment of an MP4 file having the desired encoding rate.
- In step S123, the streaming control unit 21 analyzes the level corresponding to the object (a) whose position information is to be reproduced. This level analysis processing will be described later with reference to the flowcharts starting with FIG. 24.
- In step S124, the HTTP access unit 23 acquires sidx/ssix from the top of the segment file.
- In step S125, the streaming control unit 21 analyzes the index number range of the object (a) in the segment file from the sidx/ssix acquired by the HTTP access unit 23, based on the level analyzed in step S123.
- In step S126, the HTTP access unit 23 acquires only the object (a) from the Web server 12 by HTTP. That is, the HTTP access unit 23 acquires only the object (a) from the Web server 12 by HTTP based on the index number range of the object (a) in the segment file analyzed by the streaming control unit 21.
- In step S127, the audio reproduction unit 31 reproduces the audio data of the object (a) from the HTTP access unit 23 under the control of the streaming control unit 21. That is, the audio reproduction unit 31 decodes the audio data of the object (a) from the HTTP access unit 23 and outputs it to a speaker (not shown).
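The playback flow of steps S121 to S127 can likewise be sketched. Here fetch(), analyze_level(), parse_sidx_ssix(), and decode() are hypothetical stand-ins for the HTTP access unit 23, the streaming control unit 21, and the audio reproduction unit 31:

```python
def play_object_audio(mpd, fetch, analyze_level, parse_sidx_ssix,
                      decode, target_object):
    """Sketch of steps S121-S127 of the object audio playback."""
    # S121: choose the segment URL from the analyzed MPD
    url = mpd["segment_url"]
    # S122: acquire the Initial segment
    init = fetch(url, "init")
    # S123: analyze the level assigned to the target object
    level = analyze_level(init, target_object)
    # S124/S125: read sidx/ssix from the segment head and derive
    # the byte range of that level
    offset, size = parse_sidx_ssix(fetch(url, "head"), level)
    # S126: HTTP range request for only the target object's bytes
    data = fetch(url, (offset, size))
    # S127: decode and play
    return decode(data)
```
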
- Next, the level analysis processing in step S123 of FIG. 23 will be described with reference to the flowchart in FIG. 24.
- FIG. 24 shows the level analysis processing in the case of method 1 for grouping a plurality of sample groups, described above with reference to FIGS. 14 and 15.
- In step S151, the streaming control unit 21 analyzes the position information from aoif (the audio object sample group in FIG. 15).
- In step S152, the streaming control unit 21 refers to the index of the target aoif from leva (the level assignment box in FIG. 15) and sgss (the sub-sample sample group in FIG. 15).
- In step S153, the streaming control unit 21 analyzes the level of the object audio.
- In this way, the level is analyzed in the case of method 1 for grouping a plurality of sample groups.
- Next, the level analysis processing in the case of method 2 for grouping a plurality of sample groups, described above with reference to FIGS. 16 to 18, is shown.
- In step S171, the streaming control unit 21 analyzes the position information from aoif (the audio object sample group of A in FIG. 17).
- In step S172, the streaming control unit 21 analyzes the object audio information from sgss (the sub-sample sample group of B in FIG. 17).
- In step S173, the streaming control unit 21 analyzes the level of the target aoif from leva (the level assignment box in FIG. 18) and sgsg (the sample group set in FIG. 18).
- In this way, the level is analyzed in the case of method 2 for grouping a plurality of sample groups.
- Next, the level analysis processing in the case of method 3 for grouping a plurality of sample groups, described above with reference to FIG. 19, is shown.
- In step S191, the streaming control unit 21 analyzes the position information from leva (the level assignment box in FIG. 19) and aoif (the audio object sample group in FIG. 19).
- In step S192, the streaming control unit 21 analyzes the target level from leva (the level assignment box in FIG. 11) and sgss (the sub-sample sample group in FIG. 11).
- In step S193, the streaming control unit 21 analyzes the level information of the target object audio from the analysis results in steps S191 and S192.
- In this way, the level is analyzed in the case of method 3 for grouping a plurality of sample groups.
- a new general-purpose sample group is defined by using the definition flag for subsamples individually defined by the codec.
- The sub-samples of all codecs can thus be mapped to levels using the existing sgpd and ssix. This makes it possible to efficiently access an arbitrary sub-sample in a sample.
- This technology can be applied to information other than 3d audio and information other than tiles. This makes it possible to access data by type within a segment.
- An HEVC bitstream is a bitstream encoded by HEVC.
- The file format is not limited to the MP4 file format or the AVC file format. As long as the problems and effects of the present technology are the same, it can be similarly applied to other file formats, streams used for transmission, and streams used for storage in a file.
- the series of processes described above can be executed by hardware or can be executed by software.
- a program constituting the software is installed in the computer.
- The computer includes, for example, a computer incorporated in dedicated hardware, and a general-purpose personal computer capable of executing various functions by installing various programs.
- FIG. 27 is a block diagram showing an example of the hardware configuration of a computer that executes the above-described series of processing by a program.
- In the computer 800, a CPU (Central Processing Unit) 801, a ROM (Read Only Memory) 802, and a RAM (Random Access Memory) 803 are connected to one another via a bus 804.
- an input / output interface 805 is connected to the bus 804.
- An input unit 806, an output unit 807, a storage unit 808, a communication unit 809, and a drive 810 are connected to the input / output interface 805.
- the input unit 806 includes a keyboard, a mouse, a microphone, and the like.
- the output unit 807 includes a display, a speaker, and the like.
- the storage unit 808 includes a hard disk, a nonvolatile memory, and the like.
- the communication unit 809 includes a network interface or the like.
- the drive 810 drives a removable recording medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
- The CPU 801 loads the program stored in the storage unit 808 into the RAM 803 via the input/output interface 805 and the bus 804 and executes it, whereby the above-described series of processing is performed.
- the program executed by the computer 800 can be provided by being recorded in a removable recording medium 811 as a package medium or the like, for example.
- the program can be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
- the program can be installed in the storage unit 808 via the input / output interface 805 by attaching the removable recording medium 811 to the drive 810.
- the program can be received by the communication unit 809 via a wired or wireless transmission medium and installed in the storage unit 808.
- the program can be installed in the ROM 802 or the storage unit 808 in advance.
- The program executed by the computer may be a program that is processed in time series in the order described in this specification, or a program that is processed in parallel or at necessary timing, such as when a call is made.
- The steps describing the program recorded on the recording medium include not only processing performed in time series in the described order, but also processing executed in parallel or individually, not necessarily in time series.
- system represents the entire apparatus composed of a plurality of devices (apparatuses).
- the configuration described as one device (or processing unit) may be divided and configured as a plurality of devices (or processing units).
- the configurations described above as a plurality of devices (or processing units) may be combined into a single device (or processing unit).
- a configuration other than that described above may be added to the configuration of each device (or each processing unit).
- A part of the configuration of a certain device (or processing unit) may be included in the configuration of another device (or another processing unit). That is, the present technology is not limited to the above-described embodiment, and various modifications can be made without departing from the gist of the present technology.
- the present technology can take a configuration of cloud computing in which one function is shared and processed by a plurality of devices via a network.
- each step described in the above flowchart can be executed by one device or can be shared by a plurality of devices.
- the plurality of processes included in the one step can be executed by being shared by a plurality of apparatuses in addition to being executed by one apparatus.
- the method for transmitting such information is not limited to such an example.
- these pieces of information may be transmitted or recorded as separate data associated with the encoded bitstream without being multiplexed into the encoded bitstream.
- The term "associate" means that an image (which may be a part of an image, such as a slice or a block) included in the bitstream and information corresponding to that image can be linked at the time of decoding. That is, the information may be transmitted on a transmission path different from that of the image (or bit stream).
- Information may be recorded on a recording medium (or another recording area of the same recording medium) different from the image (or bit stream). Furthermore, the information and the image (or bit stream) may be associated with each other in an arbitrary unit such as a plurality of frames, one frame, or a part of the frame.
- this technique can also take the following structures.
- (1) A file generation device including: a subsample information generation unit that, for a file containing a bitstream generated by encoding content in which an image is divided into a plurality of subsamples, defines a sample group of the subsamples using a definition flag for defining the subsamples, and generates subsample access information for accessing the subsamples; and a file generation unit that multiplexes the subsample access information generated by the subsample information generation unit to generate the file.
- (2) The file generation device according to (1), wherein the subsample information generation unit generates the subsample access information by mapping the defined sample group of subsamples to a level.
- (3) The file generation device according to (1) or (2), wherein the subsample information generation unit generates the subsample access information by collecting a sample group expressing the position information of the subsamples and the sample group of the subsamples, and then mapping them to a level.
- (4) The file generation device according to any one of (1) to (3), wherein the sub-sample information generation unit collects a sample group expressing the position information of the sub-samples and the sample group of the sub-samples by defining a sample group that stores reference information to them, and then maps the sample group storing the reference information to a level to generate the sub-sample access information.
- (5) The file generation device according to any one of (1) to (3), wherein the subsample information generation unit collects a sample group expressing the position information of the subsamples and the sample group of the subsamples by defining a sample group set that bundles them, and then maps the sample group storing the reference information to a level to generate the subsample access information.
- (6) The file generation device according to any one of (1) to (3), wherein the subsample information generation unit collects a sample group expressing the position information of the subsamples and the sample group of the subsamples, and then maps each sample group to its own level to generate the subsample access information.
- (7) The file generation device according to any one of (1) to (6), wherein the subsample is a tile.
- (8) The file generation device according to any one of (1) to (7), wherein the subsample is 3D audio.
- (9) The file generation device according to any one of (1) to (8), further including an encoding unit that encodes the content in which the image is divided into the plurality of subsamples to generate the bitstream.
- (10) A file generation method in which a file generation device: in a file containing a bitstream generated by encoding content in which an image is divided into a plurality of subsamples, defines a sample group of the subsamples using a definition flag for defining the subsamples and generates subsample access information for accessing the subsamples; and multiplexes the generated subsample access information to generate the file.
- (11) A content reproduction device including: an information acquisition unit that acquires subsample access information from a file in which the subsample access information is multiplexed, the file containing a bitstream generated by encoding content in which an image is divided into a plurality of subsamples, where a sample group of the subsamples is defined using a definition flag for defining the subsamples and the subsample access information for accessing the subsamples is generated; a sample acquisition unit that acquires an arbitrary subsample using the subsample access information acquired by the information acquisition unit; and a reproduction unit that reproduces the arbitrary subsample acquired by the sample acquisition unit.
- (12) The content reproduction device according to (11), wherein the subsample access information is generated by mapping the defined sample group of subsamples to a level.
- (13) The content reproduction device according to (11) or (12), wherein the subsample access information is generated by collecting a sample group expressing the position information of the subsamples and the sample group of the subsamples, and then mapping them to a level.
- (14) The content reproduction device according to any one of (11) to (13), wherein the sub-sample access information is generated by defining a sample group that stores reference information to the sample group expressing the position information of the sub-samples and to the sample group of the sub-samples, and then mapping the sample group storing the reference information to a level.
- (15) The content reproduction device according to any one of (11) to (13), wherein the sub-sample access information is generated by defining a sample group set that bundles the sample group expressing the position information of the sub-samples and the sample group of the sub-samples, and then mapping the sample group storing the reference information to a level.
- (16) The content reproduction device according to any one of (11) to (13), wherein the sub-sample access information is generated by collecting a sample group expressing the position information of the sub-samples and the sample group of the sub-samples, and then mapping each sample group to its own level.
- (17) The content reproduction device according to any one of (11) to (16), wherein the subsample is a tile.
- (18) The content reproduction device according to any one of (11) to (16), wherein the subsample is 3D audio.
- (19) The content reproduction device according to any one of (11) to (18), wherein the content is stored in a server connected via a network.
- (20) A content reproduction method in which a content reproduction device: acquires subsample access information from a multiplexed file containing a bitstream generated by encoding content in which an image is divided into a plurality of subsamples, where a sample group of the subsamples is defined using a definition flag for defining the subsamples and the subsample access information for accessing the subsamples is generated; acquires an arbitrary subsample using the acquired subsample access information; and reproduces the acquired arbitrary subsample.
Abstract
Description
1. First embodiment (information processing system)
2. Second embodiment (computer)
(Configuration of the information processing system)
FIG. 1 is a diagram illustrating a configuration example of an information processing system to which the present disclosure is applied.
FIG. 2 is a diagram showing an example of tiles.
FIG. 3 is a diagram explaining objects.
FIG. 4 is a diagram explaining object position information.
FIG. 5 is a diagram explaining image frame size information.
In MPEG-DASH, information for acquiring a segment at an arbitrary time is described in an MPD (Media Presentation Description) file. Also, in order to acquire data at an arbitrary time within a segment file, access information for the subsegments within the segment is described in the sidx (Segment index box) at the head of the segment file. Furthermore, for purposes such as trick play, in order to acquire only arbitrary I/P pictures, information on the dependency hierarchy (levels) of the I, P, and B pictures and their size information is described in the ssix (Subsegment index box) following the sidx at the head of the segment file.
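As a concrete illustration of the segment-head index described above, a minimal parser for the 'sidx' box layout defined in ISO/IEC 14496-12 might look as follows (error handling and nonzero flags are omitted for brevity):

```python
import struct

def parse_sidx(buf: bytes):
    """Parse a Segment Index ('sidx') box (ISO/IEC 14496-12).
    buf must start at the box header.  Returns the timescale,
    first_offset, and the subsegment references, each with its
    referenced size, duration, and starts_with_SAP flag."""
    size, btype = struct.unpack_from(">I4s", buf, 0)
    assert btype == b"sidx", "not a sidx box"
    version = buf[8]                       # 1 byte version, 3 bytes flags
    ref_id, timescale = struct.unpack_from(">II", buf, 12)
    pos = 20
    if version == 0:                       # 32-bit time/offset fields
        ept, first_off = struct.unpack_from(">II", buf, pos); pos += 8
    else:                                  # 64-bit time/offset fields
        ept, first_off = struct.unpack_from(">QQ", buf, pos); pos += 16
    _, ref_count = struct.unpack_from(">HH", buf, pos); pos += 4
    refs = []
    for _ in range(ref_count):
        w1, dur, w2 = struct.unpack_from(">III", buf, pos); pos += 12
        refs.append({"reference_type": w1 >> 31,
                     "referenced_size": w1 & 0x7FFFFFFF,
                     "subsegment_duration": dur,
                     "starts_with_SAP": w2 >> 31})
    return {"timescale": timescale, "first_offset": first_off,
            "references": refs}
```
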
FIG. 6 is a diagram showing a configuration example of an MPEG-DASH-compliant MP4 file including sidx and ssix.
Here, in HEVC encoding, as described above with reference to FIG. 2, it is possible to use a tile structure that divides an image into a plurality of tiles and to decode only the region (tiles) that the application needs to decode.
flags specifies the type of sub-sample information given in this box as follows:
0: NAL-unit-based sub-samples. A sub-sample contains one or more contiguous NAL units.
1: Decoding-unit-based sub-samples. A sub-sample contains exactly one decoding unit.
2: Tile-based sub-samples. A sub-sample either contains one tile and the associated non-VCL NAL units, if any, of the VCL NAL unit(s) containing the tile, or contains one or more non-VCL NAL units.
3: CTU-row-based sub-samples. A sub-sample either contains one CTU row within a slice and the associated non-VCL NAL units, if any, of the VCL NAL unit(s) containing the CTU row or contains one or more non-VCL NAL units. This type of sub-sample information shall not be used when entropy_coding_sync_enabled_flag is equal to 0.
4: Slice-based sub-samples. A sub-sample either contains one slice (where each slice may contain one or more slice segments, each of which is a NAL unit) and the associated non-VCL NAL units, if any, or contains one or more non-VCL NAL units.
flags specifies the type of sub-sample information given in this box as follows:
0: channel audio decoding frame sub-sample
1: HOA audio decoding frame sub-sample
2: Object-based sub-samples.
3: 3d audio metadata sub-sample
The definition flag = 0 in 3D audio indicates a channel audio sub-sample. The definition flag = 1 indicates a sub-sample of audio recorded with a spherical microphone (HOA). The definition flag = 2 indicates an object audio sub-sample. The definition flag = 3 indicates a 3d audio metadata sub-sample.
Next, the file generation processing by the file generation device 11 will be described with reference to the flowchart of FIG. 22.
Next, the object audio reproduction processing of the video reproduction terminal 14 will be described with reference to the flowchart of FIG. 23.
(1) A file generation device including:
a sub-sample information generation unit configured to define, for a file containing a bit stream generated by encoding content whose image is divided into a plurality of sub-samples, a sample group of the sub-samples using a definition flag for defining the sub-samples, and to generate sub-sample access information for accessing the sub-samples; and
a file generation unit configured to multiplex the sub-sample access information generated by the sub-sample information generation unit to generate the file.
(2) The file generation device according to (1), in which the sub-sample information generation unit maps the defined sample group of the sub-samples to a level to generate the sub-sample access information.
(3) The file generation device according to (1) or (2), in which the sub-sample information generation unit combines a sample group expressing position information of the sub-samples with the sample group of the sub-samples, and then maps them to a level to generate the sub-sample access information.
(4) The file generation device according to any one of (1) to (3), in which the sub-sample information generation unit performs the combining by defining a sample group that stores reference information to the sample group expressing the position information of the sub-samples and to the sample group of the sub-samples, and then maps the sample group storing the reference information to a level to generate the sub-sample access information.
(5) The file generation device according to any one of (1) to (3), in which the sub-sample information generation unit performs the combining by defining a sample group set that bundles the sample group expressing the position information of the sub-samples and the sample group of the sub-samples, and then maps the sample group storing the reference information to a level to generate the sub-sample access information.
(6) The file generation device according to any one of (1) to (3), in which the sub-sample information generation unit combines the sample group expressing the position information of the sub-samples with the sample group of the sub-samples, and then maps each of the sample groups to a respective level to generate the sub-sample access information.
(7) The file generation device according to any one of (1) to (6), in which the sub-samples are tiles.
(8) The file generation device according to any one of (1) to (7), in which the sub-samples are 3D audio.
(9) The file generation device according to any one of (1) to (8), further including an encoding unit configured to encode the content whose image is divided into the plurality of sub-samples to generate the bit stream.
(10) A file generation method including, by a file generation device:
defining, for a file containing a bit stream generated by encoding content whose image is divided into a plurality of sub-samples, a sample group of the sub-samples using a definition flag for defining the sub-samples, and generating sub-sample access information for accessing the sub-samples; and
multiplexing the generated sub-sample access information to generate the file.
(11) A content playback device including:
an information acquisition unit configured to acquire sub-sample access information from a file in which, for a file containing a bit stream generated by encoding content whose image is divided into a plurality of sub-samples, a sample group of the sub-samples has been defined using a definition flag for defining the sub-samples, and the sub-sample access information for accessing the sub-samples has been generated and multiplexed;
a sample acquisition unit configured to acquire an arbitrary sub-sample using the sub-sample access information acquired by the information acquisition unit; and
a playback unit configured to play back the arbitrary sub-sample acquired by the sample acquisition unit.
(12) The content playback device according to (11), in which the sub-sample access information is generated by mapping the defined sample group of the sub-samples to a level.
(13) The content playback device according to (11) or (12), in which the sub-sample access information is generated by combining a sample group expressing position information of the sub-samples with the sample group of the sub-samples, and then mapping them to a level.
(14) The content playback device according to any one of (11) to (13), in which the sub-sample access information is generated by performing the combining through defining a sample group that stores reference information to the sample group expressing the position information of the sub-samples and to the sample group of the sub-samples, and then mapping the sample group storing the reference information to a level.
(15) The content playback device according to any one of (11) to (13), in which the sub-sample access information is generated by performing the combining through defining a sample group set that bundles the sample group expressing the position information of the sub-samples and the sample group of the sub-samples, and then mapping the sample group storing the reference information to a level.
(16) The content playback device according to any one of (11) to (13), in which the sub-sample access information is generated by combining the sample group expressing the position information of the sub-samples with the sample group of the sub-samples, and then mapping each of the sample groups to a respective level.
(17) The content playback device according to any one of (11) to (16), in which the sub-samples are tiles.
(18) The content playback device according to any one of (11) to (16), in which the sub-samples are 3D audio.
(19) The content playback device according to any one of (11) to (18), in which the content is stored in a server connected via a network.
(20) A content playback method including, by a content playback device:
acquiring sub-sample access information from a file in which, for a file containing a bit stream generated by encoding content whose image is divided into a plurality of sub-samples, a sample group of the sub-samples has been defined using a definition flag for defining the sub-samples, and the sub-sample access information for accessing the sub-samples has been generated and multiplexed;
acquiring an arbitrary sub-sample using the acquired sub-sample access information; and
playing back the acquired arbitrary sub-sample.
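The playback flow of (11) to (20) above (acquire the access information, use it to fetch an arbitrary sub-sample, then play it) can be sketched with a toy byte-range representation; the dictionary layout and all function names here are hypothetical, not from the patent:

```python
from typing import Dict, Tuple

def acquire_access_info() -> Dict[int, Tuple[int, int]]:
    # Information acquisition unit (toy version): map each sub-sample
    # index to an (offset, size) byte range within the multiplexed data.
    return {0: (0, 4), 1: (4, 3), 2: (7, 5)}

def acquire_subsample(data: bytes,
                      info: Dict[int, Tuple[int, int]],
                      index: int) -> bytes:
    # Sample acquisition unit: use the access information, rather than a
    # full parse, to pull only the wanted sub-sample out of the data.
    offset, size = info[index]
    return data[offset:offset + size]

# "Playback" is stubbed: an arbitrary sub-sample is simply extracted.
data = b"AAAABBBCCCCC"
info = acquire_access_info()
subsample = acquire_subsample(data, info, 1)
```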
Claims (20)
- A file generation device comprising:
a sub-sample information generation unit configured to define, for a file containing a bit stream generated by encoding content whose image is divided into a plurality of sub-samples, a sample group of the sub-samples using a definition flag for defining the sub-samples, and to generate sub-sample access information for accessing the sub-samples; and
a file generation unit configured to multiplex the sub-sample access information generated by the sub-sample information generation unit to generate the file.
- The file generation device according to claim 1, wherein the sub-sample information generation unit maps the defined sample group of the sub-samples to a level to generate the sub-sample information.
- The file generation device according to claim 2, wherein the sub-sample information generation unit combines a sample group expressing position information of the sub-samples with the sample group of the sub-samples, and then maps them to a level to generate the sub-sample information.
- The file generation device according to claim 3, wherein the sub-sample information generation unit performs the combining by defining a sample group that stores reference information to the sample group expressing the position information of the sub-samples and to the sample group of the sub-samples, and then maps the sample group storing the reference information to a level to generate the sub-sample access information.
- The file generation device according to claim 3, wherein the sub-sample information generation unit performs the combining by defining a sample group set that bundles the sample group expressing the position information of the sub-samples and the sample group of the sub-samples, and then maps the sample group storing the reference information to a level to generate the sub-sample access information.
- The file generation device according to claim 3, wherein the sub-sample information generation unit combines the sample group expressing the position information of the sub-samples with the sample group of the sub-samples, and then maps each of the sample groups to a respective level to generate the sub-sample access information.
- The file generation device according to claim 1, wherein the sub-samples are tiles.
- The file generation device according to claim 1, wherein the sub-samples are 3D audio.
- The file generation device according to claim 1, further comprising an encoding unit configured to encode the content whose image is divided into the plurality of sub-samples to generate the bit stream.
- A file generation method comprising, by a file generation device:
defining, for a file containing a bit stream generated by encoding content whose image is divided into a plurality of sub-samples, a sample group of the sub-samples using a definition flag for defining the sub-samples, and generating sub-sample access information for accessing the sub-samples; and
multiplexing the generated sub-sample access information to generate the file.
- A content playback device comprising:
an information acquisition unit configured to acquire sub-sample access information from a file in which, for a file containing a bit stream generated by encoding content whose image is divided into a plurality of sub-samples, a sample group of the sub-samples has been defined using a definition flag for defining the sub-samples, and the sub-sample access information for accessing the sub-samples has been generated and multiplexed;
a sample acquisition unit configured to acquire an arbitrary sub-sample using the sub-sample access information acquired by the information acquisition unit; and
a playback unit configured to play back the arbitrary sub-sample acquired by the sample acquisition unit.
- The content playback device according to claim 11, wherein the sub-sample access information is generated by mapping the defined sample group of the sub-samples to a level.
- The content playback device according to claim 12, wherein the sub-sample access information is generated by combining a sample group expressing position information of the sub-samples with the sample group of the sub-samples, and then mapping them to a level.
- The content playback device according to claim 13, wherein the sub-sample access information is generated by defining a sample group that stores reference information to the sample group expressing the position information of the sub-samples and to the sample group of the sub-samples, and then mapping the sample group storing the reference information to a level.
- The content playback device according to claim 13, wherein the sub-sample access information is generated by defining a sample group set that bundles the sample group expressing the position information of the sub-samples and the sample group of the sub-samples, and then mapping the sample group storing the reference information to a level.
- The content playback device according to claim 13, wherein the sub-sample access information is generated by combining the sample group expressing the position information of the sub-samples with the sample group of the sub-samples, and then mapping each of the sample groups to a respective level.
- The content playback device according to claim 11, wherein the sub-samples are tiles.
- The content playback device according to claim 11, wherein the sub-samples are 3D audio.
- The content playback device according to claim 11, wherein the content is stored in a server connected via a network.
- A content playback method comprising, by a content playback device:
acquiring sub-sample access information from a file in which, for a file containing a bit stream generated by encoding content whose image is divided into a plurality of sub-samples, a sample group of the sub-samples has been defined using a definition flag for defining the sub-samples, and the sub-sample access information for accessing the sub-samples has been generated and multiplexed;
acquiring an arbitrary sub-sample using the acquired sub-sample access information; and
playing back the acquired arbitrary sub-sample.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/317,970 US10271076B2 (en) | 2014-06-30 | 2015-06-16 | File generation device and method, and content playback device and method |
JP2016531238A JP6493403B2 (ja) | 2014-06-30 | 2015-06-16 | File generation device and method, and content playback device and method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014-134559 | 2014-06-30 | ||
JP2014134559 | 2014-06-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016002493A1 (ja) | 2016-01-07 |
Family
ID=55019040
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/067231 WO2016002493A1 (ja) | 2015-06-16 | File generation device and method, and content playback device and method |
Country Status (3)
Country | Link |
---|---|
US (1) | US10271076B2 (ja) |
JP (1) | JP6493403B2 (ja) |
WO (1) | WO2016002493A1 (ja) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014002619A1 (ja) * | 2012-06-25 | 2014-01-03 | Sony Corporation | Image decoding device, image decoding method, image encoding device, and image encoding method |
JP2014011638A (ja) * | 2012-06-29 | 2014-01-20 | Canon Inc | Image encoding device, image encoding method and program, and image decoding device, image decoding method and program |
JP2014030187A (ja) * | 2012-07-02 | 2014-02-13 | Canon Inc | Media file generation method and media file generation program |
JP2014057227A (ja) * | 2012-09-13 | 2014-03-27 | Sony Corp | Content supply device, content supply method, program, and content supply system |
WO2014057896A1 (ja) * | 2012-10-09 | 2014-04-17 | Sharp Corporation | Content transmission device, content playback device, content distribution system, control method for content transmission device, control method for content playback device, control program, and recording medium |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4389365B2 (ja) * | 1999-09-29 | 2009-12-24 | Sony Corporation | Transport stream recording device and method, transport stream playback device and method, and program recording medium |
US6549674B1 (en) * | 2000-10-12 | 2003-04-15 | Picsurf, Inc. | Image compression based on tiled wavelet-like transform using edge and non-edge filters |
FI120125B (fi) * | 2000-08-21 | 2009-06-30 | Nokia Corp | Kuvankoodaus |
US7200272B2 (en) * | 2002-01-31 | 2007-04-03 | Canon Kabushiki Kaisha | Image processing method storing input encoded data into a memory |
JP2004048299A (ja) | 2002-07-10 | 2004-02-12 | Matsushita Electric Ind Co Ltd | Data structure, multiplexing method, and demultiplexing method |
JP2004140668A (ja) * | 2002-10-18 | 2004-05-13 | Canon Inc | Information processing method |
US8813142B2 (en) * | 2003-01-31 | 2014-08-19 | Qwest Communications International Inc. | Methods, systems and apparatus for providing video transmissions over multiple media |
JP2006042913A (ja) * | 2004-07-30 | 2006-02-16 | Olympus Corp | Image observation device |
US20060233247A1 (en) * | 2005-04-13 | 2006-10-19 | Visharam Mohammed Z | Storing SVC streams in the AVC file format |
JP4716949B2 (ja) * | 2005-09-02 | 2011-07-06 | Ricoh Co Ltd | Image processing device and image processing method |
JP5326234B2 (ja) * | 2007-07-13 | 2013-10-30 | Sony Corporation | Image transmission device, image transmission method, and image transmission system |
KR100942142B1 (ko) * | 2007-10-11 | 2010-02-16 | 한국전자통신연구원 | 객체기반 오디오 콘텐츠 송수신 방법 및 그 장치 |
US8614960B2 (en) * | 2008-05-14 | 2013-12-24 | Samsung Electronics Co., Ltd. | Method and apparatus for transmitting and receiving data by using time slicing |
JP2013024752A (ja) * | 2011-07-21 | 2013-02-04 | Ricoh Co Ltd | Optical sensor and image forming apparatus |
JP5143295B1 (ja) * | 2012-01-27 | 2013-02-13 | Toshiba Corp | Electronic device and index generation method |
WO2013122385A1 (en) * | 2012-02-15 | 2013-08-22 | Samsung Electronics Co., Ltd. | Data transmitting apparatus, data receiving apparatus, data transreceiving system, data transmitting method, data receiving method and data transreceiving method |
WO2014111547A1 (en) * | 2013-01-18 | 2014-07-24 | Canon Kabushiki Kaisha | Method, device, and computer program for encapsulating partitioned timed media data |
2015
- 2015-06-16: JP JP2016531238A (patent JP6493403B2, ja): not_active, Expired - Fee Related
- 2015-06-16: WO PCT/JP2015/067231 (WO2016002493A1, ja): active, Application Filing
- 2015-06-16: US US15/317,970 (patent US10271076B2, en): not_active, Expired - Fee Related
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020184357A1 (ja) * | 2019-03-08 | 2020-09-17 | Sony Corporation | Information processing device, information processing method, and information processing program |
US11936962B2 (en) | 2019-03-08 | 2024-03-19 | Sony Group Corporation | Information processing device and information processing method |
Also Published As
Publication number | Publication date |
---|---|
US20170134768A1 (en) | 2017-05-11 |
JPWO2016002493A1 (ja) | 2017-04-27 |
US10271076B2 (en) | 2019-04-23 |
JP6493403B2 (ja) | 2019-04-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2947549C (en) | Information processing apparatus and information processing method | |
JP6697695B2 (ja) | Information processing device and information processing method | |
US20180165358A1 (en) | Information processing apparatus and information processing method | |
US9865304B2 (en) | File generation device and method, and content playback device and method | |
WO2015008538A1 (ja) | Information processing device and information processing method | |
US11252397B2 (en) | File generation apparatus and file generation method as well as reproduction apparatus and reproduction method | |
JP6508206B2 (ja) | Information processing device and method | |
US10945000B2 (en) | File generation apparatus and file generation method as well as reproduction apparatus and reproduction method | |
JP2022019932A (ja) | Information processing device and information processing method | |
JP6493403B2 (ja) | File generation device and method, and content playback device and method | |
JP6501127B2 (ja) | Information processing device and method | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15815355 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2016531238 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15317970 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 15815355 Country of ref document: EP Kind code of ref document: A1 |