WO2020261689A1 - Information processing device, information processing method, reproduction processing device, and reproduction processing method - Google Patents
- Publication number
- WO2020261689A1 (PCT/JP2020/014884)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- content
- file
- information
- selection information
- content configuration
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/8146—Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics
- H04N21/84—Generation or processing of descriptive data, e.g. content descriptors
- H04N21/23412—Processing of video elementary streams for generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
- H04N21/23439—Processing of video elementary streams involving reformatting operations of video signals for generating different versions
- H04N21/2362—Generation or processing of Service Information [SI]
- H04N21/4345—Extraction or processing of SI, e.g. extracting service information from an MPEG stream
- H04N21/4347—Demultiplexing of several video streams
- H04N21/44209—Monitoring of downstream path of the transmission network originating from a server, e.g. bandwidth variations of a wireless network
- H04N21/4621—Controlling the complexity of the content stream or additional data, e.g. lowering the resolution or bit-rate of the video stream for a mobile client with a small screen
- H04N21/816—Monomedia components thereof involving special video data, e.g. 3D video
- H04N21/8456—Structuring of content by decomposing the content in the time domain, e.g. in time segments
- H04N21/85406—Content authoring involving a specific file format, e.g. MP4 format
Definitions
- the present invention relates to an information processing device, an information processing method, a reproduction processing device, and a reproduction processing method.
- the two-dimensional image may be referred to as 2D (Dimension) content below.
- 360-degree video, which allows the viewer to look around in all directions, is also distributed on Web video distribution sites. Being able to look around in all directions means that the line-of-sight direction can be freely selected.
- the 360-degree video is called 3DoF (Degree of Freedom) video or 3DoF content.
- a two-dimensionally encoded video is distributed from the distribution server and displayed on the client.
- there is also content called 3DoF+ content. Like 3DoF content, 3DoF+ content allows the viewer to look around in all directions, and additionally allows the viewpoint position to be moved slightly. In 3DoF+ content, the range in which the viewpoint position can move is assumed to be the range over which a seated user can move his or her head. 3DoF+ content realizes this movement of the viewpoint position by using one or more two-dimensionally encoded images.
- 6DoF video is video in which the viewer can look around in all directions in the three-dimensional space and can also walk around within the displayed three-dimensional space. Being able to walk around in three-dimensional space means that the viewpoint position can be freely selected.
- the three-dimensional space may be referred to as a 3D space.
- 6DoF content is 3D content that represents a 3D space with one or more 3D model data.
- three-dimensional model data may also be referred to as 3D model data, and three-dimensional content may be referred to as 3D content.
- as a method of distributing 6DoF content, there is, for example, a method of configuring a three-dimensional space with a plurality of pieces of three-dimensional model data and transmitting them as a plurality of object streams.
- the configuration information of the three-dimensional space called the scene description (Scene Description) may be used.
- MPEG (Moving Picture Experts Group)
- This scene description is a method of expressing a scene with a graph of a tree hierarchical structure called a scene graph and expressing the scene graph in a binary format.
- 6DoF content is a video material that expresses a 3D space with 3D model data for each time.
- the following three methods can be mentioned as examples of the expression method of the 6DoF content.
- One method is an expression method called object-based in the present invention.
- in the object-based method, the 6DoF content has a content structure in which 3D model data is prepared for each 3D object (an individual display object in the video, such as a person or a thing), and the 3D objects are arranged in the 3D space so that the entire 3D space is expressed.
- the object-based representation method requires the client that reproduces the 6DoF content to simultaneously process the largest number of pieces of 3D model data among the three methods.
- on the other hand, the definition can be changed for display on a per-object basis, for example for an individual person or thing. Therefore, of the three methods, it can be said that this configuration method offers the highest degree of freedom for the client's playback processing.
- another is an expression method called space-based in the present invention.
- in the space-based method, the 6DoF content is not divided into 3D model data for each 3D object such as a person or a thing; instead, it has a content structure that expresses the entire target 3D space as a single piece of 3D model data.
- the space-based representation method is characterized in that the client processes a single piece of three-dimensional model data at the time of reproduction, and of the three methods it requires the lowest processing power.
- the definition of the entire 6DoF content is fixed, and it can be said that the degree of freedom for the playback process of the client is extremely low.
- the third is a mixed expression method, in which the 6DoF content has a content structure where specific 3D objects are kept as individual pieces of 3D model data while the 3D space excluding those objects is expressed as a single piece of 3D model data.
- in the mixed method, a plurality of pieces of three-dimensional model data are used in the client's reproduction process, but fewer than in the object-based expression method. That is, the mixed representation method requires higher processing power from the client than the space-based representation method, but may require less than the object-based representation method.
- the degree of freedom of the client's playback processing is also higher than that of the space-based expression method and lower than that of the object-based expression method.
- the content structure of 6DoF content differs for each expression method. Therefore, when the scene description is written to include several 6DoF content variants with different expression methods, it is preferable for the client to select the content configuration of the expression method with the highest degree of freedom it can handle, since this enhances the user's viewing experience.
- when selecting an appropriate content configuration from the scene description, the client selects the content configuration after performing various analyses.
- This analysis includes, for example, an analysis of the entire scene description and an analysis of the Adaptation Set in the MPD (Media Presentation Description). Since such analysis includes analysis of parts that are not actually used, it can be said that the efficiency of selecting the content configuration by the client device is poor.
- the present disclosure provides an information processing device, an information processing method, a reproduction processing device, and a reproduction processing method that allow the client device to efficiently select the content configuration.
- the preprocessing unit generates, for one or more pieces of content each having a content structure that expresses a virtual space composed of one or more three-dimensional objects and their spatial arrangement information, content configuration selection information for determining whether or not playback processing of each piece of content is possible.
- the file generation unit generates a file including the data of the virtual space and the content configuration selection information.
- Non-Patent Document 1 (above)
- Non-Patent Document 2: "ISO/IEC 14496-12:2015", Information technology. Coding of audio-visual objects. Part 12: ISO base media file format, 2015-12
- Non-Patent Document 3: "ISO/IEC 23009-1:2014", Information technology. Dynamic adaptive streaming over HTTP (DASH), Part 1: Media presentation description and segment formats, 2014-05
- the contents described in the above-mentioned non-patent documents are also incorporated in the present specification by reference.
- the contents described in the above-mentioned non-patent documents are also the basis for determining the support requirements.
- even if a structure or term used in the Scene Description described in Non-Patent Document 1, the file structure described in Non-Patent Document 2, or the MPEG-DASH standard described in Non-Patent Document 3 is not directly described in the detailed description of the invention, it is within the scope of disclosure of the present technology and satisfies the support requirements of the claims.
- FIG. 1 is a diagram showing the structure of 6DoF content.
- the client device may determine whether it can fully exercise its own reproduction capability by using the following three indexes.
- the first index is an index for determining whether or not the scene description file and the 3D model data file can be decoded independently.
- the second index is an index for determining whether or not all the scene description file and the three-dimensional model data file can be decoded.
- the third index is an index for determining whether or not the data after the decoding process can be rendered.
- Rendering means arranging and displaying in a three-dimensional space. When the client device determines whether or not the reproduction processing of each content configuration is possible based on these indexes, the following information may be used.
- the first information is the @mimeType attribute and @codecs attribute stored in the AdaptationSet representing the scene description among the AdaptationSets of the MPD file. Based on this information, the client device determines whether or not the scene description can be decoded. More specifically, the @mimeType attribute can be used to determine whether the client device supports the file format of the scene description, and the @codecs attribute can be used to determine whether the client device supports the codec with which the scene description was encoded. This makes it possible to know which format the scene description is in, such as MPEG-4 Scene Description or glTF (GL Transmission Format) 2.0, and whether the client device can play back that scene description.
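As a sketch of how a client might evaluate this first information, the following parses a hypothetical MPD fragment with Python's standard XML library; the codec string "mp4s.SD", the helper name, and the supported-format sets are illustrative assumptions, not values taken from the patent.

```python
import xml.etree.ElementTree as ET

# Hypothetical MPD fragment: one AdaptationSet carrying the scene
# description. The codecs value "mp4s.SD" is an illustrative placeholder.
MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="application/mp4" codecs="mp4s.SD">
      <Representation id="scene" bandwidth="20000"/>
    </AdaptationSet>
  </Period>
</MPD>"""

DASH = "{urn:mpeg:dash:schema:mpd:2011}"

def can_decode_scene_description(mpd_xml, supported_mime, supported_codecs):
    """First index (sketch): check @mimeType and @codecs of each
    AdaptationSet against what this client claims to support."""
    root = ET.fromstring(mpd_xml)
    for aset in root.iter(DASH + "AdaptationSet"):
        if aset.get("mimeType") in supported_mime and \
           aset.get("codecs") in supported_codecs:
            return True
    return False

print(can_decode_scene_description(MPD, {"application/mp4"}, {"mp4s.SD"}))
```

A real client would also have to map each codecs string to a concrete decoder; here the set-membership test stands in for that lookup.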
- the second information is the sceneProfileLevelIndication field that is stored when the scene description is represented in ISOBMFF (ISO Base Media File Format).
- the client device determines whether or not it is possible to render the data after the scene description decoding process.
- This information is used to determine the playback processing capacity of the client device used to reconstruct the three-dimensional space from the scene graph (hierarchical structure) represented by the scene description (data of the scene graph).
- the sceneProfileLevelIndication field includes the maximum number of points per scene in the case of a point cloud, and the maximum number of vertices per face, the maximum number of faces, and the maximum number of vertices per scene in the case of a mesh. That is, this information indicates how much playback processing capacity is needed for the entire scene.
- the third information is the number of external 3D model data files that make up the scene obtained from the scene graph represented by the scene description file.
- the client device determines whether or not the scene description file and the three-dimensional model data file can be decoded. For example, the client device determines that the reproduction is possible if the number of decoders of the three-dimensional model data owned by the client device is larger than the number of external three-dimensional model data files constituting the scene. In this case, the larger the number of decoders used, the higher the reproduction processing capacity required of the client device.
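The decoder-count judgment just described can be sketched minimally as follows; the function name is an assumption, and the comparison is written as "at least as many", though the text above can also be read strictly as "larger than".

```python
def can_decode_all_models(num_external_models: int, available_decoders: int) -> bool:
    """Second index (sketch): judge playback possible when the client's
    3D-model decoders can cover every externally referenced 3D model
    data file that makes up the scene."""
    return available_decoders >= num_external_models

# An object-based scene referencing 4 external model files is beyond a
# client that only has 2 decoders.
print(can_decode_all_models(4, available_decoders=2))  # False
```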
- the fourth information is @mimeType attribute and @codecs attribute stored in the AdaptationSet representing each 3D model data among the AdaptationSets possessed by the MPD file.
- @mimeType attribute includes, for example, information on the file format in which 3D model data is stored.
- @codecs attribute includes information on what codec the 3D model data is encoded in, profile information of the codec, and level information. Based on this information, the client device determines whether or not each three-dimensional model data can be decoded. More specifically, it can be determined by @mimeType attribute whether or not the client device supports each 3D model data file format. In addition, it can be determined by @codecs attribute whether or not the client device supports the codec that encodes each 3D model data.
- @codecs attribute also includes the maximum number of points of the 3D model data in the case of a point cloud, and the maximum number of vertices per face, the maximum number of faces, and the maximum number of vertices of the 3D model data in the case of a mesh.
- the fifth information is the @bandwidth attribute stored in the Representation of each 3D model data held by the MPD file.
- based on this information, the client device determines whether or not each piece of three-dimensional model data can be decoded. For example, the client device can use this information to determine whether the bit rate of the 3D model data alone, or of the entire scene, is reproducible.
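The whole-scene bit-rate judgment from this fifth information might look like the following; summing the @bandwidth values of the selected Representations is an assumption about how "the entire scene" would be evaluated.

```python
def fits_bandwidth(representation_bandwidths_bps, available_bps):
    """Fifth information (sketch): compare the sum of @bandwidth values
    of the Representations making up the scene with the throughput the
    client can sustain."""
    return sum(representation_bandwidths_bps) <= available_bps

# Three 3D-model streams of 8, 4, and 2 Mbps against a 12 Mbps link:
# the total of 14 Mbps does not fit.
print(fits_bandwidth([8_000_000, 4_000_000, 2_000_000], 12_000_000))  # False
```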
- the first, fourth, and fifth information are used for the first index.
- the third, fourth, and fifth information are used for the second index.
- the second and fourth information are used for the third index.
- the content creator prepares and distributes a plurality of content configurations as 6DoF content.
- the content configurations of the object-based, space-based, and mixed-type expression methods are referred to as object-based content configurations, space-based content configurations, and mixed-type content configurations.
- for example, the client device selects and reproduces the mixed content configuration if its playback processing capacity is high, and selects and reproduces the space-based content configuration if its capacity is low.
- the scene description in this case is described so as to include two content configurations.
- when the client device uses the first to fifth information to select a content configuration, it must analyze the entire scene description and all the information described in the MPD Adaptation Sets of the three-dimensional model data constituting the scene. This process is inefficient because it also analyzes parts of content configurations that are not actually used in each scene.
- the client device has not received the information for determining which of the contents having different content configurations can be played back. Therefore, it is difficult for the client device to determine whether or not the reproduction processing is possible without actually decoding and rendering. Therefore, a system in which the client device can efficiently select the content configuration will be described.
- FIG. 2 is a system configuration diagram of an example of a distribution system.
- the distribution system 100 includes a file generation device 1 which is an information processing device, a client device 2 which is a reproduction processing device, and a Web server 3.
- the file generation device 1, the client device 2, and the Web server 3 are connected to the network 4. Then, the file generation device 1, the client device 2, and the Web server 3 can communicate with each other via the network 4.
- the distribution system 100 may include a plurality of file generation devices 1 and a plurality of client devices 2, respectively.
- the file generation device 1 generates 6DoF content.
- the file generation device 1 uploads the generated 6DoF content to the Web server 3.
- the configuration in which the Web server 3 provides the 6DoF content to the client device 2 will be described, but the distribution system 100 can adopt another configuration.
- the file generation device 1 may include the functions of the Web server 3, store the generated 6DoF content in its own device, and provide it to the client device 2.
- the Web server 3 holds the 6DoF content uploaded from the file generation device 1. Then, the Web server 3 provides the designated 6DoF content according to the request from the client device 2.
- the client device 2 transmits a 6DoF content transmission request to the Web server 3. Then, the client device 2 acquires the 6DoF content specified in the transmission request from the Web server 3. Then, the client device 2 decodes the 6DoF content, generates a video, and displays the video on a display device such as a monitor.
- 6DoF content represents a three-dimensional space with one or more three-dimensional objects.
- the 3D object is represented using the coordinate system in the Bounding Box normalized by the local coordinate system of the 6DoF content, and is compressed and encoded to become a bitstream.
- a scene description is used to place this bitstream in a three-dimensional space.
- a scene displaying each three-dimensional object at each time is represented by a graph having a tree hierarchical structure called a scene graph, and the scene graph is represented in a binary format or a text format.
- the scene graph is spatial display control information, and is configured by defining information related to the display of a three-dimensional object as a configuration unit and hierarchically combining a plurality of nodes.
- the node includes a node for coordinate conversion information that converts from one coordinate system to another, a node for position information and size information of a three-dimensional object, and a node for access information to a three-dimensional object and audio data.
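The node kinds just listed (coordinate conversion, position/size, and access information) can be illustrated with a toy scene graph; every class name and attribute below is invented for illustration and does not come from the scene description standard.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """Toy scene-graph node: a kind, some attributes, and child nodes."""
    kind: str                        # e.g. "transform", "shape", "media"
    attrs: dict = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

# A coordinate-conversion node with a shape child (position/size) that in
# turn holds a media child (access information to the 3D object data).
scene = Node("transform", {"translate": (0.0, 0.0, -5.0)}, [
    Node("shape", {"size": (1.0, 2.0, 1.0)}, [
        Node("media", {"url": "object1.mp4"}),
    ]),
])

def collect(node, kind):
    """Depth-first traversal collecting nodes of the given kind."""
    found = [node] if node.kind == kind else []
    for child in node.children:
        found += collect(child, kind)
    return found

print([n.attrs["url"] for n in collect(scene, "media")])  # ['object1.mp4']
```

Reconstructing the three-dimensional space then amounts to walking such a tree, accumulating coordinate conversions from the root down to each media node.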
- in this embodiment, the 6DoF content is assumed to be composed of scene description data, which is spatial display control information, and media data of a plurality of 3D objects (for example, a combination of mesh data and texture data for each 3D object). Further, the 6DoF content may include audio data. Other formats, such as a point cloud, can also be applied to the media data of a 3D object. Further, in this embodiment, the scene description file conforms to MPEG-4 Scene Description (ISO/IEC 14496-11).
- the MPEG-4 Scene Description data is a binary version of the scene graph in the format of BIFS (Binary Format for Scenes).
- the conversion of this scene graph to BIFS is possible by using a predetermined algorithm.
- by storing the scene description in ISOBMFF, the scene can be defined for each time, making it possible to express a three-dimensional object whose position and size change over time.
- FIG. 3 is a block diagram of the file generator.
- the file generation device 1 which is an information processing device has a generation processing unit 10 and a control unit 11.
- the control unit 11 executes a process related to the control of the generation processing unit 10.
- the control unit 11 performs integrated control such as the operation timing of each unit of the generation processing unit 10.
- the generation processing unit 10 includes a data input unit 101, a preprocessing unit 102, an encoding unit 103, a file generation unit 104, and a transmission unit 105.
- the data input unit 101 accepts the input of the original information for generating the three-dimensional object and the meta information.
- the data input unit 101 outputs the acquired original information to the preprocessing unit 102.
- the data input unit 101 accepts data input.
- the data received by the data input unit 101 includes metadata such as a 3D object and arrangement information of the 3D object.
- the data input unit 101 outputs the acquired data to the preprocessing unit 102.
- the preprocessing unit 102 receives input of data including metadata such as 3D objects and arrangement information of 3D objects from the data input unit 101. Then, the preprocessing unit 102 determines the bitstream configuration based on the acquired data, and generates a scene graph using the metadata of each 3D object and the access information to the bitstream.
- the metadata includes control information such as what codec is used for compression.
- the preprocessing unit 102 generates, for each content configuration, content configuration selection information including any one or more of the above-mentioned first to fifth pieces of information.
- this content configuration selection information serves as an index of the reproduction processing capacity required to reproduce the scene of each content configuration.
- the preprocessing unit 102 stores the content configuration selection information for each content configuration in the scene description.
- the client device 2 can select a content configuration that can be reproduced by using the content configuration selection information.
- the storage of the content configuration selection information according to this embodiment will be described in detail below.
- FIG. 4 is a diagram for explaining a method of storing the content configuration selection information according to the first embodiment.
- the preprocessing unit 102 arranges child nodes for each content configuration under the Switch node in the scene description.
- the content configuration 301 is the content configuration of the mixed expression method
- the content configuration 302 is the content configuration of the space-based expression method.
- the preprocessing unit 102 extends the Switch node so as to store, as the content configuration selection information, the information used for determining whether the entire scene of each content configuration can be decoded and rendered.
- FIG. 5 is a diagram showing an example of the syntax of the extended Switch node in the first embodiment.
- the preprocessing unit 102 indicates the plurality of content configurations in the Choice field of the Switch node. Further, the preprocessing unit 102 newly adds a Points field, a VerticesPerFace field, a Faces field, an Indices field, a Num3DmodelData field, a 3DmodelDataMimeType field, a 3DmodelDataCodec field, and a Bitrate field indicating the content configuration selection information of each content configuration. Then, the preprocessing unit 102 stores the value for each content configuration by storing values in the newly added fields in the content configuration order indicated by the Choice field.
- Points is the number of points in the point cloud.
- VerticesPerFace is the number of vertices per face of the mesh. Faces is the number of faces in the mesh. Indices is the number of vertices in the mesh.
- Num3DmodelData is the number of 3D model data to be externally referenced. This Num3DmodelData corresponds to the third information.
- 3DmodelDataMimeType is the MimeType of the 3D model data to be externally referenced.
- 3DmodelDataCodec is the codec of the externally referenced 3D model data. These 3DmodelDataMimeType and 3DmodelDataCodec correspond to the fourth information.
- Bitrate is a bit rate including externally referenced 3D model data. This Bitrate corresponds to the fifth information.
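As a non-normative illustration, the selection fields listed above can be modeled on the client side roughly as follows. The field names follow the syntax of FIG. 5; the `playable_on` capability check is an assumed client-side policy, not something defined by this disclosure.

```python
# Sketch of the content configuration selection information of the
# extended Switch node, as a Python dataclass (illustrative only).
from dataclasses import dataclass

@dataclass
class ContentStructSelectionInfo:
    points: int              # number of points in the point cloud
    vertices_per_face: int   # number of vertices per face of the mesh
    faces: int               # number of faces in the mesh
    indices: int             # number of vertices in the mesh
    num_3d_model_data: int   # number of externally referenced 3D model data
    mime_type: str           # MimeType of the externally referenced data
    codec: str               # codec of the externally referenced data
    bitrate: int             # bit rate including externally referenced data

    def playable_on(self, max_points, max_faces, supported_codecs):
        """Hypothetical capability check: True if this configuration fits
        within the client's decoding/rendering limits."""
        return (self.points <= max_points
                and self.faces <= max_faces
                and self.codec in supported_codecs)

# Example: one configuration checked against a modest client device.
cfg = ContentStructSelectionInfo(
    points=500_000, vertices_per_face=3, faces=20_000, indices=15_000,
    num_3d_model_data=2, mime_type="model/gltf-binary",
    codec="V-PCC", bitrate=8_000_000)
ok = cfg.playable_on(max_points=1_000_000, max_faces=50_000,
                     supported_codecs={"V-PCC", "G-PCC"})
```

The comparison thresholds (`max_points`, `max_faces`, `supported_codecs`) stand in for the reproduction processing capacity of the client device 2.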
- the preprocessing unit 102 outputs the three-dimensional object and the generated scene graph to the encoding unit 103. Further, the preprocessing unit 102 outputs the metadata to the file generation unit 104.
- the encoding unit 103 receives the input of the three-dimensional object and the scene graph from the preprocessing unit 102. Then, the encoding unit 103 encodes the three-dimensional object to generate a bit stream. In addition, the encoding unit 103 encodes the acquired scene graph and generates a scene description. After that, the encoding unit 103 outputs the generated bit stream and scene description to the file generation unit 104.
- the file generation unit 104 receives the input of the bit stream and the scene description from the encoding unit 103. Further, the file generation unit 104 receives the input of metadata from the preprocessing unit 102. Then, the file generation unit 104 creates a file by storing the acquired bit stream in an ISOBMFF file for each segment, and generates a segment file of the bit stream. In addition, the file generation unit 104 generates a segment file of the scene description by storing the scene description data in the ISOBMFF file for each segment to create a file.
- the file generation unit 104 generates an MPD (Media Presentation Description) file based on the data acquired from the preprocessing unit 102.
- the MPD file stores meta information of 6DoF content such as media type, video and audio segment file information.
- the transmission unit 105 acquires the ISOBMFF file and the MPD file of the bitstream and the scene description from the file generation unit 104, transmits them to the Web server 3, and uploads them.
- FIG. 6 is a block diagram of the client device.
- the client device 2 has a reproduction processing unit 20 and a control unit 21.
- the control unit 21 controls the operation of each unit of the reproduction processing unit 20.
- the control unit 21 comprehensively controls the operation timing of each unit of the reproduction processing unit 20.
- the reproduction processing unit 20 includes a file acquisition unit 201, a measurement unit 202, a file processing unit 203, a decoding processing unit 204, a display control unit 205, a display information generation unit 206, and a display unit 207.
- the file acquisition unit 201 acquires the MPD file corresponding to the 6DoF content to be played back from the Web server 3. Then, the file acquisition unit 201 acquires the scene description information of the 6DoF content to be reproduced based on the MPD file.
- the file acquisition unit 201 acquires an ISOBMFF file in which a scene description of 6DoF content to be displayed by accessing the Web server 3 is stored. Then, the file acquisition unit 201 outputs the ISOBMFF file in which the scene description is stored to the file processing unit 203.
- the file acquisition unit 201 acquires the bit stream information selected by the file processing unit 203 from the file processing unit 203. Then, the file acquisition unit 201 accesses the Web server 3 and acquires the segment file of the selected bit stream. After that, the file acquisition unit 201 outputs the acquired bitstream segment file to the file processing unit 203.
- the measurement unit 202 measures the transmission band of the transmission line between the client device 2 and the Web server 3. Then, the measurement unit 202 outputs the measurement result of the transmission band to the file processing unit 203.
- the file processing unit 203 receives the input of the MPD file corresponding to the 6DoF content to be reproduced from the file acquisition unit 201. Then, the file processing unit 203 parses the acquired MPD file and acquires the scene description information of the 6DoF content to be reproduced. The file processing unit 203 also recognizes a plurality of data used for adaptive distribution. For example, in the case of adaptive distribution in which the bit rate is switched, the file processing unit 203 acquires the information of the bitstream segment file corresponding to each bit rate. Then, the file processing unit 203 outputs the scene description information of the 6DoF content to be reproduced to the file acquisition unit 201.
- the file processing unit 203 receives the input of the ISOBMFF file in which the scene description is stored from the file acquisition unit 201.
- the file processing unit 203 parses the acquired ISOBMFF file.
- the file processing unit 203 acquires the Switch node of the scene description.
- the file processing unit 203 acquires the content configuration selection information from the Switch node.
- the file processing unit 203 selects the content configuration to be used according to the reproduction processing capacity of the client device 2 from the acquired content configuration selection information.
- the file processing unit 203 acquires the child nodes in the scene graph corresponding to the selected content configuration.
- the file processing unit 203 acquires the coordinate conversion information, the arrangement information of the three-dimensional object, and the access information in the scene of the selected content configuration.
- the file processing unit 203 receives the input of the measurement result of the transmission band from the measurement unit 202. Then, the file processing unit 203 selects a bitstream segment file to be reproduced based on the parsing result of the scene description, the information indicating the transmission band acquired from the measurement unit 202, and the like. Then, the file processing unit 203 outputs the information of the segment file of the selected bit stream to the file acquisition unit 201. At this time, by changing the segment file of the bit stream selected according to the transmission band, adaptive distribution according to the bit rate is realized.
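The bitrate adaptation performed by the file processing unit 203 can be sketched as follows. This is an illustrative policy, assuming the available bitrates are known from the MPD; the disclosure does not prescribe a specific selection rule.

```python
# Illustrative sketch of bitrate adaptation: pick the highest-bitrate
# segment file that fits within the measured transmission band.
def select_segment(available_bitrates, measured_bandwidth):
    """Return the largest bitrate not exceeding the measured bandwidth,
    falling back to the lowest available bitrate."""
    fitting = [b for b in available_bitrates if b <= measured_bandwidth]
    return max(fitting) if fitting else min(available_bitrates)

# Example: 5.5 Mbps measured band over 1/4/8 Mbps representations.
chosen = select_segment([1_000_000, 4_000_000, 8_000_000], 5_500_000)
```

Changing the selected segment file as the measured band changes realizes the adaptive distribution described above.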
- the file processing unit 203 receives the input of the segment file of the selected bitstream from the file acquisition unit 201. Then, the file processing unit 203 extracts the bitstream data from the acquired bitstream segment file and outputs it to the decoding processing unit 204.
- the decoding processing unit 204 receives the input of bitstream data from the file processing unit 203. Then, the decoding processing unit 204 performs a decoding process on the acquired bit stream data. After that, the decoding processing unit 204 outputs the decoded bit stream data to the display information generation unit 206.
- the display control unit 205 receives the input of the operator's viewpoint position and line-of-sight direction information from an input device (not shown). Then, the display control unit 205 outputs the acquired information on the viewpoint position and the viewpoint direction to the display information generation unit 206.
- the display control unit 205 receives input of information on what kind of three-dimensional object exists from the file processing unit 203.
- the operator can also input designated information indicating the three-dimensional object of interest by using an input device instead of the viewpoint position and line-of-sight information.
- the display control unit 205 acquires designated information indicating a three-dimensional object of interest designated by the operator.
- the display control unit 205 outputs, to the display information generation unit 206, information on the viewpoint position and the line-of-sight direction that tracks, with the passage of time, the three-dimensional object designated by the designated information. This makes it possible to display, for example, an image that tracks the three-dimensional object specified by the operator.
- when displaying the position of a three-dimensional object, the display control unit 205 generates information for specifying the designated three-dimensional object from the 6DoF content.
- the display information generation unit 206 receives the scene description, the decoded bitstream data, and the information on the viewpoint position and the line-of-sight direction, and generates the display information. The details of the display information generation unit 206 will be described below.
- the display information generation unit 206 arranges the three-dimensional object which is the acquired bitstream data in the three-dimensional space based on the scene description. Further, the display information generation unit 206 receives input of information on the viewpoint position and the line-of-sight direction of the operator from the display control unit 205. Then, the display information generation unit 206 renders a three-dimensional object arranged in the three-dimensional space according to the viewpoint position and the line-of-sight direction, and generates an image for display. After that, the display information generation unit 206 supplies the generated display image to the display unit 207.
- the display unit 207 has a display device such as a monitor.
- the display unit 207 receives an input of a display image generated by the display information generation unit 206. Then, the display unit 207 causes the display device to display the acquired image for display.
- FIG. 7 is a flowchart of a file generation process by the file generation device according to the first embodiment.
- the preprocessing unit 102 generates content configuration selection information (step S1).
- the preprocessing unit 102 generates the content configuration selection information to be stored in the Switch node in the scene description.
- the preprocessing unit 102 generates scene graph data in which the content configuration selection information is stored in the Switch node (step S2).
- the preprocessing unit 102 outputs the scene graph data in which the content configuration selection information is stored in the Switch node to the coding unit 103.
- the encoding unit 103 encodes the data of the 3D object to generate a bit stream of the 3D object. Further, the encoding unit 103 encodes the acquired scene graph and generates a scene description (step S3).
- the file generation unit 104 stores the acquired bitstream in an ISOBMFF file for each segment and generates a bitstream segment file. Further, the file generation unit 104 stores the scene description data in the ISOBMFF file for each segment to generate the scene description segment file (step S4).
- the transmission unit 105 outputs the segment file generated by the file generation unit 104 to the Web server 3 (step S5).
- FIG. 8 is a flowchart of the reproduction process executed by the client device according to the first embodiment.
- the file acquisition unit 201 acquires the MPD file corresponding to the 6DoF content to be played back from the Web server 3 (step S11).
- the file acquisition unit 201 outputs the acquired MPD file to the file processing unit 203.
- the file processing unit 203 parses the MPD file input from the file acquisition unit 201 and executes the analysis process (step S12).
- the file processing unit 203 specifies the scene description of the 6DoF content to be reproduced based on the analysis result.
- the file acquisition unit 201 acquires the scene description specified by the file processing unit 203.
- the file processing unit 203 parses the scene description acquired by the file acquisition unit 201 and acquires the Switch node.
- the file processing unit 203 acquires the content configuration selection information from the Switch node (step S13).
- the file processing unit 203 selects the content configuration according to the playback processing capacity of the client device 2 using the acquired content configuration selection information (step S14).
- the file processing unit 203 parses the child nodes corresponding to the selected content configuration. After that, the file processing unit 203 acquires a bitstream segment file corresponding to the 6DoF content to be reproduced based on the parsing result (step S15).
- the decoding processing unit 204 performs decoding processing on the bitstream segment file. After that, the decoding processing unit 204 outputs the bitstream data to the display information generation unit 206.
- the display control unit 205 outputs the input information on the viewpoint position and the line-of-sight direction to the display information generation unit 206.
- the display information generation unit 206 renders the three-dimensional object and adds position information using the information on the viewpoint position and the line-of-sight direction acquired from the display control unit 205, generates an image for display, and displays it on the display unit 207.
- the viewing process is executed (step S16).
- the file generation device stores the content configuration selection information in the Switch node, which is the root node of the scene description, and provides it to the client device.
- the client device acquires the Switch node, which is the root node of the scene description, acquires the content configuration selection information, and selects the content configuration using the acquired content configuration selection information.
- the client device can acquire the content configuration selection information simply by acquiring the Switch node, and can obtain the information for selecting the content configuration without analyzing the other nodes in the scene description. Therefore, the client device can select the content configuration efficiently.
- Content creators can prepare multiple content configurations and distribute 6DoF content that can be used by client devices with different playback capabilities. Then, the client device can efficiently select the content configuration according to the reproduction processing capacity of the own device.
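The client-side selection described above (steps S13-S14) can be sketched as follows. The dictionary layout standing in for the parsed Switch node, and the two capability limits, are assumptions for illustration; they are not the normative scene description format.

```python
# Sketch of content configuration selection from an already-parsed
# Switch node: return the index, in Choice-field order, of the first
# configuration the client can reproduce, or None if none fits.
def select_content_structure(switch_node, max_points, max_faces):
    for i, info in enumerate(switch_node["choice"]):
        if info["Points"] <= max_points and info["Faces"] <= max_faces:
            return i
    return None

# Example: two configurations in Choice-field order, as in FIG. 4.
switch_node = {"choice": [
    {"Points": 2_000_000, "Faces": 100_000},  # e.g. content configuration 301
    {"Points": 300_000,   "Faces": 10_000},   # e.g. content configuration 302
]}
idx = select_content_structure(switch_node, max_points=1_000_000,
                               max_faces=50_000)
```

Only the child nodes of the selected configuration then need to be parsed further.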
- the file generation device 1 according to this modification is different from the first embodiment in that the content configuration selection information is stored in the SampleEntry of the ISOBMFF file of the scene description.
- the preprocessing unit 102 of the file generation device 1 generates the content configuration selection information as metadata. Then, the preprocessing unit 102 transmits the metadata including the content configuration selection information to the file generation unit 104.
- the file generation unit 104 receives the input of the bit stream and the scene description from the encoding unit 103. Further, the file generation unit 104 receives input of metadata including content configuration selection information from the preprocessing unit 102.
- the file generation unit 104 creates a bitstream segment file by storing the acquired bitstream in an ISOBMFF file for each segment, as in the first embodiment, for the bitstream.
- the file generation unit 104 acquires the content configuration selection information from the metadata. Then, the file generation unit 104 stores the content configuration selection information in the SampleEntry of the ISOBMFF file of the scene description.
- the file generation unit 104 newly adds the 6DoFContentStructBox indicated by BOX303 to the SampleEntry of the ISOBMFF file of the scene description.
- FIG. 9 is a diagram showing an ISOBMFF file of the scene description in the modified example (1) of the first embodiment. Then, the file generation unit 104 stores the content configuration selection information for each content configuration in the 6DoFContentStructBox.
- the file generation unit 104 stores the content configuration selection information represented by the syntax shown in FIG. 10 in the 6DoFContentStructBox.
- FIG. 10 is a diagram showing an example of the syntax of the content configuration selection information stored in the Sample Entry in the modified example (1) of the first embodiment.
- the content configuration selection information in this case is the same information as the fields newly added in the first embodiment shown in FIG. 5.
- the file generation unit 104 creates a file by storing the scene description data in the ISOBMFF file for each segment, and generates a segment file of the scene description. Then, the file generation unit 104 outputs the ISOBMFF file of the scene description including the content configuration selection information to the transmission unit 105.
- the file processing unit 203 of the client device 2 receives the input of the ISOBMFF file of the scene description from the file acquisition unit 201. Then, the file processing unit 203 acquires the Initialization Segment of the ISOBMFF file of the scene description. Next, the file processing unit 203 acquires the content configuration selection information from the 6DoFContentStructBox in the acquired Initialization Segment. Then, the file processing unit 203 selects the content configuration to be used from the scene description using the acquired content configuration selection information.
- the client device can acquire the content configuration selection information before analyzing the scene description itself, and can therefore select the content configuration without analyzing the scene description itself.
- the client device does not have to analyze the scene description itself in order to select the content configuration.
- the configuration of the modification (1) is effective when the reproduction processing capacity required for reproduction of each content configuration does not change over time. In this case, it is not necessary to extend the scene description itself.
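To illustrate where the 6DoFContentStructBox would be found, the following sketch walks the size/type box framing of an ISOBMFF file. The four-character code `6dcs` and the flat layout are assumptions for illustration; in a real Initialization Segment the SampleEntry sits nested deeper (moov/trak/mdia/...), and the actual fourcc is not specified here.

```python
# Sketch: locate a box by four-character code in ISOBMFF-style data.
# Each box is framed as: 32-bit big-endian size | 4-byte type | payload.
import struct

def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, btype = struct.unpack_from(">I4s", data, offset)
        yield btype.decode("ascii"), data[offset + 8: offset + size]
        offset += size

def find_box(data, wanted):
    for btype, payload in iter_boxes(data):
        if btype == wanted:
            return payload
    return None

# Toy segment: an 'ftyp' box followed by a hypothetical '6dcs' box
# carrying content configuration selection information.
seg = (struct.pack(">I4s", 12, b"ftyp") + b"isom"
       + struct.pack(">I4s", 13, b"6dcs") + b"cfg=1")
payload = find_box(seg, "6dcs")
```

The client would parse the payload into the selection fields of FIG. 10 before deciding which content configuration to use.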
- the file generation device 1 according to this modification differs from the first embodiment in that it predetermines values for the content configuration selection information and groups them, and represents the content configuration selection information of each content configuration by indicating the group to which that content configuration belongs.
- FIG. 11 is a diagram showing an example of a group of content configuration selection information.
- groups assigned 01 to 03 as the RequiredPerformanceID, which is an identification number, are set. For each group, the values set for each item of the content configuration selection information are shown.
- the preprocessing unit 102 of the file generation device 1 holds the information of the groups of content configuration selection information shown in FIG. 11. Then, the preprocessing unit 102 stores in the Switch node the information indicating, by the RequiredPerformanceID, which group each content configuration corresponds to.
- the file processing unit 203 of the client device 2 analyzes the Switch node of the scene description and acquires the RequiredPerformanceID of the group to which each content configuration belongs. Then, the file processing unit 203 determines the reproduction processing capacity required for reproduction of each content configuration from the content configuration selection information assigned to that group, and selects the content configuration.
- the file generation device notifies the content configuration selection information for each content configuration using the content configuration selection information group. As a result, it is not necessary to generate detailed content configuration selection information for each content configuration, and it is possible to lighten the generation process of the content configuration selection information.
- This method can also be applied to the modified example (1) of the first embodiment.
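The group mechanism of this modification amounts to a lookup table shared between generator and client. The IDs and limit values below are illustrative stand-ins for the groups of FIG. 11, not values defined in the disclosure.

```python
# Sketch of modification (2): each content configuration carries only a
# group ID; the client resolves that ID to pre-agreed selection values.
PERFORMANCE_GROUPS = {
    "01": {"Points": 100_000,   "Faces": 5_000},
    "02": {"Points": 1_000_000, "Faces": 50_000},
    "03": {"Points": 5_000_000, "Faces": 200_000},
}

def resolve_group(required_performance_id):
    """Look up the content configuration selection information assigned
    to the group identified by the given RequiredPerformanceID."""
    return PERFORMANCE_GROUPS[required_performance_id]

# Example: a content configuration tagged with group "02".
limits = resolve_group("02")
```

Because only the ID travels with each content configuration, the generator need not emit the detailed selection fields per configuration.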
- the file generation device 1 according to this modification differs from the first embodiment in that it uses Matroska Media Container (http://www.matroska.org/) instead of ISOBMFF as the file format when transmitting 3D model data.
- FIG. 12 is a diagram showing the format of Matroska Media Container.
- the file generation unit 104 of the file generation device 1 stores a 6DoFContentStruct element having content configuration selection information in the Track Entry element.
- the file generation unit 104 sets the Element Type to binary data, and stores the SelectContentStructMetadata() shown in FIG. 10 as binary data in EBML (Extensible Binary Meta Language) form.
- the file processing unit 203 of the client device 2 acquires the Initialization Segment of the Matroska Media Container file including the scene description. Then, the file processing unit 203 acquires the content configuration selection information from the 6DoFContentStruct element included in the Initialization Segment and selects the content configuration.
- the content configuration selection information is generated for each content configuration, but the present invention is not limited to this; for example, the Element Type may be stored as an Integer so as to represent the RequiredPerformanceID as in the modification (2).
- FIG. 13 is a diagram for explaining a method of storing the content configuration selection information according to the modified example (4) of the first embodiment.
- the file generation unit 104 of the file generation device 1 newly defines, for example, a RequiredPerformance node 311 under the Group node of the content configuration 301 and a RequiredPerformance node 312 under the Group node of the content configuration 302. Then, the file generation unit 104 stores the content configuration selection information of the content configuration 301 in the RequiredPerformance node 311. In addition, the file generation unit 104 stores the content configuration selection information of the content configuration 302 in the RequiredPerformance node 312.
- FIG. 14 is a diagram showing an example of the syntax of the RequiredPerformance node.
- the file generation unit 104 defines the RequiredPerformance node as a node having the content configuration selection information of one content configuration, using the syntax shown in FIG. 14.
- in the RequiredPerformance node, information that serves as an index for determining whether or not the content configuration can be reproduced is registered.
- Points represents the number of points in the point cloud.
- VerticesPerFace represents the number of vertices per face of the mesh. Faces represents the number of faces in the mesh. Indices represents the number of vertices in the mesh.
- Num3DmodelData represents the number of 3D model data to be externally referenced.
- 3DmodelDataMimeType represents the MimeType of the externally referenced 3D model data.
- 3DmodelDataCodec represents a codec of externally referenced 3D model data.
- Bitrate represents a bit rate including externally referenced 3D model data.
- the file processing unit 203 of the client device 2 acquires the RequiredPerformance node for each content configuration. At this stage, the file processing unit 203 does not acquire the other child nodes under the Group node. Next, the file processing unit 203 acquires the content configuration selection information of each content configuration from each RequiredPerformance node. Then, the file processing unit 203 selects the content configuration. After that, the file processing unit 203 acquires and parses the Group node of the selected content configuration and the nodes below it.
- the client device can select the content configuration by acquiring and analyzing the RequiredPerformance node directly under the Group node of each content configuration. Therefore, the processing can be reduced as compared with the case of analyzing the entire scene description.
- the content configuration selection information can be provided to the client device without changing the existing node.
- a node for storing the content configuration selection information is generated as a child node of the Group node, but if it is a root node of the content configuration, it may be a child node of another node.
- the content configuration selection information is generated for each content configuration, but the present invention is not limited to this.
- the RequiredPerformance node may be configured to hold the RequiredPerformanceID as in the modification (2).
- the file generation device 1 is different from the first embodiment in that the content configuration selection information is stored in the Adaptation Set in the MPD file indicating the access information to the scene description.
- the preprocessing unit 102 of the file generation device 1 generates content configuration selection information. Then, the preprocessing unit 102 transmits the metadata including the content configuration selection information to the file generation unit 104.
- the file generation unit 104 receives the input of the bit stream and the scene description from the encoding unit 103. Further, the file generation unit 104 receives input of metadata including content configuration selection information from the preprocessing unit 102.
- the file generation unit 104 creates a file by storing the acquired bitstream in an ISOBMFF file for each segment, and generates a bitstream segment file.
- the file generation unit 104 generates a segment file of the scene description by storing the scene description data in the ISOBMFF file for each segment to create a file.
- the file generation unit 104 generates an MPD file based on the data acquired from the preprocessing unit 102. At this time, the file generation unit 104 acquires the content configuration selection information included in the metadata. Then, the file generation unit 104 defines the 6DoFContentStruct descriptor shown in FIG. 15 in the AdaptationSet 320 of the scene description in the MPD file. FIG. 15 is a diagram showing a description example of the 6DoFContentStruct descriptor. Further, the file generation unit 104 stores a CSC element for each content configuration in the 6DoFContentStruct descriptor according to the acquired content configuration selection information, and registers the content configuration selection information in the attributes of the CSC element.
- FIG. 16 is a diagram showing the semantics of CSC in the modified example (5) of the first embodiment.
- the CSC describes the definition of an information element indicating the capability of the content structure.
- the CSC consists of one to 255 elements.
- @Use indicates the attribute information of whether each element used is Optional or Mandatory.
- CSC@points represents the number of points in the point cloud.
- CSC@VerticesPerFace represents the number of vertices per face of the mesh.
- CSC@Faces represents the number of faces in the mesh.
- CSC@Indices represents the number of vertices in the mesh.
- CSC@Num3DmodelData represents the number of 3D model data to be externally referenced.
- CSC@3DmodelDataMimeType represents the MimeType of the externally referenced 3D model data.
- CSC@3DmodelDataCodec represents the codec of the externally referenced 3D model data.
- CSC@Bitrate represents the bit rate including the externally referenced 3D model data.
- the file processing unit 203 of the client device 2 acquires the content configuration selection information by parsing the MPD file. That is, the file processing unit 203 can acquire the content configuration selection information at the time of acquiring the MPD file, and can determine whether or not there is playable content in the scene description.
- the client device can select a playable content configuration without acquiring a scene description. This makes it possible to efficiently select the content configuration.
- since the content configuration selection information is stored in the MPD file, this method is effective when the reproduction processing capacity required for reproducing each content configuration does not change over time.
- the content configuration selection information is generated for each content configuration, but the present invention is not limited to this; for example, the attributes of the CSC element may be configured to store information representing the RequiredPerformanceID as in the modification (2).
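The client-side parsing of such a descriptor can be sketched with the standard library XML parser. The element and attribute names mirror FIG. 16, but the exact XML shape (a SupplementalProperty wrapper and its schemeIdUri value) is an assumption for illustration.

```python
# Sketch: read the CSC elements of a hypothetical 6DoFContentStruct
# descriptor from an MPD AdaptationSet.
import xml.etree.ElementTree as ET

MPD_SNIPPET = """
<AdaptationSet id="sd">
  <SupplementalProperty schemeIdUri="urn:example:6DoFContentStruct">
    <CSC points="2000000" Faces="100000" Bitrate="8000000"/>
    <CSC points="300000" Faces="10000" Bitrate="2000000"/>
  </SupplementalProperty>
</AdaptationSet>
"""

def parse_csc(xml_text):
    """Return one dict of integer-valued attributes per CSC element."""
    root = ET.fromstring(xml_text)
    return [{k: int(v) for k, v in csc.attrib.items()}
            for csc in root.iter("CSC")]

configs = parse_csc(MPD_SNIPPET)
```

Because this information is available at MPD parse time, the client can decide on a content configuration before fetching any scene description.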
- the file generation device 1 according to the present embodiment differs from the first embodiment in that the scene description is a separate file for each content configuration, and the content configuration selection information is stored in the Adaptation Set of each scene description.
- the file generation device 1 according to the present embodiment is also represented by the block diagram of FIG.
- the client device 2 according to the present embodiment is also represented by the block diagram of FIG. In the following description, the description of the operation of each part similar to that of the first embodiment may be omitted.
- FIG. 17 is a diagram for explaining a method of storing the content configuration selection information according to the second embodiment.
- the preprocessing unit 102 of the file generation device 1 according to the present embodiment generates scene descriptions 331 and 332 for each content configuration.
- the scene description 331 is a scene description of the content configuration # 1.
- the scene description 332 is a scene description of the content configuration # 2.
- the preprocessing unit 102 generates the content configuration selection information of each of the content configurations # 1 and # 2, and outputs the information to the file generation unit 104.
- the file generation unit 104 acquires the scene descriptions 331 and 332 from the encoding unit 103. Further, the file generation unit 104 acquires the content configuration selection information of the content configurations # 1 and # 2 from the preprocessing unit 102.
- the file generation unit 104 stores the scene descriptions 331 and 332 in the ISOBMFF file. Further, the file generation unit 104 stores the respective content configuration selection information in each of the Adaptation Sets of the scene descriptions 331 and 332 in the MPD file by using the 6DoFContentStruct descriptor shown in FIG. 15.
- the file processing unit 203 of the client device 2 acquires the MPD file from the file acquisition unit 201. Then, the file processing unit 203 acquires the content configuration selection information included in each of the adaptation sets of the scene descriptions 331 and 332 of the MPD file. Then, the file processing unit 203 selects the content configuration to be used by using the acquired content configuration selection information. After that, the file processing unit 203 acquires the scene description of the selected content configuration from the Web server 3 via the file acquisition unit 201.
- the client device can select the content configuration when the MPD file is acquired.
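The selection flow described above can be sketched as follows. This is an illustrative assumption, not the normative definition from the figures: the descriptor scheme URI (`urn:example:6DoFContentStruct`) and the `maxPoints` attribute are placeholder names standing in for the 6DoFContentStruct Descriptor, and only the overall idea (each scene-description AdaptationSet carries selection information, and the client picks a playable one before downloading any scene description) comes from the text.

```python
# Hypothetical sketch: choosing a scene-description AdaptationSet from an MPD
# whose descriptor carries content configuration selection information.
# The scheme URI and the "maxPoints=N" value format are illustrative assumptions.
import xml.etree.ElementTree as ET
from typing import Optional

SCHEME = "urn:example:6DoFContentStruct"  # assumed placeholder scheme id
DASH_NS = "urn:mpeg:dash:schema:mpd:2011"

MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet id="1">
      <SupplementalProperty schemeIdUri="urn:example:6DoFContentStruct"
                            value="maxPoints=2000000"/>
    </AdaptationSet>
    <AdaptationSet id="2">
      <SupplementalProperty schemeIdUri="urn:example:6DoFContentStruct"
                            value="maxPoints=500000"/>
    </AdaptationSet>
  </Period>
</MPD>"""

def select_adaptation_set(mpd_xml: str, client_max_points: int) -> Optional[str]:
    """Return the id of the first AdaptationSet whose required point count
    the client can handle, or None if no configuration is playable."""
    root = ET.fromstring(mpd_xml)
    for aset in root.iter(f"{{{DASH_NS}}}AdaptationSet"):
        for prop in aset.iter(f"{{{DASH_NS}}}SupplementalProperty"):
            if prop.get("schemeIdUri") != SCHEME:
                continue
            required = int(prop.get("value").split("=")[1])
            if required <= client_max_points:
                return aset.get("id")
    return None

# A client able to render 1,000,000 points skips configuration #1 and selects
# configuration #2, so only that scene description needs to be downloaded.
```

The key point mirrored here is that the decision happens at MPD level, before any scene-description segment is fetched.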
- In the method according to the first embodiment and its modifications, the scene description including unused content configurations is acquired, so data that is not used is also acquired.
- In contrast, since the client device according to the present embodiment does not need to acquire the scene description data of unused content configurations, content configuration selection is more efficient than in the first embodiment and its modifications.
- In the present embodiment, since the content configuration selection information is stored in the MPD file, this method can be said to be effective when the reproduction processing capability required for reproducing each content configuration does not change over time.
- the file generation device 1 according to this modification is different from the second embodiment in that the content configuration selection information is stored in the SampleEntry of the ISOBMFF file of the scene description.
- the preprocessing unit 102 of the file generation device 1 generates content configuration selection information. Then, the preprocessing unit 102 transmits the metadata including the content configuration selection information to the file generation unit 104.
- The file generation unit 104 receives the input of the bitstream and the scene description from the encoding unit 103. Further, the file generation unit 104 receives the input of the metadata including the content configuration selection information from the preprocessing unit 102.
- The file generation unit 104 stores the acquired bitstream in an ISOBMFF file for each segment to generate bitstream segment files.
- Similarly, the file generation unit 104 stores the scene description data in an ISOBMFF file for each segment to generate segment files of the scene description.
- When storing this scene description in the ISOBMFF file, the file generation unit 104 stores the content configuration selection information included in the metadata in the SampleEntry of the ISOBMFF file of the scene description. In this case, the file generation unit 104 can store the content configuration selection information by the storage methods shown in FIGS. 9 and 10.
- the file processing unit 203 of the client device 2 acquires the ISOBMFF file of the scene description from the file acquisition unit 201. Next, the file processing unit 203 acquires the Initialization Segment of the ISOBMFF file of the acquired scene description. Then, the file processing unit 203 acquires the content configuration selection information from the 6DoFContentStructBox included in the Initialization Segment of the ISOBMFF file of the scene description. Then, the file processing unit 203 selects the content configuration to be used by using the acquired content configuration selection information. After that, the file processing unit 203 acquires the scene description of the selected content configuration from the Web server 3 via the file acquisition unit 201.
- the client device acquires the content configuration selection information from the 6DoFContentStructBox included in the Initialization Segment of the ISOBMFF file of the scene description. After that, the client device may acquire the scene description data of the content configuration to be used, and may not acquire the scene description data of the other unused content configuration. Therefore, efficient content configuration selection becomes possible.
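The client-side step of pulling the selection information out of the Initialization Segment can be sketched as a minimal ISOBMFF box walk. This is a sketch under stated assumptions: the four-character code `6dcs` for the 6DoFContentStructBox and the raw payload are placeholders invented for illustration; only the box-based container layout (32-bit size followed by a four-character type) is standard ISOBMFF structure.

```python
# Minimal sketch of locating a box inside an ISOBMFF Initialization Segment.
# The fourcc "6dcs" standing in for the 6DoFContentStructBox is an assumed
# placeholder, not a registered code.
import struct
from typing import Optional

def make_box(fourcc: bytes, payload: bytes) -> bytes:
    """Build a basic ISOBMFF box: 32-bit big-endian size + 4-char type + payload."""
    return struct.pack(">I4s", 8 + len(payload), fourcc) + payload

def find_box(buf: bytes, fourcc: bytes) -> Optional[bytes]:
    """Walk the top-level boxes of buf and return the payload of the first
    box whose type matches fourcc, or None if absent."""
    pos = 0
    while pos + 8 <= len(buf):
        size, btype = struct.unpack_from(">I4s", buf, pos)
        if size < 8:  # malformed box; stop walking
            return None
        if btype == fourcc:
            return buf[pos + 8 : pos + size]
        pos += size
    return None

# A toy Initialization Segment: a file-type box followed by the assumed
# selection-information box. A real segment would carry moov, trak, etc.
segment = make_box(b"ftyp", b"isom") + make_box(b"6dcs", b"\x01maxpts")
payload = find_box(segment, b"6dcs")  # the client would then parse this
```

In the flow of this modification, the client would fetch only the Initialization Segment, run a lookup like `find_box`, decide on a content configuration, and only then download the chosen scene description.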
- the file generation unit 104 can store the content configuration selection information and provide it to the client device in the same manner as in the modification (3) of the first embodiment.
- The content creator can increase the number of client devices that can play the content by providing adaptation within the content configuration.
- However, no information is provided that allows the client to determine whether or not it can play back content that is adapted within the content configuration. Therefore, a system that allows the client to efficiently select the content configuration when adaptation is performed within the content configuration will be described.
- The file generation device 1 according to the present embodiment differs from the first embodiment in that it generates information indicating the minimum required reproduction processing capability together with information indicating the maximum reproduction processing capability required for reproduction, stores the information, and provides it to the client device 2.
- The file generation device 1 according to the present embodiment is also represented by the block diagram of FIG. 3.
- The client device 2 according to the present embodiment is also represented by the block diagram of FIG. 6. In the following description, the description of the operation of each unit similar to that of the first embodiment may be omitted.
- the preprocessing unit 102 of the file generation device 1 generates content configuration selection information for each content configuration.
- the preprocessing unit 102 expands the Switch node of the scene description as shown in FIG. 18 to store the content configuration selection information for each content configuration.
- FIG. 18 is a diagram showing an example of the syntax of the extended Switch node in the second embodiment.
- MaxPoints is the maximum number of points in the point cloud.
- MinPoints is the minimum number of points in the point cloud.
- MaxVertivesParFace is the maximum number of vertices on the face of the mesh.
- MinVertivesParFace is the minimum number of vertices on the face of the mesh.
- MaxFaces is the maximum number of faces in the mesh.
- MinFaces is the minimum number of faces in the mesh.
- MaxIndices is the maximum number of vertex indices in the mesh.
- MinIndices is the minimum number of vertex indices in the mesh.
- MaxNum3DmodelData is the maximum number of 3D model data to be externally referenced.
- MinNum3DmodelData is the minimum number of 3D model data to be externally referenced.
- Max3DmodelDataCodec is the maximum value of the codec of the externally referenced 3D model data.
- Min3DmodelDataCodec is the minimum value of the codec of the externally referenced 3D model data.
- MaxBitrate is the maximum bit rate including the externally referenced 3D model data.
- MinBitrate is the minimum bit rate including the externally referenced 3D model data.
- The content configuration information indicating the maximum values is information indicating a reproduction processing capability with which the content can be reliably reproduced.
- The content configuration information indicating the minimum values is information indicating a reproduction processing capability with which the content can be reproduced when adaptation is performed; in other words, it can be said to be information indicating a reproduction processing capability with which a part of the content can be reproduced.
- The preprocessing unit 102 outputs a scene graph including the Switch node represented by the syntax shown in FIG. 18 to the encoding unit 103.
- The file processing unit 203 of the client device 2 acquires the Switch node from the acquired scene description. Then, the file processing unit 203 acquires the content configuration selection information of each content configuration from the Switch node and selects the content configuration to be used by using the acquired content configuration selection information. In this case, even if the client device 2 does not satisfy the reproduction processing capability required by the content configuration selection information representing the maximum values for a certain content configuration, the file processing unit 203 can select that content configuration as long as the client device 2 satisfies the reproduction processing capability required by the content configuration selection information representing the minimum values.
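The accept-on-minimum rule described above can be sketched as follows. The `ContentConfig` data structure and its field names are illustrative assumptions standing in for the MaxPoints/MinPoints-style pairs of the extended Switch node; only the selection rule itself (a configuration qualifies as soon as the client meets its minimum requirement, even if it cannot meet the maximum one) comes from the text.

```python
# Illustrative sketch of selecting a content configuration from paired
# maximum/minimum requirements, as in the extended Switch node. The dataclass
# is an assumed stand-in for the node's fields.
from dataclasses import dataclass

@dataclass
class ContentConfig:
    name: str
    max_points: int  # capability needed to reliably reproduce everything
    min_points: int  # capability needed to reproduce at least part, with adaptation

def selectable(cfg: ContentConfig, client_points: int) -> bool:
    # Selectable as soon as the client satisfies the minimum requirement,
    # even when it falls short of the maximum one.
    return client_points >= cfg.min_points

configs = [
    ContentConfig("config#1", max_points=2_000_000, min_points=800_000),
    ContentConfig("config#2", max_points=500_000, min_points=200_000),
]
playable = [c.name for c in configs if selectable(c, client_points=1_000_000)]
# Both qualify: config#1 only via adaptation (its maximum requirement exceeds
# the client's capability), config#2 outright.
```

Under a maximum-only rule, this client would have rejected config#1 outright; the minimum value is what keeps it selectable.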
- Then, the file processing unit 203 parses the Group node and the nodes below it for the selected content configuration in the scene description.
- Conventionally, the maximum reproduction processing capability is required for a content configuration regardless of adaptation, so a content configuration that would be playable if adaptation were performed is not selected.
- In contrast, the client device according to the present embodiment can select and play a content configuration that it could not handle at the maximum required reproduction processing capability, as long as the content configuration can be reproduced when adaptation is performed.
- the file generation device 1 according to this modification is different from the third embodiment in that the content configuration selection information is stored in the SampleEntry of the ISOBMFF file of the scene description.
- the file generation unit 104 receives the input of the content configuration selection information from the preprocessing unit 102.
- the file generation unit 104 newly adds 6DoFContentStructBox to the SampleEntry of the ISOBMFF file of the scene description.
- the file generation unit 104 stores the content configuration selection information represented by the syntax shown in FIG. 19 in the 6DoFContentStructBox.
- FIG. 19 is a diagram showing an example of the syntax of the content configuration selection information stored in the Sample Entry in the modified example (1) of the third embodiment.
- The file generation unit 104 outputs the ISOBMFF file of the scene description including the content configuration selection information to the transmission unit 105.
- The file processing unit 203 of the client device 2 receives the input of the ISOBMFF file of the scene description from the file acquisition unit 201 and acquires the Initialization Segment of that file. Next, the file processing unit 203 acquires the content configuration selection information from the 6DoFContentStructBox in the acquired Initialization Segment. Then, the file processing unit 203 selects the content configuration to be used from the scene description using the acquired content configuration selection information. In this case, even if the client device 2 does not satisfy the reproduction processing capability required by the content configuration selection information representing the maximum values for a certain content configuration, the file processing unit 203 can select that content configuration as long as the client device 2 satisfies the reproduction processing capability required by the content configuration selection information representing the minimum values.
- Thereby, the client device can select and reproduce a content configuration that becomes reproducible when adaptation is performed.
- Further, the content creator can distribute 6DoF content that can be reproduced by client devices having different reproduction capabilities, taking the adaptation within the content configuration into consideration.
- the file generation device 1 according to this modification is different from the second embodiment in that the content configuration selection information is stored in the Adaptation Set in the MPD file indicating the access information to the scene description.
- the preprocessing unit 102 of the file generation device 1 generates content configuration selection information. Then, the preprocessing unit 102 transmits the metadata including the content configuration selection information to the file generation unit 104.
- the file generation unit 104 receives the input of the bit stream and the scene description from the encoding unit 103. Further, the file generation unit 104 receives input of metadata including content configuration selection information from the preprocessing unit 102.
- The file generation unit 104 stores the acquired bitstream in an ISOBMFF file for each segment to generate bitstream segment files.
- Similarly, the file generation unit 104 stores the scene description data in an ISOBMFF file for each segment to generate segment files of the scene description.
- the file generation unit 104 generates an MPD file based on the data acquired from the preprocessing unit 102. At this time, the file generation unit 104 defines 6DoFContentStruct descriptor in the Adaptation Set of the scene description in the MPD file as shown in FIG. Then, the file generation unit 104 stores the CSC element for each content configuration in the 6DoFContentStructDescriptor according to the content configuration selection information, and registers the content configuration selection information with the attribute of the CSC element.
- FIG. 20 is a diagram showing the semantics of CSC in the modified example (2) of the third embodiment. As shown in FIG. 20, the CSC describes the definition of an information element indicating the capability of the content structure.
- The file generation unit 104 outputs the generated files, including the content configuration selection information, to the transmission unit 105.
- The file processing unit 203 of the client device 2 acquires the content configuration selection information by parsing the MPD file. Then, the file processing unit 203 selects the content configuration to be used from the scene description using the acquired content configuration selection information. In this case, even if the client device 2 does not satisfy the reproduction processing capability required by the content configuration selection information representing the maximum values for a certain content configuration, the file processing unit 203 can select that content configuration as long as the client device 2 satisfies the reproduction processing capability required by the content configuration selection information representing the minimum values.
- Thereby, the client device can select and play a content configuration that becomes playable when adaptation is performed.
- Further, the content creator can distribute 6DoF content that can be reproduced by client devices having different reproduction capabilities, taking the adaptation within the content configuration into consideration.
- In the present embodiment, the content configuration selection information is generated for each content configuration, but the present invention is not limited to this; for example, it is also possible to use the Representation ID to represent the maximum value and the minimum value.
- the file generation device 1 according to the present embodiment is different from the first to third embodiments in that the configuration information of each three-dimensional model data is stored in the representation included in the adaptation set of the three-dimensional model data in the MPD.
- The file generation device 1 according to the present embodiment is also represented by the block diagram of FIG. 3.
- The client device 2 according to the present embodiment is also represented by the block diagram of FIG. 6. In the following description, the description of the operation of each unit similar to that of the first embodiment may be omitted.
- the file generation unit 104 of the file generation device 1 acquires the content configuration selection information of each content configuration from the preprocessing unit 102. Then, when the MPD file is generated, the file generation unit 104 stores the configuration information of the three-dimensional model data for each Representation of the Adaptation Set of the three-dimensional model data.
- As the configuration information of the three-dimensional model data, the file generation unit 104 stores, for example, the number of points in the case of a point cloud in the Representation@numPoint attribute. Further, in the case of a mesh, the file generation unit 104 stores the number of vertices per face in Representation@vpf, the number of faces in Representation@numFace, and the number of vertices in the Representation@numIndices attribute.
- Thereby, the file processing unit 203 of the client device 2 can select, separately from the content configuration selection information, a content configuration that the client device 2 itself can reproduce and process.
- The file generation unit 104 may store the configuration information in another element such as the AdaptationSet or Preselection.
- the file generation unit 104 may store the minimum value and the maximum value in the adaptation set of the three-dimensional model data in the MPD.
- For example, the file generation unit 104 stores the maximum value and the minimum value of the number of points in the case of a point cloud in the AdaptationSet@MaxNumPoint and AdaptationSet@MinNumPoint attributes.
- Further, in the case of a mesh, the file generation unit 104 stores the maximum value and the minimum value of the number of vertices per face in the AdaptationSet@MaxVpf and AdaptationSet@MinVpf attributes, stores the maximum value and the minimum value of the number of faces in the AdaptationSet@MaxNumFace and AdaptationSet@MinFace attributes, and stores the maximum value and the minimum value of the number of vertices in the AdaptationSet@MaxNumIndices and AdaptationSet@MinIndices attributes.
- the configuration according to this embodiment can be used to determine the playback processing capacity required for playback when transmitting 3D model data using an MPD file without using a scene description.
- FIG. 21 is a diagram for explaining how to use the configuration information stored in the Adaptation Set of the three-dimensional model data.
- the file generation unit 104 stores information on the maximum value and the minimum value of the number of points in the AdaptationSet 340 of the three-dimensional model data. Further, the file generation unit 104 stores the number of points in Representations 341 to 343 included in the AdaptationSet 340.
- Thereby, the file processing unit 203 of the client device 2 can determine at the level of the AdaptationSet 340 whether or not there is a Representation that can be reproduced. When there is such a Representation, the file processing unit 203 can select the reproducible Representation from the Representations 341 to 343 and play it.
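The two-step check this passage describes can be sketched as follows. The attribute names (`MinNumPoint`, `numPoint`) follow the text, but the dict-based MPD model and the "richest Representation the client can handle" tie-break are illustrative assumptions.

```python
# Sketch of the two-step selection: first an AdaptationSet-level gate using
# the stored minimum point count, then picking a Representation by its own
# per-Representation point count. The in-memory MPD model is an assumption.
from typing import Optional

def pick_representation(adaptation_set: dict, client_points: int) -> Optional[str]:
    # AdaptationSet-level gate: if even the minimum exceeds the client's
    # capability, no Representation inside can be played at all.
    if client_points < adaptation_set["MinNumPoint"]:
        return None
    # Among the playable Representations, choose the richest one.
    candidates = [r for r in adaptation_set["Representations"]
                  if r["numPoint"] <= client_points]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["numPoint"])["id"]

# Toy model of the AdaptationSet 340 with Representations 341 to 343.
aset = {
    "MinNumPoint": 100_000,
    "MaxNumPoint": 2_000_000,
    "Representations": [
        {"id": "341", "numPoint": 2_000_000},
        {"id": "342", "numPoint": 500_000},
        {"id": "343", "numPoint": 100_000},
    ],
}
```

The AdaptationSet-level gate is what lets a weak client reject the whole set without examining each Representation.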
- The file generation unit 104 may store the configuration information of the three-dimensional model data in another location. For example, the file generation unit 104 newly defines a 3DmModelDataMetadataBox in the SampleEntry of the ISOBMFF file of the 3D model data. Then, the file generation unit 104 may store, in the newly defined 3DmModelDataMetadataBox, the number of points in the case of a point cloud of the three-dimensional model data, and the number of vertices, the number of faces, and the number of vertices per face in the case of a mesh.
- The file generation unit 104 can also use Matroska Media Container as the file format instead of ISOBMFF. In that case, the file generation unit 104 newly stores the configuration information in the TrackEntry element as a 3DmodelDataMetadata element. At this time, the file generation unit 104 sets the Element Type to binary and stores, as EBML data, the number of points in the case of a point cloud of the 3D model data, and the number of vertices, the number of faces, and the number of vertices per face in the case of a mesh, as binary data.
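The binary payload described for the Matroska case can be sketched as a simple round-trippable serialization. The field order and the fixed 32-bit big-endian layout are assumptions for illustration only; the actual EBML element ID assignment and encoding are omitted, since the text does not specify them.

```python
# Sketch of packing the 3D-model configuration values into the binary payload
# of an assumed 3DmodelDataMetadata element. Layout (four big-endian uint32
# fields in this order) is an illustrative assumption, not a defined format.
import struct

FIELDS = ("num_points", "num_vertices", "num_faces", "vertices_per_face")

def pack_model_metadata(num_points: int, num_vertices: int,
                        num_faces: int, vertices_per_face: int) -> bytes:
    """Serialize the four counts as 16 bytes of big-endian binary data."""
    return struct.pack(">4I", num_points, num_vertices, num_faces,
                       vertices_per_face)

def unpack_model_metadata(payload: bytes) -> dict:
    """Recover the counts from the binary payload into a named mapping."""
    return dict(zip(FIELDS, struct.unpack(">4I", payload)))

blob = pack_model_metadata(1_000_000, 30_000, 10_000, 3)
meta = unpack_model_metadata(blob)  # round-trips the four counts
```

A real implementation would wrap this payload in an EBML element with a proper ID and length field; the point here is only that both point-cloud and mesh counts fit naturally in one small binary blob.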
- the content configuration selection information can be used even when the client device does not have the ability to reproduce the 6DoF content of the specific three-dimensional model data.
- a content creator may distribute a content configuration using one three-dimensional model data and a spherical image.
- Even in such a case, the client device can determine whether or not it can play the content by using the content configuration selection information.
- FIG. 22 is a hardware configuration diagram of the computer.
- The file generation device 1 and the client device 2 can be realized by the computer 90 shown in FIG. 22.
- the processor 91, the memory 92, the network interface 93, the non-volatile storage 94, the input / output interface 95, and the display interface 86 are connected to each other via a bus.
- External devices such as an input device, an output device, a storage device, and a drive are connected to the input / output interface 95.
- the input device is, for example, a keyboard, a mouse, a microphone, a touch panel, an input terminal, or the like.
- the output device is, for example, a speaker, an output terminal, or the like.
- the storage device is, for example, a hard disk, a RAM (Random Access Memory) disk, or the like.
- the drive drives removable media such as magnetic disks, optical disks, magneto-optical disks, or semiconductor memories.
- a display 98 which is a display device, is connected to the display interface 96.
- the network interface 93 is connected to an external network.
- the file generation device 1 and the client device 2 are connected to each other via the network interface 93. Further, the file generation device 1 and the client device 2 are connected to the Web server 3 via the network interface 93.
- the non-volatile storage 94 is a built-in auxiliary storage device such as a hard disk or SSD (Solid State Drive).
- In the computer 90, the processor 91, for example, loads the program stored in the non-volatile storage 94 into the memory 92 via the bus and executes it, whereby the series of processes described above is performed.
- the memory 92 also appropriately stores data and the like necessary for the processor 91 to execute various processes.
- The program executed by the processor 91 can be provided by being recorded on removable media such as package media, for example.
- the program can be installed in the non-volatile storage 94 via the input / output interface 95 by mounting the removable media in the drive which is the external device 97.
- the program can also be provided via wired or wireless transmission media such as local area networks, the Internet, and digital satellite broadcasting. In that case, the program can be received at the network interface 93 and installed in the non-volatile storage 94.
- this program can be installed in advance in the non-volatile storage 94.
- An information processing device comprising: a preprocessing unit that generates, for one or a plurality of contents each having a content configuration expressing a virtual space composed of one or more three-dimensional objects and their spatial arrangement information, content configuration selection information for determining whether or not reproduction processing of each content is possible; and a file generation unit that generates a file containing the data of the virtual space and the content configuration selection information.
- The preprocessing unit includes, in the content configuration selection information, availability identification information for determining whether or not decoding and rendering of the entire virtual space can be performed in the reproduction processing device that reproduces the content.
- the information processing device stores the content configuration selection information in the Switch node of the scene description.
- the information processing device stores the content configuration selection information for each content configuration in the scene description.
- The information processing device according to the appendix (2), wherein the preprocessing unit generates the content configuration selection information as metadata, and the file generation unit generates a content file storing the content configuration selection information.
- the information processing device according to the appendix (6), wherein the file generation unit generates the content file as an ISOBMFF file and stores the content configuration selection information in the 6DoFContentStructBox in the SampleEntry of the content file.
- The information processing device according to the appendix (2), wherein the preprocessing unit has the content configuration selection information for each group for which the content configuration is determined in advance, and uses the content configuration selection information of the group to which each content belongs as the content configuration selection information of that content.
- The information processing device according to the appendix (2), wherein the preprocessing unit generates the content configuration selection information as metadata, and the file generation unit generates metadata storing the content configuration selection information.
- (10) The information processing device according to the appendix (9), wherein the file generation unit generates the content file as an MPD file and stores the content configuration selection information in the AdaptationSet of the MPD file.
- (11) The information processing device according to the appendix (2), wherein the preprocessing unit generates a different scene description for each content configuration for a plurality of the contents having different content configurations, and the file generation unit stores the content configuration selection information in the AdaptationSet of the MPD file or the 6DoFContentStructBox in the SampleEntry of the ISOBMFF file for each scene description.
- A reproduction processing device comprising:
- a file acquisition unit that acquires, for one or a plurality of contents each having a content configuration expressing a virtual space composed of one or more three-dimensional objects and their spatial arrangement information, a file containing content configuration selection information for determining whether or not reproduction processing of each content is possible and the data of the contents;
- a file processing unit that acquires the content configuration selection information from the file acquired by the file acquisition unit, determines, based on the acquired content configuration selection information, whether or not reproduction processing of each content is possible, and selects the content to be reproduced; and
- a reproduction unit that reproduces the content selected by the file processing unit.
- For one or a plurality of contents each having a content configuration expressing a virtual space composed of one or more three-dimensional objects and their spatial arrangement information, whether or not reproduction processing of each content is possible
Abstract
Description
Non-Patent Document 2: "ISO/IEC 14496-12:2015", Information technology. Coding of audio-visual objects. Part 12: ISO base media file format, 2015-12
Non-Patent Document 3: "ISO/IEC 23009-1:2014", Information technology. Dynamic adaptive streaming over HTTP (DASH), Part 1: Media presentation description and segment formats, 2014-5
The present disclosure will be described in the order of the items shown below.
1.1 Modification (1) of the first embodiment
1.2 Modification (2) of the first embodiment
1.3 Modification (3) of the first embodiment
1.4 Modification (4) of the first embodiment
1.5 Modification (5) of the first embodiment
2. Second embodiment
2.1 Modification (1) of the second embodiment
3. Third embodiment
3.1 Modification (1) of the third embodiment
3.2 Modification (2) of the third embodiment
4. Fourth embodiment
In the delivery of 6DoF content having object-based, space-based, and mixed representation content configurations, for example, a scene description file, an MPD file, and 3D model data files are configured and delivered as shown in FIG. 1. FIG. 1 is a diagram showing the configuration of 6DoF content. With these current content configurations, the client may determine, using the following three indices, whether or not the reproduction capability of the client device itself can be fully exercised.
FIG. 2 is a system configuration diagram of an example of a delivery system. The delivery system 100 includes a file generation device 1 that is an information processing device, a client device 2 that is a reproduction processing device, and a Web server 3. The file generation device 1, the client device 2, and the Web server 3 are connected to a network 4, and can communicate with one another via the network 4. Here, although one of each device is shown in FIG. 2, the delivery system 100 may include a plurality of file generation devices 1 and a plurality of client devices 2.
Next, the details of the file generation device 1 will be described. FIG. 3 is a block diagram of the file generation device. As shown in FIG. 3, the file generation device 1, which is an information processing device, has a generation processing unit 10 and a control unit 11. The control unit 11 executes processing related to the control of the generation processing unit 10; for example, the control unit 11 performs integrated control such as the operation timing of each unit of the generation processing unit 10. The generation processing unit 10 has a data input unit 101, a preprocessing unit 102, an encoding unit 103, a file generation unit 104, and a transmission unit 105.
FIG. 6 is a block diagram of the client device. As shown in FIG. 6, the client device 2 has a reproduction processing unit 20 and a control unit 21. The control unit 21 controls the operation of each unit of the reproduction processing unit 20; for example, the control unit 21 performs integrated control of the operation timing of each unit of the reproduction processing unit 20. The reproduction processing unit 20 has a file acquisition unit 201, a measurement unit 202, a file processing unit 203, a decoding processing unit 204, a display control unit 205, a display information generation unit 206, and a display unit 207.
Next, the flow of the file generation processing by the file generation device 1 according to the first embodiment will be described in detail with reference to FIG. 7. FIG. 7 is a flowchart of the file generation processing by the file generation device according to the first embodiment.
Next, the flow of the reproduction processing executed by the client device 2 according to the present embodiment will be described with reference to FIG. 8. FIG. 8 is a flowchart of the reproduction processing executed by the client device according to the first embodiment.
The file generation device 1 according to this modification differs from the first embodiment in that the content configuration selection information is stored in the SampleEntry of the ISOBMFF file of the scene description.
The file generation device 1 according to this modification differs from the first embodiment in that the values of the content configuration selection information are determined in advance and grouped, and the content configuration selection information of each content configuration is represented by indicating the group to which that content configuration belongs.
The file generation device 1 according to this modification differs from the first embodiment in that Matroska Media Container (http://www.matroska.org/) is used instead of ISOBMFF as the file format for transmitting the three-dimensional model data.
The file generation device 1 according to this modification differs from the first embodiment in that the content configuration selection information is stored for each content configuration in the scene description. FIG. 13 is a diagram for explaining the method of storing the content configuration selection information according to the modification (4) of the first embodiment.
The file generation device 1 according to this modification differs from the first embodiment in that the content configuration selection information is stored in the AdaptationSet in the MPD file indicating the access information to the scene description.
The file generation device 1 according to the present embodiment differs from the first embodiment in that, with a file configuration in which the scene description of each content configuration is a separate file, the content configuration selection information is stored in the AdaptationSet of each scene description. The file generation device 1 according to the present embodiment is also represented by the block diagram of FIG. 3, and the client device 2 according to the present embodiment is also represented by the block diagram of FIG. 6. In the following description, the description of the operation of each unit similar to that of the first embodiment may be omitted.
The file generation device 1 according to this modification differs from the second embodiment in that the content configuration selection information is stored in the SampleEntry of the ISOBMFF file of the scene description.
In the delivery of 6DoF content, adaptation of, for example, the maximum processing capability, the bit rate, and the definition is possible. Within one content configuration, the reproduction processing capability required of the client changes due to, for example, bit-rate adaptation for each piece of three-dimensional model data. For example, if a stream with the largest variation in the number of mesh faces of the three-dimensional model data is selected in a certain content configuration, a high reproduction processing capability is required of the client, whereas if the smallest number of faces is selected, the reproduction processing capability required of the client can be kept low.
The file generation device 1 according to this modification differs from the third embodiment in that the content configuration selection information is stored in the SampleEntry of the ISOBMFF file of the scene description.
The file generation device 1 according to this modification differs from the second embodiment in that the content configuration selection information is stored in the AdaptationSet in the MPD file indicating the access information to the scene description.
The file generation device 1 according to the present embodiment differs from the first to third embodiments in that the configuration information of each piece of three-dimensional model data is stored in the Representation included in the AdaptationSet of the three-dimensional model data in the MPD. The file generation device 1 according to the present embodiment is also represented by the block diagram of FIG. 3, and the client device 2 according to the present embodiment is also represented by the block diagram of FIG. 6. In the following description, the description of the operation of each unit similar to that of the first embodiment may be omitted.
FIG. 22 is a hardware configuration diagram of the computer. The file generation device 1 and the client device 2 can be realized by the computer 90 shown in FIG. 22. In the computer 90, a processor 91, a memory 92, a network interface 93, a non-volatile storage 94, an input/output interface 95, and a display interface 86 are connected to one another via a bus.
a file generation unit that generates a file containing the data of the virtual space and the content configuration selection information;
an information processing device comprising the above units.
(2) The information processing device according to the appendix (1), wherein the preprocessing unit includes, in the content configuration selection information, availability identification information for determining whether or not the reproduction processing device that reproduces the content can decode and render the entire virtual space.
(3) The information processing device according to the appendix (1) or (2), wherein the preprocessing unit stores the content configuration selection information in a scene description.
(4) The information processing device according to the appendix (3), wherein the preprocessing unit stores the content configuration selection information in the Switch node of the scene description.
(5) The information processing device according to the appendix (3), wherein the preprocessing unit stores the content configuration selection information for each content configuration in the scene description.
(6) The information processing device according to the appendix (2), wherein the preprocessing unit generates the content configuration selection information as metadata, and the file generation unit generates a content file storing the content configuration selection information.
(7) The information processing device according to the appendix (6), wherein the file generation unit generates the content file as an ISOBMFF file and stores the content configuration selection information in the 6DoFContentStructBox in the SampleEntry of the content file.
(8) The information processing device according to the appendix (2), wherein the preprocessing unit has the content configuration selection information for each group for which the content configuration is determined in advance, and uses the content configuration selection information of the group to which each content belongs as the content configuration selection information of that content.
(9) The information processing device according to the appendix (2), wherein the preprocessing unit generates the content configuration selection information as metadata, and the file generation unit generates metadata storing the content configuration selection information.
(10) The information processing device according to the appendix (9), wherein the file generation unit generates the content file as an MPD file and stores the content configuration selection information in the AdaptationSet of the MPD file.
(11) The information processing device according to the appendix (2), wherein the preprocessing unit generates a different scene description for each content configuration for a plurality of the contents having different content configurations, and the file generation unit stores the content configuration selection information in the AdaptationSet of the MPD file or the 6DoFContentStructBox in the SampleEntry of the ISOBMFF file for each scene description.
(12) The information processing device according to any one of the appendices (1) to (11), wherein the preprocessing unit uses, as the content configuration information, information indicating a reproduction processing capability with which the content can be reproduced.
(13) The information processing device according to the appendix (12), wherein the preprocessing unit includes, in the content configuration selection information, information indicating a reproduction processing capability with which a part of the content can be reproduced.
(14) An information processing method that causes a computer to execute processing of: generating, for one or a plurality of contents each having a content configuration expressing a virtual space composed of one or more three-dimensional objects and their spatial arrangement information, content configuration selection information for determining whether or not reproduction processing of each content is possible; and generating a file containing the data of the virtual space and the content configuration selection information.
(15) A reproduction processing device comprising: a file acquisition unit that acquires, for one or a plurality of contents each having a content configuration expressing a virtual space composed of one or more three-dimensional objects and their spatial arrangement information, a file containing content configuration selection information for determining whether or not reproduction processing of each content is possible and the data of the contents; a file processing unit that acquires the content configuration selection information from the file acquired by the file acquisition unit, determines, based on the acquired content configuration selection information, whether or not reproduction processing of each content is possible, and selects the content to be reproduced; and a reproduction unit that reproduces the content selected by the file processing unit.
(16)1以上の3次元オブジェクトとそれらの空間配置情報から構成される仮想空間を表現するコンテンツ構成を有する1つ又は複数のコンテンツについて、各前記コンテンツの再生処理が可能であるか否かを判定するためのコンテンツ構成選択情報を含むファイルを取得し、
取得した前記ファイルから前記コンテンツ構成選択情報を取得し、取得した前記コンテンツ構成選択情報を基に、各前記コンテンツの再生処理が可能であるか否かを判定し、再生する前記コンテンツを選択し、
選択された前記コンテンツのデータを取得し、
取得した前記データを用いて選択した前記コンテンツを再生する
処理をコンピュータに実行させる再生処理方法。
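Supplementary notes (9) and (10) describe carrying the content configuration selection information in an AdaptationSet of an MPD file. The sketch below illustrates how a client could read such signaling and keep only the AdaptationSets it can decode; the `schemeIdUri` (`urn:example:6dof:contentStruct`) and the `objects=…,maxDecoders=…` value layout are invented for illustration and are not defined by the DASH specification or this publication.

```python
import xml.etree.ElementTree as ET

# Minimal MPD with a hypothetical descriptor carrying content configuration
# selection information; the schemeIdUri and "value" layout are illustrative.
MPD_XML = """<?xml version="1.0"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet id="1">
      <SupplementalProperty schemeIdUri="urn:example:6dof:contentStruct"
                            value="objects=4,maxDecoders=2"/>
    </AdaptationSet>
    <AdaptationSet id="2">
      <SupplementalProperty schemeIdUri="urn:example:6dof:contentStruct"
                            value="objects=16,maxDecoders=8"/>
    </AdaptationSet>
  </Period>
</MPD>"""

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}


def selectable_adaptation_sets(mpd_xml: str, max_decoders: int) -> list:
    """Return the ids of AdaptationSets whose (hypothetical) content
    configuration fits the client's decoding capability."""
    root = ET.fromstring(mpd_xml)
    selected = []
    for aset in root.iterfind(".//mpd:AdaptationSet", NS):
        for prop in aset.iterfind("mpd:SupplementalProperty", NS):
            if prop.get("schemeIdUri") != "urn:example:6dof:contentStruct":
                continue
            params = dict(kv.split("=") for kv in prop.get("value", "").split(","))
            if int(params.get("maxDecoders", "0")) <= max_decoders:
                selected.append(aset.get("id"))
    return selected


print(selectable_adaptation_sets(MPD_XML, max_decoders=4))  # ['1']
```

A client with four available decoder instances would select only AdaptationSet "1" here; with eight, both would qualify, mirroring the selection step in notes (15) and (16).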
2 Client device
3 Web server
4 Network
10 Generation processing unit
11 Control unit
20 Reproduction processing unit
21 Control unit
101 Data input unit
102 Preprocessing unit
103 Encoding unit
104 File generation unit
105 Transmission unit
201 File acquisition unit
202 Measurement unit
203 File processing unit
204 Decoding processing unit
205 Display control unit
206 Display information generation unit
207 Display unit
Claims (16)
- An information processing apparatus comprising:
a preprocessing unit that generates, for one or more contents each having a content configuration representing a virtual space composed of one or more three-dimensional objects and spatial arrangement information thereof, content configuration selection information for determining whether reproduction processing of each of the contents is possible; and
a file generation unit that generates a file including data of the virtual space and the content configuration selection information. - The information processing apparatus according to claim 1, wherein the preprocessing unit includes, in the content configuration selection information, availability identification information for determining whether decoding and rendering of the entire virtual space can be executed in a reproduction processing apparatus that reproduces the content.
- The information processing apparatus according to claim 2, wherein the preprocessing unit stores the content configuration selection information in a scene description.
- The information processing apparatus according to claim 3, wherein the preprocessing unit stores the content configuration selection information in a Switch node of the scene description.
- The information processing apparatus according to claim 3, wherein the preprocessing unit stores the content configuration selection information for each content configuration in the scene description.
- The information processing apparatus according to claim 2, wherein the preprocessing unit generates the content configuration selection information as metadata, and
the file generation unit generates a content file storing the content configuration selection information. - The information processing apparatus according to claim 6, wherein the file generation unit generates the content file as an ISOBMFF (ISO Base Media File Format) file and stores the content configuration selection information in a 6DoFContentStructBox in a SampleEntry of the content file.
- The information processing apparatus according to claim 2, wherein the preprocessing unit holds the content configuration selection information for each group whose content configuration is determined in advance, and uses the content configuration selection information of the group to which each content belongs as the content configuration selection information of that content.
- The information processing apparatus according to claim 2, wherein the preprocessing unit generates the content configuration selection information as metadata, and
the file generation unit generates a metadata file storing the content configuration selection information. - The information processing apparatus according to claim 9, wherein the file generation unit generates the metadata file as an MPD (Media Presentation Description) file and stores the content configuration selection information in an AdaptationSet of the MPD file.
- The information processing apparatus according to claim 2, wherein the preprocessing unit generates, for a plurality of contents having different content configurations, a different scene description for each content configuration, and
the file generation unit stores the content configuration selection information in an AdaptationSet of an MPD file or a 6DoFContentStructBox in a SampleEntry of an ISOBMFF file for each of the scene descriptions. - The information processing apparatus according to claim 1, wherein the preprocessing unit uses, as the content configuration selection information, information indicating a reproduction processing capability with which the content can be reproduced.
- The information processing apparatus according to claim 12, wherein the preprocessing unit includes, in the content configuration selection information, information indicating a reproduction processing capability with which a portion of the content can be reproduced.
- An information processing method causing a computer to execute processing of: generating, for one or more contents each having a content configuration representing a virtual space composed of one or more three-dimensional objects and spatial arrangement information thereof, content configuration selection information for determining whether reproduction processing of each of the contents is possible; and
generating a file including data of the virtual space and the content configuration selection information. - A reproduction processing apparatus comprising: a file acquisition unit that acquires a file including data of the contents and content configuration selection information for determining, for one or more contents each having a content configuration representing a virtual space composed of one or more three-dimensional objects and spatial arrangement information thereof, whether reproduction processing of each of the contents is possible;
a file processing unit that acquires the content configuration selection information from the file acquired by the file acquisition unit, determines, based on the acquired content configuration selection information, whether reproduction processing of each of the contents is possible, and selects the content to be reproduced; and
a reproduction unit that reproduces the content selected by the file processing unit. - A reproduction processing method causing a computer to execute processing of: acquiring a file including content configuration selection information for determining, for one or more contents each having a content configuration representing a virtual space composed of one or more three-dimensional objects and spatial arrangement information thereof, whether reproduction processing of each of the contents is possible;
acquiring the content configuration selection information from the acquired file, determining, based on the acquired content configuration selection information, whether reproduction processing of each of the contents is possible, and selecting the content to be reproduced;
acquiring data of the selected content; and
reproducing the selected content using the acquired data.
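Claims 7 and 11 place the content configuration selection information in a 6DoFContentStructBox inside a SampleEntry of an ISOBMFF file. The internal layout of that box is not given in this text, so the sketch below parses only the generic box header (32-bit size plus four-character type) defined in ISO/IEC 14496-12 and treats the payload as opaque; the `6dcs` fourcc and the example payload are invented for illustration.

```python
import struct


def parse_boxes(buf: bytes):
    """Iterate (type, payload) pairs over a flat run of ISOBMFF boxes.
    Only the 32-bit size + fourcc header from ISO/IEC 14496-12 is assumed;
    the box payload (e.g. a 6DoFContentStructBox body) is treated as opaque."""
    off = 0
    while off + 8 <= len(buf):
        size, fourcc = struct.unpack_from(">I4s", buf, off)
        if size < 8 or off + size > len(buf):
            raise ValueError("malformed box")
        yield fourcc.decode("ascii"), buf[off + 8 : off + size]
        off += size


def make_box(fourcc: str, payload: bytes) -> bytes:
    """Serialize one box with the standard size + fourcc header."""
    return struct.pack(">I4s", 8 + len(payload), fourcc.encode("ascii")) + payload


# Hypothetical SampleEntry tail: a '6dcs' box (invented fourcc) whose payload
# would carry the content configuration selection information.
sample = make_box("6dcs", b"\x01objects=4")
boxes = dict(parse_boxes(sample))
print(boxes["6dcs"])  # b'\x01objects=4'
```

A player following claim 15 could scan the SampleEntry's child boxes this way, read the selection information before committing decoder resources, and skip contents whose configuration exceeds its capability.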
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202080044831.2A CN114026875A (zh) | 2019-06-25 | 2020-03-31 | Information processing apparatus, information processing method, reproduction processing apparatus, and reproduction processing method |
EP20831253.8A EP3982638A4 (en) | 2019-06-25 | 2020-03-31 | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, REPRODUCTION PROCESSING DEVICE AND REPRODUCTION PROCESSING METHOD |
US17/617,014 US20220239994A1 (en) | 2019-06-25 | 2020-03-31 | Information processing apparatus, information processing method, reproduction processing apparatus, and reproduction processing method |
JP2021527391A JP7544048B2 (ja) | 2019-06-25 | 2020-03-31 | Information processing apparatus, information processing method, reproduction processing apparatus, and reproduction processing method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962866430P | 2019-06-25 | 2019-06-25 | |
US62/866,430 | 2019-06-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020261689A1 true WO2020261689A1 (ja) | 2020-12-30 |
Family
ID=74060549
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2020/014884 WO2020261689A1 (ja) | Information processing apparatus, information processing method, reproduction processing apparatus, and reproduction processing method | 2019-06-25 | 2020-03-31 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20220239994A1 (ja) |
EP (1) | EP3982638A4 (ja) |
JP (1) | JP7544048B2 (ja) |
CN (1) | CN114026875A (ja) |
WO (1) | WO2020261689A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022220291A1 (ja) * | 2021-04-15 | 2022-10-20 | Sony Group Corporation | Information processing apparatus and method |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12088667B1 (en) | 2023-03-30 | 2024-09-10 | Dropbox, Inc. | Generating and managing multilocational data blocks |
US12093299B1 (en) | 2023-03-30 | 2024-09-17 | Dropbox, Inc. | Generating and summarizing content blocks within a virtual space interface |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019064853A1 (ja) * | 2017-09-26 | 2019-04-04 | Canon Inc. | Information processing apparatus, information providing apparatus, control method, and program |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4789604B2 (ja) | 2005-12-02 | 2011-10-12 | Sharp Corporation | Content switching determination system, switching instruction terminal, and content switching determination method |
KR101167246B1 (ko) * | 2007-07-23 | 2012-07-23 | Samsung Electronics Co., Ltd. | Three-dimensional content reproduction apparatus and control method thereof |
US8553028B1 (en) | 2007-10-29 | 2013-10-08 | Julian Michael Urbach | Efficiently implementing and displaying independent 3-dimensional interactive viewports of a virtual world on multiple client devices |
CN102318351B (zh) * | 2009-02-17 | 2016-06-15 | Samsung Electronics Co., Ltd. | Graphic image processing method and apparatus |
US8957920B2 (en) * | 2010-06-25 | 2015-02-17 | Microsoft Corporation | Alternative semantics for zoom operations in a zoomable scene |
US9716920B2 (en) | 2010-08-05 | 2017-07-25 | Qualcomm Incorporated | Signaling attributes for network-streamed video data |
FR2974474B1 (fr) | 2011-04-19 | 2017-11-17 | Prologue | Methods and apparatuses for producing and processing representations of multimedia scenes |
CN103635891B (zh) | 2011-05-06 | 2017-10-27 | Magic Leap, Inc. | Massive simultaneous remote digital presence world |
RU2621697C2 (ru) * | 2012-08-31 | 2017-06-07 | Funke Digital TV Guide GmbH | Electronic media content guide |
US9111378B2 (en) | 2012-10-31 | 2015-08-18 | Outward, Inc. | Virtualizing content |
EP3148200B1 (en) * | 2014-06-30 | 2020-06-17 | Sony Corporation | Information processing device and method selecting content files based on encoding parallelism type |
RU2728534C2 (ru) * | 2016-02-12 | 2020-07-30 | Sony Corporation | Information processing apparatus and information processing method |
KR102655630B1 (ko) * | 2018-10-08 | 2024-04-08 | Samsung Electronics Co., Ltd. | Method and apparatus for generating a media file including three-dimensional video content, and method and apparatus for reproducing three-dimensional video content |
EP4013042A1 (en) * | 2019-09-30 | 2022-06-15 | Sony Group Corporation | Information processing device, reproduction processing device, and information processing method |
-
2020
- 2020-03-31 WO PCT/JP2020/014884 patent/WO2020261689A1/ja active Application Filing
- 2020-03-31 US US17/617,014 patent/US20220239994A1/en active Pending
- 2020-03-31 JP JP2021527391A patent/JP7544048B2/ja active Active
- 2020-03-31 CN CN202080044831.2A patent/CN114026875A/zh active Pending
- 2020-03-31 EP EP20831253.8A patent/EP3982638A4/en active Pending
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019064853A1 (ja) * | 2017-09-26 | 2019-04-04 | Canon Inc. | Information processing apparatus, information providing apparatus, control method, and program |
Non-Patent Citations (4)
Title |
---|
"Information technology. Coding of audio-visual objects", ISO/IEC 14496-12:2015, December 2015 (2015-12-01) |
"Information technology. Coding of audio-visual objects. Part 11: Scene description and application engine", ISO/IEC 14496-11:2015, November 2015 (2015-11-01)
"Information technology. Dynamic adaptive streaming over HTTP(DASH), Part 1: Media presentation description and segment formats", ISO/IEC 23009-1:2014, May 2014 (2014-05-01) |
See also references of EP3982638A4 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022220291A1 (ja) * | 2021-04-15 | 2022-10-20 | Sony Group Corporation | Information processing apparatus and method |
Also Published As
Publication number | Publication date |
---|---|
JPWO2020261689A1 (ja) | 2020-12-30 |
EP3982638A1 (en) | 2022-04-13 |
JP7544048B2 (ja) | 2024-09-03 |
US20220239994A1 (en) | 2022-07-28 |
EP3982638A4 (en) | 2023-03-29 |
CN114026875A (zh) | 2022-02-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020261689A1 (ja) | Information processing apparatus, information processing method, reproduction processing apparatus, and reproduction processing method | |
US8411758B2 (en) | Method and system for online remixing of digital multimedia | |
WO2019202207A1 (en) | Processing video patches for three-dimensional content | |
KR100976887B1 (ko) | Method and system for creating and applying dynamic media specification creators and applicators | |
JPWO2020137642A1 (ja) | Information processing apparatus and information processing method | |
KR100991583B1 (ko) | Method, computer-readable storage medium, and system for combining edit information with media content | |
US10931930B2 (en) | Methods and apparatus for immersive media content overlays | |
JP7415936B2 (ja) | Information processing apparatus and information processing method | |
JP7439762B2 (ja) | Information processing apparatus, information processing method, and program | |
JP7480773B2 (ja) | Information processing apparatus, information processing method, reproduction processing apparatus, and reproduction processing method | |
JP7501372B2 (ja) | Information processing apparatus and information processing method | |
WO2021065277A1 (ja) | Information processing apparatus, reproduction processing apparatus, and information processing method | |
CN114450966A (zh) | Data model for representation and streaming of heterogeneous immersive media | |
WO2021065605A1 (ja) | Information processing apparatus and information processing method | |
WO2021002338A1 (ja) | Information processing apparatus, information processing method, reproduction processing apparatus, and reproduction processing method | |
WO2007084870A2 (en) | Method and system for recording edits to media content | |
TWI820490B (zh) | Method and system for implementing scene descriptions using derived video tracks | |
WO2021002142A1 (ja) | Information processing apparatus, information processing method, reproduction processing apparatus, and reproduction processing method | |
WO2021140956A1 (ja) | Information processing apparatus and method | |
WO2020145139A1 (ja) | Information processing apparatus and information processing method | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20831253 Country of ref document: EP Kind code of ref document: A1 |
ENP | Entry into the national phase |
Ref document number: 2021527391 Country of ref document: JP Kind code of ref document: A |
NENP | Non-entry into the national phase |
Ref country code: DE |
WWE | Wipo information: entry into national phase |
Ref document number: 2020831253 Country of ref document: EP |
ENP | Entry into the national phase |
Ref document number: 2020831253 Country of ref document: EP Effective date: 20220106 |