WO2023176928A1 - Information processing device and method - Google Patents

Information processing device and method

Info

Publication number
WO2023176928A1
Authority
WO
WIPO (PCT)
Prior art keywords
media
data
description
file
interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2023/010321
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
光浩 平林
遼平 高橋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Group Corp
Original Assignee
Sony Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corp
Priority to CN202380022930.4A (CN118743228A)
Priority to JP2024508252A (JPWO2023176928A1)
Priority to EP23770881.3A (EP4496319A4)
Publication of WO2023176928A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234318 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into objects, e.g. MPEG-4 objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/016 Input arrangements with force or tactile feedback as computer generated output to the user
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44012 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/85406 Content authoring involving a specific file format, e.g. MP4 format

Definitions

  • the present disclosure relates to an information processing device and method, and particularly relates to an information processing device and method that can suppress reduction in playback performance of media data associated with 3D data.
  • glTF (The GL Transmission Format) 2.0 is a format for arranging 3D (three-dimensional) objects in three-dimensional space.
  • In parallel with the standardization of coding and transmission technology for haptic media, studies have begun on technology for handling haptic media in MPEG-I Scene Description (see, for example, Non-Patent Document 5).
  • the present disclosure has been made in view of this situation, and is intended to suppress reduction in playback performance of media data associated with 3D data.
  • An information processing device includes: an acquisition unit that acquires encoded data of dynamic haptic media associated with 3D data to be played, based on a description of a scene description file; a decoding unit that decodes the encoded data and generates data of the haptic media based on the description of the scene description file; a storage unit that stores the haptic media data in a storage area corresponding to an accessor specified by the scene description file; and a generation unit that reads the haptic media data stored in the storage area and generates haptic media information for output based on the description of the scene description file.
  • An information processing method acquires encoded data of dynamic haptic media associated with 3D data to be played, based on a description of a scene description file; decodes the encoded data based on the description of the scene description file to generate data of the haptic media; stores the haptic media data in a storage area corresponding to an accessor specified by the scene description file; and reads the haptic media data stored in the storage area based on the description of the scene description file to generate haptic media information for output.
  • An information processing device includes a file generation unit that generates a scene description file that specifies an accessor for storing dynamic haptic media associated with 3D data in a predetermined storage area.
  • An information processing method generates a scene description file that specifies an accessor for storing dynamic haptic media associated with 3D data in a predetermined storage area.
  • An information processing device includes: an acquisition unit that acquires encoded data of interaction-type media associated with 3D data to be played, based on a description of the interaction-type media included in a scene description file; and a decoding unit that decodes the acquired encoded data and generates data of the interaction-type media based on the description of the scene description file.
  • An information processing method acquires encoded data of interaction-type media associated with 3D data to be played, based on a description of the interaction-type media included in a scene description file; and decodes the acquired encoded data based on the description of the scene description file to generate data of the interaction-type media.
  • An information processing device includes a file generation unit that generates a scene description file that includes a description of interaction-type media associated with 3D data.
  • An information processing method generates a scene description file that includes a description of interaction-type media associated with 3D data.
  • encoded data of dynamic haptic media associated with 3D data to be played is acquired based on the description of the scene description file; the encoded data is decoded based on the description of the scene description file to generate haptic media data; the haptic media data is stored in the storage area corresponding to the accessor specified by the scene description file; and, based on the description of the scene description file, the haptic media data stored in the storage area is read out and haptic media information for output is generated.
  • a scene description file is generated that specifies an accessor for storing dynamic haptic media associated with 3D data in a predetermined storage area.
  • encoded data of the interaction-type media associated with the 3D data to be played is acquired based on the description of the interaction-type media included in the scene description file.
  • the acquired encoded data is decoded based on the description of the scene description file, and interaction media data is generated.
  • a scene description file is generated that includes a description of interaction media associated with 3D data.
  • FIG. 1 is a diagram showing an example of the main configuration of glTF2.0.
  • FIG. 2 is a diagram showing an example of glTF objects and reference relationships.
  • FIG. 3 is a diagram illustrating a description example of a scene description.
  • FIG. 4 is a diagram illustrating a method of accessing binary data.
  • FIG. 5 is a diagram illustrating a description example of a scene description.
  • FIG. 6 is a diagram illustrating the relationship between a buffer object, a buffer view object, and an accessor object.
  • FIG. 7 is a diagram showing a description example of a buffer object, a buffer view object, and an accessor object.
  • FIG. 8 is a diagram illustrating a configuration example of an object of a scene description.
  • FIG. 9 is a diagram illustrating a description example of a scene description.
  • FIG. 10 is a diagram illustrating a method for extending an object.
  • FIG. 11 is a diagram illustrating the configuration of client processing.
  • FIG. 12 is a diagram illustrating a configuration example of an extension for handling timed metadata.
  • FIG. 13 is a diagram illustrating a description example of a scene description.
  • FIG. 14 is a diagram illustrating a description example of a scene description.
  • FIG. 15 is a diagram illustrating a configuration example of an extension for handling timed metadata.
  • FIG. 16 is a diagram showing an example of the main configuration of a client.
  • FIG. 17 is a flowchart illustrating an example of the flow of client processing.
  • FIG. 2 is a diagram illustrating an overview of haptic media encoding.
  • FIG. 3 is a diagram illustrating an example of expanding ISOBMFF for storing haptic media.
  • FIG. 7 is a diagram illustrating an example of expanding scene descriptions for handling haptic media.
  • FIG. 3 is a diagram illustrating an example of how haptic media is played.
  • FIG. 7 is a diagram illustrating an example of expanding scene descriptions for handling haptic media.
  • FIG. 7 is a diagram illustrating an example of specifying an accessor corresponding to a buffer that stores dynamic haptic media.
  • FIG. 3 is a diagram illustrating an example of a description of dynamic haptic media in a scene description.
  • FIG. 3 is a diagram illustrating an example of element semantics.
  • FIG. 3 is a diagram illustrating an example of a scene description regarding interaction-type media.
  • FIG. 3 is a diagram illustrating an example of element semantics.
  • FIG. 3 is a diagram illustrating an example of element semantics.
  • FIG. 3 is a diagram illustrating an example of a scene description regarding interaction-type media.
  • FIG. 3 is a diagram illustrating an example of element semantics.
  • FIG. 2 is a block diagram showing an example of the main configuration of a file generation device.
  • FIG. 3 is a flowchart illustrating an example of the flow of file generation processing.
  • FIG. 2 is a block diagram showing an example of the main configuration of a client device.
  • FIG. 3 is a flowchart illustrating an example of the flow of reproduction processing.
  • FIG. 1 is a block diagram showing an example of the main configuration of a computer.
  • Non-patent document 1 (mentioned above)
  • Non-patent document 2 (mentioned above)
  • Non-patent document 3 (mentioned above)
  • Non-patent document 4 (mentioned above)
  • Non-patent document 5 (mentioned above)
  • the contents described in the above-mentioned non-patent documents and the contents of other documents referred to in the above-mentioned non-patent documents are also the basis for determining support requirements.
  • For example, even if the syntax and terms such as glTF2.0 and its extensions described in Non-Patent Documents 1 to 3 are not directly defined in this disclosure, they are within the scope of this disclosure and satisfy the support requirements of the claims.
  • Similarly, technical terms such as parsing, syntax, and semantics are within the scope of this disclosure and satisfy the support requirements of the claims even if they are not directly defined in this disclosure.
  • As shown in FIG. 1, for example, glTF2.0 is composed of a JSON format file (.glTF), a binary file (.bin), and image files (.png, .jpg, etc.).
  • Binary files store binary data such as geometry and animation.
  • the image file stores data such as texture.
  • the JSON format file is a scene description file written in JSON (JavaScript (registered trademark) Object Notation).
  • a scene description is metadata that describes (an explanation of) a scene of 3D content. This scene description defines what kind of scene it is.
  • A scene description file is a file that stores such a scene description.
  • A JSON format file consists of a list of key (KEY) and value (VALUE) pairs.
  • the key is composed of a character string. Values are composed of numbers, strings, boolean values, arrays, objects, null, etc.
  • Multiple key-value pairs ("KEY":"VALUE") can be grouped together using {} (curly braces).
  • An object grouped in curly braces is also called a JSON object.
  • An example of the format is shown below. "user":{"id":1, "name":"tanaka"}
  • In this example, a JSON object containing the "id":1 pair and the "name":"tanaka" pair is defined as the value corresponding to the key (user).
  • zero or more values can be arrayed using square brackets ([]).
  • This array is also called a JSON array.
  • a JSON object can also be applied as an element of this JSON array.
  • An example of the format is shown below.
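  • As a minimal sketch of the JSON object and JSON array notation described above, the following Python snippet parses a small document with the standard json module. The "users" key and the "yamada" entry are hypothetical and are not taken from the disclosure.

```python
import json

# Hypothetical document: "user" holds a JSON object, "users" holds a JSON
# array whose elements are themselves JSON objects.
text = (
    '{"user": {"id": 1, "name": "tanaka"},'
    ' "users": [{"id": 1, "name": "tanaka"}, {"id": 2, "name": "yamada"}]}'
)

doc = json.loads(text)

# A JSON object maps to a dict, so values are looked up by key.
user_name = doc["user"]["name"]

# A JSON array maps to a list, so elements are looked up by position.
second_id = doc["users"][1]["id"]
```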
  • Figure 2 shows the glTF objects that can be written at the top of a JSON format file and the reference relationships they can have.
  • the long circles in the tree structure shown in FIG. 2 indicate objects, and the arrows between the objects indicate reference relationships.
  • objects such as "scene”, “node”, “mesh”, “camera”, “skin”, “material”, and “texture” are written at the top of the JSON format file.
  • An example of the description of such a JSON format file (scene description) is shown in FIG. 3.
  • The JSON format file 20 in FIG. 3 shows a description example of a part of the top level.
  • Each top-level object 21 in this file is one of the glTF objects shown in FIG. 2.
  • reference relationships between objects are shown as arrows 22. More specifically, the reference relationship is indicated by specifying the index of the element in the array of the referenced object in the property of the higher-level object.
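  • The index-based reference relationship can be sketched as follows. This is an illustration only: the array contents are hypothetical, and the helper meshes_in_scene is not part of glTF or of the disclosure. It simply follows scene → node → mesh references by array index.

```python
# Hypothetical top-level arrays; each reference is the index of an element
# in the array of the referenced object type.
gltf = {
    "scene": 0,
    "scenes": [{"nodes": [0]}],
    "nodes": [{"name": "duck", "mesh": 0}],
    "meshes": [{"primitives": [{"attributes": {"POSITION": 0}}]}],
}


def meshes_in_scene(doc, scene_index):
    """Follow scene -> node -> mesh references by array index."""
    found = []
    for node_index in doc["scenes"][scene_index]["nodes"]:
        node = doc["nodes"][node_index]
        if "mesh" in node:
            # node["mesh"] is an index into the top-level "meshes" array.
            found.append(doc["meshes"][node["mesh"]])
    return found


scene_meshes = meshes_in_scene(gltf, gltf["scene"])
```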
  • FIG. 4 is a diagram illustrating a method of accessing binary data.
  • Binary data is stored in a buffer object. That is, the buffer object indicates information for accessing the binary data (e.g., a URI (Uniform Resource Identifier)).
  • Figure 5 shows a description example of a mesh object (mesh) in a JSON format file.
  • Vertex attributes such as NORMAL, POSITION, TANGENT, and TEXCOORD_0 are defined as keys, and for each attribute, the referenced accessor object is specified as the value.
  • FIG. 6 shows the relationship between the buffer object, buffer view object, and accessor object. A description example of these objects in the JSON format file is shown in FIG. 7.
  • the buffer object 41 is an object that stores information (URI, etc.) for accessing binary data, which is real data, and information indicating the data length (for example, byte length) of the binary data.
  • A of FIG. 7 shows a description example of the buffer object 41.
  • "byteLength":102040 shown in A of FIG. 7 indicates that the byte length of the buffer object 41 is 102040 bytes, as shown in FIG. 6.
  • "uri":"duck.bin" shown in A of FIG. 7 indicates that the URI of the buffer object 41 is "duck.bin", as shown in FIG. 6.
  • the buffer view object 42 is an object that stores information regarding a subset area of binary data specified in the buffer object 41 (that is, information regarding a partial area of the buffer object 41).
  • B in FIG. 7 shows an example of the description of the buffer view object 42.
  • The buffer view object 42 stores, for example, identification information of the buffer object 41 to which the buffer view object 42 belongs, an offset (for example, a byte offset) indicating the position of the buffer view object 42 within the buffer object 41, and a length (for example, a byte length) indicating the data length of the buffer view object 42.
  • This information is written for each buffer view object (that is, for each subset area).
  • For example, the information "buffer":0, "byteLength":25272, and "byteOffset":0 shown in the upper part of B in FIG. 7 is information about the first buffer view object 42 (bufferView[0]) shown within the buffer object 41 in FIG. 6.
  • Likewise, the information "buffer":0, "byteLength":76768, and "byteOffset":25272 shown in the lower part of B in FIG. 7 is information about the second buffer view object 42 (bufferView[1]) shown within the buffer object 41 in FIG. 6.
  • "buffer":0 of the first buffer view object 42 (bufferView[0]) shown in B of FIG. 7 indicates that the identification information of the buffer object 41 to which that buffer view object belongs is "0" (Buffer[0]). Further, "byteLength":25272 indicates that the byte length of the buffer view object 42 (bufferView[0]) is 25272 bytes, and "byteOffset":0 indicates that its byte offset is 0 bytes.
  • "buffer":0 of the second buffer view object 42 (bufferView[1]) shown in B of FIG. 7 indicates that the identification information of the buffer object 41 to which that buffer view object belongs is "0" (Buffer[0]). Further, "byteLength":76768 indicates that the byte length of the buffer view object 42 (bufferView[1]) is 76768 bytes, and "byteOffset":25272 indicates that its byte offset is 25272 bytes.
  • the accessor object 43 is an object that stores information regarding how to interpret the data of the buffer view object 42.
  • C in FIG. 7 shows a description example of the accessor object 43.
  • The accessor object 43 stores, for example, identification information of the buffer view object 42 to which the accessor object 43 belongs, an offset (for example, a byte offset) indicating the position of the buffer view object 42 within the buffer object 41, the component type of the buffer view object 42, the number of data items stored in the buffer view object 42, and the type of data stored in the buffer view object 42. This information is written for each buffer view object.
  • In the example of C in FIG. 7, information such as "bufferView":0, "byteOffset":0, "componentType":5126, "count":2106, and "type":"VEC3" is shown.
  • "bufferView":0 indicates that the identification information of the buffer view object 42 to which the accessor object 43 belongs is "0" (bufferView[0]), as shown in FIG. 6.
  • "byteOffset":0 indicates that the byte offset of the buffer view object 42 (bufferView[0]) is 0 bytes.
  • "componentType":5126 indicates that the component type is the FLOAT type (an OpenGL macro constant).
  • "count":2106 indicates that the number of data items stored in the buffer view object 42 (bufferView[0]) is 2106.
  • "type":"VEC3" indicates that (the type of) data stored in the buffer view object 42 (bufferView[0]) is a three-dimensional vector.
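  • To make the buffer, buffer view, and accessor relationship concrete, the following sketch reads the accessor of the FIG. 7 example with Python's struct module. It reuses the values quoted above (102040, 25272, 76768, 5126, 2106, VEC3); the buffer contents, the little-endian tightly packed layout without a stride, and the helper read_accessor are assumptions made for illustration.

```python
import struct

BYTE_LENGTH = 102040
# Stand-in for the contents of duck.bin (all zeros), with one known vector
# written at the start so the result can be inspected.
buffer_data = bytearray(BYTE_LENGTH)
struct.pack_into("<3f", buffer_data, 0, 1.0, 2.0, 3.0)

buffer_views = [
    {"buffer": 0, "byteLength": 25272, "byteOffset": 0},
    {"buffer": 0, "byteLength": 76768, "byteOffset": 25272},
]
accessor = {"bufferView": 0, "byteOffset": 0,
            "componentType": 5126, "count": 2106, "type": "VEC3"}

COMPONENT_FORMAT = {5126: "f"}  # 5126 is the OpenGL FLOAT constant (4 bytes)
COMPONENTS_PER_ELEMENT = {"SCALAR": 1, "VEC2": 2, "VEC3": 3, "VEC4": 4}


def read_accessor(acc, views, buffers):
    """Return the accessor's elements as tuples (assumes tightly packed data)."""
    view = views[acc["bufferView"]]
    raw = buffers[view["buffer"]]
    n = COMPONENTS_PER_ELEMENT[acc["type"]]
    fmt = "<" + str(n) + COMPONENT_FORMAT[acc["componentType"]]
    stride = struct.calcsize(fmt)
    if acc["byteOffset"] + acc["count"] * stride > view["byteLength"]:
        raise ValueError("accessor exceeds its buffer view")
    start = view["byteOffset"] + acc["byteOffset"]
    return [struct.unpack_from(fmt, raw, start + i * stride)
            for i in range(acc["count"])]


positions = read_accessor(accessor, buffer_views, [buffer_data])
```

  • Note that 2106 three-component FLOAT vectors occupy 2106 × 3 × 4 = 25272 bytes, which matches the byte length of bufferView[0], and that 25272 + 76768 = 102040 matches the byte length of the buffer.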
  • a point cloud is 3D content that represents a three-dimensional structure (three-dimensional object) as a collection of many points.
  • Point cloud data is composed of position information (also referred to as geometry) and attribute information (also referred to as attribute) for each point.
  • Attributes can contain arbitrary information.
  • the attributes may include color information, reflectance information, normal line information, etc. of each point. In this way, the point cloud has a relatively simple data structure, and by using a sufficiently large number of points, any three-dimensional structure can be expressed with sufficient accuracy.
  • FIG. 8 is a diagram illustrating an example of the configuration of objects in a scene description when the point cloud is static.
  • FIG. 9 is a diagram showing an example of the scene description.
  • the mode of the primitives object is specified as 0, indicating that data is treated as a point in a point cloud.
  • An accessor to a buffer that stores the position information of a point (Point) is specified.
  • Likewise, an accessor to a buffer that stores the color information of a point (Point) is specified. There may be one buffer and one buffer view (the data may be stored in one file).
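  • A static point cloud primitive of this kind might be described as in the sketch below. The attribute names POSITION and COLOR_0 are the standard glTF 2.0 attribute semantics and are assumed here; the accessor indices and the helper point_cloud_accessors are hypothetical.

```python
POINTS_MODE = 0          # glTF primitive mode 0 means POINTS
DEFAULT_MODE = 4         # glTF default when "mode" is omitted (TRIANGLES)

# Hypothetical mesh: one primitive whose attributes reference accessor 0
# (positions) and accessor 1 (colors).
mesh = {
    "primitives": [
        {"mode": POINTS_MODE, "attributes": {"POSITION": 0, "COLOR_0": 1}}
    ]
}


def point_cloud_accessors(mesh_object):
    """Return (position accessor, color accessor) for each POINTS primitive."""
    result = []
    for primitive in mesh_object["primitives"]:
        if primitive.get("mode", DEFAULT_MODE) == POINTS_MODE:
            attributes = primitive["attributes"]
            result.append((attributes["POSITION"], attributes.get("COLOR_0")))
    return result


pairs = point_cloud_accessors(mesh)
```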
  • Each glTF2.0 object can store newly defined objects in an extension object.
  • FIG. 10 shows a description example for a newly defined object (ExtensionExample). As shown in FIG. 10, when a newly defined extension is used, the extension object name (ExtensionExample in the example of FIG. 10) is written in "extensionsUsed" and "extensionsRequired". This indicates that the extension is one that is used, or one that is required for loading, respectively.
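  • The declaration of an extension can be sketched as follows. In glTF 2.0 the top-level arrays are spelled extensionsUsed and extensionsRequired; the payload of ExtensionExample, the SUPPORTED set, and the helper missing_required are hypothetical and only illustrate how a loader might check that every required extension is supported.

```python
# Hypothetical scene description fragment declaring and using an extension.
doc = {
    "extensionsUsed": ["ExtensionExample"],
    "extensionsRequired": ["ExtensionExample"],
    "nodes": [{"extensions": {"ExtensionExample": {"key": "value"}}}],
}

# Extensions this (hypothetical) loader understands.
SUPPORTED = {"ExtensionExample"}


def missing_required(document, supported):
    """List required extensions that the loader does not support."""
    return [name for name in document.get("extensionsRequired", [])
            if name not in supported]


unsupported = missing_required(doc, SUPPORTED)
```

  • A loader would normally refuse to load the file when this list is not empty, since a required extension cannot be ignored.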
  • the client device acquires a scene description, acquires 3D object data based on the scene description, and generates a display image using the scene description and 3D object data.
  • a presentation engine, a media access function, etc. perform processing.
  • The presentation engine 51 of the client device 50 acquires the initial value of a scene description and information for updating the scene description (hereinafter also referred to as update information), and generates the scene description for the time to be processed. The presentation engine 51 then analyzes the scene description and identifies the media (video, audio, etc.) to be played, and requests the media access function 52 to acquire the media via the media access API (Application Program Interface).
  • the presentation engine 51 also performs pipeline processing settings, buffer designation, and the like.
  • the media access function 52 acquires various media data requested by the presentation engine 51 from the cloud, local storage, etc.
  • the media access function 52 supplies various data (encoded data) of the acquired media to a pipeline 53.
  • the pipeline 53 decodes various data (encoded data) of the supplied media by pipeline processing, and supplies the decoding results to a buffer 54.
  • the buffer 54 holds various data on the supplied media.
  • the presentation engine 51 performs rendering and the like using various media data held in the buffer 54.
  • Timed media is media data that changes in the time axis direction, such as a moving image in a two-dimensional image.
  • glTF was applicable only to still image data as media data (3D object content). In other words, glTF did not support video media data.
  • When a moving image was handled, animation (a method of switching still images along the time axis) was applied.
  • For MPEG-I Scene Description, it is being considered to apply glTF2.0, use a JSON format file as the scene description, and extend glTF so that it can handle timed media (e.g., video data) as media data.
  • the following extensions are made, for example.
  • FIG. 12 is a diagram illustrating an extension for handling timed media.
  • the MPEG media object (MPEG_media) is an extension of glTF, and is an object that specifies attributes of MPEG media such as video data, such as uri, track, renderingRate, and startTime.
  • an MPEG texture video object (MPEG_texture_video) is provided as an extension object (extensions) of the texture object (texture).
  • the MPEG texture video object stores information on the accessor corresponding to the buffer object to be accessed.
  • The MPEG texture video object is an object that specifies the index of the accessor corresponding to the buffer in which the texture media specified by the MPEG media object (MPEG_media) is decoded and stored.
  • FIG. 13 is a diagram showing a description example of an MPEG media object (MPEG_media) and an MPEG texture video object (MPEG_texture_video) in a scene description to explain the extension for handling timed media.
  • In the example of FIG. 13, an MPEG texture video object (MPEG_texture_video) is set as an extension object (extensions) of the texture object (texture), and the accessor index ("2" in this example) is specified as the value of the MPEG texture video object.
  • Also in the example of FIG. 13, an MPEG media object (MPEG_media) is set as an extension object (extensions) of glTF in the 7th to 16th lines from the top.
  • Various information regarding the MPEG media, such as its encoding and URI, is stored as the value of the MPEG media object.
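  • The relationship between the two extension objects can be sketched as below. The accessor index 2 follows the example in the text; the property names inside MPEG_media ("media", "alternatives", "mimeType") and the "accessor" property of MPEG_texture_video are assumptions for illustration and are not confirmed by this text, as are the file name and the helper video_texture_accessors.

```python
# Hypothetical scene description fragment.
scene = {
    "extensions": {
        "MPEG_media": {
            "media": [
                {
                    "name": "video0",
                    "alternatives": [
                        {"mimeType": "video/mp4", "uri": "video0.mp4"}
                    ],
                }
            ]
        }
    },
    "textures": [
        {"extensions": {"MPEG_texture_video": {"accessor": 2}}}
    ],
}


def video_texture_accessors(document):
    """Collect the accessor index of every texture carrying MPEG_texture_video."""
    indices = []
    for texture in document.get("textures", []):
        extension = texture.get("extensions", {}).get("MPEG_texture_video")
        if extension is not None:
            indices.append(extension["accessor"])
    return indices


accessor_indices = video_texture_accessors(scene)
```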
  • Each frame of data is decoded and sequentially stored in a buffer, but its position and the like change. The scene description therefore provides a mechanism for storing this changing information so that the renderer can read the data.
  • an MPEG buffer circular object (MPEG_buffer_circular) is provided as an extension object (extensions) of the buffer object (buffer).
  • the MPEG buffer circular object stores information for dynamically storing data within the buffer object. For example, information such as information indicating the data length of the buffer header (bufferHeader) and information indicating the number of frames is stored in this MPEG buffer circular object.
  • the buffer header stores information such as, for example, an index, a timestamp and data length of the frame data to be stored.
  • an MPEG accessor timed object (MPEG_accessor_timed) is provided as an extension object (extensions) of the accessor object (accessor).
  • the buffer view object (bufferView) referred to in the time direction may change (the position may vary). Therefore, information indicating the referenced buffer view object is stored in this MPEG accessor timed object.
  • an MPEG accessor timed object stores information indicating a reference to a buffer view object (bufferView) in which a timed accessor information header is written.
  • the timed accessor information header is, for example, header information that stores information in a dynamically changing accessor object and a buffer view object.
  • FIG. 14 is a diagram showing a description example of an MPEG buffer circular object (MPEG_buffer_circular) and an MPEG accessor timed object (MPEG_accessor_timed) in a scene description to explain the extension for handling timed media.
  • In the example of FIG. 14, an MPEG accessor timed object (MPEG_accessor_timed) is set as an extension object (extensions) of the accessor object (accessor). Parameters such as the index of the buffer view object ("1" in this example), the update rate (updateRate), and immutable information (immutable), together with their values, are specified as the value of the MPEG accessor timed object.
  • Also in the example of FIG. 14, an MPEG buffer circular object (MPEG_buffer_circular) is set as an extension object (extensions) of the buffer object (buffer). Parameters such as the buffer frame count (count), header length (headerLength), and update rate (updateRate), together with their values, are specified as the value of the MPEG buffer circular object.
  • FIG. 15 is a diagram for explaining an extension for handling timed media.
  • FIG. 15 shows an example of the relationship between an MPEG accessor timed object, an MPEG buffer circular object, an accessor object, a buffer view object, and a buffer object.
  • The MPEG buffer circular object of the buffer object stores the information necessary to store time-varying data in the buffer area indicated by the buffer object, such as the buffer frame count (count), header length (headerLength), and update rate (updateRate).
  • Parameters such as an index (index), a timestamp (timestamp), and a data length (length) are stored in a buffer header (bufferHeader), which is the header of the buffer area.
  • The MPEG accessor timed object of the accessor object stores information about the referenced buffer view object, such as the buffer view object index (bufferView), update rate (updateRate), and immutable information (immutable). This MPEG accessor timed object also stores information regarding the buffer view object in which the timed accessor information header to be referenced is stored. A timestamp delta (timestamp_delta), update data for the accessor object, update data for the buffer view object, and the like can be stored in the timed accessor information header.
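  • The idea of a circular buffer whose frames each carry a header with an index, a timestamp, and a data length can be sketched as follows. This is not the layout defined by MPEG-I Scene Description: the 16-byte header format, the class CircularFrameBuffer, and its parameters are assumptions made for illustration.

```python
import struct

# Hypothetical header layout: index (uint32), timestamp (float64), length (uint32).
HEADER = struct.Struct("<IdI")


class CircularFrameBuffer:
    """Fixed number of frame slots; the oldest slot is overwritten first."""

    def __init__(self, count, frame_capacity):
        self.count = count                    # number of frame slots
        self.frame_capacity = frame_capacity  # maximum payload bytes per frame
        self.slot_size = HEADER.size + frame_capacity
        self.data = bytearray(count * self.slot_size)
        self.next_index = 0                   # running frame index

    def write(self, timestamp, payload):
        if len(payload) > self.frame_capacity:
            raise ValueError("payload larger than frame capacity")
        offset = (self.next_index % self.count) * self.slot_size
        HEADER.pack_into(self.data, offset, self.next_index, timestamp, len(payload))
        start = offset + HEADER.size
        self.data[start:start + len(payload)] = payload
        self.next_index += 1

    def read(self, slot):
        offset = slot * self.slot_size
        index, timestamp, length = HEADER.unpack_from(self.data, offset)
        start = offset + HEADER.size
        return index, timestamp, bytes(self.data[start:start + length])


ring = CircularFrameBuffer(count=2, frame_capacity=8)
ring.write(0.0, b"aa")
ring.write(0.5, b"bbbb")
ring.write(1.0, b"c")   # wraps around and overwrites slot 0
```

  • Because each header records the frame index, timestamp, and length, a reader can tell which frame currently occupies a slot even after the writer has wrapped around.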
  • the scene description is spatial arrangement information for arranging one or more 3D objects in 3D space.
  • the contents of this scene description can be updated along the time axis. In other words, the placement of 3D objects can be updated over time.
  • the client processing performed in the client device at that time will be explained.
  • FIG. 16 shows an example of the main configuration of the client device regarding client processing, and FIG. 17 is a flowchart showing an example of the flow of the client processing.
  • The client device includes a presentation engine (hereinafter also referred to as PE) 51, a media access function (Media Access Function, hereinafter also referred to as MAF) 52, a pipeline (Pipeline) 53, and a buffer (Buffer) 54.
  • The presentation engine (PE) 51 includes a glTF analysis unit 63 and a rendering processing unit 64.
  • the presentation engine (PE) 51 causes the media access function 52 to acquire media, acquires the data via the buffer 54, and performs processing related to display. Specifically, for example, processing is performed in the following flow.
  • The glTF analysis unit 63 of the presentation engine (PE) 51 starts PE processing as in the example of FIG. 17 and parses the scene description.
  • In step S22, the glTF analysis unit 63 checks the media associated with the 3D object (texture), the buffer that stores the media after processing, and the accessor.
  • In step S23, the glTF analysis unit 63 notifies the media access function 52 of that information as a file acquisition request.
  • The media access function (MAF) 52 starts MAF processing as in the example of FIG. 17 and receives the notification in step S11.
  • The media access function 52 then acquires the media (3D object file (mp4)) based on the notification.
  • In step S13, the media access function 52 decodes the acquired media (3D object file (mp4)).
  • In step S14, the media access function 52 stores the decoded media data in the buffer 54 based on the notification from the presentation engine (PE) 51.
  • In step S24, the rendering processing unit 64 of the presentation engine 51 reads (obtains) the data from the buffer 54 at an appropriate timing.
  • In step S25, the rendering processing unit 64 performs rendering using the acquired data to generate a display image.
  • the media access function 52 executes these processes for each time (each frame) by repeating the processes of step S13 and step S14. Furthermore, the rendering processing unit 64 of the presentation engine 51 executes these processes for each time (each frame) by repeating the processes of step S24 and step S25.
  • the media access function 52 ends the MAF processing, and the presentation engine 51 ends the PE processing. In other words, the client processing ends.
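  • The client processing described above can be sketched roughly as follows. This is a simplified, single-threaded illustration under stated assumptions: the class names, the dictionary layout of the scene, and the stand-in decode function are all hypothetical, and a real client would run the MAF and PE concurrently.

```python
from collections import deque
from typing import Deque, Dict, List


def decode(encoded: bytes) -> bytes:
    # Stand-in for a real decoder; it only strips a fake 1-byte header.
    return encoded[1:]


class MediaAccessFunction:
    """Hypothetical MAF: acquires, decodes, and stores media in a buffer."""

    def __init__(self, buffer: Deque[bytes]) -> None:
        self.buffer = buffer
        self.pending: List[bytes] = []

    def on_request(self, request: Dict[str, List[bytes]]) -> None:
        # Step S11 and media acquisition: take the notified media.
        self.pending = list(request["encoded_frames"])

    def process_frame(self) -> bool:
        # Steps S13 and S14: decode one frame and store it in the buffer.
        if not self.pending:
            return False
        self.buffer.append(decode(self.pending.pop(0)))
        return True


class PresentationEngine:
    """Hypothetical PE: checks the scene, requests media, renders frames."""

    def __init__(self, maf: MediaAccessFunction, buffer: Deque[bytes]) -> None:
        self.maf = maf
        self.buffer = buffer

    def start(self, scene: Dict[str, List[bytes]]) -> None:
        # Steps S22 and S23: check the media and notify the MAF.
        self.maf.on_request({"encoded_frames": scene["media"]})

    def render_frame(self) -> str:
        # Steps S24 and S25: read data from the buffer and render it.
        data = self.buffer.popleft()
        return f"rendered:{data.decode()}"


def run_client(scene: Dict[str, List[bytes]]) -> List[str]:
    buffer: Deque[bytes] = deque()
    maf = MediaAccessFunction(buffer)
    pe = PresentationEngine(maf, buffer)
    pe.start(scene)
    frames = []
    while maf.process_frame():  # repeated for each time (each frame)
        frames.append(pe.render_frame())
    return frames
```

  • The point of the sketch is the division of roles: the PE never touches encoded data, and the MAF never renders; the buffer is the only interface between them.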
  • Haptic media is information that expresses virtual sensations using, for example, vibration.
  • Haptic media for example, is used in association with 3D data, which is information representing a three-dimensional space.
  • 3D data includes, for example, content that expresses the three-dimensional shape of a 3D object placed in a three-dimensional space (e.g., mesh, point cloud, etc.), and video content or audio content that is developed in a three-dimensional space (e.g., video and audio 6DoF content, etc.).
  • the media associated with 3D data may be any information and is not limited to this haptic media.
  • images, sounds, etc. may be included in this media.
  • Media associated with 3D data (e.g., images, sounds, vibrations, etc.) include synchronous media, which is played in synchronization with the progression (change) of the scene (state of 3D space) in the time direction, and interaction-type media, which is played when a predetermined condition is satisfied in a scene (that is, played in response to a predetermined event).
  • Haptic media that is synchronous media is also referred to as synchronous haptic media.
  • Haptic media that is interaction-type media is also referred to as interaction-type haptic media.
  • Synchronous haptic media includes, for example, vibrations that occur in response to changes in the scene (to represent those changes), such as when the wind blows or a 3D object moves.
  • Interaction-type haptic media includes, for example, vibrations that occur to express the sensation when a user's avatar touches a 3D object, moves a 3D object, or collides with a 3D object.
  • haptic media are not limited to these examples.
  • media associated with 3D data include media that can change in the time direction and media that do not change.
  • Media that can change in the time direction may include, for example, media whose playback content (actions) can change in the time direction.
  • the "media whose playback content can change over time” may include, for example, moving images, long-term audio information, vibration information, and the like.
  • "Media whose playback content can change over time" may also include, for example, media that is played only during a predetermined time period, and media whose played content depends on the time (for example, media in which the displayed image, the played sound, or the manner of vibration can change according to the time).
  • media that can change in the time direction may include, for example, media that have associated playback conditions (events) that can change in the time direction.
  • the "media whose linked playback conditions can change in the time direction” may include, for example, media in which the content of the event can change in the time direction, such as touching, pushing, knocking down, etc.
  • “media whose linked playback conditions can change in the time direction” may include, for example, media in which the position at which an event occurs can change in the time direction. For example, media may be included that is played when the right side of the object is touched at time T1, and that is played when the left side of the object is touched at time T2.
  • any media may be used as long as it changes in the time direction, and is not limited to these examples.
  • “media that does not change in the time direction” may include, for example, media in which the playback content (action) does not change in the time direction (media in which the action is the same at any time).
  • "Media that does not change in the time direction" may also include, for example, media whose associated playback conditions (events) do not change in the time direction (media where the content of the event or the position at which the event occurs is the same at any time).
  • the ability to change in the time direction is also referred to as "dynamic.”
  • timed media is also referred to as dynamic media.
  • haptic media that can change in the time direction are also referred to as dynamic haptic media.
  • something that does not change in the time direction is also called "static.”
  • media that does not change over time are also referred to as static media.
  • haptic media that does not change over time is also referred to as static haptic media.
  • In Non-Patent Document 3, such a haptic media encoding method is proposed.
  • In this method, haptic signals (wav) and haptic signal descriptions (ivs, ahap) are encoded using the architecture shown in the upper part of FIG. 18, and an interchange format (gmap) and a distribution format (mpg) are generated.
  • the table at the bottom of FIG. 18 shows an example of the configuration of the distribution format.
  • the haptic media bitstream is composed of a binary header and a binary body.
  • the binary header stores information such as the characteristics of the encoded data (Haptics stream) of the haptics media, the rendering device, and the encoding method. Further, encoded data (Haptics stream) of haptics media is stored in the binary body.
  • FIG. 19 is a diagram showing an example of expanding ISOBMFF for storing the haptic media.
  • a media type 'hapt' is defined to store haptic media.
  • a haptics sample entry has been prepared as a media information box. However, the internal structure of the haptics sample entry was undefined.
  • Non-Patent Document 5 proposes four glTF extensions, MPEG_haptic, MPEG_material_haptic, MPEG_avatar, and MPEG_interaction, as shown in FIG. 20, in order to support haptic media in scene descriptions.
  • MPEG_haptic is information (for example, link information, etc.) for referencing haptic media data (also referred to as haptics data) referenced from the scene description.
  • This haptics data exists as independent data, similar to data such as audio and images. Further, this haptics data may be encoded (or may be encoded data).
  • MPEG_material_haptic which is a mesh/material extension of an already defined 3D object, defines haptic material information (which haptic media is associated with where in the 3D object (mesh), etc.). This material information defines static haptic media information. Furthermore, information for accessing MPEG_haptic (for example, link information, etc.) can also be defined in this haptic material information.
  • MPEG_avatar defines the 3D shape (avatar) of the user that moves in 3D space.
  • MPEG_interaction lists the conditions that the avatar (user) can perform (what the user can do) and the possible actions (how the object reacts). For example, MPEG_interaction defines the interaction (i.e., event) that occurs between the user (MPEG_avatar) and the 3D object, and the actions that occur as a result (e.g., when the user touches the 3D object, a vibration occurs, etc.).
  • An example of how to play haptic media using these extensions of scene descriptions is shown in FIG.
  • When the avatar defined in MPEG_avatar generates an interaction (event) defined in MPEG_interaction, the action corresponding to that interaction is triggered. Then, according to the material information in MPEG_material_haptics, static haptic media corresponding to the location where the interaction occurred is generated and played (e.g., vibrations output by a vibration device are rendered).
  • Alternatively, the haptics data referenced by MPEG_haptic, as indicated in MPEG_material_haptics, is read, and dynamic haptic media is generated and played.
  • <Dynamic haptic media support> <Use of PE/MAF>
  • MPEG_haptics did not have a definition regarding MAF (Fig. 16, etc.).
  • In addition, MPEG_material_haptics could not handle timed metadata (timed media).
  • Therefore, it is difficult to play dynamic haptic media using the client processing (using MAF and PE) described in Non-Patent Document 2, and there was a risk that the playback performance of media data associated with 3D data would be reduced.
  • Therefore, the scene description specifies an accessor to the buffer in which dynamic haptic media associated with 3D data is stored (Method 1).
  • a scene description including such a designation is generated on the encoding side and provided to the decoding side.
  • dynamic haptic media is acquired based on the scene description and stored in a buffer corresponding to the designated accessor.
  • For example, a first information processing device (for example, a device that generates a scene description file) includes a file generation unit that generates a scene description file specifying an accessor for storing dynamic haptic media associated with 3D data in a predetermined storage area.
  • Likewise, in a first information processing method, a scene description file is generated that specifies an accessor for storing dynamic haptic media associated with 3D data in a predetermined storage area.
  • A second information processing device (for example, a device that plays back media associated with 3D data) includes: an acquisition unit that acquires encoded data of dynamic haptic media associated with the 3D data to be played, based on the description of the scene description file; a decoding unit that decodes the encoded data and generates haptic media data based on the description of the scene description file; a storage unit that stores the haptic media data in a storage area corresponding to the accessor specified by the scene description file; and a generation unit that reads the haptic media data stored in the storage area based on the description of the scene description file and generates haptic media information for output (i.e., control information that controls the driving of the output unit that outputs the haptic media, for example, how the vibration device vibrates).
  • Likewise, in a second information processing method, encoded data of dynamic haptic media associated with the 3D data to be played is acquired based on the description of the scene description file; the encoded data is decoded based on that description to generate haptic media data; the haptic media data is stored in the storage area corresponding to the accessor specified by the scene description file; and the haptic media data stored in the storage area is read out based on that description to generate haptic media information for output.
  • the second information processing device can reproduce dynamic haptic media using client processing using MAF and PE.
  • the first information processing device can enable the second information processing device to play back dynamic haptic media using client processing using MAF or PE.
  • this dynamic haptics media may include synchronous haptics media that is played in synchronization with the progression of the scene in the time direction.
  • The generation unit of the second information processing device may read the data of this synchronous haptic media from the storage area corresponding to the accessor specified by the scene description file at a timing corresponding to a predetermined playback timing, and generate haptic media information.
  • this dynamic haptic media may include interaction-type haptic media that is played when a predetermined condition is satisfied in the scene due to a user operation or the like.
  • The generation unit of the second information processing device may read the data of this interaction-type haptic media from the storage area corresponding to the accessor specified by the scene description file, and generate haptic media information.
  • <Method 1-1> When method 1 is applied, as shown in the second row from the top of the table in FIG. 22, the "material" property may be extended to specify an accessor for storing dynamic haptic media associated with 3D data in a predetermined storage area (Method 1-1). That is, the accessor specification in Method 1 may be placed anywhere in the scene description, but it may be placed, for example, in the "material" property defined as the material information of a texture.
  • For example, the file generation unit of the first information processing device may generate a scene description file in which the accessor for storing dynamic haptic media associated with 3D data in a predetermined storage area (buffer) is specified in the "material" property.
  • The storage unit of the second information processing device may store the dynamic haptic media data associated with the 3D data in the storage area (buffer) corresponding to the accessor specified in the "material" property of the scene description file.
  • the accessor in method 1 may be specified in MPEG_material_haptics defined for the material.
  • MPEG_material_haptics may be extended and attributes specified for handling timed metadata may be defined.
  • An example of scene description in that case is shown in FIG.
  • The index of the accessor corresponding to the buffer in which the haptic media specified in MPEG_media is stored is specified ("accessor": 2). An example of the semantics of the elements included in this description is shown in FIG.
  • With such a description, the PE can refer to the MPEG_media (dynamic haptic media data) stored in the buffer from the material (MPEG_material_haptics) via the accessor, as shown in FIG. 23.
  • MAF can store its MPEG_media in its buffer based on such a description. Therefore, the second information processing device can use PE and MAF to reproduce media data associated with 3D data. In other words, it is possible to suppress a reduction in playback performance of media data associated with 3D data.
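  • To illustrate how the accessor index in the material extension could lead to the storage area, the following sketch follows the standard glTF chain from accessor to bufferView to buffer. Only the "accessor": 2 entry under MPEG_material_haptics is taken from the text; the placement of the extension under "extensions", the other values, and the function names are assumptions made for illustration.

```python
from typing import Any, Dict

# Hypothetical scene description fragment. Only the "accessor": 2 entry under
# MPEG_material_haptics is taken from the text; all other values are assumed.
SCENE: Dict[str, Any] = {
    "materials": [
        {"extensions": {"MPEG_material_haptics": {"accessor": 2}}}
    ],
    "accessors": [
        {"bufferView": 0},
        {"bufferView": 1},
        {"bufferView": 2},
    ],
    "bufferViews": [
        {"buffer": 0, "byteOffset": 0, "byteLength": 4},
        {"buffer": 0, "byteOffset": 4, "byteLength": 4},
        {"buffer": 1, "byteOffset": 0, "byteLength": 3},
    ],
}


def resolve_haptic_buffer(scene: Dict[str, Any], material_index: int) -> Dict[str, int]:
    """Follow material -> accessor -> bufferView to locate the storage area."""
    ext = scene["materials"][material_index]["extensions"]["MPEG_material_haptics"]
    accessor = scene["accessors"][ext["accessor"]]
    view = scene["bufferViews"][accessor["bufferView"]]
    return {
        "buffer": view["buffer"],
        "byteOffset": view.get("byteOffset", 0),
        "byteLength": view["byteLength"],
    }


def read_haptic_data(scene: Dict[str, Any], buffers: Dict[int, bytes],
                     material_index: int) -> bytes:
    # PE side: read what the MAF stored in the buffer the accessor points to.
    loc = resolve_haptic_buffer(scene, material_index)
    start = loc["byteOffset"]
    return buffers[loc["buffer"]][start:start + loc["byteLength"]]
```

  • The MAF would write decoded haptic data into the same location that resolve_haptic_buffer returns, so both sides agree on the storage area purely through the scene description.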
  • <Interaction processing> For example, in the case of interaction-type media, the client device acquires the media file and decodes the data when an interaction (event) occurs. However, when the media file is retrieved at the moment the interaction occurs, there is a risk of a delay of at least the time required for the file retrieval protocol (e.g., retrieval from a server using HTTP (HyperText Transfer Protocol)) plus the time required for feedback. It has therefore been difficult to play such media at the correct timing. In other words, there was a risk that the playback performance of media data associated with 3D data would be reduced.
  • the first information processing device includes a file generation unit that generates a scene description file that includes a description of interaction media associated with 3D data. Furthermore, in the first information processing method, a scene description file is generated that includes a description of interaction media associated with the 3D data.
  • the second information processing device includes an acquisition unit that acquires encoded data of interaction-type media associated with 3D data to be played based on a description of interaction-type media included in a scene description file; and a decoding unit that decodes the obtained encoded data based on the description of the scene description file and generates interaction-type media data. Furthermore, in the second information processing method, encoded data of the interaction-type media associated with the 3D data to be played is acquired based on the description of the interaction-type media included in the scene description file, and the scene description Based on the file description, the obtained encoded data is decoded to generate interaction media data.
  • the first information processing device can use the scene description to control the second information processing device to acquire interaction-type media data.
  • the first information processing device can cause the second information processing device to acquire the data of the interaction type media at a timing (in advance) at which the above-described playback delay does not occur.
  • the second information processing device can acquire the data of the interaction type media at a timing (in advance) at which the above-described playback delay does not occur. In other words, it is possible to suppress a reduction in playback performance of media data associated with 3D data.
  • this interaction type media may be any media as long as it executes a process when a predetermined condition is satisfied in a scene due to a user operation or the like.
  • this interactive media may include haptic information.
  • This interaction type media may also include image information.
  • this interactive media may include audio information.
  • interactive media is not limited to these examples.
  • <Method 2-1> Furthermore, when method 2 is applied, a description indicating whether interaction-type processing is possible may be provided, as shown in the fourth row from the top of the table in FIG. 22 (Method 2-1).
  • That is, the description regarding interaction-type media included in the above-mentioned scene description file may include a description indicating whether or not interaction-type processing, which is executed when a predetermined condition is satisfied in the scene due to a user operation or the like, is possible.
  • the acquisition unit of the second information processing device may acquire encoded data of interaction-type media.
  • FIG. 26 is a diagram showing a description example of a scene description when method 2 described above is applied. Further, FIG. 27 is a diagram showing an example of the semantics of elements included in the description. As shown in FIG. 26, "event_control" is defined in MPEG_media. As shown in FIG. 27, event_control is flag information indicating whether or not the haptic media playback process is valid based on the event (interaction). For example, when event_control is set to true, it indicates that processing based on the event can be executed, that is, the media being handled is interaction-type media.
  • the second information processing device can identify the description regarding the interaction type media and acquire the data of the interaction type media based on the description. For example, based on the description, the second information processing device can acquire data of the interaction type media at a timing (in advance) at which the above-described playback delay does not occur.
  • the first information processing device can control the second information processing device to acquire the data of the interaction media using the description regarding the interaction media.
  • the first information processing device can cause the second information processing device to acquire the data of the interaction type media at a timing (in advance) at which the above-described playback delay does not occur, based on the description. In other words, it is possible to suppress a reduction in playback performance of media data associated with 3D data.
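  • As a minimal sketch of how a client might use event_control, the following code filters the media entries flagged as interaction-type and acquires them in advance. Only the event_control flag comes from the description; the entry layout ("name", "uri"), the function names, and the injected fetch callable are hypothetical.

```python
from typing import Any, Callable, Dict, List

# Hypothetical MPEG_media entries; only event_control comes from the text.
MEDIA: List[Dict[str, Any]] = [
    {"name": "wind", "uri": "wind.mp4", "event_control": False},
    {"name": "touch", "uri": "touch.mp4", "event_control": True},
]


def interaction_media(media: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the entries flagged as interaction-type (event_control is true)."""
    return [m for m in media if m.get("event_control", False)]


def prefetch_interaction_media(
    media: List[Dict[str, Any]],
    fetch: Callable[[str], bytes],
) -> Dict[str, bytes]:
    """Acquire interaction-type media ahead of time, keyed by URI.

    `fetch` is supplied by the caller (for example, an HTTP download), so
    this sketch does not assume any particular transport.
    """
    return {m["uri"]: fetch(m["uri"]) for m in interaction_media(media)}
```

  • Because the data is already held locally when the interaction occurs, the retrieval latency described above does not add to the playback delay.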
  • the description regarding interaction-type media included in the above-described scene description file may include a description indicating whether or not the interaction-type media can be selected according to user operation or avatar attribute information.
  • For example, if the description regarding interaction-type media included in the scene description file indicates that interaction-type media can be selected, the acquisition unit of the second information processing device may select interaction-type media according to the user operation or the attribute information of the avatar. Further, if that description indicates that interaction-type media cannot be selected, the acquisition unit of the second information processing device may select predetermined interaction-type media.
  • avatar_dependent_media is defined in MPEG_media.
  • avatar_dependent_media is flag information indicating whether or not it is possible to select the interaction type media to be applied from among a plurality of media according to the user operation or the attribute information of the avatar. For example, when avatar_dependent_media is set to true, it indicates that the interaction type media to be applied can be selected from a plurality of media according to user operations or attribute information of the avatar. In other words, it is shown that a plurality of interactive media are available that can be selected depending on the user or avatar.
  • the second information processing device can select interaction type media based on such a description.
  • the first information processing device can cause the second information processing device to select such interaction type media. This makes it possible to play more diverse media. In other words, it is possible to suppress a reduction in playback performance of media data associated with 3D data.
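  • The selection behavior controlled by avatar_dependent_media might look like the sketch below. Only the flag itself comes from the description; the "default", "alternatives", and "when" keys and the attribute names are illustrative assumptions, since the text does not specify how alternatives are listed.

```python
from typing import Any, Dict

# Hypothetical MPEG_media entry. Only avatar_dependent_media comes from the
# text; "default", "alternatives", and "when" are illustrative assumptions.
ENTRY: Dict[str, Any] = {
    "avatar_dependent_media": True,
    "default": "touch_generic.mp4",
    "alternatives": [
        {"when": {"glove": "leather"}, "uri": "touch_leather.mp4"},
        {"when": {"glove": "metal"}, "uri": "touch_metal.mp4"},
    ],
}


def select_interaction_media(entry: Dict[str, Any], avatar: Dict[str, Any]) -> str:
    """Pick the media URI to apply for this avatar."""
    if not entry.get("avatar_dependent_media", False):
        # Selection is not allowed: use the predetermined media.
        return entry["default"]
    for alt in entry.get("alternatives", []):
        # An alternative matches when all of its required attributes match.
        if all(avatar.get(key) == value for key, value in alt["when"].items()):
            return alt["uri"]
    return entry["default"]
```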
  • <Method 2-3> Conditions for media data acquisition processing may be described, as shown in the sixth row from the top of the table in FIG. 22 (Method 2-3).
  • the description regarding interaction media included in the scene description file described above may include a description regarding acquisition of encoded data of interaction media associated with 3D data to be played.
  • the acquisition unit of the second information processing device may acquire the encoded data according to the description regarding acquisition of the encoded data.
  • fetch_timing_information is defined in MPEG_media.
  • fetch_timing_information is a description regarding acquisition of encoded data of interaction type media.
  • the second information processing device can acquire the interaction type media based on this description. In other words, this description allows the first information processing device to control the acquisition of interaction media by the second information processing device in more detail. In other words, it is possible to suppress a reduction in playback performance of media data associated with 3D data.
  • fetch_timing_information, which is a description regarding acquisition of interaction-type media, may include any information.
  • fetch_timing_information may include a description regarding acquisition conditions.
  • the acquisition unit of the second information processing device may acquire the encoded data of the interaction type media when the condition is satisfied.
  • the second information processing device can acquire the interaction media based on the description regarding the acquisition conditions. In other words, this description allows the first information processing device to control the acquisition of interaction media by the second information processing device based on the acquisition conditions.
  • The description regarding this acquisition condition may include a description indicating whether the condition is "before initialization of information necessary for the scene". For example, if the description indicates that the acquisition condition is "before initialization of information necessary for the scene," the acquisition unit of the second information processing device may acquire the encoded data of the interaction-type media before that information is initialized.
  • FIG. 28 shows an example of the semantics of the elements of fetch_timing_information.
  • this Initial is flag information indicating whether or not to obtain encoded data of this interaction type media when initializing information necessary for a scene. For example, if Initial is set to true, it indicates that the encoded data of this interaction type media will be acquired when initializing the information necessary for the scene.
  • the second information processing device can select whether or not to acquire the encoded data of this interaction-type media, for example, when initializing information necessary for a scene.
  • the first information processing device uses this description to control whether or not the second information processing device acquires the encoded data of this interaction type media when initializing information necessary for the scene. can do.
  • The description regarding the acquisition conditions may include a description indicating the LoD (Level Of Detail) of the position corresponding to the interaction-type media to be acquired. For example, if the LoD of the position corresponding to the interaction-type media to be acquired is larger than the LoD indicated by the description, the acquisition unit of the second information processing device may acquire the encoded data of the interaction-type media.
  • Lod is defined as fetch_timing_information in MPEG_media.
  • this Lod is a description indicating acquisition conditions regarding the LoD of encoded data of interaction type media. For example, if the LoD of the position corresponding to the interaction-type media to be acquired is larger than this LoD, the encoded data of the interaction-type media is acquired.
  • With this Lod setting, the second information processing device can acquire the encoded data of this interaction-type media when, for example, the viewpoint gets sufficiently close to the position corresponding to the media to be acquired (when the position is displayed larger than the LoD setting).
  • Likewise, the first information processing device can control the second information processing device so that it acquires the encoded data of the interaction-type media when, for example, the viewpoint gets sufficiently close to that position (when the position is displayed larger than the LoD setting).
  • the description regarding the acquisition conditions may include a description indicating the distance to the position corresponding to the interaction type media to be acquired.
  • the acquisition unit of the second information processing device may acquire the encoded data of the interaction media when the viewpoint or the avatar approaches the position within a distance indicated by the description.
  • “Distance” is defined as fetch_timing_information in MPEG_media.
  • This Distance is a description indicating the acquisition condition regarding the viewing distance to the position (mesh/texture) to which this interaction type media is linked. For example, if the viewing distance is closer (shorter) than this Distance, encoded data of the interaction type media is acquired.
  • With this Distance setting, the second information processing device can acquire the encoded data of the interaction-type media when, for example, the viewpoint gets sufficiently close to the position corresponding to the media to be acquired (when the distance is shorter than the Distance setting).
  • Likewise, the first information processing device can control the second information processing device so that it acquires the encoded data of the interaction-type media when, for example, the viewpoint gets sufficiently close to that position (when the distance is shorter than the Distance setting).
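  • The Lod and Distance conditions can be sketched as two simple comparisons. The function names and the use of a Euclidean distance between 3D points are assumptions for illustration; the text only states that acquisition happens when the LoD is larger than the Lod value or the viewing distance is shorter than the Distance value.

```python
import math
from typing import Sequence


def lod_condition_met(current_lod: float, lod_threshold: float) -> bool:
    """True when the LoD at the linked position is larger than the Lod setting."""
    return current_lod > lod_threshold


def within_fetch_distance(
    viewpoint: Sequence[float],
    target: Sequence[float],
    distance: float,
) -> bool:
    """True when the viewpoint is closer to the target than the Distance setting.

    Assumes both positions are given in the same scene coordinate system.
    """
    return math.dist(viewpoint, target) < distance
```

  • A client could evaluate these checks each frame and start the acquisition the first time one of them becomes true.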
  • The description regarding the acquisition condition may include a description indicating whether the condition is "the position corresponding to the interaction-type media is within sight." For example, if the description indicates that the acquisition condition is that the position corresponding to the interaction-type media comes within the field of view, the acquisition unit of the second information processing device may acquire the encoded data of the interaction-type media when that position comes within the field of view.
  • For example, view_frustum is defined as fetch_timing_information in MPEG_media. As shown in FIG. 28, this view_frustum indicates that the encoded data of this interaction-type media is acquired when the position (mesh/texture) associated with the media enters the user's (camera's) field of view.
  • According to the view_frustum setting, the second information processing device can select whether or not to acquire the encoded data of the interaction-type media when the position corresponding to the media comes into view.
  • Using this description, the first information processing device can control whether or not the second information processing device acquires the encoded data of the interaction-type media when the position corresponding to the media comes into view.
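  • A visibility check for the view_frustum condition could be approximated as below. This sketch replaces a full six-plane frustum test with a simpler view cone around the camera's forward direction, which is a deliberate simplification; the function names and the half-angle parameter are hypothetical.

```python
import math
from typing import Sequence


def in_view_cone(
    camera_pos: Sequence[float],
    forward: Sequence[float],
    target: Sequence[float],
    half_angle_deg: float,
) -> bool:
    """Approximate 'within the field of view' with a cone around `forward`."""
    to_target = [t - c for t, c in zip(target, camera_pos)]
    dist = math.sqrt(sum(v * v for v in to_target))
    fwd_len = math.sqrt(sum(v * v for v in forward))
    if fwd_len == 0.0:
        raise ValueError("forward must be a non-zero vector")
    if dist == 0.0:
        return True  # the target coincides with the camera position
    cos_angle = sum(a * b for a, b in zip(to_target, forward)) / (dist * fwd_len)
    return cos_angle >= math.cos(math.radians(half_angle_deg))


def should_fetch_on_view(view_frustum: bool, visible: bool) -> bool:
    # Fetch only when the flag enables the condition and the position is visible.
    return view_frustum and visible
```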
  • the description regarding the acquisition conditions may include a description indicating the recommended time to acquire the interaction media.
  • the acquisition unit of the second information processing device may acquire the encoded data of the interaction media at the recommended time indicated by the description.
  • “recommended_Fetch_time” is defined as fetch_timing_information in MPEG_media. As shown in FIG. 28, this recommended_Fetch_time indicates a time recommended as a time to acquire encoded data of this interaction type media. For example, if a scene in which an interaction is likely to occur is known, a recommended time is set so that encoded data can be acquired at that timing (at an earlier timing).
  • the second information processing device can acquire the encoded data of the interaction-type media at the recommended timing (time) according to the recommended_Fetch_time setting.
  • the first information processing device can control the timing (time) at which the second information processing device acquires the encoded data of the interaction type media.
  • the first information processing device can control the second information processing device to acquire the encoded data of the interaction type media at a more appropriate timing (time).
  • the description regarding the acquisition conditions may include a description indicating a predetermined spatial area in which the interaction media is to be acquired.
  • the acquisition unit of the second information processing device may acquire encoded data of interaction-type media when the viewpoint or avatar is located within the spatial region indicated by the description.
  • fetch_boundaries is defined as fetch_timing_information in MPEG_media. This fetch_boundaries indicates that, as shown in FIG. 28, when the user (camera) is located in the mesh space expressed by this index, encoded data of interaction media is acquired.
  • the second information processing device can acquire the encoded data of the interaction media when the viewpoint or avatar is located within the spatial region indicated by the description.
  • the first information processing device can control the spatial region from which the second information processing device acquires the encoded data of the interaction-type media based on this description.
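  • As a non-limiting illustration, the determination of whether the viewpoint or avatar is located within the spatial region indicated by fetch_boundaries may be sketched as follows. The sketch assumes, for simplicity, that the region is approximated by the axis-aligned bounding box of the mesh vertices; the function names are hypothetical, and an actual implementation may instead test against the mesh itself.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def bounding_box(vertices: Sequence[Point]) -> Tuple[Point, Point]:
    """Axis-aligned bounding box (min corner, max corner) of the mesh vertices."""
    xs, ys, zs = zip(*vertices)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def inside_fetch_boundary(position: Point, vertices: Sequence[Point]) -> bool:
    """True if the viewpoint (or avatar) position lies inside the box that
    approximates the region referenced by fetch_boundaries (assumption)."""
    lo, hi = bounding_box(vertices)
    return all(l <= p <= h for p, l, h in zip(position, lo, hi))


region = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0), (2.0, 2.0, 2.0)]
print(inside_fetch_boundary((1.0, 1.0, 1.0), region))
print(inside_fetch_boundary((3.0, 1.0, 1.0), region))
```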
  • fetch_timing_information may include a description regarding how to obtain interaction media.
  • the acquisition unit of the second information processing device may acquire the encoded data of the interaction media according to the description regarding the acquisition method.
  • the second information processing device can acquire the interaction type media based on the description regarding the acquisition method.
  • the first information processing device can control the acquisition method of the interaction type media by the second information processing device using this description.
  • the description regarding this acquisition method may include a description indicating whether or not encoded data of interaction-type media is included in the 3D data file. For example, if it is indicated that the encoded data of the interaction-type media is not included in the 3D data file, the acquisition unit of the second information processing device may acquire the encoded data.
  • delivery_with_texture_video is defined as fetch_timing_information in MPEG_media.
  • This delivery_with_texture_video is flag information indicating, as shown in FIG. 28, whether the encoded data of the interaction-type media is stored in the same file as the 3D data at the position (mesh/texture) associated with the interaction-type media. For example, if delivery_with_texture_video is true, it indicates that the encoded data of the interaction-type media is included in the 3D data file. That is, in this case, since the encoded data of the interaction-type media can be obtained from the 3D data file, there is no need to obtain any interaction-type media file other than the 3D data file.
  • the second information processing device can select the source file for the encoded data of the interaction media according to the delivery_with_texture_video setting. In other words, the second information processing device can select whether to acquire the interaction media file according to the delivery_with_texture_video setting. In other words, the first information processing device can control whether or not the second information processing device acquires the interactive media file based on this description.
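  • As a non-limiting illustration, the selection of files according to delivery_with_texture_video may be sketched as follows. The function files_to_fetch and the file names are hypothetical and serve only to illustrate the branching.

```python
from typing import List


def files_to_fetch(delivery_with_texture_video: bool, data_file: str, media_file: str) -> List[str]:
    """Return the files the client needs to acquire.

    When the flag is true, the encoded data of the interaction-type media is
    taken from the 3D data file, so no separate media file is acquired.
    """
    files = [data_file]
    if not delivery_with_texture_video:
        files.append(media_file)
    return files


print(files_to_fetch(True, "object.3d", "haptics.media"))
print(files_to_fetch(False, "object.3d", "haptics.media"))
```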
  • the description regarding this acquisition method may include a description indicating the priority of interaction media.
  • the acquisition unit of the second information processing device may acquire the encoded data of the interaction type media according to this priority.
  • priority is defined as fetch_timing_information in MPEG_media. As shown in FIG. 28, this priority indicates the priority of playback or rendering of interaction type media. For example, a high priority indicates that playback or rendering of the interaction-type media is highly important. This makes it possible, for example, to indicate which of a plurality of media with the same acquisition conditions should be prioritized.
  • the second information processing device can, for example, control the acquisition order of encoded data of interaction-type media or select encoded data to acquire, according to this priority setting.
  • the second information processing device may acquire encoded data starting from the interaction-type media with the highest priority, or may acquire encoded data only for interaction-type media with a sufficiently high priority.
  • the first information processing device can control the order in which the second information processing device acquires the encoded data of the interaction type media and the selection of the encoded data to be acquired by the second information processing device.
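  • As a non-limiting illustration, the control of acquisition order and the selection of encoded data according to priority may be sketched as follows. The sketch assumes that a larger value means a higher priority, which the description does not specify; MediaEntry, acquisition_order, and min_priority are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class MediaEntry(NamedTuple):
    name: str
    priority: int  # assumption: a larger value means a higher priority


def acquisition_order(entries: List[MediaEntry], min_priority: Optional[int] = None) -> List[MediaEntry]:
    """Drop entries below min_priority (if given) and order the rest from the
    highest priority to the lowest. sorted() is stable, so entries with equal
    priority keep their original relative order."""
    selected = [e for e in entries if min_priority is None or e.priority >= min_priority]
    return sorted(selected, key=lambda e: e.priority, reverse=True)


entries = [MediaEntry("a", 1), MediaEntry("b", 5), MediaEntry("c", 3)]
print([e.name for e in acquisition_order(entries)])
print([e.name for e in acquisition_order(entries, min_priority=3)])
```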
  • fetch_timing_information may include a description regarding the type of interaction media.
  • the acquisition unit of the second information processing device may acquire encoded data of interaction-type media according to a description regarding the type of interaction-type media.
  • the second information processing device can acquire the interaction type media based on the description regarding the type of the interaction type media.
  • the first information processing device can control the acquisition of interaction media by the second information processing device using this description.
  • the description regarding the type of interaction media may include a description indicating whether the interaction media is dynamic media.
  • the acquisition unit of the second information processing device may acquire the encoded data using a method suitable for the dynamic media.
  • moving_object is defined as fetch_timing_information in MPEG_media. As shown in FIG. 28, this moving_object indicates that the interaction-type media moves in the 3D space (that is, it is dynamic), so that values of fetch_timing_information such as LoD, Distance, Recommended_Fetch_time, and Fetch_boundaries change dynamically. If moving_object is True, the dynamically changing values are obtained from the moving_object_metadata file, which is timed metadata.
  • the second information processing device can acquire encoded data of interaction-type media in a method according to its type, according to the moving_object setting.
  • the first information processing device can cause the second information processing device to acquire the encoded data using a method according to the type of interaction media.
  • the second information processing device can acquire the encoded data of the interaction type media using a method according to the setting of this moving_object.
  • the first information processing device can control the method by which the second information processing device acquires the encoded data of the interaction type media using this description.
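  • As a non-limiting illustration, the handling of moving_object may be sketched as follows for a single value (Distance). The sketch assumes that the timed metadata has already been parsed into a time-sorted list of (time, value) samples and that the most recent sample at or before the current time applies; effective_distance and the sample layout are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def effective_distance(static_distance: float,
                       moving_object: bool,
                       timed_samples: List[Tuple[float, float]],
                       t: float) -> float:
    """Return the Distance value to use at time t.

    If moving_object is False, the static value from the scene description is
    used. Otherwise the latest sample at or before t is used, falling back to
    the static value when no such sample exists.
    """
    if not moving_object or not timed_samples:
        return static_distance
    times = [time for time, _ in timed_samples]
    i = bisect_right(times, t) - 1  # index of the last sample with time <= t
    if i < 0:
        return static_distance
    return timed_samples[i][1]


samples = [(0.0, 5.0), (2.0, 3.0)]
print(effective_distance(10.0, True, samples, 1.0))
print(effective_distance(10.0, False, samples, 1.0))
```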
  • the description regarding the type of interaction media may include a description specifying an accessor corresponding to a storage area that stores dynamic media.
  • the acquisition unit of the second information processing device may store the dynamic media in the storage area corresponding to the accessor specified by this description.
  • the second information processing device can send and receive interaction-type media using the buffer corresponding to the accessor indicated by the accessors.
  • the first information processing device can control the second information processing device to send and receive interaction media using the buffer corresponding to the accessor indicated by the accessors.
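  • As a non-limiting illustration, a storage area addressed by an accessor may be sketched as follows. The class AccessorBuffers and the use of an integer accessor index as the key are assumptions made for this sketch; the actual buffer layout is determined by the scene description.

```python
from collections import deque
from typing import Deque, Dict, Optional


class AccessorBuffers:
    """Hypothetical store with one FIFO storage area per accessor index."""

    def __init__(self) -> None:
        self._areas: Dict[int, Deque[bytes]] = {}

    def write(self, accessor_index: int, frame: bytes) -> None:
        # The decoding side stores dynamic media in the area for this accessor.
        self._areas.setdefault(accessor_index, deque()).append(frame)

    def read(self, accessor_index: int) -> Optional[bytes]:
        # The rendering side reads the oldest frame; None if nothing is stored.
        area = self._areas.get(accessor_index)
        return area.popleft() if area else None


buffers = AccessorBuffers()
buffers.write(2, b"frame-0")
buffers.write(2, b"frame-1")
print(buffers.read(2))
```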
  • <Material> A description regarding the interaction-type media as described above may be written in the material of the scene description file.
  • the file generation unit of the first information processing device may generate a scene description file that stores a description regarding the interaction type media in the material.
  • the acquisition unit of the second information processing device may acquire the encoded data of the interaction-type media based on the description of the interaction-type media in the material of the scene description file.
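  • <Method 2-4> When method 2 is applied, as shown in the seventh row from the top of the table in FIG. 22, a description regarding the interaction-type media may be stored as file information in the material of the scene description (Method 2-4).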
  • the acquisition unit of the second information processing device may acquire the encoded data of the interaction-type media based on the description regarding the interaction-type media stored as file information in the material of the scene description file.
  • the description regarding the interaction type media described above is written in MPEG_media.
  • the description regarding the interaction type media is stored as file information in the material of the scene description. Such a configuration may also be used.
  • <Method 2-5> Furthermore, when method 2 is applied, as shown at the bottom of the table in FIG. 22, a description regarding the interaction-type media may be stored as pre-processing information of the interaction-type media in the material of the scene description (Method 2-5).
  • the acquisition unit of the second information processing device may acquire the encoded data of the interaction-type media based on the description regarding the interaction-type media stored as pre-processing information of the interaction-type media in the material of the scene description file.
  • FIG. 29 shows a description example in that case.
  • the description regarding the interaction type media described above is written as "properties" (pre-processing information) outside of MPEG_media. That is, the description regarding the interaction type media is stored as pre-processing information in the material of the scene description.
  • the semantics of this property is shown in FIG. Note that examples of the semantics of elements such as event_control, avatar_dependent_media, and fetch_timing_information are the same as those in FIGS. 27 and 28. Such a configuration may also be used.
  • FIG. 31 is a block diagram illustrating an example of the configuration of a file generation device that is one aspect of an information processing device to which the present technology is applied.
  • the file generation device 100 shown in FIG. 31 is a device that encodes 3D object content (for example, 3D data such as a point cloud) associated with media such as haptic media, and stores it in a file container such as ISOBMFF.
  • the file generation device 100 also generates a scene description file of the 3D object content.
  • FIG. 31 shows the main elements, such as processing units and data flows, and does not necessarily show everything. That is, the file generation device 100 may include a processing unit that is not shown as a block, or a process or data flow that is not shown as an arrow or the like, in FIG. 31.
  • the file generation device 100 includes a control unit 101 and a file generation processing unit 102.
  • the control unit 101 controls the file generation processing unit 102.
  • the file generation processing unit 102 is controlled by the control unit 101 and performs processing related to file generation.
  • the file generation processing unit 102 includes an input unit 111, a preprocessing unit 112, an encoding unit 113, a preprocessing unit 114, an encoding unit 115, a file generation unit 116, a storage unit 117, and an output unit 118.
  • the file generation unit 116 includes an SD file generation unit 121, a 3D file generation unit 122, and a media file generation unit 123.
  • the input unit 111 performs processing related to acquiring data of 3D object content.
  • the input unit 111 may acquire 3D data from outside the file generation device 100.
  • the input unit 111 may acquire media data associated with the 3D data from outside the file generation device 100.
  • the input unit 111 may supply the acquired 3D data to the preprocessing unit 112.
  • the input unit 111 may supply the acquired media data to the preprocessing unit 114.
  • the preprocessing unit 112 executes processing related to preprocessing performed on 3D data before encoding.
  • the preprocessing unit 112 may acquire 3D data supplied from the input unit 111. Further, the preprocessing unit 112 may acquire information necessary for generating a scene description from the acquired 3D data or the like. Furthermore, the preprocessing unit 112 may supply the acquired information to the file generation unit 116 (the SD file generation unit 121 thereof). Further, the preprocessing unit 112 may supply 3D data to the encoding unit 113.
  • the encoding unit 113 executes processing related to encoding 3D data.
  • the encoding unit 113 may acquire 3D data supplied from the preprocessing unit 112.
  • the encoding unit 113 may encode the acquired 3D data and generate the encoded data.
  • the encoding unit 113 may supply the generated encoded data to the file generation unit 116 (3D file generation unit 122 thereof).
  • the preprocessing unit 114 executes processing related to preprocessing performed on media data associated with 3D data before encoding. For example, the preprocessing unit 114 may acquire media data supplied from the input unit 111. Further, the preprocessing unit 114 may acquire information necessary for generating a scene description from the acquired media data or the like. Further, the preprocessing unit 114 may supply the acquired information to the file generation unit 116 (the SD file generation unit 121 thereof). Further, the preprocessing unit 114 may supply media data to the encoding unit 115.
  • the encoding unit 115 executes processing related to encoding media data. For example, the encoding unit 115 may acquire media data supplied from the preprocessing unit 114. Furthermore, the encoding unit 115 may encode the acquired media data and generate the encoded data. Further, the encoding unit 115 may supply the generated encoded data to the file generation unit 116 (the media file generation unit 123).
  • the file generation unit 116 performs processing related to generation of files and the like.
  • the SD file generation unit 121 performs processing related to generation of a scene description file.
  • the 3D file generation unit 122 performs processing related to generation of a 3D file that stores (encoded data of) 3D data.
  • the media file generation unit 123 performs processing related to generation of a media file that stores (encoded data of) media data.
  • the SD file generation unit 121 acquires information supplied from the preprocessing unit 112 and information supplied from the preprocessing unit 114.
  • the SD file generation unit 121 generates a scene description based on the information.
  • the SD file generation unit 121 generates a scene description file and stores the generated scene description. Further, the SD file generation unit 121 supplies the scene description file to the storage unit 117.
  • the 3D file generation unit 122 acquires encoded data of 3D data supplied from the encoding unit 113.
  • the 3D file generation unit 122 generates a 3D file and stores the encoded data.
  • the 3D file generation unit 122 supplies the 3D file to the storage unit 117.
  • the media file generation unit 123 acquires encoded data of media data supplied from the encoding unit 115.
  • the media file generation unit 123 generates a media file and stores its encoded data.
  • the media file generation unit 123 supplies the media file to the storage unit 117.
  • the storage unit 117 has an arbitrary storage medium such as a hard disk or a semiconductor memory, and executes processing related to data storage. For example, the storage unit 117 may acquire the scene description file supplied from the SD file generation unit 121 of the file generation unit 116 and store it in the storage medium. Further, the storage unit 117 may acquire a 3D file supplied from the 3D file generation unit 122 of the file generation unit 116 and store it in the storage medium. Further, the storage unit 117 may acquire a media file supplied from the media file generation unit 123 of the file generation unit 116 and store it in the storage medium. Furthermore, the storage unit 117 may read files and the like recorded on the storage medium and supply them to the output unit 118 according to a request from the control unit 101 or the output unit 118 or at a predetermined timing.
  • the output unit 118 may acquire the file etc. supplied from the storage unit 117 and output the file etc. to the outside of the file generation device 100 (for example, a distribution server, a playback device, etc.).
  • the file generation device 100 may be used as the above-described first information processing device, and the present technology described above in <3. Dynamic Haptic Media Support> may be applied.
  • the file generation device 100 can then obtain the same effects as those described above in <3. Dynamic Haptic Media Support>. That is, the file generation device 100 can suppress reduction in playback performance of media data associated with 3D data.
  • the file generation device 100 may be used as the above-described first information processing device, and the present technology described above in <4. Support for interaction-type media> may be applied.
  • method 2 may be applied, and the SD file generation unit 121 may generate a scene description file that includes a description of interaction media associated with 3D data.
  • other methods may be applied.
  • a plurality of the present techniques may be applied in combination as appropriate. By doing so, the file generation device 100 can obtain the same effects as those described above in <4. Support for interaction-type media>. That is, the file generation device 100 can suppress reduction in playback performance of media data associated with 3D data.
  • the input unit 111 of the file generation device 100 acquires 3D data and media data associated with the 3D data in step S101.
  • the preprocessing unit 112 performs preprocessing on the 3D data. For example, the preprocessing unit 112 acquires information used to generate a scene description, which is spatial arrangement information for arranging one or more 3D objects in a 3D space, from the 3D data. Further, the preprocessing unit 114 performs preprocessing on the media data. For example, the preprocessing unit 114 acquires information used to generate a scene description, which is spatial arrangement information for arranging one or more 3D objects in a 3D space, from the media data.
  • In step S103, the SD file generation unit 121 uses the information to generate a scene description file that describes the media data associated with the 3D data.
  • In step S104, the encoding unit 113 encodes the 3D data and generates encoded data. Furthermore, the encoding unit 115 encodes media data associated with the 3D data to generate encoded data.
  • In step S105, the 3D file generation unit 122 generates a 3D file (ISOBMFF) that stores encoded data of 3D data. Furthermore, the media file generation unit 123 generates a media file (ISOBMFF) that stores encoded data of media data.
  • In step S106, the storage unit 117 stores the generated scene description file, 3D file, and media file on the storage medium.
  • the output unit 118 reads the scene description file, 3D file, and media file from the storage unit 117, and outputs the read file to the outside of the file generation device 100 at a predetermined timing.
  • the output unit 118 may transmit (upload) the file read from the storage unit 117 to another device such as a distribution server or a playback device via a communication medium such as a network.
  • the output unit 118 may record the files and the like read from the storage medium onto an external recording medium such as a removable medium. In that case, the output file may be supplied to another device (such as a distribution server or a playback device) via the external recording medium, for example.
  • When the process of step S107 ends, the file generation process ends.
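  • As a non-limiting illustration, the flow of the file generation process described above may be sketched as follows. In this sketch, zlib compression is a placeholder for the actual encoders, raw bytes stand in for the ISOBMFF containers, and a small JSON document stands in for the scene description; the function generate_files and the file names are hypothetical.

```python
import json
import zlib
from typing import Dict


def generate_files(data_3d: bytes, media: bytes) -> Dict[str, bytes]:
    """Toy version of the file generation flow; returns name -> file contents."""
    # Preprocessing: collect information used to generate the scene description
    # (here only the sizes, as a placeholder).
    info = {"3d_size": len(data_3d), "media_size": len(media)}

    # S103: generate a scene description that describes the media data
    # associated with the 3D data.
    scene_description = json.dumps(
        {"3d_file": "scene.3d", "media_file": "scene.media", "info": info}
    ).encode("utf-8")

    # S104: encode the 3D data and the media data (zlib is a stand-in).
    encoded_3d = zlib.compress(data_3d)
    encoded_media = zlib.compress(media)

    # S105 and S106: store the encoded data in files and keep them together.
    return {
        "scene.json": scene_description,
        "scene.3d": encoded_3d,
        "scene.media": encoded_media,
    }


files = generate_files(b"point cloud bytes", b"haptic samples")
print(sorted(files))
```

  • The returned dictionary plays the role of the storage unit 117, from which the files would subsequently be read and output.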
  • the file generation device 100 may be used as the above-mentioned first information processing device, and the present technology described above in <3. Dynamic Haptic Media Support> may be applied.
  • the SD file generation unit 121 may generate a scene description file that specifies an accessor for storing dynamic haptic media associated with 3D data in a predetermined storage area. Other methods may also be applied. Further, a plurality of the present techniques may be applied in combination as appropriate. By doing so, the file generation device 100 can obtain the same effects as those described above in <3. Dynamic Haptic Media Support>. That is, the file generation device 100 can suppress reduction in playback performance of media data associated with 3D data.
  • the file generation device 100 may be used as the above-described first information processing device, and the present technology described above in <4. Support for interaction-type media> may be applied.
  • method 2 may be applied, and in step S103, the SD file generation unit 121 may generate a scene description file that includes a description of interaction media associated with 3D data.
  • other methods may be applied.
  • a plurality of the present techniques may be applied in combination as appropriate. By doing so, the file generation device 100 can obtain the same effects as those described above in <4. Support for interaction-type media>. That is, the file generation device 100 can suppress reduction in playback performance of media data associated with 3D data.
  • FIG. 33 is a block diagram illustrating an example of the configuration of a client device that is one aspect of an information processing device to which the present technology is applied.
  • the client device 200 shown in FIG. 33 is a playback device that performs playback processing of 3D data and media data associated with the 3D data based on a scene description.
  • the client device 200 acquires a file generated by the file generation device 100, and reproduces 3D data and media data stored in the file. At this time, the client device 200 performs processing related to the reproduction based on the scene description file.
  • FIG. 33 shows the main elements, such as processing units and data flows, and does not necessarily show everything. That is, the client device 200 may include a processing unit that is not shown as a block, or a process or data flow that is not shown as an arrow or the like, in FIG. 33.
  • the client device 200 includes a control unit 201 and a client processing unit 202.
  • the control unit 201 performs processing related to controlling the client processing unit 202.
  • the client processing unit 202 performs processing related to reproduction of 3D data and media data.
  • the client processing unit 202 includes an SD file acquisition unit 211, an SD file analysis unit 212, a 3D file acquisition unit 213, a 3D data decoding unit 214, a buffer 215, a display information generation unit 216, a media file acquisition unit 217, a media data decoding unit 218, a buffer 219, a media information generation unit 220, and an output unit 221.
  • the SD file acquisition unit 211 performs processing related to acquiring a scene description file.
  • the SD file acquisition unit 211 may acquire a scene description file or the like supplied from outside the client device 200, such as a distribution server or the file generation device 100. Further, the SD file acquisition unit 211 may supply the acquired scene description file to the SD file analysis unit 212.
  • the SD file analysis unit 212 performs processing related to scene description file analysis.
  • the SD file analysis unit 212 may acquire a scene description file supplied from the SD file acquisition unit 211. Further, the SD file analysis unit 212 may analyze the scene description file and control the 3D file acquisition unit 213 and the media file acquisition unit 217 according to the description. That is, the SD file analysis unit 212 may control the acquisition of 3D files and media files according to the description of the scene description file. Furthermore, the SD file analysis unit 212 may control the 3D data decoding unit 214 and the media data decoding unit 218 according to the description in the scene description file. That is, the SD file analysis unit 212 may control the decoding of 3D data and media data according to the description of the scene description file.
  • the SD file analysis unit 212 may control the buffer 215 and the buffer 219 according to the description in the scene description file. That is, the SD file analysis unit 212 may control the storage of 3D data and media data in the buffer according to the description of the scene description file.
  • the 3D file acquisition unit 213 performs processing related to 3D file acquisition under the control of the SD file analysis unit 212.
  • the 3D file acquisition unit 213 may acquire a 3D file or the like supplied from outside the client device 200, such as a distribution server or the file generation device 100. Further, the 3D file acquisition unit 213 may extract encoded data of 3D data stored in the acquired 3D file and supply it to the 3D data decoding unit 214.
  • the 3D data decoding unit 214 performs processing related to 3D data decoding under the control of the SD file analysis unit 212. For example, the 3D data decoding unit 214 may acquire encoded data of 3D data supplied from the 3D file acquisition unit 213. Further, the 3D data decoding unit 214 may decode the encoded data. Further, the 3D data decoding unit 214 may supply the 3D data obtained through the decoding to the buffer 215.
  • the buffer 215 performs processing related to storing 3D data under the control of the SD file analysis unit 212.
  • the buffer 215 may acquire 3D data supplied from the 3D data decoding unit 214.
  • the buffer 215 may also store the 3D data in a storage area specified in the scene description file. Further, the buffer 215 may read 3D data from the storage area and supply it to the display information generation unit 216 based on a request from the control unit 201 or the display information generation unit 216 or at a predetermined timing.
  • the display information generation unit 216 performs processing related to displaying 3D data. For example, the display information generation unit 216 may acquire 3D data read from the buffer 215. Further, the display information generation unit 216 may perform rendering of the 3D data and generate display information (for example, a display image, etc.). Further, the display information generation unit 216 may supply the generated display information to the output unit 221.
  • the media file acquisition unit 217 performs processing related to media file acquisition under the control of the SD file analysis unit 212.
  • the media file acquisition unit 217 may acquire media files etc. supplied from outside the client device 200, such as a distribution server or the file generation device 100.
  • the media file acquisition unit 217 may extract encoded data of media data stored in the acquired media file and supply it to the media data decoding unit 218.
  • the media data decoding unit 218 performs processing related to decoding of media data under the control of the SD file analysis unit 212.
  • the media data decoding unit 218 may acquire encoded data of media data supplied from the media file acquisition unit 217.
  • the media data decoding unit 218 may decode the encoded data.
  • the media data decoding unit 218 may supply the media data obtained through the decoding to the buffer 219.
  • the buffer 219 performs processing related to storing media data under the control of the SD file analysis unit 212.
  • the buffer 219 may acquire media data supplied from the media data decoding unit 218.
  • the buffer 219 may also store the media data in a storage area specified in the scene description file.
  • the buffer 219 may read media data from the storage area and supply it to the media information generation unit 220 based on a request from the control unit 201 or the media information generation unit 220 or at a predetermined timing.
  • the media information generation unit 220 performs processing related to output of media data. For example, the media information generation unit 220 may acquire media data read from the buffer 219. The media information generation unit 220 may also perform rendering of the media data and generate media information for output (for example, haptic media information for output, an image for display, audio information for output, etc.). Furthermore, the media information generation unit 220 may supply the generated media information to the output unit 221.
  • the output unit 221 includes a display device, an audio output device, and a haptics device (for example, a vibration device), and performs processing related to output of the above-mentioned display information and media information (image display, audio output, haptic media output (for example, vibration output), etc.).
  • the output unit 221 may acquire display information supplied from the display information generation unit 216.
  • the output unit 221 may acquire media information supplied from the media information generation unit 220.
  • the output unit 221 may display the acquired display information on a display unit (for example, a display). Further, the output unit 221 may output the acquired media information to a media output unit (for example, a vibration device, etc.).
  • the client device 200 may be used as the above-described second information processing device, and the present technology described above in <3. Dynamic Haptic Media Support> may be applied.
  • the media file acquisition unit 217 may acquire encoded data of dynamic haptic media associated with the 3D data to be played based on the description of the scene description file.
  • the media data decoding unit 218 may decode the encoded data based on the description of the scene description file to generate haptic media data.
  • the buffer 219 may store haptic media data in a storage area corresponding to an accessor specified by the scene description file.
  • the media information generation unit 220 may read haptic media data stored in the storage area of the buffer 219 based on the description of the scene description file, and generate haptic media information for output.
  • other methods may be applied.
  • a plurality of the present techniques may be applied in combination as appropriate.
  • the client device 200 can then obtain the same effects as those described above in <3. Dynamic Haptic Media Support>. That is, the client device 200 can suppress reduction in playback performance of media data associated with 3D data.
  • the client device 200 having the above configuration may be used as the above-described second information processing device, and the present technology described above in <4. Support for interaction-type media> may be applied.
  • the media file acquisition unit 217 may acquire encoded data of interaction-type media associated with 3D data to be played based on the description of interaction-type media included in the scene description file. Furthermore, the media data decoding unit 218 may decode the obtained encoded data based on the description of the scene description file to generate interaction-type media data. By doing this, the client device 200 can obtain the same effects as those described above in <4. Support for interaction-type media>. That is, the client device 200 can suppress reduction in playback performance of media data associated with 3D data.
  • In step S201, the SD file acquisition unit 211 of the client device 200 acquires a scene description file. The SD file analysis unit 212 then analyzes the scene description file.
  • In step S202, the 3D file acquisition unit 213 acquires a 3D file according to the scene description file.
  • In step S203, the 3D data decoding unit 214 decodes the encoded data of the 3D data.
  • The buffer 215 stores the 3D data obtained by the decoding in a storage area specified by the scene description file.
  • In step S204, the display information generation unit 216 reads out the 3D data stored in the buffer 215 and renders it. That is, the display information generation unit 216 generates display information (a display image, etc.) using the read 3D data, and supplies it to the output unit 221 for display.
  • When the process in step S204 ends, the process advances to step S209.
  • Each process of steps S205 to S208 is executed in parallel with the processes of steps S202 to S204.
  • In step S205, the media file acquisition unit 217 acquires the media file according to the scene description file.
  • In step S206, the media file acquisition unit 217 determines whether or not the playback conditions of the acquired media file are satisfied, and waits until it is determined that the conditions are satisfied. When it is determined that the playback conditions for the acquired media file are satisfied, the process advances to step S207.
  • In step S207, the media data decoding unit 218 decodes the encoded data of the media data.
  • The buffer 219 stores the media data obtained by the decoding in a storage area specified by the scene description file.
  • In step S208, the media information generation unit 220 reads the media data stored in the buffer 219 and renders it. That is, the media information generation unit 220 generates media information (vibration information, etc.) using the read media data, and supplies it to the output unit 221 for output.
  • When the process in step S208 ends, the process advances to step S209.
  • In step S209, the control unit 201 determines whether or not to end the playback process. If it is determined that the playback process should not be ended, the process returns to step S202 and step S205. If it is determined that the playback process should be ended, the playback process ends.
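  • The control flow of steps S201 to S209 can be sketched as below. This is a minimal sketch under stated assumptions: the two branches are run on a thread pool, the playback condition is polled through a hypothetical `condition_met` callback, and each step only records its label instead of doing real acquisition, decoding, or rendering.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def render_3d_branch(steps):
    steps.append("S202")  # acquire the 3D file named by the scene description
    steps.append("S203")  # decode the 3D data and store it in buffer 215
    steps.append("S204")  # render display information from buffer 215


def media_branch(steps, condition_met, poll_interval=0.001):
    steps.append("S205")  # acquire the media file
    while not condition_met():  # wait until the playback condition holds
        time.sleep(poll_interval)
    steps.append("S206")  # playback condition confirmed
    steps.append("S207")  # decode the media data and store it in buffer 219
    steps.append("S208")  # generate media information (e.g. vibration)


def playback(should_end, condition_met):
    trace = ["S201"]  # acquire and analyze the scene description file
    with ThreadPoolExecutor(max_workers=2) as pool:
        while True:
            steps_3d, steps_media = [], []
            futures = [
                pool.submit(render_3d_branch, steps_3d),
                pool.submit(media_branch, steps_media, condition_met),
            ]
            for future in futures:
                future.result()  # both branches finish before S209
            trace.append((steps_3d, steps_media))
            if should_end():  # S209: end, or return to S202 and S205
                return trace
```

  • Each branch writes to its own list so that the recorded order within a branch does not depend on thread scheduling.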
  • The client device 200 may serve as the second information processing device described above, and the present technology described above in <3. Dynamic Haptic Media Support> may be applied.
  • The media file acquisition unit 217 may acquire encoded data of dynamic haptic media associated with the 3D data to be played, based on the description of the scene description file.
  • The media data decoding unit 218 may decode the encoded data based on the description of the scene description file to generate haptic media data.
  • The buffer 219 may store the haptic media data in a storage area corresponding to an accessor specified by the scene description file.
  • The media information generation unit 220 may read the haptic media data stored in the storage area based on the description of the scene description file, and generate haptic media information for output. Other methods may also be applied.
  • By doing so, the client device 200 can obtain the same effects as described above in <3. Dynamic Haptic Media Support>. That is, the client device 200 can suppress reduction in playback performance of media data associated with 3D data.
  • Further, the client device 200 having the above configuration may serve as the above-described second information processing device, and the present technology described above in <4. Support for interaction-type media> may be applied.
  • The media file acquisition unit 217 may acquire encoded data of the interaction-type media associated with the 3D data to be played, based on the description of the interaction-type media included in the scene description file.
  • The media data decoding unit 218 may decode the acquired encoded data based on the description of the scene description file to generate interaction-type media data.
  • Other methods may also be applied.
  • A plurality of the present techniques may be applied in combination as appropriate. By doing so, the client device 200 can obtain the same effects as described above in <4. Support for interaction-type media>. That is, the client device 200 can suppress reduction in playback performance of media data associated with 3D data.
  • the series of processes described above can be executed by hardware or software.
  • the programs that make up the software are installed on the computer.
  • the computer includes a computer built into dedicated hardware and, for example, a general-purpose personal computer that can execute various functions by installing various programs.
  • FIG. 35 is a block diagram showing an example of the hardware configuration of a computer that executes the series of processes described above using a program.
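  • In the computer shown in FIG. 35, a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, and a RAM (Random Access Memory) 903 are connected to one another via a bus 904.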
  • An input/output interface 910 is also connected to the bus 904.
  • An input section 911 , an output section 912 , a storage section 913 , a communication section 914 , and a drive 915 are connected to the input/output interface 910 .
  • the input unit 911 includes, for example, a keyboard, a mouse, a microphone, a touch panel, an input terminal, and the like.
  • the output unit 912 includes, for example, a display, a speaker, an output terminal, and the like.
  • the storage unit 913 includes, for example, a hard disk, a RAM disk, a nonvolatile memory, and the like.
  • the communication unit 914 includes, for example, a network interface.
  • the drive 915 drives a removable medium 921 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
  • The CPU 901 performs the above-described series of processes by, for example, loading a program stored in the storage unit 913 into the RAM 903 via the input/output interface 910 and the bus 904 and executing it.
  • the RAM 903 also appropriately stores data necessary for the CPU 901 to execute various processes.
  • a program executed by a computer can be applied by being recorded on a removable medium 921 such as a package medium, for example.
  • the program can be installed in the storage unit 913 via the input/output interface 910 by attaching the removable medium 921 to the drive 915.
  • the program may also be provided via wired or wireless transmission media, such as a local area network, the Internet, or digital satellite broadcasting.
  • the program can be received by the communication unit 914 and installed in the storage unit 913.
  • this program can also be installed in the ROM 902 or storage unit 913 in advance.
  • the present technology can be applied to any configuration.
  • the present technology can be applied to various electronic devices.
  • For example, the present technology can be implemented as part of the configuration of a device, such as a processor (e.g., a video processor) serving as a system LSI (Large Scale Integration) or the like, a module (e.g., a video module) that uses multiple processors or the like, a unit (e.g., a video unit) that uses multiple modules or the like, or a set (e.g., a video set) obtained by adding other functions to a unit.
  • the present technology can also be applied to a network system configured by a plurality of devices.
  • the present technology may be implemented as cloud computing in which multiple devices share and jointly perform processing via a network.
  • The present technology may also be implemented in a cloud service that provides services related to images (moving images) to any terminal, such as a computer, AV (Audio Visual) equipment, a mobile information processing terminal, or an IoT (Internet of Things) device.
  • A system refers to a collection of multiple components (devices, modules (parts), etc.), and it does not matter whether all the components are in the same housing. Therefore, multiple devices housed in separate housings and connected via a network, and one device with multiple modules housed in one housing, are both systems.
  • Systems, devices, processing units, etc. to which the present technology is applied can be used in any field, such as transportation, medical care, crime prevention, agriculture, livestock farming, mining, beauty, factories, home appliances, weather, and nature monitoring. Their use is also arbitrary.
  • the present technology can be applied to systems and devices used for providing ornamental content and the like. Further, for example, the present technology can be applied to systems and devices used for transportation, such as traffic situation supervision and automatic driving control. Furthermore, for example, the present technology can also be applied to systems and devices used for security. Furthermore, for example, the present technology can be applied to systems and devices used for automatic control of machines and the like. Furthermore, for example, the present technology can also be applied to systems and devices used in agriculture and livestock farming. Further, the present technology can also be applied to systems and devices that monitor natural conditions such as volcanoes, forests, and oceans, and wildlife. Furthermore, for example, the present technology can also be applied to systems and devices used for sports.
  • The term "flag" refers to information for identifying multiple states, and includes not only information used to identify two states, true (1) or false (0), but also information that can identify three or more states. Therefore, the value that this "flag" can take may be, for example, a binary value of 1/0, or one of three or more values. That is, the number of bits constituting this "flag" is arbitrary, and may be 1 bit or multiple bits.
  • Identification information (including flags) may be included in the bitstream not only as the identification information itself but also as difference information of the identification information with respect to certain reference information. Therefore, in this specification, "flag" and "identification information" include not only that information but also difference information with respect to reference information.
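  • As a small illustration of a multi-state flag carried as a difference from reference information, consider the sketch below. The `PlaybackFlag` states and the two helper functions are hypothetical names introduced only for this example; the specification does not define them.

```python
from enum import IntEnum


class PlaybackFlag(IntEnum):
    """Hypothetical three-state flag; a flag need not be binary."""
    OFF = 0
    ON = 1
    AUTO = 2


def to_difference(value, reference):
    """Signal the flag as a difference relative to a reference value."""
    return int(value) - int(reference)


def from_difference(diff, reference):
    """Recover the flag from the signalled difference and the reference."""
    return PlaybackFlag(int(reference) + diff)
```

  • A receiver that already holds the reference value only needs the difference to reconstruct the flag.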
  • Information related to encoded data may be transmitted or recorded in any form as long as it is associated with the encoded data.
  • The term "associate" means, for example, that when processing one piece of data, the other piece of data can be used (linked). In other words, pieces of data that are associated with each other may be combined into one piece of data, or may be kept as individual pieces of data.
  • For example, information associated with encoded data (an image) may be transmitted on a transmission path different from that of the encoded data (image).
  • Likewise, information associated with encoded data (an image) may be recorded on a recording medium different from that of the encoded data (image), or in a different recording area of the same recording medium.
  • this "association" may be a part of the data instead of the entire data.
  • an image and information corresponding to the image may be associated with each other in arbitrary units such as multiple frames, one frame, or a portion within a frame.
  • embodiments of the present technology are not limited to the embodiments described above, and various changes can be made without departing from the gist of the present technology.
  • the configuration described as one device (or processing section) may be divided and configured as a plurality of devices (or processing sections).
  • the configurations described above as a plurality of devices (or processing units) may be configured as one device (or processing unit).
  • Part of the configuration of one device (or processing unit) may be included in the configuration of another device (or another processing unit) as long as the configuration and operation of the entire system are substantially the same.
  • The above-mentioned program may be executed on any device.
  • In that case, it is sufficient that the device has the necessary functions (functional blocks, etc.) and can obtain the necessary information.
  • each step of one flowchart may be executed by one device, or may be executed by multiple devices.
  • When one step includes multiple processes, the multiple processes may be executed by one device, or may be shared and executed by multiple devices.
  • multiple processes included in one step can be executed as multiple steps.
  • processes described as multiple steps can also be executed together as one step.
  • The processing of the steps described in the program may be executed chronologically in the order described in this specification, may be executed in parallel, or may be executed individually at necessary timings, such as when a request is made. In other words, the processing of each step may be executed in an order different from the order described above, unless a contradiction occurs. Furthermore, the processing of the steps describing this program may be executed in parallel with the processing of other programs, or may be executed in combination with the processing of other programs.
  • the present technology can also have the following configuration.
  • An information processing device comprising: an acquisition unit that acquires encoded data of dynamic haptic media associated with 3D data to be played based on the description of the scene description file; a decoding unit that decodes the encoded data and generates data of the haptic media based on the description of the scene description file; a storage unit that stores data of the haptic media in a storage area corresponding to an accessor specified by the scene description file; and a generation unit that reads data of the haptic media stored in the storage area based on the description of the scene description file, and generates haptic media information for output.
  • the information processing device stores the data of the haptic media in the storage area corresponding to the accessor specified in the material of the scene description file.
  • the haptic media includes synchronous haptic media that is played in synchronization with the progression of the scene in the time direction,
  • the information processing device reads data of the synchronous haptic media from the storage area at a timing corresponding to a predetermined playback timing, and generates the haptic media information.
  • the haptic media includes interaction-type haptic media that is played when a predetermined condition is satisfied in the scene by a user operation,
  • the generation unit reads data of the interaction type haptic media from the storage area and generates the haptic media information when the condition is satisfied.
  • An information processing device comprising a file generation unit that generates a scene description file that specifies an accessor for storing dynamic haptic media associated with 3D data in a predetermined storage area.
  • the haptic media includes synchronous haptic media that is reproduced in synchronization with the progression of the scene in the time direction.
  • the haptic media includes interaction-type haptic media that is played when a predetermined condition is satisfied in a scene by a user operation.
  • An information processing method that generates a scene description file that specifies an accessor for storing dynamic haptic media associated with 3D data in a predetermined storage area.
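  • A file generation unit of the kind described above could emit a glTF-style scene description in which an accessor identifies the storage area for the haptic media. The following is a sketch under stated assumptions: `buffers`, `bufferViews`, `accessors`, and `materials` are standard glTF 2.0 properties, but the extension name `EXAMPLE_haptic_media` and its `uri`, `accessor`, and `mode` fields are hypothetical placeholders, not names taken from this specification or from any standard.

```python
import json


def build_scene_description(haptic_uri, sample_count, mode="synchronous"):
    """Build a minimal glTF-like dict that points haptic media at an accessor."""
    byte_length = sample_count * 4  # 4 bytes per 32-bit float sample
    return {
        "asset": {"version": "2.0"},
        "buffers": [{"byteLength": byte_length}],
        "bufferViews": [
            {"buffer": 0, "byteOffset": 0, "byteLength": byte_length}
        ],
        "accessors": [
            {
                "bufferView": 0,
                "componentType": 5126,  # glTF code for 32-bit float
                "count": sample_count,
                "type": "SCALAR",
            }
        ],
        "materials": [
            {
                "extensions": {
                    # Hypothetical extension, for illustration only.
                    "EXAMPLE_haptic_media": {
                        "uri": haptic_uri,
                        "accessor": 0,  # index into "accessors"
                        "mode": mode,   # "synchronous" or "interaction"
                    }
                }
            }
        ],
    }


def write_scene_description(path, scene):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(scene, f, indent=2)
```

  • The `mode` field stands in for the distinction drawn above between synchronous haptic media and interaction-type haptic media.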
  • an acquisition unit that acquires encoded data of the interaction-type media associated with the 3D data to be played based on the description of the interaction-type media included in the scene description file;
  • An information processing device comprising: a decoding unit that decodes the acquired encoded data based on the description of the scene description file and generates data of the interaction type media.
  • the description regarding the interaction type media includes a description indicating whether or not interaction type processing is possible to be executed when a predetermined condition is satisfied in the scene by a user operation, The information processing device according to (21), wherein the acquisition unit acquires the encoded data when it is indicated that the interaction type processing is possible.
  • the description regarding the interaction type media includes a description indicating whether or not the interaction type media can be selected according to a user operation or attribute information of an avatar, and, when it is indicated that the interaction type media can be selected, the acquisition unit selects the interaction type media according to the user operation or the attribute information of the avatar.
  • the description regarding the interaction type media includes a description regarding the acquisition of the encoded data, The information processing device according to any one of (21) to (23), wherein the acquisition unit acquires the encoded data according to a description regarding acquisition of the encoded data.
  • the description regarding acquisition of the encoded data includes a description regarding acquisition conditions, The information processing device according to (24), wherein the acquisition unit acquires the encoded data when the condition is satisfied.
  • the description regarding the condition includes a description indicating whether the condition is before the initialization of information necessary for the scene, The information processing device according to (25), wherein the acquisition unit acquires the encoded data before the information is initialized, if the description indicates that the condition is before the information is initialized.
  • the description regarding the condition includes a description indicating the LoD of the position corresponding to the interaction type media, The information processing device according to (25) or (26), wherein the acquisition unit acquires the encoded data when the LoD at the position is larger than the LoD indicated by the description.
  • the description regarding the condition includes a description indicating a distance to a position corresponding to the interaction type media, The information processing device according to any one of (25) to (27), wherein the acquisition unit acquires the encoded data when the viewpoint or the avatar approaches the position within the distance indicated by the description.
  • the description regarding the condition includes a description indicating whether the condition is that the position corresponding to the interaction type media is within the field of view, The information processing device according to any one of (25) to (28), wherein, if the description indicates that the condition is that the position is within the field of view, the acquisition unit acquires the encoded data when the position enters the field of view.
  • the description regarding the conditions includes a description indicating a recommended time for acquiring the encoded data, The information processing device according to any one of (25) to (29), wherein the acquisition unit acquires the encoded data at the recommended time indicated by the description.
  • the description regarding the condition includes a description indicating a predetermined spatial area, The information processing device according to any one of (25) to (30), wherein the acquisition unit acquires the encoded data when a viewpoint or an avatar is located within the spatial region indicated by the description.
  • the description regarding the acquisition of the encoded data includes a description regarding the acquisition method of the encoded data, The information processing device according to any one of (24) to (31), wherein the acquisition unit acquires the encoded data according to a description regarding the acquisition method.
  • the description regarding the acquisition method includes a description indicating whether the encoded data is included in the 3D data file, The information processing apparatus according to (32), wherein the acquisition unit acquires the encoded data when it is indicated that the encoded data is not included in the 3D data file.
  • the description regarding the acquisition method includes a description indicating the priority of the encoded data, The information processing device according to (32) or (33), wherein the acquisition unit acquires the encoded data according to the priority.
  • the description regarding the acquisition of encoded data includes a description regarding the type of the interaction type media, The information processing device according to any one of (24) to (34), wherein the acquisition unit acquires the encoded data according to a description regarding the type of the interaction type media.
  • the description regarding the type of the interaction type media includes a description indicating whether the interaction type media is a dynamic media, The information processing device according to (35), wherein the acquisition unit acquires the encoded data in a method according to the dynamic medium, when it is indicated that the interaction type media is the dynamic medium.
  • the description regarding the type of interaction type media includes a description specifying an accessor corresponding to a storage area that stores dynamic media, The information processing device according to (35) or (36), wherein the acquisition unit stores the acquired encoded data in the storage area corresponding to the designated accessor.
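  • Several of the acquisition conditions listed above (acquisition before initialization, LoD, distance, and field of view) can be evaluated as in the sketch below. The data classes and field names are hypothetical, and the choice to require every specified condition at once is an assumption of this example; the configurations above only state each condition individually.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AcquisitionCondition:
    """Hypothetical parsed form of the acquisition-condition description."""
    before_init: bool = False             # acquire before scene initialization
    min_lod: Optional[int] = None         # acquire when LoD is larger than this
    max_distance: Optional[float] = None  # acquire within this distance
    require_in_view: bool = False         # acquire when the position is in view


@dataclass
class ViewerState:
    """Hypothetical snapshot of the client state used for the decision."""
    initialized: bool
    lod_at_position: int
    viewpoint: Vec3
    media_position: Vec3
    in_view: bool


def should_acquire(cond, state):
    # Pre-initialization acquisition takes effect only before initialization.
    if cond.before_init and not state.initialized:
        return True
    if cond.min_lod is not None and state.lod_at_position <= cond.min_lod:
        return False
    if cond.max_distance is not None:
        if math.dist(state.viewpoint, state.media_position) > cond.max_distance:
            return False
    if cond.require_in_view and not state.in_view:
        return False
    return True  # every specified condition holds (assumed combination rule)
```

  • With no condition specified, this sketch acquires immediately; a real client might instead defer to the priority or acquisition-method descriptions.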
  • An information processing device that includes a file generation unit that generates a scene description file that includes a description of interaction media associated with 3D data.
  • The information processing device according to (51) or (52), wherein the description regarding the interaction type media includes a description indicating whether or not the interaction type media can be selected according to a user operation or attribute information of an avatar.
  • The information processing device according to any one of (51) to (53), wherein the description regarding the interaction type media includes a description regarding acquisition of the interaction type media.
  • The information processing device according to (54), wherein the description regarding acquisition of the interaction type media includes a description regarding acquisition conditions.
  • The information processing device according to (5) or (56), wherein the description regarding the condition includes a description indicating an LoD of a position corresponding to the interaction type media to be acquired.
  • (60) The information processing device according to any one of (55) to (59), wherein the description regarding the condition includes a description indicating a recommended time to acquire the interaction type media.
  • (61) The information processing device according to any one of (55) to (60), wherein the description regarding the condition includes a description indicating a predetermined spatial area from which the interaction type media is acquired.
  • (62) The information processing device according to any one of (54) to (61), wherein the description regarding acquisition of the interaction type media includes a description regarding a method for acquiring the interaction type media.
  • The information processing device according to (62), wherein the description regarding the acquisition method includes a description indicating whether or not the interaction type media is included in the 3D data file.
  • (64) The information processing device according to (61) or (62), wherein the description regarding the acquisition method includes a description indicating a priority of the interaction type media.
  • (65) The information processing device according to any one of (54) to (64), wherein the description regarding acquisition of the interaction type media includes a description regarding the type of the interaction type media.
  • (66) The information processing device according to (65), wherein the description regarding the type of the interaction type media includes a description indicating whether the interaction type media is a dynamic medium.
  • the information processing device according to any one of (51) to (71), wherein the interaction type media includes image information.
  • the interaction type media includes audio information.
  • An information processing method that generates a scene description file that includes a description of interaction media associated with 3D data.
  • 100 file generation device, 101 control unit, 102 file generation processing unit, 111 input unit, 112 preprocessing unit, 113 encoding unit, 114 preprocessing unit, 115 encoding unit, 116 file generation unit, 117 recording unit, 118 output unit, 121 SD file generation unit, 122 3D file generation unit, 123 media file generation unit, 200 client device, 201 control unit, 202 client processing unit, 211 SD file acquisition unit, 212 SD file analysis unit, 213 3D file acquisition unit, 214 3D data decoding unit, 215 buffer, 216 display information generation unit, 217 media file acquisition unit, 218 media data decoding unit, 219 buffer, 220 media information generation unit, 221 output unit

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
PCT/JP2023/010321 2022-03-18 2023-03-16 情報処理装置および方法 Ceased WO2023176928A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202380022930.4A CN118743228A (zh) 2022-03-18 2023-03-16 信息处理装置和方法
JP2024508252A JPWO2023176928A1 2022-03-18 2023-03-16
EP23770881.3A EP4496319A4 (en) 2022-03-18 2023-03-16 Information processing device and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263321305P 2022-03-18 2022-03-18
US63/321,305 2022-03-18

Publications (1)

Publication Number Publication Date
WO2023176928A1 true WO2023176928A1 (ja) 2023-09-21

Family

ID=88023408

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/010321 Ceased WO2023176928A1 (ja) 2022-03-18 2023-03-16 情報処理装置および方法

Country Status (4)

Country Link
EP (1) EP4496319A4
JP (1) JPWO2023176928A1
CN (1) CN118743228A
WO (1) WO2023176928A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2026009824A1 (ja) * 2024-07-05 2026-01-08 ソニーグループ株式会社 情報処理装置および方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120092146A1 (en) * 2009-12-11 2012-04-19 Gwangju Institute Of Science And Technology Method for expressing haptic information using control information, and system for transmitting haptic information
JP2014239430A (ja) * 2013-05-24 2014-12-18 イマージョン コーポレーションImmersion Corporation 触覚データを符号化及びストリーミングする方法及びシステム
JP2018527655A (ja) * 2015-07-13 2018-09-20 トムソン ライセンシングThomson Licensing ユーザ・ハプティック空間(HapSpace)に基づくハプティック・フィードバックおよびインタラクティブ性を提供する方法および装置
WO2021251185A1 (ja) * 2020-06-11 2021-12-16 ソニーグループ株式会社 情報処理装置および方法

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
"Information technology - Coding of audio-visual objects - Part 12: ISO base media file format, TECHNICAL CORRIGENDUM 1", ISO/IEC 14496-12: 2015/COR.1, ISO/IEC JTC 1/SC 29/WG 11, 2016/6/3
"ISO/IEC Standard. Draft Amendment ISO/IEC 14496-12:2020/DAM 1", 20 January 2021, article "ISO/IEC 14496-12:2020/CD Amd-1 Information technology - Coding of audio-visual objects - Part 12: ISO base media file format AMENDMENT 1: Support for new media types (haptics, volumetric visual) and other improvements", pages: 1 - 7, XP009553131 *
"Text of ISO/IEC CD 23090-14 Scene Description for MPEG Media", ISO/IEC JTC 1/SC 29/WG 3N00485, 12 October 2021 (2021-10-12)
CHRIS ULLRICH, YESHWANT MUTHUSAMY, FABIEN DANIEAU, QUENTIN GALVANE, PHILIPPE GUILLOTEL, ERIC VEZZOLI, TITOUAN RABU: "MPEG-I SD Revised Haptic Schema and Processing Model", ISO/IEC JTC 1/SC 29/WG 3M58487_V3, 2021/10
QUENTIN GALVANE, FABIEN DANIEAU, PHILIPPE GUILLOTEL, ERIC VEZZOLI, ALEXANDRE HULSKEN, TITOUAN RABU, ANDREAS NOLL, LARS NOCKENBERG: "WD on the Coded Representation of Haptics - Phase 1", ISO/IEC JTC 1/SC 29/WG 2, M58748, 2021/10
SAURABH BHATIA, PATRICK COZZI, ALEXEY KNYAZEV, TONY PARISI, KHRONOS GLTF2.0, 9 June 2017 (2017-06-09), Retrieved from the Internet <URL:https://github.com/KhronosGroup/glTF/tree/master/specification/2.>
See also references of EP4496319A4

Also Published As

Publication number Publication date
CN118743228A (zh) 2024-10-01
JPWO2023176928A1 2023-09-21
EP4496319A1 (en) 2025-01-22
EP4496319A4 (en) 2025-04-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23770881

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2024508252

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 202380022930.4

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2023770881

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2023770881

Country of ref document: EP

Effective date: 20241018