WO2021065605A1 - Information processing device and information processing method - Google Patents

Information processing device and information processing method

Info

Publication number
WO2021065605A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
time
metadata
generation unit
reproduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2020/035747
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
由佳 木山
遼平 高橋
平林 光浩
久野 浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corp
Priority to US17/642,453 (US20220303641A1)
Priority to JP2021550644A (JPWO2021065605A1)
Priority to EP20870699.4A (EP4016994A1)
Priority to CN202080057094.XA (CN114223211A)
Publication of WO2021065605A1
Legal status: Ceased (current)

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213Monitoring of end-user related data
    • H04N21/44218Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV programme
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00Details of colour television systems
    • H04N9/79Processing of colour television signals in connection with recording
    • H04N9/80Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N9/82Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only
    • H04N9/8205Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only involving the multiplexing of an additional signal and the colour video signal
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/21805Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/23614Multiplexing of additional data and video streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/2362Generation or processing of Service Information [SI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4345Extraction or processing of SI, e.g. extracting service information from an MPEG stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4348Demultiplexing of additional data and video streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44012Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4722End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting additional data associated with the content
    • H04N21/4725End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting additional data associated with the content using interactive regions of the image, e.g. hot spots
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g 3D video
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/85406Content authoring involving a specific file format, e.g. MP4 format
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • H04N5/78Television signal recording using magnetic recording
    • H04N5/782Television signal recording using magnetic recording on tape
    • H04N5/783Adaptations for reproducing at a rate different from the recording rate

Definitions

  • the present invention relates to an information processing device and an information processing method.
  • The mainstream form of content distribution is 2D content, called 2D video, which is used for distributing movies and the like. In addition, content called 360-degree video, which lets the viewer look around in all directions, is provided on various sites on the Internet. 360-degree video is also called 3DoF (Degree of Freedom) video. For both 2D video and 3DoF video, content encoded in 2D is basically distributed and displayed on the client device.
  • 6DoF content is composed of data from one or more 3D models in 3D space.
  • data of the 3D model will be referred to as 3D model data.
  • The video displayed by playing back 6DoF content on a playback terminal is referred to as 6DoF video.
  • Because 6DoF video gives the user freedom over the viewpoint, the user may miss a scene that deserves attention. Therefore, in the distribution of 6DoF content, the user is provided with viewing area information indicating the line-of-sight direction and viewpoint position for a notable scene. The user can view the notable scene of the 6DoF video based on the viewing area information.
  • It is also possible to provide temporal playback information that notifies the user of a recommended temporal playback method such as pause, slow playback, or loop playback.
  • the original data of the video to be viewed is called media data.
  • the time axis of the media data is called a media timeline.
  • Media data is formed of samples, which make up a bitstream arranged along the media timeline. A sample is the smallest unit of the bitstream. Each sample is assigned a CTS (Composition Time Stamp) according to the media timeline.
  • The current method of temporal playback control defined in ISOBMFF does not assume spatial playback control based on viewing 6DoF content from a free viewpoint direction and viewpoint position. For example, when the video of 6DoF content is paused, the paused content cannot be displayed while changing the line-of-sight direction and viewpoint position as the provider intends, so there was a risk that an appropriate viewing experience could not be provided.
  • The metadata generation unit generates temporal reproduction information, which indicates the display order along the passage of reproduction time for each scene of 6DoF content composed of three-dimensional models in a three-dimensional space, and modified timed metadata, which includes recommended viewing information indicating the viewpoint position and line-of-sight direction corresponding to each time in that passage of reproduction time. It also generates association information indicating that the recommended viewing information corresponds to each time in the passage of reproduction time given by the temporal reproduction information.
  • The drawings show, in order: a diagram explaining data reproduction using EditList; a system configuration diagram of an example of a distribution system; a block diagram of a file generation device; a diagram of an example of EditList; a diagram showing an example of the syntax of the modified timed metadata; a diagram explaining the association of recommended viewing area information with EditList; a diagram showing the ISOBMFF file in the first embodiment; a block diagram of a client device; a flowchart of the file generation processing by the file generation device according to the first embodiment; and a flowchart of the reproduction processing executed by the client device according to the first embodiment.
  • Non-Patent Document 1: (above); Non-Patent Document 2: "ISO/IEC 14496-11", Second Edition, 2015-11-01; Non-Patent Document 3: "ISO/IEC 23009-1", Third Edition, 2019-08; Non-Patent Document 4: "ISO/IEC 23001-10", First Edition, 2015-09-01; Non-Patent Document 5: Matroska Media Container (https://www.matroska.org/)
  • The contents described in the above non-patent documents also serve as a basis for determining the support requirements.
  • For example, the File Structure described in Non-Patent Document 1, the structures and terms used in the Scene Description described in Non-Patent Document 2, the terms used in the MPEG-DASH standard described in Non-Patent Document 3, the structures and terms used in the recommended viewport and the like described in Non-Patent Document 4, and the structures and terms used in the Matroska standard described in Non-Patent Document 5 are within the scope of disclosure of the present technology, and satisfy the support requirements of the claims, even if they are not directly described in the embodiments.
  • Similarly, technical terms such as Parsing, Syntax, and Semantics are within the scope of disclosure of the present technology, and satisfy the support requirements of the claims, even if they are not directly described in the embodiments.
  • A method of storing recommended viewing area information, which indicates a recommended viewing area for spatial playback control, is defined as the recommended viewport in OMAF (Omnidirectional Media Format) (ISO/IEC 23090-2).
  • The recommended viewport is a technology that, for 3DoF content such as multi-viewpoint video and spherical video, provides video by designating a viewpoint or area of interest. This makes it possible to dynamically provide the spherical-video display area recommended for display from a viewpoint such as a director's cut.
  • The all-sky (spherical) video data, which is 3DoF content, is configured as a bitstream encoded as two-dimensional video data, and the bitstream consists of samples arranged according to the DTS (Decoding Time Stamp), which indicates the decoding order. In ISOBMFF, for example, when such spherical video data is stored in one track, the timed metadata indicating the recommended viewing area, which changes over the time of the video, is stored in a separate track. Reference relationship information showing the reference relationship between these two tracks is then stored in the track reference box ('tref' box).
  • A sample of spherical video data is associated with a sample of timed metadata by, for example, pairing the video data sample with the timed metadata sample that has the same CTS. As a result, the timed metadata specifies, by CTS, the display area of the corresponding sample of spherical video data.
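The CTS-based pairing described above can be illustrated with a minimal Python sketch. The class names, field names, and sample values are hypothetical and chosen only for illustration; they are not taken from ISOBMFF or OMAF, and real tracks would be parsed from a file rather than built by hand.

```python
from dataclasses import dataclass


@dataclass
class VideoSample:
    cts: int        # composition timestamp on the media timeline
    payload: str    # stand-in for the coded picture (hypothetical)


@dataclass
class ViewportSample:
    cts: int        # same clock as the video track
    yaw_deg: float  # hypothetical viewport parameters
    pitch_deg: float


def pair_by_cts(video, viewports):
    """Pair each video sample with the metadata sample that has the same CTS."""
    by_cts = {v.cts: v for v in viewports}
    # A video sample without a matching metadata sample is paired with None.
    return [(s, by_cts.get(s.cts)) for s in video]


video = [VideoSample(0, "f0"), VideoSample(1, "f1"), VideoSample(2, "f2")]
viewports = [ViewportSample(0, 0.0, 0.0), ViewportSample(2, 90.0, 10.0)]
pairs = pair_by_cts(video, viewports)
```

Because the lookup key is the CTS, the recommended viewing area is tied to the media timeline and cannot vary while the same sample stays on screen.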
  • The content is played on the media timeline, which is the time axis of the content. Therefore, in recommended viewing of 6DoF content as described above, it is not possible to take measures such as temporarily stopping the media timeline for a certain scene or changing the playback method.
  • EditList is a technology mainly used to match the playback timing of video and audio. In terms of syntax, EditList can hold a list of information such as media time, playback speed, and the playback duration at that speed. Therefore, by using EditList, it is possible to provide a playback method different from normal playback without altering the media. Zero or one EditListBox() is signaled for a track in the file, and it indicates how the media in the track containing the EditListBox() is played.
  • a scene description is used to arrange the 3D model constituting the 6DoF content in the three-dimensional space.
  • the scene description includes coordinate conversion information for arranging the 3D model constituting the 3D space in the 3D space, access information to the bit stream corresponding to the 3D model, and the like. Therefore, it can be said that the 6DoF content is composed of such a scene description and 3D model data.
  • EditListBox () on the track containing the scene description.
  • each entry has segment_duration, which is the length of time to play the chunk, media_time, which is the time on the media timeline, and media_rate_integer /, which is the playback speed.
  • FIG. 1 is a diagram for explaining data reproduction using EditList.
  • sample 102 is arranged according to the media timeline.
  • the scene description 101 reproduced according to the Edit List is reproduced like the reproduction data 103.
  • entry_count is 4, so the EditList has four pieces of playback data.
  • the first entry_count104 indicates that the data from 0 seconds to 5 seconds on the media timeline is reproduced at the normal speed of 1x. Since the second entry_count105 has the same data as the first, it indicates that the first data is reproduced again to perform loop reproduction.
  • the third entry_count106 indicates that the data from 5 seconds to 10 seconds on the media timeline is reproduced at 1/2 speed.
  • the fourth entry_count107 indicates that the data at 10 seconds on the media timeline is paused for 5 seconds.
  • By setting a negative value for media_rate in EditList, it is also possible to realize rewind playback, also called reverse playback.
  • The 6DoF content reproduced using the EditList is played on the playback timeline, which is a timeline different from the media timeline. That is, the media timeline is the time axis that the 6DoF content itself has, and the playback timeline is the time axis during playback.
  • EditList specifies at what time on the playback timeline and at what time on the media timeline the data should be played.
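The mapping from the playback timeline to the media timeline in the FIG. 1 example can be sketched as follows. This is an illustrative sketch, not the patent's implementation: times are simplified to seconds, the class and function names are hypothetical, and the 10-second duration of the third entry is inferred from playing 5 seconds of media at 1/2 speed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EditEntry:
    segment_duration: float  # length on the playback timeline, in seconds (simplification)
    media_time: float        # start time on the media timeline, in seconds
    media_rate: float        # 1.0 = normal, 0.5 = half speed, 0.0 = pause


# Four entries mirroring the description of FIG. 1 (durations are inferred).
FIG1_ENTRIES = [
    EditEntry(5.0, 0.0, 1.0),    # normal playback of media 0 s to 5 s
    EditEntry(5.0, 0.0, 1.0),    # same data again: loop playback
    EditEntry(10.0, 5.0, 0.5),   # media 5 s to 10 s at 1/2 speed
    EditEntry(5.0, 10.0, 0.0),   # pause at media time 10 s for 5 s
]


def playback_to_media(entries: List[EditEntry], playback_time: float) -> Optional[float]:
    """Return the media time shown at playback_time, or None outside all entries."""
    start = 0.0
    for entry in entries:
        end = start + entry.segment_duration
        if start <= playback_time < end:
            return entry.media_time + (playback_time - start) * entry.media_rate
        start = end
    return None
```

With these entries, playback times 2 s and 7 s both map to media time 2 s (the loop), and every playback time from 20 s up to 25 s maps to media time 10 s (the pause).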
  • When the recommended viewing area information (recommended viewport) and timed metadata on the media timeline described above are combined with the playback timeline based on the EditList described above and applied to 6DoF content, only the temporal reproduction information of the playback timeline is applied to the sample of 6DoF content that is specified by the CTS of the timed metadata on the media timeline. This means that no other operation can be performed: the viewpoint direction and position of the recommended viewing area cannot be changed, so that, for example, when a pause is expressed on the playback timeline, the same recommended viewing area continues to be displayed.
  • Therefore, modified timed metadata is set that stores recommended viewing area information, such as the viewpoint position and line-of-sight direction indicating the recommended viewing area to be displayed at each time on the playback timeline. A method is also proposed of setting recommended viewing time identification information in the modified timed metadata in order to specify the recommended viewing area information that corresponds to the playback timeline.
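The idea of keying recommended viewing information to the playback timeline rather than to the CTS can be sketched as below. The record layout, names, and values are hypothetical; the actual syntax of the modified timed metadata is defined elsewhere in the document and is not reproduced here.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class RecommendedView:
    playback_time: float  # time on the playback timeline, in seconds (assumed unit)
    position: tuple       # viewpoint position (x, y, z), hypothetical values
    direction: tuple      # line-of-sight direction (yaw_deg, pitch_deg), hypothetical


def view_at(views, playback_time):
    """Return the latest view whose playback_time is <= the query; views must be sorted."""
    times = [v.playback_time for v in views]
    index = bisect_right(times, playback_time) - 1
    return views[index] if index >= 0 else None


# Suppose the media is paused from playback time 20 s to 25 s.
# The media sample stays the same, but the recommended view keeps changing.
views = [
    RecommendedView(0.0, (0, 0, 5), (0, 0)),
    RecommendedView(20.0, (0, 0, 5), (0, 0)),
    RecommendedView(22.0, (3, 0, 4), (45, 0)),
    RecommendedView(24.0, (5, 0, 0), (90, 0)),
]
```

A CTS-keyed lookup would return a single view for the whole pause, whereas this playback-time lookup lets the viewpoint move around the frozen scene.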
  • FIG. 2 is a system configuration diagram of an example of a distribution system.
  • the distribution system 100 includes a file generation device 1 which is an information processing device, a client device 2 which is a reproduction processing device, and a Web server 3.
  • the file generation device 1, the client device 2, and the Web server 3 are connected to the network 4. Then, the file generation device 1, the client device 2, and the Web server 3 can communicate with each other via the network 4.
  • the distribution system 100 may include a plurality of file generation devices 1 and a plurality of client devices 2, respectively.
  • The file generation device 1 generates a 6DoF content file containing temporal reproduction information that specifies the recommended temporal playback order and recommended viewing information that specifies the recommended viewing area according to the passage of time.
  • the file generation device 1 uploads the generated 6DoF content file to the Web server 3.
  • Here, the case where the Web server 3 provides the 6DoF content to the client device 2 is described, but the distribution system 100 can adopt another configuration.
  • the file generation device 1 may include the functions of the Web server 3, store the generated 6DoF content in its own device, and provide it to the client device 2.
  • the Web server 3 is connected to the client device 2 via the network 4.
  • the Web server 3 holds a 6DoF content file generated by the file generation device 1. Then, the Web server 3 provides the designated 6DoF content according to the request from the client device 2.
  • the client device 2 transmits a transmission request for a 6DoF content file generated by the file generation device 1 to the Web server 3. Then, the client device 2 acquires the 6DoF content specified in the transmission request from the Web server 3 via the network 4. Then, the client device 2 renders in the line-of-sight direction at the viewpoint position designated by the recommended viewing information using the object data designated by the temporal reproduction information according to the reproduction time, and generates an image for display. This playback time becomes the time on the playback timeline. Then, the client device 2 displays the generated image on a display device such as a monitor.
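The client-side behavior described above, in which each playback time selects both the data to render and the viewpoint to render it from, can be sketched as a render plan. This is a simplified illustration under stated assumptions: tuples stand in for the parsed file structures, times are in seconds, and all names and values are hypothetical.

```python
def media_time_at(edits, t):
    """edits: list of (segment_duration, media_time, media_rate) tuples."""
    start = 0.0
    for duration, media_time, rate in edits:
        if start <= t < start + duration:
            return media_time + (t - start) * rate
        start += duration
    return None


def view_at(views, t):
    """views: list of (playback_time, position, direction), sorted by playback_time."""
    current = None
    for when, position, direction in views:
        if when <= t:
            current = (position, direction)
    return current


def render_plan(edits, views, step=1.0):
    """List (playback_time, media_time, view) for each step of the playback timeline."""
    total = sum(duration for duration, _, _ in edits)
    plan, t = [], 0.0
    while t < total:
        plan.append((t, media_time_at(edits, t), view_at(views, t)))
        t += step
    return plan


edits = [(2.0, 0.0, 1.0), (2.0, 2.0, 0.0)]  # play 2 s, then pause 2 s
views = [(0.0, (0, 0, 5), (0, 0)), (3.0, (2, 0, 5), (30, 0))]
plan = render_plan(edits, views)
```

In this plan the last two steps share the same media time because of the pause, while the view changes at playback time 3 s; a real client would render a frame for each step instead of collecting tuples.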
  • the client device 2 acquires 6DoF contents from the file generation device 1.
  • various bitstreams of 6DoF contents may be referred to as media data.
  • FIG. 3 is a block diagram of the file generator.
  • the file generation device 1 includes a data input unit 11, a file generation processing unit 12, a transmission unit 13, and a control unit 14.
  • the control unit 14 executes a process related to the control of the file generation processing unit 12.
  • the control unit 14 performs integrated control such as the operation timing of each unit of the file generation processing unit 12.
  • the file generation processing unit 12 includes a preprocessing unit 121, a metadata generation unit 122, an encoding unit 123, and a file generation unit 124.
  • the data input unit 11 accepts the input of the original data of the target 6DoF content.
  • the data received by the data input unit 11 includes object data and control information for generating metadata.
  • the control information includes, for example, coordinate conversion information, position information, size, and the like. Further, the control information includes temporal reproduction information that specifies a recommended temporal reproduction order and recommended viewing information that specifies a recommended viewing method according to the reproduction order.
  • the data input unit 11 outputs the acquired original data and control information to the pre-processing unit 121 of the file generation processing unit 12.
  • the preprocessing unit 121 receives input of original data and control information of the target 6DoF content. Then, the preprocessing unit 121 specifies the object data and the scene information for each scene included in the original data. Then, the preprocessing unit 121 acquires control information used for coding such as codec information from the control information. Then, the preprocessing unit 121 outputs the object data, the scene information, and the control information used for coding to the coding unit 123. Further, the preprocessing unit 121 outputs the control information to the metadata generation unit 122.
  • the metadata generation unit 122 receives input of control information from the preprocessing unit 121. Then, from the control information, the metadata generation unit 122 specifies the samples that store the individual reproduction unit data arranged on the media timeline in the bit stream. As shown in FIG. 1, in the scene description 101, the samples 102 are arranged with the media timeline as the time axis. The numbers shown below the scene description 101 in the figure represent the time on the media timeline. Each sample 102 has a Composition Timestamp (CTS), represented by C1 to C15, and a Decoding Timestamp (DTS), represented by D1 to D15. That is, the metadata generation unit 122 identifies the individual samples 102 arranged along the media timeline.
  • CTS: Composition Timestamp
  • DTS: Decoding Timestamp
  • FIG. 4 is a diagram of an example of EditList.
  • the metadata generation unit 122 generates EditList 151 represented by the syntax shown in FIG.
  • the metadata generation unit 122 realizes normal reproduction, loop reproduction, 1/2 speed reproduction, and pause by setting each parameter for each entry_count as shown in the usage example 152.
  • Segment_duration is the duration of the corresponding entry_count.
  • Media_time is the time in the corresponding media timeline at the beginning of the corresponding entry_count.
  • Media_rate is the playback speed when normal playback is 1.
  • sample 102 for storing the reproduction unit data at each time is arranged as in the reproduction data 103 of FIG.
  • the numbers in the frames of the reproduction data 103 represent the media time, which is the time on the media timeline. The numbers shown below the reproduction data 103 in the figure represent the time on the playback timeline.
  • entry_count104 represents normal playback
  • entry_count105 represents loop playback
  • entry_count106 represents 1/2 speed playback
  • entry_count107 represents pause.
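  • the relationship between the EditList entries and the two timelines described above can be illustrated with a minimal Python sketch. It assumes that each entry simply carries Segment_duration, Media_time, and Media_rate as plain numbers; the class name Edit, the function media_time_at, and the sample values are hypothetical and are not taken from FIG. 4 or from the usage example 152.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    segment_duration: float  # length of this entry on the playback timeline
    media_time: float        # media-timeline time at the start of the entry
    media_rate: float        # 1 = normal speed, 0.5 = half speed, 0 = pause


def media_time_at(edits: List[Edit], playback_time: float) -> Optional[float]:
    """Map a time on the playback timeline to a time on the media timeline."""
    start = 0.0
    for edit in edits:
        end = start + edit.segment_duration
        if start <= playback_time < end:
            # Advance through the media at the rate given by this entry.
            return edit.media_time + (playback_time - start) * edit.media_rate
        start = end
    return None  # the playback time lies past the last entry


# Illustrative values only (assumed, not read from the figures).
edits = [
    Edit(3, 0, 1),    # normal playback of media time 0 to 3
    Edit(3, 0, 1),    # loop: the same media range is played again
    Edit(4, 3, 0.5),  # 1/2 speed: 4 units of playback cover media time 3 to 5
    Edit(2, 5, 0),    # pause: the media time stays at 5
]
```

  • with these assumed entries, playback time 4 falls in the loop entry and maps back to media time 1, while any playback time in the last entry maps to media time 5.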
  • the metadata generation unit 122 associates the data indicating the recommended viewing area information with the data to be reproduced according to the created EditList. Specifically, for the scene description reproduced according to the EditList, the metadata generation unit 122 generates modification time-corresponding metadata that indicates the viewpoint position and the line-of-sight direction along the playback timeline.
  • the time-corresponding metadata is data that corresponds to the passage of time on the media timeline.
  • the modification time-corresponding metadata, in contrast, is data that corresponds to the passage of time on the playback timeline.
  • FIG. 5 is a diagram showing an example of the syntax of the modification time-corresponding metadata.
  • the metadata generation unit 122 generates the modification time-corresponding metadata 153 as shown in FIG.
  • the metadata generation unit 122 sets the viewpoint position information and the line-of-sight direction information at each playback time in the modification time-corresponding metadata 153.
  • FIG. 6 is a diagram for explaining the association of recommended viewing area information with the Edit list.
  • the metadata generation unit 122 generates the modification time-corresponding metadata 155, which indicates the viewpoint position and the line-of-sight direction for each playback time, for the reproduction data 154 obtained by reproducing the scene description according to the EditList.
  • the recommended viewing area information 156 is assigned to each of the samples on the playback timeline, as shown in the modification time-corresponding metadata 155.
  • the numbers in the frames of the modification time-corresponding metadata 155 are identification information for identifying the recommended viewing area information corresponding to each time on the playback timeline.
  • the metadata generation unit 122 can show recommended viewing information according to the recommended temporal reproduction order.
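  • as a rough illustration of how recommended viewing information keyed to the playback timeline might be looked up, the following Python sketch assumes that each metadata sample holds a start time on the playback timeline, a viewpoint position, and a line-of-sight direction, and that a sample stays valid until the next one begins. The names ViewingSample and recommended_view and all values are hypothetical; the actual syntax is the one shown in FIG. 5.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ViewingSample:
    playback_time: float  # playback-timeline time from which the sample applies
    viewpoint: Vec3       # recommended viewpoint position
    direction: Vec3       # recommended line-of-sight direction


def recommended_view(samples: List[ViewingSample],
                     playback_time: float) -> Optional[ViewingSample]:
    """Return the sample in effect at playback_time (samples sorted by time)."""
    times = [s.playback_time for s in samples]
    index = bisect.bisect_right(times, playback_time) - 1
    return samples[index] if index >= 0 else None


# Illustrative values only (assumed).
samples = [
    ViewingSample(0.0, (0.0, 0.0, 5.0), (0.0, 0.0, -1.0)),
    ViewingSample(3.0, (2.0, 0.0, 5.0), (-0.3, 0.0, -1.0)),
    ViewingSample(6.0, (0.0, 1.0, 3.0), (0.0, -0.2, -1.0)),
]
```

  • because the lookup key is the playback time rather than the media time, a looped or paused section of the scene description can carry a different viewpoint each time it is shown.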
  • the metadata generation unit 122 also generates metadata such as the time information of each sample. After that, the metadata generation unit 122 outputs, to the file generation unit 124, metadata that includes the generated EditList, the modification time-corresponding metadata, and information on the reference relationship between the track storing the EditList and the track storing the modification time-corresponding metadata.
  • the coding unit 123 receives input of object data for each scene and control information used for coding from the preprocessing unit 121. Then, the coding unit 123 encodes the object data using the control information to generate each bit stream. Then, the coding unit 123 outputs the generated bit stream to the file generation unit 124.
  • the coding unit 123 receives input of information about the object including coordinate conversion information and access information. Then, the coding unit 123 encodes the coordinate conversion information and the information about the object to generate the scene description. Then, the coding unit 123 outputs the generated scene description data to the file generation unit 124.
  • the file generation unit 124 receives the input of the bit stream from the coding unit 123. Further, the file generation unit 124 receives the input of the scene description data from the coding unit 123. Further, the file generation unit 124 receives, from the metadata generation unit 122, the input of the metadata that includes the EditList, the modification time-corresponding metadata, and the reference relationship information between the track storing the EditList and the track storing the modification time-corresponding metadata. Then, the file generation unit 124 segments the acquired bit stream and scene description.
  • FIG. 7 is a diagram showing an ISOBMFF file according to the first embodiment.
  • the file generation unit 124 stores the segmented bitstream and the scene description in the mdat shown in the file 159.
  • the file generation unit 124 outputs to the transmission unit 13 the segment file, that is, the ISOBMFF file in which the segmented bit stream, the scene description, the EditList, and the modification time-corresponding metadata are stored.
  • the transmission unit 13 receives the input of the segment file of the 6DoF content from the file generation unit 124. Then, the transmission unit 13 transmits the acquired segment file of the 6DoF content to the Web server 3 and uploads it.
  • FIG. 8 is a block diagram of the client device.
  • the client device 2 includes a reproduction processing unit 21, a display unit 22, and a control unit 23.
  • the control unit 23 controls the operation of each unit of the reproduction processing unit 21.
  • the control unit 23 collectively controls the operation timing of each unit of the reproduction processing unit 21.
  • the reproduction processing unit 21 includes a media data acquisition unit 211, a metadata acquisition unit 212, a decoding processing unit 213, a media data acquisition control unit 214, a buffer 215, a display control unit 216, and a display information generation unit 217.
  • the metadata acquisition unit 212 accesses the Web server 3 and acquires metadata from the segment file. Then, the metadata acquisition unit 212 parses the acquired metadata and acquires the management information of the scene description. Further, the metadata acquisition unit 212 acquires the EditList included in the track that stores the management information of the scene description. Further, the metadata acquisition unit 212 analyzes the metadata corresponding to the correction time and acquires the recommended viewing information for each playback time.
  • the metadata acquisition unit 212 acquires a scene description from the Web server 3 and parses it. Then, the metadata acquisition unit 212 outputs the parsing result of the scene description, the temporal reproduction information, and the recommended viewing area information to the display control unit 216. Further, the metadata acquisition unit 212 acquires the coordinate conversion information and the access information to the bit stream from the parsing result of the scene description, and outputs the information to the media data acquisition control unit 214.
  • the media data acquisition control unit 214 receives the coordinate conversion information and the access information to the bit stream from the metadata acquisition unit 212. Then, the media data acquisition control unit 214 selects the bit stream to be reproduced from the coordinate conversion information and the access information to the bit stream. Then, the media data acquisition control unit 214 outputs the information of the selected bit stream to the media data acquisition unit 211.
  • the media data acquisition unit 211 receives the input of the information of the bit stream to be reproduced selected by the media data acquisition control unit 214. Then, the media data acquisition unit 211 accesses the Web server 3 to request and acquire the segment file of the selected bit stream. After that, the media data acquisition unit 211 outputs the acquired bitstream segment file to the decoding processing unit 213.
  • the decoding processing unit 213 receives the input of the bit stream from the media data acquisition unit 211. Then, the decoding processing unit 213 performs decoding processing on the acquired bit stream. After that, the decoding processing unit 213 outputs the decoded bit stream to the buffer 215.
  • the display control unit 216 receives the parsing result of the scene description, and the input of the temporal reproduction information and the recommended viewing information from the metadata acquisition unit 212. Then, the display control unit 216 specifies recommended viewing information for each time on the playback timeline. Then, the display control unit 216 collectively outputs the parsing result of the scene description, the temporal reproduction information, and the recommended viewing information for each time of the playback timeline to the buffer 215.
  • the buffer 215 receives the input of the bit stream from the decoding processing unit 213. Further, the buffer 215 receives the input of the parsing result of the scene description, the temporal reproduction information, and the recommended viewing information for each time of the playback timeline from the display control unit 216. Then, the buffer 215 stores the bit stream, the scene description information corresponding to the bit stream, the temporal reproduction information, and the recommended viewing information for each time of the playback timeline in association with each other.
  • the display information generation unit 217 acquires the bit stream, the scene description information corresponding to the bit stream, the temporal reproduction information, and the recommended viewing information for each time of the playback timeline from the buffer 215. Then, using the coordinate conversion information and the temporal reproduction information for the acquired bit stream, the display information generation unit 217 arranges the 3D models in the three-dimensional space in the reproduction order specified in the temporal reproduction information. Further, the display information generation unit 217 renders the 3D models arranged in the three-dimensional space according to the viewpoint position and the line-of-sight direction specified in the recommended viewing information to generate a display image. After that, the display information generation unit 217 supplies the generated display image to the display unit 22.
  • the display unit 22 has a display device such as a monitor.
  • the display unit 22 receives an input of a display image generated by the display information generation unit 217. Then, the display unit 22 reproduces the 6DoF content by displaying the acquired display image on the display device over time.
  • FIG. 9 is a flowchart of a file generation process by the file generation device according to the first embodiment.
  • the data input unit 11 acquires object data according to the media timeline, and control information including temporal reproduction information and recommended viewing area information (step S101). Then, the data input unit 11 outputs the object data corresponding to the media timeline and the control information including the temporal reproduction information and the recommended viewing area information to the preprocessing unit 121 of the file generation processing unit 12.
  • the preprocessing unit 121 divides the data acquired from the data input unit 11 into object data and information related to the object including coordinate conversion information and the like. Then, the preprocessing unit 121 outputs the object data and the control information used for coding to the coding unit 123. Further, the preprocessing unit 121 outputs information on the state of the object, control information such as compression, temporal reproduction information, and recommended viewing area information to the metadata generation unit 122.
  • the metadata generation unit 122 receives input of object state information, control information such as compression, temporal reproduction information, and recommended viewing area information from the preprocessing unit 121. Then, the metadata generation unit 122 generates an EditList using the temporal reproduction information and the control information.
  • the metadata generation unit 122 generates metadata corresponding to the correction time using the temporal reproduction information and the recommended viewing information (step S102). Further, the metadata generation unit 122 also generates other metadata by using the control information. Then, the metadata generation unit 122 outputs the metadata including the EditList and the metadata corresponding to the modification time to the file generation unit 124.
  • the coding unit 123 encodes the object data using the control information to generate a bit stream. Further, the coding unit 123 generates a scene description using the information about the object including the coordinate conversion information acquired from the preprocessing unit 121 (step S103). Then, the coding unit 123 outputs the generated bitstream and scene description data to the file generation unit 124.
  • the file generation unit 124 segments the bit stream data. In addition, the file generation unit 124 segments the scene description (step S104).
  • the file generation unit 124 stores the segmented bit stream, the scene description, the EditList, and the modification time-corresponding metadata in the ISOBMFF to generate the segment file (step S105).
  • the file generation unit 124 stores the EditList in the track of the scene description, thereby associating the EditList with the scene description.
  • the file generation unit 124 stores the modification time-corresponding metadata in another track, and associates that track with the track in which the EditList is stored. After that, the file generation unit 124 outputs the generated segment file to the transmission unit 13.
  • the transmission unit 13 acquires the segment file from the file generation unit 124 and uploads it to the Web server 3 (step S106).
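  • the track layout produced in steps S104 and S105 can be pictured with the following Python sketch. It is only a simplified in-memory model under assumed names (Track, build_file, the handler strings, and the dictionary keys are all hypothetical); it does not write real ISOBMFF boxes, and it represents the reference relationship between tracks as a plain list of track IDs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Track:
    track_id: int
    handler: str                                          # what the track carries
    boxes: Dict[str, object] = field(default_factory=dict)
    references: List[int] = field(default_factory=list)   # referenced track IDs


def build_file(scene_description, edit_list, viewing_metadata) -> dict:
    """Place the EditList in the scene description track and the
    modification time-corresponding metadata in a separate track that
    references the scene description track."""
    scene_track = Track(1, "scene_description", {"elst": edit_list})
    metadata_track = Track(
        2,
        "modification_time_metadata",
        {"samples": viewing_metadata},
        references=[scene_track.track_id],
    )
    return {"tracks": [scene_track, metadata_track], "mdat": [scene_description]}


# Illustrative inputs only (assumed).
iso_file = build_file(
    scene_description=b"scene-description-segment",
    edit_list=[(3, 0, 1), (3, 0, 1)],
    viewing_metadata=[{"time": 0, "viewpoint": (0, 0, 5)}],
)
```

  • keeping the viewing metadata in its own track means that a client can find it by following the reference from that track to the track that holds the EditList.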
  • FIG. 10 is a flowchart of the reproduction process executed by the client device according to the first embodiment.
  • the metadata acquisition unit 212 acquires the metadata of the 6DoF content to be played back from the Web server 3. Then, the metadata acquisition unit 212 parses the acquired metadata and acquires various metadata including scene description management information, EditList, and modification time-corresponding metadata (step S201).
  • the metadata acquisition unit 212 analyzes the EditList and the modification time-corresponding metadata, and acquires the temporal reproduction information and the recommended viewing information (step S202). Further, the metadata acquisition unit 212 acquires and parses the scene description. Then, the metadata acquisition unit 212 outputs the parsing result of the scene description, the temporal reproduction information, and the recommended viewing area information to the display control unit 216. Further, the metadata acquisition unit 212 outputs the parsing result of the scene description to the media data acquisition control unit 214. The display control unit 216 receives the input of the scene description parsing result, the temporal reproduction information, and the recommended viewing area information from the metadata acquisition unit 212. After that, the display control unit 216 collectively stores the parsing result of the scene description, the temporal reproduction information for each scene description, and the recommended viewing information for each time of the temporal reproduction in the buffer 215.
  • the media data acquisition control unit 214 acquires bitstream access information from the parsing result of the scene description acquired from the metadata acquisition unit 212. Then, the media data acquisition control unit 214 selects a bit stream using the access information. After that, the media data acquisition control unit 214 outputs the information of the selected bit stream to the media data acquisition unit 211.
  • the media data acquisition unit 211 acquires the bit stream selected by the media data acquisition control unit 214 from the Web server 3 (step S203). After that, the media data acquisition unit 211 outputs the acquired bit stream to the decoding processing unit 213.
  • the decoding processing unit 213 decodes the bit stream acquired from the media data acquisition unit 211 (step S204).
  • the decoding processing unit 213 stores the decoded bit stream in the buffer 215.
  • the display information generation unit 217 acquires the bit stream, the temporal reproduction information, the recommended viewing information for each time of the temporal reproduction, and the parsing result of the scene description from the buffer 215. Then, the display information generation unit 217 arranges each 3D model in the three-dimensional space using the arrangement position and the coordinate conversion information indicated by the parsing result of the scene description according to the temporal reproduction information (step S205).
  • the display information generation unit 217 renders each 3D model arranged in the three-dimensional space according to the viewpoint position information and the line-of-sight direction information recommended for each time, and generates a display image (step S206). After that, the display information generation unit 217 outputs the generated display image to the display unit 22.
  • the display unit 22 displays the display image acquired from the display information generation unit 217 on a display device such as a monitor (step S207).
  • the control unit 23 determines whether or not the reproduction of the 6DoF content is completed (step S208).
  • if the reproduction is not completed (step S208: negative), the control unit 23 instructs the metadata acquisition unit 212 to acquire the scene description at the next time.
  • the metadata acquisition unit 212 receives the instruction from the control unit 23 and returns to step S201.
  • if the reproduction is completed (step S208: affirmative), the control unit 23 notifies the reproduction processing unit 21 of the end of the reproduction.
  • the reproduction processing unit 21 ends the reproduction processing of the 6DoF content.
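  • the control flow of steps S201 to S208 can be summarized as a loop. The Python sketch below only mirrors the order of the steps; every callable passed to play is a hypothetical stand-in for the corresponding unit of the client device 2, and the stubs used in the demonstration return placeholder values.

```python
def play(acquire_metadata, acquire_bitstream, decode,
         arrange, render, display, is_finished):
    """Repeat acquisition, decoding, arrangement, rendering, and display
    until the reproduction is judged to be completed."""
    while True:
        metadata = acquire_metadata()            # steps S201 and S202
        bitstream = acquire_bitstream(metadata)  # step S203
        decoded = decode(bitstream)              # step S204
        scene = arrange(decoded, metadata)       # step S205
        image = render(scene, metadata)          # step S206
        display(image)                           # step S207
        if is_finished():                        # step S208
            break


# Demonstration with placeholder stubs (all assumed).
shown = []
remaining = [3]


def _finished():
    remaining[0] -= 1
    return remaining[0] == 0


play(
    acquire_metadata=lambda: {"t": len(shown)},
    acquire_bitstream=lambda metadata: b"bits",
    decode=lambda bitstream: "model",
    arrange=lambda decoded, metadata: (decoded, metadata["t"]),
    render=lambda scene, metadata: f"frame{scene[1]}",
    display=shown.append,
    is_finished=_finished,
)
```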
  • the file generation device generates an EditList having temporal reproduction information in which the recommended reproduction order is represented by the time on the media timeline.
  • in addition, the file generation device generates modification time-corresponding metadata that includes recommended viewing information, that is, information on the recommended viewpoint position and line-of-sight direction for each time on the playback timeline, which is the time axis when playback is performed using the temporal reproduction information.
  • then, the file generation device stores the EditList in the track of the scene description of the ISOBMFF file, further stores the modification time-corresponding metadata in another track, and defines the reference relationship between that track and the track including the EditList.
  • if the content itself were edited to realize the recommended playback, the content would be processed once and could no longer be viewed in normal playback.
  • in contrast, the temporal reproduction information of the media can be stored as a list in the EditList without editing the content itself, so viewing in normal playback also remains possible.
  • the file generation device 1 stores the information corresponding to EditList in the newly defined box and notifies the temporal reproduction information.
  • the metadata generation unit 122 newly defines PlayListBox () and the like indicating a playlist containing information equivalent to EditList without extending the definition of EditListBox ().
  • the metadata generation unit 122 sets the temporal reproduction information to be assigned to the scene description by generating the newly defined PlayListBox () represented by the syntax 201 shown in FIG.
  • FIG. 11 is a diagram showing an example of playlist syntax according to the modified example (1) of the first embodiment.
  • the metadata generation unit 122 realizes normal reproduction, loop reproduction, 1/2 speed reproduction, and pause by setting each parameter for each entry_count as shown in the usage example 202.
  • the metadata generation unit 122 generates metadata corresponding to the correction time indicating the viewpoint position and the line-of-sight direction according to the playback timeline for the scene description played according to the created playlist. Further, the metadata generation unit 122 sets a Track Reference Box (tref) indicating a reference relationship between the track that stores the playlist and the track that stores the metadata corresponding to the modification time.
  • tref: Track Reference Box
  • the metadata generation unit 122 generates an EditList for synchronization of image and sound.
  • the metadata generation unit 122 sets the EditList to be stored in a track of a 3D model for video different from the scene description.
  • the file generation unit 124 generates an ISOBMFF file according to the instructions of the metadata generation unit 122.
  • the file generation unit 124 stores the playlist in the box 204 in the track of the scene description of the ISOBMFF file as shown in FIG.
  • FIG. 12 is a diagram showing an example of storing the playlist according to the modified example (1) of the first embodiment in a file of ISOBMFF.
  • the playlist is written in binary.
  • the file generation unit 124 stores the modification time-corresponding metadata in the box 205 of the track different from the track of the scene description.
  • the file generation unit 124 stores the EditList in the box 206 of the track of the 3D model for video. This EditList controls playback synchronization between the video and audio of the 3D model.
  • FIG. 13 is a diagram for explaining a playback state when the playlist and EditList are used at the same time.
  • the 6DoF content generated by the file generation device 1 according to this embodiment is reproduced as shown in FIG.
  • the playback timing chart 207 shows the playback synchronization of the image and sound by the EditList. That is, the EditList aligns the timings of the image reproduction 272 and the audio reproduction 273 with the scene description 271. When playback according to the playlist is then performed on top of this image and sound synchronization, playback proceeds as in the playback timing chart 208. In this case, the image and sound remain synchronized by the EditList while playback follows the temporal reproduction information shown in the playlist, so that playback such as the reproduction data 218 is performed. For each sample of the reproduction data 218, reproduction such as the image and sound reproduction 282 is performed.
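  • one way to picture the simultaneous use of the playlist and the EditList is as two time mappings applied in sequence. The Python sketch below assumes that the playlist maps the playback time to a time on the scene description's timeline, and that each media track's EditList then maps that time to the track's own media time; this ordering, the function names, and all values are assumptions made for illustration and are not taken from FIG. 13.

```python
def map_time(entries, t):
    """entries: list of (segment_duration, media_time, media_rate)."""
    start = 0.0
    for duration, media_time, rate in entries:
        if start <= t < start + duration:
            return media_time + (t - start) * rate
        start += duration
    return None  # t lies past the last entry


def track_time(playlist, track_edit_list, playback_time):
    """Apply the playlist first, then the track's EditList."""
    scene_time = map_time(playlist, playback_time)
    if scene_time is None:
        return None
    return map_time(track_edit_list, scene_time)


# Illustrative values only (assumed).
playlist = [(4, 0, 1), (4, 0, 1), (2, 4, 0)]  # play, loop, pause
video_edit_list = [(10, 0.5, 1)]  # video media begins 0.5 into its own timeline
audio_edit_list = [(10, 0.0, 1)]  # audio media begins at 0
```

  • under these assumptions the offset between video and audio stays at 0.5 at every playback time, including inside the looped and paused sections, which corresponds to the synchronization being kept while the playlist controls the temporal order.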
  • FIG. 14 is a flowchart of the metadata generation process in the modified example (1) of the first embodiment.
  • the metadata generation unit 122 generates playlist metadata including temporal reproduction information (step S301).
  • the metadata generation unit 122 generates the metadata corresponding to the correction time including the recommended viewing information (step S302).
  • the metadata generation unit 122 generates information that associates the playlist with the metadata corresponding to the correction time (step S303).
  • FIG. 15 is a flowchart of the metadata analysis process in the modified example (1) of the first embodiment.
  • the metadata acquisition unit 212 analyzes the metadata of the 6DoF content to be played back and acquires the information for associating the playlist metadata with the correction time-corresponding metadata (step S311).
  • the metadata acquisition unit 212 analyzes the metadata of the playlist and acquires the temporal reproduction information (step S312).
  • the metadata acquisition unit 212 analyzes the modification time-corresponding metadata associated with the playlist and acquires the recommended viewing area information corresponding to the temporal reproduction information (step S313).
  • the file generation device provides temporal reproduction information using a playlist.
  • the EditList is used for its original purpose and the temporal reproduction information is given by the playlist, so the two are clearly separated in use. That is, the EditList is included for synchronizing the image and sound, and playback in the recommended temporal order according to the playlist is possible in that synchronized state. This makes it possible to display 6DoF content according to the recommended playback method and the recommended viewpoint position and line-of-sight direction.
  • the file generation device 1 associates a plurality of temporal reproduction information and a plurality of recommended viewing information with the same scene description.
  • the file generation device 1 according to the present embodiment is also represented by the block diagram of FIG. In the following description, description of the functions of each part similar to that of the first embodiment will be omitted.
  • the metadata generation unit 122 sets EditList_ID, which is an identifier for identifying each EditList, in each EditList. Then, the metadata generation unit 122 generates an extended EditListBox represented by the syntax 301 shown in FIG. FIG. 16 is a diagram showing an example of the syntax of EditList according to the second embodiment.
  • the metadata generation unit 122 sets the temporal reproduction information for each EditList_ID as shown in the syntax 301.
  • the metadata generation unit 122 may realize this extension so as to have a plurality of types of list information in one EditList as shown by the syntax 302 of FIG.
  • FIG. 17 is a diagram showing an example of EditList having a plurality of types of list information.
  • the metadata generation unit 122 generates the modification time-corresponding metadata indicating the recommended viewing information corresponding to each EditList represented by the syntax 303 shown in FIG.
  • FIG. 18 is a diagram showing an example of the syntax of the modification time-corresponding metadata according to the second embodiment.
  • the metadata generation unit 122 sets the EditList_ID of the EditList corresponding to each modification time-corresponding metadata. Then, the metadata generation unit 122 sets each modification time-corresponding metadata to be stored in a different track. Further, the metadata generation unit 122 sets a reference relationship between the track storing the respective modification time-corresponding metadata and the track storing the EditList.
  • the file generation unit 124 generates an ISOBMFF file according to the instructions of the metadata generation unit 122.
  • the file generation unit 124 stores a plurality of EditLists in the track of the scene description in the ISOBMFF file as shown in FIG.
  • FIG. 19 is a diagram showing an example of storing the EditList according to the second embodiment in a file of ISOBMFF.
  • the file generation unit 124 can associate a plurality of EditLists with one track, that is, one scene description.
  • the file generation device associates a plurality of temporal playback information and a plurality of recommended viewing information with one scene description. This makes it possible to have a plurality of variations in the recommended reproduction method. That is, when playing back a certain scene description, it is possible to provide 6DoF content in which one of a plurality of recommended temporal playback methods is selected.
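  • on the playback side, choosing one of several recommended playback methods amounts to picking an EditList by its EditList_ID and then finding the modification time-corresponding metadata that carries the same identifier. The Python sketch below assumes simple dictionaries for both; apart from EditList_ID, the field names, the function select_playback, and the values are hypothetical.

```python
def select_playback(edit_lists, metadata_tracks, edit_list_id):
    """Return the EditList with the given ID and the metadata track
    whose EditList_ID matches it."""
    edit_list = edit_lists[edit_list_id]
    matches = [t for t in metadata_tracks if t["EditList_ID"] == edit_list_id]
    if not matches:
        raise KeyError(f"no metadata track for EditList_ID {edit_list_id}")
    return edit_list, matches[0]


# Illustrative values only (assumed).
edit_lists = {
    1: [(3, 0, 1), (3, 0, 1)],    # variation with a loop
    2: [(6, 0, 0.5), (2, 3, 0)],  # variation with slow playback and a pause
}
metadata_tracks = [
    {"track_id": 2, "EditList_ID": 1, "samples": ["view-a0", "view-a1"]},
    {"track_id": 3, "EditList_ID": 2, "samples": ["view-b0", "view-b1"]},
]

chosen_edit_list, chosen_metadata = select_playback(edit_lists, metadata_tracks, 2)
```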
  • FIG. 20 is a diagram showing an example of storing the EditList in the file according to the modified example (1) of the second embodiment.
  • if the EditList is stored in a track that contains media such as the scene description, the media track that stores the bit stream information must be processed whenever the EditList is changed or added. Therefore, the file generation device 1 according to the present modification stores the EditList in a place other than the track containing the media, that is, outside the tracks that include a bit stream.
  • the metadata generation unit 122 sets, for example, to store the EditListBox in the idat box in the meta box of the ISOBMFF file.
  • here, elst is an identifier indicating the EditListBox.
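  • to make the idea of placing an EditListBox inside an idat box concrete, the Python sketch below serializes a box with the usual ISOBMFF header (a 32-bit size followed by a four-character type) and nests an elst box in an idat box. It assumes the standard version 0 EditListBox layout from ISO/IEC 14496-12 (entry_count, then segment_duration, media_time, media_rate_integer, and media_rate_fraction per entry); the extended syntax with EditList_ID from the second embodiment is not reproduced, and the helper names box and edit_list_box and the entry values are hypothetical.

```python
import struct


def box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize a basic box: 32-bit size, 4-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def edit_list_box(entries) -> bytes:
    """Serialize a version 0 'elst' box.

    entries: list of (segment_duration, media_time, media_rate_integer).
    """
    # First 4 bytes: version (0) and flags (0); then the entry count.
    payload = struct.pack(">II", 0, len(entries))
    for segment_duration, media_time, media_rate_integer in entries:
        # media_rate_fraction is written as 0.
        payload += struct.pack(">Iihh", segment_duration, media_time,
                               media_rate_integer, 0)
    return box(b"elst", payload)


# Illustrative values only (assumed): normal playback, then a pause.
elst = edit_list_box([(3000, 0, 1), (2000, 3000, 0)])
idat = box(b"idat", elst)
```

  • because the idat box only wraps bytes, an EditList stored this way can be replaced or extended without rewriting the track that holds the media.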
  • the metadata generation unit 122 generates the modification time-corresponding metadata indicating the recommended viewing information corresponding to each EditList.
  • the metadata generation unit 122 sets the EditList_ID of the EditList corresponding to each modification time-corresponding metadata. Then, the metadata generation unit 122 sets each modification time-corresponding metadata to be stored in a different track. Further, the metadata generation unit 122 sets a reference relationship between the track storing the respective correction time-corresponding metadata and the track of the scene description.
  • even when the EditList is stored in a location other than the track containing the media, the recommended playback method and viewing method can be provided.
  • in addition, the EditList alone can be changed or added without processing the track itself that manages media such as the scene description.
  • the case where the EditList is used has been described here, but a newly defined playlist can likewise be stored in a track different from the track of the scene description.
  • FIG. 21 is a diagram showing an example of storing the EditList in the file according to the modified example (2) of the second embodiment.
  • The metadata generation unit 122 sets, as the storage destination of each EditList, the track of the modification time-corresponding metadata that includes the recommended viewing information associated with that EditList. In this case, since the modification time-corresponding metadata indicating the recommended viewing information and the EditList indicating the temporal reproduction information are directly linked, the metadata generation unit 122 does not need to set identification information for the EditList. Further, the metadata generation unit 122 sets a reference relationship between each track storing modification time-corresponding metadata and the track of the scene description.
  • The file generation unit 124 stores the modification time-corresponding metadata corresponding to each EditList in the ISOBMFF file according to the instruction of the metadata generation unit 122. Next, as shown in FIG. 20, the file generation unit 124 stores each associated EditList in the box of the corresponding modification time-corresponding metadata in the ISOBMFF file.
  • Even when the EditList is stored in the track of the modification time-corresponding metadata that stores the recommended viewing information, the recommended playback method and viewing method can be provided.
  • In addition, the EditList alone can be changed or added without processing the track itself that manages media such as the scene description.
  • FIG. 22 is a diagram showing an example of storing the playlist according to the modified example (1-3) of the second embodiment in an ISOBMFF file.
  • The file generation device 1 according to the present embodiment associates multiple pieces of temporal reproduction information with the same scene description by using playlists that include information equivalent to the EditList.
  • The metadata generation unit 122 defines a new Playlist Box that stores a playlist containing information equivalent to the EditList. For example, the metadata generation unit 122 generates different playlists with the syntax 201 shown in FIG. The metadata generation unit 122 stores different temporal reproduction information in each playlist. Then, the metadata generation unit 122 sets the tracks of the different pieces of modification time-corresponding metadata as the storage destinations of the generated playlists. Further, the metadata generation unit 122 sets a reference relationship between each track storing modification time-corresponding metadata and the track of the scene description.
  • FIG. 23 is a diagram showing an example of storage in an ISOBMFF file when the EditList and the playlist are used at the same time in the modified example (1-3) of the second embodiment.
  • The metadata generation unit 122 generates an EditList for synchronizing video and audio. Then, the metadata generation unit 122 arranges for the EditList to be stored in the track of the video 3D model, which is separate from the scene description.
  • The file generation unit 124 stores the EditList in the box 206 of the video 3D model track in the ISOBMFF file according to the instruction of the metadata generation unit 122.
  • This EditList controls playback synchronization between the video and audio of the 3D model.
  • In this way, the EditList is used for its original purpose while the temporal reproduction information is applied through the playlist, so the two have clearly separate roles. That is, the EditList is included for synchronizing video and audio, and playback along the recommended timeline according to the playlist is possible in that synchronized state. This makes it possible to display 6DoF content according to the recommended playback method and the recommended viewpoint position and line-of-sight direction.
  • In this case as well, the recommended playback method and viewing method can be provided.
  • In addition, the playlist alone can be changed or added without processing the track itself that manages media such as the scene description.
  • The file generation device 1 describes a playlist including information equivalent to the EditList in SMIL (Synchronized Multimedia Integration Language).
  • SMIL is an XML-based multimedia description language for controlling media playback timing and laying out presentations on the screen.
  • The metadata generation unit 122 generates a playlist represented by the syntax 311 shown in FIG. 24. Then, the metadata generation unit 122 describes the SMIL data represented by the syntax 312 in the PlayListBox.
  • FIG. 24 is a diagram showing an example of a playlist according to the modified example (1-4) of the second embodiment.
  • In the original usage of SMIL, the src attribute of the video element specifies the name of the file to be played, the URL of the file, or the like.
  • Here, however, the control target of the playlist is a track, and the track to be controlled is associated through the track reference. Therefore, the metadata generation unit 122 sets the src attribute of the video element as shown in the syntax 312: the media to be controlled is indicated by specifying tref or the like, which means that the track reference is to be followed.
  • The metadata generation unit 122 can also indicate the media to be controlled by designating trackID, the identification information of the track to be controlled, as the src attribute of the video element.
  • As attributes for time control, the metadata generation unit 122 uses begin, which represents the media time on the media timeline; dur, which represents the playback duration; and speed, which represents the playback speed.
  • In this way, the metadata generation unit 122 can give the playlist described in SMIL the same information as the playlist described in binary data.
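As a rough illustration of such a SMIL playlist, the following sketch assembles one with Python's standard `xml.etree.ElementTree`. The element layout (`smil`/`body`/`seq`/`video`), the helper name `build_smil_playlist`, and the `trackID=` value format are assumptions made for this example; the actual syntax 311 and 312 are those of FIG. 24 and are not reproduced here. The media reference attribute is written `src`, as in standard SMIL, and `begin` is used for the media time as the description states.

```python
import xml.etree.ElementTree as ET


def build_smil_playlist(entries):
    """Build a SMIL string from (track_id, begin, dur, speed) tuples.

    begin: media time on the media timeline (seconds)
    dur:   playback duration (seconds)
    speed: playback speed (1.0 = normal, 0.5 = half speed)
    """
    smil = ET.Element("smil")
    seq = ET.SubElement(ET.SubElement(smil, "body"), "seq")
    for track_id, begin, dur, speed in entries:
        ET.SubElement(seq, "video", {
            # Hypothetical notation: refer to the controlled track by its ID.
            "src": f"trackID={track_id}",
            "begin": f"{begin}s",
            "dur": f"{dur}s",
            "speed": str(speed),
        })
    return ET.tostring(smil, encoding="unicode")


# Ten seconds at normal speed, then four seconds at half speed.
playlist_xml = build_smil_playlist([
    (1, 0, 10, 1.0),
    (1, 10, 4, 0.5),
])
```

A text-based playlist of this kind carries the same three pieces of timing information as a binary playlist, which is the point made above.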
  • The metadata generation unit 122 stores different temporal reproduction information in each playlist. Then, the metadata generation unit 122 sets the tracks of the different pieces of modification time-corresponding metadata as the storage destinations of the generated playlists. Further, the metadata generation unit 122 sets a reference relationship between each track storing modification time-corresponding metadata and the track of the scene description.
  • FIG. 25 is a diagram showing an example of storing the playlist according to the modified example (1-4) of the second embodiment in the ISOBMFF file.
  • The file generation unit 124 stores the modification time-corresponding metadata corresponding to each playlist in the ISOBMFF file according to the instruction of the metadata generation unit 122.
  • The file generation unit 124 stores each associated playlist described in SMIL in the box of the corresponding modification time-corresponding metadata in the ISOBMFF file.
  • The file generation device 1 stores both the temporal reproduction information and the recommended viewing information in the modification time-corresponding metadata.
  • Hereinafter, modification time-corresponding metadata that stores both the temporal reproduction information and the recommended viewing information is referred to as mixed time-corresponding metadata.
  • The metadata generation unit 122 generates mixed time-corresponding metadata that includes the temporal reproduction information and the recommended viewing information and has the syntax 313 shown in FIG. 26.
  • FIG. 26 is a diagram of an example of the mixed time-corresponding metadata according to the modified example (2-1) of the second embodiment.
  • In the syntax 313, the metadata generation unit 122 stores, as PlayBack_Duration in each sample of the mixed time-corresponding metadata, information indicating how long one sample of the media timeline lasts on the playback timeline.
  • The client device 2 keeps reproducing the scene description sample for that duration, which results in a pause or slow playback.
  • The metadata generation unit 122 arranges for the mixed time-corresponding metadata to be stored in a track different from the track of the scene description. Further, the metadata generation unit 122 sets a reference relationship between the track storing the mixed time-corresponding metadata and the track of the scene description.
  • According to the instruction of the metadata generation unit 122, the file generation unit 124 stores the mixed time-corresponding metadata in a track 314 different from the scene description track in the ISOBMFF file, as shown in FIG. 27.
  • FIG. 27 is a diagram showing an example of storing the mixed time-corresponding metadata in the ISOBMFF file in the modified example (2-1) of the second embodiment.
  • The file generation unit 124 uses a track reference or the like to associate the track 314 of the mixed time-corresponding metadata with the track of the scene description that is to be reproduced according to the track 314.
  • FIG. 28 is a diagram for explaining an outline of how the temporal reproduction information is specified by the mixed time-corresponding metadata in the modified example (2-1) of the second embodiment.
  • The scene description 315 is the reproduction target to which the temporal reproduction information is applied, and has a plurality of samples 316.
  • The CTS and DTS of each sample are shown in its frame.
  • The mixed time-corresponding metadata 317 is metadata that assigns temporal reproduction information to the scene description 315, and has a plurality of samples.
  • The mixed time-corresponding metadata 317 that specifies the temporal reproduction information also has a CTS and a DTS.
  • The number shown in each sample of the mixed time-corresponding metadata 317 is the duration of the corresponding sample 316.
  • Sample 318 of the mixed time-corresponding metadata 317 is applied to the sample 316 of the scene description 315 that has the same CTS. That is, with sample 318, the sample 316 whose CTS in the scene description 315 is C2 is repeated for 120 units. In other words, playback is paused.
  • Each sample included in the group 319 of the mixed time-corresponding metadata is likewise applied to the sample of the scene description 315 that has the same CTS. Since each of these samples is played back for 2 units, slow playback at half speed is performed.
  • In addition, the recommended viewing information specified in the sample of the mixed time-corresponding metadata that corresponds to the reproduction of each sample is applied.
  • The file generation device provides the temporal reproduction information by means of mixed time-corresponding metadata that stores both the temporal reproduction information and the recommended viewing information. Even in that case, the recommended playback method and viewing method can be provided.
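The effect of PlayBack_Duration can be sketched as follows. This is a minimal illustration, assuming that one unit is the normal presentation time of a sample; the class `MixedSample`, the function `build_playback_timeline`, and the viewpoint tuples are hypothetical and do not reproduce the syntax 313.

```python
from dataclasses import dataclass


@dataclass
class MixedSample:
    cts: int                # CTS shared with the scene description sample
    playback_duration: int  # PlayBack_Duration, in playback-timeline units
    viewpoint: tuple        # recommended viewpoint position (placeholder)


def build_playback_timeline(samples):
    """Return (start, end, cts, viewpoint) intervals on the playback timeline."""
    timeline, t = [], 0
    for s in samples:
        timeline.append((t, t + s.playback_duration, s.cts, s.viewpoint))
        t += s.playback_duration
    return timeline


samples = [
    MixedSample(cts=1, playback_duration=1, viewpoint=(0, 0, 0)),    # normal
    MixedSample(cts=2, playback_duration=120, viewpoint=(0, 0, 0)),  # pause
    MixedSample(cts=3, playback_duration=2, viewpoint=(1, 0, 0)),    # half speed
    MixedSample(cts=4, playback_duration=2, viewpoint=(1, 0, 0)),    # half speed
]
timeline = build_playback_timeline(samples)
```

Each interval tells the client which scene description sample to keep presenting, and which recommended viewpoint to apply, over that span of the playback timeline.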
  • The metadata generation unit 122 associates one sample of the original scene description with each sample of the mixed time-corresponding metadata, and generates as many samples of the mixed time-corresponding metadata as are needed to cover the entire recommended viewing and playback time.
  • The metadata generation unit 122 generates mixed time-corresponding metadata having the syntax 320 shown in FIG.
  • FIG. 29 is a diagram of an example of the mixed time-corresponding metadata according to the modified example (2-2) of the second embodiment.
  • For each sample of the mixed time-corresponding metadata, the metadata generation unit 122 sets the CTS of the scene description sample to be linked.
  • The metadata generation unit 122 stores, in each sample of the mixed time-corresponding metadata, the corresponding recommended viewing area information including the line-of-sight direction and the viewpoint position.
  • FIG. 30 is a diagram for explaining an outline of how the temporal reproduction information is specified by the mixed time-corresponding metadata in the modified example (2-2) of the second embodiment.
  • The scene description 321 is the reproduction target to which the temporal reproduction information is applied, and has a plurality of samples.
  • The CTS and DTS of each sample are shown in its frame.
  • The mixed time-corresponding metadata 322 is metadata that assigns the temporal reproduction information to the scene description 321, and has a plurality of samples.
  • The mixed time-corresponding metadata 322 that specifies the temporal reproduction information also has a CTS and a DTS.
  • The number shown in each sample of the mixed time-corresponding metadata 322 is the CTS of the corresponding scene description sample, and the metadata sample is applied to the scene description sample with that CTS. That is, the samples included in the group 323 of the mixed time-corresponding metadata 322 repeat the sample whose CTS in the scene description 321 is C2. In other words, playback is paused.
  • Likewise, the samples whose CTS in the scene description 321 is C6 and C7 are each reproduced twice. That is, slow playback at half speed is performed. Further, the samples included in the group 325 of the mixed time-corresponding metadata 322 repeat the samples whose CTS in the scene description 321 is C8 to C10. That is, loop playback is performed. In addition, the recommended viewing information corresponding to each sample of the mixed time-corresponding metadata 322 is applied.
  • The file generation device specifies a sample of the scene description in each sample of the mixed time-corresponding metadata that stores the temporal reproduction information and the recommended viewing information. This makes it possible to provide a recommended playback method and viewing method that include loop playback and rewind playback.
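The per-sample CTS reference can be illustrated with a short sketch. It assumes that each metadata sample carries only the CTS of the scene description sample it points to; the helper names and the concrete reference sequence are made up for this example and are not taken from FIG. 30.

```python
from itertools import groupby


def resolve_scene_samples(metadata_cts_refs, scene_samples):
    """Map each metadata sample's referenced CTS to a scene description sample."""
    by_cts = {s["cts"]: s for s in scene_samples}
    return [by_cts[c] for c in metadata_cts_refs]


def run_lengths(metadata_cts_refs):
    """Collapse consecutive repeats into (cts, repeat_count) pairs."""
    return [(cts, len(list(g))) for cts, g in groupby(metadata_cts_refs)]


scene = [{"cts": c, "data": f"scene@C{c}"} for c in range(1, 11)]

# C2 held for four samples (pause), C6 and C7 shown twice each (half speed),
# and C8..C10 played twice in a row (loop).
refs = [1, 2, 2, 2, 2, 3, 4, 5, 6, 6, 7, 7, 8, 9, 10, 8, 9, 10]
presented = resolve_scene_samples(refs, scene)
```

Because every metadata sample names its target directly, the reference sequence may also run backwards, which is how rewind playback could be expressed in this form.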
  • Here, the modified example (1-3), the modified example (2-1), and the modified example (2-2) of the second embodiment are compared.
  • In the modified example (1-3) of the second embodiment, the reproduction method is signaled collectively, so changes on the production side are easy.
  • On the other hand, in the modified example (1-3) of the second embodiment, the control method for recommended viewing differs between the playlist and the modification time-corresponding metadata, so time management on the client becomes complicated.
  • In the modified example (2-1) of the second embodiment, the conventional concept of time-corresponding metadata can be used as it is.
  • In addition, time management on the client is relatively easy because control of the temporal reproduction method and control of the recommended viewing method are unified.
  • However, since the reproduction method is sent in chronological order, it is difficult to acquire the data to be used in advance or to obtain the information needed for holding data.
  • The file generation device 1 according to the present embodiment divides the temporal reproduction information into temporal chunks that each have the same reproduction method and stores them in separate modification time-corresponding metadata.
  • The file generation device 1 according to this modification arranges for the information for each entry_count to be sent as modification time-corresponding metadata at the time it is used.
  • The metadata generation unit 122 generates modification time-corresponding metadata that stores the temporal reproduction information and has the syntax 326 shown in FIG. 31.
  • Hereinafter, the modification time-corresponding metadata that stores the temporal reproduction information according to this embodiment is referred to as temporal reproduction time-corresponding metadata.
  • FIG. 31 is a diagram of an example of the temporal reproduction time-corresponding metadata in the modified example (2-3) of the second embodiment. In this case, the EditList information for each entry_count is stored in each sample of the temporal reproduction time-corresponding metadata.
  • FIG. 32 is a diagram for explaining an outline of specification by the temporal reproduction time-corresponding metadata in the modified example (6) of the second embodiment.
  • The metadata generation unit 122 sets EditListBox (), indicated by the syntax 332 to 333 and the like, in each sample of the temporal reproduction time-corresponding metadata.
  • In EditListBox (), the metadata generation unit 122 specifies the timing, the playback method, and the playback duration for each unit of reproduction data of the scene description. As a result, the metadata generation unit 122 enables the reproduction shown in the reproduction data 331.
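How a client might interpret a sequence of such entries is sketched below, under stated assumptions. The fields `segment_duration`, `media_time`, and a media rate follow the general shape of the ISOBMFF EditListBox, but collapsing the rate into a single float and allowing fractional rates such as 0.5 are simplifications made for this example; `EditEntry` and `media_time_at` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class EditEntry:
    segment_duration: int  # length of this edit on the playback timeline
    media_time: int        # start position on the media timeline
    media_rate: float      # 1.0 normal, 0.0 dwell (pause), 0.5 slow


def media_time_at(entries, t):
    """Return the media time presented at playback time t, or None past the end."""
    start = 0
    for e in entries:
        if start <= t < start + e.segment_duration:
            return e.media_time + (t - start) * e.media_rate
        start += e.segment_duration
    return None


# One entry per metadata sample, delivered at the time it is needed.
entries = [
    EditEntry(segment_duration=10, media_time=0, media_rate=1.0),  # normal
    EditEntry(segment_duration=5, media_time=10, media_rate=0.0),  # pause
    EditEntry(segment_duration=8, media_time=10, media_rate=0.5),  # half speed
]
```

For instance, `media_time_at(entries, 12)` falls in the second entry, so the media position stays at 10 while the playback clock advances.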
  • The metadata generation unit 122 generates modification time-corresponding metadata for storing the recommended viewing information shown in FIG. 5, as in the first embodiment.
  • Hereinafter, the modification time-corresponding metadata that stores the recommended viewing information according to this embodiment is referred to as recommended viewing time-corresponding metadata.
  • The metadata generation unit 122 sets the track of the temporal reproduction time-corresponding metadata and the track of the modification time-corresponding metadata as separate tracks.
  • According to the instruction of the metadata generation unit 122, the file generation unit 124 stores the temporal reproduction time-corresponding metadata in a track 336 different from the scene description in the ISOBMFF file, as shown in FIG. 33.
  • The file generation unit 124 uses a track reference or the like to associate the track group with the scene description 335 that is to be reproduced according to the track group.
  • The track of the temporal reproduction time-corresponding metadata for each entry and the track of the modification time-corresponding metadata are sent from the Web server 3 and acquired by the client device 2.
  • The client device 2 reproduces the scene description according to the information in the temporal reproduction time-corresponding metadata and the modification time-corresponding metadata. That is, the client device 2 reproduces the reproduction data 331 of FIG. 31 by reproducing the scene description using the temporal reproduction time-corresponding metadata and the modification time-corresponding metadata.
  • The file generation device 1 according to the present embodiment further adds, to the modification time-corresponding metadata associated with the reproduction unit data reproduced using the temporal reproduction information, flick information indicating the viewing experience provided when a flick is performed.
  • Possible viewing behaviors triggered by flicking the screen include looking around from the current position and viewing the 3D model at that position from its surroundings.
  • By allowing the production side to specify the appropriate behavior according to the 6DoF content, the user can be provided with the viewing experience that the production side intends to show.
  • For example, at one time a flick results in looking around from the current viewpoint, and at another time a flick results in viewing, from its surroundings, the 3D model that is visible from that point.
  • The metadata generation unit 122 generates modification time-corresponding metadata having the syntax 338 shown in FIG. 34.
  • FIG. 34 is a diagram of an example of the modification time-corresponding metadata according to the third embodiment.
  • The metadata generation unit 122 adds, for example, information on the behavior triggered by a flick to the modification time-corresponding metadata that stores the recommended viewing information.
  • The information on the behavior triggered by a flick includes information on the center position of the flick and information on the type of behavior.
  • When flic_action_flag is 1, the metadata generation unit 122 sets the flick behavior to viewing the 3D model from its surroundings. When flic_action_flag is 0, the metadata generation unit 122 sets the flick behavior to looking around from the viewpoint position. Further, the metadata generation unit 122 uses flic_center_position to specify either the center point used when looking around by a flick action or the center point of the 3D model used when viewing the 3D model from its surroundings.
  • FIG. 35 is a diagram for explaining the assignment of flick behavior to the modification time-corresponding metadata.
  • The modification time-corresponding metadata 501 has a plurality of samples. The number shown in the lower part of each sample's frame in FIG. 35 represents the value of flic_action_flag. That is, while the time-corresponding metadata of the group 502 is applied, a flick results in viewing from the surroundings. While the time-corresponding metadata of the group 503 is applied, a flick results in looking around.
  • The file generation device stores, in the modification time-corresponding metadata, information on the action to be triggered by a flick operation, and provides that action when a flick operation is performed during playback. As a result, the recommended action can be provided at the time the flick is performed, and an appropriate viewing experience in line with the intention of the provider can be offered.
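The switch on flic_action_flag could be handled on the client roughly as in the following sketch. Only the two field names come from the description; the `FlickInfo` class, the `apply_flick` function, the restriction to a horizontal flick, and the rotation in the x-z plane are assumptions chosen to keep the example small.

```python
import math
from dataclasses import dataclass


@dataclass
class FlickInfo:
    flic_action_flag: int        # 1: view the 3D model from its surroundings
                                 # 0: look around from the viewpoint position
    flic_center_position: tuple  # (x, y, z) center point for the behavior


def apply_flick(info, camera_pos, yaw_deg):
    """Return (new_camera_pos, mode) for a horizontal flick of yaw_deg degrees."""
    if info.flic_action_flag == 1:
        # Orbit: rotate the camera position around the center in the x-z plane.
        cx, _, cz = info.flic_center_position
        dx, dz = camera_pos[0] - cx, camera_pos[2] - cz
        a = math.radians(yaw_deg)
        nx = cx + dx * math.cos(a) - dz * math.sin(a)
        nz = cz + dx * math.sin(a) + dz * math.cos(a)
        return (nx, camera_pos[1], nz), "orbit"
    # Look around: the camera stays where it is; only the view direction turns.
    return camera_pos, "look_around"


orbit_pos, orbit_mode = apply_flick(FlickInfo(1, (0, 0, 0)), (1, 0, 0), 90)
look_pos, look_mode = apply_flick(FlickInfo(0, (0, 0, 0)), (1, 0, 0), 90)
```

Since the flag is carried per metadata sample, the same gesture can map to different behaviors at different points of the playback timeline.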
  • Since the EditList is information that specifies the entire playback method, data can be acquired and saved in advance by referring to the EditList. For example, it is possible to save data for rewind playback and to acquire data in advance for fast-forward playback. That is, by using the EditList, the client device 2 can estimate which data should be retained, at what timing, and for how long. The client device 2 can then reliably provide the 6DoF content by acquiring and holding the data according to that estimate.
  • The file generation device 1 according to the present embodiment stores, together with the EditList, prediction processing information including the time at which the data indicated by the EditList should be acquired, the time for which it should be retained after acquisition, and the sample position, which is the storage position information of the data.
  • The metadata generation unit 122 generates prediction processing information (media_data_get_time, media_data_keep_duration, media_sample_position) for each entry specified in the EditList and stores it in DataKeepListBox () or the like.
  • FIG. 36 is a diagram of an example of the syntax including the prediction processing information according to the fourth embodiment. For example, if the value of media_data_get_flag, which indicates that acquisition is performed in advance, is 1, the metadata generation unit 122 stores the acquisition time as prediction processing information. Further, if media_data_keep_flag, which indicates that the data is retained, is 1, the metadata generation unit 122 stores the retention period as prediction processing information.
  • The metadata generation unit 122 also stores the position information of the sample indicated by the entry of the EditList as prediction processing information.
  • The position information of the sample includes, for example, the sample number and the byte position. In FIG. 35, each piece of setting information in the prediction processing information is enclosed in a frame of a particular line type, and it causes the scene description data enclosed in a frame of the same line type in the reproduction data 604 to be acquired and held.
  • The setting information 602 in the prediction processing information indicates that the data is acquired at the 3-second point on the playback timeline and held for 4 seconds after acquisition.
  • As a result, the data used for the reproduction of the group 605 in the reproduction data 604 is acquired and retained.
  • The setting information 603 in the prediction processing information indicates that the data is held for a further 6 seconds following the period specified in the preceding setting information.
  • As a result, the data used in the group 606 in the reproduction data 604 is acquired and retained.
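The two settings above can be turned into concrete hold windows with a sketch like the following. The three field names are those of the prediction processing information; the `DataKeepEntry` class, the use of `None` for an absent field, the rule that an entry without an acquisition time starts where the previous window ended, and the sample positions 5 and 9 are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataKeepEntry:
    media_sample_position: int                  # which sample to fetch and hold
    media_data_get_time: Optional[float]        # None when media_data_get_flag is 0
    media_data_keep_duration: Optional[float]   # None when media_data_keep_flag is 0


def hold_windows(entries):
    """Return (sample_position, hold_start, hold_end) on the playback timeline."""
    windows, prev_end = [], 0.0
    for e in entries:
        # Without an explicit acquisition time, continue from the previous window.
        start = e.media_data_get_time if e.media_data_get_time is not None else prev_end
        end = start + (e.media_data_keep_duration or 0.0)
        windows.append((e.media_sample_position, start, end))
        prev_end = end
    return windows


entries = [
    # Acquire at the 3-second point and hold for 4 seconds.
    DataKeepEntry(media_sample_position=5, media_data_get_time=3.0,
                  media_data_keep_duration=4.0),
    # Hold for a further 6 seconds after the preceding period.
    DataKeepEntry(media_sample_position=9, media_data_get_time=None,
                  media_data_keep_duration=6.0),
]
windows = hold_windows(entries)
```

With windows computed ahead of time, the client can schedule fetches and buffer eviction without parsing the whole playback description at each step.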
  • The metadata generation unit 122 generates the playlist represented by the syntax 607 shown in FIG. 37.
  • FIG. 37 is a diagram of an example of a playlist according to the fourth embodiment.
  • The metadata generation unit 122 registers the identification information of the sample to be reproduced and the duration of its reproduction in the playlist.
  • The metadata generation unit 122 generates the prediction processing information represented by the syntax 608 shown in FIG. 38.
  • FIG. 38 is a diagram of an example of prediction processing information according to the fourth embodiment.
  • The metadata generation unit 122 registers, in the prediction processing information, information on the data to be acquired in advance and information on how long the data is to be held.
  • The file generation device provides the client device, together with the EditList, with prediction processing information including the time at which the data to be reproduced should be acquired, the time for which it should be retained after acquisition, and the storage position information of the data. As a result, the client device can acquire and save the data in advance with a low load, and can reliably provide the 6DoF content.
  • The file generation device 1 stores the scene description, the temporal reproduction information, and the recommended viewing area information in a file in the Matroska format shown in FIG. 39.
  • FIG. 39 is a diagram showing an example of the Matroska format.
  • The file generation unit 124 stores the common information of the modification time-corresponding metadata in the Track entry element of the trak element 701, and stores the actual samples of the modification time-corresponding metadata in the Block element 602 in the Cluster.
  • The file generation unit 124 stores the meta box including the EditList in the Track entry element of the trak element 701 that stores the time-corresponding metadata.
  • FIG. 40 is a diagram showing a description example of a DASH MPD file in the sixth embodiment. This makes it possible to specify a reference to the scene description, associationID / type / codes indicating the type of data, and the like.
  • The scene description of 6DoF content has been described as an example of the target to which the temporal reproduction information and the recommended viewing area information are applied, but the target may instead be "3DoF+" content.
  • The series of processes described above can be executed by hardware or by software.
  • When the series of processes is executed by software, the programs that make up the software are installed on a computer.
  • Here, the computer includes a computer embedded in dedicated hardware, a general-purpose personal computer capable of executing various functions by installing various programs, and the like.
  • FIG. 41 is a hardware configuration diagram of the file generation device.
  • The file generation device 1 is realized by the computer 900 shown in FIG.
  • In the computer 900, the CPU (Central Processing Unit) 901, the ROM (Read Only Memory) 902, and the RAM (Random Access Memory) 903 are connected to one another via the bus 904.
  • The input/output interface 910 is also connected to the bus 904.
  • An input unit 911, an output unit 912, a storage unit 913, a communication unit 914, and a drive 915 are connected to the input/output interface 910.
  • The input unit 911 includes, for example, a keyboard, a mouse, a microphone, a touch panel, an input terminal, and the like.
  • The output unit 912 includes, for example, a display, a speaker, an output terminal, and the like.
  • The storage unit 913 is composed of, for example, a hard disk, a RAM disk, a non-volatile memory, or the like.
  • The communication unit 914 includes, for example, a network interface.
  • The drive 915 drives a removable medium 921 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
  • The CPU 901 loads the program stored in the storage unit 913 into the RAM 903 via the input/output interface 910 and the bus 904 and executes it, whereby the above-described series of processes is performed.
  • The RAM 903 also stores, as appropriate, data and the like necessary for the CPU 901 to execute various processes.
  • The program executed by the CPU 901 can be provided by being recorded on the removable medium 921 as packaged media or the like, for example.
  • In that case, the program can be installed in the storage unit 913 via the input/output interface 910 by mounting the removable medium 921 in the drive 915.
  • This program can also be provided via wired or wireless transmission media such as local area networks, the Internet, and digital satellite broadcasting. In that case, the program can be received by the communication unit 914 and installed in the storage unit 913.
  • Alternatively, this program can be installed in advance in the ROM 902 or the storage unit 913.
  • (1) An information processing device including a metadata generation unit that generates temporal reproduction information indicating the display order, along the passage of time for reproduction, of each scene of 6DoF content composed of a three-dimensional model in a three-dimensional space; generates modification time-corresponding metadata including recommended viewing information indicating the viewpoint position and the line-of-sight direction corresponding to each time of the passage of time for reproduction; and generates association information indicating that the temporal reproduction information and the modification time-corresponding metadata correspond to each other at each time of the passage of time for reproduction.
  • The information processing device according to (1), wherein the metadata generation unit stores the identification information of the temporal reproduction information corresponding to the recommended viewing information in the modification time-corresponding metadata and generates the association information.
  • The information processing device according to (2), wherein the metadata generation unit generates other temporal reproduction information different from the temporal reproduction information and other modification time-corresponding metadata including other recommended viewing information different from the recommended viewing information, stores the temporal reproduction information and the other temporal reproduction information in a different track, stores the identification information of the temporal reproduction information in the modification time-corresponding metadata, and stores the identification information of the other temporal reproduction information in the other modification time-corresponding metadata, thereby associating them.
  • The information processing device wherein the metadata generation unit generates mixed time-corresponding metadata including the recommended viewing information and the temporal reproduction information by storing the temporal reproduction information at each time of the scene in the modification time-corresponding metadata, and associates the mixed time-corresponding metadata with each scene at each time of the scene.
  • The information processing device according to (1), wherein the metadata generation unit generates mixed time-corresponding metadata including the recommended viewing information and the temporal reproduction information by storing the temporal reproduction information at each time of the passage of time for reproduction in the modification time-corresponding metadata, and associates the data at each time of the passage of time for reproduction in the mixed time-corresponding metadata with each scene.
  • The information processing device according to (1), wherein the metadata generation unit generates temporal reproduction time-corresponding metadata that stores the temporal reproduction information at each time of the passage of time for reproduction, and associates the temporal reproduction time-corresponding metadata with the modification time-corresponding metadata.
  • The information processing device according to any one of (1) to (7), wherein the metadata generation unit generates predetermined operation information that causes a predetermined change in the viewpoint position or the line-of-sight direction when a predetermined operation is received during playback of the 6DoF content, the device further including a file generation unit that generates a file including the data of the 6DoF content, the temporal reproduction information, the modification time-corresponding metadata, the association information, and the predetermined operation information.
  • The information processing device according to any one of (1) to (8), wherein the metadata generation unit generates acquisition control information including the acquisition timing and holding time of the data of each scene based on the temporal reproduction information, the device further including a file generation unit that generates a file including the data of the 6DoF content, the temporal reproduction information, the modification time-corresponding metadata, the association information, and the acquisition control information.
  • an information processing device equipped with a metadata generation unit that defines and generates a playlist including temporal playback information indicating the display order according to the time lapse for reproduction of each scene of 6DoF content composed of a three-dimensional model in a three-dimensional space, generates modified time-corresponding metadata including recommended viewing information indicating the viewpoint position and the line-of-sight direction corresponding to each time of the time lapse for reproduction, and generates association information indicating that the playlist and the modified time-corresponding metadata correspond to each other at each time of the time lapse for reproduction.
  • the information processing device wherein the metadata generation unit generates another playlist including other temporal reproduction information different from the temporal reproduction information, and other modification time-corresponding metadata including other recommended viewing information different from the recommended viewing information, stores the metadata in a track different from the playlist and the other playlist, and makes the association by storing the identification information of the playlist in the modification time-corresponding metadata and the identification information of the other playlist in the other modification time-corresponding metadata.
  • the metadata generation unit stores the playlist in a track that stores the metadata corresponding to the correction time and generates the association information.
  • the information processing apparatus wherein the metadata generation unit stores the temporal reproduction information at each time of each scene in the correction time-corresponding metadata, thereby generating mixed time-corresponding metadata that includes the recommended viewing information and the playlist information, and associates the mixed time-corresponding metadata with each scene at each time of the scene.
  • the information processing apparatus according to (10), wherein the metadata generation unit stores the temporal reproduction information at each time of the time lapse for reproduction in the correction time-corresponding metadata, thereby generating mixed time-corresponding metadata that includes the recommended viewing information and the playlist information, and links the data of each scene at each time to the time lapse for reproduction of the mixed time-corresponding metadata.
  • the information processing apparatus according to (10), wherein the metadata generation unit generates temporal reproduction time-corresponding metadata that stores the playlist information by holding the temporal reproduction information at each time of the time lapse for reproduction, and associates the temporal reproduction time-corresponding metadata with the correction time-corresponding metadata.
  • the information processing apparatus according to any one of (10) to (16), wherein the metadata generation unit describes the playlist using Synchronized Multimedia Integration Language (SMIL).
  • the information processing device according to any one of (10) to (17), wherein the metadata generation unit generates predetermined operation information that makes a predetermined change in the viewpoint position or the line-of-sight direction when a predetermined operation is received during playback of the 6DoF content, the device further including a file generation unit that generates a file including the data of the 6DoF content, the temporal reproduction information, the modification time-corresponding metadata, the association information, and the predetermined operation information.
  • the information processing device according to any one of (10) to (18), wherein the metadata generation unit generates, based on the temporal reproduction information, acquisition control information including the acquisition timing and holding time of the data of each scene, the device further including a file generation unit that generates a file including the data of the 6DoF content, the temporal reproduction information, the modification time-corresponding metadata, the association information, and the acquisition control information.
  • (20) An information processing device including: a metadata generation unit that stores, in the MPD, temporal playback information indicating the display order according to the time lapse for playback of each scene of 6DoF content composed of a three-dimensional model in a three-dimensional space, and recommended viewing information indicating changes in the viewpoint position and the line-of-sight direction at each time of the time lapse for playback; and a file generation unit that generates a file containing the data of the 6DoF content and the MPD that stores the temporal reproduction information and the recommended viewing information.
  • (21) An information processing method that causes a computer to perform processing of: generating temporal reproduction information indicating the display order according to the time lapse for reproduction of each scene of 6DoF content composed of a three-dimensional model in a three-dimensional space, and correction time-corresponding metadata that includes recommended viewing information indicating the viewpoint position and line-of-sight direction corresponding to each time of the time lapse for reproduction; and generating association information indicating that the temporal reproduction information and the correction time-corresponding metadata correspond to each other at each time of the time lapse for reproduction.
  • (22) An information processing method for causing a computer to execute a process of: defining and generating a playlist including temporal playback information indicating the display order according to the passage of time for playback of each scene of 6DoF content composed of a three-dimensional model in a three-dimensional space; generating correspondence information that links the playlist with the modification time-corresponding metadata; and generating a file including the 6DoF content data, the playlist, the modification time-corresponding metadata, and the correspondence information.
  • the first temporal reproduction information indicating the display order according to the time lapse for reproduction of each scene of the 6DoF content and the recommended viewing information indicating the viewpoint position and the line-of-sight direction corresponding to each time of the time lapse for reproduction are provided.
  • Display information generator to be generated and A reproduction processing device including a display unit that displays the display image generated by the display information generation unit according to the passage of time for reproduction.
  • a reproduction processing device including: a media data acquisition unit that acquires data of 6DoF content composed of a 3D model in a 3D space; a metadata acquisition unit that acquires a file including a playlist including temporal playback information indicating the display order according to the time lapse for playback of each scene of the 6DoF content, correction time-corresponding metadata including recommended viewing information indicating the viewpoint position and line-of-sight direction at each time of the time lapse for playback, and association information indicating that the playlist and the correction time-corresponding metadata correspond to each other at each time of the time lapse for reproduction; a display information generation unit that generates a display image for displaying the 6DoF content; and a display unit that displays the display image generated by the display information generation unit according to the passage of time for reproduction.
  • (25) Acquire the data of 6DoF content composed of a 3D model in a 3D space,
  • the first temporal reproduction information indicating the display order according to the time lapse for reproduction of each scene of the 6DoF content and the recommended viewing information indicating the viewpoint position and the line-of-sight direction corresponding to each time of the time lapse for reproduction are provided.
  • a playlist including temporal playback information indicating the display order according to the time lapse for playback of each scene of the 6DoF content, and recommended viewing information indicating the viewpoint position and line-of-sight direction at each time of the time lapse for playback.
  • the 6DoF Generate an image for displaying the content
  • a reproduction processing device that causes a computer to execute a process of displaying the display image generated by the display information generation unit according to the passage of time for reproduction.
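
As an informal illustration of the association described in the claims above, the following Python sketch shows one way a player might use a shared reproduction timeline to look up both the scene to display (from the playlist) and the recommended viewpoint (from the time-corresponding metadata). It is only a sketch under stated assumptions: the names PlaylistEntry, ViewingSample, and resolve are hypothetical, the in-memory lists stand in for whatever file tracks or MPD elements actually carry the data, and nothing here reproduces the file format defined in this publication.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class PlaylistEntry:
    """Hypothetical playlist row: which scene is shown from a given reproduction time."""
    start: float                # reproduction time (s) at which this entry begins
    scene_id: str               # scene whose data is displayed
    scene_offset: float = 0.0   # time inside the scene that is shown at `start`


@dataclass(frozen=True)
class ViewingSample:
    """Hypothetical metadata sample: recommended viewpoint at a reproduction time."""
    time: float                 # reproduction time (s)
    position: Vec3              # recommended viewpoint position
    direction: Vec3             # recommended line-of-sight direction


def _latest_at(times: List[float], t: float) -> Optional[int]:
    """Index of the last element whose time is <= t, or None if t precedes all."""
    i = bisect_right(times, t) - 1
    return i if i >= 0 else None


def resolve(playlist: List[PlaylistEntry], samples: List[ViewingSample], t: float):
    """Return (scene_id, time within scene, viewing sample) for reproduction time t.

    Both lists are assumed sorted by time; the shared timeline is what
    associates the playlist with the viewing metadata in this sketch.
    """
    pi = _latest_at([e.start for e in playlist], t)
    si = _latest_at([s.time for s in samples], t)
    if pi is None or si is None:
        raise ValueError("reproduction time precedes the first entry")
    entry = playlist[pi]
    return entry.scene_id, entry.scene_offset + (t - entry.start), samples[si]


# Example data: the display order differs from the scenes' own timelines.
playlist = [
    PlaylistEntry(0.0, "scene_A"),
    PlaylistEntry(10.0, "scene_B"),
    PlaylistEntry(15.0, "scene_A", scene_offset=10.0),
]
samples = [
    ViewingSample(0.0, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    ViewingSample(12.0, (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
]
```

Calling resolve(playlist, samples, 12.5) selects the second playlist entry and the second viewing sample, because both are the latest entries at or before 12.5 s on the common reproduction timeline.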

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)
PCT/JP2020/035747 2019-10-01 2020-09-23 Information processing device and information processing method Ceased WO2021065605A1 (ja)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US17/642,453 US20220303641A1 (en) 2019-10-01 2020-09-23 Information processing device and information processing method
JP2021550644A JPWO2021065605A1 2019-10-01 2020-09-23
EP20870699.4A EP4016994A1 (en) 2019-10-01 2020-09-23 Information processing device and information processing method
CN202080057094.XA CN114223211A (zh) 2019-10-01 2020-09-23 Information processing device and information processing method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-181750 2019-10-01
JP2019181750 2019-10-01

Publications (1)

Publication Number Publication Date
WO2021065605A1 (ja) 2021-04-08

Family

ID=75336444

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/035747 Ceased WO2021065605A1 (ja) 2019-10-01 2020-09-23 Information processing device and information processing method

Country Status (5)

Country Link
US (1) US20220303641A1
EP (1) EP4016994A1
JP (1) JPWO2021065605A1
CN (1) CN114223211A
WO (1) WO2021065605A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113592237A (zh) * 2021-07-01 2021-11-02 China United Network Communications Group Co., Ltd. Teaching quality evaluation method and electronic device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7592465B2 (ja) * 2020-11-11 2024-12-02 Canon Inc. Sound processing device, sound processing method, and program
US12293152B1 (en) * 2024-09-11 2025-05-06 The Florida International University Board Of Trustees Systems and methods for performing temporal analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
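JP2019062390A (ja) * 2017-09-26 2019-04-18 Canon Inc. Information processing device, information providing device, control method, and program
WO2019155930A1 (ja) * 2018-02-07 2019-08-15 Sony Corporation Transmission device, transmission method, processing device, and processing method
JP2019152972A (ja) * 2018-03-01 2019-09-12 Canon Inc. Distribution device, information processing method, and program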

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2628297B1 (en) * 2010-10-15 2017-07-12 Thomson Licensing Method for synchronizing multimedia flows and corresponding device
US9071853B2 (en) * 2012-08-31 2015-06-30 Google Technology Holdings LLC Broadcast content to HTTP client conversion
WO2014145137A2 (en) * 2013-03-15 2014-09-18 General Instrument Corporation File transfer based upon streaming format
EP3741108A4 (en) * 2018-01-17 2021-10-13 Nokia Technologies Oy APPARATUS, PROCESS AND COMPUTER PROGRAM FOR OMNIDIRECTIONAL VIDEO
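JP2019125303A (ja) * 2018-01-19 2019-07-25 Canon Inc. Information processing device, information processing method, and program
WO2019194434A1 (ko) * 2018-04-05 2019-10-10 LG Electronics Inc. Method and device for transmitting and receiving metadata for a plurality of viewpoints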
US10356387B1 (en) * 2018-07-26 2019-07-16 Telefonaktiebolaget Lm Ericsson (Publ) Bookmarking system and method in 360° immersive video based on gaze vector information
EP3712854B1 (en) * 2019-03-19 2024-07-24 Nokia Technologies Oy Method and apparatus for storage and signaling of static point cloud data
WO2021064293A1 (en) * 2019-10-02 2021-04-08 Nokia Technologies Oy Method and apparatus for storage and signaling of sub-sample entry descriptions

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"ISO/IEC 14496-11", 1 November 2015
"ISO/IEC 14496-12", 15 December 2015
"ISO/IEC 23001-10", 1 September 2015
"ISO/IEC 23009-1", August 2019

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
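CN113592237A (zh) * 2021-07-01 2021-11-02 China United Network Communications Group Co., Ltd. Teaching quality evaluation method and electronic device
CN113592237B (zh) * 2021-07-01 2023-06-09 China United Network Communications Group Co., Ltd. Teaching quality evaluation method and electronic device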

Also Published As

Publication number Publication date
EP4016994A1 (en) 2022-06-22
JPWO2021065605A1 2021-04-08
CN114223211A (zh) 2022-03-22
US20220303641A1 (en) 2022-09-22

Similar Documents

Publication Publication Date Title
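JP7801052B2 (ja) Method and apparatus for efficient delivery and use of audio messages for a high-quality experience
CN111316652A (zh) Personalized content streams using aligned encoded content segments
CN105765990A (zh) Video broadcast system and method for distributing video content
US20250380037A1 (en) Information processing apparatus, information processing method, reproduction processing apparatus, and reproduction processing method
US11205456B2 (en) Methods and apparatus for using edit operations to perform temporal track derivations
WO2021065605A1 (ja) Information processing device and information processing method
US11967153B2 (en) Information processing apparatus, reproduction processing apparatus, and information processing method
JP7287454B2 (ja) Information processing device, reproduction processing device, information processing method, and reproduction processing method
KR20200101349A (ko) Information processing device, information processing method, and program
JP2016072858A (ja) Media data generation method, media data reproduction method, media data generation device, media data reproduction device, computer-readable recording medium, and program
JP7647554B2 (ja) File generation device, file generation method, reproduction processing device, and reproduction processing method
CN112188256A (zh) Information processing method, information providing method, apparatus, electronic device, and storage medium
US11974028B2 (en) Information processing device, information processing method, reproduction processing device, and reproduction processing method
KR102659489B1 (ko) Information processing device, information processing device, and program
HK40106873A (zh) Method and apparatus for efficient delivery and use of audio messages for high quality of experience
VRT et al. First Version of Playout Clients
HK40109061A (zh) Method and apparatus for efficient delivery and use of audio messages for high quality of experience
HK40107188A (zh) Method and apparatus for efficient delivery and use of audio messages for high quality of experience
WO2021140956A1 (ja) Information processing device and method
HK40106874A (zh) Method and apparatus for efficient delivery and use of audio messages for high quality of experience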

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20870699

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021550644

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2020870699

Country of ref document: EP

Effective date: 20220318

NENP Non-entry into the national phase

Ref country code: DE