WO2022116822A1 - Data processing method and apparatus for immersive media, and computer-readable storage medium - Google Patents

Data processing method and apparatus for immersive media, and computer-readable storage medium

Info

Publication number
WO2022116822A1
WO2022116822A1 · PCT/CN2021/131108 · CN2021131108W
Authority
WO
WIPO (PCT)
Prior art keywords
viewpoint
recommended
window
target
immersive media
Prior art date
Application number
PCT/CN2021/131108
Other languages
English (en)
French (fr)
Inventor
胡颖 (Hu Ying)
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Company Limited (腾讯科技(深圳)有限公司)
Priority to EP21899856.5A (published as EP4258222A4)
Publication of WO2022116822A1
Priority to US17/961,003 (published as US20230025664A1)


Classifications

    • H04N23/698: Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • G06T19/006: Mixed reality
    • H04N13/261: Image signal generators with monoscopic-to-stereoscopic image conversion
    • H04N13/363: Image reproducers using image projection screens
    • H04N21/21805: Source of audio or video content enabling multiple viewpoints, e.g. using a plurality of cameras
    • H04N21/234345: Reformatting of video elementary streams performed only on part of the stream, e.g. a region of the image or a time segment
    • H04N21/26258: Distribution scheduling for generating a list of items to be played back in a given order, e.g. playlist
    • H04N21/44016: Processing of video elementary streams involving splicing one content stream with another, e.g. for substituting a video clip
    • H04N21/4728: End-user interface for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • H04N21/816: Monomedia components involving special video data, e.g. 3D video
    • H04N21/84: Generation or processing of descriptive data, e.g. content descriptors
    • H04N21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N21/8543: Content authoring using a description language, e.g. MHEG or XML

Definitions

  • the present application relates to the field of computer technology, and more particularly, to data processing for immersive media.
  • the content of immersive media is usually divided into multiple samples (for example, image frames, or video frames), and these samples are encapsulated in track groups according to their association.
  • the so-called association means that all the samples in a track group correspond to the same viewpoint, or the samples in one track group correspond to one viewpoint, or the samples in one track group correspond to different viewpoints but can only be presented in a certain time sequence.
  • the present application provides a data processing method, device and computer-readable storage medium for immersive media, which can enrich the presentation form of immersive media.
  • a data processing method for immersive media is provided, the method is performed by a media playback device, and the method includes:
  • acquiring the recommended window data box of the immersive media file, where the recommended window data box is used to define the switching information of the viewpoint of the immersive media file and the switching information of the recommended window;
  • a data processing method for immersive media is provided, the method is executed by a media production device, and the method includes:
  • encapsulating a recommended window data box according to the viewpoint information and the window information when the immersive media file is presented, wherein the recommended window data box is used to define the switching information of the viewpoint and the switching information of the recommended window of the immersive media file.
  • a data processing apparatus for immersive media is provided, the apparatus is deployed on a media playback device, and the apparatus includes:
  • an acquisition unit used for acquiring the recommended window data box of the immersive media file, and the recommended window data box is used to define the switching information of the viewpoint of the immersive media file and the switching information of the recommended window;
  • a determining unit configured to determine the target viewpoint and the target recommended window according to the switching information of the current viewpoint and the switching information of the current recommended window;
  • a presentation unit configured to switch to the target viewpoint and the target recommended window to present the immersive media file.
  • a data processing apparatus for immersive media is provided, the apparatus is deployed on a media production device, and the apparatus includes:
  • an acquisition unit for acquiring viewpoint information and window information when the immersive media file is presented
  • an encapsulation unit configured to encapsulate a recommended window data box according to the viewpoint information and the window information when the immersive media file is presented, wherein the recommended window data box is used to define the switching information of the viewpoint of the immersive media file and the switching information of the recommended window.
  • a computer device including a processor and a memory.
  • the memory is used for storing a computer program
  • the processor is used for calling and running the computer program stored in the memory to execute the method in the above-mentioned first aspect or each implementation manner thereof.
  • a computer device including a processor and a memory.
  • the memory is used to store a computer program
  • the processor is used to call and run the computer program stored in the memory to execute the method in the second aspect or each of its implementations.
  • a computer-readable storage medium for storing a computer program, the computer program causing a computer to execute the method in any one of the above-mentioned first aspect to the second aspect or each of its implementations.
  • a computer program product comprising computer program instructions, the computer program instructions causing a computer to execute the method in any one of the above-mentioned first to second aspects or the implementations thereof.
  • a computer program which, when run on a computer, causes the computer to perform the method of any one of the above-mentioned first to second aspects or the respective implementations thereof.
  • the media production device can encapsulate the recommended window data box of the immersive media according to the viewpoint information and the window information of the immersive media. Further, the media playback device can determine the target viewpoint and the target recommended window according to the switching information of the current viewpoint and the switching information of the current recommended window defined in the recommended window data box, and then switch to the target viewpoint and the target recommended window to present the immersive media file.
  • This solution can flexibly determine the switching manner according to the switching information of the viewpoints and of the recommended windows defined in the recommended window data box, so as to switch between viewpoints and recommended windows. This diversifies the presentation forms of the media playback device, enriches the application forms of immersive media, and improves user experience.
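The summary above describes a three-step playback-side flow: acquire the recommended window data box, determine the switch target, then switch and present. A minimal sketch of that flow follows; the class, dictionary shapes, and method names here are illustrative stand-ins, not structures defined by the specification:

```python
# Illustrative sketch of the playback-side flow: acquire the recommended
# window data box, determine the target viewpoint/window, switch, present.
# All container shapes here are hypothetical.

class Player:
    def __init__(self, current_viewpoint, current_window):
        self.viewpoint = current_viewpoint
        self.window = current_window
        self.log = []  # record of (viewpoint, window) presentations

    def acquire_box(self, media_file):
        # Step 1: acquire the recommended window data box of the file.
        return media_file["rcvp_info"]

    def determine_target(self, box):
        # Step 2: look up the switching information keyed by the current
        # viewpoint and current recommended window.
        switching = box[(self.viewpoint, self.window)]
        return switching["target_viewpoint"], switching["target_window"]

    def present(self, media_file):
        box = self.acquire_box(media_file)
        target_vp, target_win = self.determine_target(box)
        # Step 3: switch to the target viewpoint and window, then present.
        self.viewpoint, self.window = target_vp, target_win
        self.log.append((target_vp, target_win))

media_file = {"rcvp_info": {(1, 10): {"target_viewpoint": 2, "target_window": 20}}}
player = Player(current_viewpoint=1, current_window=10)
player.present(media_file)
print(player.viewpoint, player.window)  # → 2 20
```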
  • FIG. 1 is an architectural diagram of an immersive media system provided by an exemplary embodiment of the present application
  • FIG. 2 is a flowchart of an immersive media transmission solution provided by an exemplary embodiment of the present application
  • FIG. 3 is a schematic flowchart of a data processing method for immersive media provided according to an embodiment of the present application
  • FIG. 4 is a schematic flowchart of another data processing method for immersive media provided according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a data processing apparatus for immersive media provided by an exemplary embodiment of the present application
  • FIG. 6 is a schematic structural diagram of another data processing apparatus for immersive media provided by an exemplary embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a media production device provided by an exemplary embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a media playback device provided by an exemplary embodiment of the present application.
  • immersive media refers to media files that provide immersive media content, enabling users immersed in that content to obtain visual, auditory, and other sensory experiences as if they were in the real world.
  • immersive media can be 3DoF (three Degrees of Freedom) immersive media, 3DoF+ immersive media, or 6DoF immersive media.
  • Immersive media content includes video content represented in a three-dimensional (3-Dimension, 3D) space in various forms, for example, three-dimensional video content represented in a spherical form.
  • Immersive media content can be VR (Virtual Reality) video content, panoramic video content, spherical video content, or 360-degree video content; therefore, immersive media can also be called VR video, panoramic video, spherical video, or 360-degree video. Additionally, immersive media content also includes audio content that is synchronized with the video content represented in three-dimensional space.
  • FIG. 1 shows an architecture diagram of an immersive media system provided by an exemplary embodiment of the present application; as shown in FIG. 1 , the immersive media system includes a media production device and a media playback device (or a media consumption device),
  • the media production device may refer to a computer device used by a provider of immersive media (for example, a media producer of immersive media); the computer device may be a terminal (such as a PC (Personal Computer) or a smart mobile device such as a smartphone) or a server.
  • a media playback device may refer to a computer device used by a consumer of immersive media; the computer device may be a terminal, such as a PC (Personal Computer), a smart mobile device (such as a smartphone), or a VR device (such as a VR headset or VR glasses).
  • the data processing process of immersive media includes a data processing process on the media production device side and a data processing process on the media playback device side.
  • the data processing process on the media production device side mainly includes: (1) the acquisition and production process of the media content of the immersive media; (2) the process of encoding and file encapsulation of the immersive media.
  • the data processing process on the media playback device side mainly includes: (3) the process of decapsulating and decoding the immersive media files; (4) the rendering process of the immersive media.
  • the transmission process of immersive media between the media production device and the media playback device may be performed based on various transmission protocols.
  • the transmission protocols here may include, but are not limited to: the DASH (Dynamic Adaptive Streaming over HTTP) protocol, the HLS (HTTP Live Streaming) protocol, SMTP (Smart Media Transport Protocol), TCP (Transmission Control Protocol), and so on.
  • FIG. 2 shows a flowchart of an immersive media transmission solution provided by an exemplary embodiment of the present application.
  • As shown in FIG. 2, to address the transmission bandwidth load caused by the sheer volume of immersive media data, immersive media processing usually divides the original video spatially into multiple sub-block (tile) videos (for example, sub-block video 1, sub-block video 2, ..., sub-block video N), which are encoded and encapsulated into files separately and then transmitted to the client for consumption.
  • the capture device may refer to a hardware component provided in the media production device, for example, the capture device refers to a microphone, a camera, a sensor, and the like of a terminal.
  • the capture device may also be a hardware device connected to the media production device, such as a camera connected to the server, for providing the media production device with a media content acquisition service of immersive media.
  • the capture devices may include, but are not limited to, audio devices, camera devices, and sensing devices.
  • the audio device may include an audio sensor, a microphone, and the like.
  • the camera device may include a normal camera, a stereo camera, a light field camera, and the like.
  • Sensing devices may include laser devices, radar devices, and the like.
  • the number of capture devices can be multiple, and these capture devices are deployed in some specific locations in the real space to capture audio content and video content from different angles in the space at the same time.
  • the captured audio content and video content are kept synchronized both temporally and spatially. Due to the different acquisition methods, the compression encoding methods for the media content of different immersive media may also differ.
  • the captured audio content is itself suitable for performing audio encoding of immersive media.
  • the captured video content undergoes a series of production processes before it becomes suitable for video encoding for immersive media.
  • the production process includes:
  • splicing refers to stitching the video content (images) captured at various angles into a complete video that reflects a 360-degree visual panorama of the real space; the spliced video is a panoramic video (or spherical video) represented in 3D space.
  • Projection refers to the process of mapping a three-dimensional video formed by splicing to a two-dimensional (2-Dimension, 2D) image.
  • the 2D image formed by projection is called a projection image; the projection methods may include, but are not limited to: latitude-longitude (equirectangular) projection and regular hexahedron (cube map) projection.
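As background on the latitude-longitude projection mentioned above: it maps a direction on the sphere, given by longitude and latitude, linearly onto 2D pixel coordinates. A minimal sketch (the function name and pixel convention are illustrative, not from the specification):

```python
def equirect_project(lon_deg, lat_deg, width, height):
    """Map a spherical direction (longitude in [-180, 180] degrees,
    latitude in [-90, 90] degrees) to pixel coordinates on a
    width x height equirectangular (latitude-longitude) projection image."""
    u = (lon_deg + 180.0) / 360.0   # normalized horizontal position
    v = (90.0 - lat_deg) / 180.0    # latitude +90 maps to the top row
    return u * (width - 1), v * (height - 1)

# The forward-looking direction (0, 0) lands at the image center.
x, y = equirect_project(0.0, 0.0, 3840, 1920)
print(x, y)  # → 1919.5 959.5
```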
  • the projected image can be encoded directly, or the projected image can be encapsulated and then encoded.
  • modern mainstream video coding technology, as exemplified by the international video coding standards HEVC (High Efficiency Video Coding) and VVC (Versatile Video Coding) and the Chinese national video coding standard AVS (Audio Video Coding Standard), adopts a hybrid coding framework.
  • the input original video signal undergoes the following series of operations and processing: block partitioning, predictive coding, transform coding and quantization, entropy coding (or statistical coding), loop filtering, and so on.
  • the audio code stream and video code stream are encapsulated in the file container according to the file format of immersive media (such as ISOBMFF (ISO BaseMedia File Format, ISO Base Media File Format)) to form the media file resource of immersive media.
  • the media file resource can be a media file or media fragments forming the immersive media; in addition, according to the file format requirements of the immersive media, the media presentation description (MPD) information is used to record the metadata of the media file resources of the immersive media.
  • Metadata is a general term for information related to the presentation of immersive media, and the metadata may include description information of media content, description information of windows, and signaling information related to presentation of media content, and so on.
  • the media production device will store the media presentation description information and media file resources formed after the data processing process.
  • a sample is an encapsulation unit in a media file encapsulation process, and a media file consists of many samples. Taking video media as an example, one sample of video media is usually one video frame.
  • the process of file decapsulation and decoding of immersive media (decoding mainly includes audio decoding, video decoding, and 6DoF media decoding);
  • the media playback device can dynamically obtain the media file resources of the immersive media and the corresponding media presentation description information from the media production device through the recommendation of the media production device or adaptively and dynamically according to the user needs of the media playback device.
  • tracking information of the user's head/eyes/body determines the orientation and position of the user, and the media playback device then dynamically requests the corresponding media file resources from the media production device based on the determined orientation and position.
  • Media file resources and media presentation description information are transmitted from the media production device to the media playback device (media transport) through a transmission mechanism (eg, DASH, SMT).
  • the file decapsulation process on the media playback device side is inverse to the file encapsulation process on the media production device side.
  • the media playback device decapsulates the media file resources according to the file format requirements of the immersive media to obtain the audio stream and video stream.
  • the decoding process on the media playback device is opposite to the encoding process on the media production device.
  • the media playback device performs audio decoding on the audio stream to restore the audio content.
  • the video decoding process of the video code stream by the media playback device includes the following steps: (1) decode the video code stream to obtain a planar projection image; (2) reconstruct the projected image according to the media presentation description information to convert it into a 3D image.
  • the reconstruction process here refers to the process of re-projecting the two-dimensional projected image into the 3D space.
  • the media playback device renders the audio content obtained by audio decoding and the 3D image obtained by video decoding according to the metadata related to rendering, viewpoint, and window in the media presentation description information. After rendering, playback output is realized: for example, audio content is output through speakers/headphones, and the 3D image is displayed through a display device.
  • Viewpoint can refer to the immersive media content collected by the capture device in the process of immersive media production, and viewport can refer to the part of the immersive media content that is viewed by the user during the immersive media presentation process.
  • the immersive media system supports a data box (Box).
  • a data box refers to a data block or object including metadata, that is, a data box contains metadata of corresponding media content.
  • the metadata of the window is recorded by the recommended window data box (RcvpInfoBox), and the syntax of the recommended window data box (RcvpInfoBox) of the immersive media is shown in Table 1 below:
  • viewport_type: indicates the type of the recommended viewport. A value of 0 indicates that the recommended viewport comes from the editing of the media producer; a value of 1 indicates that the recommended viewport is derived from big-data statistics.
  • viewport_description: indicates the description information of the recommended window; the description information is a null-terminated UTF-8 string.
  • viewpoint_idc: when the value of this field is 0, all media tracks associated with the current media track belong to the same viewpoint; when the value is 1, the ID of the viewpoint associated with the sample entry of the current recommended track is determined by rvif_viewpoint_id; when the value is 2, some samples in the media track correspond to a specific viewpoint.
  • rvif_viewpoint_id: indicates the identifier (ID) of the viewpoint to which all samples corresponding to the sample entry belong.
  • viewpoint_id: indicates the ID of the viewpoint to which a sample belongs.
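Since the syntax table (Table 1) is not reproduced here, the following dataclass merely restates the fields described above; the types, optionality, and the helper method are assumptions for illustration, not the normative binary layout:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RcvpInfoBox:
    """Fields of the recommended window data box as described in the text.
    The exact binary layout (Table 1) is not reproduced; this is a sketch."""
    viewport_type: int            # 0: media producer's edit; 1: big-data statistics
    viewport_description: str     # stored as a null-terminated UTF-8 string
    viewpoint_idc: int            # 0, 1, or 2 (semantics as described above)
    rvif_viewpoint_id: Optional[int] = None  # meaningful when viewpoint_idc == 1

    def viewpoint_for_sample(self, sample_viewpoint_id: Optional[int]) -> Optional[int]:
        if self.viewpoint_idc == 1:
            # All samples of this sample entry belong to rvif_viewpoint_id.
            return self.rvif_viewpoint_id
        if self.viewpoint_idc == 2:
            # Individual samples carry their own viewpoint_id.
            return sample_viewpoint_id
        return None  # viewpoint_idc == 0: same viewpoint as associated tracks

box = RcvpInfoBox(viewport_type=0, viewport_description="producer's cut",
                  viewpoint_idc=1, rvif_viewpoint_id=7)
print(box.viewpoint_for_sample(None))  # → 7
```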
  • a track refers to a series of samples with time attributes encapsulated according to the ISO Base Media File Format (ISOBMFF), such as a video track.
  • a video track is obtained by encapsulating, according to the ISOBMFF specification, the code stream that a video encoder generates after encoding each frame.
  • a track can correspond to an entry, or sample entry.
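As background on the ISOBMFF encapsulation referenced here: every ISOBMFF box begins with a 32-bit big-endian size and a 4-byte type code. A minimal header parser, handling only the common 32-bit-size form (the 64-bit `largesize` and size-0 forms are omitted for brevity):

```python
import struct

def parse_box_headers(data: bytes):
    """Return (box_type, payload) pairs from a flat ISOBMFF byte string.
    Only the common 32-bit size form is handled; 64-bit 'largesize' and
    size == 0 ('box extends to end of file') are omitted for brevity."""
    offset = 0
    boxes = []
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8:
            raise ValueError("unsupported box size")
        boxes.append((box_type.decode("ascii"), data[offset + 8:offset + size]))
        offset += size
    return boxes

# A toy 'ftyp' box: 16-byte total size, 4-byte type, then 8 bytes of payload.
blob = struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00\x00\x02\x00"
print(parse_box_headers(blob))  # → [('ftyp', b'isom\x00\x00\x02\x00')]
```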
  • a viewpoint can correspond to one recommended window, covering the scenario where different viewpoints have different recommended windows; when the value of viewpoint_idc is 2, a recommended window can change between multiple viewpoints, that is, one recommended window can correspond to multiple viewpoints. In both of these scenarios, viewpoints can only be switched as dictated by the recommended windows, which cannot provide the user with the experience of flexibly switching viewpoints and windows.
  • an embodiment of the present application provides a data processing method for immersive media, which encapsulates the switching information of the viewpoint and the switching information of the recommended window in the recommended window data box, so that the media playback device can flexibly switch between viewpoints and recommended windows according to that switching information, thereby improving user experience.
  • Fig. 3 is a schematic flowchart of a data processing method for immersive media provided according to an exemplary embodiment of the present application.
  • the method can be executed by a media playback device (also called a media consumption device) in an immersive media system; the media consumption device can be, for example, a server, a drone, a handheld terminal, or another device with media (e.g., point cloud media) encoding and decoding capabilities, which is not limited in this application.
  • the method may include at least some of the following:
  • the recommended window data box is used to define the switching information of the viewpoint of the immersive media file and the switching information of the recommended window;
  • when the media consumption device chooses to browse the immersive media in the recommended browsing mode, it switches the viewpoint and the window according to the switching information of the viewpoint and the window in the recommended window data box.
  • the recommended browsing mode may refer to browsing each sample of the immersive media in a recommended window (eg, a window recommended by a director/content producer).
  • the recommended window data box includes:
  • the viewpoint switch field (denoted viewpoint_switch) is used to indicate whether it is necessary to switch to another viewpoint after the specific recommended window corresponding to the current viewpoint is presented, wherein the current viewpoint corresponds to at least one recommended window.
  • a value of 1 in the viewpoint switching field indicates that it is necessary to switch to another viewpoint after the specific recommended window corresponding to the current viewpoint is presented; a value of 0 indicates that there is no need to switch to another viewpoint.
  • the recommended window data box includes:
  • the target viewpoint field (denoted as des_viewpoint_id) is used to indicate the target viewpoint that needs to be switched to after the current recommendation window is presented;
  • the target recommended window field (denoted as des_viewport_id) is used to indicate the target recommended window to be switched to after presenting the current recommended window and switching to the target viewpoint.
  • the target viewpoint and the target recommended window may be determined according to the target viewpoint field and the target recommended window field, and the target viewpoint field reflects the switching information of the current viewpoint.
  • the target recommended window field reflects the switching information of the current recommended window.
  • the value of the target viewpoint field may be the identifier of the target viewpoint.
  • the value of the target recommended window field may be the identifier of the target recommended window.
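The relationship between the viewpoint_switch, des_viewpoint_id, and des_viewport_id fields described above can be sketched as follows; the container class and helper function are illustrative assumptions, not the on-disk ISOBMFF layout:

```python
from dataclasses import dataclass

# Hypothetical in-memory form of the switching fields carried in the
# recommended window data box; the field names follow the text, but this
# class is only a sketch of their semantics.
@dataclass
class RecommendedViewportSwitch:
    viewpoint_switch: int  # 1 = switch to another viewpoint after presentation, 0 = stay
    des_viewpoint_id: int  # identifier of the target viewpoint
    des_viewport_id: int   # identifier of the target recommended window

def next_target(current_viewpoint, current_viewport, sw):
    """Return the (viewpoint, viewport) pair to present next."""
    if sw.viewpoint_switch == 1:
        # Switch to the target viewpoint and its target recommended window.
        return sw.des_viewpoint_id, sw.des_viewport_id
    # No viewpoint switch signalled: stay at the current viewpoint/window.
    return current_viewpoint, current_viewport

# After presenting window 11 at viewpoint 1, a switch to (2, 21) is signalled.
print(next_target(1, 11, RecommendedViewportSwitch(1, 2, 21)))  # (2, 21)
```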
  • the recommended viewport data box includes: an independent viewport field (independent_viewport), which is used to indicate whether each viewpoint corresponds to an independent group of recommended viewports.
  • the value of the independent viewport field being a first value, such as 1, indicates that each viewpoint corresponds to an independent group of recommended viewports; in other words, each group of recommended viewports corresponds to an independent viewpoint, that is, the two are in a one-to-one correspondence.
  • the independent viewpoints corresponding to each group of recommended windows are determined according to the viewpoint identifier ID field in the sample entry of each recommended window.
  • the value of the independent viewport field being a second value, such as 0, indicates that all viewpoints correspond to the same set of recommended viewports; in this case, there is only one recommended window entry, and all viewpoints correspond to this recommended window entry.
  • when the value of the independent window field is 1, each group of recommended windows (i.e., one recommended window entry) corresponds only to the content of one particular viewpoint.
  • when the value of the independent window field is 0, there is only one group of recommended windows, and that group of recommended windows corresponds to the contents of all viewpoints.
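The two cases above can be sketched as a small lookup routine; the list-of-dicts layout is an assumption for demonstration, not the actual sample entry format:

```python
# Illustrative sketch of resolving which recommended-window entry applies to
# a given viewpoint under the independent_viewport flag described above.
def viewport_entry_for(viewpoint_id, independent_viewport, entries):
    if independent_viewport == 1:
        # First value (1): each viewpoint has its own group of recommended
        # windows, located via the viewpoint ID recorded in each entry.
        for entry in entries:
            if entry["viewpoint_id"] == viewpoint_id:
                return entry
        return None
    # Second value (0): a single shared entry serves all viewpoints.
    return entries[0]

entries = [{"viewpoint_id": 1, "viewports": ["vp11", "vp12"]},
           {"viewpoint_id": 2, "viewports": ["vp21"]}]
print(viewport_entry_for(2, 1, entries)["viewports"])  # ['vp21']
```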
  • the viewpoint information in the recommended window data box only includes the label information of the viewpoints.
  • the encapsulation file of the immersive media may further include a viewpoint entity group data box (ViewpointEntityGroupBox), which is used to record tags of each entity that constitutes a viewpoint.
  • each entity constituting a viewpoint may refer to various media components, for example, video (video), audio (audio), background audio (background video), etc., which is not limited in this application.
  • the media consumption device can obtain the tags of each entity of the viewpoint, and perform rendering based on each entity of the viewpoint, which facilitates the diversified presentation of viewpoints.
  • the recommended window data box and the viewpoint entity group data box may be implemented by extending the ISOBMFF (ISO Base Media File Format, International Organization for Standardization Base Media File Format) data box.
  • the recommended window data box may include the viewpoint information data box (AssoRcViewpointInfoBox) in Table 2 and the recommended viewport sample data box (RecommendedViewportSample) in Table 3, which are respectively used to define the above fields.
  • the viewpoint entity group data box may be defined in the manner in Table 4.
  • viewpoint_label: indicates the label information of the viewpoint, for example, a null-terminated string.
  • group_id: indicates the group ID of the current entity group.
  • num_entities_in_group: indicates the number of entities in the current entity group.
  • entity_id: indicates the ID of an entity, which corresponds to a track ID or item ID.
  • viewpoint_entity_label: indicates the label of each entity that composes the current viewpoint, for example, a null-terminated string.
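The entity group fields listed above can be modeled with a small sketch; the class name and in-memory layout are assumptions, and the real ISOBMFF serialization of ViewpointEntityGroupBox is omitted:

```python
from dataclasses import dataclass

# Hypothetical in-memory form of the ViewpointEntityGroupBox fields
# (group_id, num_entities_in_group, entity_id, and the labels).
@dataclass
class ViewpointEntityGroup:
    group_id: int
    viewpoint_label: str
    entity_ids: list     # track IDs or item IDs of the constituent entities
    entity_labels: list  # e.g. ["video", "audio", "background video"]

    @property
    def num_entities_in_group(self):
        # Derived from the entity list rather than stored separately.
        return len(self.entity_ids)

group = ViewpointEntityGroup(1, "VPI2", [101, 102, 103],
                             ["video", "audio", "background video"])
print(group.num_entities_in_group)  # 3
```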
  • the recommended window data box also includes a recommended window description field, which is used to indicate the description information of the recommended window (for example, "the first recommended window is recommended by author XX"), and the description information is a null-terminated UTF-8 string.
  • the S301 may specifically include:
  • the media consumption device obtains the package file of the immersive media from the media production device and decapsulates it to obtain the recommended window data box of the immersive media, which defines the switching information of the viewpoint of the immersive media and the switching information of the recommended window; further, the media consumption device can determine the target viewpoint and the target recommended window according to this switching information, and then switch to the target viewpoint and the target recommended window to present the immersive media file.
  • This solution can flexibly determine the switching method according to the switching information of the viewpoints defined in the recommended window data box and the switching information of the recommended windows, so as to switch between the viewpoints and the recommended windows, thereby making the presentation forms of immersive media more diverse and improving the user experience.
  • Fig. 4 is a schematic flowchart of a data processing method for immersive media provided according to an exemplary embodiment of the present application.
  • the method can be executed by a media production device in the immersive media system; for example, the media production device can be any device with point cloud media encoding and decoding capabilities, such as a server, a drone, or a handheld terminal, which is not limited in this application.
  • the method may include at least some of the following:
  • the method further includes:
  • the recommended window data box is encapsulated into the encapsulation file of the immersive media file, and the encapsulation file of the immersive media file is sent to the media playback device.
  • the recommended window data box includes:
  • the viewpoint switching field is used to indicate whether it is necessary to switch to another viewpoint after the specific recommended window corresponding to the current viewpoint is presented, wherein the current viewpoint corresponds to at least one recommended window.
  • the recommended window data box includes:
  • the target viewpoint field is used to indicate the target viewpoint that needs to be switched to after the current recommendation window is presented;
  • the target recommended window field is used to indicate the target recommended window to be switched to after presenting the current recommended window and switching to the target viewpoint.
  • the recommended window data box includes: an independent window field, which is used to indicate whether a viewpoint corresponds to an independent group of recommended windows.
  • the value of the independent window field is a first value, indicating that each viewpoint corresponds to an independent group of recommended windows
  • the value of the independent window field is the second value, indicating that each viewpoint corresponds to the same set of recommended windows.
  • the recommended window data box further includes: a viewpoint entity group data box, which is used to record labels of each entity that constitutes a viewpoint.
  • the media production device can encapsulate the recommended window data box according to the viewpoint information and the window information of the immersive media, encapsulate the recommended window data box into the encapsulation file of the immersive media, and send the encapsulation file of the immersive media to the media consumption device, so that the media consumption device decapsulates the encapsulated file, obtains the window data box of the immersive media, and performs the switching of viewpoints and recommended windows according to the switching method defined by the window data box, which makes the presentation forms of immersive media more diverse, thereby improving the user experience.
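The production-side step just described can be sketched as follows; the function name and dict layout are assumptions for illustration, with field names mirroring the text:

```python
# A minimal production-side sketch: collecting viewpoint and window
# information into a recommended-window structure before packaging it
# into the media file. Not the real ISOBMFF encapsulation.
def build_recommended_viewport_box(viewpoint_id, viewport_ids,
                                   viewpoint_switch=0,
                                   des_viewpoint_id=None,
                                   des_viewport_id=None):
    box = {
        "viewpoint_id": viewpoint_id,
        "viewports": list(viewport_ids),
        "viewpoint_switch": viewpoint_switch,
    }
    if viewpoint_switch == 1:
        # The target fields are only meaningful when a switch is signalled.
        box["des_viewpoint_id"] = des_viewpoint_id
        box["des_viewport_id"] = des_viewport_id
    return box

box = build_recommended_viewport_box("VPI1", ["vp11", "vp12", "vp13"],
                                     viewpoint_switch=1,
                                     des_viewpoint_id="VPI2",
                                     des_viewport_id="vp21")
print(box["des_viewport_id"])  # vp21
```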
  • the media production device encapsulates the recommended window data box according to the collected viewpoint information of the media file A and the recommended window corresponding to the viewpoint, and encapsulates the viewpoint entity group data box according to each entity that constitutes the viewpoint.
  • the media file A has two viewpoints, VPI1 and VPI2, and VPI1 is the initial viewpoint.
  • VPI1 corresponds to the recommended windows vp11, vp12 and vp13, and vp11 is the initial window.
  • VPI2 corresponds to the recommended window vp21, vp22, and vp21 is the initial window.
  • the information recorded in the recommended window data box is as follows:
  • a viewpoint entity group data box corresponding to each viewpoint is generated according to the constituent entities of VPI1 and VPI2.
  • VPI1 contains two entity components “video” and “audio”
  • VPI2 contains three entity components "video”, "audio” and "background video”.
  • after configuring the recommended window data box, the media production device encapsulates the recommended window data box into the immersive media package file and sends the immersive media package file to the media consumption device. For example, upon receiving a request from a media consumption device, the media production device may send the immersive media package file to the media consumption device.
  • VPI1 is the initial viewpoint
  • consumption starts from VPI1.
  • according to the tags of the constituent entities of VPI1, the video track and the audio track are obtained and rendered respectively.
  • the recommended window vp11 is the initial window, so the recommended window vp11 is presented first, and each recommended window in the track is sequentially rendered according to the sample information of the recommended window track.
  • the media consumption device switches to the viewpoint VPI2 and the recommended window vp21, and obtains the video track, the audio track, and the background video track for rendering according to the tags of the constituent entities of the VPI2.
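The consumption flow of this example can be walked through with a toy sketch; the dict layout is illustrative only, echoing the VPI1/VPI2 structure described above:

```python
# Media file A from the example: two viewpoints with their recommended
# windows and constituent entities as listed in the text.
viewpoints = {
    "VPI1": {"initial_viewport": "vp11",
             "viewports": ["vp11", "vp12", "vp13"],
             "entities": ["video", "audio"]},
    "VPI2": {"initial_viewport": "vp21",
             "viewports": ["vp21", "vp22"],
             "entities": ["video", "audio", "background video"]},
}

def start_consumption(viewpoint_id):
    """Return the entities to render and the window presented first."""
    vp = viewpoints[viewpoint_id]
    return vp["entities"], vp["initial_viewport"]

# Consumption starts from the initial viewpoint VPI1 and its initial window.
print(start_consumption("VPI1"))  # (['video', 'audio'], 'vp11')
# After the signalled switch, VPI2 additionally renders the background video.
print(start_consumption("VPI2")[1])  # vp21
```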
  • the embodiments of the present application can implement flexible and independent switching of viewpoints and windows, thereby improving user experience.
  • FIG. 5 shows a schematic structural diagram of a data processing apparatus for immersive media provided by an exemplary embodiment of the present application; the data processing apparatus for immersive media may be a computer program (including program code) running in a media consumption device; for example, the data processing apparatus may be an application software in the media consumption device.
  • the data processing apparatus includes an acquisition unit 501 , a determination unit 502 and a presentation unit 503 .
  • the data processing apparatus shown in FIG. 5 may be used to execute some or all of the method embodiments described in FIG. 3 above.
  • the obtaining unit 501 is configured to obtain a recommended window data box of the immersive media file, where the recommended window data box is used to define the switching information of the viewpoint of the immersive media file and the switching information of the recommended window;
  • a determining unit 502 configured to determine a target viewpoint and a target recommended window according to the switching information of the current viewpoint and the switching information of the current recommended window;
  • the presenting unit 503 is configured to switch to the target viewpoint and the target recommended window to present the immersive media file.
  • the recommended window data box includes a viewpoint switch field:
  • the determining unit 502 is further configured to determine, according to the viewpoint switching field, whether to switch to another viewpoint after presenting the specific recommended window corresponding to the current viewpoint, the current viewpoint corresponding to at least one recommended window;
  • the determining unit 502 executes the step of determining the target viewpoint and the target recommended window according to the switching information of the current viewpoint and the switching information of the current recommended window.
  • the recommended window data box includes a target viewpoint field, which is used to indicate the target viewpoint to switch to after the current recommended window is presented, and a target recommended window field, which is used to indicate the target recommended window to switch to after the current recommended window is presented and the target viewpoint is switched to;
  • the determining unit 502 is configured to determine the target viewpoint according to the target viewpoint field and determine the target recommended view window according to the target recommended view window field, where the target viewpoint field reflects the switching information of the current viewpoint, and the The target recommended window field reflects the switching information of the current recommended window.
  • the recommended window data box includes: an independent window field, which is used to indicate whether a viewpoint corresponds to an independent group of recommended windows.
  • the value of the independent window field is a first value, indicating that each viewpoint corresponds to an independent group of recommended windows
  • the value of the independent window field is the second value, indicating that each viewpoint corresponds to the same set of recommended windows.
  • the recommended view window data box further includes: a viewpoint entity group data box, which is used to record the labels of each entity constituting the viewpoint.
  • the presentation unit 503 is also used for:
  • the obtaining unit 501 is specifically used for:
  • each unit in the data processing apparatus for immersive media shown in FIG. 5 may be separately or wholly combined into one or several other units, or one or more of the units may be further divided into multiple functionally smaller units, which can realize the same operations without affecting the technical effects of the embodiments of the present application.
  • the above-mentioned units are divided based on logical functions.
  • the function of one unit may also be implemented by multiple units, or the functions of multiple units may be implemented by one unit.
  • the data processing apparatus for immersive media may also include other units.
  • these functions may also be implemented with the assistance of other units, and may be implemented by cooperation of multiple units.
  • the data processing apparatus may be implemented on a general-purpose computing device, such as a computer, that includes a central processing unit (CPU), a random access memory (RAM), a read-only memory (ROM), and other processing and storage elements.
  • running a computer program capable of executing the steps of the corresponding method shown in FIG. 3 on such a computing device constructs the data processing apparatus for immersive media shown in FIG. 5 and realizes the data processing method for immersive media of the embodiments of the present application.
  • the computer program can be recorded on, for example, a computer-readable recording medium, and loaded in the above-mentioned computing device through the computer-readable recording medium, and executed therein.
  • the principles and beneficial effects of the data processing apparatus for immersive media provided in the embodiments of the present application in solving problems are similar to those of the data processing method for immersive media in the method embodiments of the present application, and will not be repeated here for the sake of brevity.
  • FIG. 6 shows a schematic structural diagram of another data processing apparatus for immersive media provided by an exemplary embodiment of the present application; the data processing apparatus for immersive media may be a computer program (including program code) running in a media production device; for example, the data processing apparatus may be an application software in the media production device.
  • the data processing apparatus for immersive media includes an acquisition unit 601 and an encapsulation unit 602 . The detailed description of each unit is as follows:
  • an acquiring unit 601 configured to acquire viewpoint information and window information when the immersive media file is presented;
  • the encapsulation unit 602 is configured to encapsulate a recommended window data box according to the viewpoint information and the window information when the immersive media file is presented, wherein the recommended window data box is used to define the switching information of the viewpoint of the immersive media file and the switching information of the recommended window.
  • the recommended window data box includes:
  • the viewpoint switching field is used to indicate whether it is necessary to switch to another viewpoint after the specific recommended window corresponding to the current viewpoint is presented, wherein the current viewpoint corresponds to at least one recommended window.
  • the recommended window data box includes:
  • the target viewpoint field is used to indicate the target viewpoint to switch to after the current recommendation window is presented
  • the target recommended window field is used to indicate the target recommended window to switch to after presenting the current recommended window and switching to the target viewpoint.
  • the recommended window data box includes: an independent window field, which is used to indicate whether a viewpoint corresponds to an independent group of recommended windows.
  • the value of the independent window field is a first value, indicating that each viewpoint corresponds to an independent group of recommended windows
  • the value of the independent window field is the second value, indicating that each viewpoint corresponds to the same set of recommended windows.
  • the recommended view window data box further includes: a viewpoint entity group data box, which is used to record the labels of each entity constituting the viewpoint.
  • the encapsulation unit 602 is further configured to encapsulate the recommended window data box into an encapsulation file of the immersive media file
  • the data processing apparatus further includes: a transmission unit, configured to send the package file of the immersive media file to the media playback device.
  • each unit in the data processing apparatus for immersive media shown in FIG. 6 may be separately or wholly combined into one or several other units, or one or more of the units may be further divided into multiple functionally smaller units, which can realize the same operations without affecting the technical effects of the embodiments of the present application.
  • the above-mentioned units are divided based on logical functions.
  • the function of one unit may also be implemented by multiple units, or the functions of multiple units may be implemented by one unit.
  • the data processing apparatus for immersive media may also include other units.
  • these functions may also be implemented with the assistance of other units, and may be implemented by cooperation of multiple units.
  • the data processing apparatus may be implemented on a general-purpose computing device, such as a computer, that includes a central processing unit (CPU), a random access memory (RAM), a read-only memory (ROM), and other processing and storage elements.
  • running a computer program capable of executing the steps of the corresponding method shown in FIG. 4 on such a computing device constructs the data processing apparatus for immersive media shown in FIG. 6 and realizes the data processing method for immersive media of the embodiments of the present application.
  • the computer program can be recorded on, for example, a computer-readable recording medium, and loaded in the above-mentioned computing device through the computer-readable recording medium, and executed therein.
  • the principles and beneficial effects of the data processing apparatus for immersive media provided in the embodiments of the present application in solving problems are similar to those of the data processing method for immersive media in the method embodiments of the present application, and will not be repeated here for the sake of brevity.
  • the embodiments of the present application also provide a computer device, which may be a media production device or a media playback device.
  • the media production device and the media playback device will be introduced respectively.
  • FIG. 7 shows a schematic structural diagram of a media production device provided by an exemplary embodiment of the present application
  • the media production device may refer to a computer device used by a provider of immersive media, and the computer device may be a terminal (such as a PC or a smart mobile device such as a smartphone) or a server.
  • the media production device includes a capture device 801, a processor 802, a memory 803, and a transmitter 804, wherein:
  • the capture device 801 is used to capture real-world sound-visual scenes to obtain raw data of immersive media (including audio content and video content that are synchronized in time and space).
  • the capture device 801 may include, but is not limited to, an audio device, a camera device, and a sensor device.
  • the audio device may include an audio sensor, a microphone, and the like.
  • the camera device may include a normal camera, a stereo camera, a light field camera, and the like.
  • Sensing devices may include laser devices, radar devices, and the like.
  • the processor 802 (also called a CPU (Central Processing Unit)) is the processing core of the media production device; the processor 802 is suitable for implementing one or more program instructions, and is specifically configured to load and execute one or more program instructions so as to implement the flow of the data processing method for immersive media shown in FIG. 4.
  • the memory 803 is a memory device in the media production device, used for storing programs and media resources. It can be understood that the memory 803 here may include both a built-in storage medium in the media production device and an extended storage medium supported by the media production device. It should be noted that the memory 803 can be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory; optionally, it can also be at least one memory located away from the aforementioned processor.
  • the memory provides storage space for storing the operating system of the media production device. In addition, the storage space is also used to store a computer program.
  • the computer program includes program instructions, and the program instructions are suitable for being called and executed by the processor, so as to execute each step of the data processing method for immersive media.
  • the memory 803 can also be used to store the immersive media file processed by the processor, where the immersive media file includes media file resources and media presentation description information.
  • the transmitter 804 is used to realize the transmission interaction between the media production device and other devices, and is specifically used to realize the transmission of immersive media between the media production device and the media playback device. That is, the media production device transmits the relevant media resources of the immersive media to the media playback device through the transmitter 804 .
  • the processor 802 may include a converter 821, an encoder 822 and a wrapper 823;
  • the converter 821 is used to perform a series of conversion processing on the captured video content, so that the video content becomes content suitable for video encoding of immersive media.
  • the transformation process may include: stitching and projection, and optionally, the transformation process further includes region encapsulation.
  • the converter 821 can convert the captured 3D video content into 2D images and provide them to the encoder for video encoding.
  • the encoder 822 is configured to perform audio encoding on the captured audio content to form an audio code stream of the immersive media. It is also used to perform video coding on the 2D image converted by the converter 821 to obtain a video stream.
  • the encapsulator 823 is used to encapsulate the audio code stream and the video code stream in the file container according to the immersive media file format (such as ISOBMFF) to form a media file resource of the immersive media, and the media file resource can be formed by a media file or a media fragment.
  • the encapsulated file of the immersive media processed by the encapsulator will be stored in the memory, and provided to the media playback device on demand for the presentation of the immersive media.
  • the processor 802 (specifically, each device included in the processor) executes each step of the data processing method for immersive media shown in FIG. 4 by calling one or more instructions in the memory 803 .
  • the memory 803 stores one or more first instructions, and the one or more first instructions are suitable for being loaded by the processor 802 and performing the following steps:
  • a recommended window data box is encapsulated according to the viewpoint information and the window information of the immersive media file, wherein the recommended window data box is used to define the switching information of the viewpoint and the switching information of the recommended window of the immersive media file.
  • the processor 802 further performs the following operations: encapsulating the recommended window data box into the encapsulation file of the immersive media file, and sending the encapsulation file of the immersive media file to the media playback device.
  • the principles and beneficial effects of the data processing apparatus for immersive media provided in the embodiments of the present application in solving problems are similar to those of the data processing method for immersive media in the method embodiments of the present application, and will not be repeated here for the sake of brevity.
  • FIG. 8 shows a schematic structural diagram of a media playback device provided by an exemplary embodiment of the present application
  • the media playback device may refer to a computer device used by a user of immersive media, and the computer device may be a terminal, such as a PC, a smart mobile device (such as a smartphone), or a VR device (such as a VR helmet or VR glasses).
  • the media playing device includes a receiver 901, a processor 902, a memory 903, and a display/playing device 904, wherein:
  • the receiver 901 is used to realize the transmission interaction between the media playback device and other devices, and is specifically used to realize the transmission of immersive media between the media production device and the media playback device; that is, the media playback device receives the relevant media resources of the immersive media transmitted by the media production device through the receiver 901.
  • the processor 902 (also called a CPU (Central Processing Unit)) is the processing core of the media playback device; the processor 902 is suitable for implementing one or more program instructions, and is specifically configured to load and execute one or more program instructions so as to implement the flow of the data processing method for immersive media shown in FIG. 3.
  • the memory 903 is a memory device in the media playback device, used to store programs and media resources. It can be understood that, the memory 903 here may include both a built-in storage medium in the media playback device, and of course, may also include an extended storage medium supported by the media playback device. It should be noted that, the memory 903 can be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as at least one disk memory; optionally, it can also be at least one memory located far from the aforementioned processor.
  • the memory 903 provides storage space for storing the operating system of the media playback device. In addition, the storage space is also used to store a computer program.
  • the computer program includes program instructions, and the program instructions are suitable for being called and executed by the processor, so as to execute each step of the data processing method for immersive media.
  • the memory 903 can also be used to store the three-dimensional image of the immersive media formed after processing by the processor, the audio content corresponding to the three-dimensional image, and the information required for rendering the three-dimensional image and the audio content, and the like.
  • the display/playing device 904 is used for outputting rendered sound and three-dimensional images.
  • the processor 902 may include a parser 921, a decoder 922, a converter 923 and a renderer 924; wherein:
  • the parser 921 is used to decapsulate the encapsulated file of the immersive media from the media production equipment, specifically to decapsulate the media file resources according to the file format requirements of the immersive media to obtain the audio code stream and the video code stream, and to provide the audio and video code streams to the decoder 922.
  • the decoder 922 performs audio decoding on the audio code stream to obtain audio content and provides the audio content to the renderer for audio rendering. In addition, the decoder 922 decodes the video code stream to obtain a 2D image. According to the metadata provided by the media presentation description information, if the metadata indicates that the immersive media has undergone the region encapsulation process, the 2D image refers to the encapsulated image; if the metadata indicates that the immersive media has not undergone the region encapsulation process, the 2D image refers to the projected image.
  • the converter 923 is used to convert 2D images into 3D images. If the immersive media has undergone region encapsulation, the converter 923 first performs region decapsulation on the encapsulated image to obtain the projected image, and then reconstructs the projected image to obtain a 3D image. If the immersive media has not undergone region encapsulation, the converter 923 directly reconstructs the projected image to obtain a 3D image.
  • the renderer 924 is used to render the audio content and 3D images of the immersive media, specifically according to the metadata related to rendering and the viewport in the media presentation description information; once rendering is complete, the result is output by the display/playback device.
  • the processor 902 (specifically, each component included in the processor) executes each step of the data processing method for immersive media shown in FIG. 3 by calling one or more instructions in the memory 903.
  • the memory 903 stores one or more second instructions, and the one or more second instructions are suitable for being loaded by the processor 902 and performing the following steps:
  • obtaining the recommended window data box of the immersive media file, where the recommended window data box is used to define the switching information of the viewpoint of the immersive media file and the switching information of the recommended window; determining a target viewpoint and a target recommended window according to the switching information of the current viewpoint and the switching information of the current recommended window; and switching to the target viewpoint and the target recommended window to present the immersive media file.
  • the processor 902 further performs the following operations: obtaining the package file of the immersive media file from the media production device, and decapsulating the package file to obtain the recommended window data box of the immersive media file.
  • the principles and beneficial effects of the data processing apparatus for immersive media provided in the embodiments of this application in solving problems are similar to those of the data processing method for immersive media in the method embodiments of this application; for brevity, they are not repeated here.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program is adapted to be loaded by a processor and execute the foregoing method embodiments.
  • Embodiments of the present application also provide a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the above method embodiments.
  • the modules in the apparatus of the embodiment of the present application may be combined, divided and deleted according to actual needs.
  • the processor in the embodiments of this application may be an integrated circuit chip with signal processing capability.
  • during implementation, each step of the above method embodiments may be completed by an integrated hardware logic circuit in the processor or by instructions in the form of software.

Abstract

一种沉浸式媒体文件的数据处理方法、装置和计算机可读存储介质,能够丰富沉浸式媒体的呈现形式,该方法包括:获取沉浸式媒体文件的推荐视窗数据盒,推荐视窗数据盒用于定义沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息(S301);根据当前视点的切换信息和当前推荐视窗的切换信息,确定目标视点和目标推荐视窗(S302);切换至目标视点和目标推荐视窗,进行沉浸式媒体文件的呈现(S303)。

Description

沉浸式媒体的数据处理方法、装置和计算机可读存储介质
本申请要求于2020年12月2日提交中国专利局、申请号202011399700.4、申请名称为“沉浸式媒体的数据处理方法、装置和计算机可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,并且更具体地,涉及沉浸式媒体的数据处理。
背景技术
沉浸式媒体的内容通常被划分为多个样本(例如,图像帧,或者视频帧),这些样本按照关联性被封装在轨道组中,所谓关联性是指所有的轨道组中的样本都对应同一视点,或者一个轨道组中的样本对应一个视点,或者一个轨道组中的样本对应不同的视点,但是,一个轨道组中的样本只能按照一定的时序进行呈现。
然而,这样的关联性在一定程度上局限了沉浸式媒体的封装灵活性,也就限制了沉浸式媒体的呈现形式的灵活性,因此,如何丰富沉浸式媒体的呈现形式是一项亟需解决的问题。
发明内容
本申请提供了一种沉浸式媒体的数据处理方法、装置和计算机可读存储介质,能够丰富沉浸式媒体的呈现形式。
第一方面,提供了一种沉浸式媒体的数据处理方法,所述方法由媒体播放设备执行,所述方法包括:
获取沉浸式媒体文件的推荐视窗数据盒,该推荐视窗数据盒用于定义该沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息;
根据当前视点的切换信息和当前推荐视窗的切换信息,确定目标视点和目标推荐视窗;
切换至该目标视点和该目标推荐视窗,进行该沉浸式媒体文件的呈现。
第二方面,提供了一种沉浸式媒体的数据处理方法,所述方法由媒体制作设备执行,所述方法包括:
获取沉浸式媒体文件呈现时的视点信息以及视窗信息;
根据该沉浸式媒体文件呈现时的视点信息以及视窗信息配置推荐视窗数据盒,其中,该推荐视窗数据盒用于定义该沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息。
第三方面,提供了一种沉浸式媒体的数据处理装置,所述装置部署在媒体播放设备上,所述装置包括:
获取单元,用于获取沉浸式媒体文件的推荐视窗数据盒,该推荐视窗数据盒用于定义该沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息;
确定单元,用于根据当前视点的切换信息和当前推荐视窗的切换信息,确定目标视点和目标推荐视窗;
呈现单元,用于切换至该目标视点和该目标推荐视窗,进行该沉浸式媒体文件的呈现。
第四方面,提供了一种沉浸式媒体的数据处理装置,所述装置部署在媒体制作设备上,所述装置包括:
获取单元,用于获取沉浸式媒体文件呈现时的视点信息以及视窗信息;
封装单元,用于根据该沉浸式媒体文件呈现时的视点信息以及视窗信息封装推荐视窗数据盒,其中,该推荐视窗数据盒用于定义该沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息。
第五方面,提供了一种计算机设备,所述计算机设备包括处理器和存储器。该存储器用于存储计算机程序,该处理器用于调用并运行该存储器中存储的计算机程序,执行上述第一方面或其各实现方式中的方法。
第六方面,提供了一种计算机设备,所述计算机设备包括处理器和存储器。该存储器用于存储计算机程序,该处理器用于调用并运行该存储器中存储的计算机程序,执行上述第二方面或其各实现方式中的方法。
第七方面,提供了一种计算机可读存储介质,用于存储计算机程序,该计算机程序使得计算机执行上述第一方面至第二方面中的任一方面或其各实现方式中的方法。
第八方面,提供了一种计算机程序产品,包括计算机程序指令,所述计算机程序指令使得计算机执行上述第一方面至第二方面中的任一方面或其各实现方式中的方法。
第九方面,提供了一种计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面至第二方面中的任一方面或其各实现方式中的方法。
通过上述技术方案,媒体制作设备可以根据沉浸式媒体的视点信息和视窗信息封装该沉浸式媒体的推荐视窗数据盒,进一步地,媒体播放设备可以按照推荐视窗数据盒中定义的当前视点的切换信息和当前推荐视窗的切换信息确定目标视点和目标推荐视窗,进而切换至目标视点和目标推荐视窗进行沉浸式媒体文件的呈现。本方案可以根据推荐视窗数据盒中定义的视点的切换信息和推荐视窗的切换信息灵活地确定切换方式,以进行视点和推荐视窗的切换,从而能够使得媒体播放设备端的呈现形式多样化,进而丰富沉浸式媒体的应用形式,提升用户体验。
附图说明
图1是本申请一个示例性实施例提供的一种沉浸式媒体系统的架构图;
图2是本申请一个示例性实施例提供的一种沉浸式媒体的传输方案流程图;
图3是根据本申请实施例提供的一种沉浸式媒体的数据处理方法的示意性流程图;
图4是根据本申请实施例提供的另一种沉浸式媒体的数据处理方法的示意性流程图;
图5是本申请一个示例性实施例提供的一种沉浸式媒体的数据处理装置的结构示意图;
图6本申请一个示例性实施例提供的另一种沉浸式媒体的数据处理装置的结构示意图;
图7是本申请一个示例性实施例提供的一种媒体制作设备的结构示意图;
图8是本申请一个示例性实施例提供的一种媒体播放设备的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。针对本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请实施例涉及沉浸式媒体的数据处理技术。所谓沉浸式媒体是指能够提供沉浸式的媒体内容,使沉浸于该媒体内容中的用户能够获得现实世界中视觉、听觉等感官体验的媒体文件。通常情况下,沉浸式媒体可以是3DoF(Degree of Freedom)沉浸式媒体,3DoF+沉浸式媒体或者6DoF沉浸式媒体。沉浸式媒体内容包括以各种形式在三维(3-Dimension,3D)空间中表示的视频内容,例如以球面形式表示的三维视频内容。沉浸式媒体内容可以是VR(Virtual Reality,虚拟现实)视频内容、全景视频内容、球面视频内容或360度视频内容;所以,沉浸式媒体又可称为VR视频、全景视频、球面视频或360度视频。另外,沉浸式媒体内容还包括与三维空间中表示的视频内容相同步的音频内容。
图1示出了本申请一个示例性实施例提供的一种沉浸式媒体系统的架构图;如图1所示,沉浸式媒体系统包括媒体制作设备和媒体播放设备(或称媒体消费设备),媒体制作设备可以是指沉浸式媒体的提供者(例如沉浸式媒体的媒体制作者)所使用的计算机设备,该计算机设备可以是终端(如PC(Personal Computer,个人计算机)、智能移动设备(如智能手机)等)或服务器。媒体播放设备可以是指沉浸式媒体的使用者(例如用户)所使用的计算机设备,该计算机设备可以是终端(如PC(Personal Computer,个人计算机)、智能移动设备(如智能手机)、VR设备(如VR头盔、VR眼镜等))。沉浸式媒体的数据处理过程包括在媒体制作设备侧的数据处理过程及在媒体播放设备侧的数据处理过程。
在媒体制作设备端的数据处理过程主要包括:(1)沉浸式媒体的媒体内容的获取与制作过程;(2)沉浸式媒体的编码及文件封装的过程。在媒体播放设备端的数据处理过程主要包括:(3)沉浸式媒体的文件解封装及解码的过程;(4)沉浸式媒体的渲染过程。另外,媒体制作设备与媒体播放设备之间涉及沉浸式媒体的传输过程,该传输过程可以基于各种传输协议来进行,此处的传输协议可包括但不限于:DASH(Dynamic Adaptive Streaming over HTTP,动态自适应流媒体传输)协议、HLS(HTTP Live Streaming,动态码率自适应传输)协议、SMTP(Smart Media Transport Protocol,智能媒体传输协议)、TCP(Transmission Control Protocol,传输控制协议)等。
图2示出了本申请一个示例性实施例提供的一种沉浸式媒体的传输方案流程图。如图2所示,为了解决沉浸式媒体自身数据量过大带来的传输带宽负荷问题,在沉浸式媒体的处理过程中,通常选择将原始视频在空间上切分为多个分块视频(例如分块视频1、分块视频2、……、分块视频N)后,分别编码后进行文件封装,再传输给客户端进行消费。
结合图1和图2,下面分别对沉浸式媒体的数据处理过程中涉及的各个过程进行说明。
一、在媒体制作设备端的数据处理过程:
(1)获取沉浸式媒体的媒体内容(即图1所示的内容获取)。
从沉浸式媒体的媒体内容的获取方式看,可以分为通过捕获设备采集现实世界的声音-视觉场景获得的以及通过计算机生成的两种方式。在一种实现中,捕获设备可以是指设于媒体制作设备中的硬件组件,例如捕获设备是指终端的麦克风、摄像头、传感器等。另一种实现中,该捕获设备也可以是与媒体制作设备相连接的硬件装置,例如与服务器相连接摄像头;用于为媒体制作设备提供沉浸式媒体的媒体内容的获取服务。该捕获设备可以包括但不限于:音频设备、摄像设备及传感设备。其中,音频设备可以包括音频传感器、麦克风等。摄像设备可以包括普通摄像头、立体摄像头、光场摄像头等。传感设备可以包括激光设备、雷达设备等。捕获设备的数量可以为多个,这些捕获设备被部署在现实空间中的一些特定位置以同时捕获该空间内不同角度的音频内容和视频内容,捕获的音频内容和视频内容在时间和空间上均保持同步。由于获取的方式不同,不同沉浸式媒体的媒体内容对应的压缩编码方式也可能有所区别。
(2)沉浸式媒体的媒体内容的制作过程。
捕获到的音频内容本身就是适合被执行沉浸式媒体的音频编码的内容。捕获到的视频内容进行一系列制作流程后才可成为适合被执行沉浸式媒体的视频编码的内容,该制作流程包括:
①拼接。由于捕获到的视频内容是捕获设备在不同角度下拍摄得到的,拼接就是指对这些各个角度拍摄的视频内容(图像)拼接成一个完整的、能够反映现实空间360度视觉全景的视频,即拼接后的视频是一个在三维空间表示的全景视频(或球面视频)。
②投影。投影就是指将拼接形成的一个三维视频映射到一个二维(2-Dimension,2D)图像上的过程,投影形成的2D图像称为投影图像;投影的方式可包括但不限于:经纬图投影、正六面体投影。
(3)沉浸式媒体的媒体内容的编码过程(包括音频编码和视频编码,对经过6DoF媒体处理后的内容进行6DoF媒体编码)。
投影图像可以被直接进行编码,也可以对投影图像进行区域封装之后再进行编码。现代主流视频编码技术,以国际视频编码标准HEVC(High Efficiency Video Coding),国际视频编码标准VVC(Versatile Video Coding),以及中国国家视频编码标准AVS(Audio Video Coding Standard)为例,采用了混合编码框架,对输入的原始视频信号,进行了如下一系列的操作和处理:块划分结构(block partition structure),预测编码(Predictive Coding),变换编码及量化(Transform&Quantization),熵编码(Entropy Coding)或统计编码,环路滤波(Loop Filtering)等。
(4)沉浸式媒体的封装过程。
将音频码流和视频码流按照沉浸式媒体的文件格式(如ISOBMFF(ISO Base Media File Format,ISO基媒体文件格式))封装在文件容器中形成沉浸式媒体的媒体文件资源,该媒体文件资源可以是媒体文件或媒体片段形成沉浸式媒体的媒体文件;并按照沉浸式媒体的文件格式要求采用媒体呈现描述信息(Media Presentation Description,MPD)记录该沉浸式媒体的媒体文件资源的元数据,此处的元数据是对与沉浸式媒体的呈现有关的信息的总称,该元数据可包括对媒体内容的描述信息、对视窗的描述信息以及对媒体内容呈现相关的信令信息等等。如图1所示,媒体制作设备会存储经过数据处理过程之后形成的媒体呈现描述信息和媒体文件资源。
样本(sample)是媒体文件封装过程中的封装单位,一个媒体文件由很多个样本组成。以视频媒体为例,视频媒体的一个样本通常为一个视频帧。
二、在媒体播放设备端的数据处理过程:
(1)沉浸式媒体的文件解封装及解码的过程(解码主要包括音频解码、视频解码、6DoF媒体解码);
媒体播放设备可以通过媒体制作设备的推荐或按照媒体播放设备端的用户需求自适应动态从媒体制作设备获得沉浸式媒体的媒体文件资源和相应的媒体呈现描述信息,例如媒体播放设备可根据用户的头部/眼睛/身体的跟踪信息确定用户的朝向和位置,再基于确定的朝向和位置动态向媒体制作设备请求获得相应的媒体文件资源。媒体文件资源和媒体呈现描述信息通过传输机制(如DASH、SMT)由媒体制作设备传输给媒体播放设备(媒体运输)。媒体播放设备端的文件解封装的过程与媒体制作设备端的文件封装过程是相逆的,媒体播放设备按照沉浸式媒体的文件格式要求对媒体文件资源进行解封装,得到音频码流和视频码流。媒体播放设备端的解码过程与媒体制作设备端的编码过程是相逆的,媒体播放设备对音频码流进行音频解码,还原出音频内容。另外,媒体播放设备对视频码流的视频解码过程包括如下:①对视频码流进行解码,得到平面的投影图像。②根据媒体呈现描述信息将投影图像进行重建处理以转换为3D图像,此处的重建处理是指将二维的投影图像重新投影至3D空间中的处理。
(2)沉浸式媒体的渲染过程。
媒体播放设备根据媒体呈现描述信息中与渲染、视点、视窗相关的元数据对音频解码得到的音频内容及视频解码得到的3D图像进行渲染,渲染完成即实现了对该3D图像的播放输出,例如通过扬声器/耳机输出音频内容,通过显示设备显示3D图像。视点可以指沉浸式媒体制作过程中,捕捉设备对应采集的沉浸式媒体内容,视窗可以指沉浸式媒体呈现过程中,被用户观看的那一部分沉浸式媒体内容。
沉浸式媒体系统支持数据盒(Box),数据盒是指包括元数据的数据块或对象,即数据盒中包含了相应媒体内容的元数据。在相关技术中,视窗的元数据采用推荐视窗数据盒 (RcvpInfoBox)来记录,沉浸式媒体的推荐视窗数据盒(RcvpInfoBox)的语法参见如下表1:
表1
(表1的语法定义以图片形式给出,此处未能提取;各字段语义见下文。)
上述表1所示语法的语义如下:
viewport_type:指示推荐视窗的类型,取值为0代表推荐视窗来源于媒体制作者剪辑;取值为1代表根据大数据统计得到的推荐视窗。
viewport_description:指示推荐视窗的描述信息,该描述信息是以空字符结尾的八位元(UTF-8)字符串。
viewpoint_idc:该字段取值为0时,表示与当前媒体轨道关联的所有媒体轨道都属于同一个视点;取值为1时,表示与当前推荐轨道的入口(entry)关联的视点的ID根据rvif_viewpoint_id确定;取值为2时,表示媒体轨道中的某些样本对应某个特定的视点。
rvif_viewpoint_id:指示样本入口(sample entry)对应的所有样本所属于的视点(viewpoint)的标识(ID)。
viewpoint_id:指示样本所属于的viewpoint的ID。
在本申请实施例中,轨道是指一系列有时间属性的按照ISO基本媒体文件格式(ISO base media file format,ISOBMFF)的封装方式的样本,比如视频track,视频track是通过将视频编码器编码每一帧后产生的码流按照ISOBMFF的规范封装后得到的。一个轨道可以对应一个入口,或称,样本入口。
由上述语义可知,在RcvpInfoBox中,当viewpoint_idc取值为1时,一个视点可以对应一个推荐视窗,对应不同的viewpoint拥有不同的推荐视窗的场景;当viewpoint_idc取值为2时,一个推荐视窗可以在多个视点之间变化,即一个推荐视窗可以对应多个视点。也就是说,在上述两种场景中,视点都需要跟随推荐视窗切换,不能提供给用户以灵活的切换视点以及视窗的用户体验。
有鉴于此,本申请实施例提供了一种沉浸式媒体的数据处理方法,可以在推荐视窗数据盒中封装视点的切换信息以及推荐视窗的切换信息,这样,媒体播放设备可以根据该视点的切换信息和推荐视窗的切换信息灵活的进行视点和推荐视窗的切换,从而能够提升用户体验。
图3是根据本申请一个示例性实施例提供的一种沉浸式媒体的数据处理方法的示意性流程图。该方法可以由沉浸式媒体系统中的媒体播放设备(或称媒体消费设备)来执行,该媒体消费设备例如可以为服务器,无人机,手持终端等各种具有沉浸式媒体编解码能力的设备,本申请对此不作限定。
如图3所示,该方法可以包括如下至少部分内容:
S301,获取沉浸式媒体文件的推荐视窗数据盒,该推荐视窗数据盒用于定义该沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息;
S302,根据当前视点的切换信息和当前推荐视窗的切换信息,确定目标视点和目标推荐视窗;
S303,切换至该目标视点和该目标推荐视窗,进行该沉浸式媒体文件的呈现。
在一些可能的实施例中,在媒体消费设备选择以推荐浏览模式浏览沉浸式媒体的情况下,该媒体消费设备按照推荐视窗数据盒中的视点和视窗的切换进行视点和视窗的切换。
推荐浏览模式可以是指以推荐视窗(如导演/内容制作者推荐的视窗)浏览沉浸式媒体的各个样本。
在一些可能的实施例中,该推荐视窗数据盒包括:
视点切换字段(记为viewpoint_switch),用于指示在呈现当前视点对应的特定推荐视窗之后,是否需要切换至另一视点,其中,该当前视点对应至少一个推荐视窗。
例如,该视点切换字段取值为1表示在当前视点对应的特定推荐视窗呈现之后,需要切换至另一视点;该视点切换字段取值为0表示在当前视点对应的特定推荐视窗呈现之后,不需要切换至另一视点。
因此,在一种可能的实现方式中,可以根据视点切换字段确定在呈现当前视点对应的特定推荐视窗之后,是否需要切换至另一视点,若是,则执行S302所示的步骤。
在一些可能的实施例中,该推荐视窗数据盒包括:
目标视点字段(记为des_viewpoint_id),用于指示在呈现当前推荐视窗后,需要切换到的目标视点;
目标推荐视窗字段(记为des_viewport_id),用于指示在呈现当前推荐视窗并且切换至该目标视点后,需要切换到的目标推荐视窗。
在一些实施例中,在视点切换字段指示需要切换至另一视点的情况下,可以根据目标视点字段和目标推荐视窗字段确定目标视点和目标推荐视窗,此时目标视点字段体现当前视点的切换信息,目标推荐视窗字段体现当前推荐视窗的切换信息。
在一些实施例中,该目标视点字段的取值可以为该目标视点的标识。
在一些实施例中,该目标推荐视窗字段的取值可以为该目标推荐视窗的标识。
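上述各字段的配合方式可以用如下Python草图示意。其中样本以字典表示、解析函数为示意性实现,均为本文为说明而作的假设,并非真实的解析实现;仅字段名viewpoint_switch、des_viewpoint_id、des_viewport_id来自上文定义:

```python
# 示意:根据当前推荐视窗样本中的切换字段,解析出目标视点和目标推荐视窗。
# 样本的字典布局为示例性假设,仅字段名来自上文定义。

def resolve_switch(current_viewpoint_id, current_viewport_id, sample):
    if sample.get("viewpoint_switch") != 1:
        # 视点切换字段取值为0:呈现该推荐视窗后不需要切换至另一视点
        return current_viewpoint_id, current_viewport_id
    # 取值为1:切换至目标视点字段、目标推荐视窗字段所指示的目标
    return sample["des_viewpoint_id"], sample["des_viewport_id"]

# 例如:某样本指示在呈现后切换至视点2的推荐视窗21
sample = {"viewpoint_switch": 1, "des_viewpoint_id": 2, "des_viewport_id": 21}
print(resolve_switch(1, 12, sample))  # -> (2, 21)
```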
在一些可能的实施例中,该推荐视窗数据盒包括:独立视窗字段(independent_viewport),用于指示每个视点是否对应独立的一组推荐视窗。
作为一个示例,该独立视窗字段的取值为第一值,例如1,表示每个视点对应独立的一组推荐视窗,换言之,每组推荐视窗对应独立的视点,即二者是一一对应的。
其中,该每组推荐视窗对应的独立的视点根据该每个推荐视窗的样本入口中的视点标识ID字段确定。
作为另一个示例,该独立视窗字段的取值为第二值,例如0,表示每个视点均对应相同的一组推荐视窗。此情况下,仅有一个推荐视窗entry,所有视点都对应这个推荐视窗entry。
换言之,该独立视窗字段的取值为1时,表示每组推荐视窗(即一个推荐视窗entry)仅对应某一个视点的内容。该独立视窗字段的取值为0时,表示仅存在一组推荐视窗,该一组推荐视窗对应所有视点的内容。
在相关技术中,推荐视窗数据盒中的视点信息只包括视点的标签信息,当一个视点包括多个媒体内容时,不能实现视点的多样化呈现。
因此,在本申请实施例中,该沉浸式媒体的封装文件还可以包括视点实体组数据盒(ViewpointEntityGroupBox),用于记录组成视点的各个实体的标签。
其中,组成视点的各个实体可以指各种媒体成分,例如,视频(video),音频(audio),背景音频(background video)等,本申请对此不作限定。
这样,在切换至某个视点时,媒体消费设备可以获取该视点的各个实体的标签,基于该视点的各个实体进行渲染,便于视点的多样化呈现。
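这一过程可以用如下Python草图示意。其中字典布局仅为说明用的假设,并非ViewpointEntityGroupBox的真实封装格式:

```python
# 示意:视点实体组记录每个视点的组成实体的标签,
# 切换至某个视点时,播放设备据此取出需要分别渲染的各个实体。

viewpoint_entity_groups = {
    1: ["video", "audio"],
    2: ["video", "audio", "background video"],
}

def entities_to_render(viewpoint_id):
    """返回切换至该视点后需要分别渲染的实体标签列表。"""
    return viewpoint_entity_groups.get(viewpoint_id, [])

print(entities_to_render(2))  # -> ['video', 'audio', 'background video']
```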
在一种实施方式中,推荐视窗数据盒和视点实体组数据盒可以通过扩展ISOBMFF(ISO Base Media File Format,国际标准化组织基媒体文件格式)数据盒的方式实现。
作为一个示例,推荐视窗数据盒可以包括如表2中的视点信息数据盒(AssoRcViewpointInfoBox)和表3中的推荐视窗样本数据盒(RecommendedViewportSample),视点信息数据盒和该推荐视窗样本数据盒分别用于定义上述字段。
表2
(表2的语法定义以图片形式给出,此处未能提取。)
表3
(表3的语法定义以图片形式给出,此处未能提取。)
上述表2和表3中的各个字段的含义参考前文所述。
作为一个示例,视点实体组数据盒可以采用表4中方式定义。
表4
(表4的语法定义以图片形式给出,此处未能提取;各字段语义见下文。)
上述表4所示语法的语义如下:
viewpoint_label:指示视点的标签信息,例如可以为以空字符结尾的字符串。
group_id:指示当前实体组的组ID。
num_entities_in_group:指示当前实体组中实体的个数。
entity_id:指示实体的ID,该ID对应track ID或item ID。
viewpoint_entity_label:指示组成当前viewpoint的各个实体的标签,例如,可以为以空字符结尾的字符串。
应理解,表2、表3和表4中的封装方式仅为示例,在其他实施例中,上述字段也可以采用其他方式封装,本申请对此不作限定。
此外,推荐视窗数据盒还包括推荐视窗描述字段,用于指示推荐视窗的描述信息(如第一推荐视窗类型是由XX作者推荐的),该描述信息是以空字符结尾的八位元(UTF-8)字符串。
在一些可能的实施例中,该S301可以具体包括:
从媒体制作设备获取沉浸式媒体文件的封装文件;
对该沉浸式媒体文件的封装文件进行解封装处理,得到该沉浸式媒体文件的推荐视窗数据盒。
应理解,沉浸式媒体的封装文件的传输和解封装过程可参考图1和图2所示实施例的描述,在此不再赘述。
因此,在本申请实施例中,媒体消费设备从媒体制作设备获取沉浸式媒体的封装文件,对沉浸式媒体的封装文件进行解封装处理,得到沉浸式媒体的推荐视窗数据盒,推荐视窗数据盒用于定义沉浸式媒体的视点的切换信息以及推荐视窗的切换信息,进一步地,该媒体消费设备可以按照推荐视窗数据盒所定义的当前视点的切换信息和当前推荐视窗的切换信息确定目标视点和目标推荐视窗,进而切换至目标视点和目标推荐视窗进行沉浸式媒体文件的呈现。本方案可以根据推荐视窗数据盒中定义的视点的切换信息和推荐视窗的切换信息灵活地确定切换方式,以进行视点和推荐视窗的切换,从而使得沉浸式媒体的呈现形式更加多样化,进而提升用户体验。
图4是根据本申请一个示例性实施例提供的一种沉浸式媒体的数据处理方法的示意性流程图。该方法可以由沉浸式媒体系统中的媒体制作设备来执行,该媒体制作设备例如可以为服务器,无人机,手持终端等各种具有沉浸式媒体编解码能力的设备,本申请对此不作限定。
如图4所示,该方法可以包括如下至少部分内容:
S401,获取沉浸式媒体文件的视点信息以及视窗信息;
S402,根据沉浸式媒体文件的视点信息以及视窗信息封装推荐视窗数据盒,其中,推荐视窗数据盒用于定义沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息。
在一些可能的实施例中,该方法还包括:
将推荐视窗数据盒封装至沉浸式媒体文件的封装文件中,并向媒体播放设备发送沉浸式媒体文件的封装文件。
沉浸式媒体的制作及封装过程可参考图1和图2实施例中的相关描述,为了简洁,在此不再赘述。
在一些可能的实施例中,所述推荐视窗数据盒包括:
视点切换字段,用于指示在呈现当前视点对应的特定推荐视窗之后,是否需要切换至另一视点,其中,所述当前视点对应至少一个推荐视窗。
在一些可能的实施例中,所述推荐视窗数据盒包括:
目标视点字段,用于指示在呈现当前推荐视窗后,需要切换到的目标视点;
目标推荐视窗字段,用于指示在呈现当前推荐视窗并且切换至所述目标视点后,需要切换到的目标推荐视窗。
在一些可能的实施例中,所述推荐视窗数据盒包括:独立视窗字段,用于指示视点是否对应独立的一组推荐视窗。
在一些可能的实施例中,所述独立视窗字段的取值为第一值,表示每个视点对应独立的一组推荐视窗;
所述独立视窗字段的取值为第二值,表示每个视点均对应相同的一组推荐视窗。
在一些可能的实施例中,所述推荐视窗数据盒还包括:视点实体组数据盒,用于记录组成视点的各个实体的标签。
因此,在本申请实施例中,媒体制作设备可以根据沉浸式媒体的视点信息和视窗信息封装推荐视窗数据盒,并将该推荐视窗数据盒封装在该沉浸式媒体的封装文件中,进一步将该沉浸式媒体的封装文件发送给媒体消费设备,从而媒体消费设备对沉浸式媒体的封装文件进行解封装处理,得到沉浸式媒体的推荐视窗数据盒,按照推荐视窗数据盒所定义的切换方式进行视点和推荐视窗的切换,从而使得沉浸式媒体的呈现形式更加多样化,进而提升用户体验。
下面通过一个完整的例子对本申请实施例的方案进行详细说明。
首先,媒体制作设备根据采集的媒体文件A的视点信息和视点对应的推荐视窗封装推荐视窗数据盒,以及根据组成视点的各个实体封装视点实体组数据盒。其中,媒体文件A有两个视点,VPI1和VPI2,VPI1为初始视点。VPI1对应推荐视窗vp11,vp12和vp13,vp11是初始视窗。VPI2对应推荐视窗vp21,vp22,vp21是初始视窗。
推荐视窗数据盒所记录的信息如下:
VPI1(independent_viewport=1;viewpoint_switch=1):vp11,vp12(des_viewpoint_id=2;des_viewport_id=21),vp13;
VPI2(independent_viewport=1;viewpoint_switch=1):vp21,vp22(des_viewpoint_id=1;des_viewport_id=13)。
根据VPI1和VPI2的组成实体生成每个视点对应的视点实体组数据盒。其中,VPI1包含2个实体成分“video”和“audio”,VPI2包含3个实体成分“video”、“audio”和“background video”。
在配置推荐视窗数据盒完成后,媒体制作设备将推荐视窗数据盒封装至沉浸式媒体的封装文件中,并向媒体消费设备发送沉浸式媒体的封装文件。例如,在接收到媒体消费设备的请求的情况下,该媒体制作设备可以将沉浸式媒体的封装文件发送给媒体消费设备。
在媒体消费设备侧,若用户选择进入推荐浏览模式,由于VPI1为初始视点,因此,从VPI1开始消费,此时根据VPI1的组成实体的标签,获取video轨道和audio轨道分别渲染。
在VPI1中,推荐视窗vp11为初始视窗,因此先呈现推荐视窗vp11,并按照推荐视窗轨道的样本信息依次渲染该轨道中的各个推荐视窗。当渲染至推荐视窗vp12时,通过解析vp12对应的样本中的元数据信息(例如(des_viewpoint_id=2;des_viewport_id=21)),可知在消费vp12后,应跳转到视点VPI2,并开始消费推荐视窗vp21。
媒体消费设备切换至视点VPI2和推荐视窗vp21,并根据VPI2的组成实体的标签,获取video轨道、audio轨道、背景video轨道分别渲染。
媒体消费设备在消费推荐视窗vp21后,按照推荐视窗轨道的样本信息依次渲染各个推荐视窗,例如,在呈现推荐视窗vp22后,根据推荐视窗vp22样本中的元数据信息(例如(des_viewpoint_id=1;des_viewport_id=13)),消费完推荐视窗vp22后跳转至推荐视窗vp13继续消费。
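上述消费流程可以用如下可运行的Python草图模拟。其中数据布局为示意性假设,并非推荐视窗轨道样本的真实格式;切换关系即上文示例中vp12与vp22样本所携带的元数据信息:

```python
# 示意:按推荐视窗轨道顺序依次消费,遇到携带切换元数据的样本时跳转。
# 媒体文件A:VPI1对应vp11、vp12、vp13,VPI2对应vp21、vp22。

viewports = {
    1: ["vp11", "vp12", "vp13"],
    2: ["vp21", "vp22"],
}
switches = {
    "vp12": (2, "vp21"),  # des_viewpoint_id=2; des_viewport_id=21
    "vp22": (1, "vp13"),  # des_viewpoint_id=1; des_viewport_id=13
}

def play(start_viewpoint, start_viewport, max_steps=10):
    """从初始视点/初始视窗开始消费,返回(视点, 推荐视窗)的呈现顺序。"""
    consumed = []
    vp_id, cur = start_viewpoint, start_viewport
    for _ in range(max_steps):
        consumed.append((vp_id, cur))
        if cur in switches:              # 该样本携带切换元数据,跳转
            vp_id, cur = switches[cur]
            continue
        track = viewports[vp_id]
        i = track.index(cur)
        if i + 1 == len(track):          # 该视点的推荐视窗已消费完毕
            break
        cur = track[i + 1]
    return consumed

print(play(1, "vp11"))
# -> [(1, 'vp11'), (1, 'vp12'), (2, 'vp21'), (2, 'vp22'), (1, 'vp13')]
```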
因此,本申请实施例可以实现视点和视窗的灵活且独立的切换,从而能够提升用户体验。
上文结合图3至图4,详细阐述了本申请实施例的方法实施例,为了便于更好地实施本申请实施例的上述方案,以下,结合图5至图8,介绍根据本申请的装置实施例,应理解,装置实施例和方法实施例相互对应,相似描述可参考方法实施例,为了简洁,这里不再赘述。
请参见图5,图5示出了本申请一个示例性实施例提供的一种沉浸式媒体的数据处理装置的结构示意图;该沉浸式媒体的数据处理装置可以是运行于媒体消费设备中的一个计算机程序(包括程序代码),例如该沉浸式媒体的数据处理装置可以是媒体消费设备中的一个应用软件。由图5所示,该数据处理装置包括获取单元501、确定单元502和呈现单元503。图5所示的数据处理装置可以用于执行上述图3所描述的方法实施例中的部分或全部。
在一些实施例中,获取单元501,用于获取沉浸式媒体文件的推荐视窗数据盒,该推荐视窗数据盒用于定义该沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息;
确定单元502,用于根据当前视点的切换信息和当前推荐视窗的切换信息,确定目标视点和目标推荐视窗;
呈现单元503,用于切换至该目标视点和该目标推荐视窗,进行该沉浸式媒体文件的呈现。
在一些实施方式中,该推荐视窗数据盒包括视点切换字段:
所述确定单元502,还用于根据所述视点切换字段确定在呈现当前视点对应的特定推荐视窗之后,是否需要切换至另一视点,所述当前视点对应至少一个推荐视窗;
若是,所述确定单元502执行所述根据当前视点的切换信息和当前推荐视窗的切换信息,确定目标视点和目标推荐视窗的步骤。
在一些实施方式中,该推荐视窗数据盒包括目标视点字段,用于指示在呈现当前推荐视窗后,切换到的目标视点;目标推荐视窗字段,用于指示在呈现当前推荐视窗并且切换至该目标视点后,切换到的目标推荐视窗;
所述确定单元502,用于根据所述目标视点字段确定所述目标视点以及根据所述目标推荐视窗字段确定所述目标推荐视窗,所述目标视点字段体现所述当前视点的切换信息,所述目标推荐视窗字段体现所述当前推荐视窗的切换信息。
在一些实施方式中,该推荐视窗数据盒包括:独立视窗字段,用于指示视点是否对应独立的一组推荐视窗。
在一些实施方式中,该独立视窗字段的取值为第一值,表示每个视点对应独立的一组推荐视窗;
该独立视窗字段的取值为第二值,表示每个视点均对应相同的一组推荐视窗。
在一些实施方式中,该推荐视窗数据盒还包括:视点实体组数据盒,用于记录组成视点的各个实体的标签。
在一些实施方式中,该呈现单元503还用于:
根据该目标视点的各个实体的标签,渲染该目标视点的各个实体。
在一些实施方式中,该获取单元501具体用于:
从媒体制作设备获取沉浸式媒体文件的封装文件;
对该沉浸式媒体文件的封装文件进行解封装处理,得到该沉浸式媒体文件的推荐视窗数据盒。
根据本申请的一个实施例,图5所示的沉浸式媒体的数据处理装置中的各个单元可以分别或全部合并为一个或若干个另外的单元来构成,或者其中的某个(些)单元还可以再拆分为功能上更小的多个单元来构成,这可以实现同样的操作,而不影响本申请的实施例的技术效果的实现。上述单元是基于逻辑功能划分的,在实际应用中,一个单元的功能也可以由多个单元来实现,或者多个单元的功能由一个单元实现。在本申请的其它实施例中,该沉浸式媒体的数据处理装置也可以包括其它单元,在实际应用中,这些功能也可以由其它单元协助实现,并且可以由多个单元协作实现。根据本申请的另一个实施例,可以通过在包括中央处理单元(CPU)、随机存取存储介质(RAM)、只读存储介质(ROM)等处理元件和存储元件的例如计算机的通用计算设备上运行能够执行如图3所示的相应方法所 涉及的各步骤的计算机程序(包括程序代码),来构造如图3所示的沉浸式媒体的数据处理装置,以及来实现本申请实施例的沉浸式媒体的数据处理方法。所述计算机程序可以记载于例如计算机可读记录介质上,并通过计算机可读记录介质装载于上述计算设备中,并在其中运行。
基于同一发明构思,本申请实施例中提供沉浸式媒体的数据处理装置解决问题的原理与有益效果与本申请方法实施例中沉浸式媒体的数据处理方法解决问题的原理和有益效果相似,可以参见方法的实施的原理和有益效果,为简洁描述,在这里不再赘述。
请参见图6,图6示出了本申请一个示例性实施例提供的另一种沉浸式媒体的数据处理装置的结构示意图;该沉浸式媒体的数据处理装置可以是运行于媒体制作设备中的一个计算机程序(包括程序代码),例如该沉浸式媒体的数据处理装置可以是媒体制作设备中的一个应用软件。由图6所示,该沉浸式媒体的数据处理装置包括获取单元601和封装单元602。各个单元的详细描述如下:
获取单元601,用于获取沉浸式媒体文件呈现时的视点信息以及视窗信息;
封装单元602,用于根据该沉浸式媒体文件呈现时的视点信息以及视窗信息封装推荐视窗数据盒,其中,该推荐视窗数据盒用于定义该沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息。
在一些实施方式中,该推荐视窗数据盒包括:
视点切换字段,用于指示在呈现当前视点对应的特定推荐视窗之后,是否需要切换至另一视点,其中,该当前视点对应至少一个推荐视窗。
在一些实施方式中,该推荐视窗数据盒包括:
目标视点字段,用于指示在呈现当前推荐视窗后,切换到的目标视点;
目标推荐视窗字段,用于指示在呈现当前推荐视窗并且切换至该目标视点后,切换到的目标推荐视窗。
在一些实施方式中,该推荐视窗数据盒包括:独立视窗字段,用于指示视点是否对应独立的一组推荐视窗。
在一些实施方式中,该独立视窗字段的取值为第一值,表示每个视点对应独立的一组推荐视窗;
该独立视窗字段的取值为第二值,表示每个视点均对应相同的一组推荐视窗。
在一些实施方式中,该推荐视窗数据盒还包括:视点实体组数据盒,用于记录组成视点的各个实体的标签。
在一些实施方式中,该封装单元602还用于:将该推荐视窗数据盒封装至该沉浸式媒体文件的封装文件中,
该数据处理装置还包括:传输单元,用于向媒体播放设备发送该沉浸式媒体文件的封装文件。
根据本申请的一个实施例,图6所示的沉浸式媒体的数据处理装置中的各个单元可以分别或全部合并为一个或若干个另外的单元来构成,或者其中的某个(些)单元还可以再拆分为功能上更小的多个单元来构成,这可以实现同样的操作,而不影响本申请的实施例的技术效果的实现。上述单元是基于逻辑功能划分的,在实际应用中,一个单元的功能也可以由多个单元来实现,或者多个单元的功能由一个单元实现。在本申请的其它实施例中,该沉浸式媒体的数据处理装置也可以包括其它单元,在实际应用中,这些功能也可以由其它单元协助实现,并且可以由多个单元协作实现。根据本申请的另一个实施例,可以通过在包括中央处理单元(CPU)、随机存取存储介质(RAM)、只读存储介质(ROM)等处理元件和存储元件的例如计算机的通用计算设备上运行能够执行如图4所示的相应方法所涉及的各步骤的计算机程序(包括程序代码),来构造如图4所示的沉浸式媒体的数据处理装置,以及来实现本申请实施例的沉浸式媒体的数据处理方法。所述计算机程序可以记载于例如计算机可读记录介质上,并通过计算机可读记录介质装载于上述计算设备中,并在其中运行。
基于同一发明构思,本申请实施例中提供沉浸式媒体的数据处理装置解决问题的原理与有益效果与本申请方法实施例中沉浸式媒体的数据处理方法解决问题的原理和有益效果相似,可以参见方法的实施的原理和有益效果,为简洁描述,在这里不再赘述。
需要说明的是,本申请实施例还提供了一种计算机设备,该计算机设备可以是媒体制作设备,也可以是媒体播放设备,接下来对媒体制作设备和媒体播放设备分别进行介绍。
图7示出了本申请一个示例性实施例提供的一种媒体制作设备的结构示意图;该媒体制作设备可以是指沉浸式媒体的提供者所使用的计算机设备,该计算机设备可以是终端(如,PC、智能移动设备(如智能手机)等)或服务器。如图7所示,该媒体制作设备包括捕获设备801、处理器802、存储器803和发射器804。其中:
捕获设备801用于采集现实世界的声音-视觉场景获得沉浸式媒体的原始数据(包括在时间和空间上保持同步的音频内容和视频内容)。该捕获设备801可以包括但不限于:音频设备、摄像设备及传感设备。其中,音频设备可以包括音频传感器、麦克风等。摄像设备可以包括普通摄像头、立体摄像头、光场摄像头等。传感设备可以包括激光设备、雷达设备等。
处理器802(或称CPU(Central Processing Unit,中央处理器))是媒体制作设备的处理核心,该处理器802适于实现一条或多条程序指令,具体适于加载并执行一条或多条程序指令从而实现图3所示的沉浸式媒体的数据处理方法的流程。
存储器803是媒体制作设备中的记忆设备,用于存放程序和媒体资源。可以理解的是,此处的存储器803既可以包括媒体制作设备中的内置存储介质,当然也可以包括媒体制作设备所支持的扩展存储介质。需要说明的是,存储器803可以是高速RAM存储器,也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器;可选的还可以是至少一个位于远离前述处理器的存储器。存储器803提供存储空间,该存储空间用于存储媒体制作设备的操作系统。并且,在该存储空间中还用于存储计算机程序,该计算机程序包括程序指令,且该程序指令适于被处理器调用并执行,以用来执行沉浸式媒体的数据处理方法的各步骤。另外,存储器803还可用于存储经处理器处理后形成的沉浸式媒体文件,该沉浸式媒体文件包括媒体文件资源和媒体呈现描述信息。
发射器804用于实现媒体制作设备与其他设备的传输交互,具体用于实现媒体制作设备与媒体播放设备之间关于进行沉浸式媒体的传输。即媒体制作设备通过发射器804来向媒体播放设备传输沉浸式媒体的相关媒体资源。
再请参见图7,处理器802可包括转换器821、编码器822和封装器823;
其中:转换器821用于对捕获到的视频内容进行一系列转换处理,使视频内容成为适合被执行沉浸式媒体的视频编码的内容。转换处理可包括:拼接和投影,可选地,转换处理还包括区域封装。转换器821可以将捕获到的3D视频内容转换为2D图像,并提供给编码器进行视频编码。
编码器822用于对捕获到的音频内容进行音频编码形成沉浸式媒体的音频码流。还用于对转换器821转换得到的2D图像进行视频编码,得到视频码流。
封装器823用于将音频码流和视频码流按照沉浸式媒体的文件格式(如ISOBMFF)封装在文件容器中形成沉浸式媒体的媒体文件资源,该媒体文件资源可以是媒体文件或媒体片段形成沉浸式媒体的媒体文件;并按照沉浸式媒体的文件格式要求采用媒体呈现描述信息记录该沉浸式媒体的媒体文件资源的元数据。封装器处理得到的沉浸式媒体的封装文件会保存在存储器中,并按需提供给媒体播放设备进行沉浸式媒体的呈现。
在一个示例性实施例中,处理器802(具体是处理器包含的各器件)通过调用存储器803中的一条或多条指令来执行图4所示的沉浸式媒体的数据处理方法的各步骤。具体地,存储器803存储有一条或多条第一指令,该一条或多条第一指令适于由处理器802加载并执行如下步骤:
获取沉浸式媒体文件的视点信息以及视窗信息;
根据该沉浸式媒体文件的视点信息以及视窗信息封装推荐视窗数据盒,其中,该推荐视窗数据盒用于定义该沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息。
作为一种可选的实施方式,处理器802通过运行存储器803中的可执行程序代码,还执行如下操作:将该推荐视窗数据盒封装至该沉浸式媒体文件的封装文件中,并向媒体播放设备发送该沉浸式媒体文件的封装文件。
基于同一发明构思,本申请实施例中提供沉浸式媒体的数据处理装置解决问题的原理与有益效果与本申请方法实施例中沉浸式媒体的数据处理方法解决问题的原理和有益效果相似,可以参见方法的实施的原理和有益效果,为简洁描述,在这里不再赘述。
图8示出了本申请一个示例性实施例提供的一种媒体播放设备的结构示意图;该媒体播放设备可以是指沉浸式媒体的使用者所使用的计算机设备,该计算机设备可以是终端(如PC、智能移动设备(如智能手机)、VR设备(如VR头盔、VR眼镜等))。如图8所示,该媒体播放设备包括接收器901、处理器902、存储器903、显示/播放装置904。其中:
接收器901用于实现媒体播放设备与其他设备的传输交互,具体用于实现媒体制作设备与媒体播放设备之间关于进行沉浸式媒体的传输。即媒体播放设备通过接收器901来接收媒体制作设备传输的沉浸式媒体的相关媒体资源。
处理器902(或称CPU(Central Processing Unit,中央处理器))是媒体播放设备的处理核心,该处理器902适于实现一条或多条程序指令,具体适于加载并执行一条或多条程序指令从而实现图3所示的沉浸式媒体的数据处理方法的流程。
存储器903是媒体播放设备中的记忆设备,用于存放程序和媒体资源。可以理解的是,此处的存储器903既可以包括媒体播放设备中的内置存储介质,当然也可以包括媒体播放设备所支持的扩展存储介质。需要说明的是,存储器903可以是高速RAM存储器,也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器;可选的还可以是至少一个位于远离前述处理器的存储器。存储器903提供存储空间,该存储空间用于存储媒体播放设备的操作系统。并且,在该存储空间中还用于存储计算机程序,该计算机程序包括程序指令,且该程序指令适于被处理器调用并执行,以用来执行沉浸式媒体的数据处理方法的各步骤。另外,存储器903还可用于存储经处理器处理后形成的沉浸式媒体的三维图像、三维图像对应的音频内容及该三维图像和音频内容渲染所需的信息等。
显示/播放装置904用于输出渲染得到的声音和三维图像。
再请参见图8,处理器902可包括解析器921、解码器922、转换器923和渲染器924;其中:
解析器921用于对来自媒体制作设备的渲染媒体的封装文件进行文件解封装,具体是按照沉浸式媒体的文件格式要求对媒体文件资源进行解封装,得到音频码流和视频码流;并将该音频码流和视频码流提供给解码器922。
解码器922对音频码流进行音频解码,得到音频内容并提供给渲染器进行音频渲染。另外,解码器922对视频码流进行解码得到2D图像。根据媒体呈现描述信息提供的元数据,如果该元数据指示沉浸式媒体执行过区域封装过程,该2D图像是指封装图像;如果该元数据指示沉浸式媒体未执行过区域封装过程,则该平面图像是指投影图像。
转换器923用于将2D图像转换为3D图像。如果沉浸式媒体执行过区域封装过程,转换器923还会先将封装图像进行区域解封装得到投影图像。再对投影图像进行重建处理 得到3D图像。如果渲染媒体未执行过区域封装过程,转换器923会直接将投影图像重建得到3D图像。
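转换器923的上述判断逻辑可以用如下Python草图示意。其中各函数均为示意性的占位实现,并非真实的解码或重建API:

```python
# 示意:执行过区域封装的2D图像需先区域解封装得到投影图像,再重建为3D图像;
# 未执行过区域封装的投影图像则直接重建。占位函数仅用于说明流程。

def region_unpack(image):
    # 占位:区域解封装,封装图像 -> 投影图像
    return image.replace("packed", "projected")

def reproject(image):
    # 占位:重建处理,投影图像 -> 3D图像
    return image.replace("projected", "3d")

def convert_to_3d(image, region_wise_packed):
    if region_wise_packed:          # 元数据指示执行过区域封装
        image = region_unpack(image)
    return reproject(image)

print(convert_to_3d("packed-frame", True))      # -> 3d-frame
print(convert_to_3d("projected-frame", False))  # -> 3d-frame
```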
渲染器924用于对沉浸式媒体的音频内容和3D图像进行渲染。具体根据媒体呈现描述信息中与渲染、视窗相关的元数据对音频内容及3D图像进行渲染,渲染完成交由显示/播放装置进行输出。
在一个示例性实施例中,处理器902(具体是处理器包含的各器件)通过调用存储器903中的一条或多条指令来执行图3所示的沉浸式媒体的数据处理方法的各步骤。具体地,存储器903存储有一条或多条第二指令,该一条或多条第二指令适于由处理器902加载并执行如下步骤:
获取沉浸式媒体文件的推荐视窗数据盒,该推荐视窗数据盒用于定义该沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息;
根据当前视点的切换信息和当前推荐视窗的切换信息,确定目标视点和目标推荐视窗;
切换至该目标视点和该目标推荐视窗,进行该沉浸式媒体文件的呈现。
作为一种可选的实施方式,处理器902通过运行存储器903中的可执行程序代码,还执行如下操作:从媒体制作设备获取沉浸式媒体文件的封装文件;
对该沉浸式媒体文件的封装文件进行解封装处理,得到该沉浸式媒体文件的推荐视窗数据盒。
基于同一发明构思,本申请实施例中提供沉浸式媒体的数据处理装置解决问题的原理与有益效果与本申请方法实施例中沉浸式媒体的数据处理方法解决问题的原理和有益效果相似,可以参见方法的实施的原理和有益效果,为简洁描述,在这里不再赘述。
本申请实施例还提供一种计算机可读存储介质,计算机可读存储介质中存储有计算机程序,该计算机程序适于由处理器加载并执行上述方法实施例。
本申请实施例还提供一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述方法实施例。
需要说明的是,对于前述的各个方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某一些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
本申请实施例方法中的步骤可以根据实际需要进行顺序调整、合并和删减。
本申请实施例装置中的模块可以根据实际需要进行合并、划分和删减。
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序可以存储于一计算机可读存储介质中,可读存储介质可以包括:闪存盘、只读存储器(Read-Only Memory,ROM)、随机存取器(Random Access Memory,RAM)、磁盘或光盘等。
应理解,本申请实施例的处理器可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。
以上所揭露的仅为本申请较佳实施例而已,当然不能以此来限定本申请之权利范围,因此依本申请权利要求所作的等同变化,仍属本申请所涵盖的范围。

Claims (17)

  1. 一种沉浸式媒体文件的数据处理方法,其中,所述方法由媒体播放设备执行,所述方法包括:
    获取沉浸式媒体文件的推荐视窗数据盒,所述推荐视窗数据盒用于定义所述沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息;
    根据当前视点的切换信息和当前推荐视窗的切换信息,确定目标视点和目标推荐视窗;
    切换至所述目标视点和所述目标推荐视窗,进行所述沉浸式媒体文件的呈现。
  2. 根据权利要求1所述的方法,其中,所述推荐视窗数据盒包括视点切换字段,所述方法还包括:
    根据所述视点切换字段确定在呈现当前视点对应的特定推荐视窗之后,是否需要切换至另一视点,所述当前视点对应至少一个推荐视窗;
    若是,执行所述根据当前视点的切换信息和当前推荐视窗的切换信息,确定目标视点和目标推荐视窗的步骤。
  3. 根据权利要求1或2所述的方法,其中,所述推荐视窗数据盒包括目标视点字段,用于指示在呈现当前推荐视窗后,切换到的目标视点;目标推荐视窗字段,用于指示在呈现当前推荐视窗并且切换至所述目标视点后,切换到的目标推荐视窗;所述根据当前视点的切换信息和当前推荐视窗的切换信息,确定目标视点和目标推荐视窗,包括:
    根据所述目标视点字段确定所述目标视点以及根据所述目标推荐视窗字段确定所述目标推荐视窗,所述目标视点字段体现所述当前视点的切换信息,所述目标推荐视窗字段体现所述当前推荐视窗的切换信息。
  4. 根据权利要求1或2所述的方法,其中,所述推荐视窗数据盒包括:独立视窗字段,用于指示视点是否对应独立的一组推荐视窗。
  5. 根据权利要求4所述的方法,其中,所述独立视窗字段的取值为第一值,表示每个视点对应独立的一组推荐视窗;
    所述独立视窗字段的取值为第二值,表示每个视点均对应相同的一组推荐视窗。
  6. 根据权利要求1或2所述的方法,其中,所述推荐视窗数据盒还包括:视点实体组数据盒,用于记录组成视点的各个实体的标签。
  7. 根据权利要求6所述的方法,其中,所述方法还包括:
    根据所述目标视点的各个实体的标签,渲染所述目标视点的各个实体。
  8. 如权利要求1所述的方法,其中,所述获取沉浸式媒体文件的推荐视窗数据盒,包括:
    从媒体制作设备获取所述沉浸式媒体文件的封装文件;
    对所述沉浸式媒体文件的封装文件进行解封装处理,得到所述沉浸式媒体文件的推荐视窗数据盒。
  9. 一种沉浸式媒体文件的数据处理方法,其中,所述方法由媒体制作设备执行,所述方法包括:
    获取沉浸式媒体文件的视点信息以及视窗信息;
    根据所述沉浸式媒体文件的视点信息以及视窗信息封装推荐视窗数据盒,其中,所述推荐视窗数据盒用于定义所述沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息。
  10. 根据权利要求9所述的方法,其中,所述推荐视窗数据盒包括:
    视点切换字段,用于指示在呈现当前视点对应的特定推荐视窗之后,是否需要切换至另一视点,其中,所述当前视点对应至少一个推荐视窗。
  11. 根据权利要求9或10所述的方法,其中,所述推荐视窗数据盒包括:目标视点字段,用于指示在呈现当前推荐视窗后,切换到的目标视点;
    目标推荐视窗字段,用于指示在呈现当前推荐视窗并且切换至所述目标视点后,切换到的目标推荐视窗。
  12. 根据权利要求9或10所述的方法,其中,所述推荐视窗数据盒还包括:视点实体组数据盒,用于记录组成视点的各个实体的标签。
  13. 一种沉浸式媒体文件的数据处理装置,其中,所述装置部署在媒体播放设备上,所述装置包括:
    获取单元,用于获取沉浸式媒体文件的推荐视窗数据盒,所述推荐视窗数据盒用于定义所述沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息;
    确定单元,用于根据当前视点的切换信息和当前推荐视窗的切换信息,确定目标视点和目标推荐视窗;
    呈现单元,用于切换至所述目标视点和所述目标推荐视窗,进行所述沉浸式媒体文件的呈现。
  14. 一种沉浸式媒体文件的数据处理装置,其中,所述装置部署在媒体制作设备上,所述装置包括:
    获取单元,用于获取沉浸式媒体文件呈现时的视点信息以及视窗信息;
    封装单元,用于根据所述沉浸式媒体文件呈现时的视点信息以及视窗信息封装推荐视窗数据盒,其中,所述推荐视窗数据盒用于定义所述沉浸式媒体文件的视点的切换信息以及推荐视窗的切换信息。
  15. 一种计算机设备,其中,所述计算机设备包括:
    处理器,适于执行计算机程序;
    计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,所述计算机程序被所述处理器执行时,实现如权利要求1-8中任一项所述的沉浸式媒体文件的数据处理方法,或实现如权利要求9至12中任一项所述的沉浸式媒体文件的数据处理方法。
  16. 一种计算机可读存储介质,其中,用于存储计算机程序,所述计算机程序使得计算机执行如权利要求1-8中任一项所述的沉浸式媒体文件的数据处理方法,或实现如权利要求9至12中任一项的沉浸式媒体文件的数据处理方法。
  17. 一种计算机程序产品,当所述计算机程序产品被执行时,用于实现如权利要求1-8中任一项所述的沉浸式媒体文件的数据处理方法,或实现如权利要求9至12中任一项的沉浸式媒体文件的数据处理方法。
PCT/CN2021/131108 2020-12-02 2021-11-17 沉浸式媒体的数据处理方法、装置和计算机可读存储介质 WO2022116822A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21899856.5A EP4258222A4 (en) 2020-12-02 2021-11-17 DATA PROCESSING METHOD AND APPARATUS FOR IMMERSIVE MEDIA AND COMPUTER-READABLE STORAGE MEDIUM
US17/961,003 US20230025664A1 (en) 2020-12-02 2022-10-06 Data processing method and apparatus for immersive media, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011399700.4A CN114581631A (zh) 2020-12-02 2020-12-02 沉浸式媒体的数据处理方法、装置和计算机可读存储介质
CN202011399700.4 2020-12-02

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/961,003 Continuation US20230025664A1 (en) 2020-12-02 2022-10-06 Data processing method and apparatus for immersive media, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022116822A1 true WO2022116822A1 (zh) 2022-06-09

Family

ID=81767898

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/131108 WO2022116822A1 (zh) 2020-12-02 2021-11-17 沉浸式媒体的数据处理方法、装置和计算机可读存储介质

Country Status (4)

Country Link
US (1) US20230025664A1 (zh)
EP (1) EP4258222A4 (zh)
CN (1) CN114581631A (zh)
WO (1) WO2022116822A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115941995A (zh) * 2021-08-23 2023-04-07 腾讯科技(深圳)有限公司 媒体文件封装与解封装方法、装置、设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106937156A (zh) * 2015-12-31 2017-07-07 幸福在线(北京)网络技术有限公司 一种实现多资源同步播放的方法及装置和媒体播放器
CN108111899A (zh) * 2017-12-29 2018-06-01 中兴通讯股份有限公司 视频传输方法、客户端、服务器
CN108632674A (zh) * 2017-03-23 2018-10-09 华为技术有限公司 一种全景视频的播放方法和客户端
EP3422701A1 (en) * 2017-03-03 2019-01-02 Clicked Inc. Method for reproducing virtual reality image and program using same
US20190220953A1 (en) * 2016-09-29 2019-07-18 Huawei Technologies Co., Ltd. Panoramic video playback method and apparatus
CN110876051A (zh) * 2018-08-29 2020-03-10 中兴通讯股份有限公司 视频数据的处理,传输方法及装置,视频数据的处理系统

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117499686A (zh) * 2018-04-05 2024-02-02 Vid拓展公司 用于全向视频的视点元数据
WO2020068935A1 (en) * 2018-09-27 2020-04-02 Futurewei Technologies, Inc. Virtual reality viewpoint viewport center point correspondence signaling

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106937156A (zh) * 2015-12-31 2017-07-07 幸福在线(北京)网络技术有限公司 一种实现多资源同步播放的方法及装置和媒体播放器
US20190220953A1 (en) * 2016-09-29 2019-07-18 Huawei Technologies Co., Ltd. Panoramic video playback method and apparatus
EP3422701A1 (en) * 2017-03-03 2019-01-02 Clicked Inc. Method for reproducing virtual reality image and program using same
CN108632674A (zh) * 2017-03-23 2018-10-09 华为技术有限公司 一种全景视频的播放方法和客户端
CN108111899A (zh) * 2017-12-29 2018-06-01 中兴通讯股份有限公司 视频传输方法、客户端、服务器
CN110876051A (zh) * 2018-08-29 2020-03-10 中兴通讯股份有限公司 视频数据的处理,传输方法及装置,视频数据的处理系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4258222A4

Also Published As

Publication number Publication date
US20230025664A1 (en) 2023-01-26
CN114581631A (zh) 2022-06-03
EP4258222A4 (en) 2024-05-22
EP4258222A1 (en) 2023-10-11

Similar Documents

Publication Publication Date Title
KR102246002B1 (ko) 가상 현실 미디어 콘텐트의 스트리밍을 개선하는 방법, 디바이스, 및 컴퓨터 프로그램
CN114503599A (zh) 使用gltf2场景描述中的扩展来支持视频和音频数据
JP2020503792A (ja) 情報処理方法および装置
CN112219403B (zh) 沉浸式媒体的渲染视角度量
US20220369000A1 (en) Split rendering of extended reality data over 5g networks
CN113891117B (zh) 沉浸媒体的数据处理方法、装置、设备及可读存储介质
WO2023207119A1 (zh) 沉浸媒体的处理方法、装置、设备及存储介质
WO2024037137A1 (zh) 一种沉浸媒体的数据处理方法、装置、设备、介质和产品
WO2023061131A1 (zh) 媒体文件封装方法、装置、设备及存储介质
CN113574903B (zh) 针对媒体内容中的后期绑定的方法和装置
WO2022116822A1 (zh) 沉浸式媒体的数据处理方法、装置和计算机可读存储介质
WO2021244132A1 (zh) 沉浸媒体的数据处理方法、装置、设备及计算机存储介质
WO2024041239A1 (zh) 一种沉浸媒体的数据处理方法、装置、设备、存储介质及程序产品
CN114116617A (zh) 点云媒体的数据处理方法、装置、设备及可读存储介质
WO2023226504A1 (zh) 一种媒体数据处理方法、装置、设备以及可读存储介质
KR20240007142A (ko) 5g 네트워크들을 통한 확장 현실 데이터의 분할 렌더링
TWI796989B (zh) 沉浸媒體的數據處理方法、裝置、相關設備及儲存媒介
EP3776484A1 (en) Associating file format objects and dynamic adaptive streaming over hypertext transfer protocol (dash) objects
WO2022037423A1 (zh) 点云媒体的数据处理方法、装置、设备及介质
US12010402B2 (en) Data processing for immersive media
US20230360678A1 (en) Data processing method and storage medium
WO2022111348A1 (zh) 点云媒体的数据处理方法、装置、设备及存储介质
US20230062933A1 (en) Data processing method, apparatus, and device for non-sequential point cloud media
WO2023169004A1 (zh) 点云媒体的数据处理方法、装置、设备及介质
WO2024114519A1 (zh) 点云封装与解封装方法、装置、介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21899856

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021899856

Country of ref document: EP

Effective date: 20230703