WO2021259054A1 - Processing method and device for immersive media data, storage medium, and electronic device - Google Patents


Info

Publication number
WO2021259054A1
Authority
WO
WIPO (PCT)
Prior art keywords: viewing space, three-dimensional viewing space, media
Application number: PCT/CN2021/098689
Other languages: English (en), French (fr)
Inventors: 李秋婷, 黄成, 白雅贤
Original Assignee: 中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Priority to KR1020237002677A (published as KR20230028489A)
Priority to CA3178737A (published as CA3178737A1)
Priority to EP21830152.1A (published as EP4171026A4)
Priority to US17/922,086 (published as US20230169719A1)
Publication of WO2021259054A1


Classifications

    • H04N13/117 Transformation of image signals corresponding to virtual viewpoints, the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
    • H04N13/106 Processing image signals
    • H04N13/122 Improving the 3D impression of stereoscopic images by modifying image signal contents, e.g. by filtering or adding monoscopic depth cues
    • H04N13/178 Metadata, e.g. disparity information
    • H04N13/366 Image reproducers using viewer tracking
    • H04N21/21805 Source of audio or video content enabling multiple viewpoints, e.g. using a plurality of cameras
    • H04N21/816 Monomedia components involving special video data, e.g. 3D video
    • H04N21/85406 Content authoring involving a specific file format, e.g. MP4 format
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012 Head tracking input arrangements
    • G06T15/005 General purpose rendering architectures
    • G06T15/20 Perspective computation
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T2219/2016 Rotation, translation, scaling

Definitions

  • the embodiments of the present disclosure relate to the field of communications, and in particular to a method and device for processing immersive media data, a storage medium, and an electronic device.
  • Immersive media allows users to experience a highly realistic virtual space environment visually and aurally through audio and video technologies, creating a sense of immersion.
  • At present, the immersive experience mainly supports flat panoramic video: wearing a head-mounted display device, the user can watch 360-degree video by freely rotating the head, that is, an immersive experience with three degrees of freedom (3DoF).
  • In enhanced three-degrees-of-freedom (3DoF+) and six-degrees-of-freedom (6DoF) experiences, users can translate the body and rotate the head as needed during viewing to see more detail, for example parts of the visual content that were previously occluded.
  • To this end, the display device needs to select appropriate video data from the media data for reconstruction and rendering according to the user's viewing posture (body position, head orientation), so that the visual content within the current window is presented to the user and the user's sense of immersion is preserved.
  • However, the movable range of the user in space is directly related to the location and orientation of the video data collection devices.
  • If the user moves beyond this range, the display device cannot reconstruct the visual content of the current window from the collected video data, and the visual content viewed by the user will gradually fade out, become distorted, and lose the sense of immersion.
  • Therefore, the user's three-dimensional viewing space needs to be limited. Within the three-dimensional viewing space, the user can experience the immersive media normally through the rendering of video data; beyond the three-dimensional viewing space, the video data may be reconstructed and rendered so that viewing can continue and immersion is restored, or the content may fade out without further processing.
  • In related technologies, the immersive media system limits the user's three-dimensional viewing space based on a single viewpoint, so the movable range is extremely limited and the geometric structure of the three-dimensional viewing space is simple, such as a cube, sphere, or cylinder. Since the size and shape of the three-dimensional viewing space are necessarily related to the location and number of the collection devices, how to accurately describe the user's three-dimensional viewing space affects the rendering and presentation of the visual content viewed by the user.
  • In view of this, the present disclosure proposes a processing method and device for immersive media data, a storage medium, and an electronic device, suitable for limiting the viewing position range when viewing immersive media based on multi-viewpoint or point cloud data, which facilitates fast, efficient, and high-quality reconstruction and rendering of visual content during viewing to achieve the best immersive experience.
  • The embodiments of the present disclosure provide a method and device for processing immersive media data, a storage medium, and an electronic device, so as to at least solve the problems in the related technologies that affect the rendering and presentation of the visual content viewed by users.
  • According to an embodiment of the present disclosure, a method for processing immersive media data includes: acquiring immersive media data corresponding to a user's current viewing posture, wherein a three-dimensional viewing space is used to limit the user's viewing content; and rendering the immersive media data corresponding to the current viewing posture, wherein the acquired immersive media data is rendered when the current viewing posture is in the three-dimensional viewing space; or, when the current viewing posture has moved or is moving outside the three-dimensional viewing space, the immersive media data is rendered according to the processing information of the three-dimensional viewing space.
  • Acquiring the immersive media data corresponding to the current viewing posture of the user includes: determining the relationship between the current viewing posture and the three-dimensional viewing space, wherein the three-dimensional viewing space is the range within which the user is allowed to move in the immersive media scene; and determining the immersive media data according to the relationship.
  • Determining the relationship between the current viewing posture and the three-dimensional viewing space includes: determining the position of the current viewing posture in the three-dimensional viewing space or outside the three-dimensional viewing space, and determining the immersive media data according to the position.
  • Before the immersive media data corresponding to the current viewing posture is rendered, the method includes: determining, according to a data box type or a timed metadata track sample entry type, the media track or data box in the media file that describes the three-dimensional viewing space.
  • Determining the media track or data box describing the three-dimensional viewing space in the media file according to the data box type or the timed metadata track sample entry type includes one of the following: identifying a first three-dimensional viewing space data box in the file header of the media file according to a first data box type; or identifying a second three-dimensional viewing space grouping data box in the file header of the media file according to a second grouping type; or identifying a third three-dimensional viewing space data box in one or more media tracks in the media file according to a third data box type; or identifying a fourth three-dimensional viewing space sample group description data box in one media track in the media file according to a fourth sample grouping type; or identifying a fifth three-dimensional viewing space data box in one media track in the media file according to a fifth track group type, wherein one or more media tracks with the same track group identifier belong to the same three-dimensional viewing space; or identifying a three-dimensional viewing space timed metadata track in the media file according to a sixth sample entry type, the timed metadata track indicating a dynamically changing three-dimensional viewing space of the immersive media data.
  • The information of the three-dimensional viewing space includes at least one of the following: the coordinate position of the three-dimensional viewing space in the spatial scene of the immersive media, the orientation of the three-dimensional viewing space in the spatial scene of the immersive media, the geometric structure of the three-dimensional viewing space, and the viewing direction in the three-dimensional viewing space, wherein the geometric structure is constructed by combining one or more basic structures, and each basic structure corresponds to zero or one collection device or collected view of the immersive media data.
  • Before the immersive media data corresponding to the current viewing posture is rendered, the method further includes: determining, according to the data box type or the timed metadata track sample entry type, the media track or data box in the media file that describes the processing information of the three-dimensional viewing space.
  • Determining the media track or data box describing the processing information of the three-dimensional viewing space in the media file according to the data box type or the timed metadata track sample entry type includes: identifying a three-dimensional viewing space processing data box by a seventh data box type, wherein the three-dimensional viewing space processing data box is contained in the data box describing the three-dimensional viewing space, is in the same upper-level data box as the data box describing the three-dimensional viewing space, or is contained in the three-dimensional viewing space timed metadata track; or identifying a three-dimensional viewing space processing timed metadata track in the media file according to an eighth sample entry type, the three-dimensional viewing space processing timed metadata track indicating a dynamically changing processing mode of the three-dimensional viewing space.
  • The processing information of the three-dimensional viewing space includes at least one of the following: the number of processing options for the three-dimensional viewing space, the device type of the three-dimensional viewing space processing, the application type of the three-dimensional viewing space processing, the processing method of the three-dimensional viewing space, and the identification of the three-dimensional viewing space.
  • Acquiring the immersive media data corresponding to the current viewing posture of the user includes: directly acquiring the immersive media data when it is determined that the current viewing posture of the user is in the three-dimensional viewing space; or, when it is determined that the current viewing posture of the user is moving or has moved outside the three-dimensional viewing space, acquiring the immersive media data corresponding to the current viewing posture according to the processing information of the three-dimensional viewing space.
  • Acquiring the immersive media data includes: determining a media presentation description file, wherein the media presentation description file includes a three-dimensional viewing space descriptor and/or a three-dimensional viewing space processing descriptor indicating the viewing of the immersive media; and requesting the immersive media data corresponding to the current viewing posture according to the three-dimensional viewing space descriptor and/or the three-dimensional viewing space processing descriptor corresponding to the current viewing posture of the user.
  • The three-dimensional viewing space descriptor of the immersive media includes at least one of the following: the basic geometric structure, the rotation direction of the basic geometry, the identification or index of the view or collection device corresponding to the basic geometry, the rotation of the basic geometry, the viewing direction in the basic geometry, the combination of basic geometries, the combination of complex geometries, and the identification of the three-dimensional viewing space.
  • The three-dimensional viewing space processing descriptor of the immersive media includes at least one of the following: the device type of the three-dimensional viewing space processing, the application type of the three-dimensional viewing space processing, the processing method of the three-dimensional viewing space, and the identification of the three-dimensional viewing space.
  • According to another embodiment of the present disclosure, an immersive media data processing device includes: an acquiring unit configured to acquire immersive media data corresponding to the user's current viewing posture, wherein a three-dimensional viewing space is used to restrict the user's viewing content; and a rendering unit configured to render the immersive media data corresponding to the current viewing posture, wherein the acquired immersive media data is rendered when the current viewing posture is in the three-dimensional viewing space; or, when the current viewing posture has moved or is moving outside the three-dimensional viewing space, the immersive media data is rendered according to the processing information of the three-dimensional viewing space.
  • The acquiring unit includes: a first determining module configured to determine the relationship between the current viewing posture and the three-dimensional viewing space, wherein the three-dimensional viewing space is the range within which the user is allowed to move in the immersive media scene; and a second determining module configured to determine the immersive media data according to the relationship.
  • The first determining module includes: a first determining sub-module configured to determine the position of the current viewing posture in the three-dimensional viewing space or outside the three-dimensional viewing space; and a second determining sub-module configured to determine the immersive media data according to the position.
  • The device further includes: a first determining unit configured to determine, according to the data box type or the timed metadata sample entry type, the media track or data box in the media file that describes the three-dimensional viewing space before the immersive media data corresponding to the current viewing posture is rendered.
  • The first determining unit is further configured to perform one of the following: identifying the first three-dimensional viewing space data box in the file header of the media file according to the first data box type; or identifying the second three-dimensional viewing space grouping data box in the file header of the media file according to the second grouping type; or identifying the third three-dimensional viewing space data box in one or more media tracks in the media file according to the third data box type; or identifying the fourth three-dimensional viewing space sample group description data box in one media track in the media file according to the fourth sample grouping type; or identifying the fifth three-dimensional viewing space data box in one media track in the media file according to the fifth track group type, wherein one or more media tracks with the same track group identifier belong to the same three-dimensional viewing space; or identifying the three-dimensional viewing space timed metadata track in the media file according to the sixth sample entry type, the timed metadata track indicating the dynamically changing three-dimensional viewing space of the immersive media data.
  • The information of the three-dimensional viewing space includes at least one of the following: the coordinate position of the three-dimensional viewing space in the spatial scene of the immersive media, the orientation of the three-dimensional viewing space in the spatial scene of the immersive media, the geometric structure of the three-dimensional viewing space, and the viewing direction in the three-dimensional viewing space, wherein the geometric structure is constructed by combining one or more basic structures, and each basic structure corresponds to zero or one collection device or collected view of the immersive media data.
  • The device further includes: a second determining unit configured to determine, according to the data box type or the timed metadata track sample entry type, the media track or data box in the media file that describes the processing information of the three-dimensional viewing space before the immersive media data corresponding to the current viewing posture is rendered.
  • The second determining unit includes: an identification module configured to identify the three-dimensional viewing space processing data box by the seventh data box type, wherein the three-dimensional viewing space processing data box is contained in the data box describing the three-dimensional viewing space, is in the same upper-level data box as the data box describing the three-dimensional viewing space, or is contained in the three-dimensional viewing space timed metadata track; or to identify the three-dimensional viewing space processing timed metadata track in the media file according to the eighth sample entry type, the three-dimensional viewing space processing timed metadata track indicating the dynamically changing processing mode of the three-dimensional viewing space.
  • The processing information of the three-dimensional viewing space includes at least one of the following: the number of processing options for the three-dimensional viewing space, the device type of the three-dimensional viewing space processing, the application type of the three-dimensional viewing space processing, the processing method of the three-dimensional viewing space, and the identification of the three-dimensional viewing space.
  • The acquiring unit includes: a first acquiring module configured to directly acquire the immersive media data when it is determined that the current viewing posture of the user is in the three-dimensional viewing space; or a second acquiring module configured to acquire the immersive media data corresponding to the current viewing posture according to the processing information of the three-dimensional viewing space when it is determined that the current viewing posture of the user is moving or has moved outside the three-dimensional viewing space.
  • The first acquiring module or the second acquiring module is further configured to: determine a media presentation description file, wherein the media presentation description file includes a three-dimensional viewing space descriptor and/or a three-dimensional viewing space processing descriptor indicating the viewing of the immersive media; and request the immersive media data corresponding to the current viewing posture according to the three-dimensional viewing space descriptor and/or the three-dimensional viewing space processing descriptor corresponding to the current viewing posture of the user.
  • The three-dimensional viewing space descriptor of the immersive media includes at least one of the following: the basic geometric structure, the rotation direction of the basic geometry, the identification or index of the view or collection device corresponding to the basic geometry, the rotation of the basic geometry, the viewing direction in the basic geometry, the combination of basic geometries, the combination of complex geometries, and the identification of the three-dimensional viewing space.
  • The three-dimensional viewing space processing descriptor of the immersive media includes at least one of the following: the device type of the three-dimensional viewing space processing, the application type of the three-dimensional viewing space processing, the processing method of the three-dimensional viewing space, and the identification of the three-dimensional viewing space.
  • According to yet another embodiment of the present disclosure, a computer-readable storage medium is provided, in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps in any of the above method embodiments.
  • According to yet another embodiment of the present disclosure, an electronic device is provided, including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
  • In the embodiments of the present disclosure, a three-dimensional viewing space is used to limit the user's viewing content, and the immersive media data corresponding to the current viewing posture is rendered: the acquired immersive media data is rendered when the current viewing posture is in the three-dimensional viewing space; or, when the current viewing posture has moved or is moving outside the three-dimensional viewing space, the immersive media data is rendered according to the processing information of the three-dimensional viewing space.
  • Because the immersive media data can be rendered according to the processing information of the three-dimensional viewing space when the user moves outside it, high-quality visual content can be provided to the user quickly and efficiently even outside the three-dimensional viewing space. This solves the problems affecting the rendering and presentation of the visual content viewed by the user, and achieves fast, efficient, and high-quality reconstruction and rendering of visual content during viewing to meet the best immersive experience.
  • Fig. 1 is a schematic structural diagram of an execution device of an immersive media data processing method according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a method for processing immersive media data according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a three-dimensional viewing space of a processing method for immersive media data according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of the system structure of a method for processing immersive media data according to an embodiment of the present disclosure
  • Fig. 5 is a flowchart of a method for processing immersive media data according to an embodiment of the present disclosure
  • Fig. 6 is a schematic diagram of an ISO file of a method for processing immersive media data according to an embodiment of the present disclosure;
  • FIG. 7 is a schematic diagram of an ISO file of a method for processing immersive media data according to an embodiment of the present disclosure;
  • FIG. 8 is a schematic diagram of an ISO file of a method for processing immersive media data according to an embodiment of the present disclosure;
  • FIG. 9 is a schematic diagram of an ISO file of a method for processing immersive media data according to an embodiment of the present disclosure;
  • FIG. 10 is a schematic diagram of an ISO file of a method for processing immersive media data according to an embodiment of the present disclosure;
  • FIG. 11 is a schematic flowchart of a method for processing immersive media data according to an embodiment of the present disclosure
  • Fig. 12 is a schematic structural diagram of an immersive media data processing device according to an embodiment of the present disclosure.
  • FIG. 1 is a hardware structural block diagram of a mobile terminal of an immersive media data processing method according to an embodiment of the present disclosure.
  • The mobile terminal may include one or more processors 102 (only one processor 102 is shown in FIG. 1; the processor 102 may include, but is not limited to, a microprocessor unit (MPU) or a programmable logic device) and a memory 104 configured to store data.
  • Optionally, the above-mentioned mobile terminal may further include a transmission device 106 configured for communication functions and an input/output device 108.
  • A person of ordinary skill in the art can understand that the structure shown in FIG. 1 is only for illustration and does not limit the structure of the above-mentioned mobile terminal; for example, the mobile terminal may include more or fewer components than those shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
  • The memory 104 may be configured to store computer programs, for example, software programs and modules of application software, such as the computer programs corresponding to the immersive media data processing method in the embodiments of the present disclosure. The processor 102 runs the computer programs stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above-mentioned method.
  • the memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 104 may further include a memory remotely provided with respect to the processor 102, and these remote memories may be connected to the mobile terminal through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the transmission device 106 is configured to receive or transmit data via a network.
  • the above-mentioned specific examples of the network may include a wireless network provided by a communication provider of a mobile terminal.
  • the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station to communicate with the Internet.
  • In one example, the transmission device 106 may be a radio frequency (RF) module, which is configured to communicate with the Internet in a wireless manner.
  • FIG. 2 is a flowchart of a method for processing immersive media data according to an embodiment of the present disclosure. As shown in FIG. 2, the process includes the following steps:
  • Step S202: obtain the immersive media data corresponding to the user's current viewing posture, wherein a three-dimensional viewing space is used to limit the user's viewing content;
  • Step S204: render the immersive media data corresponding to the current viewing posture, wherein the acquired immersive media data is rendered when the current viewing posture is in the three-dimensional viewing space; or, when the current viewing posture has moved or is moving outside the three-dimensional viewing space, the immersive media data is rendered according to the processing information of the three-dimensional viewing space.
  • the three-dimensional viewing space in the present disclosure is a space that allows the user to move.
  • a virtual camera is set in the three-dimensional viewing space.
  • the virtual camera can capture the content in the three-dimensional viewing space and feed it back to users in the real space.
  • the virtual camera moves or rotates in the three-dimensional viewing space.
  • The three-dimensional viewing space is the range within which the system recommends that the user, that is, the virtual camera, move.
  • The user or the virtual camera may also move outside the three-dimensional viewing space; movement is not necessarily confined to the three-dimensional viewing space.
  • Using the three-dimensional viewing space to restrict the rendering of the user's viewing content means that after the virtual camera captures the content in the three-dimensional viewing space, the content is passed to the user for viewing; at this time, the content captured by the virtual camera, that is, the content viewed by the user, needs to be rendered.
  • Obtaining the immersive media data corresponding to the current viewing posture of the user includes: determining the relationship between the current viewing posture and the three-dimensional viewing space, where the three-dimensional viewing space is a virtual space range within which the user is allowed to move, and determining the immersive media data according to the relationship.
  • Determining the relationship between the current viewing posture and the three-dimensional viewing space includes: determining the position of the current viewing posture in the three-dimensional viewing space or outside the three-dimensional viewing space, and determining the immersive media data according to the position.
  • The three-dimensional viewing space of the immersive media data is described below.
  • Before the immersive media data corresponding to the current viewing posture is rendered, the method includes: determining, according to the data box type or the timed metadata sample entry type, the media track or data box in the media file that describes the three-dimensional viewing space.
  • Determining the media track or data box describing the three-dimensional viewing space in the media file according to the data box type or the timed metadata track sample entry type includes one of the following: identifying the first three-dimensional viewing space data box in the file header of the media file according to the first data box type; or identifying the second three-dimensional viewing space grouping data box in the file header of the media file according to the second grouping type; or identifying the third three-dimensional viewing space data box in one or more media tracks in the media file according to the third data box type; or identifying the fourth three-dimensional viewing space sample group description data box in one media track in the media file according to the fourth sample grouping type; or identifying the fifth three-dimensional viewing space data box in one media track in the media file according to the fifth track group type, wherein one or more media tracks with the same track group identifier belong to the same three-dimensional viewing space; or identifying the three-dimensional viewing space timed metadata track in the media file according to the sixth sample entry type, the timed metadata track indicating the dynamically changing three-dimensional viewing space of the immersive media data.
  • The information of the three-dimensional viewing space includes at least one of the following: the coordinate position of the three-dimensional viewing space in the spatial scene of the immersive media, the orientation of the three-dimensional viewing space in the spatial scene of the immersive media, the geometric structure of the three-dimensional viewing space, and the viewing direction in the three-dimensional viewing space, wherein the structure of the three-dimensional viewing space is constructed by combining one or more complex geometric structures, each complex geometric structure is constructed by combining one or more basic structures, and each basic structure corresponds to zero or one collection device or collected view of the immersive media data.
  • Before the immersive media data corresponding to the current viewing posture is rendered, the method includes: determining, according to the data box type or the timed metadata track sample entry type, the media track or data box in the media file that describes the processing information of the three-dimensional viewing space.
  • Determining the media track or data box describing the processing information of the three-dimensional viewing space in the media file according to the data box type or the timed metadata track sample entry type includes: identifying the three-dimensional viewing space processing data box by the seventh data box type, wherein the three-dimensional viewing space processing data box is contained in the data box describing the three-dimensional viewing space, is in the same upper-level data box as the data box describing the three-dimensional viewing space, or is contained in the three-dimensional viewing space timed metadata track; or identifying the three-dimensional viewing space processing timed metadata track in the media file according to the eighth sample entry type, the three-dimensional viewing space processing timed metadata track indicating the dynamically changing processing mode of the three-dimensional viewing space.
  • The processing information of the three-dimensional viewing space includes at least one of the following: the number of processing options for the three-dimensional viewing space, the device type of the three-dimensional viewing space processing, the application type of the three-dimensional viewing space processing, the processing method of the three-dimensional viewing space, and the identification of the three-dimensional viewing space.
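  • By way of illustration only, the processing information above could be carried in a data box written in the ISOBMFF syntax used elsewhere in this document. The disclosure does not fix a four-character code or field layout for the seventh data box type, so the box name, the 'vsph' code, all field names, and the bit widths below are assumptions:

    aligned(8) class ViewingSpaceHandlingBox extends FullBox('vsph', 0, 0) {
        unsigned int(8) num_handling_options;           // number of processing options
        for (i = 0; i < num_handling_options; i++) {
            unsigned int(8)  handling_device_type;      // device type the processing applies to
            unsigned int(8)  handling_application_type; // application type of the processing
            unsigned int(8)  handling_method;           // processing method, e.g. fade out or continue rendering
            unsigned int(32) viewing_space_id;          // identification of the processed viewing space
        }
    }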
  • Obtaining the immersive media data corresponding to the current viewing posture of the user includes: directly obtaining the immersive media data when it is determined that the current viewing posture of the user is in the three-dimensional viewing space; or, when it is determined that the current viewing posture of the user is moving or has moved outside the three-dimensional viewing space, obtaining the immersive media data corresponding to the current viewing posture according to the processing information of the three-dimensional viewing space.
  • Obtaining the immersive media data includes: determining a media presentation description file, wherein the media presentation description file includes a three-dimensional viewing space descriptor and/or a three-dimensional viewing space processing descriptor indicating the viewing of the immersive media; and requesting the immersive media data corresponding to the current viewing posture according to the three-dimensional viewing space descriptor and/or the three-dimensional viewing space processing descriptor corresponding to the current viewing posture of the user.
  • The three-dimensional viewing space descriptor of the immersive media includes at least one of the following: the basic geometric structure, the rotation direction of the basic geometry, the identification or index of the view or collection device corresponding to the basic geometry, the rotation of the basic geometry, the viewing direction in the basic geometry, the combination of basic geometries, the combination of complex geometries, and the identification of the three-dimensional viewing space.
  • The three-dimensional viewing space processing descriptor of the immersive media includes at least one of the following: the device type of the three-dimensional viewing space processing, the application type of the three-dimensional viewing space processing, the processing method of the three-dimensional viewing space, and the identification of the three-dimensional viewing space.
  • the present disclosure can be applied to the process of immersive experience.
  • the user watches the scene in the three-dimensional viewing space by wearing the display device.
  • the user can move in the real space, which is mapped to the movement of the user's virtual character in the three-dimensional viewing space, and the immersive media data captured by the corresponding virtual character is played on the display device.
  • When the user moves outside the three-dimensional viewing space, the immersive media data is rendered according to the processing information of the three-dimensional viewing space.
  • the three-dimensional viewing space refers to the range in which the user's virtual character can move in the immersive media scene.
  • The viewing posture refers to the position and viewing direction (virtual camera orientation) of the virtual character corresponding to the user in the three-dimensional viewing space. If the virtual character moves beyond the range of the three-dimensional viewing space, related technologies may produce problems such as a black screen.
  • In the embodiments of the present disclosure, the immersive media data can be rendered according to the processing information of the three-dimensional viewing space, realizing fast, efficient, and high-quality reconstruction and rendering of visual content during viewing and solving the black-screen problem.
  • Figure 3 is a schematic diagram of an optional three-dimensional viewing space structure location.
  • the immersive video capture device (camera) is placed at different locations to capture video data in the scene. Based on the location of the capture device, one or more three-dimensional viewing spaces can be formed.
  • the three-dimensional viewing space may be a basic geometric body represented by a single cube, a sphere, a cylinder, etc., or may be a complex geometric body formed by a combination of multiple basic geometric bodies.
  • Within the three-dimensional viewing space, the display device can obtain the media data corresponding to the user's window, reconstruct and render the image in the window, and quickly switch the content presented in the user's window; when the user moves out of the range of the three-dimensional viewing space, because there is no media data collected by a corresponding device, the display device cannot display the corresponding visual content in the user's window, or can only display low-quality visual content.
  • Three-dimensional viewing spaces 1-3 in FIG. 3 illustrate different situations.
  • the viewing direction of the user may also be restricted by the location and orientation of the collection device.
  • Figure 4 is a schematic structural diagram of an optional immersive media data processing system.
  • The system includes a transmission server 402 and a user terminal 404, wherein:
  • the transmission server 402 includes at least a storage module 402-1 and a transmission module 402-2;
  • the storage module 402-1 is configured to store the media files produced in the production server 10.
  • the transmission module 402-2 is configured to receive a request message from the user terminal or send a stored media file.
  • the above-mentioned receiving or sending can be realized through a wireless network provided by a communication provider, a locally established wireless local area network, or a wired method;
  • the user terminal 404 includes at least a transmission module 404-1, a decoding and decapsulation module 404-2, a media processing module 404-3, and a display module 404-4.
  • the transmission module 404-1 is configured to receive the media file sent by the transmission server 402, or to send a request message to the server 402, such as a request to download a media file;
  • the decoding and decapsulation module 404-2 is configured to decode and decapsulate the media file received by the transmission module;
  • the media processing module 404-3 performs processing such as reconstruction and rendering on the media data output by the decoding and decapsulation module 404-2;
  • the display module 404-4 is configured to present the visual content of the user's current window to the user.
  • This embodiment provides a processing/playing procedure of a media system based on a three-dimensional viewing space and a representation of a file format. As shown in FIG. 5, the procedure includes the following steps:
  • Step S501: the user's viewing posture (viewing position, viewing direction) changes, and the user terminal obtains the corresponding media file based on the three-dimensional viewing space range to which the user's current posture information belongs, the media file including immersive media data;
  • Step S502: the user terminal decapsulates and decodes the obtained media file, reconstructs the visual content in the current window according to the user's current viewing position, viewing direction, and other information, and renders the content for the user to view.
  • an implementation manner is to store omnidirectional video data in a file based on the ISO (International Organization for Standardization) basic media file format.
  • The ISO base media file format, including the restricted scheme information box, track reference box, track group box, etc., may follow MPEG-4 Part 12 (ISO Base Media File Format) formulated by the ISO/IEC JTC1/SC29/WG11 Moving Picture Experts Group (MPEG).
  • The projection and packing steps and the basic format of omnidirectional video may follow MPEG-I Part 2 OMAF (omnidirectional media format) formulated by the ISO/IEC JTC1/SC29/WG11 Moving Picture Experts Group (MPEG).
  • The ISO base file format, represented by the MP4 file, is composed of several boxes; each box has a type and a length and can be regarded as a data object.
  • A box can contain another box; such a box is called a container box.
  • An MP4 file first has one and only one "ftyp" type box, which serves as a mark of the file format and contains basic information about the file. There is then one and only one "moov" type box (Movie Box), which is a container box whose child boxes contain the metadata of the media.
  • The media data of the MP4 file is contained in "mdat" type boxes (Media Data Box), which hold the media data itself.
  • A meta box (Metadata Box) may optionally be used; it is also a container box, used to describe general or additional untimed metadata.
  • a media can be composed of one or more tracks, each track is a media sequence that changes over time, and a track contains a continuous collection of samples.
  • The timed metadata track is a mechanism in the ISO Base Media File Format (ISOBMFF) for establishing timed metadata associated with specific samples. Timed metadata is loosely coupled with the media data and is usually "descriptive".
  • a data box or sample entry is defined in the media file to describe the three-dimensional viewing space of the scene.
  • the three-dimensional viewing space can be described at the file level or the media track level.
  • the specific description method of the three-dimensional viewing space may adopt one of the following methods:
  • Manner 1 The three-dimensional viewing space is described at the file level, and the three-dimensional viewing space data box (ViewingSpaceBox) is defined to be included in the file-level Metabox (shown in FIG. 6). The following describes the optional implementation of the three-dimensional viewing space data box.
  • ViewingSpaceBox (three-dimensional viewing space data box):

    aligned(8) class ViewingSpaceBox extends FullBox('vwsp', 0, flags) {
        signed int(8) num_viewing_space;  // optional; absent if there is only one viewing
                                          // space in a media track or media track group
        // loop reconstructed from the field semantics below
        for (i = 0; i < num_viewing_space; i++) {
            unsigned int(32) viewing_space_id;
            ViewingSpaceStruct();
        }
    }

  • num_viewing_space: indicates the number of three-dimensional viewing spaces corresponding to the media file;
  • viewing_space_id: indicates the identification of the three-dimensional viewing space;
  • The structure of the three-dimensional viewing space is represented by ViewingSpaceStruct(). The spatial structure is obtained by combining one or more complex geometric bodies, usually by CSG (Constructive Solid Geometry) combination; a complex geometric body is obtained by adding and combining basic geometric bodies (cuboids, cylinders, etc.) in order, usually by CSG or sequential interpolation.
  • The basic geometric bodies are common shapes such as cuboids, spheres, and hemispheres, described below in conjunction with an optional implementation of the basic geometric structures.
  • CuboidStruct(), SpheroidStruct(), and HalfspaceStruct() represent cuboids, spheroids, and hemispheres (half-spaces), respectively.
  • center_x, center_y, center_z: respectively indicate the position of the center point of the geometric structure in the coordinate system;
  • size_x, size_y, size_z: respectively indicate the side lengths of the cuboid in the x, y, and z directions;
  • radius_x, radius_y, radius_z: respectively indicate the radii of the spheroid in the x, y, and z dimensions;
  • normal_x, normal_y, normal_z: respectively indicate the normal direction of the plane defining the hemisphere;
  • camera_inferred_flag: indicates whether the position of the simple geometric body corresponds to a collection device; 0 indicates that it is unrelated to the position of the collection device and the center point position must be defined explicitly, and 1 indicates that it is related to the collection device, whose position information can be used.
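  • As a minimal sketch in the same ISOBMFF syntax, the three basic structures might look as follows; the bit widths and the presence of the center fields are assumptions based on the semantics above (the center may be inferred when camera_inferred_flag is 1):

    aligned(8) class CuboidStruct() {
        signed int(32) center_x; signed int(32) center_y; signed int(32) center_z;       // center point
        unsigned int(32) size_x; unsigned int(32) size_y; unsigned int(32) size_z;       // side lengths along x, y, z
    }
    aligned(8) class SpheroidStruct() {
        signed int(32) center_x; signed int(32) center_y; signed int(32) center_z;       // center point
        unsigned int(32) radius_x; unsigned int(32) radius_y; unsigned int(32) radius_z; // radii in x, y, z
    }
    aligned(8) class HalfspaceStruct() {
        signed int(32) normal_x; signed int(32) normal_y; signed int(32) normal_z;       // normal of the bounding plane
        signed int(32) distance;  // assumed: distance from the origin to the bounding plane
    }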
  • A complex geometric body is described by SpaceShapeStruct(), which is constructed from one or more basic geometric bodies.
  • The following describes an optional implementation of the complex geometric structure.
  • num_primitive_shape: indicates the number of basic geometric bodies composing the three-dimensional viewing space;
  • primitive_shape_operation: indicates the operation mode of the basic geometric shapes composing the three-dimensional viewing space; 0 indicates that the basic geometric bodies are added in CSG mode to form a complex geometric body; 1 indicates that the basic geometric bodies are interpolated along the path formed by their centers to form a complex geometric body;
  • camera_inferred_flag: when 1, indicates that the position and direction of the basic geometry correspond to the collection device, where the collection device corresponds to the viewpoint index; when 0, indicates that the position and direction of the basic geometry do not correspond to a collection device;
  • viewing_space_shape_type: indicates the basic geometric shape of the three-dimensional viewing space; the specific shape types are described in the following table;
  • distance_scale: indicates the scale of the border distance sizes of the basic geometry;
  • view_id: indicates the identification of the viewpoint corresponding to the camera corresponding to the basic geometry, through which the media track containing the media data of that viewpoint can be located.
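  • A sketch of the complex geometric structure in the same syntax, assuming the field widths and the shape type codes (0 = cuboid, 1 = spheroid, 2 = half-space), which the shape type table would actually define:

    aligned(8) class SpaceShapeStruct() {
        unsigned int(8) num_primitive_shape;       // number of basic geometric bodies
        unsigned int(1) primitive_shape_operation; // 0: CSG addition; 1: interpolation along the centers
        unsigned int(1) camera_inferred_flag;      // 1: position/direction taken from the collection device
        bit(6) reserved;
        for (i = 0; i < num_primitive_shape; i++) {
            unsigned int(8) viewing_space_shape_type;  // basic shape type (see the shape type table)
            if (camera_inferred_flag)
                unsigned int(16) view_id;              // viewpoint bound to this basic geometry
            unsigned int(32) distance_scale;           // scale of the border distances
            if (viewing_space_shape_type == 0) CuboidStruct();        // codes are assumptions
            else if (viewing_space_shape_type == 1) SpheroidStruct();
            else if (viewing_space_shape_type == 2) HalfspaceStruct();
        }
    }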
  • The three-dimensional viewing space structure is described by ViewingSpaceStruct(), which is constructed by combining one or more complex geometric bodies.
  • The following describes an optional implementation of the three-dimensional viewing space structure.
  • num_shape_space: indicates the number of complex geometric bodies needed to form the three-dimensional viewing space structure;
  • operation_type: indicates the CSG operation mode by which the geometric bodies are combined into the three-dimensional viewing space, as shown in the following table.
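  • Under the same assumptions about field widths, the top-level structure might be sketched as:

    aligned(8) class ViewingSpaceStruct() {
        unsigned int(8) num_shape_space;      // number of complex geometric bodies
        for (i = 0; i < num_shape_space; i++) {
            unsigned int(8) operation_type;   // CSG operation combining this body into the space
            SpaceShapeStruct();               // the complex geometric body itself
        }
    }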
  • Manner 2: The three-dimensional viewing space is described at the file level. There may be multiple three-dimensional viewing spaces in one media file, and the media data of one or more media tracks (such as multi-viewpoint data or point cloud data) corresponds to one three-dimensional viewing space; that is, the user can watch the visual content rendered from the corresponding media data in that three-dimensional viewing space.
  • The media tracks are grouped according to the three-dimensional viewing space, and all the media tracks in the same group belong to the same three-dimensional viewing space.
  • The entity grouping in ISOBMFF is used to describe the three-dimensional viewing space corresponding to the media track group (shown in FIG. 7).
  • A ViewingSpaceGroupBox is defined as an EntityToGroupBox with grouping_type 'vspg', included in the GroupsListBox under the file-level MetaBox; the following describes an optional implementation of the ViewingSpaceGroupBox.
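  • A sketch of this entity group, assuming the viewing space payload follows the inherited entity_id list (the exact payload layout is not fixed by the text):

    aligned(8) class ViewingSpaceGroupBox extends EntityToGroupBox('vspg', 0, 0) {
        // group_id and the entity_id list are inherited from EntityToGroupBox and
        // enumerate the media tracks that share this three-dimensional viewing space
        unsigned int(32) viewing_space_id;   // assumed: identification of the shared viewing space
        ViewingSpaceStruct();                // the three-dimensional viewing space itself
    }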
  • Manner 3: The three-dimensional viewing space is described at the track level; one media track may correspond to one or more three-dimensional viewing spaces, and the three-dimensional viewing space is described in the sample entry (SampleEntry) of the media track (shown in FIG. 8). The following description is given in conjunction with optional embodiments.
  • ViewingSpaceBox (3D viewing space data box)
  • The media track grouping may be a grouping of the media tracks in which the atlas data is located, and the information of the three-dimensional viewing space corresponding to the media track is described in the sample entry of the media track where the atlas data is located.
  • For media based on volumetric video data, the sample entry is VolumetricVisualSampleEntry, and the three-dimensional viewing space is described as an SEI message in the configuration data box (VPCCConfigurationBox).
  • Manner 4: The three-dimensional viewing space is described at the track level.
  • One media track may correspond to one or more three-dimensional viewing spaces, or the structure of the three-dimensional viewing space may change at low frequency during playback.
  • Sample grouping is used to describe the three-dimensional viewing space to which each sample may correspond.
  • A sample group corresponds to one three-dimensional viewing space, where grouping_type is 'vssg'. The following description is given in conjunction with optional embodiments.
  • Alternatively, the sample grouping in the corresponding media track is extended; for example, based on the sample grouping of the volumetric video data type 'vaps', the three-dimensional viewing space is described as an SEI message in the sample group entry (V3CAtlasParamSampleGroupDescriptionEntry).
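  • A minimal sketch of a dedicated 'vssg' sample group description entry, assuming the entry name and payload layout (only the grouping type comes from the text):

    aligned(8) class ViewingSpaceSampleGroupDescriptionEntry()
        extends SampleGroupDescriptionEntry('vssg') {
        unsigned int(32) viewing_space_id;  // assumed: identification of the viewing space
        ViewingSpaceStruct();               // viewing space shared by the samples mapped to this group
    }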
  • Manner 5: The three-dimensional viewing space is described at the track level. There may be multiple three-dimensional viewing spaces in one media file, and the media data of one or more media tracks (such as multi-viewpoint data or point cloud data) corresponds to one three-dimensional viewing space; that is, the user can watch the visual content rendered from the corresponding media data in that three-dimensional viewing space.
  • The media tracks are grouped by defining a track group with track_group_type 'vptg' (as shown in FIG. 9), and all media tracks in the same group (containing the same track_group_id) belong to the same three-dimensional viewing space.
  • The media track containing the base view in the multi-view media data is the base track, and the three-dimensional viewing space corresponding to the media track group is described in the sample entry of the base track, as follows.
  • ViewingSpaceBox (three-dimensional viewing space data box)
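  • The grouping itself can be sketched as a TrackGroupTypeBox; the box name below and the placement of the ViewingSpaceBox are assumptions, while the 'vptg' type and the shared track_group_id semantics come from the text:

    aligned(8) class ViewingSpaceTrackGroupBox extends TrackGroupTypeBox('vptg') {
        // track_group_id is inherited from TrackGroupTypeBox; tracks carrying the
        // same track_group_id belong to the same three-dimensional viewing space
        ViewingSpaceBox();   // the shared three-dimensional viewing space, e.g. in the base track's sample entry
    }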
  • Manner 6: A media track is defined in the media file to describe or store various parameter information of the media; NAL unit information is carried through the samples, and the SEI message describing the three-dimensional viewing space at the code stream layer is stored in the corresponding samples.
  • This embodiment provides a representation method based on variable three-dimensional viewing space information, as shown in FIG. 10.
  • The following describes an optional implementation of the three-dimensional viewing space timed metadata.
  • Each sample corresponds to one or more three-dimensional viewing spaces, and the three-dimensional viewing space information of each sample is provided by VRSpaceSample();
  • static_vr_space_flag: when 1, indicates that the three-dimensional viewing space is defined in the sample entry and remains unchanged for all subsequent samples, which refer to the sample entry; when 0, indicates that the three-dimensional viewing space is defined in the samples;
  • num_viewing_space: indicates the number of three-dimensional viewing spaces;
  • viewing_space_id: indicates the identification of the three-dimensional viewing space.
  • Optionally, if one timed metadata track can describe the dynamic changes of only one three-dimensional viewing space, then when a media file corresponds to multiple three-dimensional viewing spaces, there may be multiple timed metadata tracks to describe the changes of the multiple three-dimensional viewing spaces.
  • The dynamic change of the three-dimensional viewing space is related to the dynamic change of the capture devices, and the sample entry (CameraInfoSampleEntry) and samples in the timed metadata track describing the dynamic information of the cameras may be extended.
  • In addition, a dynamic viewpoint timed metadata track may reference a track or track group through a track reference box (Track Reference Box) with reference type 'cdsc'.
  • Because of the orientation and position at which the media data capture devices are placed in space, viewing problems may exist; the viewing direction or viewing range in the three-dimensional viewing space is described by extending ViewingSpaceStruct.
  • Because of the position and orientation at which the capture devices are placed in space, the basic geometry of the three-dimensional viewing space corresponding to a capture device rotates in the coordinate system; for example, the edges of the cuboid are not parallel to the coordinate axes.
  • shape_rotation_x, shape_rotation_y and shape_rotation_z respectively indicate the x, y and z components of the rotation quaternion for the basic geometry.
  • Optionally, the rotation direction of the basic geometry may also be indicated by Euler-angle rotation; that is, shape_rotation_x, shape_rotation_y and shape_rotation_z respectively indicate the rotation angles about the x, y and z coordinate axes, as in the sketch below.
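  • A minimal sketch of ShapeRotationStruct() under these semantics is given below. The field names are from this document, while the bit widths and fixed-point interpretation are assumptions, since the original syntax figure is not reproduced here.

    aligned(8) class ShapeRotationStruct() {
        signed int(32) shape_rotation_x;   // x component of the rotation quaternion (or Euler angle about x)
        signed int(32) shape_rotation_y;   // y component (or Euler angle about y)
        signed int(32) shape_rotation_z;   // z component (or Euler angle about z)
        // for a unit quaternion the fourth component w can be derived as sqrt(1 - x*x - y*y - z*z)
    }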
  • Because of the position and orientation at which the media data capture devices are placed in space, the objects in the scene may not be captured from all angles. When the user moves freely while watching the video, the position range within which the user can move and the viewing directions into which the user can rotate may be restricted; that is, when the visual content in the current window is reconstructed after the user moves, if the viewing direction and viewing range exceed the restrictions, the visual content in the window cannot be effectively reconstructed and rendered, causing the content to fade out, the sense of immersion to be lost, and so on.
  • Specifically, the viewing direction or viewing range structure is described by the viewing direction constraint structure (ViewingDirectionConstrainStruct). An optional implementation of the three-dimensional viewing space structure based on the viewing direction and viewing range is described below.
  • ViewingDirectionConstrainStruct (viewing direction constraint structure)
  • viewing_direction_center_x, viewing_direction_center_y and viewing_direction_center_z respectively indicate the quaternion components x, y and z of the center of the recommended viewing direction in the basic geometry;
  • viewing_direction_yaw_range and viewing_direction_pitch_range respectively indicate half of the yaw range and half of the pitch range of the recommended viewing direction in the basic geometry;
  • guard_band_present_flag, 1 indicates that the basic geometry has a guard band, and 0 indicates that the basic geometry has no guard band;
  • guard_band_direction_size indicates the size of the guard band for the viewing direction within the basic geometry, expressed in degrees.
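  • For illustration, a sketch of ViewingDirectionConstrainStruct() consistent with these semantics follows; the field names are from this document, but the bit widths and the conditional layout are assumptions.

    aligned(8) class ViewingDirectionConstrainStruct() {
        signed int(32)   viewing_direction_center_x;    // quaternion x of the recommended viewing direction center
        signed int(32)   viewing_direction_center_y;    // quaternion y
        signed int(32)   viewing_direction_center_z;    // quaternion z
        unsigned int(16) viewing_direction_yaw_range;   // half of the recommended yaw range
        unsigned int(16) viewing_direction_pitch_range; // half of the recommended pitch range
        unsigned int(1)  guard_band_present_flag;       // 1: a guard band exists
        bit(7) reserved;
        if (guard_band_present_flag)
            unsigned int(8) guard_band_direction_size;  // guard band size in degrees
    }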
  • Specifically, when the viewing direction constraint and the geometry rotation are considered, SpaceShapeStruct() needs to be extended. An optional implementation of the three-dimensional viewing space structure is described below.
  • Specifically, the three-dimensional viewing space structure is described by the three-dimensional viewing space structure (ViewingSpaceStruct), which is constructed by combining one or more basic complex geometries. An optional implementation of the three-dimensional viewing space structure is described below.
  • This embodiment provides a representation method based on a three-dimensional viewing space rendering processing flow and file format. The flow includes the following steps:
  • Step S1: the user's viewing posture (viewing position, viewing direction) changes, and the user terminal acquires the corresponding media file based on the user's posture change trajectory, the range of the three-dimensional viewing space, and the handling manner corresponding to the three-dimensional viewing space;
  • Step S2: the user terminal decodes and decapsulates the acquired media file, reconstructs the visual content in the current window according to the user's current viewing posture and the handling manner corresponding to the three-dimensional viewing space, and renders the content for presentation to the user.
  • Specifically, when the user's viewing posture is within the three-dimensional viewing space, by default the corresponding media file can be acquired directly; when the user's viewing posture is moving outside or is already outside the three-dimensional viewing space, the required media files are determined and acquired according to the three-dimensional viewing space handling manner corresponding to the scene.
  • Specifically, the information related to the handling manner of the three-dimensional viewing space may be described at the file level or at the track level.
  • The handling manner of the three-dimensional viewing space is directly related to the range of the three-dimensional viewing space, and the device types, application types or viewing handling manners supported by different three-dimensional viewing spaces may differ.
  • Optionally, the three-dimensional viewing space handling manner is described at the file level in one of the following optional manners:
  • Manner 1: The defined three-dimensional viewing space handling data box (ViewingSpaceHandlingBox) is contained in the file-level MetaBox, and may be contained in the same data box container as the three-dimensional viewing space data box (ViewingSpaceBox).
  • Specifically, the handling manner based on the three-dimensional viewing space is described by defining a ViewingSpaceHandlingBox (three-dimensional viewing space handling data box) of the 'vsph' type, as described below in conjunction with optional implementations and the sketch after this list.
  • ViewingSpaceHandlingBox (three-dimensional viewing space handling data box)
  • num_viewing_space indicates the number of three-dimensional viewing spaces in the scene;
  • viewing_space_id indicates the identifier of the three-dimensional viewing space to which this handling manner applies;
  • num_handling_options indicates the number of options of the three-dimensional viewing space handling manner; when it is 0, no three-dimensional viewing space handling manner is provided, and the target device may select an appropriate handling manner according to the three-dimensional viewing space information;
  • handling_device_class indicates the value of the device type for three-dimensional viewing space handling (the description corresponding to each specific value is shown in the following table) (all devices support 6DOF position tracking, and sometimes the user's movement is related to playback); within the same three-dimensional viewing space, a conformant bitstream should not contain duplicate values; if the type value is 0, the value of i+1 should be num_handling_options;
  • handling_application_class indicates the value of the application type for three-dimensional viewing space handling (the description corresponding to each specific value is shown in the following table); within the same three-dimensional viewing space, a conformant bitstream should not contain duplicate values; if the type value is 0, the value of i+1 should be num_handling_options;
  • handling_method indicates the value of the three-dimensional viewing space handling manner (the description corresponding to each specific value is shown in the following table); within the same three-dimensional viewing space, a conformant bitstream should not contain duplicate values; if the type value is 0, the value of i+1 should be num_handling_options;
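  • A sketch of ViewingSpaceHandlingBox consistent with the semantics above is given below as an assumption: only the box type 'vsph' and the field names are from this document, while the loop structure and bit widths are guesses, since the syntax figures are not reproduced here.

    aligned(8) class ViewingSpaceHandlingBox extends FullBox('vsph', 0, 0) {
        unsigned int(8) num_viewing_space;                  // number of viewing spaces in the scene
        for (j = 0; j < num_viewing_space; j++) {
            unsigned int(8) viewing_space_id;               // viewing space this handling applies to
            unsigned int(8) num_handling_options;           // 0: no handling manner provided
            for (i = 0; i < num_handling_options; i++) {
                unsigned int(8) handling_device_class;      // device type value (see table)
                unsigned int(8) handling_application_class; // application type value (see table)
                unsigned int(8) handling_method;            // handling manner value (see table)
            }
        }
    }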
  • Method 2: There may be multiple media spaces in one media file. The media tracks are grouped according to the three-dimensional viewing space, and all media tracks of the same group belong to the same three-dimensional viewing space.
  • The entity grouping mechanism in ISO BMFF is used to describe the three-dimensional viewing space handling manner corresponding to a media track group.
  • Specifically, the three-dimensional viewing space and its corresponding three-dimensional viewing space handling manner are described by extending EntityToGroupBox, contained in the GroupsListBox under the file-level MetaBox, where grouping_type is 'vspg'. ViewingSpaceGroupBox is described below in conjunction with an optional implementation.
  • Optionally, the three-dimensional viewing space handling manner is described at the track level in one of the following optional manners:
  • Manner 1: One media track may correspond to one or more three-dimensional viewing spaces, and the three-dimensional viewing space handling manner information is described in the SampleEntry of the media track; that is, ViewingSpaceHandlingBox() is contained in the SampleEntry, and its quantity is 0 or 1.
  • Manner 2: The three-dimensional viewing space is described at the track level. One media track may correspond to one or more three-dimensional viewing spaces, or the position and structure of the three-dimensional viewing space may change at a low frequency during playback; sample grouping is then used to describe the three-dimensional viewing space that each sample may correspond to and the handling manner of that three-dimensional viewing space.
  • One sample group corresponds to one three-dimensional viewing space, where grouping_type is 'vssg'. This is described below in conjunction with optional embodiments.
  • Based on a specific application or specific media data, the sample grouping in the corresponding media track is extended; for example, based on the sample grouping of type 'vaps' for volumetric video data, the three-dimensional viewing space handling manner is described as an SEI message in the sample group entry (V3CAtlasParamSampleGroupDescriptionEntry).
  • Manner 3: There may be multiple media spaces in one media file, and the media data of one or more media tracks (such as multi-viewpoint data or point cloud data) corresponds to one three-dimensional viewing space.
  • By defining a track group whose track_group_type is 'vptg', the media tracks are grouped based on the three-dimensional viewing space, and all media tracks of the same group (containing the same track_group_id) belong to the same three-dimensional viewing space.
  • The media track grouping may group the media tracks in which the atlas data is located, and the media handling manner of the three-dimensional viewing space corresponding to the media tracks is described in the sample entry of the media track in which the atlas data is located.
  • Specifically, if each group contains one and only one base track, for example the media track containing the basic view (base view) in the multi-viewpoint media data is the base track, then the three-dimensional viewing space corresponding to the media track group is described in the sample entry of the base track.
  • Manner 4: Among the media tracks, there may be a media track that stores only the parameter information of the media. With the samples carrying the information description of the NAL units, the description metadata of the three-dimensional viewing space handling manner can be stored as an SEI message in the corresponding samples.
  • When the three-dimensional viewing space keeps changing because the positions of the capture devices change, the handling manner of the three-dimensional viewing space may also keep changing, or it may change because of the director's plot arrangement and other circumstances; the metadata of the changing three-dimensional viewing space handling manner is then described in a timed metadata track.
  • Specifically, a three-dimensional viewing space corresponds to a three-dimensional viewing space handling manner, and the handling manner of the corresponding three-dimensional viewing space is described in the three-dimensional viewing space timed metadata track.
  • Specifically, VRSpaceSampleEntry() and VRSpaceSample() are extended, as described below in conjunction with optional implementations. The timed metadata track of three-dimensional viewing space handling is determined at the sample entry, and VRSpaceSampleEntry() is extended by one of the following methods:
  • Method 2: VRSpaceSampleEntry() is extended to VRSpacehandlingEntry(); three-dimensional viewing space handling metadata is defined on the basis of the three-dimensional viewing space defined in VRSpaceSampleEntry(), and the sample entry (track sample entry) of the three-dimensional viewing space timed metadata track is identified by the 'vrsh' type.
  • Optionally, each sample of the timed metadata media track describes one or more three-dimensional viewing spaces, and the metadata of three-dimensional viewing space handling is described in each sample. VRSpaceSample() is extended by one of the following methods:
  • Method 1: the metadata of the three-dimensional viewing space handling corresponding to the three-dimensional viewing space is described directly in VRSpaceSample().
  • Method 2: VRSpaceSample() is extended to describe the metadata of the three-dimensional viewing space handling corresponding to the three-dimensional viewing space.
  • space_handling_flag, when 1, indicates that the handling of the three-dimensional viewing space is defined in the samples of this sample entry; when 0, indicates that the three-dimensional viewing space handling manner of all subsequent samples remains unchanged and this sample entry is referenced;
  • viewing_space_idx indicates the viewing space identifier index;
  • handling_present, 0 indicates that the three-dimensional viewing space is handled in the default handling manner, and 1 indicates the handling manner corresponding to the current three-dimensional viewing space.
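  • As an assumed sketch only, a VRSpaceHandlingSample() carrying these fields might look as follows; the conditional layout and bit widths are guesses consistent with the described semantics, while the field names are from this document.

    aligned(8) class VRSpaceHandlingSample() {
        unsigned int(1) space_handling_flag;      // 1: handling defined in the samples of this sample entry
        bit(7) reserved;
        if (space_handling_flag) {
            unsigned int(8) viewing_space_idx;    // viewing space identifier index
            unsigned int(1) handling_present;     // 0: default handling manner applies
            bit(7) reserved;
            if (handling_present)
                unsigned int(8) handling_method;  // handling manner for the current viewing space
        }
    }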
  • Optionally, if one timed metadata track can describe the dynamic changes of only one three-dimensional viewing space, then when a media file corresponds to multiple three-dimensional viewing spaces, there may be multiple timed metadata tracks to describe the changes of the multiple three-dimensional viewing spaces; one three-dimensional viewing space timed metadata track then corresponds to one handling manner of that three-dimensional viewing space.
  • Optionally, when the position and structure of the three-dimensional viewing space change but the corresponding viewing_space_id does not change, and the intrinsic parameters of the capture devices corresponding to the three-dimensional viewing space do not change, the handling manner of the three-dimensional viewing space may be unchanged.
  • An independent timed metadata track can then be used to describe the dynamic three-dimensional viewing space handling manner. This is described below in conjunction with optional implementations.
  • Optionally, each sample corresponds to one three-dimensional viewing space, and the information about the handling manner of the sample's three-dimensional viewing space is provided by VRSpaceHandlingSample();
  • This embodiment provides a method for a user terminal to acquire the media data corresponding to the user's current posture (shown in FIG. 11). The specific steps are as follows:
  • Step S1102: the user's viewing posture (viewing position, viewing direction) changes; the user terminal determines, based on the defined three-dimensional viewing space, whether the user's latest viewing posture is within the three-dimensional viewing space, moving outside the three-dimensional viewing space, or already outside the three-dimensional viewing space, and determines whether the corresponding media data exists locally according to the position of the user's viewing posture relative to the three-dimensional viewing space and the handling manner corresponding to the three-dimensional viewing space;
  • Step S1104: determining whether media data corresponding to the user's current viewing posture exists locally;
  • Step S1106: if not, looking up, in the MPD file, the location of the media file corresponding to the user's current posture, and requesting and downloading the corresponding media data from the server; if yes, proceeding directly to step S1108;
  • Step S1108: the user terminal reconstructs and renders the visual content within the user's current window based on the media data.
  • Specifically, in the media presentation description (MPD) of dynamic adaptive streaming over HTTP (DASH), the ViewingSpace element whose scheme identifier attribute (@schemeIdUri) is equal to "urn:mpeg:mpegI:miv:2020:vwsp" is called the three-dimensional viewing space (VWSP) descriptor, through which the corresponding media data is requested and downloaded.
  • The three-dimensional viewing space is indicated by the VWSP descriptor at the MPD level or at the adaptation set (Adaptation Set) level.
  • The VWSP descriptor cannot appear at two levels at the same time; if it is expressed at the lower level, it is reused at the higher level. If no VWSP descriptor appears, the viewing position may not be movable, or the viewing position may be moved arbitrarily in space.
  • As shown in Table 1, a table is provided describing the semantics of the attributes of the elements of the three-dimensional viewing space descriptor.
  • For different terminal devices and applications, the three-dimensional viewing space has different handling manners for the media files. Because the handling manners differ, the media files that need to be selected and acquired may also differ; for example, when the user's viewing posture changes (moving out of the three-dimensional viewing space), the media file corresponding to the next three-dimensional viewing space needs to be acquired.
  • When acquiring the media data corresponding to the three-dimensional viewing space, a suitable media file can then be selected according to the three-dimensional viewing space handling information in the MPD, and the corresponding visual content can be rendered according to the specific handling manner.
  • Specifically, in the media presentation description (MPD) of dynamic adaptive streaming over HTTP (DASH), the ViewingSpaceHandling element whose scheme identifier attribute (@schemeIdUri) is equal to "urn:mpeg:mpegI:miv:2020:vwph" is called the three-dimensional viewing space handling (VWPH) descriptor, through which the corresponding media data is requested and downloaded.
  • The VWPH descriptor is used at the MPD level or at the adaptation set (Adaptation Set) level to indicate the handling manner corresponding to the three-dimensional viewing space.
  • The VWPH descriptor cannot appear at two levels at the same time; if it is expressed at the lower level, it is reused at the higher level.
  • As shown in Table 2, a table is provided describing the semantics of the attributes of the elements of the three-dimensional viewing space handling (VWPH) descriptor.
  • Through the description of the above implementations, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • Based on such an understanding, the technical solution of the present disclosure, in essence or in the part contributing to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disc) and includes several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present disclosure.
  • In this embodiment, an apparatus for processing immersive media data is also provided. The apparatus is configured to implement the above embodiments and preferred implementations, and what has been described is not repeated.
  • As used below, the term "module" can be a combination of software and/or hardware implementing predetermined functions. Although the apparatus described in the following embodiments is preferably implemented by software, implementation by hardware, or by a combination of software and hardware, is also possible and conceived.
  • FIG. 12 is a structural block diagram of an apparatus for processing immersive media data according to an embodiment of the present disclosure. As shown in FIG. 12, the apparatus includes:
  • an acquiring unit 1202 configured to acquire immersive media data corresponding to the current viewing posture of the user, wherein a three-dimensional viewing space is used to limit the content viewed by the user;
  • a rendering unit 1204 configured to render the immersive media data corresponding to the current viewing posture, wherein the acquired immersive media data is rendered when the current viewing posture is within the three-dimensional viewing space; or the immersive media data is rendered according to the processing information of the three-dimensional viewing space when the current viewing posture has moved or is moving outside the three-dimensional viewing space.
  • It should be noted that each of the above modules can be implemented by software or hardware. For the latter, this can be implemented in the following way, but is not limited to it: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
  • The embodiment of the present disclosure also provides a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the foregoing method embodiments when run.
  • The above computer-readable storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media that can store a computer program.
  • An embodiment of the present disclosure also provides an electronic apparatus including a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the foregoing method embodiments.
  • The above electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
  • Obviously, those skilled in the art should understand that the above modules or steps of the present disclosure can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices; they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, and in some cases the steps shown or described can be executed in an order different from the one here; or they can be fabricated into individual integrated circuit modules, or multiple modules or steps of them can be fabricated into a single integrated circuit module. In this way, the present disclosure is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Geometry (AREA)
  • Architecture (AREA)
  • Library & Information Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Processing Or Creating Images (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the present disclosure provide a method and apparatus for processing immersive media data, a storage medium, and an electronic apparatus. The processing method includes: acquiring immersive media data corresponding to a current viewing posture of a user, wherein a three-dimensional viewing space is used to limit the content viewed by the user; and rendering the immersive media data corresponding to the current viewing posture, wherein the acquired immersive media data is rendered when the current viewing posture is within the three-dimensional viewing space, or the immersive media data is rendered according to processing information of the three-dimensional viewing space when the current viewing posture has moved or is moving outside the three-dimensional viewing space. The present disclosure solves the problem that the rendering and presentation of the visual content viewed by the user are affected, and thereby facilitates fast, efficient and high-quality reconstruction and rendering of visual content during viewing, so as to satisfy an optimal immersive experience.

Description

Method and apparatus for processing immersive media data, storage medium and electronic apparatus — Technical Field
The embodiments of the present disclosure relate to the field of communications, and in particular to a method and apparatus for processing immersive media data, a storage medium, and an electronic apparatus.
Background
Immersive media uses audio, video and other technologies to let users experience a highly realistic virtual space environment visually and aurally, producing an immersive feeling. At present, the immersive experience mainly supports planar panoramic video; for example, a user wearing a head-mounted display can watch 360-degree video by freely rotating the head, i.e., a three-degree-of-freedom (3DOF) immersive experience. For video supporting enhanced three degrees of freedom (3DOF+) and six degrees of freedom (6DOF), the user can translate the body and rotate the head while watching to see more detail, such as occluded parts of the visual content.
After the user moves the body or rotates the head, the content of the viewed window changes. The display device needs to select suitable video data from the media data for reconstruction and rendering according to the user's viewing posture (body position, head orientation), and present to the user the visual content within the current window to satisfy the user's sense of immersion. The range within which the user can move in space is directly related to attributes such as the position and orientation of the video capture devices. When the user moves beyond a certain range, the display device cannot reconstruct the visual content in the current window from the captured video data, and the visual content seen by the user will gradually fade out, become distorted, lose the sense of immersion, and so on. To guarantee continuity and quality of viewing, the user's three-dimensional viewing space needs to be delimited: within the three-dimensional viewing space, the user can normally experience immersive media through the rendering of the video data; beyond the three-dimensional viewing space, the user can continue viewing and recover immersion after the video data is reconstructed and rendered, or the content fades out without further processing, etc.
At present, immersive media systems delimit the range of the user's three-dimensional viewing space based on a single viewpoint, the movable range is extremely limited, and the geometry representing the three-dimensional viewing space is simple, such as a cube, sphere or cylinder. Since the size and shape of the three-dimensional viewing space are necessarily related to the position and number of capture devices, how to accurately describe the user's three-dimensional viewing space will affect the rendering and presentation of the visual content viewed by the user.
In view of the above problems, the present disclosure proposes a method and apparatus for processing immersive media data, a storage medium, and an electronic apparatus, which are suitable for delimiting the range of viewing positions when viewing immersive media based on multi-viewpoint or point cloud data, and which facilitate fast, efficient and high-quality reconstruction and rendering of visual content during viewing, so as to satisfy an optimal immersive experience.
Summary
Embodiments of the present disclosure provide a method and apparatus for processing immersive media data, a storage medium, and an electronic apparatus, so as to at least solve the problem in the related art that the rendering and presentation of the visual content viewed by the user are affected.
According to an embodiment of the present disclosure, a method for processing immersive media data is provided, including: acquiring immersive media data corresponding to a current viewing posture of a user, wherein a three-dimensional viewing space is used to limit the content viewed by the user; and rendering the immersive media data corresponding to the current viewing posture, wherein the acquired immersive media data is rendered when the current viewing posture is within the three-dimensional viewing space; or the immersive media data is rendered according to processing information of the three-dimensional viewing space when the current viewing posture has moved or is moving outside the three-dimensional viewing space.
In an exemplary embodiment, acquiring the immersive media data corresponding to the current viewing posture of the user includes: determining a relationship between the current viewing posture and the three-dimensional viewing space, wherein the three-dimensional viewing space is a range within which the user is allowed to move in an immersive media scene; and determining the immersive media data according to the relationship.
In an exemplary embodiment, determining the relationship between the current viewing posture and the three-dimensional viewing space includes: determining a position of the current viewing posture within or outside the three-dimensional viewing space; and determining the immersive media data according to the position.
In an exemplary embodiment, before rendering the immersive media data corresponding to the current viewing posture, the method includes: determining, according to a data box type or a timed metadata track sample entry type, the media track or data box in which the processing information of the three-dimensional viewing space is described in a media file.
In an exemplary embodiment, determining, according to a data box type or a timed metadata track sample entry type, the media track or data box in which the three-dimensional viewing space is described in the media file includes one of the following: identifying a first three-dimensional viewing space data box in a file header of the media file according to a first data box type; or identifying a second three-dimensional viewing space grouping data box in the file header of the media file according to a second grouping type; or identifying a third three-dimensional viewing space data box in one or more media tracks of the media file according to a third data box type; or identifying a fourth three-dimensional viewing space sample group description data box in one media track of the media file according to a fourth sample grouping type; or identifying a fifth three-dimensional viewing space data box in one media track of the media file according to a fifth track group type, wherein one or more media tracks having the same track group identifier belong to the same three-dimensional viewing space; or identifying a three-dimensional viewing space timed metadata track in the media file according to a sixth sample entry type, the three-dimensional viewing space timed metadata track indicating a dynamically changing three-dimensional viewing space of the immersive media data.
In an exemplary embodiment, the information of the three-dimensional viewing space includes at least one of the following: a coordinate position of the three-dimensional viewing space in the spatial scene of the immersive media, an orientation of the three-dimensional viewing space in the spatial scene of the immersive media, the geometry of the three-dimensional viewing space, and a viewing direction in the three-dimensional viewing space, wherein the structure of the three-dimensional viewing space is constructed by combining one or more complex geometric structures, each complex geometric structure is constructed by combining one or more basic structures, and each basic structure corresponds to zero or one capture device or captured view of the immersive media data.
In an exemplary embodiment, before rendering the immersive media data corresponding to the current viewing posture, the method includes: determining, according to a data box type or a timed metadata track sample entry type, the media track or data box in which the processing information of the three-dimensional viewing space is described in a media file.
In an exemplary embodiment, determining, according to a data box type or a timed metadata track sample entry type, the media track or data box in which the processing information of the three-dimensional viewing space is described in the media file includes: identifying the three-dimensional viewing space handling data box according to a seventh data box type, wherein the three-dimensional viewing space handling data box is contained in the data box describing the three-dimensional viewing space, or is in the same parent data box as the data box describing the three-dimensional viewing space; or the three-dimensional viewing space handling data box is contained in the three-dimensional viewing space timed metadata track; or identifying a three-dimensional viewing space handling timed metadata track in the media file according to an eighth sample entry type, the three-dimensional viewing space handling timed metadata track indicating a dynamically changing three-dimensional viewing space handling manner of the three-dimensional viewing space.
In an exemplary embodiment, the processing information of the three-dimensional viewing space includes at least one of the following: the number of options of the three-dimensional viewing space handling manner, the device type for three-dimensional viewing space handling, the application type for three-dimensional viewing space handling, the three-dimensional viewing space handling manner, and the identifier of the three-dimensional viewing space.
In an exemplary embodiment, acquiring the immersive media data corresponding to the current viewing posture of the user includes: directly acquiring the immersive media data when it is determined that the user's current viewing posture is within the three-dimensional viewing space; or acquiring the immersive media data corresponding to the current viewing posture according to the processing information of the three-dimensional viewing space when it is determined that the user's current viewing posture is moving or has moved outside the three-dimensional viewing space.
In an exemplary embodiment, acquiring the immersive media data includes: determining a media presentation description file, wherein the media presentation description file includes a three-dimensional viewing space descriptor and/or a three-dimensional viewing space handling descriptor for viewing the immersive media; and requesting the immersive media data corresponding to the current viewing posture according to the three-dimensional viewing space descriptor and/or the three-dimensional viewing space handling descriptor corresponding to the user's current viewing posture.
In an exemplary embodiment, the three-dimensional viewing space descriptor of the immersive media includes at least one of the following: the basic geometry structure, the rotation direction of the basic geometry, the view or capture device identifier or index corresponding to the basic geometry, the rotation of the basic geometry, the viewing direction in the basic geometry, the combination manner of basic geometries, the combination manner of complex combined geometries, and the three-dimensional viewing space identifier.
In an exemplary embodiment, the three-dimensional viewing space handling descriptor of the immersive media includes at least one of the following: the device type for three-dimensional viewing space handling, the application type for three-dimensional viewing space handling, the three-dimensional viewing space handling manner, and the identifier of the three-dimensional viewing space.
According to another embodiment of the present disclosure, an apparatus for processing immersive media data is provided, including: an acquiring unit configured to acquire immersive media data corresponding to a current viewing posture of a user, wherein a three-dimensional viewing space is used to limit the content viewed by the user; and a rendering unit configured to render the immersive media data corresponding to the current viewing posture, wherein the acquired immersive media data is rendered when the current viewing posture is within the three-dimensional viewing space, or the immersive media data is rendered according to processing information of the three-dimensional viewing space when the current viewing posture has moved or is moving outside the three-dimensional viewing space.
In an exemplary embodiment, the acquiring unit includes: a first determining module configured to determine a relationship between the current viewing posture and the three-dimensional viewing space, wherein the three-dimensional viewing space is a range within which the user is allowed to move in an immersive media scene; and a second determining module configured to determine the immersive media data according to the relationship.
In an exemplary embodiment, the first determining module includes: a first determining sub-module configured to determine a position of the current viewing posture within or outside the three-dimensional viewing space; and a second determining sub-module configured to determine the immersive media data according to the position.
In an exemplary embodiment, the apparatus further includes: a first determining unit configured to determine, before the immersive media data corresponding to the current viewing posture is rendered, the media track or data box in which the processing information of the three-dimensional viewing space is described in a media file according to a data box type or a timed metadata sample entry type.
In an exemplary embodiment, the first determining unit is further configured to perform one of the following: identifying a first three-dimensional viewing space data box in a file header of the media file according to a first data box type; or identifying a second three-dimensional viewing space grouping data box in the file header of the media file according to a second grouping type; or identifying a third three-dimensional viewing space data box in one or more media tracks of the media file according to a third data box type; or identifying a fourth three-dimensional viewing space sample group description data box in one media track of the media file according to a fourth sample grouping type; or identifying a fifth three-dimensional viewing space data box in one media track of the media file according to a fifth track group type, wherein one or more media tracks having the same track group identifier belong to the same three-dimensional viewing space; or identifying a three-dimensional viewing space timed metadata track in the media file according to a sixth sample entry type, the three-dimensional viewing space timed metadata track indicating a dynamically changing three-dimensional viewing space of the immersive media data.
In an exemplary embodiment, the information of the three-dimensional viewing space includes at least one of the following: a coordinate position of the three-dimensional viewing space in the spatial scene of the immersive media, an orientation of the three-dimensional viewing space in the spatial scene of the immersive media, the geometry of the three-dimensional viewing space, and a viewing direction in the three-dimensional viewing space, wherein the structure of the three-dimensional viewing space is constructed by combining one or more complex geometric structures, each complex geometric structure is constructed by combining one or more basic structures, and each basic structure corresponds to zero or one capture device or captured view of the immersive media data.
In an exemplary embodiment, the apparatus further includes: a second determining unit configured to determine, before the immersive media data corresponding to the current viewing posture is rendered, the media track or data box in which the processing information of the three-dimensional viewing space is described in a media file according to a data box type or a timed metadata track sample entry type.
In an exemplary embodiment, the second determining unit includes: an identifying module configured to identify the three-dimensional viewing space handling data box by a seventh data box type, wherein the three-dimensional viewing space handling data box is contained in the data box describing the three-dimensional viewing space, or is in the same parent data box as the data box describing the three-dimensional viewing space; or the three-dimensional viewing space handling data box is contained in the three-dimensional viewing space timed metadata track; or a three-dimensional viewing space handling timed metadata track in the media file is identified according to an eighth sample entry type, the three-dimensional viewing space handling timed metadata track indicating a dynamically changing three-dimensional viewing space handling manner of the three-dimensional viewing space.
In an exemplary embodiment, the processing information of the three-dimensional viewing space includes at least one of the following: the number of options of the three-dimensional viewing space handling manner, the device type for three-dimensional viewing space handling, the application type for three-dimensional viewing space handling, the three-dimensional viewing space handling manner, and the identifier of the three-dimensional viewing space.
In an exemplary embodiment, the acquiring unit includes: a first acquiring module configured to directly acquire the immersive media data when it is determined that the user's current viewing posture is within the three-dimensional viewing space; or a second acquiring module configured to acquire the immersive media data corresponding to the current viewing posture according to the processing information of the three-dimensional viewing space when it is determined that the user's current viewing posture is moving or has moved outside the three-dimensional viewing space.
In an exemplary embodiment, the first acquiring module or the second acquiring module is further configured to: determine a media presentation description file, wherein the media presentation description file includes a three-dimensional viewing space descriptor and/or a three-dimensional viewing space handling descriptor for viewing the immersive media; and request the immersive media data corresponding to the current viewing posture according to the three-dimensional viewing space descriptor and/or the three-dimensional viewing space handling descriptor corresponding to the user's current viewing posture.
In an exemplary embodiment, the three-dimensional viewing space descriptor of the immersive media includes at least one of the following: the basic geometry structure, the rotation direction of the basic geometry, the view or capture device identifier or index corresponding to the basic geometry, the rotation of the basic geometry, the viewing direction in the basic geometry, the combination manner of basic geometries, the combination manner of complex combined geometries, and the three-dimensional viewing space identifier.
In an exemplary embodiment, the three-dimensional viewing space handling descriptor of the immersive media includes at least one of the following: the device type for three-dimensional viewing space handling, the application type for three-dimensional viewing space handling, the three-dimensional viewing space handling manner, and the identifier of the three-dimensional viewing space.
According to yet another embodiment of the present disclosure, a computer-readable storage medium is further provided, in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the above method embodiments when run.
According to yet another embodiment of the present disclosure, an electronic apparatus is further provided, including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
Through the embodiments of the present disclosure, immersive media data corresponding to the current viewing posture of the user is acquired, wherein a three-dimensional viewing space is used to limit the content viewed by the user, and the immersive media data corresponding to the current viewing posture is rendered: the acquired immersive media data is rendered when the current viewing posture is within the three-dimensional viewing space, or is rendered according to the processing information of the three-dimensional viewing space when the current viewing posture has moved or is moving outside the three-dimensional viewing space. Since the immersive media data can be rendered according to the processing information of the three-dimensional viewing space when the user moves outside the three-dimensional viewing space, high-quality visual content can be provided to the user quickly and efficiently even in that case. Therefore, the problem that the rendering and presentation of the visual content viewed by the user are affected can be solved, and fast, efficient and high-quality reconstruction and rendering of visual content during viewing can be achieved, so as to satisfy an optimal immersive experience.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of an execution apparatus of a method for processing immersive media data according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a method for processing immersive media data according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a three-dimensional viewing space of a method for processing immersive media data according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a system structure of a method for processing immersive media data according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of a method for processing immersive media data according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an ISO file of a method for processing immersive media data according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an ISO file of a method for processing immersive media data according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an ISO file of a method for processing immersive media data according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of an ISO file of a method for processing immersive media data according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of an ISO file of a method for processing immersive media data according to an embodiment of the present disclosure;
FIG. 11 is a schematic flowchart of a method for processing immersive media data according to an embodiment of the present disclosure;
FIG. 12 is a schematic structural diagram of an apparatus for processing immersive media data according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings and in combination with the embodiments.
It should be noted that the terms "first", "second", etc. in the specification and claims of the present disclosure and the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific sequence or order.
The method embodiments provided in the embodiments of the present disclosure may be executed in a mobile terminal, a computer terminal or a similar computing apparatus. Taking running on a mobile terminal as an example, FIG. 1 is a hardware structural block diagram of a mobile terminal for a method of processing immersive media data according to an embodiment of the present disclosure. As shown in FIG. 1, the mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a microprocessor unit (MPU) or a programmable logic device (PLD)) and a memory 104 configured to store data. The mobile terminal may further include a transmission device 106 configured for a communication function and an input/output device 108. A person of ordinary skill in the art can understand that the structure shown in FIG. 1 is only schematic and does not limit the structure of the mobile terminal. For example, the mobile terminal may further include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 may be configured to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the method for processing immersive media data in the embodiments of the present disclosure. The processor 102 runs the computer program stored in the memory 104 to execute various functional applications and data processing, that is, to implement the above method. The memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memories, or other non-volatile solid-state memories. In some examples, the memory 104 may further include memories remotely located relative to the processor 102, and these remote memories may be connected to the mobile terminal through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof.
The transmission device 106 is configured to receive or send data via a network. Specific examples of the above network may include a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (RF) module, which is configured to communicate with the Internet in a wireless manner.
This embodiment provides a method for processing immersive media data. FIG. 2 is a flowchart of a method for processing immersive media data according to an embodiment of the present disclosure. As shown in FIG. 2, the flow includes the following steps:
Step S202: acquiring immersive media data corresponding to the current viewing posture of the user, wherein a three-dimensional viewing space is used to limit the content viewed by the user;
Step S204: rendering the immersive media data corresponding to the current viewing posture, wherein the acquired immersive media data is rendered when the current viewing posture is within the three-dimensional viewing space; or the immersive media data is rendered according to the processing information of the three-dimensional viewing space when the current viewing posture has moved or is moving outside the three-dimensional viewing space.
It should be noted that the three-dimensional viewing space in the present disclosure is a space in which the user is allowed to move. In plain terms, a virtual camera is set in the three-dimensional viewing space; the virtual camera can capture the content in the three-dimensional viewing space and feed it back to the user in the real space. When the user moves or rotates in the real space, the virtual camera moves or rotates in the three-dimensional viewing space. The three-dimensional viewing space is the range within which the system suggests that the user, that is, the virtual camera, moves. The user, or the virtual camera, may also move outside the three-dimensional viewing space; movement is not necessarily confined to the three-dimensional viewing space. Using the three-dimensional viewing space to limit the rendering of the content viewed by the user means that after the virtual camera captures content within the three-dimensional viewing space, the content is passed to the user for viewing; at this time, the content captured by the virtual camera needs to be rendered. That is, the content viewed by the user is rendered.
Specifically, acquiring the immersive media data corresponding to the current viewing posture of the user includes: determining a relationship between the current viewing posture and the three-dimensional viewing space, wherein the three-dimensional viewing space is a virtual space range within which the user is allowed to move; and determining the immersive media data according to the relationship.
Specifically, determining the relationship between the current viewing posture and the three-dimensional viewing space includes: determining a position of the current viewing posture within or outside the three-dimensional viewing space; and determining the immersive media data according to the position.
Specifically, before the immersive media data corresponding to the current viewing posture is rendered, the method includes: determining, according to a data box type or a timed metadata sample entry type, the media track or data box in which the processing information of the three-dimensional viewing space is described in a media file.
Specifically, determining, according to a data box type or a timed metadata track sample entry type, the media track or data box in which the three-dimensional viewing space is described in the media file includes one of the following: identifying a first three-dimensional viewing space data box in a file header of the media file according to a first data box type; or identifying a second three-dimensional viewing space grouping data box in the file header of the media file according to a second grouping type; or identifying a third three-dimensional viewing space data box in one or more media tracks of the media file according to a third data box type; or identifying a fourth three-dimensional viewing space sample group description data box in one media track of the media file according to a fourth sample grouping type; or identifying a fifth three-dimensional viewing space data box in one media track of the media file according to a fifth track group type, wherein one or more media tracks having the same track group identifier belong to the same three-dimensional viewing space; or identifying a three-dimensional viewing space timed metadata track in the media file according to a sixth sample entry type, the three-dimensional viewing space timed metadata track indicating a dynamically changing three-dimensional viewing space of the immersive media data.
Specifically, the processing information of the three-dimensional viewing space includes at least one of the following: a coordinate position of the three-dimensional viewing space in the spatial scene of the immersive media, an orientation of the three-dimensional viewing space in the spatial scene of the immersive media, the geometry of the three-dimensional viewing space, and a viewing direction in the three-dimensional viewing space, wherein the structure of the three-dimensional viewing space is constructed by combining one or more complex geometric structures, each complex geometric structure is constructed by combining one or more basic structures, and each basic structure corresponds to zero or one capture device or captured view of the immersive media data.
Specifically, before the immersive media data corresponding to the current viewing posture is rendered, the method includes: determining, according to a data box type or a timed metadata track sample entry type, the media track or data box in which the processing information of the three-dimensional viewing space is described in a media file.
Specifically, determining, according to a data box type or a timed metadata track sample entry type, the media track or data box in which the processing information of the three-dimensional viewing space is described in the media file includes: identifying the three-dimensional viewing space handling data box according to a seventh data box type, wherein the three-dimensional viewing space handling data box is contained in the data box describing the three-dimensional viewing space, or is in the same parent data box as the data box describing the three-dimensional viewing space; or the three-dimensional viewing space handling data box is contained in the three-dimensional viewing space timed metadata track; or identifying a three-dimensional viewing space handling timed metadata track in the media file according to an eighth sample entry type, the three-dimensional viewing space handling timed metadata track indicating a dynamically changing three-dimensional viewing space handling manner of the three-dimensional viewing space.
Specifically, the processing information of the three-dimensional viewing space includes at least one of the following: the number of options of the three-dimensional viewing space handling manner, the device type for three-dimensional viewing space handling, the application type for three-dimensional viewing space handling, the three-dimensional viewing space handling manner, and the identifier of the three-dimensional viewing space.
Specifically, acquiring the immersive media data corresponding to the current viewing posture of the user includes: directly acquiring the immersive media data when it is determined that the user's current viewing posture is within the three-dimensional viewing space; or acquiring the immersive media data corresponding to the current viewing posture according to the processing information of the three-dimensional viewing space when it is determined that the user's current viewing posture is moving or has moved outside the three-dimensional viewing space.
Specifically, acquiring the immersive media data includes: determining a media presentation description file, wherein the media presentation description file includes a three-dimensional viewing space descriptor and/or a three-dimensional viewing space handling descriptor for viewing the immersive media; and requesting the immersive media data corresponding to the current viewing posture according to the three-dimensional viewing space descriptor and/or the three-dimensional viewing space handling descriptor corresponding to the user's current viewing posture.
Specifically, the three-dimensional viewing space descriptor of the immersive media includes at least one of the following: the basic geometry structure, the rotation direction of the basic geometry, the view or capture device identifier or index corresponding to the basic geometry, the rotation of the basic geometry, the viewing direction in the basic geometry, the combination manner of basic geometries, the combination manner of complex combined geometries, and the three-dimensional viewing space identifier.
Specifically, the three-dimensional viewing space handling descriptor of the immersive media includes at least one of the following: the device type for three-dimensional viewing space handling, the application type for three-dimensional viewing space handling, the three-dimensional viewing space handling manner, and the identifier of the three-dimensional viewing space. Through the above steps, the problem that the rendering and presentation of the visual content viewed by the user are affected is solved, and fast, efficient and high-quality reconstruction and rendering of visual content during viewing is achieved, so as to satisfy an optimal immersive experience.
The present disclosure can be applied to an immersive experience process. A user watches a scene in the three-dimensional viewing space by wearing a display device. The user can move in the real space, which is mapped to the movement of the user's virtual character in the three-dimensional viewing space, and the immersive media data captured by the corresponding virtual character is played on the display device. In this process, if the user's current viewing posture in the three-dimensional viewing space goes beyond the three-dimensional viewing space, the immersive media data is rendered according to the processing information of the three-dimensional viewing space.
The three-dimensional viewing space refers to the range within which the user's virtual character can move in the immersive media scene. The three-dimensional viewing posture refers to the position and viewing direction (orientation of the virtual camera) of the user's corresponding virtual character in the three-dimensional viewing space. If the virtual character goes beyond the range of the three-dimensional viewing space, problems such as a black screen may occur in the related art. In the present disclosure, if the user's virtual character goes beyond the range of the three-dimensional viewing space, the immersive media data can be rendered according to the processing information of the three-dimensional viewing space, thereby achieving fast, efficient and high-quality reconstruction and rendering of visual content during viewing and solving the black-screen problem.
How to construct the three-dimensional viewing space and how to acquire the immersive media data are described below with reference to specific examples.
FIG. 3 is a schematic diagram of an optional position of a three-dimensional viewing space structure. In the example scene, immersive video capture devices (cameras) are placed at different positions to capture the video data of the scene; based on the positions of the capture devices, one or more three-dimensional viewing spaces can be formed. The three-dimensional viewing space may be a basic geometry represented by a single cuboid, sphere, cylinder, etc., or a complex geometry combined from multiple basic geometries. When the user moves within the three-dimensional viewing space, the display device can acquire the media data corresponding to the user's window, reconstruct and render the image within the window, and quickly switch the content presented in the user's window; when the user moves outside the range of the three-dimensional viewing space, because there is no media data captured by a corresponding device, the display device cannot display the corresponding visual content in the user's window, or can only display low-quality visual content. Three-dimensional viewing spaces 1-3 in FIG. 3 illustrate different cases.
Meanwhile, in the three-dimensional viewing space, the user's viewing direction may also be restricted by the position and orientation of the capture devices.
FIG. 4 is a schematic structural diagram of an optional system for processing immersive media data.
As shown in FIG. 4, the system includes a transmission server 402 and a user terminal 404, wherein:
the transmission server 402 includes at least a storage module 402-1 and a transmission module 402-2;
the storage module 402-1 is configured to store the media files produced in the production server 10;
the transmission module 402-2 is configured to receive request messages of the user terminal, or to send stored media files; the above receiving or sending can be implemented through a wireless network provided by a communication provider, a locally organized wireless local area network, or a wired connection;
the user terminal 404 includes at least a transmission module 404-1, a decoding and decapsulation module 404-2, a media processing module 404-3 and a display module 404-4;
the transmission module 404-1 is configured to receive the media files sent by the media server 402, or to send request messages to the server 402, such as requesting a media file download;
the decoding and decapsulation module 404-2 is configured to decode and decapsulate the media files received by the transmission module;
the media processing module 404-3 reconstructs, renders and otherwise processes the media data output by the decoding and decapsulation module 404-2 according to information such as the user's current viewing position and viewing direction;
the display module 404-4 is configured to present the visual content of the user's current window to the user.
This embodiment provides a processing/playing flow of a media system based on a three-dimensional viewing space, and a representation of the file format. As shown in FIG. 5, the flow includes the following steps:
Step S501: the user's viewing posture (viewing position, viewing direction) changes, and the user terminal acquires the corresponding media file based on the range of the three-dimensional viewing space to which the user's current posture information belongs, the media file including the immersive media data;
Step S502: the user terminal decapsulates and decodes the acquired media file, reconstructs the visual content in the current window according to information such as the user's current viewing position and viewing direction, and after rendering presents the content to the user for viewing.
It should be noted that, in the embodiments of the present disclosure, one implementation stores the omnidirectional video data in a file based on the ISO (International Organization for Standardization) base media file format. ISO base media file format structures such as the restricted scheme information box, the track reference box and the track group box can be operated with reference to MPEG-4 Part 12 ISO Base Media File Format, formulated by the ISO/IEC JTC1/SC29/WG11 Moving Picture Experts Group (MPEG). The projection and encapsulation steps of omnidirectional video and their basic format can be operated with reference to MPEG-I Part 2 OMAF (Omnidirectional Media Format), formulated by the ISO/IEC JTC1/SC29/WG11 Moving Picture Experts Group (MPEG).
In addition, all data in the ISO base file format is contained in boxes; that is, the ISO base file format represented by MP4 files consists of several boxes, each of which has a type and a length and can be regarded as a data object. One box can contain another box; such a box is called a container box. An MP4 file first has one and only one box of type "ftyp", which serves as the flag of the file format and contains some information about the file. There is then one and only one box of type "MOOV" (Movie Box), which is a container box whose sub-boxes contain the metadata information of the media. The media data of the MP4 file is contained in boxes of type "mdat" (Media Data Box), which are also container boxes; there may be multiple of them, or none (when the media data entirely references other files), and the structure of the media data is described by the metadata. To further support metadata description of the media, a box of type "meta" (Meta box), also a container box, is optionally used to describe some general or additional non-timed metadata. One piece of media may consist of one or more tracks, each track being a media sequence changing over time, and one track contains a continuous set of samples.
Furthermore, the timed metadata track is a mechanism in the ISO Base Media File Format (ISOBMFF) for establishing timed metadata associated with specific samples. Timed metadata is less coupled with the media data and is usually "descriptive".
Specifically, a data box (Box) or sample entry is defined in the media file to describe the three-dimensional viewing space of the scene. Three-dimensional viewing spaces that are unchanged, infrequently changing or frequently changing can be contained in different levels, different media tracks or different containers, and the three-dimensional viewing space can be described at the file level or at the media track level. Meanwhile, one or more independent geometries representing three-dimensional viewing spaces may exist in a scene.
Optionally, the three-dimensional viewing space may be specifically described in one of the following manners:
Manner 1: The three-dimensional viewing space is described at the file level, and the defined three-dimensional viewing space data box (ViewingSpaceBox) is contained in the file-level MetaBox (shown in FIG. 6). The three-dimensional viewing space data box is described below in conjunction with an optional implementation.
ViewingSpaceBox (three-dimensional viewing space data box)
Box Type:'vwsp'
Container:MetaBox
Mandatory:No
Quantity:Zero or one
Syntax
aligned(8) class ViewingSpaceBox extends FullBox('vwsp', 0, flags) {
    signed int(8) num_viewing_space;  // optional; may be omitted when, for example, a media track or media track grouping has only one viewing space
}
(The remainder of the syntax is carried as an image, Figure PCTCN2021098689-appb-000001, in the original publication and is not reproduced here; the closing brace above is a completion added for readability.)
Semantics:
num_viewing_space indicates the number of three-dimensional viewing spaces corresponding to the media file;
viewing_space_id indicates the three-dimensional viewing space identifier;
Specifically, the structure of the three-dimensional viewing space is represented by ViewingSpaceStruct(). The spatial structure is constructed by combining one or more complex geometries, usually combined in a CSG (Constructive Solid Geometry) manner; a complex geometry is obtained by the ordered addition of basic geometries (cuboids, cylinders, etc.), usually in a CSG manner or by sequential interpolation.
Specifically, the basic geometry is a common shape such as a cuboid, a spheroid or a halfspace. The basic geometry structures are described below in conjunction with an optional implementation, where CuboidStruct(), SpheroidStruct() and HalfspaceStruct() represent the cuboid, the spheroid and the halfspace respectively.
(The syntax of these structures is carried as images, Figures PCTCN2021098689-appb-000002 and PCTCN2021098689-appb-000003, in the original publication and is not reproduced here.)
Semantics
center_x, center_y and center_z respectively indicate the position of the center point of the geometric structure in the coordinate system;
size_x, size_y and size_z respectively indicate the edge lengths of the cuboid in the x, y and z directions;
radius_x, radius_y and radius_z respectively indicate the radii of the spheroid in the x, y and z dimensions;
normal_x, normal_y and normal_z respectively indicate the normal of the plane defining the halfspace;
camera_inferred_flag, specified in SpaceShapeStruct(), indicates whether the position of this simple geometry corresponds to a capture device; 0 indicates that it is unrelated to the capture device position and the center point position needs to be defined explicitly, and 1 indicates that it is related to the capture device, whose position information may be used;
distance indicates the distance from the origin along the normal vector to the plane of the halfspace.
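Purely as an illustration of the semantics above, minimal sketches of the three basic geometry structures follow. The original syntax is carried in figures not reproduced here, so the bit widths and exact layout are assumptions, while the field names are from this document.

aligned(8) class CuboidStruct() {
    if (!camera_inferred_flag) {       // center given explicitly only when not inferred from a camera
        signed int(32) center_x; signed int(32) center_y; signed int(32) center_z;
    }
    unsigned int(32) size_x; unsigned int(32) size_y; unsigned int(32) size_z;   // edge lengths
}
aligned(8) class SpheroidStruct() {
    if (!camera_inferred_flag) {
        signed int(32) center_x; signed int(32) center_y; signed int(32) center_z;
    }
    unsigned int(32) radius_x; unsigned int(32) radius_y; unsigned int(32) radius_z;  // radii
}
aligned(8) class HalfspaceStruct() {
    signed int(32) normal_x; signed int(32) normal_y; signed int(32) normal_z;   // plane normal
    signed int(32) distance;   // distance from the origin along the normal to the plane
}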
Specifically, the complex geometry is described by the shape space structure (SpaceShapeStruct), constructed from one or more basic geometries. An optional implementation of the three-dimensional viewing space structure is described below.
Syntax
(The syntax is carried as images, Figures PCTCN2021098689-appb-000004 and PCTCN2021098689-appb-000005, in the original publication and is not reproduced here.)
Semantics
num_primitive_shape indicates the number of basic geometries composing the three-dimensional viewing space;
primitive_shape_operation indicates the operation applied to the basic geometry shapes composing the three-dimensional viewing space; when it is 0, it indicates that the basic geometries are added in CSG mode to form a complex geometry; when it is 1, it indicates that the basic geometries are interpolated along a path formed through their centers to form a complex geometry;
camera_inferred_flag, when 1, indicates that the position and orientation of the basic geometry correspond to a capture device, where the capture device corresponds to a viewpoint index; when 0, it indicates that the position and orientation of the basic geometry do not correspond to a capture device;
viewing_space_shape_type indicates the basic geometry shape of the three-dimensional viewing space; the specific shape types are described in the following table;
(The shape type table is carried as images, Figures PCTCN2021098689-appb-000006 and PCTCN2021098689-appb-000007, in the original publication and is not reproduced here.)
distance_scale indicates the scale of the border distance dimensions of the basic geometry;
view_id indicates the identifier of the viewpoint corresponding to the camera corresponding to the basic geometry; through this identifier, the media track containing the media data corresponding to the viewpoint can be located.
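To make the above semantics concrete, an assumed sketch of SpaceShapeStruct() follows. The shape-specific branches reuse the basic structures sketched earlier; the bit widths, the branch layout and the shape-type values are guesses, since the original syntax and the shape type table are not reproduced here.

aligned(8) class SpaceShapeStruct() {
    unsigned int(8) num_primitive_shape;            // number of basic geometries
    unsigned int(8) primitive_shape_operation;      // 0: CSG addition; 1: interpolation along centers
    for (i = 0; i < num_primitive_shape; i++) {
        unsigned int(1) camera_inferred_flag;       // 1: position/orientation follow a capture device
        bit(7) reserved;
        unsigned int(8) viewing_space_shape_type;   // basic geometry shape (see table)
        unsigned int(8) distance_scale;             // scale of border distance dimensions
        if (camera_inferred_flag)
            unsigned int(16) view_id;               // viewpoint of the corresponding camera
        if (viewing_space_shape_type == 0) CuboidStruct();        // shape-type values are assumptions
        else if (viewing_space_shape_type == 1) SpheroidStruct();
        else if (viewing_space_shape_type == 2) HalfspaceStruct();
    }
}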
Specifically, the three-dimensional viewing space structure is described by the three-dimensional viewing space structure (ViewingSpaceStruct), which is constructed by combining one or more basic complex geometries. An optional implementation of the three-dimensional viewing space structure is described below.
Syntax
(The syntax is carried as an image, Figure PCTCN2021098689-appb-000008, in the original publication and is not reproduced here.)
num_shape_space indicates the number of complex geometries needed to compose one three-dimensional viewing space structure;
operation_type indicates the CSG operation by which this geometry is combined into the three-dimensional viewing space, as shown in the following table:
(The operation type table is carried as images, Figures PCTCN2021098689-appb-000009 and PCTCN2021098689-appb-000010, in the original publication and is not reproduced here.)
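The following sketch of ViewingSpaceStruct() is an assumption consistent with these semantics; the operation_type values come from a table carried in a figure that is not reproduced here, and the bit widths are guesses.

aligned(8) class ViewingSpaceStruct() {
    unsigned int(8) num_shape_space;        // number of complex geometries in this viewing space
    for (i = 0; i < num_shape_space; i++) {
        unsigned int(8) operation_type;     // CSG operation combining this geometry (see table)
        SpaceShapeStruct();                 // complex geometry defined above
    }
}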
Manner 2: The three-dimensional viewing space is described at the file level. There may be multiple media spaces in one media file, and the media data of one or more media tracks (such as multi-viewpoint data or point cloud data) corresponds to one three-dimensional viewing space; that is, the user can watch, within the three-dimensional viewing space, the visual content rendered from the corresponding media data. The media tracks are grouped according to the three-dimensional viewing space, and all media tracks of the same group belong to the same three-dimensional viewing space. The entity grouping mechanism in ISO BMFF is used to describe the three-dimensional viewing space corresponding to a media track group (shown in FIG. 7).
Specifically, the three-dimensional viewing space is described by extending EntityToGroupBox, contained in the GroupsListBox under the file-level MetaBox, where grouping_type is 'vspg'. ViewingSpaceGroupBox is described below in conjunction with an optional implementation.
(The syntax is carried as an image, Figure PCTCN2021098689-appb-000011, in the original publication and is not reproduced here.)
Manner 3: The three-dimensional viewing space is described at the track level. One media track may correspond to one or more three-dimensional viewing spaces, and the three-dimensional viewing space is described in the SampleEntry of the media track (shown in FIG. 8). This is described below in conjunction with optional embodiments.
ViewingSpaceBox (three-dimensional viewing space data box)
Box Type:'vwsp'
Container:Sample Entry
Mandatory:No
Quantity:Zero or one
Syntax
(The syntax is carried as images, Figures PCTCN2021098689-appb-000012 and PCTCN2021098689-appb-000013, in the original publication and is not reproduced here.)
Specifically, the media track grouping may group the media tracks in which the atlas data is located, and the information of the three-dimensional viewing space corresponding to the media tracks is described in the sample entry of the media track in which the atlas data is located.
Optionally, based on a specific application or specific media data, the sample entry in the corresponding media track is extended; for example, the media file entry for volumetric video data is VolumetricVisualSampleEntry, and the three-dimensional viewing space is described as an SEI message in the configuration data box (VPCCConfigurationBox).
Manner 4: The three-dimensional viewing space is described at the track level. One media track may correspond to one or more three-dimensional viewing spaces, or the structure of the three-dimensional viewing space may change at a low frequency during playback; sample grouping is then used to describe the three-dimensional viewing space that each sample may correspond to, with one sample group corresponding to one three-dimensional viewing space, where grouping_type is 'vssg'. This is described below in conjunction with optional embodiments.
(The syntax is carried as an image, Figure PCTCN2021098689-appb-000014, in the original publication and is not reproduced here.)
Optionally, based on a specific application or specific media data, the sample grouping in the corresponding media track is extended; for example, based on the sample grouping of type 'vaps' for volumetric video data, the three-dimensional viewing space is described as an SEI message in the sample group entry (V3CAtlasParamSampleGroupDescriptionEntry).
Manner 5: The three-dimensional viewing space is described at the track level. There may be multiple media spaces in one media file, and the media data of one or more media tracks (such as multi-viewpoint data or point cloud data) corresponds to one three-dimensional viewing space; that is, the user can watch, within the three-dimensional viewing space, the visual content rendered from the corresponding media data. By defining a track group whose track_group_type is 'vptg', the media tracks are grouped by three-dimensional viewing space (shown in FIG. 9), and all media tracks of the same group (containing the same track_group_id) belong to the same three-dimensional viewing space. The grouping type of the group list box GroupListBox is 'vspg'.
Specifically, each group contains one and only one base track; for example, the media track containing the basic view (base view) in the multi-viewpoint media data is the base track, and the three-dimensional viewing space corresponding to the media track group is described in the sample entry of the base track. This is described below in conjunction with optional embodiments.
ViewingSpaceBox (three-dimensional viewing space data box)
Box Type:'vwsp'
Container:Sample Entry
Mandatory:No
Quantity:Zero or one
Syntax
(The syntax is carried as an image, Figure PCTCN2021098689-appb-000015, in the original publication and is not reproduced here.)
Manner 6: A media track is defined in the media file to describe or store various parameter information of the media; the information description of the NAL units is carried through the samples, and the SEI message describing the three-dimensional viewing space at the bitstream layer is stored in the corresponding samples.
Over time, the three-dimensional viewing space changes dynamically because the positions of the capture devices change, or because of a director-recommended three-dimensional viewing space, etc. This embodiment provides a representation method based on variable three-dimensional viewing space information, as shown in FIG. 10.
Specifically, the three-dimensional viewing space is specified in a timed metadata track, and the sample entry (track sample entry) of the three-dimensional viewing space timed metadata track is identified by the 'vrsp' type. An optional implementation of the three-dimensional viewing space structure is described below.
Syntax
(The syntax is carried as an image, Figure PCTCN2021098689-appb-000016, in the original publication and is not reproduced here.)
Optionally, each sample corresponds to one or more three-dimensional viewing spaces, and the three-dimensional viewing space information of each sample is provided by VRSpaceSample();
(The syntax is carried as an image, Figure PCTCN2021098689-appb-000017, in the original publication and is not reproduced here.)
static_vr_space_flag, when 1, indicates that the three-dimensional viewing space is defined in the samples of this sample entry; when 0, indicates that the three-dimensional viewing space of all subsequent samples remains unchanged and this sample entry is referenced.
num_viewing_space indicates the number of three-dimensional viewing spaces;
viewing_space_id indicates the identifier of the three-dimensional viewing space;
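A non-normative sketch of the timed metadata sample entry and sample under these semantics follows; only the 'vrsp' type and the field names are from this document, while the base class, bit widths and layout are assumed.

aligned(8) class VRSpaceSampleEntry() extends MetadataSampleEntry('vrsp') {
    unsigned int(1) static_vr_space_flag;   // see the semantics above
    bit(7) reserved;
    unsigned int(8) num_viewing_space;
    for (i = 0; i < num_viewing_space; i++) {
        unsigned int(8) viewing_space_id;
        ViewingSpaceStruct();               // defined earlier in this document
    }
}
aligned(8) class VRSpaceSample() {
    unsigned int(8) num_viewing_space;      // viewing spaces described by this sample
    for (i = 0; i < num_viewing_space; i++) {
        unsigned int(8) viewing_space_id;
        ViewingSpaceStruct();
    }
}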
Optionally, if one timed metadata track can describe the dynamic changes of only one three-dimensional viewing space, then when a media file corresponds to multiple three-dimensional viewing spaces, there may be multiple timed metadata tracks to describe the changes of the multiple three-dimensional viewing spaces.
Optionally, the dynamic change of the three-dimensional viewing space is related to the dynamic change of the capture devices, and the sample entry (CameraInfoSampleEntry) and samples in the timed metadata track describing the dynamic information of the cameras may be extended.
In addition, a dynamic viewpoint timed metadata track may reference a track or track group through a track reference box (Track Reference Box) with reference type 'cdsc'.
Because of the orientation and position at which the media data capture devices are placed in space, viewing problems may exist; the viewing direction or viewing range in the three-dimensional viewing space is described by extending ViewingSpaceStruct.
Because of the position and orientation at which the media data capture devices are placed in space, the basic geometry of the three-dimensional viewing space corresponding to a capture device rotates in the coordinate system; for example, the edges of a cuboid are not parallel to the coordinate axes.
Specifically, how the basic geometry rotates in the coordinate system is described by the rotation structure (ShapeRotationStruct). An optional implementation of the three-dimensional viewing space structure is described below.
(The syntax is carried as an image, Figure PCTCN2021098689-appb-000018, in the original publication and is not reproduced here.)
Semantics
shape_rotation_x, shape_rotation_y and shape_rotation_z respectively indicate the x, y and z components of the rotation quaternion for the basic geometry.
Optionally, the rotation direction of the basic geometry may also be indicated by Euler-angle rotation; that is, shape_rotation_x, shape_rotation_y and shape_rotation_z respectively indicate the rotation angles about the x, y and z coordinate axes.
Because of the position and orientation at which the media data capture devices are placed in space, the objects in the scene may not be captured from all angles. When the user moves freely while watching the video, the position range within which the user can move and the viewing directions into which the user can rotate may be restricted; that is, when the visual content in the current window is reconstructed after the user moves, if the viewing direction and viewing range exceed the restrictions, the visual content in the window cannot be effectively reconstructed and rendered, causing the content to fade out, the sense of immersion to be lost, and so on.
Specifically, the viewing direction or viewing range structure is described by the viewing direction constraint structure (ViewingDirectionConstrainStruct). An optional implementation of the three-dimensional viewing space structure based on the viewing direction and viewing range is described below.
(The syntax is carried as images, Figures PCTCN2021098689-appb-000019 and PCTCN2021098689-appb-000020, in the original publication and is not reproduced here.)
Semantics
viewing_direction_center_x, viewing_direction_center_y and viewing_direction_center_z respectively indicate the quaternion components x, y and z of the center of the recommended viewing direction in the basic geometry;
viewing_direction_yaw_range and viewing_direction_pitch_range respectively indicate half of the yaw range and half of the pitch range of the recommended viewing direction in the basic geometry;
guard_band_present_flag, 1 indicates that the basic geometry has a guard band, and 0 indicates that the basic geometry has no guard band;
guard_band_direction_size indicates the size of the guard band for the viewing direction within the basic geometry, expressed in degrees.
Specifically, when the viewing direction constraint and the geometry rotation are considered, SpaceShapeStruct() needs to be extended. An optional implementation of the three-dimensional viewing space structure is described below.
(The syntax is carried as images, Figures PCTCN2021098689-appb-000021 and PCTCN2021098689-appb-000022, in the original publication and is not reproduced here.)
Specifically, the three-dimensional viewing space structure is described by the three-dimensional viewing space structure (ViewingSpaceStruct), which is constructed by combining one or more basic complex geometries. An optional implementation of the three-dimensional viewing space structure is described below.
Syntax
(The syntax is carried as an image, Figure PCTCN2021098689-appb-000023, in the original publication and is not reproduced here.)
As for the handling of a viewer moving into or out of the three-dimensional viewing space during viewing, different operation types may be adopted based on different device types and application types; for example, the director recommends the rendering operation on the scene when the user moves out of the three-dimensional viewing space. This embodiment provides a representation method based on a three-dimensional viewing space rendering processing flow and file format. The flow includes the following steps:
Step S1: the user's viewing posture (viewing position, viewing direction) changes, and the user terminal acquires the corresponding media file based on the user's posture change trajectory, the range of the three-dimensional viewing space, and the handling manner corresponding to the three-dimensional viewing space;
Step S2: the user terminal decodes and decapsulates the acquired media file, reconstructs the visual content in the current window according to information such as the user's current viewing posture and the handling manner corresponding to the three-dimensional viewing space, and after rendering presents the content to the user for viewing.
Specifically, when the user's viewing posture is within the three-dimensional viewing space, by default the corresponding media file can be acquired directly; when the user's viewing posture is moving outside or is already outside the three-dimensional viewing space, the required media files are determined and acquired according to the three-dimensional viewing space handling manner corresponding to the scene.
Specifically, the information related to the three-dimensional viewing space handling manner may be described at the file level or at the track level. The handling manner of the three-dimensional viewing space is directly related to the range of the three-dimensional viewing space; the device types, application types or viewing handling manners supported by different three-dimensional viewing spaces may differ.
Optionally, the three-dimensional viewing space handling manner is described at the file level in one of the following optional manners:
Manner 1: The defined three-dimensional viewing space handling data box (ViewingSpaceHandlingBox) is contained in the file-level MetaBox, and may be contained in the same data box container as the three-dimensional viewing space data box (ViewingSpaceBox).
Specifically, the handling manner based on the three-dimensional viewing space is described by defining a ViewingSpaceHandlingBox (three-dimensional viewing space handling data box) of the 'vsph' type, as described below in conjunction with optional implementations.
(The syntax is carried as images, Figures PCTCN2021098689-appb-000024 and PCTCN2021098689-appb-000025, in the original publication and is not reproduced here.)
The specific semantics are as follows:
num_viewing_space indicates the number of three-dimensional viewing spaces in the scene;
viewing_space_id indicates the identifier of the three-dimensional viewing space to which this handling manner applies;
num_handling_options indicates the number of options of the three-dimensional viewing space handling manner; when it is 0, no three-dimensional viewing space handling manner is provided, and the target device may select an appropriate handling manner according to the three-dimensional viewing space information;
handling_device_class indicates the value of the device type for three-dimensional viewing space handling (the description corresponding to each specific value is shown in the following table) (all devices support 6DOF position tracking, and sometimes the user's movement is related to playback); within the same three-dimensional viewing space, a conformant bitstream should not contain duplicate values; if the type value is 0, the value of i+1 should be num_handling_options;
(The device type table is carried as an image, Figure PCTCN2021098689-appb-000026, in the original publication and is not reproduced here.)
handling_application_class indicates the value of the application type for three-dimensional viewing space handling (the description corresponding to each specific value is shown in the following table); within the same three-dimensional viewing space, a conformant bitstream should not contain duplicate values; if the type value is 0, the value of i+1 should be num_handling_options;
(The application type table is carried as an image, Figure PCTCN2021098689-appb-000027, in the original publication and is not reproduced here.)
handling_method indicates the value of the three-dimensional viewing space handling manner (the description corresponding to each specific value is shown in the following table); within the same three-dimensional viewing space, a conformant bitstream should not contain duplicate values; if the type value is 0, the value of i+1 should be num_handling_options;
(The handling manner table is carried as images, Figures PCTCN2021098689-appb-000028 and PCTCN2021098689-appb-000029, in the original publication and is not reproduced here.)
Manner 2: There may be multiple media spaces in one media file. The media tracks are grouped according to the three-dimensional viewing space, and all media tracks of the same group belong to the same three-dimensional viewing space. The entity grouping mechanism in ISO BMFF is used to describe the three-dimensional viewing space handling manner corresponding to a media track group.
Specifically, the three-dimensional viewing space and its corresponding three-dimensional viewing space handling manner are described by extending EntityToGroupBox, contained in the GroupsListBox under the file-level MetaBox, where grouping_type is 'vspg'. ViewingSpaceGroupBox is described below in conjunction with an optional implementation.
(The syntax is carried as an image, Figure PCTCN2021098689-appb-000030, in the original publication and is not reproduced here.)
Optionally, the three-dimensional viewing space handling manner is described at the track level in one of the following optional manners:
Manner 1: One media track may correspond to one or more three-dimensional viewing spaces, and the three-dimensional viewing space handling manner information is described in the SampleEntry of the media track; that is, ViewingSpaceHandlingBox() is contained in the SampleEntry, and its quantity is 0 or 1.
Manner 2: The three-dimensional viewing space is described at the track level. One media track may correspond to one or more three-dimensional viewing spaces, or the position and structure of the three-dimensional viewing space may change at a low frequency during playback; sample grouping is then used to describe the three-dimensional viewing space that each sample may correspond to and the handling manner of that three-dimensional viewing space, with one sample group corresponding to one three-dimensional viewing space, where grouping_type is 'vssg'. This is described below in conjunction with optional embodiments.
(The syntax is carried as an image, Figure PCTCN2021098689-appb-000031, in the original publication and is not reproduced here.)
Optionally, based on a specific application or specific media data, the sample grouping in the corresponding media track is extended; for example, based on the sample grouping of type 'vaps' for volumetric video data, the three-dimensional viewing space handling manner is described as an SEI message in the sample group entry (V3CAtlasParamSampleGroupDescriptionEntry).
Manner 3: There may be multiple media spaces in one media file, and the media data of one or more media tracks (such as multi-viewpoint data or point cloud data) corresponds to one three-dimensional viewing space. By defining a track group whose track_group_type is 'vptg', the media tracks are grouped based on the three-dimensional viewing space, and all media tracks of the same group (containing the same track_group_id) belong to the same three-dimensional viewing space.
Specifically, the media track grouping may group the media tracks in which the atlas data is located, and the media handling manner of the three-dimensional viewing space corresponding to the media tracks is described in the sample entry of the media track in which the atlas data is located.
Specifically, if each group contains one and only one base track, for example the media track containing the basic view (base view) in the multi-viewpoint media data is the base track, then the three-dimensional viewing space corresponding to the media track group is described in the sample entry of the base track.
Manner 4: Among the media tracks, there may be a media track that stores only the parameter information of the media. With the samples carrying the information description of the NAL units, the description metadata of the three-dimensional viewing space handling manner can be stored as an SEI message in the corresponding samples.
When the three-dimensional viewing space keeps changing because the positions of the capture devices change, the handling manner of the three-dimensional viewing space may also keep changing, or it may change because of the director's plot arrangement and other circumstances; the metadata of the changing three-dimensional viewing space handling manner is then described in a timed metadata track.
Specifically, a three-dimensional viewing space corresponds to a three-dimensional viewing space handling manner, and the handling manner of the corresponding three-dimensional viewing space is described in the three-dimensional viewing space timed metadata track.
Specifically, VRSpaceSampleEntry() and its VRSpaceSample() are extended, as described below in conjunction with optional implementations. The timed metadata track of three-dimensional viewing space handling is determined at the sample entry, and VRSpaceSampleEntry() is extended by one of the following methods:
Syntax
(The syntax of Method 1 is carried as an image, Figure PCTCN2021098689-appb-000032, in the original publication and is not reproduced here.)
Method 2: VRSpaceSampleEntry() is extended to VRSpacehandlingEntry(); three-dimensional viewing space handling metadata is defined on the basis of the three-dimensional viewing space defined in VRSpaceSampleEntry(), and the sample entry (track sample entry) of the three-dimensional viewing space timed metadata track is identified by the 'vrsh' type.
Syntax:
(The syntax is carried as an image, Figure PCTCN2021098689-appb-000033, in the original publication and is not reproduced here.)
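An assumed sketch of this extension follows; only the 'vrsh' type and the names VRSpaceSampleEntry()/VRSpacehandlingEntry() are from this document, and the body, which mirrors the fields of ViewingSpaceHandlingBox, is a guess.

aligned(8) class VRSpacehandlingEntry() extends VRSpaceSampleEntry('vrsh') {
    // handling metadata defined on top of the viewing spaces declared in VRSpaceSampleEntry()
    unsigned int(8) num_handling_options;
    for (i = 0; i < num_handling_options; i++) {
        unsigned int(8) handling_device_class;
        unsigned int(8) handling_application_class;
        unsigned int(8) handling_method;
    }
}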
Optionally, each sample of the timed metadata media track describes one or more three-dimensional viewing spaces, and the metadata of three-dimensional viewing space handling is described in each sample. VRSpaceSample() is extended by one of the following methods:
Method 1: the metadata of the three-dimensional viewing space handling corresponding to the three-dimensional viewing space is described directly in VRSpaceSample().
Syntax:
(The syntax is carried as images, Figures PCTCN2021098689-appb-000034 and PCTCN2021098689-appb-000035, in the original publication and is not reproduced here.)
Method 2: VRSpaceSample() is extended to describe the metadata of the three-dimensional viewing space handling corresponding to the three-dimensional viewing space.
Syntax:
(The syntax is carried as an image, Figure PCTCN2021098689-appb-000036, in the original publication and is not reproduced here.)
Semantics
space_handling_flag, when 1, indicates that the handling of the three-dimensional viewing space is defined in the samples of this sample entry; when 0, indicates that the three-dimensional viewing space handling manner of all subsequent samples remains unchanged and this sample entry is referenced;
viewing_space_idx indicates the viewing space identifier index;
handling_present, 0 indicates that the three-dimensional viewing space is handled in the default handling manner, and 1 indicates the handling manner corresponding to the current three-dimensional viewing space.
Optionally, if one timed metadata track can describe the dynamic changes of only one three-dimensional viewing space, then when a media file corresponds to multiple three-dimensional viewing spaces, there may be multiple timed metadata tracks to describe the changes of the multiple three-dimensional viewing spaces; one three-dimensional viewing space timed metadata track then corresponds to one handling manner of that three-dimensional viewing space.
Optionally, when the position and structure of the three-dimensional viewing space change but the corresponding viewing_space_id does not change, and the intrinsic parameters of the capture devices corresponding to the three-dimensional viewing space do not change, the handling manner of the three-dimensional viewing space may be unchanged. An independent timed metadata track can then be used to describe the dynamic three-dimensional viewing space handling manner. This is described below in conjunction with optional implementations.
Syntax
(The syntax is carried as an image, Figure PCTCN2021098689-appb-000037, in the original publication and is not reproduced here.)
Optionally, each sample corresponds to one three-dimensional viewing space, and the information about the handling manner of the sample's three-dimensional viewing space is provided by VRSpaceHandlingSample();
(The syntax is carried as images, Figures PCTCN2021098689-appb-000038 and PCTCN2021098689-appb-000039, in the original publication and is not reproduced here.)
This embodiment provides a method for a user terminal to acquire the media data corresponding to the user's current posture (shown in FIG. 11). The specific steps are as follows:
S1102: the user's viewing posture (viewing position, viewing direction) changes; the user terminal determines, based on the defined three-dimensional viewing space, whether the user's latest viewing posture is within the three-dimensional viewing space, moving outside the three-dimensional viewing space, or already outside the three-dimensional viewing space, and determines whether the corresponding media data exists locally according to the position of the user's viewing posture relative to the three-dimensional viewing space and the handling manner corresponding to the three-dimensional viewing space;
S1104: determining whether media data corresponding to the user's current viewing posture exists locally;
S1106: if not, looking up, in the MPD file, the location of the media file corresponding to the user's current posture, and requesting and downloading the corresponding media data from the server; if yes, proceeding directly to step S1108;
S1108: the user terminal reconstructs and renders the visual content within the user's current window based on the media data.
Specifically, in the media presentation description (MPD) of dynamic adaptive streaming over HTTP (DASH), the ViewingSpace element whose scheme identifier attribute (@schemeIdUri) is equal to "urn:mpeg:mpegI:miv:2020:vwsp" is called the three-dimensional viewing space (VWSP) descriptor, through which the corresponding media data is requested and downloaded.
The three-dimensional viewing space is indicated by the VWSP descriptor at the MPD level or at the adaptation set (Adaptation Set) level. The VWSP descriptor cannot appear at two levels at the same time; if it is expressed at the lower level, it is reused at the higher level. If no VWSP descriptor appears, the viewing position may not be movable, or the viewing position may be moved arbitrarily in space.
As shown in Table 1 below, a table is provided describing the semantics of the attributes of the elements of the three-dimensional viewing space descriptor.
Table 1
(The table contents are carried as images, Figures PCTCN2021098689-appb-000040 and PCTCN2021098689-appb-000041, in the original publication and are not reproduced here.)
Optionally, for different terminal devices and applications, the three-dimensional viewing space has different handling manners for the media files. Because the handling manners differ, the media files that need to be selected and acquired may also differ; for example, when the user's viewing posture changes (moving out of the three-dimensional viewing space), the media file corresponding to the next three-dimensional viewing space needs to be acquired. When acquiring the media data corresponding to the three-dimensional viewing space, a suitable media file can then be selected according to the three-dimensional viewing space handling information in the MPD, and the corresponding visual content can be rendered according to the specific handling manner.
Specifically, in the media presentation description (MPD) of dynamic adaptive streaming over HTTP (DASH), the ViewingSpaceHandling element whose scheme identifier attribute (@schemeIdUri) is equal to "urn:mpeg:mpegI:miv:2020:vwph" is called the three-dimensional viewing space handling (VWPH) descriptor, through which the corresponding media data is requested and downloaded.
The handling manner corresponding to the three-dimensional viewing space is indicated by the VWPH descriptor at the MPD level or at the adaptation set (Adaptation Set) level. The VWPH descriptor cannot appear at two levels at the same time; if it is expressed at the lower level, it is reused at the higher level.
As shown in Table 2 below, a table is provided describing the semantics of the attributes of the elements of the three-dimensional viewing space handling (VWPH) descriptor.
Table 2
(The table contents are carried as an image, Figure PCTCN2021098689-appb-000042, in the original publication and are not reproduced here.)
Through the description of the above implementations, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present disclosure, in essence or in the part contributing to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disc) and includes several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present disclosure.
In this embodiment, an apparatus for processing immersive media data is also provided. The apparatus is configured to implement the above embodiments and preferred implementations, and what has been described is not repeated. As used below, the term "module" can be a combination of software and/or hardware implementing predetermined functions. Although the apparatus described in the following embodiments is preferably implemented by software, implementation by hardware, or by a combination of software and hardware, is also possible and conceived.
FIG. 12 is a structural block diagram of an apparatus for processing immersive media data according to an embodiment of the present disclosure. As shown in FIG. 12, the apparatus includes:
an acquiring unit 1202 configured to acquire immersive media data corresponding to the current viewing posture of the user, wherein a three-dimensional viewing space is used to limit the content viewed by the user;
a rendering unit 1204 configured to render the immersive media data corresponding to the current viewing posture, wherein the acquired immersive media data is rendered when the current viewing posture is within the three-dimensional viewing space; or the immersive media data is rendered according to the processing information of the three-dimensional viewing space when the current viewing posture has moved or is moving outside the three-dimensional viewing space.
It should be noted that each of the above modules can be implemented by software or hardware. For the latter, this can be implemented in the following way, but is not limited to it: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
The embodiments of the present disclosure also provide a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the above method embodiments when run.
In an exemplary embodiment, the above computer-readable storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media that can store a computer program.
The embodiments of the present disclosure also provide an electronic apparatus including a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
In an exemplary embodiment, the above electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary implementations, and details are not repeated here.
Obviously, those skilled in the art should understand that the above modules or steps of the present disclosure can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices; they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, and in some cases the steps shown or described can be executed in an order different from the one here; or they can be fabricated into individual integrated circuit modules, or multiple modules or steps of them can be fabricated into a single integrated circuit module. In this way, the present disclosure is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. For those skilled in the art, the present disclosure may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the principles of the present disclosure shall be included in the protection scope of the present disclosure.

Claims (16)

  1. A method for processing immersive media data, comprising:
    acquiring immersive media data corresponding to a current viewing posture of a user, wherein a three-dimensional viewing space is used to limit the content viewed by the user;
    rendering the immersive media data corresponding to the current viewing posture, wherein the acquired immersive media data is rendered when the current viewing posture is within the three-dimensional viewing space;
    or, the immersive media data is rendered according to processing information of the three-dimensional viewing space when the current viewing posture has moved or is moving outside the three-dimensional viewing space.
  2. The method according to claim 1, wherein acquiring the immersive media data corresponding to the current viewing posture comprises:
    determining a relationship between the current viewing posture and the three-dimensional viewing space, wherein the three-dimensional viewing space is a range within which the user is allowed to move in an immersive media scene;
    determining the immersive media data according to the relationship.
  3. The method according to claim 2, wherein determining the relationship between the current viewing posture and the three-dimensional viewing space comprises:
    determining a position of the current viewing posture within or outside the three-dimensional viewing space;
    determining the immersive media data according to the position.
  4. The method according to claim 1, wherein before rendering the immersive media data corresponding to the current viewing posture, the method comprises:
    determining, according to a data box type or a timed metadata track sample entry type, the media track or data box in which the processing information of the three-dimensional viewing space is described in a media file.
  5. The method according to claim 4, wherein determining, according to a data box type or a timed metadata track sample entry type, the timed metadata track or data box in which the three-dimensional viewing space is described in the media file comprises one of the following:
    identifying a first three-dimensional viewing space data box in a file header of the media file according to a first data box type;
    or, identifying a second three-dimensional viewing space grouping data box in the file header of the media file according to a second grouping type;
    or, identifying a third three-dimensional viewing space data box in one or more media tracks of the media file according to a third data box type;
    or, identifying a fourth three-dimensional viewing space sample group description data box in one media track of the media file according to a fourth sample grouping type;
    or, identifying a fifth three-dimensional viewing space data box in one media track of the media file according to a fifth track group type, wherein one or more media tracks having the same track group identifier belong to the same three-dimensional viewing space;
    or, identifying a three-dimensional viewing space timed metadata track in the media file according to a sixth sample entry type, the three-dimensional viewing space timed metadata track indicating a dynamically changing three-dimensional viewing space of the immersive media data.
  6. The method according to any one of claims 1 to 5, wherein the three-dimensional viewing space information comprises at least one of the following:
    a coordinate position of the three-dimensional viewing space in the spatial scene of the immersive media, an orientation of the three-dimensional viewing space in the spatial scene of the immersive media, the geometry of the three-dimensional viewing space, and a viewing direction in the three-dimensional viewing space, wherein the structure of the three-dimensional viewing space is constructed by combining one or more complex geometric structures, each of the complex geometric structures is constructed by combining one or more basic structures, and each of the basic structures corresponds to zero or one capture device or captured view of the immersive media data.
  7. The method according to claim 1, wherein before rendering the immersive media data corresponding to the current viewing posture, the method comprises:
    determining, according to a data box type or a timed metadata track sample entry type, the timed metadata track or data box in which the processing information of the three-dimensional viewing space is described in a media file.
  8. The method according to claim 7, wherein determining, according to a data box type or a timed metadata track sample entry type, the timed metadata track or data box in which the processing information of the three-dimensional viewing space is described in the media file comprises:
    identifying the three-dimensional viewing space handling data box by a seventh data box type, wherein the three-dimensional viewing space handling data box is contained in the data box describing the three-dimensional viewing space, or is in the same parent data box as the data box describing the three-dimensional viewing space;
    or, the three-dimensional viewing space handling data box is contained in the three-dimensional viewing space timed metadata track;
    or, identifying a three-dimensional viewing space handling timed metadata track in the media file according to an eighth sample entry type, the three-dimensional viewing space handling timed metadata track indicating a dynamically changing three-dimensional viewing space handling manner of the three-dimensional viewing space.
  9. The method according to claim 8, wherein the processing information of the three-dimensional viewing space comprises at least one of the following:
    the number of options of the three-dimensional viewing space handling manner, the device type for three-dimensional viewing space handling, the application type for three-dimensional viewing space handling, the three-dimensional viewing space handling manner, and the identifier of the three-dimensional viewing space.
  10. The method according to claim 1, wherein acquiring the immersive media data corresponding to the current viewing posture of the user comprises:
    directly acquiring the immersive media data when it is determined that the user's current viewing posture is within the three-dimensional viewing space;
    or, acquiring the immersive media data corresponding to the current viewing posture according to the processing information of the three-dimensional viewing space when it is determined that the user's current viewing posture is moving or has moved outside the three-dimensional viewing space.
  11. The method according to claim 10, wherein acquiring the immersive media data comprises:
    determining a media presentation description file, wherein the media presentation description file includes a three-dimensional viewing space descriptor and/or a three-dimensional viewing space handling descriptor for viewing the immersive media;
    requesting the immersive media data corresponding to the current viewing posture according to the three-dimensional viewing space descriptor and/or the three-dimensional viewing space handling descriptor corresponding to the user's current viewing posture.
  12. The method according to claim 10 or 11, wherein the three-dimensional viewing space descriptor of the immersive media comprises at least one of the following:
    the basic geometry structure, the rotation direction of the basic geometry, the view or capture device identifier or index corresponding to the basic geometry, the rotation of the basic geometry, the viewing direction in the basic geometry, the combination manner of basic geometries, the combination manner of complex combined geometries, and the three-dimensional viewing space identifier.
  13. The method according to claim 10 or 11, wherein the three-dimensional viewing space handling descriptor of the immersive media comprises at least one of the following:
    the device type for three-dimensional viewing space handling, the application type for three-dimensional viewing space handling, the three-dimensional viewing space handling manner, and the identifier of the three-dimensional viewing space.
  14. An apparatus for processing immersive media data, comprising:
    an acquiring unit configured to acquire immersive media data corresponding to a current viewing posture of a user, wherein a three-dimensional viewing space is used to limit the content viewed by the user;
    a rendering unit configured to render the immersive media data corresponding to the current viewing posture, wherein the acquired immersive media data is rendered when the current viewing posture is within the three-dimensional viewing space; or the immersive media data is rendered according to processing information of the three-dimensional viewing space when the current viewing posture has moved or is moving outside the three-dimensional viewing space.
  15. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the method according to any one of claims 1 to 13 when run.
  16. An electronic apparatus, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 13.
PCT/CN2021/098689 2020-06-23 2021-06-07 Method and apparatus for processing immersive media data, storage medium and electronic apparatus WO2021259054A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020237002677A KR20230028489A (ko) 2020-06-23 2021-06-07 Immersive media data processing method, apparatus, storage medium and electronic apparatus
CA3178737A CA3178737A1 (en) 2020-06-23 2021-06-07 Method and apparatus for processing immersive media data, storage mediumand electronic apparatus
EP21830152.1A EP4171026A4 (en) 2020-06-23 2021-06-07 METHOD AND DEVICE FOR PROCESSING IMMERSIVE MEDIA DATA AS WELL AS STORAGE MEDIUM AND ELECTRONIC DEVICE
US17/922,086 US20230169719A1 (en) 2020-06-23 2021-06-07 Method and Apparatus for Processing Immersive Media Data, Storage Medium and Electronic Apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010581346.0A CN112492289A (zh) 2020-06-23 2020-06-23 Method and apparatus for processing immersive media data, storage medium and electronic apparatus
CN202010581346.0 2020-06-23

Publications (1)

Publication Number Publication Date
WO2021259054A1 true WO2021259054A1 (zh) 2021-12-30

Family

ID=74920026

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/098689 WO2021259054A1 (zh) 2020-06-23 2021-12-30 Method and apparatus for processing immersive media data, storage medium and electronic apparatus

Country Status (6)

Country Link
US (1) US20230169719A1 (zh)
EP (1) EP4171026A4 (zh)
KR (1) KR20230028489A (zh)
CN (1) CN112492289A (zh)
CA (1) CA3178737A1 (zh)
WO (1) WO2021259054A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112492289A (zh) * 2020-06-23 2021-03-12 中兴通讯股份有限公司 沉浸媒体数据的处理方法及装置、存储介质和电子装置
CN115086635B (zh) * 2021-03-15 2023-04-14 腾讯科技(深圳)有限公司 多视角视频的处理方法、装置、设备及存储介质
CN115474034B (zh) * 2021-06-11 2024-04-26 腾讯科技(深圳)有限公司 沉浸媒体的数据处理方法、装置、相关设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319362A (zh) 2018-01-02 2018-07-24 Panoramic information display method, electronic device and computer storage medium
CN110494915A (zh) 2017-04-04 2019-11-22 Electronic apparatus and control method thereof
CN110876051A (zh) 2018-08-29 2020-03-10 Video data processing and transmission methods and apparatuses, and video data processing system
CN110944222A (zh) 2018-09-21 2020-03-31 Method and system for immersive media content changing with user movement
CN112492289A (zh) 2020-06-23 2021-03-12 Method and apparatus for processing immersive media data, storage medium and electronic apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102322508B1 (ko) 2017-09-28 2021-11-05 Method and apparatus for transmitting and receiving 6DOF video using stitching and re-projection related metadata

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110494915A (zh) 2017-04-04 2019-11-22 Electronic apparatus and control method thereof
CN108319362A (zh) 2018-01-02 2018-07-24 Panoramic information display method, electronic device and computer storage medium
CN110876051A (zh) 2018-08-29 2020-03-10 Video data processing and transmission methods and apparatuses, and video data processing system
CN110944222A (zh) 2018-09-21 2020-03-31 Method and system for immersive media content changing with user movement
CN112492289A (zh) 2020-06-23 2021-03-12 Method and apparatus for processing immersive media data, storage medium and electronic apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4171026A4 *

Also Published As

Publication number Publication date
EP4171026A4 (en) 2023-11-22
KR20230028489A (ko) 2023-02-28
EP4171026A1 (en) 2023-04-26
CN112492289A (zh) 2021-03-12
CA3178737A1 (en) 2021-12-30
US20230169719A1 (en) 2023-06-01

Similar Documents

Publication Publication Date Title
CN109076255B (zh) Method and device for transmitting and receiving 360-degree video
RU2728904C1 (ru) Method and device for controlled selection of the observation point and orientation of audiovisual content
WO2021259054A1 (zh) Method and apparatus for processing immersive media data, storage medium and electronic apparatus
RU2711591C1 (ru) Method, device and computer program for adaptive streaming of virtual reality multimedia content
JP7058273B2 (ja) Information processing method and apparatus
WO2020043126A1 (zh) Video data processing and transmission methods and apparatuses, and video data processing system
US11094130B2 (en) Method, an apparatus and a computer program product for video encoding and video decoding
CN108702528A (zh) Method for transmitting 360 video, method for receiving 360 video, device for transmitting 360 video, and device for receiving 360 video
US20190394444A1 (en) Method for transmitting 360-degree video, method for receiving 360-degree video, apparatus for transmitting 360-degree video, and apparatus for receiving 360-degree video
CN113891117B (zh) Data processing method, apparatus and device for immersive media, and readable storage medium
CN114095737B (zh) Media file encapsulation and decapsulation method, apparatus, device and storage medium
JP7035088B2 (ja) High-level signaling for fisheye video data
WO2023029858A1 (zh) Encapsulation and decapsulation methods and apparatus for point cloud media file, and storage medium
KR20200088485A (ko) Method and apparatus for transmitting and receiving 360-degree video including camera lens information
WO2023061131A1 (zh) Media file encapsulation method, apparatus, device and storage medium
US20240080429A1 (en) Video data processing method and apparatus, computer device, computer readable storage medium, and computer program product
KR102214079B1 (ko) Method and apparatus for transmitting and receiving 360-degree video
CN110351492A (zh) Video data processing method, apparatus and system
JP2022541908A (ja) Method and apparatus for delivering volumetric video content
WO2022037423A1 (zh) Data processing method, apparatus, device and medium for point cloud media
US20230360678A1 (en) Data processing method and storage medium
WO2023024843A1 (zh) Media file encapsulation and decapsulation method, device and storage medium
WO2023024839A1 (zh) Media file encapsulation and decapsulation method, apparatus, device and storage medium
US20230403411A1 (en) File decapsulation method and apparatus for free viewpoint video, device, and storage medium
CN116643644A (zh) Data processing method, apparatus, device and storage medium for immersive media

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21830152

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3178737

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021830152

Country of ref document: EP

Effective date: 20230123