WO2023051138A1 - Data processing method and apparatus for immersive media, device, storage medium and program product - Google Patents
Data processing method and apparatus for immersive media, device, storage medium and program product
- Publication number
- WO2023051138A1 (PCT/CN2022/116102)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- field
- media content
- interaction
- video
- interactive
- Prior art date
Classifications
- All classifications fall under H—ELECTRICITY · H04—ELECTRIC COMMUNICATION TECHNIQUE · H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION:
- H04N23/698—Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
- H04N13/117—Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation, the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
- H04N21/2393—Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests, involving handling client requests
- H04N13/158—Switching image signals
- H04N13/366—Image reproducers using viewer tracking
- H04N21/437—Interfacing the upstream path of the transmission network, e.g. for transmitting client requests to a VOD server
- H04N21/6377—Control signals issued by the client directed to the server or network components, directed to server
- H04N23/90—Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
- H04N5/2628—Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation
Definitions
- the present application relates to the field of computer technology, and in particular to a data processing method, device, equipment, storage medium and program product for immersive media.
- Immersive media refers to media content that can bring an immersive experience to business objects (for example, users). According to the degree of freedom (DoF) of a business object when consuming the media content, immersive media can be divided into 3DoF media, 3DoF+ media and 6DoF media.
- the video client and the server can conduct a session by sending interaction feedback messages.
- the video client can send feedback to the server describing user location information (for example, the user's position), so that the video client can receive the media content returned by the server based on that user location information.
- the embodiments of the present application provide a data processing method, apparatus, computer device, computer-readable storage medium and computer program product for immersive media, which can enrich the information types of interactive feedback and improve the accuracy of the media content acquired by the video client during the interactive feedback process.
- An embodiment of the present application provides a data processing method for immersive media, including:
- in response to an interactive operation on the first immersive media content, an interactive feedback message corresponding to the interactive operation is generated;
- the interactive feedback message carries a business key field, and the business key field is used to describe the business event indicated by the interactive operation;
- the interactive feedback message is sent, where the interactive feedback message is used to determine the business event indicated by the interactive operation and, based on the business event indicated by the interactive operation, to acquire the second immersive media content for responding to the interactive operation;
- the returned second immersive media content is received.
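- The following is a minimal, illustrative Python sketch of the client-side flow in the method above. All identifiers (build_interaction_feedback_message, the transport object, and the message field names) are hypothetical assumptions for illustration; the application does not prescribe any concrete API or message syntax.

```python
# Illustrative sketch only; identifiers and message fields are hypothetical.
import json
import time


def build_interaction_feedback_message(business_event: dict) -> str:
    """Generate an interactive feedback message that carries a business key
    field describing the business event indicated by the interactive operation."""
    return json.dumps({
        "timestamp": time.time(),
        "business_key": business_event,  # e.g. a zoom, switch or position event
    })


def on_interactive_operation(business_event: dict, transport) -> bytes:
    """In response to an interactive operation on the first immersive media
    content: generate the feedback message, send it, and receive the second
    immersive media content returned for the operation."""
    feedback = build_interaction_feedback_message(business_event)
    transport.send(feedback)            # send the interactive feedback message
    return transport.receive_media()    # receive the returned second content
```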
- An embodiment of the present application provides a data processing method for immersive media, including:
- an interactive feedback message is received, where the interactive feedback message is generated in response to an interactive operation on the first immersive media content;
- the interactive feedback message carries a business key field, and the business key field is used to describe the business event indicated by the interactive operation; the business event indicated by the interactive operation is determined based on the business key field, the second immersive media content for responding to the interactive operation is acquired based on that business event, and the second immersive media content is returned;
- An embodiment of the present application provides a data processing device for immersive media, including:
- the message generation module is configured to respond to the interactive operation on the first immersive media content, and generate an interactive feedback message corresponding to the interactive operation;
- the interactive feedback message carries a business key field, and the business key field is used to describe the business event indicated by the interactive operation;
- the message sending module is configured to send the interactive feedback message, where the interactive feedback message is used to determine the business event indicated by the interactive operation and, based on the business event indicated by the interactive operation, to acquire the second immersive media content for responding to the interactive operation;
- the content receiving module is configured to receive the returned second immersive media content.
- An embodiment of the present application provides a data processing device for immersive media, including:
- the message receiving module is configured to receive an interactive feedback message, where the interactive feedback message is generated in response to an interactive operation on the first immersive media content, and the interactive feedback message carries a business key field used to describe the business event indicated by the interactive operation;
- the content acquisition module is configured to determine the business event indicated by the interactive operation based on the business key field in the interactive feedback message, and obtain the second immersive media content for responding to the interactive operation based on the business event indicated by the interactive operation;
- a content returning module configured to return the second immersive media content.
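- A corresponding server-side sketch of the three modules listed above (message receiving, content acquisition, content returning); again, all names and the business_key field layout are hypothetical illustrations, not a normative design.

```python
# Illustrative sketch only; the business_key field layout is an assumption.
import json


def handle_interaction_feedback(raw_message: str, media_library: dict) -> bytes:
    """Receive an interactive feedback message, determine the business event it
    describes, and return the second immersive media content for that event."""
    message = json.loads(raw_message)          # message receiving module
    business_event = message["business_key"]   # event indicated by the operation
    event_type = business_event.get("type")    # e.g. "zoom", "switch", "position"
    # Content acquisition module: pick the media content that responds to the
    # indicated business event (falling back to a default resource).
    second_content = media_library.get(event_type, media_library["default"])
    return second_content                      # content returning module
```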
- An embodiment of the present application provides a computer device, including: a processor and a memory;
- the processor is connected to the memory, wherein the memory is configured to store a computer program, and when the computer program is executed by the processor, the computer device executes the data processing method for immersive media provided by the embodiment of the present application.
- An embodiment of the present application provides a computer-readable storage medium storing a computer program, where the computer program is suitable for being loaded and executed by a processor, so that a computer device having the processor executes the data processing method for immersive media provided by the embodiments of the present application.
- An embodiment of the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
- the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the data processing method for immersive media provided by the embodiment of the present application.
- in the embodiments of the present application, an interactive feedback message corresponding to the interactive operation is generated and sent; because the interactive feedback message carries the business key field used to describe the business event indicated by the interactive operation, the business event indicated by the interactive operation can be determined based on the business key field in the interactive feedback message, and the second immersive media content for responding to the interactive operation can be acquired based on that business event;
- the business events indicated by interactive operations can be of different types, and the interactive operations can include not only operations related to the user's position (for example, a change of the user's position) but also other operations on the currently playing immersive media content (for example, a zoom operation); therefore, multiple types of business events can be fed back through the business key field carried in the interactive feedback message, so that the immersive media content responding to the interactive operation can be determined based on these different types of business events;
- in this way, the information types of interactive feedback can be enriched, and the accuracy of the media content acquired by the video client during the interactive feedback process can be improved.
- FIG. 1 is an architectural diagram of a panoramic video system provided by an embodiment of the present application.
- FIG. 2 is a schematic diagram of 3DoF provided by an embodiment of the present application.
- FIG. 3 is an architectural diagram of a volumetric video system provided by an embodiment of the present application.
- FIG. 4 is a schematic diagram of 6DoF provided by an embodiment of the present application.
- FIG. 5 is a schematic diagram of 3DoF+ provided by an embodiment of the present application.
- FIG. 6 is a schematic architectural diagram of a data processing system 300 for immersive media provided by an embodiment of the present application.
- FIG. 7 is a schematic flowchart of a data processing method for immersive media provided by an embodiment of the present application.
- FIG. 8 is a schematic flowchart of a data processing method for immersive media provided by an embodiment of the present application.
- FIG. 9 is an interaction schematic diagram of a data processing method for immersive media provided by an embodiment of the present application.
- FIG. 10 is a schematic structural diagram of a data processing device for immersive media provided by an embodiment of the present application.
- FIG. 11 is a schematic structural diagram of a data processing device for immersive media provided by an embodiment of the present application.
- FIG. 12 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
- FIG. 13 is a schematic structural diagram of a data processing system provided by an embodiment of the present application.
- the embodiment of the present application relates to a data processing technology for immersive media.
- the so-called immersive media (also referred to as immersion media) refers to media files that can provide immersive media content, enabling business objects immersed in the media content to obtain visual, auditory and other sensory experiences as in the real world.
- Immersive media can be divided into 3DoF media, 3DoF+ media and 6DoF media according to the degree of freedom of business objects when consuming media content.
- Common 6DoF media include multi-view video and point cloud media.
- Immersive media content includes video content represented in a three-dimensional (3-Dimension, 3D) space in various forms, for example, three-dimensional video content represented in a spherical form.
- the immersive media content may be virtual reality (Virtual Reality, VR) video content, panoramic video content, spherical video content, 360-degree video content or volumetric video content.
- immersive media content also includes audio content synchronized with video content represented in three-dimensional space.
- a panoramic video/image is obtained by shooting a scene with multiple cameras and then stitching and mapping the captured content; part of the media picture is provided according to the viewing direction or window of the business object, up to a spherical video or image covering a 360-degree picture range.
- Panoramic video/image is a typical immersive media that provides a three-degree-of-freedom (ie, 3DoF) experience.
- V3C volumetric media refers to immersive media that is captured from visual content in three-dimensional space, provides a 3DoF+ or 6DoF viewing experience, is encoded with traditional video codecs, and contains volumetric-video-type tracks in its file encapsulation; it may include multi-view video, video-encoded point clouds, and the like.
- multi-view video, also called multi-viewpoint video, refers to video of a scene shot from multiple angles with multiple sets of camera arrays, carrying texture information (color information, etc.) and depth information (spatial distance information, etc.) of the scene.
- multi-view/multi-viewpoint video, also called free-view/free-viewpoint video, is an immersive media that provides a six-degree-of-freedom (6DoF) experience.
- a point cloud is a set of discrete point sets that are randomly distributed in space and express the spatial structure and surface properties of a three-dimensional object or scene.
- Each point in the point cloud has at least three-dimensional position information, and may also have color, material or other information depending on the application scenario.
- typically, every point in a point cloud has the same number of additional attributes.
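- As a concrete reading of this definition, a point cloud can be sketched as a set of records in which position is mandatory and the remaining attributes are optional and uniform across points; the field names below are illustrative only, not part of this application.

```python
# Illustrative data layout for a point cloud; field names are not normative.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Point:
    x: float                                  # mandatory 3D position information
    y: float
    z: float
    color: Tuple[int, int, int] = (0, 0, 0)   # optional attribute, e.g. RGB color
    material_id: int = 0                      # optional attribute, e.g. material


@dataclass
class PointCloud:
    # A discrete, unordered set of points expressing the spatial structure and
    # surface properties of a three-dimensional object or scene.
    points: List[Point] = field(default_factory=list)
```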
- point clouds can flexibly and conveniently express the spatial structure and surface properties of three-dimensional objects or scenes, so they are widely used in scenarios including virtual reality games, computer-aided design (Computer Aided Design, CAD), geographic information systems (Geography Information System, GIS), autonomous navigation systems (Autonomous Navigation System, ANS), digital cultural heritage, free-viewpoint broadcasting, 3D immersive telepresence, 3D reconstruction of biological tissues and organs, and so on.
- point clouds can mainly be acquired in the following ways: computer generation, 3D laser scanning, 3D photogrammetry, and so on.
- FIG. 1 is a structural diagram of a panoramic video system provided by an embodiment of the present application.
- the panoramic video system may include an encoding device (for example, encoding device 100A) and a decoding device (for example, decoding device 100B). The encoding device may refer to the computer device used by the provider of the panoramic video, and the computer device may be a terminal (such as a personal computer (Personal Computer, PC) or an intelligent mobile device (such as a smartphone)) or a server.
- the decoding device may refer to a computer device used by a user of the panoramic video, and the computer device may be a terminal (such as a PC, a smart mobile device (such as a smart phone), a VR device (such as a VR helmet, VR glasses, etc.)).
- the data processing process of the panoramic video includes a data processing process at the encoding device side and a data processing process at the decoding device side.
- the data processing process on the side of the encoding device mainly includes: (1) the process of acquiring and producing the media content of the panoramic video; (2) the process of encoding and packaging the panoramic video.
- the data processing process on the side of the decoding device mainly includes: (1) the process of decapsulating and decoding the panoramic video file; (2) the rendering process of the panoramic video.
- the transmission process involving panoramic video between the encoding device and the decoding device can be carried out based on various transmission protocols.
- the transmission protocols here can include but are not limited to: the Dynamic Adaptive Streaming over HTTP (DASH) protocol, the HTTP Live Streaming (HLS) protocol, the Smart Media Transport Protocol (SMTP), the Transmission Control Protocol (TCP), and so on.
- a capture device may refer to a hardware component provided in an encoding device, for example, a capture device refers to a microphone, a camera, a sensor, and the like of a terminal.
- the capture device may also be a hardware device connected to the encoding device, such as a camera connected to a server, for providing the encoding device with an acquisition service of the media content of the panoramic video.
- the capture device may include, but is not limited to, an audio device, a camera device and a sensing device, where the audio device may include an audio sensor, a microphone, and the like.
- the camera device may include a common camera, a stereo camera, a light field camera, and the like.
- Sensing devices may include laser devices, radar devices, and the like.
- the number of capture devices may be multiple; these capture devices are deployed at specific positions in real space to simultaneously capture audio content and video content from different angles in the space, and the captured audio content and video content remain synchronized in both time and space.
- media content in a 3-dimensional space that is captured by a capture device deployed at a specific location and used to provide a viewing experience with three degrees of freedom may be referred to as a panoramic video.
- the real-world audio-visual scene 10A may be captured by multiple audio sensors and a set of camera arrays in the encoding device 100A, or by camera equipment with multiple cameras and sensors connected to the encoding device 100A.
- the collection result may be a set of digital image/video signals 10Bi (i.e., video content) and digital audio signals 10Ba (i.e., audio content).
- the cameras here generally cover all directions around the center point of the camera array or camera equipment, which is why panoramic video is also called 360-degree video.
- the production process of the panoramic video media content involved in the embodiment of the present application may be understood as the production process of the panoramic video content.
- the captured audio content itself is suitable for performing audio coding of panoramic video.
- the captured video content can only become content suitable for video encoding of panoramic videos after a series of production processes.
- the production process may include:
- (1) Splicing: splicing refers to stitching the video content shot at various angles into a complete video that can reflect a 360-degree visual panorama of the real space; that is, the spliced video is a spherical video represented in three-dimensional space.
- multiple captured images are stitched together to obtain a spherical image represented in three-dimensional space.
- Each video frame in the spherical video spliced above is a spherical image based on the unit sphere of the global coordinate axis.
- (2) Rotation: rotation refers to rotating the unit sphere on the global coordinate axes. The rotation angle represents the rotation required for the conversion from the local coordinate axes to the global coordinate axes, where the local coordinate axes of the unit sphere are the axes of the rotated coordinate system. It should be understood that if the local and global axes are the same, no rotation is required.
- (3) Projection: projection refers to the process of mapping the three-dimensional video formed by splicing (or the rotated three-dimensional video) onto a two-dimensional (2-Dimension, 2D) image. The 2D image formed by projection is called a projected image; projection methods may include, but are not limited to, latitude-longitude (equirectangular) map projection and regular hexahedron (cube map) projection.
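- For example, the latitude-longitude (equirectangular) projection can be sketched as the following mapping from a direction on the unit sphere to pixel coordinates in the projected image; the axis conventions chosen here are assumptions for illustration, not mandated by this application.

```python
# Illustrative equirectangular (latitude-longitude) projection sketch.
import math


def equirectangular_project(x: float, y: float, z: float,
                            width: int, height: int) -> tuple:
    """Map a unit-sphere direction (x, y, z) to (u, v) coordinates in a
    width x height projected image (y is assumed to point up)."""
    longitude = math.atan2(x, z)                   # in (-pi, pi]
    latitude = math.asin(max(-1.0, min(1.0, y)))   # in [-pi/2, pi/2]
    u = (longitude / (2 * math.pi) + 0.5) * width  # 0 .. width
    v = (0.5 - latitude / math.pi) * height        # 0 at top, height at bottom
    return u, v
```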
- the projected image can be encoded directly, or the projected image can be encoded after area encapsulation.
- region encapsulation technology is widely used in the process of video processing of immersive media.
- area encapsulation refers to the process of converting the projected image by area, and the area encapsulation process converts the projected image into an encapsulated image.
- the process of area encapsulation includes: dividing the projected image into multiple mapped areas, and then converting the multiple mapped areas to obtain multiple encapsulated areas, and mapping the multiple encapsulated areas into a 2D image to obtain an encapsulated image.
- the mapping area refers to the area obtained by dividing in the projected image before performing area encapsulation;
- the encapsulating area refers to the area located in the encapsulating image after performing area encapsulation.
- Conversion processing may include, but is not limited to: mirroring, rotation, rearrangement, up-sampling, down-sampling, changing the resolution of an area, and moving.
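- A region encapsulation entry can therefore be read as a per-region record relating a mapped region of the projected image to an encapsulated region of the packed image plus a conversion type; the sketch below uses illustrative field names that are assumptions, not a normative layout.

```python
# Illustrative per-region packing record; field names are not normative.
from dataclasses import dataclass


@dataclass
class RegionPackingEntry:
    # Mapped region: rectangle in the projected image before encapsulation.
    proj_x: int
    proj_y: int
    proj_width: int
    proj_height: int
    # Encapsulated (packed) region: rectangle in the encapsulated image.
    packed_x: int
    packed_y: int
    packed_width: int
    packed_height: int
    # Conversion applied, e.g. "mirror", "rotate", "upsample", "downsample".
    transform: str = "none"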
- the encoding device 100A may stitch, (possibly) rotate, project and map images belonging to the same time instance in the digital image/video signal 10Bi onto the packaged image 10D.
- FIG. 2 is a schematic diagram of 3DoF provided by an embodiment of the present application.
- 3DoF means that the business object is fixed at the center point of a three-dimensional space, and the head of the business object rotates around the X axis, Y axis and Z axis to watch the picture provided by the media content.
- users who consume immersive media may be collectively referred to as business objects.
- the captured audio content can be directly encoded to form the audio code stream of the panoramic video.
- video encoding is performed on the encapsulated image to obtain the video code stream of the panoramic video.
- the media presentation description information (Media Presentation Description, MPD)
- Metadata is a general term for information related to the presentation of a panoramic video, and the metadata may include description information of media content, description information of a window, signaling information related to the presentation of media content, and the like.
- the encoding device stores the media presentation description information and the media file resources formed after the data processing process.
- the encoding device 100A can perform audio encoding on the captured digital audio signal 10Ba to obtain an audio code stream 10Ea, and can perform video encoding on the packaged image 10D to obtain a video code stream 10Ev; alternatively, image encoding may be performed on the packaged image 10D to obtain an encoded image 10Ei. Subsequently, the encoding device 100A can combine the encoded image 10Ei, the video code stream 10Ev and/or the audio code stream 10Ea into a media file for file playback according to a specific media file format (such as ISOBMFF).
- the file encapsulator in the encoding device 100A can also add metadata to the media file 10F or the segment sequence 10Fs; for example, the metadata here can include projection information and region encapsulation information, and this metadata will help the decoding device later render the encapsulated image obtained after decoding.
- the encoding device 100A can use a specific transmission mechanism (such as DASH, SMTP) to transmit the segment sequence 10Fs to the decoding device 100B, and at the same time transmit the media file 10F to the decoding device 100B.
- the decoding device 100B may be an omnidirectional media application format (Omnidirectional Media Application Format, OMAF) player.
- the decoding device can obtain the panoramic video media file resources and corresponding media presentation description information from the encoding device adaptively and dynamically according to the recommendation of the encoding device or according to the requirements of the business object on the decoding device.
- the head/eye tracking information determines the orientation and position of the business object, and the decoding device then dynamically requests the corresponding media file resources from the encoding device based on the determined orientation and position.
- Media file resources and media presentation description information are transmitted from the encoding device to the decoding device through a transmission mechanism (such as DASH, Smart Media Transport (SMT)).
- the file decapsulation process on the decoding device side is opposite to the file encapsulation process on the encoding device side.
- the decoding device decapsulates the media file resources according to the file format requirements (for example, ISOBMFF) of the panoramic video to obtain the audio code stream and the video code stream.
- the decoding process on the decoding device side is opposite to the encoding process on the encoding device side.
- the decoding device performs audio decoding on the audio code stream to restore the audio content; the decoding device performs video decoding on the video code stream to restore the video content.
- the media file 10F output by the file encapsulator in the encoding device 100A is the same as the media file 10F' input to the file decapsulator in the decoding device 100B.
- the file decapsulator performs file decapsulation processing on the media file 10F' or the received segment sequence 10F's, extracts the encoded code streams, including the audio code stream 10E'a, the video code stream 10E'v and the encoded image 10E'i, and at the same time parses the corresponding metadata.
- the video related to the window may be carried in multiple tracks, and before decoding, these tracks may be merged into a single video code stream 10E'v through stream rewriting.
- the decoding device 100B can perform audio decoding on the audio code stream 10E'a to obtain the audio signal 10B'a (that is, the restored audio content), and can perform video decoding on the video code stream 10E'v, or perform image decoding on the encoded image 10E'i, to obtain the image/video signal 10D' (that is, the restored video content).
- the decoding device renders the audio content obtained by audio decoding and the video content obtained by video decoding according to the rendering-related metadata in the media presentation description information, and the playback and output of the image is realized after the rendering is completed.
- the decoding device mainly renders the image based on the current viewpoint, disparity, depth information, and the like.
- the viewpoint refers to the viewing position of the business object
- the parallax refers to the visual difference between the two eyes of the business object, or the visual difference caused by movement.
- the panoramic video system supports a data box (Box), which refers to a data block or object including metadata, that is, the data box contains metadata of corresponding media content.
- a panoramic video can include multiple data boxes, for example: a sphere region zooming data box (Sphere Region Zooming Box), which contains metadata for describing spherical region zooming information; a 2D region zooming data box (2D Region Zooming Box), which contains metadata for describing 2D region zooming information; a region-wise packing data box (Region Wise Packing Box), which contains metadata for describing the corresponding information in the region packing process; and so on.
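- Rendered as plain data structures, such data boxes might look like the following sketch; the real boxes are binary structures whose exact fields are defined by the relevant file-format specification, so every field below is an illustrative assumption.

```python
# Illustrative stand-ins for metadata data boxes; fields are assumptions.
from dataclasses import dataclass


@dataclass
class SphereRegionZoomingBox:
    """Metadata describing spherical region zooming information."""
    centre_azimuth: int
    centre_elevation: int
    azimuth_range: int
    elevation_range: int


@dataclass
class TwoDRegionZoomingBox:
    """Metadata describing 2D region zooming information."""
    region_x: int
    region_y: int
    region_width: int
    region_height: int
```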
- the decoding device 100B can, based on the current viewing direction or window (i.e., the viewing area) and the projection, spherical coverage, rotation and region encapsulation metadata parsed from the media file 10F' or segment sequence 10F's, project the decoded encapsulated image 10D' (i.e., the image/video signal 10D') onto the screen of a head-mounted display or any other display device.
- the audio signal 10B'a is rendered according to the current viewing direction (e.g., through headphones or speakers).
- the current viewing direction is determined by head tracking and possibly eye tracking.
- the current viewing direction may also be used by the video decoder and audio decoder for decoding optimization.
- the current viewing direction will also be passed to the policy module in the decoding device 100B, and the policy module can determine the video track to be received according to the current viewing direction.
- FIG. 3 is a structural diagram of a volumetric video system provided by an embodiment of the present application.
- the volumetric video system includes an encoding device (for example, encoding device 200A) and a decoding device (for example, decoding device 200B).
- the encoding device may refer to a computer device used by a provider of the volumetric video, and the computer device may be a terminal (such as a PC or an intelligent mobile device (such as a smartphone)) or a server.
- the decoding device may refer to a computer device used by a volumetric video user, and the computer device may be a terminal (such as a PC, a smart mobile device (such as a smart phone), a VR device (such as a VR helmet, VR glasses, etc.)).
- the data processing process of the volumetric video includes the data processing process at the encoding device side and the data processing process at the decoding device side.
- the data processing process on the side of the encoding device mainly includes: (1) the process of acquiring and producing the media content of the volumetric video; (2) the process of encoding and packaging the volumetric video.
- the data processing process on the side of the decoding device mainly includes: (1) the process of decapsulating and decoding the volumetric video file; (2) the rendering process of the volumetric video.
- the transmission process involving volumetric video between the encoding device and the decoding device can be carried out based on various transmission protocols.
- the transmission protocols here include but are not limited to: the DASH protocol, the HLS protocol, the SMTP protocol, the TCP protocol, and so on.
- a capture device may refer to a hardware component provided in an encoding device, for example, a capture device refers to a microphone, a camera, a sensor, and the like of a terminal.
- the capture device may also be a hardware device connected to the encoding device, such as a camera connected to a server, for providing the encoding device with an acquisition service of volumetric video media content.
- the capture device may include, but is not limited to, an audio device, a camera device and a sensing device, where the audio device may include an audio sensor, a microphone, and the like.
- the camera device may include a common camera, a stereo camera, a light field camera, and the like.
- Sensing devices may include laser devices, radar devices, and the like.
- the number of capture devices may be multiple; these capture devices are deployed at specific positions in real space to simultaneously capture audio content and video content from different angles in the space, and the captured audio content and video content remain synchronized in both time and space.
- media content in a 3-dimensional space that is captured by a capture device deployed at a specific location and used to provide a multi-degree-of-freedom (such as 3DoF+, 6DoF) viewing experience may be referred to as volumetric video.
- a visual scene 20A may be captured by a group of camera arrays connected to the encoding device 200A, by an imaging device with multiple cameras and sensors connected to the encoding device 200A, or by multiple virtual cameras.
- the collection result may be the source volumetric data 20B (that is, the video content of the volumetric video).
- the volumetric video media content production process involved in the embodiments of the present application can be understood as the volumetric video content production process; volumetric video content here is mainly produced from content in forms such as multi-view video, point cloud data and light fields shot by cameras or camera arrays deployed at multiple locations.
- the encoding device can convert the volumetric video from a three-dimensional representation to a two-dimensional representation.
- the volumetric video here can contain geometric information, attribute information, occupancy map information, and atlas data, etc.
- Volumetric video generally requires specific processing before encoding.
- point cloud data needs to be cut and mapped before encoding.
- (1) the collected and input three-dimensional representation data of the volumetric video (that is, the above-mentioned point cloud data) is projected onto a two-dimensional plane, usually by means of orthogonal projection, perspective projection or ERP projection (Equi-Rectangular Projection, equidistant cylindrical projection);
- the volumetric video projected onto the two-dimensional plane is represented by the data of geometry components, occupancy components and attribute components;
- the data of the geometry component provides the position information of each point of the volumetric video in three-dimensional space;
- the data of the attribute component provides additional attributes (such as texture or material information) of each point of the volumetric video;
- the data of the occupancy component indicates whether the data in the other components is associated with the volumetric video;
- the tiles generated by a volumetric video can be packaged into one or more atlases;
- the geometry component is mandatory
- the occupancy component is conditionally mandatory
- the attribute component is optional.
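- Putting the component rules above together, one frame of the two-dimensional representation can be sketched as follows (geometry mandatory, occupancy conditionally mandatory, attributes optional); the container type and field names are illustrative assumptions only.

```python
# Illustrative grouping of volumetric video components for one frame;
# this is not a normative layout from the application.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VolumetricFrameComponents:
    geometry: bytes                    # mandatory: per-point 3D position data
    atlas_data: bytes                  # tiles packed into one or more atlases
    occupancy: Optional[bytes] = None  # conditionally mandatory component
    attributes: List[bytes] = field(default_factory=list)  # optional, e.g. texture
```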
- although panoramic video can be captured by the capture device, after such video is processed by the encoding device and transmitted to the decoding device for corresponding data processing, the business object on the decoding device side needs to perform certain specific actions (such as head rotation) to watch the 360-degree video information, while performing non-specific actions (such as moving the head) produces no corresponding change in the video, so the VR experience is poor; therefore, additional depth information matching the panoramic video needs to be provided.
- FIG. 4 is a schematic diagram of 6DoF provided by the embodiment of the present application.
- 6DoF is divided into window 6DoF, omnidirectional 6DoF, and 6DoF.
- window 6DoF means that the rotational movement of the business object on the X-axis and Y-axis is restricted, and the translation on the Z-axis is restricted; for example, the business object cannot see the scene outside the window frame, and the business object cannot pass through the window.
- omnidirectional 6DoF means that the rotational movement of the business object on the X-axis, Y-axis and Z-axis is restricted; for example, the business object cannot freely traverse the three-dimensional 360-degree VR content in the restricted movement area.
- 6DoF means that on the basis of 3DoF, business objects can freely translate along the X-axis, Y-axis, and Z-axis. For example, business objects can move freely in three-dimensional 360-degree VR content. Similar to 6DoF, there are 3DoF and 3DoF+ production technologies.
- Fig. 5 is a schematic diagram of 3DoF+ provided by the embodiment of the present application.
- 3DoF+ means that when the virtual scene provided by immersive media has certain depth information, the head of the business object can move in a limited space based on 3DoF to watch the screen provided by the media content.
- the schematic diagram of 3DoF can refer to the above-mentioned FIG. 2 , which will not be repeated here.
- the captured audio content can be directly encoded to form the audio stream of the volumetric video.
- the captured video content can be video coded to obtain the video code stream of the volumetric video. What needs to be explained here is that if the 6DoF production technology is used, a specific encoding method (such as a point cloud compression method based on traditional video encoding) needs to be used for encoding in the video encoding process.
- the media file resource can be a media file or media segments that form a media file of the volumetric video;
- the metadata is a general term for the information related to the presentation of the volumetric video.
- the metadata may include description information of the media content, timed metadata information describing the mapping relationship between each constructed viewpoint group and the spatial position information for watching the media content, description information of the window, signaling information related to the presentation of the media content, and so on.
- the encoding device stores media presentation description information and media file resources formed after the data processing process.
- the collected audio will be encoded into a corresponding audio stream
- the geometric information, attribute information and occupancy map information of the volumetric video can adopt the traditional video encoding method
- the atlas data of the volumetric video can adopt the entropy encoding method.
- the encoded media is encapsulated in a file container according to a certain format (such as ISOBMFF or HNSS) and combined with the metadata describing the properties of the media content and the window metadata to form a media file, or to form an initialization segment and media segments, according to a specific media file format.
- the encoding device 200A performs volumetric video encoding on one or more volumetric video frames in the source volumetric data 20B to obtain an encoded V3C code stream 20Ev (i.e., a video code stream), including one atlas code stream (that is, the code stream obtained after encoding the atlas data), at most one occupancy code stream (that is, the code stream obtained after encoding the occupancy map information), one geometry code stream (that is, the code stream obtained after encoding the geometry information), and zero or more attribute code streams (that is, the code streams obtained after encoding the attribute information).
- the encoding device 200A can encapsulate one or more encoded code streams into a media file 20F for local playback according to a specific media file format (such as ISOBMFF), or into a segment sequence 20Fs for streaming transmission that contains an initialization segment and multiple media segments.
- the file encapsulator in the encoding device 200A may also add metadata to the media file 20F or the segment sequence 20Fs.
- the encoding device 200A may use a certain transmission mechanism (such as DASH, SMTP) to transmit the segment sequence 20Fs to the decoding device 200B, and at the same time transmit the media file 20F to the decoding device 200B.
- the decoding device 200B here may be a player.
- the decoding device can adaptively and dynamically obtain volumetric video media file resources and corresponding media presentation description information from the encoding device through the recommendation of the encoding device or according to the requirements of the business object on the decoding device side.
- the head/eye tracking information determines the orientation and position of the business object, and the decoding device then dynamically requests the encoding device to obtain the corresponding media file resources based on the determined orientation and position.
- Media file resources and media presentation description information are transmitted from the encoding device to the decoding device through a transmission mechanism (such as DASH, SMT).
- the file decapsulation process on the decoding device side is opposite to the file encapsulation process on the encoding device side.
- the decoding device decapsulates the media file resources according to the file format requirements (for example, ISOBMFF) of the volumetric video to obtain the audio code stream and the video code stream.
- the decoding process on the decoding device side is opposite to the encoding process on the encoding device side.
- the decoding device performs audio decoding on the audio code stream to restore the audio content; the decoding device performs video decoding on the video code stream to restore the video content.
- the media file 20F output by the file encapsulator in the encoding device 200A is the same as the media file 20F' input to the file decapsulator in the decoding device 200B.
- the file decapsulator performs file decapsulation processing on the media file 20F' or the received segment sequence 20F's, extracts the encoded V3C code stream 20E'v while parsing the corresponding metadata, and then performs volumetric video decoding on the V3C code stream 20E'v to obtain a decoded video signal 20D' (that is, the restored video content).
- the decoding device renders the audio content obtained by audio decoding and the video content obtained by video decoding according to the rendering-related metadata in the media presentation description information corresponding to the media file resource. After the rendering is completed, the playback and output of the image is realized.
- the volumetric video system supports a data box (Box).
- a data box refers to a data block or object including metadata, that is, a data box includes metadata of corresponding media content.
- the volumetric video may include multiple data boxes, for example, a file encapsulation data box (ISO Base Media File Format Box, ISOBMFF Box), which contains metadata for describing corresponding information when the file is encapsulated.
- the decoding device 200B can reconstruct the decoded video signal 20D' based on the current viewing direction or window to obtain the reconstructed volumetric video data 20B', and then render the reconstructed volumetric video data 20B' and display it on the screen of a head-mounted display or any other display device.
- the current viewing direction is determined by head tracking and possibly eye tracking. In window-related transmission, the current viewing direction will also be passed to the strategy module in the decoding device 200B, and the strategy module can determine the track to be received according to the current viewing direction.
- the decoding device can dynamically obtain the media file resources corresponding to the immersive media from the encoding device side. Because the media file resources are obtained by the encoding device encoding and encapsulating the captured audio and video content, after the decoding device receives the media file resources returned by the encoding device, it needs to first decapsulate the media file resources to obtain the corresponding audio and video code streams, then decode the audio and video code streams, and finally present the decoded audio and video content to the business object.
- the immersive media here includes but is not limited to panoramic video and volumetric video, where volumetric video can include multi-view video, point cloud media compressed with Video-based Point Cloud Compression (VPCC, based on traditional video coding), and point cloud media compressed with Geometry-based Point Cloud Compression (GPCC).
- while the business object consumes immersive media, interactive feedback can be continuously performed between the decoding device and the encoding device, and the encoding device can provide the business object with corresponding media file resources according to the content of the interactive feedback.
- the playable media content obtained after decapsulating and decoding the media file resources of immersive media can be collectively referred to as immersive media content.
- the decoding device can play the immersive media content restored from the acquired media file resources on the video playback interface. That is to say, one media file resource may correspond to one immersive media content.
- for ease of distinction, the immersive media content corresponding to the first media file resource may be referred to as the first immersive media content, the immersive media content corresponding to the second media file resource may be referred to as the second immersive media content, and other media file resources and their corresponding immersive media content can be named similarly.
- an embodiment of the present application provides a method for indicating an immersive media interaction feedback message.
- a video client may run on the decoding device (such as a user terminal), and then the first immersive media content may be played on a video playback interface of the video client.
- the first immersive media content here is obtained after the decoding device decapsulates and decodes the first media file resource, and the first media file resource is obtained after the encoding device (such as a server) encodes and encapsulates the relevant audio and video content in advance.
- the decoding device may respond to an interactive operation on the first immersive media content and generate an interactive feedback message corresponding to the interactive operation, where the interactive feedback message carries a business key field used to describe the business event indicated by the interactive operation.
- the decoding device can send the interactive feedback message to the encoding device, so that the encoding device can determine the business event indicated by the interactive operation based on the business key field in the interactive feedback message, and can obtain, based on the business event indicated by the interactive operation, the second media file resource for responding to the interactive operation; the second media file resource here is likewise obtained by the encoding device encoding and encapsulating relevant audio and video content in advance.
- the decoding device can receive the second media file resource returned by the encoding device, and decapsulate and decode the second media file resource to obtain the playable second immersive media content, and then play the second immersive media content on its video playback interface. For the process of decapsulating and decoding the second media file resource, refer to the related process described in the embodiment corresponding to FIG. 1 or the related process described in the embodiment corresponding to FIG. 3.
- the above-mentioned interactive operations may include not only operations related to the user's position (for example, a change of the user's position), but also other operations on the immersive media content currently played by the video client (for example, a zoom operation). Therefore, through the business key field carried in the interactive feedback message, the video client on the decoding device can feed back multiple types of business events to the encoding device, so that the encoding device can determine the immersive media content responding to the interactive operation based on these different types of business events instead of relying only on user position information; this can enrich the information types of interactive feedback and improve the accuracy of the media content acquired by the video client during the interactive feedback process.
- FIG. 6 is a schematic diagram of the architecture of a data processing system 300 for immersive media provided by an embodiment of the present application.
- the terminal 400 is a decoding device equipped with a video client
- the terminal 400 is connected to the server 600 (that is, the encoding device) through the network 500.
- the network 500 may be a wide area network or a local area network, or a combination of the two, and wireless or wired links are used to realize data transmission.
- the terminal 400 (video client) is configured to respond to the interactive operation on the first immersive media content, and generate an interactive feedback message corresponding to the interactive operation;
- the interactive feedback message carries a business key field, and the business key field is used to describe the business event indicated by the interactive operation;
- the server 600 is configured to determine a business event indicated by the interactive operation based on the interactive feedback message, obtain second immersive media content for responding to the interactive operation based on the business event, and return the second immersive media content to the terminal 400;
- the terminal 400 (video client) is further configured to receive and play the returned second immersive media content.
- the server (such as the server 600) can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery network (CDN, Content Delivery Network) services, and big data and artificial intelligence platforms.
- Terminals (such as terminal 400) may be smart phones, tablet computers, notebook computers, desktop computers, intelligent voice interaction devices (such as smart speakers), smart home appliances (such as smart TVs), smart watches, vehicle-mounted terminals, etc., but are not limited to this.
- the terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this embodiment of the present application.
- the terminal or server can implement the data processing method for immersive media provided by the embodiment of the present application by running a computer program.
- the computer program can be a native program or a software module in the operating system; it can be a native application program (APP, Application), that is, a program that needs to be installed in the operating system to run; it can also be a mini program, that is, a program that only needs to be downloaded into a browser environment to run; it can also be a mini program embedded in any APP. In general, the above-mentioned computer program can be any form of application program, module, or plug-in.
- the method provided by the embodiment of this application can be applied to the server side (ie, encoding device side), player side (ie, decoding device side), and intermediate nodes (eg, SMT receiving entity, SMT sending entity) of an immersive media system.
- Fig. 7 is a schematic flowchart of a data processing method for immersive media provided by an embodiment of the present application.
- the method can be performed by a decoding device in an immersive media system (for example, a panoramic video system or a volumetric video system), and the decoding device can be the decoding device 100B in the embodiment corresponding to FIG. 1, the decoding device 200B in the embodiment corresponding to FIG. 3, or the terminal 400 in FIG. 6 .
- the decoding device may be a user terminal integrated with a video client, and the method may at least include the following steps S101 to S103:
- Step S101 in response to an interactive operation on the first immersive media content, generating an interactive feedback message corresponding to the interactive operation.
- the interaction feedback message carries a business key field, which is used to describe the business event indicated by the interaction operation, and the business event includes at least one of the following: scaling event, switching event and location interaction event;
- after the video client on the user terminal obtains the first media file resource returned by the server, it can decapsulate and decode the first media file resource to obtain the first immersive media content, and then play the first immersive media content on the video playback interface of the video client.
- the first immersive media content here refers to the immersive media content currently being watched by the business object
- the business object here may refer to a user who consumes the first immersive media content.
- the first immersive media content may belong to an immersive video
- the immersive video may be a video collection including one or more immersive media contents.
- the embodiment of the present application does not limit the amount of content included in the immersive video.
- for example, an immersive video provided by the server includes N immersive media contents, where N is an integer greater than 1, namely: immersive media content A1 associated with scene P1, immersive media content A2 associated with scene P2, ..., and immersive media content AN associated with scene PN. The video client can then obtain any one or more of the above N immersive media contents according to the recommendation of the server or the requirements of the business object, for example, immersive media content A1; the immersive media content A1 can then be used as the current first immersive media content.
- in some embodiments, the immersive video may be a panoramic video; in other embodiments, the immersive video may be a volumetric video. This embodiment of the present application does not limit the video type of the immersive video.
- the video client may respond to an interactive operation on the currently playing first immersive media content, and generate an interactive feedback message corresponding to the interactive operation.
- the interactive feedback message can also be called interactive feedback signaling, which can provide interactive feedback between the video client and the server during immersive media consumption.
- the SMT receiving entity can periodically feed back virtual camera direction information to the SMT sending entity to notify it of the current VR virtual camera direction.
- the corresponding direction information will also be sent when the field of view (Field of view, FOV) changes.
- the SMT receiving entity can periodically feed back information about the position of the virtual camera, the position of the business object, and the viewing direction to the SMT sending entity, so that the video client can obtain the corresponding media content.
- the SMT receiving entity and the SMT sending entity belong to intermediate nodes between the video client and the server.
- the interactive operation refers to the operation performed by the business object on the first immersive media content currently consumed, including but not limited to zooming operation, switching operation, and location interactive operation.
- the zoom operation corresponds to a zoom event, and refers to an operation of reducing or enlarging the screen size of the first immersive media content. For example, double-clicking the immersive media content A1 can enlarge the screen of the immersive media content A1; for another example, sliding and stretching the immersive media content A1 with two fingers moving in different directions can reduce or enlarge the screen of the immersive media content A1.
- the switching operation here corresponds to a switching event, and may include a playback rate switching operation, an image quality switching operation (that is, a resolution switching operation), a flipping operation, and a content switching operation performed on the first immersive media content, as well as other event-based trigger operations, such as a click operation on a target position in the screen, or a trigger operation performed when the business object faces a target direction.
- the position interaction operation corresponds to a position interaction event, and refers to an operation on the object position information (that is, the user position information) generated by the business object when watching the first immersive media content, such as a change of the real-time position, a change of the viewing direction, or a change of the viewing angle.
- when the first immersive media content belongs to a panoramic video, the corresponding position interaction operation is referred to as the first position interaction operation; when the first immersive media content belongs to a volumetric video, the corresponding position interaction operation is referred to as the second position interaction operation. It should be noted that the embodiment of the present application does not limit the triggering manner of the scaling operation, the switching operation, and the location interaction operation.
- the interaction feedback message may carry a business key field used to describe the business event indicated by the interaction operation.
- in some embodiments, the interaction feedback message may directly include the business key field, where the business key field may include one or more of the first key field, the second key field, the third key field, and the fourth key field.
- the first key field is used to represent, when the interactive operation includes a zoom operation, the zoom ratio when the zoom event indicated by the zoom operation is executed;
- the second key field is used to represent, when the interactive operation includes a switching operation, the event label and event state corresponding to the switching event indicated by the switching operation;
- the third key field is used to represent, when the interactive operation includes the first position interactive operation, the first object position information (for example, the real-time position and viewing direction of the business object) of the business object watching the first immersive media content belonging to the panoramic video;
- the fourth key field is used to represent, when the interactive operation includes the second position interactive operation, the second object position information (for example, the real-time viewing direction of the business object) of the business object watching the first immersive media content belonging to the volumetric video.
- an interaction feedback message may carry business key fields used to describe business events indicated by one or more interaction operations, and this embodiment of the present application does not limit the number and types of interaction operations corresponding to the interaction feedback message.
- in some embodiments, the business key field carried in one interactive feedback message may include any one of the first key field, the second key field, the third key field, and the fourth key field; for example, every time an interactive operation occurs, the video client generates a corresponding interactive feedback message.
- in some embodiments, when the first immersive media content belongs to a panoramic video, the business key field carried in one interactive feedback message may include any one or more of the first key field, the second key field, and the third key field, for example, both the first key field and the third key field.
- similarly, when the first immersive media content belongs to a volumetric video, the business key field carried in one interactive feedback message may include any one or more of the first key field, the second key field, and the fourth key field.
- for example, the business object performs a zoom operation on the above-mentioned immersive media content A2, and the object position information of the business object changes while watching the immersive media content A2 (for example, the business object watches while walking). If the immersive media content A2 belongs to a volumetric video, the interactive feedback message generated at this time will include both the first key field reflecting the zoom ratio and the fourth key field reflecting the second object position information, so the finally obtained second immersive media content is jointly determined based on the zoom ratio and the second object position information. That is, the immersive media content responding to the interactive operation can be determined based on different types of business events, combining multiple information dimensions; in this way, the accuracy with which the video client obtains media content during the interactive feedback process can be improved.
- the above interaction feedback message may also include an information identification field, which is used to characterize the information type of the business event indicated by each interaction operation.
- the field value of the information identification field may be the information identifier corresponding to each type of business event; in this way, when an interactive feedback message carries multiple business events at the same time, the information types can be distinguished through the information identification field.
- the embodiment of this application does not limit the timing of interactive feedback, which can be agreed at the application layer according to actual needs.
- for example, when a video client detects an interactive operation, it can immediately generate a corresponding interaction feedback message and send it to the server; in some embodiments, the video client can periodically send interaction feedback messages to the server, for example, feeding back to the server once every 30 seconds.
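- this timing policy can be illustrated with a short sketch. The following Python code is a minimal, hypothetical illustration of the two timings the text allows (immediate feedback per interactive operation, or periodic feedback such as once every 30 seconds); the class and method names are inventions for illustration and are not part of the embodiment.

```python
class FeedbackScheduler:
    """Sketch of the two feedback timings agreed at the application layer."""

    def __init__(self, send_fn, period_s=30.0):
        self.send_fn = send_fn    # function that sends a message to the server
        self.period_s = period_s  # None = immediate mode; 30.0 = periodic mode
        self.pending = []         # buffered interaction feedback messages

    def on_interaction(self, message: bytes):
        if self.period_s is None:          # immediate: send on every operation
            self.send_fn(message)
        else:                              # periodic: buffer until the next tick
            self.pending.append(message)

    def tick(self):                        # invoked every period_s by a timer
        for message in self.pending:
            self.send_fn(message)
        self.pending.clear()
```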
- the interaction feedback information may carry an interaction signaling table associated with the interaction operation, and the interaction signaling table includes a business key field for describing a business event indicated by the interaction operation. That is to say, the interaction feedback message can redefine and organize many different types of business events in the form of an interaction signaling table.
- in some embodiments, the video client responds to a trigger operation on the first immersive media content, determines the first information type field of the business event indicated by the trigger operation, and records the operation timestamp of the trigger operation.
- the trigger operation here may refer to a contact operation or some specific non-contact operation on the first immersive media content, for example, the trigger operation may include a zoom operation, a switch operation, and the like.
- the video client may add the first information type field and the operation timestamp to the interactive signaling table associated with the first immersive media content, and may use the first information type field added in the interactive signaling table as the business key field describing the business event indicated by the trigger operation.
- the video client can generate an interaction feedback message corresponding to the trigger operation based on the service key field and the operation timestamp in the interaction signaling table.
- the first information type field here may be used to represent the information type of the business event indicated by the trigger operation. It can be understood that each trigger operation may correspond to one interaction signaling table; therefore, the same interaction feedback message may include one or more interaction signaling tables, and the embodiment of the present application does not limit the number of interaction signaling tables included in the interaction feedback message.
- when the trigger operation includes a scaling operation, the business event indicated by the scaling operation is a scaling event, and the field value of the first information type field corresponding to the scaling operation is the first field value; the field mapped to the first information type field with the first field value is used to represent the scaling ratio when the scaling event is executed.
- when the trigger operation includes a switching operation, the business event indicated by the switching operation is a switching event, and the field value of the first information type field corresponding to the switching operation is the second field value; the field mapped to the first information type field with the second field value is used to represent the event label and event state of the switching event.
- for the syntax of this interactive signaling table, see Table 1 provided by the embodiment of the present application; its fields are described as follows:
- table_id is a signaling table identification field, which is used to represent an identifier of an interactive signaling table.
- version is a signaling table version field, which is used to represent the version number of the interactive signaling table.
- length is a signaling table length field, which is used to represent the length of the interactive signaling table.
- table_type is the first information type field, which is used to characterize the type of information carried in the interactive signaling table (for example, scaling event or switching event).
- timestamp is the operation timestamp, which is used to indicate the timestamp generated by the current trigger operation, and UTC time (Universal Time Coordinated, Coordinated Universal Time) may be used here.
- zoom_ratio indicates the ratio of the zoom behavior of the business object, that is, the zoom ratio when the zoom event is executed (also referred to as screen zoom information); zoom_ratio can be in units of 2^-3.
- zoom_ratio may also be used as the first key field described in the foregoing optional implementation manners.
- the fields mapped to the first information type field with the second field value are event_label and event_trigger_flag; event_label indicates the event label triggered by the business object's interaction, and event_trigger_flag indicates the event state triggered by the business object's interaction.
- when the value of event_trigger_flag is 1 (that is, the first state value), it indicates that the event is triggered (that is, the switching event is in the event-triggered state); when the value of event_trigger_flag is 0 (that is, the second state value), it indicates that the event has ended (that is, the switching event is in the event-end state).
- event_label and event_trigger_flag can also be used as the second key field described in the foregoing optional implementation manners. In addition, reserved indicates reserved byte bits.
- the embodiment of the present application does not limit the values of the first field value and the second field value, nor the values of the first state value and the second state value. It should be understood that the embodiment of the present application can support developers in pre-defining the required switching events at the application layer, and the content of the event label can be determined according to the content of the immersive media, which is not limited in the embodiment of the present application. It should be noted that the relevant immersive media content needs to support custom switching events, so that events can be triggered in the subsequent interaction process. For example, when the above immersive media content F2 supports content switching, a corresponding content switching control will be displayed in the video playback interface where the immersive media content F2 is played.
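- as a concrete illustration of the two Table 1 cases, the following Python sketch encodes a zoom signaling table and an event-trigger signaling table. The field names (table_id, version, length, table_type, timestamp, zoom_ratio, event_label, event_trigger_flag) come from the text; the table_type values follow the examples later given in Table 9, while the byte widths and the encoding functions themselves are assumptions for illustration only.

```python
import struct
import time

TABLE_TYPE_ZOOM = 2    # screen zoom information (example value from Table 9)
TABLE_TYPE_EVENT = 3   # interactive event trigger information (example value)

def encode_zoom_table(table_id: int, version: int, zoom_ratio: float) -> bytes:
    ratio_units = round(zoom_ratio * 8)        # zoom_ratio in units of 2^-3
    timestamp = int(time.time())               # UTC operation timestamp
    body = struct.pack(">BQH", TABLE_TYPE_ZOOM, timestamp, ratio_units)
    # Hypothetical header layout: table_id, version, length of the body.
    return struct.pack(">BBH", table_id, version, len(body)) + body

def encode_event_table(table_id: int, version: int,
                       event_label: int, triggered: bool) -> bytes:
    timestamp = int(time.time())
    flag = 1 if triggered else 0               # 1 = triggered, 0 = ended
    body = struct.pack(">BQHB", TABLE_TYPE_EVENT, timestamp, event_label, flag)
    return struct.pack(">BBH", table_id, version, len(body)) + body
```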
- the embodiment of the present application may also support carrying object location information in the interaction feedback message, which may also be defined in the form of an interaction signaling table.
- when the video client detects the object location information of the business object watching the first immersive media content, it takes the location interaction operation on the object location information as an interaction operation responding to the first immersive media content, and can then determine the second information type field of the business event indicated by the interaction operation and record the operation timestamp of the interaction operation.
- the second information type field and the operation timestamp may be added to the interactive signaling table associated with the first immersive media content, and the second information type field added in the interactive signaling table may be used as the business key field describing the business event indicated by the above interaction operation.
- the video client can generate an interaction feedback message corresponding to the interaction operation based on the business key field and the operation timestamp in the interaction signaling table. It can be understood that each location interaction operation may correspond to one interaction signaling table; therefore, the same interaction feedback message may include one or more interaction signaling tables, but the same interaction feedback message cannot simultaneously contain an interactive signaling table carrying the first object location information and an interactive signaling table carrying the second object location information.
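- the constraint just stated can be checked mechanically; the following sketch (using the example table_type values of Table 9, where 0 marks the panoramic-video location table and 1 the volumetric-video location table) rejects any message that mixes the two location table types. The function name is hypothetical.

```python
def validate_table_types(table_types: list) -> bool:
    """One feedback message must not carry both location table types."""
    return not (0 in table_types and 1 in table_types)

assert validate_table_types([0, 2])      # panoramic position + zoom: allowed
assert not validate_table_types([0, 1])  # both location types: rejected
```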
- the object location information of the business object may or may not change within a period of time while the business object consumes the first immersive media content.
- when the object location information does not change, the server can still obtain the corresponding immersive media content based on the object location information, and the obtained immersive media content may be the same as the first immersive media content.
- similarly, if, in addition to the object location information, the video client also feeds back other information to the server during this period, for example, the zoom ratio of a zoom operation performed on the first immersive media content, the server can obtain the corresponding immersive media content based on both the object position information and the zoom ratio; the immersive media content acquired at this time is different from the first immersive media content.
- the field value of the second information type field corresponding to the object position information is the third field value
- the second information type field with the third field value includes a first type location field
- the first type location field is used to describe the location change information of the business object watching the first immersive media content belonging to the panoramic video.
- the field value of the second information type field corresponding to the object position information is the fourth field value
- the second information type field with the fourth field value includes a second type of location field
- the second type of location field is used to describe the location change information of the business object watching the first immersive media content belonging to the volumetric video.
- See Table 2, which is used to indicate the syntax of an interactive signaling table provided by the embodiment of the present application:
- table_id is a signaling table identification field, which is used to represent an identifier of an interactive signaling table.
- version is a signaling table version field, which is used to represent the version number of the interactive signaling table.
- length is a signaling table length field, which is used to represent the length of the interactive signaling table.
- table_type is a second information type field, which is used to characterize the type of information carried in the interactive signaling table (such as first object location information or second object location information).
- timestamp is the operation timestamp, which is used to indicate the timestamp generated by the interactive operation at the current location, and UTC time can be used here.
- when the field value of table_type is 0 (that is, the third field value), the first type of location fields it contains are as follows: 3DoF+_flag indicates 3DoF+ video content; interaction_target is the interaction target field, indicating the target of the video client's current interaction, which includes the current status of the helmet device (HMD_status), the interest target of the business object (Object of interests), the current status of the business object (User_status), and so on.
- interaction_type is an interaction type field, which is set to 0 in this embodiment of the application.
- the value of the interaction target field interaction_target can refer to Table 3, and Table 3 is used to indicate a value table of the interaction target field provided in the embodiment of the present application:
- ClientRegion is the window information, indicating the size and screen resolution of the video client window.
- For the syntax, see Table 4, which is used to indicate the syntax of the window information provided by the embodiment of the present application:
- Region_width_angle indicates the horizontal opening angle of the video client window, with a precision of 2^-16 degrees and a value range of (-90*2^16, 90*2^16).
- Region_height_angle indicates the vertical opening angle of the video client window, with a precision of 2^-16 degrees and a value range of (-90*2^16, 90*2^16).
- Region_width_resolution indicates the horizontal resolution of the video client window, with a value range of (0, 2^16 - 1).
- Region_height_resolution indicates the vertical resolution of the video client window, with a value range of (0, 2^16 - 1).
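- the fixed-point convention above can be made concrete with a small helper; the sketch below converts an angle in degrees into the 2^-16-degree units used by Region_width_angle and Region_height_angle and enforces the stated range. The function name is a hypothetical choice for illustration.

```python
def angle_to_fixed16(angle_deg: float) -> int:
    """Degrees -> 2^-16-degree units, range (-90*2^16, 90*2^16)."""
    value = round(angle_deg * (1 << 16))
    limit = 90 * (1 << 16)
    if not -limit < value < limit:
        raise ValueError("angle outside (-90*2^16, 90*2^16)")
    return value

print(angle_to_fixed16(45.0))  # 2949120 = 45 * 2^16
```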
- ClientRotation is the viewing angle direction, indicating the change of the real-time viewing angle of the business object relative to the initial viewing angle.
- For the syntax, see Table 5, which is used to indicate the syntax of the viewing angle direction provided by the embodiment of the present application:
- 3D_rotation_type indicates the representation type of the rotation information: when the value of this field is 0, the rotation information is given in the form of Euler angles; when the value of this field is 1, the rotation information is given in the form of a quaternion; other values are reserved.
- rotation_yaw indicates the yaw angle, along the x-axis, of the real-time viewing angle of the business object relative to the initial viewing angle, with a value range of (-180*2^16, 180*2^16 - 1).
- rotation_pitch indicates the pitch angle, along the y-axis, of the real-time viewing angle of the business object relative to the initial viewing angle, with a value range of (-90*2^16, 90*2^16).
- rotation_roll indicates the roll angle, along the z-axis, of the real-time viewing angle of the business object relative to the initial viewing angle, with a value range of (-180*2^16, 180*2^16 - 1).
- rotation_x, rotation_y, rotation_z, and rotation_w respectively indicate the values of the x, y, z, and w components of the quaternion, representing the rotation information of the real-time viewing angle of the business object relative to the initial viewing angle.
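- the two rotation representations can be gathered into one structure; the sketch below mirrors the ClientRotation fields described above (with 3D_rotation_type selecting Euler angles or a quaternion). The dataclass itself is an illustrative assumption, not a normative layout.

```python
from dataclasses import dataclass

@dataclass
class ClientRotation:
    rotation_type: int        # 3D_rotation_type: 0 = Euler angles, 1 = quaternion
    # Euler form, in 2^-16-degree units (meaningful when rotation_type == 0):
    rotation_yaw: int = 0     # about x-axis, range (-180*2^16, 180*2^16 - 1)
    rotation_pitch: int = 0   # about y-axis, range (-90*2^16, 90*2^16)
    rotation_roll: int = 0    # about z-axis, range (-180*2^16, 180*2^16 - 1)
    # Quaternion form (meaningful when rotation_type == 1):
    rotation_x: float = 0.0
    rotation_y: float = 0.0
    rotation_z: float = 0.0
    rotation_w: float = 1.0   # (0, 0, 0, 1) is the identity rotation
```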
- ClientPosition is the real-time position of the business object, indicating the displacement of the business object in the virtual scene relative to the starting position. In the 3DoF case (that is, when the value of 3DoF+_flag is 0), the field values of all fields in this structure are 0; in the 3DoF+ case (that is, when the value of 3DoF+_flag is 1), the field values may be non-zero, and the values should be within the constrained range.
- behavior_coefficient defines an amplification behavior coefficient.
- See Table 6, which is used to indicate the syntax of the real-time position of a business object provided by the embodiment of the present application:
- position_x indicates the displacement, along the x-axis, of the real-time position of the business object relative to the initial position, with a value range of (-2^15, 2^15 - 1) in millimeters.
- position_y indicates the displacement, along the y-axis, of the real-time position of the business object relative to the initial position, with a value range of (-2^15, 2^15 - 1) in millimeters.
- position_z indicates the displacement, along the z-axis, of the real-time position of the business object relative to the initial position, with a value range of (-2^15, 2^15 - 1) in millimeters.
- the first type of position field included when the field value of table_type is 0 can be used as the aforementioned third key field.
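- the 3DoF/3DoF+ rule for ClientPosition can be expressed as a short helper; the sketch below clamps millimetre displacements to the stated (-2^15, 2^15 - 1) range and zeroes all fields in the plain-3DoF case. The function name and the clamping policy are illustrative assumptions.

```python
def encode_client_position(x_mm, y_mm, z_mm, dof3_plus: bool):
    """position_x/y/z in millimetres; all zero when 3DoF+_flag == 0."""
    if not dof3_plus:              # plain 3DoF: every field is 0
        return (0, 0, 0)
    def clamp(v):                  # keep within (-2^15, 2^15 - 1)
        return max(-(1 << 15) + 1, min((1 << 15) - 1, round(v)))
    return (clamp(x_mm), clamp(y_mm), clamp(z_mm))
```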
- in Table 2, when the field value of table_type is 1 (that is, the fourth field value), the second type of position fields it contains are as follows:
- ClientPosition indicates the current position of the business object in the global coordinate system; its syntax can be found in Table 6 above.
- V3C_orientation indicates the viewing direction of the business object in the Cartesian coordinate system established with the current position.
- last_processed_media_timestamp indicates the timestamp of the last media unit that has been added to the decoder buffer. The SMT sending entity uses this field to determine the next media unit to transmit from the volumetric video player's new asset (i.e. new immersive media content).
- the next media unit is the media unit with a timestamp or sequence number immediately following the timestamp.
- the SMT sending entity switches from transmitting the previous asset (determined according to the previous window) to transmitting the new asset (determined according to the new window), starting with the subsequent media timestamp, to reduce the delay in receiving media content corresponding to the new window.
- See Table 7 for the syntax of V3C_orientation, which is used to indicate the syntax of the real-time viewing direction of a business object provided by the embodiment of this application:
- dirx indicates the x-axis coordinate of the viewing direction of the business object in a Cartesian coordinate system established with the location of the business object as the origin.
- diry indicates the y-axis coordinate of the viewing direction of the business object in a Cartesian coordinate system established with the location of the business object as the origin.
- dirz indicates the z-axis coordinate of the viewing direction of the business object in a Cartesian coordinate system established with the location of the business object as the origin.
- the second type of location field included when the field value of table_type is 1 may serve as the aforementioned fourth key field.
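- since dirx/diry/dirz describe a direction rather than a position, a client would typically normalize them before feedback; the sketch below is one plausible way to do this, and is an assumption for illustration rather than part of the embodiment.

```python
import math

def viewing_direction(dirx: float, diry: float, dirz: float):
    """Unit viewing direction in the Cartesian frame centred on the user."""
    norm = math.sqrt(dirx * dirx + diry * diry + dirz * dirz)
    if norm == 0.0:
        raise ValueError("viewing direction must be non-zero")
    return (dirx / norm, diry / norm, dirz / norm)
```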
- the embodiment of the present application can also combine the above Table 1 and Table 2 to obtain an interactive signaling table that can represent at least four types of information, so that the interactive feedback message corresponding to the interactive operation can be generated based on that interactive signaling table.
- Step S102 sending an interaction feedback message, where the interaction feedback message is used to determine a service event indicated by the interaction operation, and acquire second immersive media content for responding to the interaction operation based on the service event.
- the terminal sends the interaction feedback message to the server, so that the server determines the business event indicated by the interaction operation based on the business key field in the interaction feedback message, and obtains, based on the business event indicated by the interaction operation, the second immersive media content for responding to the interaction operation.
- the video client may send the interaction feedback message to the server; after receiving the interaction feedback message, the server may determine the business event indicated by the interaction operation based on the business key field in the interaction feedback message, and may then, based on the business event indicated by the interactive operation, acquire the second media file resource corresponding to the second immersive media content used to respond to the interactive operation.
- the second media file resource is obtained by the server after encoding and encapsulating the relevant audio and video content in advance, and corresponds to the second immersive media content; for the process of encoding and encapsulating the audio and video content, refer to the relevant description in the embodiment corresponding to FIG. 1 or FIG. 3 above, which will not be repeated here.
- for example, when the interactive operation is a definition switching operation, the server may, in response to the definition switching operation, obtain a media file resource matching the resolution indicated by the definition switching operation as the second media file resource.
- Step S103 receiving returned second immersive media content.
- the video client may receive the second immersive media content returned by the server, and may play the second immersive media content on the video playback interface.
- in some embodiments, the server first obtains, based on the business event, the media file resource corresponding to the second immersive media content, that is, the second media file resource, and can return the second media file resource to the video client. After the video client receives the second media file resource, it can decapsulate and decode the second media file resource with reference to the relevant description in the embodiment corresponding to FIG. 1 or FIG. 3 above, so as to obtain the second immersive media content to be played on the video playback interface of the video client; the process of decapsulation and decoding will not be repeated here.
- FIG. 8 is a schematic flowchart of a data processing method for immersive media provided by an embodiment of the present application.
- the method can be performed by a decoding device in an immersive media system (for example, a panoramic video system or a volumetric video system), and the decoding device can be the decoding device 100B in the embodiment corresponding to FIG. 1 or the decoding device 200B in the embodiment corresponding to FIG. 3 .
- the decoding device may be a user terminal integrated with a video client, and the method may at least include the following steps:
- Step S201 in response to the video playback operation for the immersive video in the video client, generate a playback request corresponding to the video playback operation, and send the playback request to the server, so that the server can obtain the first immersive media content of the immersive video based on the playback request ;
- when a business object wishes to experience an immersive video, it may request the corresponding immersive media content through the video client on the user terminal.
- the video client can respond to the video playback operation on the immersive video in the video client, generate a playback request corresponding to the video playback operation, and then send the playback request to the server, so that the server can obtain, based on the playback request, the first media file resource corresponding to the first immersive media content in the immersive video. The first media file resource refers to the data obtained after the server encodes and encapsulates the related audio and video content.
- Step S202 receiving the first immersive media content returned by the server, and playing the first immersive media content on the video playback interface of the video client;
- the first media file resource may be returned to the video client, so that the video client can receive the first media file resource returned by the server, and decapsulate and decode the first media file resource to obtain the first immersive media content that can be played on the video playback interface of the video client.
- Step S203 when the first immersive media content is played on the video playback interface of the video client, in response to the interactive operation on the first immersive media content, an interactive feedback message corresponding to the interactive operation is generated; the interactive feedback message carries information associated with the interactive operation Interactive signaling table;
- the video client may respond to an interactive operation on the first immersive media content, and generate an interactive feedback message corresponding to the interactive operation. For example, the video client, in response to the interactive operation, determines the information type field of the business event indicated by the interactive operation and records the operation timestamp of the interactive operation.
- the information type field and the operation timestamp may be added to the interactive signaling table associated with the first immersive media content, and the information type field added in the interactive signaling table is used as the business key field describing the business event indicated by the interactive operation. Subsequently, an interaction feedback message corresponding to the interaction operation may be generated based on the business key field and the operation timestamp in the interaction signaling table.
- the interactive operation here may include one or more of a zooming operation, a switching operation, and a location interaction operation, where the location interaction operation may be a first location interaction operation or a second location interaction operation.
- the interactive feedback message may carry an interactive signaling table associated with the interactive operation, and the information type field contained in the interactive signaling table may be used as a business key for describing the business event indicated by the interactive operation field.
- the information type field may include a first information type field related to a trigger operation and a second information type field related to a location interaction operation.
- the first information type field and the second information type field are collectively referred to as information type field.
- Table 1 and Table 2 can be combined to obtain an interactive signaling table that can represent at least four types of information.
- in this way, interactive feedback of different information types can be integrated together through the interactive signaling table without appearing confusing due to the diversity of information types.
- See Table 8, which is used to indicate the syntax of an interactive signaling table provided by the embodiment of the present application:
- the table_type shown in Table 8 above is the information type field, which can be used to represent the type of information carried in the interactive signaling table; for the remaining fields, refer to Table 1 and Table 2 in the above embodiment corresponding to FIG. 7 , which will not be repeated here.
- the value of table_type can refer to Table 9, which is used to indicate the value table of an information type field provided by the embodiment of the present application:
- the field value of the information type field can be the first field value (for example, 2), the second field value (for example, 3), the third field value (for example, 0), the fourth field value (for example, 1) etc.
- in Table 9, the panoramic video user position change information is the position change information described by the first type of position field; the volumetric video user position change information is the position change information described by the second type of position field; the screen zoom information is the zoom ratio when the zoom event is executed; and the interactive event trigger information includes the event label and event state of the switching event. Other value information can be added later.
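- the four value categories of Table 9 can be summarized as a lookup, as in the sketch below; the concrete numbers are the example values given in the text, not values fixed by the embodiment.

```python
TABLE_TYPE_MEANING = {
    0: "panoramic video user position change (first type location fields)",
    1: "volumetric video user position change (second type location fields)",
    2: "screen zoom information (zoom ratio of the zoom event)",
    3: "interactive event trigger information (event label and event state)",
}
```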
- the interaction feedback message generated in the embodiment of the present application may support richer interaction feedback scenarios.
- See Table 10, which is used to indicate the syntax of an interactive feedback message provided by the embodiment of the present application:
- message_id indicates the identifier of the interaction feedback message.
- version indicates the version of the interaction feedback message, and the information carried by the new version will overwrite any previous old version.
- the length indicates the length of the interaction feedback message in bytes, that is, the length from the next field to the last byte of the interaction feedback message, where the value of "0" is invalid in this field.
- number_of_tables is a field for the number of signaling tables, indicating the number of interactive signaling tables included in the interactive feedback message, which is represented by N1 here, and the value of N1 is not limited in this embodiment of the present application.
- table_id is a signaling table identification field, indicating the identifier of each interaction signaling table included in the interaction feedback message, which is a copy of the table_id field included in the payload of the interaction feedback message in the interaction signaling table.
- table_version is a signaling table version field, indicating the version number of each interaction signaling table included in the interaction feedback message, which is a copy of the version field of the interaction signaling table included in the payload of the interaction feedback message.
- table_length is a signaling table length field, indicating the length of each interaction signaling table included in the interaction feedback message, and is a copy of the length field of the interaction signaling table included in the payload of the interaction feedback message.
- asset_group_flag is a resource group attribute field, which is used to represent the affiliation between the first immersive media content and the immersive media content set contained in the target resource group. For example, when the field value of the resource group attribute field is the first attribute field value (for example, 1), the resource group attribute field with the first attribute field value is used to represent that the first immersive media content belongs to the immersive media content set; when the field value of the resource group attribute field is the second attribute field value (for example, 0), the resource group attribute field with the second attribute field value is used to indicate that the first immersive media content does not belong to the immersive media content set. That is to say, an asset_group_flag value of 1 indicates that the content currently consumed by the video client (that is, the first immersive media content) belongs to a resource group (such as the target resource group).
- the resource group refers to a collection containing multiple immersive media contents
- the immersive video in the embodiment of the present application may include multiple immersive media contents (for example, the first immersive media content), and these multiple immersive media contents may be subdivided in units of resource groups. For example, the immersive video itself can be regarded as one resource group, that is to say, all immersive media content in the immersive video belongs to one resource group; alternatively, the immersive video can be divided into multiple resource groups, and each resource group can include multiple immersive media contents in the immersive video.
- asset_group_id is a resource group identification field, which indicates the resource group identifier of the content currently consumed by the video client, that is, the identifier of the resource group (such as the target resource group) corresponding to the immersive media content set to which the first immersive media content belongs.
- asset_id indicates the identifier of the content currently consumed by the video client. It should be understood that each immersive media content has a unique corresponding asset_id. When the first immersive media content belongs to a certain resource group, the video client may currently be consuming more than one first immersive media content; at this time, feeding back the asset_id of only one of these first immersive media contents is obviously inappropriate, so the identifier of the resource group to which the first immersive media contents belong may be fed back instead.
- table() is an interactive signaling table entity; the interactive signaling tables in the payload are in the same order as the table_id fields in the extension field, and an interactive signaling table can be used as an instance of table(). The interactive signaling tables may be sorted according to their corresponding operation timestamps, according to their corresponding table_id values, or by other sorting methods, which is not limited in this embodiment of the present application.
- since a loop statement is used in the interaction feedback message shown in Table 10, the business events carried by the one or more interaction signaling tables included in the interaction feedback message can be fed back in an orderly manner; that is, during interaction feedback, the server reads each interactive signaling table sequentially according to the order in which the interactive signaling tables appear in the loop statement.
- the above signaling table quantity field, signaling table identification field, signaling table version field, signaling table length field, resource group attribute field, and resource group identification field all belong to the extended description fields newly added at the system layer of the video client.
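- the Table 10 layout can be sketched as follows: a message header, the number_of_tables loop carrying copies of each table's id/version/length, the resource group fields, and finally the table() entities in loop order. The field names come from the text; every byte width and the helper function itself are assumptions for illustration.

```python
import struct

def build_feedback_message(message_id: int, version: int,
                           tables,            # list of (table_id, version, bytes)
                           asset_group_id,    # None when content is in no group
                           asset_id: int) -> bytes:
    body = struct.pack(">B", len(tables))     # number_of_tables (N1)
    for t_id, t_version, t_bytes in tables:   # copies of id / version / length
        body += struct.pack(">BBH", t_id, t_version, len(t_bytes))
    flag = 1 if asset_group_id is not None else 0
    body += struct.pack(">B", flag)           # asset_group_flag
    if flag:
        body += struct.pack(">I", asset_group_id)  # asset_group_id
    body += struct.pack(">I", asset_id)       # asset_id of consumed content
    for _, _, t_bytes in tables:              # table() entities, in loop order
        body += t_bytes
    header = struct.pack(">BBH", message_id, version, len(body))
    return header + body
```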
- the embodiment of the present application redefines and organizes interactive feedback messages on the basis of existing technologies, and adds two types of feedback information, zooming and event triggering, to support richer interaction feedback scenarios and improve the accuracy of the media content acquired by the video client during the interactive feedback process.
- Step S204 sending the interaction feedback message to the server, so that the server extracts the interaction signaling table, determines the business event indicated by the interaction operation according to the information type field in the interaction signaling table, and obtains, based on the business event indicated by the interaction operation, the second immersive media content responsive to the interactive operation;
- the video client may send the interaction feedback message to the server; after receiving the interaction feedback message, the server may sequentially extract the interaction signaling tables from the interaction feedback message, read the information type field from each extracted interaction signaling table, and then determine the business event indicated by the interaction operation according to the information type field.
- the second immersive media content for responding to the interactive operation can be obtained from the above immersive video, and the second immersive media content can be returned to the video client.
- when the field value of the information type field is the first field value, the zoom ratio corresponding to the scaling event can be obtained as the business event; when the field value of the information type field is the second field value, the event label and event state of the switching event are taken as the business event; when the field value of the information type field is the third field value, the position change information of the business object watching the first immersive media content belonging to the panoramic video can be obtained as the business event; when the field value of the information type field is the fourth field value, the position change information of the business object watching the first immersive media content belonging to the volumetric video can be obtained as the business event.
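- on the server side, the four cases just listed amount to a dispatch on the information type field; the sketch below (again using the example field values of Table 9, with hypothetical function and key names) maps each extracted signaling table to the business event it indicates.

```python
def business_event_from_table(table_type: int, fields: dict) -> dict:
    if table_type == 2:    # first field value: zoom event
        return {"event": "zoom", "zoom_ratio": fields["zoom_ratio"]}
    if table_type == 3:    # second field value: switching event
        return {"event": "switch",
                "label": fields["event_label"],
                "triggered": fields["event_trigger_flag"] == 1}
    if table_type == 0:    # third field value: panoramic position change
        return {"event": "position_panoramic", **fields}
    if table_type == 1:    # fourth field value: volumetric position change
        return {"event": "position_volumetric", **fields}
    raise ValueError(f"unknown table_type {table_type}")
```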
- Step S205 receiving the second immersive media content returned by the server, and playing the second immersive media content on the video playback interface.
- the video client can receive the second media file resource returned by the server, and decapsulate and decode the second media file resource to obtain the playable second immersive media content, which can then be played on the video playback interface of the video client.
- it can be seen that the video client can feed back to the server the business events indicated by different types of interactive operations. The interactive operations here may include not only operations related to the user's location (for example, a change of the user's location), but also other operations on the immersive media content currently played by the video client (for example, a zoom operation). Therefore, the video client can feed back multiple types of business events to the server, so that the server can determine the immersive media content responding to the interactive operation based on these different types of business events instead of relying only on user location information; this enriches the information types of interactive feedback and improves the accuracy of the media content obtained by the video client during the interactive feedback process.
- FIG. 9 is an interactive schematic diagram of a data processing method for immersive media provided by an embodiment of the present application.
- the method can be jointly executed by a decoding device and an encoding device in an immersive media system (for example, a panoramic video system or a volumetric video system).
- the decoding device can be the decoding device 100B in the embodiment corresponding to FIG. 1 above, or the decoding device 200B in the embodiment corresponding to FIG. 3 above.
- the encoding device may be the encoding device 100A in the above-mentioned embodiment corresponding to FIG. 1 , or may be the encoding device 200A in the above-mentioned embodiment corresponding to FIG. 3 .
- the decoding device may be a user terminal integrated with a video client, the encoding device may be a server, and the method may at least include the following steps:
- Step S301 the video client initiates a playback request to the server
- for the implementation of this step, refer to step S201 in the above embodiment corresponding to FIG. 8 , which will not be repeated here.
- Step S302 the server obtains the first immersive media content of the immersive video based on the playback request
- the server may obtain the immersive media content matching the target content identifier from the immersive video as the first immersive media content based on the target content identifier carried in the play request (namely target asset_id).
- the server may obtain the immersive media content matching the object location information from the immersive video as the first immersive media content.
- Step S303 the server returns the first immersive media content to the video client
- Step S304 the video client plays the first immersive media content on the video playback interface
- Step S305 the video client generates an interactive feedback message corresponding to the interactive operation in response to the interactive operation on the first immersive media content
- for the implementation of this step, refer to step S101 in the above embodiment corresponding to FIG. 7 , or step S203 in the above embodiment corresponding to FIG. 8 , which will not be repeated here.
- Step S306 the video client sends the interactive feedback message to the server
- Step S307 the server receives the interactive feedback message sent by the video client
- Step S308 the server determines the business event indicated by the interactive operation based on the business key field in the interactive feedback message, and acquires the second immersive media content for responding to the interactive operation based on the business event indicated by the interactive operation;
- the server may determine the business event indicated by the interaction operation based on the business key field in the interaction feedback message, and then, based on the business event indicated by the interaction operation, acquire the second immersive media content for responding to the interactive operation.
- when the interaction feedback message is expressed in the form of an interaction signaling table, the business key field in the interaction feedback message is the information type field added in the interaction signaling table; when the interaction feedback message does not use the interaction signaling table, the business key field is directly added to the interaction feedback message.
- the finally acquired second immersive media content may also belong to the immersive media content set, or the second immersive media content may belong to an immersive media content set included in another resource group, or the second immersive media content may not belong to any immersive media content set included in a resource group, which is not limited in this embodiment of the present application.
- Step S309 the server returns the second immersive media content to the video client
- Step S310 the video client receives the second immersive media content returned by the server, and plays the second immersive media content on the video playback interface.
- for example, the video client requests the immersive video T from the server; after the server receives the request (for example, a playback request), it can send the immersive media content T1 (that is, the first immersive media content) in the immersive video T to the video client.
- after the video client receives the immersive media content T1, it can play the immersive media content T1 on the corresponding video playback interface, and the business object (for example, user 1) starts to consume the immersive media content T1 and may generate interactive behaviors during the consumption process (that is, perform interactive operations on the immersive media content T1), so that the video client can generate an interactive feedback message corresponding to the interactive operation and send it to the server.
- the server receives the interaction feedback message sent by the video client and, according to the content of the interaction feedback message (for example, the business key field), sends other immersive media content (that is, the second immersive media content, for example, the immersive media content T2) to the video client, so that the business object can experience the new immersive media content.
- for example, when the interactive operation is a zoom operation, the server may, based on the zoom ratio indicated by the zoom operation, select immersive media content with higher precision (for example, immersive media content T2) from the immersive video T and send it to user 1.
- for another example, when the interactive operation is a content switching operation, the server can, based on the content switching operation, select the immersive media content corresponding to the replacement version of the content (for example, immersive media content T3) from the immersive video T and send it to user 1.
- it can be seen that the embodiment of the present application reorganizes and defines interactive feedback messages on the basis of related technologies, and adds two types of feedback information, zooming and switching (or event triggering), to the types of interactive feedback, so as to support richer interactive feedback scenarios and improve the accuracy of the media content acquired by the video client during the interactive feedback process.
- FIG. 10 is a schematic structural diagram of a data processing device for immersive media provided by an embodiment of the present application.
- the data processing device for immersive media can be a computer program (including program code) running on the decoding device; for example, the data processing device for immersive media can be application software in the decoding device. The data processing device for immersive media can be used to perform the corresponding steps in the data processing method for immersive media provided by the embodiment of the present application.
- the data processing device 1 for immersive media may include: a message generating module 11, a message sending module 12, and a content receiving module 13;
- the message generation module 11 is configured to respond to the interactive operation for the first immersive media content, and generate an interactive feedback message corresponding to the interactive operation; the interactive feedback message carries a business key field for describing the business event indicated by the interactive operation;
- the message sending module 12 is configured to send an interactive feedback message, the interactive feedback message is used to determine the business event indicated by the interactive operation, and obtain the second immersive media content for responding to the interactive operation based on the business event;
- the content receiving module 13 is configured to receive the returned second immersive media content.
- for the implementation of the message generation module 11, the message sending module 12, and the content receiving module 13, refer to steps S101 to S103 in the embodiment corresponding to FIG. 7 above, or refer to steps S203 to S205 in the embodiment corresponding to FIG. 8 above, which will not be repeated here.
- the description of the beneficial effect of adopting the same method will not be repeated here.
- in some embodiments, the data processing device 1 for immersive media may further include: a video request module 14;
- the video request module 14 is configured to respond to the video playback operation for the immersive video, generate a playback request corresponding to the video playback operation, and send the playback request;
- the play request is used to request to acquire the first immersive media content of the immersive video; receive the returned first immersive media content, and play the first immersive media content.
- the business key field includes at least one of the first key field, the second key field, the third key field, and the fourth key field;
- the first key field is used to represent, when the interactive operation includes a zoom operation, the zoom ratio when the zoom event indicated by the zoom operation is executed;
- the second key field is used to represent the event label and event state corresponding to the switch event indicated by the switch operation when the interactive operation includes a switch operation;
- the third key field used to represent the first object position information of the business object that watches the first immersive media content belonging to the panoramic video when the interactive operation includes the first position interactive operation;
- the fourth key field is used to represent the first object position information when the interactive operation includes During the second location interaction operation, view the second object location information of the business object belonging to the first immersive media content of the volumetric video.
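- As an illustration only, a client might model these four business key fields as a tagged union; the type and member names in the following C sketch are assumptions made for exposition, while the normative layout is given by the interaction signaling tables of this application.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative model only: names are assumptions, not the wire format. */
typedef enum {
    KEY_ZOOM = 0,          /* first key field: zoom ratio of a zoom event  */
    KEY_SWITCH = 1,        /* second key field: switch-event label/state   */
    KEY_POS_PANORAMIC = 2, /* third key field: position, panoramic video   */
    KEY_POS_VOLUMETRIC = 3 /* fourth key field: position, volumetric video */
} BusinessKeyKind;

typedef struct {
    BusinessKeyKind kind;
    union {
        uint32_t zoom_ratio;                      /* in units of 2^-3      */
        struct { char label[32]; bool triggered; } switch_event;
        struct { float yaw, pitch, roll; } pano_view;         /* 3DoF view */
        struct { float x, y, z, dirx, diry, dirz; } vol_pose; /* 6DoF pose */
    } u;
} BusinessKeyField;
```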
- the message generation module 11 may include: a first determining unit 111, a first adding unit 112, a first generating unit 113, a second determining unit 114, a second adding unit 115, and a second generation unit 116;
- the first determining unit 111 is configured to determine, in response to a trigger operation on the first immersive media content, the first information type field of the business event indicated by the trigger operation, and record the operation timestamp of the trigger operation;
- the first adding unit 112 is configured to add the first information type field and the operation timestamp to the interaction signaling table associated with the first immersive media content, and use the first information type field added in the interaction signaling table as the business key field used to describe the business event indicated by the interactive operation;
- the first generating unit 113 is configured to generate an interactive feedback message corresponding to the trigger operation based on the business key field and the operation timestamp in the interaction signaling table.
- when the trigger operation includes a zoom operation, the business event indicated by the zoom operation is a zoom event, and when the field value of the first information type field corresponding to the zoom operation is the first field value, the field mapped by the first information type field with the first field value is used to represent the zoom ratio used when the zoom event is executed.
- when the trigger operation includes a switch operation, the business event indicated by the switch operation is a switch event, and when the field value of the first information type field corresponding to the switch operation is the second field value, the field mapped by the first information type field with the second field value is used to represent the event label and event state of the switch event.
- when the state value of the event state is the first state value, the event state with the first state value is used to indicate that the switch event is in the event-triggered state; when the state value of the event state is the second state value, the event state with the second state value is used to indicate that the switch event is in the event-ended state; a sketch of populating these two payloads follows.
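- The following C sketch shows how a client might populate the signaling-table payload for these two event types; the field names table_type, timestamp, zoom_ratio, event_label and event_trigger_flag, the type values 2 and 3, and the 2^-3 unit of the zoom ratio follow this application's signaling tables, while the chosen integer widths are assumptions.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

enum { TABLE_TYPE_ZOOM = 2, TABLE_TYPE_SWITCH = 3 };

typedef struct {
    uint8_t  table_type;         /* 2 = zoom event, 3 = switch event      */
    uint64_t timestamp;          /* UTC timestamp of the operation        */
    uint32_t zoom_ratio;         /* valid when table_type == 2, unit 2^-3 */
    char     event_label[32];    /* valid when table_type == 3            */
    uint8_t  event_trigger_flag; /* 1 = event triggered, 0 = event ended  */
} InteractionTable;

/* A 2x magnification is reported as zoom_ratio = 16, since 16 * 2^-3 = 2. */
InteractionTable make_zoom_table(double ratio) {
    InteractionTable t = {0};
    t.table_type = TABLE_TYPE_ZOOM;
    t.timestamp  = (uint64_t)time(NULL);
    t.zoom_ratio = (uint32_t)(ratio * 8.0 + 0.5); /* quantize to 2^-3 units */
    return t;
}

InteractionTable make_switch_table(const char *label, int triggered) {
    InteractionTable t = {0};
    t.table_type = TABLE_TYPE_SWITCH;
    t.timestamp  = (uint64_t)time(NULL);
    strncpy(t.event_label, label, sizeof t.event_label - 1);
    t.event_trigger_flag = triggered ? 1 : 0;
    return t;
}
```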
- the second determining unit 114 is configured to, when object position information of the business object watching the first immersive media content is detected, treat the position interaction operation on the object position information as the interactive operation in response to the first immersive media content, determine the second information type field of the business event indicated by the interactive operation, and record the operation timestamp of the interactive operation;
- the second adding unit 115 is configured to add the second information type field and the operation timestamp to the interaction signaling table associated with the first immersive media content, and use the second information type field added in the interaction signaling table as the business key field used to describe the business event indicated by the interactive operation;
- the second generation unit 116 is configured to generate an interactive feedback message corresponding to the interactive operation based on the business key field and the operation timestamp in the interaction signaling table.
- when the first immersive media content is immersive media content in an immersive video and the immersive video is a panoramic video, the field value of the second information type field corresponding to the object position information is the third field value; the second information type field with the third field value includes a first type of position field, and the first type of position field is used to describe the position change information of the business object watching the first immersive media content belonging to the panoramic video.
- when the first immersive media content is immersive media content in an immersive video and the immersive video is a volumetric video, the field value of the second information type field corresponding to the object position information is the fourth field value; the second information type field with the fourth field value includes a second type of position field, and the second type of position field is used to describe the position change information of the business object watching the first immersive media content belonging to the volumetric video; the two payloads are sketched below.
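- The two kinds of position payloads can be sketched in C as follows; the millimetre unit and 16-bit range of position_x/y/z follow the ClientPosition structure of this application (Table 6), the dirx/diry/dirz names follow V3C_orientation (Table 7), and the integer widths of the direction fields are assumptions.

```c
#include <stdint.h>

typedef struct {
    int16_t position_x; /* displacement from the start position, millimetres */
    int16_t position_y;
    int16_t position_z;
} ClientPosition;

typedef struct {
    int32_t dirx; /* viewing direction in a Cartesian frame whose origin */
    int32_t diry; /* is the business object's current position           */
    int32_t dirz;
} V3COrientation;

/* Panoramic (3DoF) feedback centres on rotation of the viewpoint, while
 * volumetric (3DoF+/6DoF) feedback carries a position plus a direction. */
typedef struct {
    ClientPosition pos;
    V3COrientation dir;
} VolumetricPose;
```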
- the interactive feedback message also includes an extended description field, which may be an extended description field newly added at the system layer of the video client;
- the extended description field includes at least one of a signaling table quantity field, a signaling table identifier field, a signaling table version field, and a signaling table length field;
- the signaling table quantity field is used to represent the total number of interaction signaling tables contained in the interactive feedback message;
- the signaling table identifier field is used to represent the identifier of each interaction signaling table contained in the interactive feedback message;
- the signaling table version field is used to represent the version number of each interaction signaling table;
- the signaling table length field is used to represent the length of each interaction signaling table; a possible in-memory model of these header fields follows.
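- A possible in-memory model of these header fields is sketched below; the field names mirror the extended description fields of this application, while the integer widths and the MAX_TABLES bound are illustrative assumptions.

```c
#include <stdint.h>

#define MAX_TABLES 8 /* illustrative bound; the message carries N1 tables */

typedef struct {
    uint16_t message_id;
    uint8_t  version;
    uint32_t length;            /* bytes from the next field to the end     */
    uint8_t  number_of_tables;  /* number of interaction signaling tables   */
    uint8_t  table_id[MAX_TABLES];
    uint8_t  table_version[MAX_TABLES];
    uint16_t table_length[MAX_TABLES];
    uint8_t  message_source;    /* 0: client to server, 1: server to client */
    uint8_t  asset_group_flag;  /* 1: content belongs to a resource group   */
    uint32_t asset_group_id;    /* meaningful when asset_group_flag == 1    */
    uint32_t asset_id;          /* meaningful when asset_group_flag == 0    */
} InteractionFeedbackMessage;
```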
- the interactive feedback message also includes a resource group attribute field and a resource group identifier field;
- the resource group attribute field is used to represent the affiliation between the first immersive media content and the immersive media content set contained in the target resource group;
- the resource group identifier field is used to represent the identifier of the target resource group.
- when the field value of the resource group attribute field is the first attribute field value, the resource group attribute field with the first attribute field value is used to indicate that the first immersive media content belongs to the immersive media content set; when the field value is the second attribute field value, the resource group attribute field with the second attribute field value is used to indicate that the first immersive media content does not belong to the immersive media content set; a short sketch of filling these fields follows.
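- A short C sketch of filling these two fields, assuming the first and second attribute field values are 1 and 0 as in the asset_group_flag semantics described in this application; the helper itself is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Trimmed copy of the header sketch above; only the fields used here. */
typedef struct {
    uint8_t  asset_group_flag;
    uint32_t asset_group_id;
    uint32_t asset_id;
} FeedbackGroupFields;

void set_resource_group(FeedbackGroupFields *m, bool in_group,
                        uint32_t group_id, uint32_t asset_id)
{
    m->asset_group_flag = in_group ? 1 : 0;
    if (in_group)
        m->asset_group_id = group_id; /* several assets of one group may be
                                         consumed, so identify the group   */
    else
        m->asset_id = asset_id;       /* identify the single consumed asset */
}
```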
- for the implementation of the above fields, refer to step S101 in the embodiment corresponding to FIG. 7 above; details are not repeated here.
- FIG. 11 is a schematic structural diagram of a data processing device for immersive media provided by an embodiment of the present application.
- the data processing device for immersive media can be a computer program (including program code) running on the encoding device; for example, the data processing device for immersive media can be application software in the encoding device; the data processing device for immersive media can be used to perform the corresponding steps in the data processing method for immersive media provided by the embodiments of the present application.
- the data processing device 2 for immersive media may include: a message receiving module 21, a content acquisition module 22, and a content return module 23;
- the message receiving module 21 is configured to receive an interactive feedback message; the interactive feedback message is generated in response to an interactive operation on the first immersive media content; the interactive feedback message carries a business key field used to describe the business event indicated by the interactive operation;
- the content acquisition module 22 is configured to determine the business event indicated by the interactive operation based on the business key field in the interactive feedback message, and acquire the second immersive media content for responding to the interactive operation based on the business event indicated by the interactive operation;
- the content return module 23 is configured to return the second immersive media content.
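- To make the server-side handling concrete, the following C sketch shows the kind of dispatch the content acquisition module performs on each signaling table's type; the select_for_* helpers are hypothetical stand-ins for the server's content-selection logic.

```c
#include <stdint.h>

enum { TABLE_TYPE_POS_PANORAMIC = 0, TABLE_TYPE_POS_VOLUMETRIC = 1,
       TABLE_TYPE_ZOOM = 2, TABLE_TYPE_SWITCH = 3 };

typedef struct {            /* trimmed copy of the earlier table sketch */
    uint8_t  table_type;
    uint32_t zoom_ratio;
    char     event_label[32];
    uint8_t  event_trigger_flag;
} InteractionTable;

/* Hypothetical selection helpers; a real server would consult its media
 * catalogue (e.g. pick a higher-colour-precision asset for a large zoom). */
static const char *select_for_zoom(uint32_t r)     { return r > 8 ? "higher-fidelity asset" : "current asset"; }
static const char *select_for_event(const char *l) { (void)l; return "replacement asset"; }
static const char *select_for_position(void)       { return "viewport-matched asset"; }

/* Dispatch performed by the content acquisition module (sketch). */
const char *handle_table(const InteractionTable *t)
{
    switch (t->table_type) {
    case TABLE_TYPE_ZOOM:
        return select_for_zoom(t->zoom_ratio);
    case TABLE_TYPE_SWITCH:
        return t->event_trigger_flag ? select_for_event(t->event_label)
                                     : 0; /* event ended: keep current */
    default:
        return select_for_position();     /* position feedback types   */
    }
}
```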
- the computer device 1000 may include: a processor 1001, a network interface 1004, and a memory 1005.
- the above computer device 1000 may also include: a user interface 1003, and at least one communication bus 1002.
- the communication bus 1002 is configured to realize connection and communication between these components.
- the user interface 1003 may include a display screen (Display) and a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
- the network interface 1004 may include a standard wired interface and a wireless interface (such as a WI-FI interface).
- the memory 1005 can be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory.
- the memory 1005 may also be at least one storage device located remotely from the aforementioned processor 1001.
- the memory 1005 as a computer-readable storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
- in the computer device 1000 shown in FIG. 12, the network interface 1004 can provide network communication functions, the user interface 1003 is mainly used to provide an input interface for the user, and the processor 1001 can be used to call the device control application program stored in the memory 1005 to execute the description of the data processing method for immersive media in any one of the embodiments corresponding to FIG. 7, FIG. 8, and FIG. 9 above, the description of the data processing device 1 for immersive media in the embodiment corresponding to FIG. 10 above, or the description of the data processing device 2 for immersive media in the embodiment corresponding to FIG. 11 above, which will not be repeated here.
- the description of the beneficial effects of adopting the same method is not repeated here.
- the embodiment of the present application also provides a computer-readable storage medium, and the computer-readable storage medium stores the computer program executed by the aforementioned data processing device 1 for immersive media or data processing device 2 for immersive media, and the computer program includes program instructions.
- when the processor executes the program instructions, it can execute the description of the data processing method for immersive media in any one of the embodiments corresponding to FIG. 7, FIG. 8, and FIG. 9 above; details are therefore not repeated here.
- the description of the beneficial effects of adopting the same method is not repeated here.
- the above-mentioned computer-readable storage medium may be an internal storage unit of the data processing device for immersive media provided in any of the foregoing embodiments or of the above-mentioned computer device, such as a hard disk or memory of the computer device.
- the computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device.
- the computer-readable storage medium may also include both an internal storage unit of the computer device and an external storage device.
- the computer-readable storage medium is used to store the computer program and other programs and data required by the computer device.
- the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
- the embodiment of the present application also provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
- the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the method provided by any one of the embodiments corresponding to FIG. 7 , FIG. 8 , or FIG. 9 above.
- the description of the beneficial effects of adopting the same method is not repeated here.
- FIG. 13 is a schematic structural diagram of a data processing system provided by an embodiment of the present application.
- the data processing system 3 may comprise a data processing device 1a and a data processing device 2a.
- the data processing device 1a may be the data processing device 1 for immersive media in the above-mentioned embodiment corresponding to FIG. 10; it can be understood that the data processing device 1a can be integrated in the decoding device 100B in the above embodiment corresponding to FIG. 1 or in the decoding device 200B in the above embodiment corresponding to FIG. 3; therefore, details will not be described here.
- the data processing device 2a may be the data processing device 2 for immersive media in the above-mentioned embodiment corresponding to FIG. 11; it can be understood that the data processing device 2a can be integrated in the encoding device 100A in the above embodiment corresponding to FIG. 1 or in the encoding device 200A in the above embodiment corresponding to FIG. 3; therefore, details will not be described here.
- the description of the beneficial effects of adopting the same method is not repeated here.
- for technical details not disclosed in the embodiments of the data processing system involved in this application, please refer to the description of the method embodiments of this application.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Processing Or Creating Images (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
Abstract
The present application discloses a data processing method, apparatus and device for immersive media, a storage medium, and a program product. The method includes: in response to an interactive operation on first immersive media content, generating an interactive feedback message corresponding to the interactive operation, the interactive feedback message carrying a business key field used to describe the business event indicated by the interactive operation; sending the interactive feedback message to a server, so that the server determines, based on the business key field in the interactive feedback message, the business event indicated by the interactive operation and obtains, based on that business event, second immersive media content for responding to the interactive operation; and receiving the second immersive media content returned by the server.
Description
相关申请的交叉引用
本申请基于申请号为202111149860.8、申请日为2021年09月29日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
本申请涉及计算机技术领域,尤其涉及一种沉浸媒体的数据处理方法、装置、设备、存储介质及程序产品。
沉浸媒体(也可称为沉浸式媒体)是指能为业务对象(例如,用户)带来沉浸式体验的媒体内容,沉浸媒体按照业务对象(例如,用户)在消费媒体内容时的自由度(Degree of Freedom,DoF),可以分为3DoF媒体、3DoF+媒体以及6DoF媒体。
在沉浸媒体消费的过程中,视频客户端与服务器之间可以通过发送交互反馈消息(Interaction Feedback Message)来进行会话,例如,视频客户端可以向服务器反馈用于描述用户位置信息(例如,用户位置)的交互反馈消息,以便视频客户端能够接收到服务器基于该用户位置信息所返回的媒体内容。
相关技术中,在沉浸媒体消费的过程中,仅存在用户位置信息这一交互反馈消息,以至于在视频客户端与服务器之间进行会话时存在反馈的信息类型较为单一的现象,从而降低了视频客户端在交互反馈过程中获取媒体内容的准确度。
发明内容
本申请实施例提供了一种沉浸媒体的数据处理方法、装置、计算机设备、计算机可读存储介质及计算机程序产品,可以丰富交互反馈的信息类型,且提升视频客户端在交互反馈过程中获取媒体内容的准确度。
本申请实施例提供了一种沉浸媒体的数据处理方法,包括:
响应针对第一沉浸媒体内容的交互操作,生成交互操作对应的交互反馈消息;交互反馈消息中携带业务关键字段,所述业务关键字段,用于描述交互操作所指示的业务事件;
发送交互反馈消息,所述交互反馈消息,用于确定交互操作所指示的业务事件,基于交互操作所指示的业务事件,获取用于响应交互操作的第二沉浸媒体内容;
接收返回的第二沉浸媒体内容。
本申请实施例提供了一种沉浸媒体的数据处理方法,包括:
接收交互反馈消息;交互反馈消息在响应针对第一沉浸媒体内容的交互操作时所生成;交互反馈消息中携带业务关键字段,所述业务关键字段,用于描述交互操作所指示的业务事件;
基于交互反馈消息中的业务关键字段,确定交互操作所指示的业务事件,基于交互操作所指示的业务事件,获取用于响应交互操作的第二沉浸媒体内容;
返回第二沉浸媒体内容。
本申请实施例提供了一种沉浸媒体的数据处理装置,包括:
消息生成模块,配置为响应针对第一沉浸媒体内容的交互操作,生成交互操作对应的交互反馈消息;交互反馈消息中携带业务关键字段,所述业务关键字段,用于描述交互操作所指示的业务事件;
消息发送模块,配置为发送交互反馈消息,所述交互反馈消息,用于确定交互操作所指示的业务事件,基于交互操作所指示的业务事件,获取用于响应交互操作的第二沉浸媒体内容;
内容接收模块,配置为接收返回的第二沉浸媒体内容。
本申请实施例提供了一种沉浸媒体的数据处理装置,包括:
消息接收模块,配置为接收交互反馈消息;交互反馈消息在响应针对第一沉浸媒体内容的交互操作时所生成;交互反馈消息中携带业务关键字段,所述业务关键字段,用于描述交互操作所指示的业务事件;
内容获取模块,配置为基于交互反馈消息中的业务关键字段,确定交互操作所指示的业务事件,基于交互操作所指示的业务事件,获取用于响应交互操作的第二沉浸媒体内容;
内容返回模块,配置为返回第二沉浸媒体内容。
本申请实施例提供了一种计算机设备,包括:处理器和存储器;
处理器与存储器相连,其中,存储器配置为存储计算机程序,计算机程序被处理器执行时,使得该计算机设备执行本申请实施例提供的沉浸媒体的数据处理方法。
本申请实施例提供了一种计算机可读存储介质,计算机可读存储介质存储有计算机程序,该计算机程序适于由处理器加载并执行,以使得具有该处理器的计算机设备执行本申请实施例提供的沉浸媒体的数据处理方法。
本申请实施例提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行本申请实施例提供的沉浸媒体的数据处理方法。
在本申请实施例中,通过响应针对第一沉浸媒体内容的交互操作,生成交互操作对应的交互反馈消息并发送,由于交互反馈消息中携带用于描述该交互操作所指示的业务事件的业务关键字段,因此,能够基于交互反馈消息中的业务关键字段,确定交互操作所指示的业务事件,并可以基于交互操作所指示的业务事件获取用于响应交互操作的第二沉浸媒体内容,由于交互操作所指示的业务事件可以对应不同类型,这里的交互操作不仅可以包含与用户位置相关的操作(例如,用户位置发生变动),还可以包含针对当前所播放的沉浸媒体内容的其他操作(例如,缩放操作),因此,通过交互反馈消息中所携带的业务关键字段,可以反馈多种类型的业务事件,这样,可以基于这些不同类型的业务事件来确定响应于该交互操作的沉浸媒体内容,而非只能依赖于用户位置信息,从而可以丰富交互反馈的信息类型,且可以提升视频客户端在交互反馈过程中获取媒体内容的准确度。
图1是本申请实施例提供的一种全景视频系统的架构图;
图2是本申请实施例提供的3DoF的示意图;
图3是本申请实施例提供的一种容积视频系统的架构图;
图4是本申请实施例提供的6DoF的示意图;
图5是本申请实施例提供的3DoF+的示意图;
图6是本申请实施例提供的沉浸媒体的数据处理系统300的架构示意图;
图7是本申请实施例提供的沉浸媒体的数据处理方法的流程示意图;
图8是本申请实施例提供的沉浸媒体的数据处理方法的流程示意图;
图9是本申请实施例提供的沉浸媒体的数据处理方法的交互示意图;
图10是本申请实施例提供的沉浸媒体的数据处理装置的结构示意图;
图11是本申请实施例提供的沉浸媒体的数据处理装置的结构示意图;
图12是本申请实施例提供的一种计算机设备的结构示意图;
图13是本申请实施例提供的一种数据处理系统的结构示意图。
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请实施例涉及沉浸媒体的数据处理技术。所谓沉浸媒体(也可称为沉浸式媒体)是指能够提供沉浸式的媒体内容,使沉浸于该媒体内容中的业务对象能够获得现实世界中视觉、听觉等感官体验的媒体文件。沉浸媒体按照业务对象在消费媒体内容时的自由度,可以分为3DoF媒体、3DoF+媒体以及6DoF媒体。其中常见的6DoF媒体包括多视角视频以及点云媒体。沉浸式的媒体内容包括以各种形式在三维(3-Dimension,3D)空间中表示的视频内容,例如以球面形式表示的三维视频内容。例如,沉浸式的媒体内容可以是虚拟现实(Virtual Reality,VR)视频内容、全景视频内容、球面视频内容、360度视频内容或容积视频内容。另外,沉浸式的媒体内容还包括与三维空间中表示的视频内容相同步的音频内容。
全景视频/图像是指采用多个摄像机对场景进行拍摄、拼接以及映射后,可根据业务对象的观看朝向或视窗提供部分媒体画面,最多提供360度图像范围的球面视频或图像。全景视频/图像是一种典型的提供三自由度(即3DoF)体验的沉浸式媒体。
V3C容积媒体(visual volumetric video-based coding media)是指捕获自三维空间视觉内容并提供3DoF+、6DoF观看体验的,以传统视频编码的,在文件封装中包含容积视频类型轨道的沉浸式媒体,例如可以包括多视角视频、视频编码点云等。
其中,多视角视频也可称为多视点视频,是指采用多组摄像机阵列,从多个角度对场景进行拍摄,带有场景的纹理信息(色彩信息等)和深度信息(空间距离信息等)的视频。多视角/多视点视频也叫自由视角/自由视点视频,是一种提供六自由度(即6DoF)体验的沉浸式媒体。
其中,点云是空间中一组无规则分布的、表达三维物体或场景的空间结构及表面属性的离散点集。点云中的每个点至少具有三维位置信息,根据应用场景的不同,还可能具有色彩、材质或其他信息。通常,点云中的每个点都具有相同数量的附加属性。点云可以灵活方便地表达三维物体或场景的空间结构及表面属性,因而应用广泛,包括虚拟现实游戏、计算机辅助设计(Computer Aided Design,CAD)、地理信息系统(Geography Information System,GIS)、自动导航系统(Autonomous Navigation System,ANS)、数字文化遗产、自由视点广播、三维沉浸远程呈现、生物组织器官三维重建等。其中,点云的获取主要有以下途径:计算机生成、3D激光扫描、3D摄 影测量等。
请参见图1,图1是本申请实施例提供的一种全景视频系统的架构图。如图1所示,该全景视频系统可以包括编码设备(例如,编码设备100A)和解码设备(例如,解码设备100B),编码设备可以是指全景视频的提供者所使用的计算机设备,该计算机设备可以是终端(如个人计算机(Personal Computer,PC)、智能移动设备(如智能手机)等)或服务器。解码设备可以是指全景视频的使用者所使用的计算机设备,该计算机设备可以是终端(如PC、智能移动设备(如智能手机)、VR设备(如VR头盔、VR眼镜等))。全景视频的数据处理过程包括在编码设备侧的数据处理过程及在解码设备侧的数据处理过程。
在编码设备侧的数据处理过程主要包括:(1)全景视频的媒体内容的获取与制作过程;(2)全景视频的编码及文件封装的过程。在解码设备侧的数据处理过程主要包括:(1)全景视频的文件解封装及解码的过程;(2)全景视频的渲染过程。另外,编码设备与解码设备之间涉及全景视频的传输过程,该传输过程可以基于各种传输协议来进行,此处的传输协议可包括但不限于:动态自适应流媒体传输(Dynamic Adaptive Streaming over HTTP,DASH)协议、动态码率自适应传输(HTTP Live Streaming,HLS)协议、智能媒体传输协议(Smart Media Transport Protocol,SMTP)、传输控制协议(Transmission Control Protocol,TCP)等。
下面将结合图1,分别对全景视频的数据处理过程中涉及的各个过程进行详细介绍。
在编码设备侧的数据处理过程:
(1)全景视频的媒体内容的获取与制作过程。
1)全景视频的媒体内容的获取过程。
全景视频的媒体内容是通过捕获设备采集现实世界的声音-视觉场景获得的。在一些实施例中,捕获设备可以是指设于编码设备中的硬件组件,例如捕获设备是指终端的麦克风、摄像头、传感器等。另一些实施例中,该捕获设备也可以是与编码设备相连接的硬件装置,例如与服务器相连接的摄像头,用于为编码设备提供全景视频的媒体内容的获取服务。该捕获设备可以包括但不限于:音频设备、摄像设备及传感设备。其中,音频设备可以包括音频传感器、麦克风等。摄像设备可以包括普通摄像头、立体摄像头、光场摄像头等。传感设备可以包括激光设备、雷达设备等。捕获设备的数量可以为多个,这些捕获设备被部署在现实空间中的一些特定位置以同时捕获该空间内不同角度的音频内容和视频内容,捕获的音频内容和视频内容在时间和空间上均保持同步。本申请实施例可以将由部署在特定位置的捕获设备所采集到的用于提供三自由度观看体验的3维空间的媒体内容称作全景视频。
例如,如图1所示,真实世界的声音-视觉场景10A可以由编码设备100A中的多个音频传感器以及一组摄像机阵列捕获,或者,可以由与编码设备100A相连接的具有多个摄像头和传感器的摄像设备捕获。采集结果可以为一组数字图像/视频信号10B
i(即视频内容)以及数字音频信号10B
a(即音频内容)。这里的摄像机或摄像头,通常会覆盖摄像机阵列或摄像设备中心点周围的所有方向,因此,全景视频也可称为360度视频。
2)全景视频的媒体内容的制作过程。
应当理解,本申请实施例所涉及的全景视频的媒体内容的制作过程可以理解为全景视频的内容制作的过程。捕获到的音频内容本身就是适合被执行全景视频的音频编码的内容。捕获到的视频内容进行一系列制作流程后才可成为适合被执行全景视频的视频编码的内容,该制作流程可以包括:
①拼接。由于捕获到的视频内容是捕获设备在不同角度下拍摄得到的,拼接就是指将这些各个角度拍摄的视频内容拼接成一个完整的、能够反映现实空间360度视觉全景的视频,即拼接后的视频是一个在三维空间表示的球面视频。或者,对捕获到的多个图像进行拼接,得到一个在三维空间表示的球面图像。
②旋转。是该制作流程中一个处理操作,上述拼接得到的球面视频中的每个视频帧均为基于全局坐标轴单位球面上的球面图像,旋转就是指将单位球面在全局坐标轴上进行旋转。通过旋转的角度来表示本地坐标轴到全局坐标轴转换所需要的旋转角度。其中,单位球面的本地坐标轴是经过旋转后的坐标系统的轴。应当理解,如果本地坐标轴和全局坐标轴相同,则不需要进行旋转。
③投影。投影就是指将拼接形成的一个三维视频(或将旋转后的一个三维视频)映射到一个二维(2-Dimension,2D)图像上的过程,投影形成的2D图像称为投影图像;投影的方式可包括但不限于:经纬图投影、正六面体投影。
④区域封装。投影图像可以被直接进行编码,也可以对投影图像进行区域封装之后再进行编码。实践中发现,在沉浸媒体的数据处理过程中,对于二维投影图像进行区域封装之后再进行编码能够大幅提升沉浸媒体的视频编码效率,因此区域封装技术被广泛应用到沉浸媒体的视频处理过程中。所谓区域封装是指将投影图像按区域执行转换处理的过程,区域封装过程使投影图像被转换为封装图像。区域封装的过程包括:将投影图像划分为多个映射区域,然后再对多个映射区域分别进行转换处理得到多个封装区域,将多个封装区域映射到一个2D图像中得到封装图像。其中,映射区域是指执行区域封装前在投影图像中经划分得到的区域;封装区域是指执行区域封装后位于封装图像中的区域。转换处理可以包括但不限于:镜像、旋转、重新排列、上采样、下采样、改变区域的分辨率及移动等处理。
例如,如图1所示,编码设备100A可以对数字图像/视频信号10B
i中属于同一时间实例的图像进行拼接、(可能)旋转、投影并映射到封装图像10D上。
需要说明的是,通过上述获取与制作过程得到的全景视频,再通过编码设备处理并传输至解码设备进行相应的数据处理后,解码设备侧的业务对象只能通过执行一些特定动作(如头部旋转)来观看360度的视频信息,也就是说,全景视频是一种提供三自由度的沉浸式媒体。请一并参见图2,图2是本申请实施例提供的3DoF的示意图。如图2所示,3DoF是指业务对象在一个三维空间的中心点固定,业务对象头部沿着X轴、Y轴和Z轴旋转来观看媒体内容提供的画面。在本申请实施例中,可以将进行沉浸式媒体(例如全景视频、容积视频)消费的用户统称为业务对象。
(2)全景视频的编码及文件封装的过程。
捕获到的音频内容可直接进行音频编码形成全景视频的音频码流。经过上述制作流程①-④(可能不包括②)之后,对封装图像进行视频编码,得到全景视频的视频码流。将音频码流和视频码流按照全景视频的文件格式(如基于ISO标准的媒体文件格式(ISO Based Media File Format,ISOBMFF))封装在文件容器中形成全景视频的媒体文件资源,该媒体文件资源可以是媒体文件或媒体片段形成的全景视频的媒体文件;并按照全景视频的文件格式要求采用媒体呈现描述信息(Media presentation description,MPD)记录该全景视频的媒体文件资源的元数据,此处的元数据是对与全景视频的呈现有关的信息的总称,该元数据可包括对媒体内容的描述信息、对视窗的描述信息以及对媒体内容呈现相关的信令信息等等。如图1所示,编码设备会存储经过数据处理过程之后形成的媒体呈现描述信息和媒体文件资源。
例如,如图1所示,编码设备100A可以对捕获到的数字音频信号10B
a进行音频编码,得到音频码流10E
a,同时,可以对封装图像10D进行视频编码,得到视频码 流10E
v,或者,可以对封装图像10D进行图像编码,得到编码图像10E
i。随后,编码设备100A可以根据特定的媒体文件格式(如ISOBMFF),将编码后得到的编码图像10E
i、视频码流10E
v和/或音频码流10E
a组合成用于文件回放的一个媒体文件10F或者组合成一个用于流式传输的包含一个初始化片段和多个媒体片段的片段序列10F
s。其中,媒体文件10F和片段序列10F
s均属于全景视频的媒体文件资源。此外,编码设备100A中的文件封装器也可以将元数据添加到媒体文件10F或片段序列10F
s中,例如,这里的元数据可以包括投影信息和区域封装信息,这些元数据将有助于后续解码设备渲染解码后得到的封装图像。随后,编码设备100A可以采用特定的传输机制(如DASH、SMTP)将片段序列10F
s传输到解码设备100B,同时将媒体文件10F也传输到解码设备100B。其中,解码设备100B可以为一个全景媒体应用格式(Omnidirectional Media Application Format,OMAF)播放器。
在解码设备侧的数据处理过程:
(3)全景视频的文件解封装及解码的过程。
解码设备可以通过编码设备的推荐或按照解码设备端的业务对象需求自适应动态从编码设备获得全景视频的媒体文件资源和相应的媒体呈现描述信息,例如解码设备可根据业务对象的头部/眼睛的跟踪信息确定业务对象的朝向和位置,再基于确定的朝向和位置动态向编码设备请求获得相应的媒体文件资源。媒体文件资源和媒体呈现描述信息通过传输机制(如DASH、智能媒体传输(Smart Media Transport,SMT))由编码设备传输给解码设备。解码设备侧的文件解封装的过程与编码设备侧的文件封装过程是相逆的,解码设备按照全景视频的文件格式(例如,ISOBMFF)要求对媒体文件资源进行解封装,得到音频码流和视频码流。解码设备侧的解码过程与编码设备侧的编码过程是相逆的,解码设备对音频码流进行音频解码,还原出音频内容;解码设备对视频码流进行视频解码,还原出视频内容。
例如,如图1所示,编码设备100A中的文件封装器输出的媒体文件10F与解码设备100B中输入文件解封装器的媒体文件10F'是相同的。文件解封装器对媒体文件10F'或接收到的片段序列10F'
s进行文件解封装处理,并提取出编码后的码流,包括音频码流10E'
a、视频码流10E'
v、编码图像10E'
i,同时解析相应的元数据。其中,视窗相关视频可能会在多个轨道中承载,在进行解码处理之前,这些轨道可以在流重写中合并为单个视频码流10E'
v。随后,解码设备100B可以对音频码流10E'
a进行音频解码,得到音频信号10B'
a(即还原出的音频内容);对视频码流10E'
v进行视频解码,或者,对编码图像10E'
i进行图像解码,得到图像/视频信号10D'(即还原出的视频内容)。
(4)全景视频的渲染过程。
解码设备根据媒体呈现描述信息中与渲染相关的元数据对音频解码得到的音频内容及视频解码得到的视频内容进行渲染,渲染完成即实现了对该图像的播放输出。特别地,由于全景视频采用3DoF的制作技术,因此解码设备主要基于当前视点、视差、深度信息等对图像进行渲染。其中,视点指业务对象的观看位置点,视差是指业务对象的双目产生的视线差或由于运动产生的视线差。
全景视频系统支持数据盒(Box),数据盒是指包括元数据的数据块或对象,即数据盒中包含了相应媒体内容的元数据。全景视频可以包括多个数据盒,例如包括球面区域缩放数据盒(Sphere Region Zooming Box),其包含用于描述球面区域缩放信息的元数据;2D区域缩放数据盒(2D Region Zooming Box),其包含用于描述2D区域缩放信息的元数据;区域封装数据盒(Region Wise Packing Box),其包含用于描述区域封装过程中的相应信息的元数据;等等。
例如,如图1所示,解码设备100B可以基于当前的观看方向或视窗(即观看区域)以及投影、球形覆盖、旋转,以及从媒体文件10F'或片段序列10F'
s解析得到的区域封装元数据,将解码得到的封装图像10D'(即图像/视频信号10D')投影到头戴式显示器或任何其他显示设备的屏幕上。类似的,根据当前的观看方向对音频信号10B'
a进行渲染(例如,通过耳机或扬声器)。其中,当前的观看方向由头部跟踪,可能还有眼睛跟踪来确定。此外,除了被渲染器用来渲染解码后的视频信号和音频信号的适当部分之外,当前的观看方向也可被视频解码器和音频解码器用于解码优化。在视窗相关的传输中,当前的观看方向也会被传递给解码设备100B中的策略模块,该策略模块可以根据当前的观看方向确定要接收的视频轨道。
请一并参见图3,图3是本申请实施例提供的一种容积视频系统的架构图。如图3所示,该容积视频系统包括编码设备(例如,编码设备200A)和解码设备(例如,解码设备200B),编码设备可以是指容积视频的提供者所使用的计算机设备,该计算机设备可以是终端(如PC、智能移动设备(如智能手机)等)或服务器。解码设备可以是指容积视频的使用者所使用的计算机设备,该计算机设备可以是终端(如PC、智能移动设备(如智能手机)、VR设备(如VR头盔、VR眼镜等))。容积视频的数据处理过程包括在编码设备侧的数据处理过程及在解码设备侧的数据处理过程。
在编码设备侧的数据处理过程主要包括:(1)容积视频的媒体内容的获取与制作过程;(2)容积视频的编码及文件封装的过程。在解码设备侧的数据处理过程主要包括:(1)容积视频的文件解封装及解码的过程;(2)容积视频的渲染过程。另外,编码设备与解码设备之间涉及容积视频的传输过程,该传输过程可以基于各种传输协议来进行,此处的传输协议可包括但不限于:DASH协议、HLS协议、SMTP协议、TCP协议等。
下面将结合图3,分别对容积视频的数据处理过程中涉及的各个过程进行详细介绍。
一、在编码设备侧的数据处理过程:
(1)容积视频的媒体内容的获取与制作过程。
1)容积视频的媒体内容的获取过程。
容积视频的媒体内容是通过捕获设备采集现实世界的声音-视觉场景获得的。在一些实施例中,捕获设备可以是指设于编码设备中的硬件组件,例如捕获设备是指终端的麦克风、摄像头、传感器等。另一种实现中,该捕获设备也可以是与编码设备相连接的硬件装置,例如与服务器相连接的摄像头,用于为编码设备提供容积视频的媒体内容的获取服务。该捕获设备可以包括但不限于:音频设备、摄像设备及传感设备。其中,音频设备可以包括音频传感器、麦克风等。摄像设备可以包括普通摄像头、立体摄像头、光场摄像头等。传感设备可以包括激光设备、雷达设备等。捕获设备的数量可以为多个,这些捕获设备被部署在现实空间中的一些特定位置以同时捕获该空间内不同角度的音频内容和视频内容,捕获的音频内容和视频内容在时间和空间上均保持同步。本申请实施例可以将由部署在特定位置的捕获设备所采集到的用于提供多自由度(如3DoF+、6DoF)观看体验的3维空间的媒体内容称作容积视频。
例如,以获取容积视频的视频内容为例进行说明,如图3所示,视觉场景20A(包括真实世界的视觉场景或合成的视觉场景)可以由编码设备200A相连接的一组摄像机阵列捕获,或者,可以由与编码设备200A相连接的具有多个摄像头和传感器的摄像设备捕获,或者,还可以由多个虚拟摄像机捕获。采集结果可以为源容积数据20B(即容积视频的视频内容)。
2)容积视频的媒体内容的制作过程。
应当理解,本申请实施例所涉及的容积视频的媒体内容的制作过程可以理解为容积视频的内容制作的过程,且这里的容积视频的内容制作主要由部署在多个位置的摄像机或摄像机阵列拍摄得到的多视点视频、点云数据、光场等形式的内容制作而成,比如,编码设备可以将容积视频从三维的表示转换成二维的表示。这里的容积视频可以包含几何信息、属性信息、占位图信息以及图集数据等,容积视频在编码前一般需要进行特定处理,例如点云数据在编码前需要切割、映射等过程。例如,多视点视频在编码前一般需要将多视点视频的不同视点进行分组,以在每个分组内进行主视点与辅助视点的区分。
示例性地,①将采集输入的容积视频的三维表示数据(即上述点云数据)投影到二维平面,通常采用正交投影、透视投影、ERP投影(Equi-Rectangular Projection,等距柱状投影)方式,投影到二维平面的容积视频通过几何组件、占位组件和属性组件的数据表示,其中,几何组件的数据提供容积视频每个点在三维空间中的位置信息,属性组件的数据提供容积视频每个点的额外属性(如纹理或材质信息),占位组件的数据指示其他组件中的数据是否与容积视频关联;
②对容积视频的二维表示的组件数据进行处理生成图块,根据几何组件数据中表示的容积视频的位置,将容积视频的二维表示所在的二维平面区域分割成多个不同大小的矩形区域,一个矩形区域为一个图块,图块包含将该矩形区域反投影到三维空间的必要信息;
③打包图块生成图集,将图块放入一个二维网格中,并保证各个图块中的有效部分是没有重叠的。一个容积视频生成的图块可以打包成一个或多个图集;
④基于图集数据生成对应的几何数据、属性数据和占位数据,将图集数据、几何数据、属性数据、占位数据组合形成容积视频在二维平面的最终表示。
其中,需要注意的是,在容积视频的内容制作过程中,几何组件为必选,占位组件为条件必选,属性组件为可选。
此外,需要说明的是,由于采用捕获设备可以捕获到全景视频,这样的视频经编码设备处理并传输至解码设备进行相应的数据处理后,解码设备侧的业务对象需要通过执行一些特定动作(如头部旋转)来观看360度的视频信息,而执行非特定动作(如移动头部)并不能获得相应的视频变化,VR体验不佳,因此需要额外提供与全景视频相匹配的深度信息,来使业务对象获得更优的沉浸度和更佳的VR体验,这就涉及6DoF制作技术。当业务对象可以在模拟的场景中较自由地移动时,称为6DoF。采用6DoF制作技术进行容积视频的视频内容的制作时,捕获设备一般会选用光场摄像头、激光设备、雷达设备等,捕获空间中的点云数据或光场数据。请一并参见图4,图4是本申请实施例提供的6DoF的示意图。如图4所示,6DoF分为窗口6DoF、全方向6DoF和6DoF,其中,窗口6DoF是指业务对象在X轴、Y轴的旋转移动受限,以及在Z轴的平移受限,例如,业务对象不能够看到窗户框架外的景象,以及业务对象无法穿过窗户。全方向6DoF是指业务对象在X轴、Y轴和Z轴的旋转移动受限,例如,业务对象在受限的移动区域中不能自由地穿过三维的360度VR内容。6DoF是指业务对象在3DoF的基础上,可以沿着X轴、Y轴、Z轴自由平移,例如,业务对象可以在三维的360度VR内容中自由地走动。与6DoF相类似的,还有3DoF和3DoF+制作技术。图5是本申请实施例提供的3DoF+的示意图。如图5所示,3DoF+是指当沉浸媒体提供的虚拟场景具有一定的深度信息,业务对象头部可以基于3DoF在一个有限的空间内移动来观看媒体内容提供的画面。其中,3DoF的示意图可以参见上述图2,这里不再进行赘述。
(2)容积视频的编码及文件封装的过程。
捕获到的音频内容可直接进行音频编码形成容积视频的音频码流。捕获到的视频内容可进行视频编码,得到容积视频的视频码流。此处需要说明的是,如果采用6DoF制作技术,在视频编码过程中需要采用特定的编码方式(如基于传统视频编码的点云压缩方式)进行编码。将音频码流和视频码流按照容积视频的文件格式(如ISOBMFF)封装在文件容器中形成容积视频的媒体文件资源,该媒体文件资源可以是媒体文件或媒体片段形成的容积视频的媒体文件;并按照容积视频的文件格式要求采用媒体呈现描述信息(即MPD)记录该容积视频的媒体文件资源的元数据,此处的元数据是对与容积视频的呈现有关的信息的总称,该元数据可包括对媒体内容的描述信息、对构建得到的每个视点组与观看媒体内容的空间位置信息之间的映射关系进行描述的定时元数据信息、对视窗的描述信息以及对媒体内容呈现相关的信令信息等等。如图1所示,编码设备会存储经过数据处理过程之后形成的媒体呈现描述信息和媒体文件资源。
示例性地,采集的音频会被编码成相应的音频码流,容积视频的几何信息、属性信息以及占位图信息可以采用传统的视频编码方式,而容积视频的图集数据可以采用熵编码方式。然后,按一定格式(如ISOBMFF、HNSS)将编码的媒体封装在文件容器中并结合描述媒体内容属性的元数据和视窗元数据,根据一个特定的媒体文件格式组成一个媒体文件或者组成一个初始化片段和媒体片段。
例如,如图3所示,编码设备200A对源容积视频数据20B中的一个或多个容积视频帧进行容积视频编码,得到编码后的VC3码流20E
v(即视频码流),包括一个图集码流(即对图集数据进行编码后得到的码流),至多一个占用码流(即对占位图信息进行编码后得到的码流),一个几何码流(即对几何信息进行编码后得到的码流),以及零个或更多个属性码流(即对属性信息进行编码后得到的码流)。随后,编码设备200A可以根据特定的媒体文件格式(如ISOBMFF),将一个或多个编码后的码流封装成一个用于本地回放的媒体文件20F或者封装成一个用于流式传输的包含一个初始化片段和多个媒体片段的片段序列20F
s。此外,编码设备200A中的文件封装器也可以将元数据添加到媒体文件20F或片段序列20F
s中。编码设备200A可以采用某种传输机制(如DASH、SMTP)将片段序列20F
s传输到解码设备200B,同时将媒体文件20F也传输到解码设备200B。这里的解码设备200B可以为一个播放器。
二、在解码设备侧的数据处理过程:
(3)容积视频的文件解封装及解码的过程。
解码设备可以通过编码设备的推荐或按照解码设备侧的业务对象需求自适应动态从编码设备获得容积视频的媒体文件资源和相应的媒体呈现描述信息,例如解码设备可根据业务对象的头部/眼睛的跟踪信息确定业务对象的朝向和位置,再基于确定的朝向和位置动态向编码设备请求获得相应的媒体文件资源。媒体文件资源和媒体呈现描述信息通过传输机制(如DASH、SMT)由编码设备传输给解码设备。解码设备侧的文件解封装的过程与编码设备侧的文件封装过程是相逆的,解码设备按照容积视频的文件格式(例如,ISOBMFF)要求对媒体文件资源进行解封装,得到音频码流和视频码流。解码设备侧的解码过程与编码设备侧的编码过程是相逆的,解码设备对音频码流进行音频解码,还原出音频内容;解码设备对视频码流进行视频解码,还原出视频内容。
例如,如图3所示,编码设备200A中的文件封装器输出的媒体文件20F与解码设备200B中输入文件解封装器的媒体文件20F'是相同的。文件解封装器对媒体文件20F'或接收到的片段序列20F'
s进行文件解封装处理,并提取出编码后的VC3码流20E'
v,同时解析相应的元数据,随后可以对VC3码流20E'
v进行容积视频解码,得到 解码后的视频信号20D'(即还原出的视频内容)。
(4)容积视频的渲染过程。
解码设备根据媒体文件资源对应的媒体呈现描述信息中与渲染相关的元数据,对音频解码得到的音频内容及视频解码得到的视频内容进行渲染,渲染完成即实现了对该图像的播放输出。
容积视频系统支持数据盒(Box),数据盒是指包括元数据的数据块或对象,即数据盒中包含了相应媒体内容的元数据。容积视频可以包括多个数据盒,例如包括文件封装数据盒(ISO Base Media File Format Box,ISOBMFF Box),其包含用于描述文件封装时的相应信息的元数据。
例如,如图3所示,解码设备200B可以基于当前的观看方向或视窗,对解码后的视频信号20D'进行重构,得到重构后的容积视频数据20B',进而可以对重构后的容积视频数据20B'进行渲染,并显示在头戴式显示器或任何其他显示设备的屏幕上。其中,当前的观看方向由头部跟踪,可能还有眼睛跟踪来确定。在视窗相关的传输中,当前的观看方向也会被传递给解码设备200B中的策略模块,该策略模块可以根据当前的观看方向确定要接收的轨道。
通过上述图1所对应的实施例中描述的过程或者上述图3所对应的实施例中描述的过程,解码设备可以动态地从编码设备侧获取沉浸媒体对应的媒体文件资源,由于媒体文件资源是由编码设备对捕获到的音视频内容进行编码以及封装后所得到的,因此,解码设备接收到编码设备返回的媒体文件资源后,需要先对该媒体文件资源进行解封装,得到相应的音视频码流,随后再对该音视频码流进行解码,最终才能将解码后的音视频内容呈现给业务对象。这里的沉浸媒体包括但不限于全景视频和容积视频,其中,容积视频可以包括多视角视频、基于传统视频编码的点云压缩(Video-based Point Cloud Compression,VPCC)点云媒体、基于几何模型的点云压缩(Geometry-based Point Cloud Compression,GPCC)点云媒体。
应当理解,在业务对象进行沉浸媒体消费时,解码设备与编码设备之间可以不断进行交互反馈,例如,解码设备可以将业务对象状态(例如,对象位置信息)反馈给编码设备,以使编码设备能够根据交互反馈的内容为业务对象提供相应的媒体文件资源。在本申请实施例中,可以将对沉浸媒体的媒体文件资源进行解封装及解码之后所得到的可播放的媒体内容(包括音频内容、视频内容)统称为沉浸媒体内容,对于解码设备而言,解码设备可以在视频播放界面上播放由获取到的媒体文件资源还原出来的沉浸媒体内容。也就是说,一个媒体文件资源可以对应于一个沉浸媒体内容,因此,本申请实施例可以将第一媒体文件资源对应的沉浸媒体内容称为第一沉浸媒体内容,将第二媒体文件资源对应的沉浸媒体内容称为第二沉浸媒体内容,其他媒体文件资源与相应的沉浸媒体内容也可以采用类似的命名。
为了支持更丰富的交互反馈场景,本申请实施例提供了一种沉浸媒体交互反馈消息的指示方法。例如,解码设备(如用户终端)上可以运行有视频客户端,进而可以在该视频客户端的视频播放界面上播放第一沉浸媒体内容。应当理解,这里的第一沉浸媒体内容是由解码设备对第一媒体文件资源进行解封装及解码之后所得到的,而第一媒体文件资源是由编码设备(如服务器)预先对相关音视频内容进行编码及封装后所得到的。在播放第一沉浸媒体内容的过程中,解码设备可以响应针对第一沉浸媒体内容的交互操作,生成交互操作对应的交互反馈消息,其中,这里的交互反馈消息中携带用于描述该交互操作所指示的业务事件的业务关键字段。解码设备可以将交互反馈消息发送至编码设备,以使编码设备可以基于交互反馈消息中的业务关键字段,确定交互操作所指示的业务事件,并可以基于交互操作所指示的业务事件获取用于响应 交互操作的第二媒体文件资源。其中,这里的第二媒体文件资源是由编码设备预先对相关音视频内容进行编码及封装后所得到的。最终,解码设备可以接收编码设备返回的第二媒体文件资源,并对第二媒体文件资源进行解封装及解码,从而得到可播放的第二沉浸媒体内容,随后在其视频播放界面上播放第二沉浸媒体内容。对第二媒体文件资源进行解封装及解码的过程可以参见上述图1所对应的实施例中描述的相关过程或者图3所对应的实施例中描述的相关过程。
在本申请实施例中,上述交互操作不仅可以包含与用户位置相关的操作(例如,用户位置发生变动),还可以包含针对视频客户端当前所播放的沉浸媒体内容的其他操作(例如,缩放操作),因此,通过交互反馈消息中所携带的业务关键字段,解码设备上的视频客户端可以向编码设备反馈多种类型的业务事件,这样,编码设备可以基于这些不同类型的业务事件来确定响应于该交互操作的沉浸媒体内容,而非只能依赖于用户位置信息,从而可以丰富交互反馈的信息类型,且可以提升视频客户端在交互反馈过程中获取媒体内容的准确度。
在一些实施例中,参见图6,图6是本申请实施例提供的一种沉浸媒体的数据处理系统300的架构示意图,为实现支撑一个示例性应用,终端400为设置有视频客户端的解码设备,终端400通过网络500连接服务器600(即编码设备),网络500可以是广域网或者局域网,又或者是二者的组合,使用无线或有线链路实现数据传输。
终端400(视频客户端),用于响应针对第一沉浸媒体内容的交互操作,生成交互操作对应的交互反馈消息;该交互反馈消息中携带业务关键字段,该业务关键字段,用于描述交互操作所指示的业务事件;
发送交互反馈消息至服务器600;
服务器600,用于基于交互反馈消息,确定交互操作所指示的业务事件,基于该业务事件获取用于响应交互操作的第二沉浸媒体内容,并返回第二沉浸媒体内容至终端400;
终端400(视频客户端),还用于接收返回的第二沉浸媒体内容,并播放。
这里,服务器(例如服务器600)可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(CDN,Content Delivery Network)、以及大数据和人工智能平台等基础云计算服务的云服务器。终端(例如终端400)可以是智能手机、平板电脑、笔记本电脑、台式计算机、智能语音交互设备(例如智能音箱)、智能家电(例如智能电视)、智能手表、车载终端等,但并不局限于此。终端以及服务器可以通过有线或无线通信方式进行直接或间接地连接,本申请实施例对此不做限制。
在一些实施例中,终端或服务器可以通过运行计算机程序来实现本申请实施例提供的沉浸媒体的数据处理方法,举例来说,计算机程序可以是操作系统中的原生程序或软件模块;可以是本地(Native)应用程序(APP,Application),即需要在操作系统中安装才能运行的程序;也可以是小程序,即只需要下载到浏览器环境中就可以运行的程序;还可以是能够嵌入至任意APP中的小程序。总而言之,上述计算机程序可以是任意形式的应用程序、模块或插件。
本申请实施例提供的方法可以应用于沉浸媒体系统的服务器端(即编码设备侧)、播放器端(即解码设备侧)以及中间节点(例如,SMT接收实体、SMT发送实体)等环节。其中,解码设备与编码设备之间进行交互反馈的过程可以参见下述图7-图9所对应实施例的描述。
请参见图7,图7是本申请实施例提供的一种沉浸媒体的数据处理方法的流程示 意图。该方法可以由沉浸媒体系统(例如,全景视频系统或者容积视频系统)中的解码设备来执行,该解码设备可以为上述图1所对应实施例中的解码设备100B,也可以为上述图3所对应实施例中的解码设备200B或图6中的终端400。该解码设备可以为集成有视频客户端的用户终端,该方法至少可以包括以下步骤S101-步骤S103:
步骤S101,响应针对第一沉浸媒体内容的交互操作,生成交互操作对应的交互反馈消息。
这里,交互反馈消息中携带业务关键字段,该业务关键字段,用于描述交互操作所指示的业务事件,该业务事件包括以下至少之一:缩放事件、切换事件及位置交互事件;
示例性地,用户终端上的视频客户端在获取到服务器返回的第一媒体文件资源后,可以对第一媒体文件资源进行解封装处理以及解码处理,得到第一沉浸媒体内容,进而可以在该视频客户端的视频播放界面上播放第一沉浸媒体内容。其中,这里的第一沉浸媒体内容是指业务对象当前正在观看的沉浸媒体内容,这里的业务对象可以指消费第一沉浸媒体内容的用户。例如,第一沉浸媒体内容可以属于某个沉浸式视频,该沉浸式视频可以为包含一个或多个沉浸媒体内容的视频集合,本申请实施例对沉浸式视频所包含的内容数量不进行限定。例如,假设服务器提供的一个沉浸式视频包括N个沉浸媒体内容,N为大于1的整数,分别为:与场景P1相关联的沉浸媒体内容A1、与场景P2相关联的沉浸媒体内容A2、…、与场景P1相关联的沉浸媒体内容AN,则视频客户端可以按照服务器的推荐或业务对象的需求,从上述N个沉浸媒体内容中获取到的任意一个或多个沉浸媒体内容,例如,沉浸媒体内容A1,则此时沉浸媒体内容A1可以作为当前的第一沉浸媒体内容。
需要说明的是,该沉浸式视频可以为全景视频;例如,该沉浸式视频可以为容积视频,本申请实施例对沉浸式视频的视频类型不进行限定。
在一些实施例中,在视频客户端的视频播放界面上播放第一沉浸媒体内容的过程中,视频客户端可以响应针对当前正在播放的第一沉浸媒体内容的交互操作,生成该交互操作对应的交互反馈消息。应当理解,交互反馈消息也可称为交互反馈信令,可以提供沉浸式媒体消费时,视频客户端与服务器之间的交互反馈,例如,在消费的第一沉浸媒体内容属于全景视频时,SMT接收实体可以定期地向SMT发送实体反馈虚拟摄像机的方向信息来通知当前VR虚拟摄像机方向,此外,在视场角(Field of view,FOV)改变时也会发送相应的方向信息。又例如,在消费的第一沉浸媒体内容属于容积视频时,SMT接收实体可以定期地向SMT发送实体反馈虚拟摄像机的位置或业务对象的位置、观看方向信息,以便视频客户端获取相应的媒体内容。其中,SMT接收实体和SMT发送实体属于视频客户端与服务器之间的中间节点。
在本申请实施例中,交互操作是指业务对象针对于当前所消费的第一沉浸媒体内容所执行的操作,包括但不限于缩放操作、切换操作、位置交互操作。其中,缩放操作,对应缩放事件,是指对第一沉浸媒体内容进行画面尺寸的缩小或放大的操作,例如,可以通过双击沉浸媒体内容A1,实现对沉浸媒体内容A1的画面放大;又例如,可以通过双指向不同方向同时滑动拉伸沉浸媒体内容A1,实现对沉浸媒体内容A1的画面缩小或画面放大。这里的切换操作对应切换事件,可以包括对第一沉浸媒体内容执行的播放速率切换操作、画质切换操作(即清晰度切换操作)、翻转操作、内容切换操作以及允许在应用层进行预先定义的其他基于事件的触发操作,例如针对画面中的目标位置的点击操作,业务对象面向目标方向时的触发操作等等。这里的位置交互操作对应位置交互事件,是指业务对象在观看第一沉浸媒体内容时所产生的针对其对象位置信息(即用户位置信息)的操作,例如实时位置的变动、观看方向的变动、 视角方向的变动等。为了便于后续理解和区分,本申请实施例在第一沉浸媒体内容属于全景视频时,将对应的位置交互操作称为第一位置交互操作;在第一沉浸媒体内容属于容积视频时,将对应的位置交互操作称为第二位置交互操作。需要说明的是,本申请实施例对缩放操作、切换操作以及位置交互操作的触发方式不进行限定。
应当理解,交互反馈消息中可以携带用于描述交互操作所指示的业务事件的业务关键字段。
在一些实施例中,交互反馈消息中可以直接包含业务关键字段,这里的业务关键字段可以包含第一关键字段、第二关键字段、第三关键字段以及第四关键字段中的一个或多个(即至少两个)。其中,第一关键字段用于表征在交互操作包含缩放操作时,执行缩放操作所指示的缩放事件时的缩放比例;第二关键字段用于表征在交互操作包含切换操作时,切换操作所指示的切换事件对应的事件标签和事件状态;第三关键字段用于表征在交互操作包含第一位置交互操作时,观看属于全景视频的第一沉浸媒体内容的业务对象的第一对象位置信息(例如,业务对象的实时位置、视角方向等);第四关键字段用于表征在交互操作包含第二位置交互操作时,观看属于容积视频的第一沉浸媒体内容的业务对象的第二对象位置信息(例如,业务对象实时观看方向)。由此可见,交互操作所对应的交互反馈消息可以包括缩放比例、事件标签和事件状态,并在包括缩放比例、事件标签和事件状态的基础上,还可以包括第一对象位置信息或第二对象位置信息。
应当理解,一个交互反馈消息可以携带用于描述一个或多个交互操作所指示的业务事件的业务关键字段,本申请实施例对交互反馈消息所对应的交互操作的数量和类型不进行限定。
可以理解,由于第一沉浸媒体内容对应的视频类型不相同,因此第一位置交互操作与第二位置交互操作不能同时存在,也就是说,在同一个交互反馈消息中,不能同时存在有效的第三关键字段和有效的第四关键字段。
应当理解,一个交互反馈消息所携带的业务关键字段可以包括第一关键字段、第二关键字段、第三关键字段、第四关键字段中的任意一种字段,例如,每发生一个交互操作,视频客户端就生成一个相应的交互反馈消息,在第一沉浸媒体内容属于全景视频的场景下,一个交互反馈消息所携带的业务关键字段可以包括第一关键字段、第二关键字段、第三关键字段中的任意一种或多种字段,例如包括第一关键字段及第三关键字段。同理,在第一沉浸媒体内容属于容积视频的场景下,一个交互反馈消息所携带的业务关键字段可以包括第一关键字段、第二关键字段、第四关键字段中的任意一种或多种字段。例如,在一段时间内,业务对象针对上述沉浸媒体内容A2执行了缩放操作,且在观看沉浸媒体内容A2的过程中,业务对象的对象位置信息发生了变化,例如,业务对象边走边看,若沉浸媒体内容A2属于容积视频,则此时生成的交互反馈消息将可同时包含反映缩放比例的第一关键字段以及反映第二对象位置信息的第四关键字段,因此最终获取到的第二沉浸媒体内容是基于该缩放比例以及该第二对象位置信息所共同确定的,也就是说,可以基于不同类型的业务事件来确定响应于交互操作的沉浸媒体内容,也即,结合多个信息维度确定媒体内容,如此,可以提升视频客户端在交互反馈过程中获取媒体内容的准确度。
在一些实施例中,上述交互反馈消息还可以包含信息标识字段,用于表征每种交互操作所指示的业务事件的信息类型,例如,信息标识字段的字段值可以为每类业务事件对应的信息名称,这样,当一个交互反馈消息同时携带多种业务事件时,可以通过信息标识字段进行信息类型的区分。
应当理解,本申请实施例对交互反馈的时机不进行限定,可以根据实际需求在应 用层进行约定。例如,视频客户端可以在检测到某个交互操作时立即生成相应的交互反馈消息发送至服务器;在一些实施例中,视频客户端可以定期向服务器发送交互反馈消息,例如,视频客户端每隔30秒向服务器反馈一次。
在一些实施例中,交互反馈信息中可以携带与交互操作相关联的交互信令表,且交互信令表中包含用于描述交互操作所指示的业务事件的业务关键字段。也就是说,交互反馈消息可以采用交互信令表的形式重新定义和组织多种不同类型的业务事件。
在一些实施例中,针对交互操作为触发操作的场景,视频客户端响应针对第一沉浸媒体内容的触发操作,确定触发操作所指示的业务事件的第一信息类型字段,且记录该触发操作的操作时间戳。其中,这里的触发操作可以指针对第一沉浸媒体内容的接触式操作或某些特定的非接触式操作,例如,该触发操作可以包括缩放操作、切换操作等。在一些实施例中,视频客户端可以将第一信息类型字段和操作时间戳添加至第一沉浸媒体内容相关联的交互信令表,并可以将交互信令表中所添加的第一信息类型字段作为用于描述交互操作所指示的业务事件的业务关键字段。随后,视频客户端可以基于该交互信令表中的业务关键字段和操作时间戳,生成该触发操作对应的交互反馈消息。其中,这里的第一信息类型字段可用于表征触发操作所指示的业务事件的信息类型。可以理解,每一个触发操作都可以对应于一个交互信令表,因此,同一个交互反馈消息可以包括一个或多个交互信令表,本申请实施例对交互反馈消息所包含的交互信令表的数量不进行限制。
在一些实施例中,在触发操作包含缩放操作时,该缩放操作所指示的业务事件为缩放事件,且在缩放操作所对应的第一信息类型字段的字段值为第一字段值时,具有第一字段值的第一信息类型字段所映射的字段用于表征执行该缩放事件时的缩放比例。
在一些实施例中,在触发操作包含切换操作时,该切换操作所指示的业务事件为切换事件,且在切换操作所对应的第一信息类型字段的字段值为第二字段值时,具有第二字段值的第一信息类型字段所映射的字段用于表征该切换事件的事件标签和事件状态。其中,在事件状态的状态值为第一状态值时,具有第一状态值的事件状态用于表征切换事件处于事件触发状态;在事件状态的状态值为第二状态值时,具有第二状态值的事件状态用于表征切换事件处于事件结束状态。
为便于理解,下面以SMT信令消息形式为例进行说明,请参见表1,该表1用于指示本申请实施例提供的一种交互信令表的语法:
表1
上述表1所示语法的语义如下:table_id为信令表标识字段,用于表征交互信令表的标识符。version为信令表版本字段,用于表征交互信令表的版本号。length为信令表长度字段,用于表征交互信令表的长度。table_type为第一信息类型字段,用于表征交互信令表携带的信息类型(例如,缩放事件或切换事件)。timestamp为操作时间戳,用于指示当前触发操作产生的时间戳,这里可以采用UTC时间(Universal Time Coordinated,协调世界时)。如表1所示,当第一信息类型字段(即table_type)的字段值为第一字段值(例如,2)时,第一信息类型字段所映射的字段为zoom_ratio,zoom_ratio指示业务对象缩放行为的比例,即执行缩放事件时的缩放比例(也可称为画面缩放信息),在一些实施例中,zoom_ratio可以2
-3为单位。例如,假设用户1(即业务对象)对沉浸媒体内容F1(即第一沉浸媒体内容)进行放大,则相应的交互反馈消息中会携带table_type==2的交互信令表,且zoom_ratio=16,即表示当前的缩放比例为16*2
-3=2倍。在一些实施例中,zoom_ratio也可作为前述可选实施方式中描述的第一关键字段。如表1所示,当第一信息类型字段的字段值为第二字段值(例如,3)时,第一信息类型字段所映射的字段为event_label和event_trigger_flag,event_label指示业务对象交互触发的事件标签,event_trigger_flag指示业务对象交互触发的事件状态,在一些实施例中,event_trigger_flag取值为1(即第一状态值)时表示事件触发(即切换事件处于事件触发状态),event_trigger_flag取值为0(即第二状态值)时表示事件结束(即切换事件处于事件结束状态)。例如,假设用户2(即业务对象)在观看沉浸媒体内容F2(即第一沉浸媒体内容)时,点击了视频播放界面中的内容切换控件,则相应的交互反馈消息中会携带table_type==3的交互信令表,且event_label=“content switch”,event_trigger_flag=1,即表示用户2触发了内容切换操作,希望将当前播放的沉浸媒体内容F2切换为其他沉浸媒体内容。在一些实施例中,event_label和event_trigger_flag也可作为前述可选实施方式中描述的第二关键字段。此外,reserved指示保留字节位。
其中,本申请实施例对上述第一字段值以及第二字段值的数值不进行限定,且对第一状态值以及第二状态值的数值也不进行限定。应当理解,本申请实施例可以支持研发人员在应用层预先定义所需的切换事件,事件标签的内容可以根据沉浸媒体内容来确定,本申请实施例对此不进行限定,需要说明的是,相关的沉浸媒体内容需要支持自定义的切换事件,才有可能在后续交互过程中实现事件触发,例如,当上述沉浸媒体内容F2支持内容切换时,才会在播放沉浸媒体内容F2的视频播放界面中显示相应的内容切换控件。
在一些实施例中,针对交互操作为位置交互操作的场景,本申请实施例还可以支持在交互反馈消息中携带对象位置信息,此时也可以采用交互信令表的形式进行定义。例如可以为:视频客户端在检测到观看第一沉浸媒体内容的业务对象的对象位置信息时,将针对对象位置信息的位置交互操作作为响应于第一沉浸媒体内容的交互操作,进而可以确定该交互操作所指示的业务事件的第二信息类型字段,且可以记录该交互操作的操作时间戳。在一些实施例中,可以将第二信息类型字段和操作时间戳添加至第一沉浸媒体内容相关联的交互信令表,并可以将该交互信令表中所添加的第二信息类型字段作为用于描述上述交互操作所指示的业务事件的业务关键字段。随后,视频客户端可以基于该交互信令表中的业务关键字段和操作时间戳,生成交互操作对应的交互反馈消息。可以理解,每一个位置交互操作都可以对应于一个交互信令表,因此,同一个交互反馈消息可以包括一个或多个交互信令表,但是,同一个交互反馈消息中不能同时存在携带第一对象位置信息的交互信令表和携带第二对象位置信息 的交互信令表。可以理解,若视频客户端定时向服务器反馈业务对象的对象位置信息,则在业务对象消费第一沉浸媒体内容的一段时间内,其对象位置信息可能会发生变化,也可能没有发生变化。在对象位置信息没有变化时,服务器仍可以基于该对象位置信息去获取相应的沉浸媒体内容,此时获取到的沉浸媒体内容可能与第一沉浸媒体内容相同;同理,除了该对象位置信息之外,若这段时间内视频客户端还向服务器反馈了其他信息,例如,对第一沉浸媒体内容执行缩放操作时的缩放比例,则服务器可以基于该对象位置信息以及该缩放比例去获取相应的沉浸媒体内容,此时获取到的沉浸媒体内容与第一沉浸媒体内容不相同。
在一些实施例中,在第一沉浸媒体内容为沉浸式视频中的沉浸媒体内容,且沉浸式视频为全景视频时,对象位置信息所对应的第二信息类型字段的字段值为第三字段值,具有第三字段值的第二信息类型字段包含第一类位置字段,第一类位置字段用于描述观看属于全景视频的第一沉浸媒体内容的业务对象的位置变动信息。
在一些实施例中,在第一沉浸媒体内容为沉浸式视频中的沉浸媒体内容,且沉浸式视频为容积视频时,对象位置信息所对应的第二信息类型字段的字段值为第四字段值,具有第四字段值的第二信息类型字段包含第二类位置字段,第二类位置字段用于描述观看属于容积视频的第一沉浸媒体内容的业务对象的位置变动信息。
在一些实施例中,请参见表2,该表2用于指示本申请实施例提供的一种交互信令表的语法:
表2
上述表2所示语法的语义如下:table_id为信令表标识字段,用于表征交互信令表的标识符。version为信令表版本字段,用于表征交互信令表的版本号。length为信令表长度字段,用于表征交互信令表的长度。table_type为第二信息类型字段,用于表征交互信令表携带的信息类型(如第一对象位置信息或第二对象位置信息)。timestamp为操作时间戳,用于指示当前位置交互操作产生的时间戳,这里可以采用UTC时间。如表2所示,当table_type的字段值为0(即第三字段值)时,其包含的第一类位置字段有:3DoF+_flag指示3DoF+视频内容;interaction_target为交互目标字段,指示视频客户端当前交互的目标,包括头盔设备当前状态(HMD_status)、业务对象关注目标(Object of interests)、业务对象当前状态(User_status)等。interaction_type为交互类型字段,在本申请实施例将其置0。其中,交互目标字段interaction_target的取值可以参见表3,表3用于指示本申请实施例提供的一种交互目标字段的取值表:
表3
结合表3,请继续参见表2,当交互目标字段取值为1时,表示交互目标为头盔设备当前状态,相应的,ClientRegion为视窗信息,指示视频客户端视窗的尺寸和屏幕分辨率,其语法请参见表4,该表4用于指示本申请实施例提供的一种视窗信息的语法:
表4
上述表4的语义如下:Region_width_angle指示视频客户端视窗在横向的张角,精度为2
-16度,取值范围为(-90*2
16,90*2
16)。Region_height_angle指示视频客户端视窗在纵向的张角,精度为2
-16度,取值范围为(-90*2
16,90*2
16)。Region_width_resolution指示视频客户端视窗横向的分辨率,取值范围为(0,2
16-1)。Region_height_resolution指示视频客户端视窗在纵向的分辨率,取值范围为(0,2
16-1)。
请继续参见表2,当交互目标字段取值为2时,表示交互目标为业务对象关注区域当前状态,相应的,ClientRotation为视角方向,指示业务对象实时的视角相对初始视角的变化,其语法请参见表5,该表5用于指示本申请实施例提供的一种视角方向的语法:
表5
上述表5的语义如下:3D_rotation_type指示旋转信息的表示类型,该字段取值为0表示旋转信息以欧拉角的形式给出;该字段取值为1表示旋转信息以四元数的形式给出;其余取值保留。rotation_yaw指示业务对象实时的视角相对初始视角的沿着x轴的偏航角度,取值范围为(-180*2
16,180*2
16–1)。rotation_pitch指示业务对象实时的视角相对初始视角的沿着y轴的俯仰角度,取值范围为(-90*2
16,90*2
16)。rotation_roll指示业务对象实时的视角相对初始视角的沿着z轴的翻滚角度,取值范围为(-180*2
16,180*2
16–1)。rotation_x,rotation_y,rotation_z以及rotation_w分别指示四元数x,y,z和w分量的取值,表示业务对象实时的视角相对初始视角的旋转信息。
请继续参见表2,当交互目标字段取值为3且3DoF+_flag取值为1时,表示交互目标为业务对象当前状态,相应的,ClientPosition为业务对象实时位置,指示业务对象在虚拟场景中相对起始位置的位移,在3DoF(即3DoF+_flag取值为0)时该结构中所有字段的字段值为0,在3DoF+(即3DoF+_flag取值为1)时该结构中所有字段的字段值为非0值,且取值范围应在约束范围内。behavior_coefficient定义一个放大行为系数。其中,ClientPosition的语法请参见表6,该表6用于指示本申请实施例提供的一种业务对象实时位置的语法:
表6
上述表6的语义如下:position_x指示业务对象实时位置相对起始位置沿着x轴位移,取值范围为(-2
15,2
15-1)毫米。position_y指示业务对象实时位置相对起始位置沿着y轴位移,取值范围为(-2
15,2
15-1)毫米。position_z指示业务对象实时位置相对起始位置沿着z轴位移,取值范围为(-2
15,2
15-1)毫米。
在一些实施例中,table_type的字段值为0时所包含的第一类位置字段可作为前述的第三关键字段。
请继续参见表2,如表2所示,当table_type的字段值为1(即第四字段值)时,其包含的第二类位置字段有:ClientPosition指示业务对象当前在全局坐标系中的位置,其语法可以参见上述表6。V3C_orientation指示业务对象在以当前位置建立的笛卡尔坐标系中的观看方向。last_processed_media_timestamp指示已加入解码器缓冲区的最后一个媒体单元的时间戳。SMT发送实体使用此字段从容积视频播放器的新asset(即新的沉浸媒体内容)中确定下一个传输的媒体单元。下一个媒体单元是紧随该时间戳后的带有时间戳或序号的媒体单元。SMT发送实体从随后的媒体时间戳开 始,从传输先前的asset(根据先前的视窗确定)切换到传输新的asset(根据新的视窗确定),以减少接收对应于新视窗媒体内容的延迟。其中,V3C_orientation的语法请参见表7,该表7用于指示本申请实施例提供的一种业务对象实时观看方向的语法:
表7
上述表7的语义如下:dirx表示以业务对象所在位置为原点建立笛卡尔坐标系,业务对象观看方向在x轴上的坐标。diry表示以业务对象所在位置为原点建立笛卡尔坐标系,业务对象观看方向在y轴上的坐标。dirz表示以业务对象所在位置为原点建立笛卡尔坐标系,业务对象观看方向在z轴上的坐标。
在一些实施例中,table_type的字段值为1时所包含的第二类位置字段可作为前述的第四关键字段。
可以理解,本申请实施例还可以将上述表1和表2进行结合,得到一种至少可以表示四种信息类型的交互信令表,从而可以基于该交互信令表生成交互操作对应的交互反馈消息,可以参见下述图8所对应实施例中的步骤S203。
步骤S102,发送交互反馈消息,该交互反馈消息,用于确定交互操作所指示的业务事件,并基于业务事件获取用于响应交互操作的第二沉浸媒体内容。
这里,终端将交互反馈消息发送至服务器,以使服务器基于交互反馈消息中的业务关键字段,确定交互操作所指示的业务事件,基于交互操作所指示的业务事件获取用于响应交互操作的第二沉浸媒体内容;
示例性地,视频客户端可以将交互反馈消息发送至服务器,后续服务器接收到该交互反馈消息后,可以基于交互反馈消息中的业务关键字段,确定交互操作所指示的业务事件,进而可以基于交互操作所指示的业务事件获取用于响应交互操作的第二沉浸媒体内容所对应的第二媒体文件资源,该第二媒体文件资源是由服务器预先对相关的音视频内容进行编码以及封装后所得到的,其对应于第二沉浸媒体内容,对音视频内容进行编码以及封装的过程可以参见上述图1或图3所对应实施例中的相关描述,这里不再进行赘述。例如,若交互操作为清晰度切换操作,则服务器可以根据该清晰度切换操作所指示的分辨率,获取与该分辨率相匹配的媒体文件资源,作为响应该清晰度切换操作的第二媒体文件资源。
步骤S103,接收返回的第二沉浸媒体内容。
示例性地,视频客户端可以接收服务器返回的第二沉浸媒体内容,并可以在视频播放界面上播放第二沉浸媒体内容。结合上述步骤S102,应当理解,服务器基于业务事件获取到的首先是第二沉浸媒体内容所对应的媒体文件资源,即第二媒体文件资源,并可以将第二媒体文件资源返回至视频客户端,因此,当视频客户端接收到第二媒体文件资源后,可以通过上述图1或图3所对应实施例中的相关描述,对第二媒体文件资源进行解封装以及解码,从而得到可在视频客户端的视频播放界面上播放的第二沉浸媒体内容,其中,解封装以及解码的过程这里不再进行赘述。
在一些实施例中,请参见图8,图8是本申请实施例提供的一种沉浸媒体的数据处理方法的流程示意图。该方法可以由沉浸媒体系统(例如,全景视频系统或者容积视频系统)中的解码设备来执行,该解码设备可以为上述图1所对应实施例中的解码设备100B,也可以为上述图3所对应实施例中的解码设备200B。该解码设备可以为集成有视频客户端的用户终端,该方法至少可以包括以下步骤:
步骤S201,响应针对视频客户端中的沉浸式视频的视频播放操作,生成视频播放操作对应的播放请求,将播放请求发送至服务器,以使服务器基于播放请求获取沉浸式视频的第一沉浸媒体内容;
示例性地,业务对象希望体验沉浸式视频时,可以通过用户终端上的视频客户端请求相应的沉浸媒体内容。例如,视频客户端可以响应针对视频客户端中的沉浸式视频的视频播放操作,生成该视频播放操作对应的播放请求,进而可以将该播放请求发送至服务器,以使服务器可以基于该播放请求获取沉浸式视频中的第一沉浸媒体内容所对应的第一媒体文件资源。这里的第一媒体文件资源是指服务器对相关音视频内容进行编码以及封装等处理后所得的数据。
步骤S202,接收服务器返回的第一沉浸媒体内容,在视频客户端的视频播放界面上播放第一沉浸媒体内容;
示例性地,在服务器基于播放请求获取到第一沉浸媒体内容对应的第一媒体文件资源后,可以将第一媒体文件资源返回至视频客户端,从而视频客户端可以接收由服务器返回的第一媒体文件资源,并对第一媒体文件资源进行解封装以及解码等处理,从而得到可在视频客户端的视频播放界面上播放的第一沉浸媒体内容。
步骤S203,在视频客户端的视频播放界面上播放第一沉浸媒体内容时,响应针对第一沉浸媒体内容的交互操作,生成交互操作对应的交互反馈消息;交互反馈消息中携带与交互操作相关联的交互信令表;
示例性地,在视频客户端的视频播放界面上播放第一沉浸媒体内容时,视频客户端可以响应针对第一沉浸媒体内容的交互操作,生成交互操作对应的交互反馈消息,例如,视频客户端响应针对第一沉浸媒体内容的交互操作,确定交互操作所指示的业务事件的信息类型字段,且记录交互操作的操作时间戳。在一些实施例中,可以将信息类型字段和操作时间戳添加至第一沉浸媒体内容相关联的交互信令表,将交互信令表中所添加的信息类型字段作为用于描述交互操作所指示的业务事件的业务关键字段。随后,可以基于交互信令表中的业务关键字段和操作时间戳,生成交互操作对应的交互反馈消息。
应当理解,这里的交互操作可以包括缩放操作、切换操作、位置交互操作中的一种或多种,其中,位置交互操作可以为第一位置交互操作或第二位置交互操作。
在本申请实施例中,交互反馈消息中可以携带与交互操作相关联的交互信令表,且交互信令表所包含的信息类型字段可作为用于描述交互操作所指示的业务事件的业务关键字段。其中,信息类型字段可以包括与触发操作相关的第一信息类型字段以及与位置交互操作相关的第二信息类型字段,本申请实施例将第一信息类型字段和第二信息类型字段统称为信息类型字段。
应当理解,本申请实施例可以将上述表1和表2进行组合,得到一种至少可以表示四种信息类型的交互信令表,这样,通过交互信令表可以将不同信息类型的交互反馈消息整合在一起,而不至于因为信息类型的多样化而显得混乱。请参见表8,该表8用于指示本申请实施例提供的一种交互信令表的语法:
表8
上述表8所示的table_type为信息类型字段,可用于表征交互信令表携带的信息类型。其他字段的语义可以参见上述图3所对应实施例中的表1和表2,这里不再进行赘述。在一些实施例中,table_type的取值可以参见表9,该表9用于指示本申请实施例提供的一种信息类型字段的取值表:
表9
由表9可知,信息类型字段的字段值可以为第一字段值(例如,2)、第二字段值(例如,3)、第三字段值(例如,0)、第四字段值(例如,1)等。表9中的全景视频用户位置变动信息即为第一类位置字段所描述的位置变动信息,容积视频用户位置 变动信息即为第二类位置字段所描述的位置变动信息,画面缩放信息即为执行缩放事件时的缩放比例,交互事件触发信息包括切换事件的事件标签和事件状态,后续还可以继续增添其他取值的信息。
应当理解,基于上述表1、表2或表8,本申请实施例生成的交互反馈消息可以支持更丰富的交互反馈场景。请一并参见表10,该表10用于指示本申请实施例提供的一种交互反馈消息的语法:
表10
上述表10所示语法的语义如下:message_id指示交互反馈消息的标识符。version指示交互反馈消息的版本,新的版本所携带的信息将覆盖任何之前的旧版本。length指示包含了以字节计算的交互反馈消息的长度,即从下一字段起直到交互反馈消息最后一个字节的长度,其中,“0”值在此字段无效。number_of_tables为信令表数量字段,指示交互反馈消息中包含的交互信令表的数量,这里用N1表示,本申请实施例对N1的数值不进行限定。table_id为信令表标识字段,指示交互反馈消息中包含的每个交互信令表的标识符,这是交互信令表中包含在交互反馈消息的有效负载中的table_id字段的一个副本。table_version为信令表版本字段,指示交互反馈消息中所包含的每个交互信令表的版本号,这是包含在交互反馈消息的有效负载中的交互信令表的版本字段的一个副本。table_length为信令表长度字段,指示交互反馈消息中所包含的每个交互信令表的长度,为包含在交互反馈消息的有效负载中的交互信令表的长度字段的一个副本。message_source指示消息源,0表示交互反馈消息是视频客户端发往服务器,1表示交互反馈消息是服务器发往视频客户端,该值此处置0。asset_group_flag为资源组属性字段,用于表征第一沉浸媒体内容与目标资源组所包含的沉浸媒体内容集之间的从属关系,例如,在资源组属性字段的字段值为第一属性字 段值(例如,1)时,具有第一属性字段值的资源组属性字段用于表征第一沉浸媒体内容属于该沉浸媒体内容集;在资源组属性字段的字段值为第二属性字段值(例如,0)时,具有第二属性字段值的资源组属性字段用于表征第一沉浸媒体内容不属于该沉浸媒体内容集,也就是说,asset_group_flag取值为1表示视频客户端当前消费内容(即第一沉浸媒体内容)属于一个资源组(如目标资源组),取值为0表示视频客户端当前消费内容不属于任何资源组。其中,资源组是指包含多个沉浸媒体内容的集合,本申请实施例中的沉浸式视频可以包括多个沉浸媒体内容(例如第一沉浸媒体内容),这多个沉浸媒体内容可以根据需要以资源组为单位再进行细分,例如,该沉浸式视频本身就可以作为一个资源组,也就是说,沉浸式视频中的所有沉浸媒体内容均属于一个资源组;或者,该沉浸式视频可以被划分为多个资源组,每个资源组均可以包括该沉浸式视频中的多个沉浸媒体内容。asset_group_id为资源组标识字段,指示视频客户端当前消费内容的资源组标识符,即第一沉浸媒体内容所属的沉浸媒体内容集对应的资源组(如目标资源组)的标识符。asset_id指示视频客户端当前消费内容的标识符。应当理解,每个沉浸媒体内容都有唯一对应的asset_id,在第一沉浸媒体内容属于某个资源组时,视频客户端当前消费的第一沉浸媒体内容的数量可能为不止一个,此时反馈其中某个第一沉浸媒体内容的asset_id显然不太恰当,因此可以反馈多个第一沉浸媒体内容所属的资源组的标识符。table()为一个交互信令表实体,在有效负载中的该交互信令表与扩展域中table_id出现的顺序相同,一个交互信令表可以作为一个table()的实例。其中,交互信令表的顺序可以是按照对应的操作时间戳进行排序,也可以按照交互信令表对应的table_id进行排序,还可以采用其它排序方式,本申请实施例对此不进行限定。可以看到,表10所示的交互反馈消息中采用了循环语句,因此可以有序地反馈交互反馈消息所包含的一个或多个交互信令表所携带的业务事件,也就是说,在交互反馈消息包含多个交互信令表时,服务器会按照循环语句中所呈现的交互信令表的顺序,依次读取每个交互信令表。
其中,上述信令表数量字段、信令表标识字段、信令表版本字段、信令表长度字段、资源组属性字段以及资源组标识字段均属于在视频客户端的系统层新增的扩展描述字段。
上述可知,本申请实施例在现有技术的基础上,重新定义和组织了交互反馈消息,并在交互反馈的类型中增加了缩放和事件触发两种类型的反馈信息,以支持更丰富的交互反馈场景,且可以提升视频客户端在交互反馈过程中获取媒体内容的准确度。
步骤S204,将交互反馈消息发送至服务器,以使服务器提取交互信令表,根据交互信令表中的信息类型字段确定交互操作所指示的业务事件,基于交互操作所指示的业务事件获取用于响应交互操作的第二沉浸媒体内容;
示例性地,视频客户端可以将交互反馈消息发送至服务器,服务器接收到该交互反馈消息后,可以按顺序依次从该交互反馈消息中提取交互信令表,并可以从提取到的交互信令表中读取信息类型字段,进而根据该信息类型字段确定交互操作所指示的业务事件。最终,可以基于交互操作所指示的业务事件,从上述沉浸式视频中获取用于响应交互操作的第二沉浸媒体内容,并将第二沉浸媒体内容返回至视频客户端。例如,当信息类型字段的字段值为第一字段值时,可以获取缩放事件对应的缩放比例作为业务事件;当信息类型字段的字段值为第二字段值时,可以获取切换事件的事件标签和事件状态作为业务事件;当信息类型字段的字段值为第三字段值时,可以获取观看属于全景视频的第一沉浸媒体内容的业务对象的位置变动信息作为业务事件;当信息类型字段的字段值为第四字段值时,可以获取观看属于容积视频的第一沉浸媒体内容的业务对象的位置变动信息作为业务事件。
步骤S205,接收服务器返回的第二沉浸媒体内容,在视频播放界面上播放第二沉浸媒体内容。
示例性地,由于服务器返回的其实是沉浸式视频中的第二沉浸媒体内容所对应的第二媒体文件资源,因此视频客户端可以接收由服务器返回的第二媒体文件资源,并对第二媒体文件资源进行解封装以及解码等处理,从而得到可播放的第二沉浸媒体内容,并可在视频客户端的视频播放界面上进行播放。
上述可知,在视频客户端与服务器进行交互的过程中,视频客户端可以向服务器反馈不同类型的交互操作所指示的业务事件,应当理解,这里的交互操作不仅可以包含与用户位置相关的操作(例如,用户位置发生变动),还可以包含针对视频客户端当前所播放的沉浸媒体内容的其他操作(例如,缩放操作),因此,通过交互反馈消息中所携带的业务关键字段,视频客户端可以向服务器反馈多种类型的业务事件,这样,服务器可以基于这些不同类型的业务事件来确定响应于该交互操作的沉浸媒体内容,而非只能依赖于用户位置信息,从而可以丰富交互反馈的信息类型,且可以提升视频客户端在交互反馈过程中获取媒体内容的准确度。
在一些实施例中,请参见图9,图9是本申请实施例提供的一种沉浸媒体的数据处理方法的交互示意图。该方法可以由沉浸媒体系统(例如,全景视频系统或者容积视频系统)中的解码设备和编码设备共同执行,该解码设备可以为上述图1所对应实施例中的解码设备100B,也可以为上述图3所对应实施例中的解码设备200B。该编码设备可以为上述图1所对应实施例中的解码设备100A,也可以为上述图3所对应实施例中的解码设备200A。该解码设备可以为集成有视频客户端的用户终端,该编码设备可以为服务器,该方法至少可以包括以下步骤:
步骤S301,视频客户端向服务器发起播放请求;
该步骤的实现方式可以参见上述图8所对应实施例中的步骤S201,这里不再进行赘述。
步骤S302,服务器基于播放请求获取沉浸式视频的第一沉浸媒体内容;
示例性地,服务器可以基于播放请求中所携带的目标内容标识符(即目标asset_id),从沉浸式视频中获取与该目标内容标识符相匹配的沉浸媒体内容作为第一沉浸媒体内容。在一些实施例中,服务器也可以基于播放请求中所携带的业务对象当前的对象位置信息,从沉浸式视频中获取与该对象位置信息相匹配的沉浸媒体内容作为第一沉浸媒体内容。
步骤S303,服务器将第一沉浸媒体内容返回至视频客户端;
步骤S304,视频客户端在视频播放界面上播放第一沉浸媒体内容;
步骤S305,视频客户端响应针对第一沉浸媒体内容的交互操作,生成交互操作对应的交互反馈消息;
该步骤的实现方式可以参见上述图7所对应实施例中的步骤S101,或者可以参见上述图8所对应实施例中的步骤S203,这里不再进行赘述。
步骤S306,视频客户端将交互反馈消息发送至服务器;
步骤S307,服务器接收由视频客户端发送的交互反馈消息;
步骤S308,服务器基于交互反馈消息中的业务关键字段,确定交互操作所指示的业务事件,基于交互操作所指示的业务事件获取用于响应交互操作的第二沉浸媒体内容;
示例性地,服务器接收到该交互反馈消息后,可以基于交互反馈消息中的业务关键字段,确定交互操作所指示的业务事件,进而可以基于交互操作所指示的业务事件,从沉浸式视频中获取用于响应交互操作的第二沉浸媒体内容。可以理解,当交互反馈 消息采用交互信令表的形式来表示时,交互反馈消息中的业务关键字段即为交互信令表中所添加的信息类型字段;当交互反馈消息不采用交互信令表的形式来表示时,业务关键字段直接添加在交互反馈消息中。
应当理解,若第一沉浸媒体内容属于目标资源组所包含的沉浸媒体内容集,则最终获取到的第二沉浸媒体内容可能同属于该沉浸媒体内容集,或者,第二沉浸媒体内容可能属于其他资源组所包含的沉浸媒体内容集,又或者,第二沉浸媒体内容可能不属于任何一个资源组所包含的沉浸媒体内容集,本申请实施例对此不进行限定。
步骤S309,服务器将第二沉浸媒体内容返回至视频客户端;
步骤S310,视频客户端接收服务器返回的第二沉浸媒体内容,在视频播放界面上播放第二沉浸媒体内容。
为便于理解,以沉浸式视频T为例对上述步骤进行简单说明。假设视频客户端向服务器请求沉浸式视频T,服务器接收到该请求(例如,播放请求)后,可以基于该请求将沉浸式视频T中的沉浸媒体内容T1(即第一沉浸媒体内容)发送给视频客户端。视频客户端接收到沉浸媒体内容T1后,可以在对应的视频播放界面上播放沉浸媒体内容T1,业务对象(例如,用户1)开始消费沉浸媒体内容T1,并可以在消费过程中产生交互行为(即针对沉浸媒体内容T1执行交互操作),从而视频客户端可以生成该交互行为对应的交互反馈消息发送给服务器。在一些实施例中,服务器接收视频客户端发送的交互反馈消息,根据该交互反馈消息的消息内容(例如,业务关键字段),可以从沉浸式视频T中选择其他沉浸媒体内容(即第二沉浸媒体内容,例如,沉浸媒体内容T2)发送给视频客户端,从而业务对象可以体验新的沉浸媒体内容。例如,假设用户1对沉浸媒体内容T1执行缩放操作,如对沉浸媒体内容T1的内容进行放大,且对应的缩放比例为3倍,则服务器可以基于该缩放操作所指示的缩放比例,从沉浸式视频T中选择颜色精度更高的沉浸媒体内容(例如,沉浸媒体内容T2)发送给用户1。又例如,假设用户1对沉浸媒体内容T1执行内容切换操作,则服务器可以基于该内容切换操作,从沉浸式视频T中选择对应替换版本内容的沉浸媒体内容(例如,沉浸媒体内容T3)发送给用户1。
上述可知,本申请实施例在相关技术的基础上,重新组织和定义了交互反馈消息,并在交互反馈的类型中增加了缩放和切换(或事件触发)两种类型的反馈信息,从而可以支持更丰富的交互反馈场景,且可以提升视频客户端在交互反馈过程中获取媒体内容的准确度。
请参见图10,是本申请实施例提供的一种沉浸媒体的数据处理装置的结构示意图。该沉浸媒体的数据处理装置可以是运行于解码设备的一个计算机程序(包括程序代码),例如该沉浸媒体的数据处理装置可以为解码设备中的一个应用软件;该沉浸媒体的数据处理装置可以用于执行本申请实施例提供的沉浸媒体的数据处理方法中的相应步骤。在一些实施例中的,如图10所示,该沉浸媒体的数据处理1可以包括:消息生成模块11、消息发送模块12、内容接收模块13;
消息生成模块11,配置为响应针对第一沉浸媒体内容的交互操作,生成交互操作对应的交互反馈消息;交互反馈消息中携带用于描述交互操作所指示的业务事件的业务关键字段;
消息发送模块12,配置为发送交互反馈消息,该交互反馈消息,用于确定交互操作所指示的业务事件,基于该业务事件获取用于响应交互操作的第二沉浸媒体内容;
内容接收模块13,配置为接收返回的第二沉浸媒体内容。
其中,消息生成模块11、消息发送模块12、内容接收模块13的实现方式可以参 见上述图7所对应实施例中的步骤S101-步骤S103,或者,可以参见上述图8所对应实施例中的步骤S203-步骤S205,这里将不再继续进行赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
在一些实施例中,如图10所示,该沉浸媒体的数据处理1还可以包括:视频请求模块14;
视频请求模块14,配置为响应针对沉浸式视频的视频播放操作,生成视频播放操作对应的播放请求,并发送播放请求;
其中,所述播放请求,用于请求获取沉浸式视频的第一沉浸媒体内容;接收返回的第一沉浸媒体内容,并播放第一沉浸媒体内容。
其中,视频请求模块14的实现方式可以参见上述图8所对应实施例中的步骤S201-步骤S202,这里将不再继续进行赘述。
其中,业务关键字段包含第一关键字段、第二关键字段、第三关键字段以及第四关键字段中至少之一;第一关键字段,用于表征在交互操作包含缩放操作时,执行缩放操作所指示的缩放事件时的缩放比例;第二关键字段,用于表征在交互操作包含切换操作时,切换操作所指示的切换事件对应的事件标签和事件状态;第三关键字段,用于表征在交互操作包含第一位置交互操作时,观看属于全景视频的第一沉浸媒体内容的业务对象的第一对象位置信息;第四关键字段,用于表征在交互操作包含第二位置交互操作时,观看属于容积视频的第一沉浸媒体内容的业务对象的第二对象位置信息。
在一些实施例中的,如图10所示,消息生成模块11可以包括:第一确定单元111、第一添加单元112、第一生成单元113、第二确定单元114、第二添加单元115、第二生成单元116;
第一确定单元111,配置为响应针对第一沉浸媒体内容的触发操作,确定触发操作所指示的业务事件的第一信息类型字段,且记录触发操作的操作时间戳;
第一添加单元112,配置为将第一信息类型字段和操作时间戳添加至第一沉浸媒体内容相关联的交互信令表,将交互信令表中所添加的第一信息类型字段作为用于描述交互操作所指示的业务事件的业务关键字段;
第一生成单元113,配置为基于交互信令表中的业务关键字段和操作时间戳,生成触发操作对应的交互反馈消息。
其中,在触发操作包含缩放操作时,缩放操作所指示的业务事件为缩放事件,且在缩放操作所对应的第一信息类型字段的字段值为第一字段值时,具有第一字段值的第一信息类型字段所映射的字段,用于表征执行缩放事件时的缩放比例。
其中,在触发操作包含切换操作时,切换操作所指示的业务事件为切换事件,且在切换操作所对应的第一信息类型字段的字段值为第二字段值时,具有第二字段值的第一信息类型字段所映射的字段,用于表征切换事件的事件标签和事件状态。
其中,在事件状态的状态值为第一状态值时,具有第一状态值的事件状态用于表征切换事件处于事件触发状态;在事件状态的状态值为第二状态值时,具有第二状态值的事件状态用于表征切换事件处于事件结束状态。
第二确定单元114,配置为在检测到观看第一沉浸媒体内容的业务对象的对象位置信息时,将针对对象位置信息的位置交互操作作为响应于第一沉浸媒体内容的交互操作;确定交互操作所指示的业务事件的第二信息类型字段,且记录交互操作的操作时间戳;
第二添加单元115,配置为将第二信息类型字段和操作时间戳添加至第一沉浸媒体内容相关联的交互信令表,将交互信令表中所添加的第二信息类型字段作为用于描 述交互操作所指示的业务事件的业务关键字段;
第二生成单元116,配置为基于交互信令表中的业务关键字段和操作时间戳,生成交互操作对应的交互反馈消息。
其中,在第一沉浸媒体内容为沉浸式视频中的沉浸媒体内容,且沉浸式视频为全景视频时,对象位置信息所对应的第二信息类型字段的字段值为第三字段值,具有第三字段值的第二信息类型字段包含第一类位置字段,第一类位置字段用于描述观看属于全景视频的第一沉浸媒体内容的业务对象的位置变动信息。
其中,在第一沉浸媒体内容为沉浸式视频中的沉浸媒体内容,且沉浸式视频为容积视频时,对象位置信息所对应的第二信息类型字段的字段值为第四字段值,具有第四字段值的第二信息类型字段包含第二类位置字段,第二类位置字段用于描述观看属于容积视频的第一沉浸媒体内容的业务对象的位置变动信息。
其中,交互反馈消息还包括扩展描述字段,该扩展描述字段可以为视频客户端的系统层新增的扩展描述字段;扩展描述字段中包含信令表数量字段、信令表标识字段、信令表版本字段以及信令表长度字段中至少之一;信令表数量字段,用于表征交互反馈消息所包含的交互信令表的总数;信令表标识字段,用于表征交互反馈消息所包含的每个交互信令表的标识符;信令表版本字段,用于表征每个交互信令表的版本号;信令表长度字段,用于表征每个交互信令表的长度。
其中,交互反馈消息还包括资源组属性字段以及资源组标识字段;资源组属性字段用于表征第一沉浸媒体内容与目标资源组所包含的沉浸媒体内容集之间的从属关系;资源组标识字段用于表征目标资源组的标识符。
其中,在资源组属性字段的字段值为第一属性字段值时,具有第一属性字段值的资源组属性字段,用于表征第一沉浸媒体内容属于沉浸媒体内容集;在资源组属性字段的字段值为第二属性字段值时,具有第二属性字段值的资源组属性字段用于表征第一沉浸媒体内容不属于沉浸媒体内容集。
其中,视频请求模块14的实现方式可以参见上述图7所对应实施例中的步骤S101,这里将不再继续进行赘述。
请参见图11,是本申请实施例提供的一种沉浸媒体的数据处理装置的结构示意图。该沉浸媒体的数据处理装置可以是运行于编码设备的一个计算机程序(包括程序代码),例如该沉浸媒体的数据处理装置可以为编码设备中的一个应用软件;该沉浸媒体的数据处理装置可以用于执行本申请实施例提供的沉浸媒体的数据处理方法中的相应步骤。在一些实施例中,如图11所示,该沉浸媒体的数据处理2可以包括:消息接收模块21、内容获取模块22、内容返回模块23;
消息接收模块21,配置为接收交互反馈消息;交互反馈消息是在响应针对第一沉浸媒体内容的交互操作时所生成;交互反馈消息中携带用于描述交互操作所指示的业务事件的业务关键字段;
内容获取模块22,配置为基于交互反馈消息中的业务关键字段,确定交互操作所指示的业务事件,基于交互操作所指示的业务事件获取用于响应交互操作的第二沉浸媒体内容;
内容返回模块23,配置为返回第二沉浸媒体内容。
其中,消息接收模块21、内容获取模块22、内容返回模块23的实现方式可以参见上述图9所对应实施例中的步骤S307-步骤S309,这里将不再继续进行赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
请参见图12,是本申请实施例提供的一种计算机设备的结构示意图。如图12所示,该计算机设备1000可以包括:处理器1001,网络接口1004和存储器1005,此 外,上述计算机设备1000还可以包括:用户接口1003,和至少一个通信总线1002。其中,通信总线1002配置为实现这些组件之间的连接通信。其中,用户接口1003可以包括显示屏(Display)、键盘(Keyboard),可选用户接口1003还可以包括标准的有线接口、无线接口。网络接口1004可以包括标准的有线接口、无线接口(如WI-FI接口)。存储器1004可以是高速RAM存储器,也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器。存储器1005还可以是至少一个位于远离前述处理器1001的存储装置。如图12所示,作为一种计算机可读存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及设备控制应用程序。
在如图12所示的计算机设备1000中,网络接口1004可提供网络通讯功能;而用户接口1003主要用于为用户提供输入的接口;而处理器1001可以用于调用存储器1005中存储的设备控制应用程序,以执行前文图7、图8、图9任一个所对应实施例中对该沉浸媒体的数据处理方法的描述,也可执行前文图10所对应实施例中对该沉浸媒体的数据处理装置1的描述,还可以执行前述图11所对应实施例中对沉浸媒体的数据处理装置2的描述,在此不再赘述。在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
此外,这里需要指出的是:本申请实施例还提供了一种计算机可读存储介质,且计算机可读存储介质中存储有前文提及的沉浸媒体的数据处理装置1或沉浸媒体的数据处理装置2所执行的计算机程序,且计算机程序包括程序指令,当处理器执行程序指令时,能够执行前文图7、图8、图9任一个所对应实施例中对沉浸媒体的数据处理方法的描述,因此,这里将不再进行赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。对于本申请所涉及的计算机可读存储介质实施例中未披露的技术细节,请参照本申请方法实施例的描述。
上述计算机可读存储介质可以是前述任一实施例提供的沉浸媒体的数据处理装置或者上述计算机设备的内部存储单元,例如计算机设备的硬盘或内存。该计算机可读存储介质也可以是该计算机设备的外部存储设备,例如该计算机设备上配备的插接式硬盘,智能存储卡(smart media card,SMC),安全数字(secure digital,SD)卡,闪存卡(flash card)等。在一些实施例中地,该计算机可读存储介质还可以既包括该计算机设备的内部存储单元也包括外部存储设备。该计算机可读存储介质用于存储该计算机程序以及该计算机设备所需的其他程序和数据。该计算机可读存储介质还可以用于暂时地存储已经输出或者将要输出的数据。
此外,这里需要指出的是:本申请实施例还提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行前文图7、图8、图9任一个所对应实施例提供的方法。另外,对采用相同方法的有益效果描述,也不再进行赘述。对于本申请所涉及的计算机程序产品或者计算机程序实施例中未披露的技术细节,请参照本申请方法实施例的描述。
在一些实施例中的,请参见图13,图13是本申请实施例提供的一种数据处理系统的结构示意图。该数据处理系统3可以包括数据处理装置1a和数据处理装置2a。其中,数据处理装置1a可以为上述图10所对应实施例中的沉浸媒体的数据处理装置1,可以理解的是,该数据处理装置1a可以集成在上述图1所对应实施例中的解码设备100B或上述图3所对应实施例中的解码设备200B中,因此,这里将不再进行赘述。其中,数据处理装置2a可以为上述图11所对应实施例中的沉浸媒体的数据处理装置2,可以理解的是,该数据处理装置2a可以集成在上述图1所对应实施例中的 编码设备100A或上述图3所对应实施例中的编码设备200A中,因此,这里将不再进行赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。对于本申请所涉及的数据处理系统实施例中未披露的技术细节,请参照本申请方法实施例的描述。
本申请实施例的说明书和权利要求书及附图中的术语“第一”、“第二”等是用于区别不同对象,而非用于描述特定顺序。此外,术语“包括”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、装置、产品或设备没有限定于已列出的步骤或模块,而是可选地还包括没有列出的步骤或模块,或可选地还包括对于这些过程、方法、装置、产品或设备固有的其他步骤单元。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
以上所揭露的仅为本申请较佳实施例而已,当然不能以此来限定本申请之权利范围,因此依本申请权利要求所作的等同变化,仍属本申请所涵盖的范围。
Claims (19)
- A data processing method for immersive media, comprising: in response to an interactive operation on first immersive media content, generating an interactive feedback message corresponding to the interactive operation, wherein the interactive feedback message carries a business key field, and the business key field is used to describe the business event indicated by the interactive operation; sending the interactive feedback message, wherein the interactive feedback message is used to determine the business event indicated by the interactive operation, and second immersive media content for responding to the interactive operation is obtained based on the business event; and receiving the returned second immersive media content.
- The method according to claim 1, further comprising: in response to a video playback operation on an immersive video, generating a playback request corresponding to the video playback operation, and sending the playback request, wherein the playback request is used to request the first immersive media content of the immersive video; and receiving the returned first immersive media content, and playing the first immersive media content.
- The method according to any one of claims 1 to 2, wherein the business key field comprises at least one of a first key field, a second key field, a third key field, and a fourth key field; the first key field is used to represent, when the interactive operation comprises a zoom operation, the zoom ratio used when the zoom event indicated by the zoom operation is executed; the second key field is used to represent, when the interactive operation comprises a switch operation, the event label and event state corresponding to the switch event indicated by the switch operation; the third key field is used to represent, when the interactive operation comprises a first position interaction operation, first object position information of a business object watching the first immersive media content belonging to a panoramic video; and the fourth key field is used to represent, when the interactive operation comprises a second position interaction operation, second object position information of a business object watching the first immersive media content belonging to a volumetric video.
- The method according to any one of claims 1 to 2, wherein generating, in response to the interactive operation on the first immersive media content, the interactive feedback message corresponding to the interactive operation comprises: in response to a trigger operation on the first immersive media content, determining a first information type field of the business event indicated by the trigger operation, and recording an operation timestamp of the trigger operation; adding the first information type field and the operation timestamp to an interaction signaling table associated with the first immersive media content, and using the first information type field added in the interaction signaling table as the business key field; and generating, based on the business key field and the operation timestamp in the interaction signaling table, the interactive feedback message corresponding to the trigger operation.
- The method according to claim 4, wherein, when the trigger operation comprises a zoom operation, the business event indicated by the zoom operation is a zoom event, and when the field value of the first information type field corresponding to the zoom operation is a first field value, the field mapped by the first information type field with the first field value is used to represent the zoom ratio used when the zoom event is executed.
- The method according to claim 4, wherein, when the trigger operation comprises a switch operation, the business event indicated by the switch operation is a switch event, and when the field value of the first information type field corresponding to the switch operation is a second field value, the field mapped by the first information type field with the second field value is used to represent the event label and event state of the switch event.
- The method according to claim 6, wherein, when the state value of the event state is a first state value, the event state with the first state value is used to represent that the switch event is in an event-triggered state; and when the state value of the event state is a second state value, the event state with the second state value is used to represent that the switch event is in an event-ended state.
- The method according to any one of claims 1 to 2, wherein generating, in response to the interactive operation on the first immersive media content, the interactive feedback message corresponding to the interactive operation comprises: when object position information of a business object watching the first immersive media content is detected, determining a position interaction operation on the object position information as the interactive operation on the first immersive media content; determining a second information type field of the business event indicated by the interactive operation, and recording an operation timestamp of the interactive operation; adding the second information type field and the operation timestamp to an interaction signaling table associated with the first immersive media content, and using the second information type field added in the interaction signaling table as the business key field; and generating, based on the business key field and the operation timestamp in the interaction signaling table, the interactive feedback message corresponding to the interactive operation.
- The method according to claim 8, wherein, when the first immersive media content is immersive media content in an immersive video and the immersive video is a panoramic video, the field value of the second information type field corresponding to the object position information is a third field value, the second information type field with the third field value comprises a first type of position field, and the first type of position field is used to describe position change information of the business object watching the first immersive media content.
- The method according to claim 8, wherein, when the first immersive media content is immersive media content in an immersive video and the immersive video is a volumetric video, the field value of the second information type field corresponding to the object position information is a fourth field value, the second information type field with the fourth field value comprises a second type of position field, and the second type of position field is used to describe position change information of the business object watching the first immersive media content.
- The method according to any one of claims 1 to 2, wherein the interactive feedback message further comprises an extended description field; the extended description field comprises at least one of a signaling table quantity field, a signaling table identifier field, a signaling table version field, and a signaling table length field; the signaling table quantity field is used to represent the total number of interaction signaling tables contained in the interactive feedback message; the signaling table identifier field is used to represent the identifier of each interaction signaling table contained in the interactive feedback message; the signaling table version field is used to represent the version number of each interaction signaling table; and the signaling table length field is used to represent the length of each interaction signaling table.
- The method according to any one of claims 1 to 2, wherein the interactive feedback message further comprises a resource group attribute field and a resource group identifier field; the resource group attribute field is used to represent the affiliation between the first immersive media content and the immersive media content set contained in a target resource group; and the resource group identifier field is used to represent the identifier of the target resource group.
- The method according to claim 12, wherein, when the field value of the resource group attribute field is a first attribute field value, the resource group attribute field with the first attribute field value is used to represent that the first immersive media content belongs to the immersive media content set; and when the field value of the resource group attribute field is a second attribute field value, the resource group attribute field with the second attribute field value is used to represent that the first immersive media content does not belong to the immersive media content set.
- A data processing method for immersive media, the method comprising: receiving an interactive feedback message, wherein the interactive feedback message is generated in response to an interactive operation on first immersive media content, the interactive feedback message carries a business key field, and the business key field is used to describe the business event indicated by the interactive operation; determining, based on the business key field in the interactive feedback message, the business event indicated by the interactive operation, and obtaining, based on the business event, second immersive media content for responding to the interactive operation; and returning the second immersive media content.
- A data processing apparatus for immersive media, comprising: a message generation module, configured to generate, in response to an interactive operation on first immersive media content, an interactive feedback message corresponding to the interactive operation, wherein the interactive feedback message carries a business key field, and the business key field is used to describe the business event indicated by the interactive operation; a message sending module, configured to send the interactive feedback message, wherein the interactive feedback message is used to determine the business event indicated by the interactive operation, and second immersive media content for responding to the interactive operation is obtained based on the business event; and a content receiving module, configured to receive the returned second immersive media content.
- A data processing apparatus for immersive media, comprising: a message receiving module, configured to receive an interactive feedback message, wherein the interactive feedback message is generated in response to an interactive operation on first immersive media content, the interactive feedback message carries a business key field, and the business key field is used to describe the business event indicated by the interactive operation; a content acquisition module, configured to determine, based on the business key field in the interactive feedback message, the business event indicated by the interactive operation, and to obtain, based on the business event, second immersive media content for responding to the interactive operation; and a content return module, configured to return the second immersive media content.
- A computer device, comprising: a processor and a memory; the processor is connected to the memory, wherein the memory is configured to store a computer program, and the processor is configured to call the computer program, so that the computer device executes the method according to any one of claims 1 to 14.
- A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is adapted to be loaded and executed by a processor, so that a computer device having the processor executes the method according to any one of claims 1 to 14.
- A computer program product, wherein the computer program product comprises computer instructions, the computer instructions are stored in a computer-readable storage medium, and the computer instructions are adapted to be read and executed by a processor, so that a computer device having the processor executes the method according to any one of claims 1 to 14.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22874529.5A EP4412227A1 (en) | 2021-09-29 | 2022-08-31 | Immersive-media data processing method, apparatus, device, storage medium and program product |
US18/382,799 US20240048676A1 (en) | 2021-09-29 | 2023-10-23 | Method, apparatus and device for processing immersive media data, storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111149860.8A CN113891117B (zh) | 2021-09-29 | 2021-09-29 | 沉浸媒体的数据处理方法、装置、设备及可读存储介质 |
CN202111149860.8 | 2021-09-29 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/382,799 Continuation US20240048676A1 (en) | 2021-09-29 | 2023-10-23 | Method, apparatus and device for processing immersive media data, storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023051138A1 true WO2023051138A1 (zh) | 2023-04-06 |
Family
ID=79007959
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/116102 WO2023051138A1 (zh) | 2021-09-29 | 2022-08-31 | 沉浸媒体的数据处理方法、装置、设备、存储介质及程序产品 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20240048676A1 (zh) |
EP (1) | EP4412227A1 (zh) |
CN (2) | CN116233493A (zh) |
WO (1) | WO2023051138A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117156175A (zh) * | 2023-10-30 | 2023-12-01 | 山东大学 | 基于视口预测距离控制的全景视频流QoE优化方法 |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113362438A (zh) * | 2021-06-30 | 2021-09-07 | 北京百度网讯科技有限公司 | 全景渲染的方法、装置、电子设备、介质及程序 |
CN116233493A (zh) * | 2021-09-29 | 2023-06-06 | 腾讯科技(深圳)有限公司 | 沉浸媒体的数据处理方法、装置、设备及可读存储介质 |
CN114722218B (zh) * | 2022-05-09 | 2024-10-18 | 北京未来时空科技有限公司 | 一种三维可交互媒体的解析方法、装置及存储介质 |
CN118158377A (zh) * | 2022-06-09 | 2024-06-07 | 腾讯科技(深圳)有限公司 | 点云媒体的数据处理方法、装置、设备、存储介质及产品 |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8365213B1 (en) * | 2011-03-18 | 2013-01-29 | Robert Orlowski | System and method for measuring television advertising and program viewing at a second-by-second level and for measuring effectiveness of targeted advertising |
US20170085964A1 (en) * | 2015-09-17 | 2017-03-23 | Lens Entertainment PTY. LTD. | Interactive Object Placement in Virtual Reality Videos |
KR102157655B1 (ko) * | 2016-02-17 | 2020-09-18 | 엘지전자 주식회사 | Method for transmitting 360 video, method for receiving 360 video, 360 video transmission device, and 360 video reception device |
EP3531244A1 (en) * | 2018-02-26 | 2019-08-28 | Thomson Licensing | Method, apparatus and system providing alternative reality environment |
US10845979B2 (en) * | 2019-01-10 | 2020-11-24 | Tcl Research America, Inc. | Method and system for digital content display and interaction |
CN113114608B (zh) * | 2020-01-10 | 2022-06-10 | 上海交通大学 | Point cloud data encapsulation method and transmission method |
CN113453083B (zh) * | 2020-03-24 | 2022-06-28 | 腾讯科技(深圳)有限公司 | Immersive media obtaining method, device and storage medium in multi-degree-of-freedom scenarios |
Application events:
- 2021-09-29: CN application CN202310228608.9A filed; published as CN116233493A (pending)
- 2021-09-29: CN application CN202111149860.8A filed; published as CN113891117B (active)
- 2022-08-31: EP application EP22874529.5A filed; published as EP4412227A1 (pending)
- 2022-08-31: PCT application PCT/CN2022/116102 filed; published as WO2023051138A1
- 2023-10-23: US application US18/382,799 filed; published as US20240048676A1 (pending)
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150062284A1 (en) * | 2013-08-30 | 2015-03-05 | Amrita Vishwa Vidyapeetham | System and Method for Synthesizing and Preserving Consistent Relative Neighborhood Position in Multi-Perspective Multi-Point Tele-Immersive Environments |
CN107667331A (zh) * | 2015-05-28 | 2018-02-06 | 微软技术许可有限责任公司 | Shared haptic interaction and user safety in shared-space multi-user immersive virtual reality |
CN109362242A (zh) * | 2016-10-10 | 2019-02-19 | 华为技术有限公司 | Video data processing method and apparatus |
CN106992004A (zh) * | 2017-03-06 | 2017-07-28 | 华为技术有限公司 | Method and terminal for adjusting video |
US10819953B1 (en) * | 2018-10-26 | 2020-10-27 | Facebook Technologies, Llc | Systems and methods for processing mixed media streams |
CN113453046A (zh) * | 2020-03-24 | 2021-09-28 | 腾讯科技(深圳)有限公司 | Immersive media providing method, obtaining method, apparatus, device and storage medium |
CN111641871A (zh) * | 2020-05-29 | 2020-09-08 | 广州华多网络科技有限公司 | Live video display method, apparatus, terminal and readable storage medium |
CN113206993A (zh) * | 2021-04-13 | 2021-08-03 | 聚好看科技股份有限公司 | Method for adjusting display screen and display device |
CN113891117A (zh) * | 2021-09-29 | 2022-01-04 | 腾讯科技(深圳)有限公司 | Data processing method, apparatus, device and readable storage medium for immersive media |
CN113938651A (zh) * | 2021-10-12 | 2022-01-14 | 北京天玛智控科技股份有限公司 | Monitoring method, monitoring system, apparatus and storage medium for panoramic video interaction |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117156175A (zh) * | 2023-10-30 | 2023-12-01 | 山东大学 | QoE optimization method for panoramic video streaming based on viewport-prediction distance control |
CN117156175B (zh) * | 2023-10-30 | 2024-01-30 | 山东大学 | QoE optimization method for panoramic video streaming based on viewport-prediction distance control |
Also Published As
Publication number | Publication date |
---|---|
CN113891117A (zh) | 2022-01-04 |
EP4412227A1 (en) | 2024-08-07 |
US20240048676A1 (en) | 2024-02-08 |
CN116233493A (zh) | 2023-06-06 |
CN113891117B (zh) | 2023-02-14 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22874529; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| ENP | Entry into the national phase | Ref document number: 2022874529; Country of ref document: EP; Effective date: 20240429