WO2021244132A1 - Data processing method, apparatus and device for immersive media, and computer storage medium - Google Patents

Data processing method, apparatus and device for immersive media, and computer storage medium

Info

Publication number: WO2021244132A1
Authority: WO (WIPO (PCT))
Prior art keywords: independent codec, independent, codec area, area, track
Application number: PCT/CN2021/085907
Other languages: English (en), French (fr)
Inventors: 胡颖, 许晓中, 刘杉, 崔秉斗
Original Assignee: 腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Priority to EP21818517.1A (published as EP4124046A4)
Publication of WO2021244132A1
Priority to US17/731,162 (published as US20220272424A1)

Classifications

    • H04N 21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/816: Monomedia components thereof involving special video data, e.g. 3D video
    • H04N 21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N 19/167: Adaptive coding characterised by the position within a video image, e.g. region of interest [ROI]
    • H04N 19/61: Transform coding in combination with predictive coding
    • H04N 19/70: Coding characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N 21/21805: Source of audio or video content enabling multiple viewpoints, e.g. using a plurality of cameras
    • H04N 21/2335: Processing of audio elementary streams involving reformatting operations of audio signals, e.g. by converting from one coding standard to another
    • H04N 21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N 21/4398: Processing of audio elementary streams involving reformatting operations of audio signals
    • H04N 21/44012: Processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H04N 21/4728: End-user interface for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • H04N 21/84: Generation or processing of descriptive data, e.g. content descriptors
    • H04N 21/85406: Content authoring involving a specific file format, e.g. MP4 format
    • H04N 21/8545: Content authoring for generating interactive applications

Definitions

  • This application relates to the field of computer technology and virtual reality (VR) technology, and in particular to a data processing method, apparatus, device and computer-readable storage medium for immersive media.
  • In the related packaging technology, the content of immersive media is divided into multiple sub-picture frames, and these sub-picture frames are encapsulated into multiple track groups according to their relevance. The so-called relevance means that multiple sub-picture frames in the same track group belong to the same immersive media and have the same resolution. Such relevance limits the packaging flexibility of immersive media to a certain extent.
  • In the viewport-adaptive transmission scheme of immersive media, in order to ensure that the corresponding picture can be presented in time when the user's head moves, the content transmitted to the user contains both the high-definition segmented video of the user's current viewing angle and the low-definition segmented video around the user's current viewing angle. These two videos belong to the same video content but are different resolution versions of it. In the related technology, these two kinds of videos are packaged into different track groups, so it is difficult to indicate the consumption relationship between the two track groups, which brings inconvenience to presentation on the content playback device.
  • In view of this, the embodiments of the present application provide a data processing method, apparatus, device, and computer-readable storage medium for immersive media, which can encapsulate multiple segmented videos (with the same or different resolutions) that belong to the same immersive media but occupy different spatial regions into the same track group, and use an independent codec area description data box to indicate the consumption relationship among the tracks in the track group, thereby facilitating the presentation of immersive media.
  • the embodiment of the application provides a data processing method for immersive media, including:
  • the independent codec area description data box includes an independent codec area data box and a coordinate information data box, where i and N are positive integers, and i ≤ N;
  • In the embodiment of the present application, N segmented videos (with the same or different resolutions) belonging to the same immersive media are encapsulated into N tracks, and the N tracks are encapsulated into the same track group; by introducing the concept of an independent codec area corresponding to each block video, the independent codec area description data box of the i-th independent codec area is used to indicate the consumption relationship between the i-th track and the other tracks in the track group. In this way, when the independent codec area description data box of the i-th independent codec area is used to display the i-th independent codec area, a more convenient and accurate presentation effect can be obtained.
  • the embodiment of the application provides a data processing method for immersive media, including:
  • the i-th block video is encapsulated in the i-th track; the i-th block video corresponds to the i-th independent codec area, where i and N are positive integers, and i ≤ N; the N tracks belong to the same track group;
  • an independent codec area description data box of the i-th independent codec area is generated, and the independent codec area description data box includes an independent codec area data box and a coordinate information data box.
  • In the embodiment of the present application, N segmented videos (with the same or different resolutions) belonging to the same immersive media are packaged into N tracks, and the N tracks are packaged into the same track group; this can be applied to more transmission scenarios, such as the viewport-adaptive transmission scenario of immersive media, makes the transmission process of immersive media more reliable, and also avoids the unnecessary storage overhead incurred by the content production device when storing different versions of the video. In addition, the concept of an independent codec area corresponding to each block video is introduced, and the independent codec area description data box of the i-th independent codec area is generated according to the encapsulation process of the i-th block video; this independent codec area description data box indicates the consumption relationship between the i-th track and the other tracks in the track group. Then, when the independent codec area description data box is transmitted to the content consumption device side, the content consumption device side can display the i-th independent codec area according to the independent codec area description data box of the i-th independent codec area, so that a more convenient and accurate presentation effect can be obtained.
  • the embodiment of the application provides a data processing method for immersive media, including:
  • the immersive media includes N pieces of video.
  • the N block videos are encapsulated into N tracks, and the i-th block video is encapsulated in the i-th track; the N tracks belong to the same track group; the i-th block video corresponds to the i-th independent codec area; the packaged file includes at least the i-th track, and the i-th track contains the independent codec area description data box of the i-th independent codec area, where i and N are positive integers, and i ≤ N;
  • the independent codec area description data box includes the independent codec area data box and the coordinate information data box;
  • In the embodiment of the present application, N segmented videos (with the same or different resolutions) belonging to the same immersive media are encapsulated into N tracks, and the N tracks are encapsulated into the same track group; by introducing the concept of an independent codec area corresponding to each block video, the independent codec area description data box of the i-th independent codec area is used to indicate the consumption relationship between the i-th track and the other tracks in the track group. In this way, when the independent codec area description data box of the i-th independent codec area is used to display the i-th independent codec area, a more convenient and accurate presentation effect can be obtained.
  • the embodiment of the application provides an immersive media data processing device, including:
  • the obtaining unit is configured to obtain the independent codec area description data box of the i-th independent codec area of the immersive media, where the i-th independent codec area corresponds to the i-th block video; the independent codec area description data box includes an independent codec area data box and a coordinate information data box, where i and N are positive integers, and i ≤ N;
  • the processing unit is configured to display the i-th block video of the immersive media according to the independent codec area description data box.
  • the embodiment of the present application provides another immersive media data processing device, including:
  • the processing unit is configured to divide the immersive media into N block videos, and respectively encapsulate the N block videos into N tracks, where the i-th block video is encapsulated in the i-th track; the i-th block video corresponds to the i-th independent codec area, where i and N are positive integers and i ≤ N; the N tracks belong to the same track group; and generate the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video, where the independent codec area description data box includes an independent codec area data box and a coordinate information data box.
  • the embodiment of the present application provides another immersive media data processing device, including:
  • the acquiring unit is configured to acquire a packaged file of the immersive media.
  • the immersive media includes N block videos, the N block videos are encapsulated into N tracks, and the i-th block video is encapsulated in the i-th track; the N tracks belong to the same track group; the i-th block video corresponds to the i-th independent codec area; the packaged file includes at least the i-th track, and the i-th track contains the independent codec area description data box of the i-th independent codec area, where i and N are positive integers, and i ≤ N;
  • a processing unit, configured to unpack the packaged file to obtain the independent codec area description data box of the i-th independent codec area, where the independent codec area description data box includes an independent codec area data box and a coordinate information data box, and to display the i-th block video of the immersive media according to the independent codec area description data box.
  • the embodiment of the present application provides an immersive media data processing device, including:
  • one or more processors and one or more memories, where at least one piece of program code is stored in the one or more memories, and the at least one piece of program code is loaded and executed by the one or more processors to implement the data processing method for immersive media provided by the embodiments of the present application.
  • the embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium stores at least one piece of program code, and the at least one piece of program code is loaded and executed by a processor to implement the data processing method for immersive media provided by the embodiments of the present application.
  • In the embodiment of the present application, N segmented videos (with the same or different resolutions) belonging to the same immersive media are packaged into N tracks, and the N tracks are packaged into the same track group; this can be applied to more transmission scenarios, such as the viewport-adaptive transmission scenario of immersive media, makes the transmission process of immersive media more reliable, and also avoids the unnecessary storage overhead incurred by the content production device when storing different versions of the video.
  • FIG. 1A shows an architecture diagram of an immersive media system provided by an embodiment of the present application
  • FIG. 1B shows a flowchart of an immersive media transmission solution provided by an embodiment of the present application
  • FIG. 1C shows a basic block diagram of video coding provided by an embodiment of the present application
  • FIG. 1D shows a schematic diagram of 6DoF provided by an embodiment of the present application
  • FIG. 1E shows a schematic diagram of 3DoF provided by an embodiment of the present application
  • FIG. 1F shows a schematic diagram of 3DoF+ provided by an embodiment of the present application
  • FIG. 1G shows a schematic diagram of input image division according to an embodiment of the present application
  • FIG. 2 shows a flowchart of an immersive media data processing method provided by an embodiment of the present application
  • FIG. 3 shows a flowchart of another immersive media data processing method provided by an embodiment of the present application
  • FIG. 4A shows an application scenario diagram of immersive media transmission provided by an embodiment of the present application
  • FIG. 4B shows another application scenario diagram of immersive media transmission provided by an embodiment of the present application
  • FIG. 5 shows a flowchart of another immersive media data processing method provided by an embodiment of the present application.
  • FIG. 6 shows a schematic structural diagram of an immersive media data processing apparatus provided by an embodiment of the present application.
  • FIG. 7 shows a schematic structural diagram of another immersive media data processing device provided by an embodiment of the present application.
  • FIG. 8 shows a schematic structural diagram of a content production device provided by an embodiment of the present application.
  • FIG. 9 shows a schematic structural diagram of a content playback device provided by an embodiment of the present application.
  • the embodiments of the present application relate to data processing technology of immersive media.
  • the so-called immersive media refers to media files that can provide immersive media content, so that users immersed in the media content can obtain visual, auditory and other sensory experiences as in the real world.
  • the immersive media can be 3DoF (Degrees of Freedom) immersive media, 3DoF+ immersive media, or 6DoF immersive media.
  • Immersive media content includes video content represented in a three-dimensional (3-Dimension, 3D) space in various forms, for example, three-dimensional video content represented in a spherical form.
  • immersive media content can be virtual reality (VR) video content, panoramic video content, spherical video content or 360-degree video content; therefore, immersive media can also be called VR video, panoramic video, spherical video or 360-degree video.
  • the immersive media content also includes audio content synchronized with the video content represented in the three-dimensional space.
  • FIG. 1A shows an architecture diagram of an immersive media system provided by an embodiment of the present application.
  • the immersive media system includes a content production device and a content playback device.
  • the content production device may refer to the computer device used by the provider of the immersive media; the computer device may be a terminal (such as a personal computer (PC) or a smart mobile device (such as a smartphone)) or a server.
  • the content playback device can refer to the computer device used by the user (viewer) of the immersive media; the computer device can be a terminal (such as a PC), a smart mobile device (such as a smartphone), or a VR device (such as a VR headset or VR glasses), etc.
  • the data processing process of immersive media includes the data processing process on the content production device side and the data processing process on the content playback device side.
  • the data processing process at the content production equipment side mainly includes: (1) the acquisition and production process of the media content of the immersive media; (2) the process of encoding and file packaging of the immersive media.
  • the data processing process at the content playback device side mainly includes: (3) the process of file decapsulation and decoding of the immersive media; (4) the rendering process of the immersive media.
  • the transmission process of immersive media is involved between content production equipment and content playback equipment.
  • the transmission process can be performed based on various transmission protocols.
  • the transmission protocol here may include, but is not limited to: the Dynamic Adaptive Streaming over HTTP (DASH) protocol, the HTTP Live Streaming (HLS) protocol, the Smart Media Transport (SMT) protocol, the Transmission Control Protocol (TCP), etc.
  • FIG. 1B shows a flowchart of an immersive media transmission solution provided by an embodiment of the present application.
  • As shown in FIG. 1B, in order to relieve the transmission bandwidth load caused by the large data volume of immersive media itself, the processing of immersive media usually splits the original video spatially into multiple video blocks, which are encoded and encapsulated separately and then transmitted to the client for consumption.
  • FIG. 1C shows a basic block diagram of a video coding provided by an embodiment of the present application.
  • The following introduces in detail the various processes involved in the data processing of immersive media:
  • the capture device may refer to a hardware component provided in the content production device, for example, a microphone, camera, or sensor of the terminal. The capture device may also be a hardware device connected to the content production device, such as a camera connected to a server, used to provide the content production device with a capture service for immersive media content.
  • the capture device may include, but is not limited to: audio equipment, camera equipment, and sensor equipment. Among them, the audio device may include an audio sensor, a microphone, and so on.
  • the camera equipment may include a normal camera, a stereo camera, a light field camera, and so on.
  • Sensing equipment may include laser equipment, radar equipment, and so on.
  • there may be multiple capture devices. These capture devices are deployed at specific locations in the real space to simultaneously capture audio content and video content from different angles in the space, and the captured audio content and video content remain synchronized in both time and space. Due to the different acquisition methods, the compression encoding methods corresponding to the media content of different immersive media may also differ.
  • the captured audio content itself is the content suitable for the audio encoding of the immersive media.
  • the captured video content undergoes a series of production processes before it becomes content suitable for video encoding for immersive media.
  • the production process includes:
  • stitching refers to stitching the video content shot from various angles into a complete video that can reflect a 360-degree visual panorama of the real space; that is, the stitched video is a panoramic video (or spherical video) represented in three-dimensional space.
  • Projection refers to the process of mapping a three-dimensional video formed by splicing onto a two-dimensional (2-Dimension, 2D) image.
  • the 2D image formed by projection is called a projected image; the projection methods may include, but are not limited to: longitude-latitude map projection and regular hexahedron (cube map) projection.
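  • To make the longitude-latitude map projection concrete, the sketch below maps a direction on the stitched sphere to pixel coordinates in the projected image. This is a minimal illustration under assumed normalization conventions (the function name and value ranges are ours), not the mapping mandated by any particular standard.

```python
import math

# Minimal sketch of longitude-latitude (equirectangular) projection, assuming
# lon in [-pi, pi] and lat in [-pi/2, pi/2]; conventions are illustrative only.
def equirect_project(lon: float, lat: float, width: int, height: int):
    """Map a spherical direction to (x, y) in the projected 2D image."""
    x = (lon + math.pi) / (2 * math.pi) * width   # longitude spans the image width
    y = (math.pi / 2 - lat) / math.pi * height    # latitude spans the image height
    return x, y

print(equirect_project(0.0, 0.0, 4096, 2048))  # sphere front center -> (2048.0, 1024.0)
```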
  • If the capture device can only capture panoramic video, then after such video is processed by the content production device and transmitted to the content playback device for corresponding data processing, the user on the content playback device side can only watch 360-degree video information by performing some specific actions (such as head rotation), while performing non-specific actions (such as moving the head) does not produce corresponding video changes, resulting in a poor VR experience. Therefore, additional depth information matching the panoramic video needs to be provided.
  • common production techniques include the Six Degrees of Freedom (6DoF) production technique.
  • FIG. 1D shows a schematic diagram of 6DoF provided by an exemplary embodiment of the present application. 6DoF is divided into window 6DoF, omnidirectional 6DoF, and 6DoF, where window 6DoF means that the user's rotational movement about the X-axis and Y-axis is limited and the translation along the Z-axis is limited; for example, the user cannot see the scene outside the window frame, and the user cannot pass through the window.
  • Omnidirectional 6DoF means that the user's rotational movement on the X-axis, Y-axis, and Z-axis is restricted. For example, the user cannot freely pass through three-dimensional 360-degree VR content in a restricted movement area.
  • 6DoF means that the user can move freely along the X-axis, Y-axis, and Z-axis. For example, the user can move freely in three-dimensional 360-degree VR content. Similar to 6DoF, there are 3DoF and 3DoF+ production technologies.
  • FIG. 1E shows a schematic diagram of 3DoF provided by an exemplary embodiment of the present application. As shown in FIG. 1E, 3DoF means that the user is fixed at the center point of a three-dimensional space, and the user's head rotates about the X-axis, Y-axis and Z-axis to watch the picture provided by the media content.
  • FIG. 1F shows a schematic diagram of 3DoF+ provided by an exemplary embodiment of the present application. As shown in FIG. 1F, 3DoF+ means that when the virtual scene provided by the immersive media has certain depth information, the user's head can move within a limited space on the basis of 3DoF to watch the picture provided by the media content.
  • the projected image can be directly encoded, or the projected image can be encoded after region encapsulation.
  • Modern mainstream video coding technologies, taking the international video coding standards HEVC (High Efficiency Video Coding) and VVC (Versatile Video Coding) and the Chinese national video coding standard AVS (Audio Video Coding Standard) as examples, adopt a hybrid coding framework and perform the following series of operations and processing on the input original video signal:
  • Block partition structure: the input image is divided into several non-overlapping processing units according to the size of the processing unit, and similar compression operations are performed on each processing unit. This processing unit is called a Coding Tree Unit (CTU) or Largest Coding Unit (LCU). A CTU can be further divided more finely to obtain one or more basic coding units, called Coding Units (CUs). Each CU is the most basic element in the coding process. A sketch of the CTU grid described here is given below.
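  • As a concrete illustration of the block partition structure, the following sketch tiles a frame into non-overlapping CTUs. The fixed 64×64 CTU size and the handling of edge blocks are assumptions for illustration; real encoders choose the CTU/LCU size per standard and further split CTUs into CUs.

```python
# Illustrative sketch: tile a frame into non-overlapping CTUs (assumed 64x64).
def partition_into_ctus(width: int, height: int, ctu_size: int = 64):
    """Yield (x, y, w, h) for each CTU covering the frame; edge CTUs may be smaller."""
    for y in range(0, height, ctu_size):
        for x in range(0, width, ctu_size):
            yield (x, y, min(ctu_size, width - x), min(ctu_size, height - y))

print(len(list(partition_into_ctus(1920, 1080))))  # 30 columns x 17 rows = 510 CTUs
```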
  • FIG. 1G shows a schematic diagram of input image division according to an embodiment of the present application. The following describes the various encoding methods that may be used for each CU.
  • Predictive coding: this includes intra-frame prediction and inter-frame prediction. After the original video signal is predicted using the selected reconstructed video signal, a residual video signal is obtained. The content production device needs to select the most suitable one among the many possible predictive coding modes for the current CU and inform the content playback device.
  • Intra-frame prediction: the predicted signal comes from an area that has already been coded and reconstructed in the same image.
  • Inter-frame prediction: the predicted signal comes from another image that has already been encoded and is different from the current image (referred to as a reference image).
  • Transform & quantization: the residual video signal undergoes a transform operation such as the Discrete Fourier Transform (DFT) or the Discrete Cosine Transform (DCT) to convert the signal into the transform domain, where it is referred to as the transform coefficients.
  • the signal in the transform domain is further subjected to a lossy quantization operation, which loses certain information, so that the quantized signal is more amenable to compact representation.
  • the content production device also needs to select one of the transform methods for the current coding CU and notify the content playback device.
  • the fineness of quantization is usually determined by the quantization parameter (QP).
  • a larger QP value means that coefficients over a larger value range will be quantized to the same output, which usually brings greater distortion and a lower bit rate; conversely, a smaller QP value means that coefficients over a smaller value range will be quantized to the same output, which usually brings less distortion and corresponds to a higher bit rate.
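  • This trade-off can be seen with a toy scalar quantizer (not any standard's exact formula): a larger quantization step, as driven by a larger QP, maps a wider range of coefficients to the same level, reducing rate at the cost of distortion.

```python
# Toy scalar quantizer illustrating the QP trade-off; not a standard's formula.
def quantize(coeff: float, qstep: float) -> int:
    return round(coeff / qstep)            # many coefficients collapse to one level

def dequantize(level: int, qstep: float) -> float:
    return level * qstep

for qstep in (2.0, 16.0):                  # small step vs. large step (larger QP)
    recon = dequantize(quantize(37.0, qstep), qstep)
    print(qstep, recon, abs(37.0 - recon)) # distortion grows with the step size
```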
  • Entropy coding or statistical coding: the quantized transform-domain signal is statistically compressed according to the frequency of occurrence of each value, and finally a binary (0 or 1) compressed bitstream is output. At the same time, the encoding produces other information, such as the selected mode and motion vectors, which also requires entropy coding to reduce the bit rate.
  • Statistical coding is a lossless coding method that can effectively reduce the bit rate required to express the same signal. Common statistical coding methods include Variable Length Coding (VLC) and Context-Adaptive Binary Arithmetic Coding (CABAC).
  • Loop filtering: the coded image undergoes inverse quantization, inverse transform, and prediction compensation operations (the inverse operations of 2 to 4 above) to obtain a reconstructed decoded image. Compared with the original image, the reconstructed image differs from it in some information due to the influence of quantization, resulting in distortion. Performing filtering operations on the reconstructed image, such as deblocking, Sample Adaptive Offset (SAO) filtering, or Adaptive Loop Filter (ALF) filtering, can effectively reduce the degree of distortion produced by quantization. Since these filtered reconstructed images are used as references for subsequently coded images to predict future signals, the above filtering operations are also called loop filtering, i.e., filtering operations within the coding loop.
  • For immersive media produced with the 6DoF production technology, a specific encoding method (such as point cloud encoding) may be used in the encoding process.
  • the audio stream and video stream are encapsulated in a file container according to the file format of the immersive media (such as ISO Base Media File Format (ISOBMFF)) to form the media file resource of the immersive media.
  • the media file resource may be a media file or media fragments forming a media file of the immersive media. According to the file format requirements of the immersive media, media presentation description (MPD) information is used to record the metadata of the media file resources of the immersive media, where metadata is a general term for information related to the presentation of the immersive media.
  • the metadata may include description information of the media content, description information of the window, and signaling information related to the presentation of the media content, and so on.
  • the content production device stores the media presentation description information and media file resources formed after the data processing process.
  • the content playback device can adaptively and dynamically obtain the media file resources of the immersive media and the corresponding media presentation description information from the content production device, through the recommendation of the content production device or according to the needs of the user of the content playback device. For example, the content playback device may determine the user's orientation and position based on the user's head/eye/body tracking information, and then dynamically request the corresponding media file resources from the content production device based on the determined orientation and position.
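  • An illustrative sketch of such viewport-driven requesting follows: the client asks for high-definition tracks for the blocks inside the current viewing angle and low-definition tracks for the surrounding blocks. All names here are hypothetical; this is not the patent's normative procedure.

```python
# Hedged sketch of viewport-adaptive track selection; names are hypothetical.
def select_tracks(block_ids, viewport_blocks, hd_tracks, ld_tracks):
    """Pick the HD track for visible blocks, the LD track otherwise."""
    return [hd_tracks[b] if b in viewport_blocks else ld_tracks[b]
            for b in block_ids]

hd = {b: f"hd_track_{b}" for b in range(6)}     # hypothetical track identifiers
ld = {b: f"ld_track_{b}" for b in range(6)}
print(select_tracks(range(6), {1, 2}, hd, ld))  # HD for blocks 1-2, LD elsewhere
```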
  • Media file resources and media presentation description information are transmitted from the content production device to the content playback device through a transmission mechanism (such as DASH, SMT).
  • the file decapsulation process on the content playback device side is opposite to the file encapsulation process on the content production device side.
  • the content playback device decapsulates the media file resources according to the file format requirements of the immersive media to obtain the audio stream and the video stream.
  • the decoding process on the content playback device side is opposite to the encoding process on the content production device side.
  • the content playback device performs audio decoding on the audio code stream to restore the audio content.
  • the decoding process of the video bitstream by the content playback device includes the following: (1) decoding the video bitstream to obtain a planar projected image; (2) reconstructing the projected image according to the media presentation description information to convert it into a 3D image.
  • the reconstruction process here refers to the process of reprojecting the two-dimensional projection image into the 3D space.
  • After the content playback device obtains the compressed bitstream, it first performs entropy decoding to obtain various mode information and quantized transform coefficients. Each coefficient undergoes inverse quantization and inverse transform to obtain the residual signal.
  • On the other hand, according to the known coding mode information, the prediction signal corresponding to the CU can be obtained, and the reconstructed signal is obtained after the residual signal and the prediction signal are added. Finally, the reconstructed values of the decoded image undergo a loop filtering operation to produce the final output signal.
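  • A toy, standard-agnostic sketch of this reconstruction path: entropy-decoded levels are inverse-quantized into a residual (the inverse transform is omitted for brevity) and added to the prediction signal; loop filtering would then be applied to the reconstructed image.

```python
# Toy reconstruction path sketch; the inverse transform step is omitted.
def reconstruct_cu(levels, prediction, qstep):
    residual = [lvl * qstep for lvl in levels]            # inverse quantization
    return [p + r for p, r in zip(prediction, residual)]  # prediction + residual

print(reconstruct_cu([2, -1, 0], [120, 130, 140], qstep=16.0))  # [152.0, 114.0, 140.0]
```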
  • the content playback device renders the audio content obtained by audio decoding and the 3D image obtained by video decoding according to the metadata related to the rendering and the window in the media presentation description information, and the playback output of the 3D image is realized when the rendering is completed.
  • the content playback device mainly renders the 3D image based on the current viewpoint, parallax, depth information, etc., or mainly renders the 3D image within the window based on the current viewpoint.
  • the viewpoint refers to the user's viewing position
  • parallax refers to the difference in the lines of sight of the user's two eyes, or the difference in line of sight caused by motion
  • the window refers to the viewing area.
  • the immersive media system supports a data box (Box), which refers to a data block or object that includes metadata, that is, the data box contains the metadata of the corresponding media content.
  • the immersive media may include multiple data boxes, for example, a rotation data box, an overlay information data box, a media file format data box, and so on.
  • the encoded data stream needs to be encapsulated and transmitted to the user.
  • Related immersive media packaging technology involves the concept of sub-picture frames: multiple sub-picture frames that belong to the same immersive media and have the same resolution are encapsulated into the same track group, while sub-picture frames that belong to the same immersive media but have different resolutions are encapsulated into different track groups. The encapsulation information is recorded using a two-dimensional spatial relationship description data box (SpatialRelationship2DDescriptionBox), which is extended from the existing track group data box (TrackGroupTypeBox).
  • a track refers to a series of samples with time attributes packaged in accordance with the ISO Base Media File Format (ISOBMFF), such as a video track, which is obtained by encapsulating the bitstream generated by the video encoder encoding each frame.
  • the two-dimensional spatial relationship description data box further includes a two-dimensional spatial relationship source data box (SpatialRelationship2DSourceBox), used to indicate the width and height of the original video frame and the source ID of the content, and a sub-picture region data box (SubPictureRegionBox), used to indicate the position of the sub-picture frame within the overall video frame.
  • total_width and total_height indicate the width and height of the original video frame
  • source_id indicates the source ID of the complete video to which the sub-image frame belongs
  • object_x and object_y indicate the coordinates of the top-left vertex of the sub-picture frame; object_width and object_height indicate the width and height of the sub-picture frame.
  • track_not_alone_flag indicates whether the sub-image frame must be presented simultaneously with other sub-image frames in the track group.
  • track_not_mergeable_flag indicates whether the code stream contained in the track corresponding to the sub-image frame can be directly merged with code streams contained in other sub-image frames in the track group.
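  • The prior-art fields above can be summarized with the following plain-data sketch. The field names follow the text; the types and grouping are our assumptions, not the normative ISOBMFF box syntax.

```python
from dataclasses import dataclass

# Hedged model of the prior-art sub-picture boxes; not normative box syntax.
@dataclass
class SpatialRelationship2DSourceBox:
    total_width: int            # width of the original (complete) video frame
    total_height: int           # height of the original (complete) video frame
    source_id: int              # source ID of the complete video

@dataclass
class SubPictureRegionBox:
    object_x: int               # x of the sub-picture frame's top-left vertex
    object_y: int               # y of the sub-picture frame's top-left vertex
    object_width: int           # width of the sub-picture frame
    object_height: int          # height of the sub-picture frame
    track_not_alone_flag: bool      # must be presented with other sub-pictures?
    track_not_mergeable_flag: bool  # bitstream cannot be directly merged?
```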
  • It can be seen that the prior art adopts the concept of sub-picture frames, and the packaging process of sub-picture frames limits the packaging flexibility of immersive media to a certain extent and cannot be applied to various immersive media scenarios, such as the viewport-adaptive transmission scenario.
  • The embodiment of the present application extends the track group data box to obtain an independent codec region description data box (IndependentlyCodedRegionDescriptionBox), so that all tracks that belong to the same immersive media (e.g., the same program or the same content) and have a defined spatial association relationship can be defined in the same track group; that is, the tracks corresponding to different spatial blocks of the same video content and to videos of different resolutions all belong to the same track group. Since videos of different resolution versions may be divided in space separately, block videos of different resolutions use different coordinate systems, which are represented by a coordinate information data box (CoordinateInfoBox); the coordinate information of each block video is represented by an IndependentlyCodedRegionBox.
  • the semantics of the independent codec area description data box syntax can be seen in Table 2 below:
  • An independent codec area corresponds to a coordinate system identification field coordinate_id.
  • An independent codec area corresponds to a block video; thus N independent codec areas correspond to N block videos and to N coordinate system identification fields.
  • the coordinate system identification field of the i-th independent codec area indicates the coordinate system to which the i-th segmented video belongs; segmented videos of the same resolution belong to the same coordinate system, where i and N are positive integers, and i ≤ N.
  • An independent codec area corresponds to a complete-video height field total_height and a complete-video width field total_width; N independent codec areas correspond to N complete-video height fields and N complete-video width fields.
  • the complete-video height field of the i-th independent codec area indicates the height of the complete video in the coordinate system to which the i-th segmented video belongs; the complete-video width field of the i-th independent codec area indicates the width of the complete video in the coordinate system to which the i-th segmented video belongs. It is understandable that the size of the complete video is jointly indicated by the coordinate system identification field, the height of the complete video, and the width of the complete video.
  • An independent codec area corresponds to an abscissa field region_vertex_x and an ordinate field region_vertex_y of the vertex of the independent codec area in its coordinate system; the vertices of N independent codec areas thus correspond to N abscissa fields and N ordinate fields in their corresponding coordinate systems. The abscissa and ordinate fields of the vertex of the i-th independent codec area in its coordinate system indicate the abscissa and ordinate of the vertex of the i-th independent codec area.
  • the independent codec area is a rectangular area, and the vertex of the independent codec area may refer to the upper-left, lower-left, upper-right, or lower-right vertex of the rectangular area.
  • An independent codec area also corresponds to a height field and a width field of the independent codec area; N independent codec areas correspond to N such height fields and N such width fields. The height field of the i-th independent codec area indicates the height of the i-th independent codec area, and the width field of the i-th independent codec area indicates the width of the i-th independent codec area. The position of the i-th independent codec area in the coordinate system to which it belongs is jointly determined by the abscissa and ordinate fields of the vertex of the independent codec area in its coordinate system and by the height and width fields of the independent codec area.
  • An independent codec area corresponds to a non-independent presentation flag field track_not_alone_flag; thus N independent codec areas correspond to N non-independent presentation flag fields. When the non-independent presentation flag field of the i-th independent codec area is a valid value, it indicates that the i-th independent codec area is presented simultaneously with the independent codec areas in other tracks of the track group to which the i-th independent codec area belongs; when the non-independent presentation flag field of the i-th independent codec area is an invalid value, it indicates that the i-th independent codec area need not be presented simultaneously with the independent codec areas in other tracks of the track group to which it belongs.
  • Similarly, an independent codec area corresponds to a mergeable flag field track_not_mergeable_flag, and N independent codec areas correspond to N mergeable flag fields. When the mergeable flag field of the i-th independent codec area is an invalid value, it indicates that the bitstream contained in the track to which the i-th independent codec area belongs can be directly merged with the bitstreams contained in other tracks of the track group to which the i-th independent codec area belongs; when the mergeable flag field of the i-th independent codec area is a valid value, it indicates that the bitstream contained in the track to which the i-th independent codec area belongs cannot be merged with the bitstreams contained in other tracks of the track group.
  • An independent codec area likewise corresponds to a track priority information flag field, and N independent codec areas correspond to N track priority information flag fields. When the track priority information flag field of the i-th independent codec area is an invalid value, it indicates that the priorities of the independent codec areas in the track group to which the i-th independent codec area belongs are all the same; when the track priority information flag field of the i-th independent codec area is a valid value, the priority of the i-th independent codec area is indicated by the track priority field track_priority. The smaller the value of the track priority field, the higher the priority of the i-th independent codec area; for example, when the value of the track priority field of the i-th independent codec area is smaller than that of the j-th independent codec area, the priority of the i-th independent codec area is higher than the priority of the j-th independent codec area, where j is a positive integer, j ≤ N and j ≠ i.
  • An independent codec area also corresponds to a track overlap information flag field, and N independent codec areas correspond to N track overlap information flag fields. When the track overlap information flag field of the i-th independent codec area is an invalid value, it indicates that the i-th independent codec area does not overlap with the independent codec areas in other tracks of the track group to which it belongs when displayed; when the track overlap information flag field of the i-th independent codec area is a valid value, the display mode of the i-th independent codec area is indicated by the background flag field background_flag. When the background flag field is an invalid value, it indicates that the i-th independent codec area is displayed as the foreground picture of the independent codec areas in other tracks of the track group to which the i-th independent codec area belongs; when the background flag field is a valid value, the i-th independent codec area is displayed as the background picture of the independent codec areas in other tracks of the track group. In the latter case, the transparency field opacity of the i-th independent codec area indicates the transparency with which the i-th independent codec area is displayed as the background picture of the independent codec areas in other tracks of the track group. When the value of the transparency field is 0, the i-th independent codec area is displayed as a fully transparent background picture; when the value of the transparency field is greater than 0, the i-th independent codec area is displayed as a non-transparent background picture.
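  • Since Table 2 is not reproduced here, the following sketch models the field semantics just described as plain data classes. Field names follow the text where given; region_width and region_height are assumed names for the area's own size fields, and the whole layout is an illustration, not the patent's normative box syntax.

```python
from dataclasses import dataclass

# Hedged model of the independent codec area description data box semantics.
@dataclass
class CoordinateInfoBox:
    coordinate_id: int      # coordinate system of this block's resolution version
    total_width: int        # width of the complete video in that coordinate system
    total_height: int       # height of the complete video in that coordinate system

@dataclass
class IndependentlyCodedRegionBox:
    region_vertex_x: int            # abscissa of the area's vertex
    region_vertex_y: int            # ordinate of the area's vertex
    region_width: int               # assumed name: width of the area
    region_height: int              # assumed name: height of the area
    track_not_alone_flag: bool      # valid => presented with other areas simultaneously
    track_not_mergeable_flag: bool  # valid => bitstream cannot be merged
    track_priority_info_flag: bool = False  # valid => track_priority is meaningful
    track_priority: int = 0         # smaller value => higher priority
    track_overlap_info_flag: bool = False   # valid => background_flag is meaningful
    background_flag: bool = False   # valid => displayed as the background picture
    opacity: int = 0                # 0 => fully transparent background picture

@dataclass
class IndependentlyCodedRegionDescriptionBox:
    coordinate_info: CoordinateInfoBox
    region: IndependentlyCodedRegionBox
```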
  • the description information corresponding to the independent codec area description data box is stored in the independent codec area description signaling file provided in this embodiment of the present application, and the independent codec area description signaling file is encapsulated at the adaptation set level of the media presentation description file of the immersive media.
  • the independent codec area description signaling file shall contain the elements and attributes defined in Table 3 below.
  • the independent codec region description signaling file in the embodiment of the present application includes the following elements and attributes: IndependentlyCodedRegionGroupId, IndependentlyCodedRegionGroupId@coordinateId, IndependentlyCodedRegionGroupId@trackPriority, IndependentlyCodedRegionGroupId@backgroundFlag, together with the related descriptions of these elements and attributes.
  • In the embodiment of the present application, the content production device stores the tracks of multiple block videos of the same video in the same track group, which can support the current mainstream viewport-adaptive transmission technology for immersive media and makes the video transmission process more reliable. At the same time, the unnecessary storage overhead caused by the content production device storing different versions of the video is avoided. By generating the corresponding independent codec area description data box, the content playback device can present the immersive media more conveniently.
  • FIG. 2 shows a flowchart of an immersive media data processing method provided by an embodiment of the present application; the method can be executed by a content production device or a content playback device in an immersive media system, and includes the following steps S201-S202:
  • S201: Obtain the independent codec area description data box of the i-th independent codec area of the immersive media, where the i-th independent codec area corresponds to the i-th block video; the independent codec area description data box includes an independent codec area data box and a coordinate information data box, where i and N are positive integers, and i ≤ N.
  • the immersive media includes N block videos, the N block videos are encapsulated into N tracks, and the i-th block video is encapsulated in the i-th track; the N tracks belong to the same track group.
  • the syntax of the independent codec area description data box of the immersive media can be referred to Table 2 above.
  • the coordinate information data box is used to indicate the coordinate systems used by segmented videos of different resolutions; that is, the values of the fields in the coordinate information data box are configured according to the coordinate systems used by the segmented videos of different resolutions after the immersive media is divided. For example, block video 1 to block video 6 with a resolution of 4K (4096 × 2160 pixels) use coordinate system 1, and block video 7 to block video 12 with a resolution of 2K use coordinate system 2.
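  • Using the CoordinateInfoBox sketch above, the two coordinate systems of this example could be instantiated as follows (the 2K frame size is an illustrative assumption):

```python
# Illustrative values for the example: 4K blocks 1-6 vs. 2K blocks 7-12.
coord_4k = CoordinateInfoBox(coordinate_id=1, total_width=4096, total_height=2160)
coord_2k = CoordinateInfoBox(coordinate_id=2, total_width=2048, total_height=1080)
```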
  • the independent codec area data box is used to indicate the coordinate information of each block video (such as the size of the block video and its position in the coordinate system) and the display mode of each block video in the immersive media. The display mode may include, but is not limited to: whether the block video is displayed independently, whether it overlaps with other block videos when displayed, and its transparency when displayed, etc.
  • In one implementation, an independent codec area description signaling file can also be generated according to the encapsulation process of the N block videos of the immersive media; the independent codec area description signaling file includes the description information of the independent codec area description data box.
  • the syntax of the independent codec area description signaling file can be found in Table 3.
  • In one implementation, the media presentation description file of the immersive media can be obtained before the packaged file of the immersive media is obtained, and the independent codec area description signaling file is then obtained from the adaptation set level of the media presentation description file.
  • the content playback device requests the packaging file corresponding to the immersive media from the content production device according to user requirements (such as the user's current perspective) and the independent codec area description signaling file.
  • In the embodiment of the present application, N segmented videos (with the same or different resolutions) belonging to the same immersive media are encapsulated into N tracks, and the N tracks are encapsulated into the same track group; by introducing the concept of an independent codec area corresponding to each block video, the independent codec area description data box of the i-th independent codec area is used to indicate the consumption relationship between the i-th track and the other tracks in the track group. In this way, when the independent codec area description data box of the i-th independent codec area is used to display the i-th independent codec area, a more convenient and accurate presentation effect can be obtained.
  • FIG. 3 shows a flowchart of another immersive media data processing method provided by an embodiment of the present application; this method is executed by a content production device in an immersive media system, and the method includes the following steps S301-S303:
  • the basis for division includes at least one of the following: space, viewport, resolution, etc.; for example, the immersive media is first divided into four areas according to the user's viewport: front, back, left, and right, and then the areas corresponding to different viewports are further divided according to division rules (such as a preset size of the independent codec areas, or a preset number of independent codec areas) to obtain N block videos.
  • the resolution of the block video in each track in the track group is the same.
  • the resolution of the block video in the i-th track in the track group is different from the resolution of the block video in the j-th track, where j is a positive integer, j ≤ N, and j ≠ i; that is, the tracks corresponding to different spatial blocks and different-resolution videos of the same video content are stored in the same track group.
  • the independent codec area description data box includes an independent codec area data box and a coordinate information data box.
  • the process of generating the independent codec area description data box of the i-th independent codec area in step S303 may include the following (1)-(8):
  • the coordinate information data box includes a coordinate system identification field coordinate_id.
  • an independent codec area corresponds to one coordinate system identification field; the coordinate system to which the i-th independent codec area belongs is determined according to the resolution of the i-th block video, and the value of the coordinate system identification field of the i-th independent codec area is configured according to the identifier of that coordinate system.
  • if the resolution of the i-th block video of the immersive media is the same as the resolution of the j-th block video, the i-th independent codec area and the j-th independent codec area belong to the same coordinate system, where j is a positive integer, j ≤ N, and j ≠ i.
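  • As a sketch of this rule, the following hypothetical helper groups block videos by resolution and hands out one coordinate system identifier per resolution; the track numbering and the 4K/2K pixel dimensions follow the earlier example but are otherwise assumptions for illustration.

```python
def assign_coordinate_ids(resolutions):
    """Group block videos by resolution: same resolution -> same coordinate_id.

    `resolutions` maps a track index to the (width, height) of its block video;
    the helper name and the numbering scheme (1, 2, ...) are assumptions.
    """
    ids_by_resolution, coordinate_id = {}, {}
    next_id = 1
    for track, res in sorted(resolutions.items()):
        if res not in ids_by_resolution:
            ids_by_resolution[res] = next_id
            next_id += 1
        coordinate_id[track] = ids_by_resolution[res]
    return coordinate_id

# Tracks 1-6 carry 4K block videos and tracks 7-12 carry 2K block videos,
# so two coordinate systems result (cf. the example above).
res = {**{t: (4096, 2160) for t in range(1, 7)},
       **{t: (2048, 1080) for t in range(7, 13)}}
print(assign_coordinate_ids(res))  # tracks 1-6 -> 1, tracks 7-12 -> 2
```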
  • the coordinate information data box includes the height field total_height of the complete video and the width field total_width of the complete video.
  • An independent codec area corresponds to a height field of a complete video and a width field of a complete video.
  • the complete video is composed of the block videos corresponding to all independent codec areas in the coordinate system to which the i-th independent codec area belongs. Obtain the height and width of this complete video, configure the obtained height as the value of the height field of the complete video, and configure the obtained width as the value of the width field of the complete video.
  • the independent codec area data box includes the abscissa field region_vertex_x and the ordinate field region_vertex_y of the independent codec area, and an independent codec area corresponds to one abscissa field and one ordinate field.
  • the independent codec area is a rectangular area, and the vertices of the independent codec area may refer to the upper left vertex, the lower left vertex, the upper right vertex, or the lower right vertex of the rectangular area.
  • the independent codec area data box includes the height field region_height of the independent codec area and the width field region_width of the independent codec area; an independent codec area corresponds to one height field and one width field. Obtain the height and width of the i-th independent codec area, configure the obtained height as the value of the height field of the i-th independent codec area, and configure the obtained width as the value of the width field of the i-th independent codec area.
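  • A minimal sketch of how these geometry fields might be configured for one coordinate system; `configure_region_and_totals` is a hypothetical helper, and treating the bounding box of all regions as the complete video is an assumption that holds when the regions tile the picture without gaps.

```python
def configure_region_and_totals(regions):
    """Configure vertex/size fields for each region and derive the
    complete-video size fields for one coordinate system.

    `regions` is a list of (vertex_x, vertex_y, width, height) tuples.
    """
    boxes = [{"region_vertex_x": x, "region_vertex_y": y,
              "region_width": w, "region_height": h}
             for (x, y, w, h) in regions]
    # bounding box of all regions = size of the complete video (assumption)
    total_width = max(x + w for (x, y, w, h) in regions)
    total_height = max(y + h for (x, y, w, h) in regions)
    for b in boxes:
        b["total_width"], b["total_height"] = total_width, total_height
    return boxes
```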
  • the independent codec area data box includes the non-independent presentation flag field track_not_alone_flag of the independent codec area, and an independent codec area corresponds to one non-independent presentation flag field. If the i-th independent codec area must be presented simultaneously with independent codec areas in other tracks in the track group to which it belongs, the non-independent presentation flag field of the i-th independent codec area is configured with a valid value; if the i-th independent codec area need not be presented simultaneously with independent codec areas in other tracks in the track group to which it belongs, the non-independent presentation flag field is configured with an invalid value.
  • the independent codec area data box includes the mergeable flag field track_not_mergeable_flag of the independent codec area, and an independent codec area corresponds to one mergeable flag field. If the code stream contained in the track to which the i-th independent codec area belongs can be directly merged with the code streams contained in other tracks in the track group to which it belongs (that is, the coding method is the same between tracks), the mergeable flag field of the i-th independent codec area is configured with an invalid value; if the code stream contained in the track to which the i-th independent codec area belongs cannot be directly merged with the code streams contained in other tracks in the track group (that is, the coding methods differ between tracks), the mergeable flag field of the i-th independent codec area is configured with a valid value.
  • the independent codec area data box includes the track priority information flag field track_priority_info_flag of the independent codec area, and an independent codec area corresponds to one track priority information flag field. If the priority of each independent codec area in the track group to which the i-th independent codec area belongs is the same, the track priority information flag field of the i-th independent codec area is configured with an invalid value; if the priorities are not all the same, the track priority information flag field of the i-th independent codec area is configured with a valid value.
  • the independent codec area data box further includes the track priority field track_priority of the i-th independent codec area.
  • the priority of the i-th independent codec area is determined by at least one of the following: the resolution of the i-th independent codec area, the presentation priority of the track to which the i-th independent codec area belongs, and the transmission priority of the track to which the i-th independent codec area belongs.
  • the priority of the i-th independent codec area is configured as the track priority field of the i-th independent codec area.
  • the higher the resolution of the i-th independent codec area, the smaller the value configured for the track priority field of the i-th independent codec area; likewise, the higher the presentation priority of the track to which the i-th independent codec area belongs, the smaller the value configured for its track priority field; and the higher the transmission priority of the track to which the i-th independent codec area belongs, the smaller the value configured for its track priority field.
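  • The mapping "smaller value = higher priority" can be sketched as follows; the helper name `track_priority_values` and the importance key are hypothetical, standing in for whichever of resolution, presentation priority, or transmission priority the content producer chooses.

```python
def track_priority_values(importance):
    """Assign smaller track_priority values to more important regions.

    `importance` maps a track index to a numeric key in which *larger*
    means more important; equally important tracks share the same value.
    """
    levels = sorted(set(importance.values()), reverse=True)  # most important first
    return {track: levels.index(val) for track, val in importance.items()}

# e.g. using the region's horizontal resolution as the importance key
importance = {1: 4096, 2: 4096, 7: 2048, 8: 2048}
print(track_priority_values(importance))  # {1: 0, 2: 0, 7: 1, 8: 1}
```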
  • the independent codec area data box includes the track overlap information flag field track_overlap_info_flag of the independent codec area, and an independent codec area corresponds to one track overlap information flag field. If the i-th independent codec area is required not to be displayed overlapping with independent codec areas in other tracks in the track group to which it belongs, the track overlap information flag field of the i-th independent codec area is configured with an invalid value. If the i-th independent codec area is required to be displayed overlapping with the j-th independent codec area in the track group to which it belongs, the track overlap information flag field of the i-th independent codec area is configured with a valid value, where j is a positive integer and j ≠ i.
  • the independent codec area data box also includes the background flag field background_flag of the i-th independent codec area. If the i-th independent codec area is required to be displayed as the foreground picture of the j-th independent codec area in the track group to which it belongs, the background flag field of the i-th independent codec area is configured with an invalid value;
  • if the i-th independent codec area is required to be displayed as the background picture of the j-th independent codec area, the background flag field of the i-th independent codec area is configured with a valid value.
  • when the background flag field is configured with a valid value, the independent codec area data box also includes the transparency field opacity of the i-th independent codec area.
  • if the i-th independent codec area is required to be displayed as a transparent background picture, the value of the transparency field of the i-th independent codec area is configured to be 0; if the i-th independent codec area is required to be displayed as a non-transparent background picture, the value of the transparency field is configured according to the transparency of the i-th independent codec area, where the value of the transparency field is greater than or equal to 0. It should be noted that two different independent codec areas presented as foreground pictures cannot overlap each other.
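  • The flag rules above can be summarized in a small producer-side sketch; the helper name and its keyword arguments are hypothetical producer-side decisions, and "valid value"/"invalid value" are modeled as 1/0 for illustration.

```python
def configure_display_flags(*, must_copresent, mergeable, uniform_priority,
                            may_overlap, is_background, transparent=False):
    """Return the display-related flag fields of one independent codec area."""
    fields = {
        "track_not_alone_flag": int(must_copresent),
        "track_not_mergeable_flag": int(not mergeable),
        "track_priority_info_flag": int(not uniform_priority),
        "track_overlap_info_flag": int(may_overlap),
        "background_flag": int(is_background),
    }
    if is_background:
        # 0 = fully transparent background; values > 0 encode the chosen opacity
        fields["opacity"] = 0 if transparent else 100
    return fields

# a foreground region whose bitstream can be merged with its neighbours
print(configure_display_flags(must_copresent=True, mergeable=True,
                              uniform_priority=True, may_overlap=False,
                              is_background=False))
```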
  • the independent codec area description signaling file can be generated according to the encapsulation process of the N block videos of immersive media, and the independent codec area description signaling file includes the description information of the independent codec area description data box.
  • the grammar of the independent codec area description signaling file can be found in Table 3.
  • the configuration method of each field in the independent codec area description signaling file can refer to the configuration method of the corresponding field in the independent codec area description data box, which will not be repeated here.
  • the content production device sends the independent codec region description signaling file to the user, where: IndependentlyCodedRegionGroupId is configured to be 1, and IndependentlyCodedRegionGroupId@coordinateId is configured to be 1; because the resolutions of tracks 1 to 6 are the same, their priorities are the same and they are all presented as the foreground, so IndependentlyCodedRegionGroupId@trackPriority and IndependentlyCodedRegionGroupId@backgroundFlag are not included in the independent codec region description signaling file.
  • the content playback device requests the content production device for the video files corresponding to track 2 and track 5.
  • the content production device packs track 2 and track 5 into a packaged file of immersive media, and transmits it to the content playback device.
  • each track in the file contains the aforementioned coordinate information data box and independent codec area data box.
  • track 7 to track 12 correspond to another coordinate system.
  • the coordinate information data box of track 1 to track 6 is obtained.
  • the coordinate information data box of track 7 to track 12 is obtained.
  • the coordinate information data boxes of track 1 to track 6 are the same, and the coordinate information data boxes of track 7 to track 12 are the same.
  • the origin (0,0) of all coordinate systems is the upper left corner of the video frame
  • the x-axis goes from left to right
  • the y-axis goes from top to bottom.
  • the coordinates of the upper-left vertex of each independent codec area in the independent codec area data boxes corresponding to track 1 to track 6 are: (0,0), (200,0), (400,0), (0,100), (200,100), (400,100).
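  • A worked sketch of the resulting data boxes for track 1 to track 6; the 200×100 region size and the 600×200 complete-video size are inferred from the vertex spacing above and are assumptions for illustration.

```python
# Upper-left vertices listed above for tracks 1-6 (a 3x2 grid of regions).
vertices = [(0, 0), (200, 0), (400, 0), (0, 100), (200, 100), (400, 100)]

boxes = []
for track, (x, y) in enumerate(vertices, start=1):
    boxes.append({
        "track": track,
        "coordinate_id": 1,                      # all six share one coordinate system
        "total_width": 600, "total_height": 200, # complete video = bounding box
        "region_vertex_x": x, "region_vertex_y": y,
        "region_width": 200, "region_height": 100,
    })
for b in boxes:
    print(b)
```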
  • the content production equipment sends the independent codec region description signaling file to the user.
  • IndependentlyCodedRegionGroupId takes the value 1; IndependentlyCodedRegionGroupId@coordinateId takes the value 1; IndependentlyCodedRegionGroupId@trackPriority takes the value 0; IndependentlyCodedRegionGroupId@backgroundFlag is not included in the independent codec region description signaling file.
  • IndependentlyCodedRegionGroupId takes the value 1; IndependentlyCodedRegionGroupId@coordinateId takes the value 2; IndependentlyCodedRegionGroupId@trackPriority takes the value 1; IndependentlyCodedRegionGroupId@backgroundFlag takes the value 1.
  • the content playback device requests the video files corresponding to track 2, track 5, track 7, and track 10 from the content production device.
  • the content production device packs track 2, track 5, track 7, and track 10 into packaged files of immersive media, and transmits them to the content playback device.
  • the packaged file contains videos of two different resolutions, and the low-resolution independent codec areas are presented as the background of the high-resolution independent codec areas. Therefore: because the resolution of the complete video corresponding to track 1 to track 6 is higher, the value of track_priority_info_flag is 1, the track_priority values corresponding to tracks 1 to 6 are smaller and identical (assumed to be 0), and the track_priority values corresponding to tracks 7 to 12 are larger (assumed to be 1). Since a high-resolution independent codec area may overlap with a low-resolution independent codec area, the value of track_overlap_info_flag is 1 for track 1 to track 12.
  • Track 1 to track 6 are presented as foreground images, so the value of background_flag is 0.
  • Tracks 7 to 12 are presented as background images, so the value of background_flag is 1, and assuming that the transparency of the overlapping part is 100%, the value of opacity is 0.
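  • The field values of this two-resolution example can be sketched as follows; the function is hypothetical and only restates the values given above (track_priority 0 for tracks 1 to 6 and 1 for tracks 7 to 12, overlap allowed everywhere, and a fully transparent background for the low-resolution tracks).

```python
def example_boxes():
    """Flag fields for the two-resolution example above (illustrative only)."""
    boxes = {}
    for t in range(1, 13):
        high_res = t <= 6  # tracks 1-6 are the high-resolution version
        boxes[t] = {
            "track_priority_info_flag": 1,
            "track_priority": 0 if high_res else 1,  # smaller = higher priority
            "track_overlap_info_flag": 1,            # overlap may occur everywhere
            "background_flag": 0 if high_res else 1, # low-res tracks are background
        }
        if not high_res:
            boxes[t]["opacity"] = 0  # 100% transparent where overlapped
    return boxes

print(example_boxes()[2], example_boxes()[7])
```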
  • each track in the file contains the aforementioned coordinate information data box and independent codec area data box.
  • N block videos (with the same or different resolutions) belonging to the same immersive media are packaged into N tracks, and the N tracks are packaged into the same track group; this can be applied to more transmission scenarios, such as viewport-adaptive transmission scenarios of immersive media, makes the transmission process of the immersive media more reliable, and also avoids the unnecessary memory overhead caused when the content production device stores different versions of videos.
  • the concept of an independent codec area corresponding to each block video is introduced, and the independent codec area description data box of the i-th independent codec area is generated according to the encapsulation process of the i-th block video.
  • the independent codec area description data box of the i-th independent codec area indicates the consumption relationship between the i-th track and the other tracks in the track group; then, when the independent codec area description data box is transmitted to the content consumption device side, the content consumption device side can display the i-th independent codec area according to the independent codec area description data box of the i-th independent codec area, so that a more convenient and accurate presentation effect can be obtained.
  • FIG. 5 shows a flowchart of another immersive media data processing method provided by an embodiment of the present application; the method is executed by the content playback device in the immersive media system, and the method includes the following steps S501-S503:
  • the immersive media includes N block videos, the N block videos are encapsulated into N tracks, and the i-th block video is encapsulated in the i-th track; the N tracks belong to the same track group; the i-th block video corresponds to the i-th independent codec area; the packaged file includes at least the i-th track, and the i-th track contains the independent codec area description data box of the i-th independent codec area, where i and N are positive integers, and i ≤ N.
  • the packaged file of the immersive media is obtained by packaging one or more tracks in the same track group.
  • the packaging strategy of the packaged file is preset by the content producer of the immersive media (for example, according to the plot of the immersive media).
  • the packaging strategy of the packaged file is dynamically set according to the request of the content playback device (for example, according to different user perspectives).
  • S502 Unpack the packaged file to obtain an independent codec area description data box of the i-th independent codec area, where the independent codec area description data box includes an independent codec area data box and a coordinate information data box.
  • the content playback device decapsulates the packaged file to obtain one or more tracks in the packaged file and an independent codec area description data box corresponding to each track.
  • S503 Display the i-th block video of the immersive media according to the independent codec area description data box.
  • the process of displaying the i-th segmented video of the immersive media according to the independent codec area description data box in step S503 may include the following (1)-(8):
  • the coordinate information data box includes the coordinate system identification field coordinate_id, and an independent codec area corresponds to one coordinate system identification field; the coordinate system to which the i-th independent codec area belongs is determined according to the coordinate system identification field of the i-th independent codec area.
  • if the resolution of the i-th block video of the immersive media is the same as the resolution of the j-th block video, the i-th independent codec area and the j-th independent codec area belong to the same coordinate system, where j is a positive integer, j ≤ N, and j ≠ i.
  • the coordinate information data box includes the height field total_height of the complete video and the width field total_width of the complete video.
  • an independent codec area corresponds to one height field of the complete video and one width field of the complete video, and an independent codec area corresponds to one block video.
  • the complete video is composed of the block videos corresponding to all independent codec areas in the coordinate system to which the i-th independent codec area belongs. The size of the complete video in the coordinate system to which the i-th block video belongs is determined according to the height field and the width field of the complete video in that coordinate system.
  • the independent codec area data box includes the abscissa field region_vertex_x and the ordinate field region_vertex_y of the independent codec area, and an independent codec area corresponds to one abscissa field and one ordinate field.
  • the independent codec area is a rectangular area, and the vertices of the independent codec area may refer to the upper left vertex, the lower left vertex, the upper right vertex, or the lower right vertex of the rectangular area.
  • the independent codec area data box includes the height field region_height of the independent codec area and the width field region_width of the independent codec area; an independent codec area corresponds to one height field and one width field.
  • the size of the i-th independent codec area is determined according to the height field and the width field of the i-th independent codec area.
  • the independent codec area data box includes a non-independent presentation flag field track_not_alone_flag of an independent codec area, and an independent codec area corresponds to a non-independent presentation flag field.
  • when the non-independent presentation flag field of the i-th independent codec area is a valid value, the i-th independent codec area is presented simultaneously with the independent codec areas in other tracks in the track group to which the i-th independent codec area belongs.
  • when the non-independent presentation flag field of the i-th independent codec area is an invalid value, the i-th independent codec area may be presented independently of the independent codec areas in other tracks in the track group.
  • the independent codec area data box includes the mergeable flag field track_not_mergeable_flag of the independent codec area, and an independent codec area corresponds to a mergeable flag field.
  • when the mergeable flag field of the i-th independent codec area is an invalid value, the code stream contained in the track to which the i-th independent codec area belongs can be directly merged with the code streams contained in other tracks in the track group to which the i-th independent codec area belongs.
  • when the mergeable flag field of the i-th independent codec area is a valid value, the code stream contained in the track to which the i-th independent codec area belongs cannot be directly merged with the code streams contained in other tracks in the track group.
  • the independent codec area data box includes the track priority information flag field track_priority_info_flag of the independent codec area, and one independent codec area corresponds to one track priority information flag field.
  • when the track priority information flag field of the i-th independent codec area is an invalid value, the priority of each independent codec area in the track group to which the i-th independent codec area belongs is the same.
  • when the track priority information flag field of the i-th independent codec area is a valid value, the independent codec area data box further includes the track priority field track_priority of the i-th independent codec area; the smaller the value of the track priority field, the higher the resolution of the i-th independent codec area, the higher the presentation priority of the track to which it belongs, or the higher the transmission priority of that track.
  • the independent codec area data box includes the track overlap information flag field track_overlap_info_flag of the independent codec area, and one independent codec area corresponds to one track overlap information flag field.
  • when the track overlap information flag field of the i-th independent codec area is an invalid value, the i-th independent codec area is not displayed overlapping with the independent codec areas in other tracks in the track group to which the i-th independent codec area belongs.
  • when the track overlap information flag field of the i-th independent codec area is a valid value, the i-th independent codec area and the j-th independent codec area in the track group to which it belongs are displayed overlapping, where j is a positive integer and j ≠ i.
  • the independent codec area data box further includes the background flag field background_flag of the i-th independent codec area.
  • when the background flag field of the i-th independent codec area is an invalid value, the i-th independent codec area is displayed as the foreground picture of the j-th independent codec area in the track group to which the i-th independent codec area belongs.
  • when the background flag field of the i-th independent codec area is a valid value, the i-th independent codec area is displayed as the background picture of the j-th independent codec area in the track group to which it belongs.
  • the independent codec area data box further includes the transparency field opacity of the i-th independent codec area.
  • when the value of the transparency field is 0, the i-th independent codec area is displayed as a completely transparent background picture.
  • when the value of the transparency field is greater than 0, the i-th independent codec area is displayed as a non-transparent background picture, and its transparency is determined by the value of the transparency field of the i-th independent codec area. It should be noted that two different independent codec areas presented as foreground pictures cannot overlap each other.
  • the value range of the transparency field of the i-th independent codec area is [0, 100]: a value of 0 indicates that the background picture is completely transparent, a value of 100 indicates that the background picture is completely opaque, and values greater than 100 are reserved.
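  • On the playback side, the parsed fields drive the display decisions described in (1) to (8) above; the following sketch is a hypothetical interpreter over a dict of parsed field values, with 1 modeling the "valid value".

```python
def describe_region(box):
    """Turn the parsed flag fields of one independent codec area into a list
    of human-readable display decisions (illustrative only)."""
    notes = []
    if box.get("track_not_alone_flag"):
        notes.append("must be presented together with regions from other tracks")
    if box.get("track_not_mergeable_flag"):
        notes.append("bitstream cannot be directly merged with other tracks")
    if box.get("track_priority_info_flag"):
        notes.append(f"priority value {box.get('track_priority')} (smaller = higher)")
    if box.get("track_overlap_info_flag"):
        if box.get("background_flag"):
            opacity = box.get("opacity", 0)
            kind = ("completely transparent" if opacity == 0 else
                    "completely opaque" if opacity == 100 else
                    f"{opacity}% opaque")
            notes.append(f"displayed as a {kind} background picture")
        else:
            notes.append("displayed as a foreground picture")
    return notes

print(describe_region({"track_overlap_info_flag": 1,
                       "background_flag": 1, "opacity": 0}))
```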
  • the content playback device may first obtain the MPD file of the immersive media, and then obtain the independent codec area description signaling file from the adaptation set level in the media presentation description file.
  • the content playback device requests the packaged file corresponding to the immersive media from the content production device according to user requirements (such as the user's current perspective) and the independent codec area description signaling file, and displays the immersive media according to the above steps (1) to (8).
  • the content playback device decapsulates the received packaged file of the immersive media. Because track 2 and track 5 belong to the same track group and the track group type is 'icrr', the content playback device learns that track 2 and track 5 contain two independent codec areas. The content playback device decodes track 2 and track 5 separately, and then presents and consumes the video content according to the coordinate information in the independent codec area description data boxes.
  • the content playback device decapsulates the received packaged file of the immersive media. Because track 2, track 5, track 7, and track 10 all belong to the same track group and the track group type is 'icrr', the client learns that track 2, track 5, track 7, and track 10 contain four independent codec areas.
  • track 2 and track 5 are in the same coordinate system
  • track 7 and track 10 are in another coordinate system. Since the background_flag of track 2 and track 5 takes the value 0, they are presented as foreground pictures.
  • track 7 and track 10 are presented as background images.
  • the content playback device decodes track 2, track 5, track 7, and track 10 respectively, and then presents and consumes the video content according to the coordinate information in the independent codec area description data box.
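  • A very small compositor sketch of this presentation step, assuming decoded regions arrive as 2D pixel arrays and the canvas is sized from total_width/total_height; backgrounds are pasted before foregrounds, and a completely transparent background (opacity 0) contributes nothing. The data layout is an assumption for illustration.

```python
def compose(canvas, regions):
    """Paste each decoded region at the position given by its coordinate
    information; `regions` is a list of (box, pixels) pairs where `pixels`
    is a 2D list sized region_height x region_width."""
    # backgrounds first, foregrounds on top
    order = sorted(regions, key=lambda r: 0 if r[0].get("background_flag") else 1)
    for box, pixels in order:
        if box.get("background_flag") and box.get("opacity", 0) == 0:
            continue  # fully transparent background: nothing to draw
        x0, y0 = box["region_vertex_x"], box["region_vertex_y"]
        for dy, row in enumerate(pixels):
            for dx, px in enumerate(row):
                canvas[y0 + dy][x0 + dx] = px
    return canvas
```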
  • N block videos (with the same or different resolutions) belonging to the same immersive media are encapsulated into N tracks, and the N tracks are encapsulated into the same track group; in addition, the concept of an independent codec area corresponding to each block video is introduced, and the independent codec area description data box of the i-th independent codec area indicates the consumption relationship between the i-th track and the other tracks in the track group.
  • when the i-th independent codec area is displayed according to the independent codec area description data box of the i-th independent codec area, a more convenient and accurate presentation effect can be obtained.
  • FIG. 6 shows a schematic structural diagram of an immersive media data processing apparatus provided by an embodiment of the present application
  • the immersive media data processing apparatus may be a computer program (including program code); for example, the data processing apparatus of the immersive media may be application software in the content production device.
  • the data processing device of the immersive media includes an acquiring unit 601 and a processing unit 602.
  • the immersive media includes N block videos, the N block videos are encapsulated into N tracks, and the i-th block video is encapsulated in the i-th track;
  • the N tracks belong to the same track group; the data processing device of the immersive media can be used to execute the corresponding steps in the method shown in FIG. 2; then:
  • the obtaining unit 601 is configured to obtain an independent codec area description data box of the i-th independent codec area of the immersive media, where the i-th independent codec area corresponds to the i-th block video; the independent codec area description data box includes an independent codec area data box and a coordinate information data box, where i and N are positive integers, and i ≤ N;
  • the processing unit 602 is configured to display the i-th segmented video of the immersive media according to the independent codec area description data box.
  • the obtaining unit 601 is further configured to obtain an independent codec area description signaling file, where:
  • the independent codec area description signaling file includes the description information of the independent codec area description data box of the i-th independent codec area.
  • the data processing device of the immersive media can be used to execute the corresponding steps in the method shown in FIG. 3; then:
  • the processing unit 602 is configured to: divide the immersive media into N block videos; encapsulate the N block videos into N tracks respectively, where the i-th block video is encapsulated in the i-th track and corresponds to the i-th independent codec area, i and N are positive integers, i ≤ N, and the N tracks belong to the same track group; and generate the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video, where the independent codec area description data box includes an independent codec area data box and a coordinate information data box.
  • the coordinate information data box includes the coordinate system identification field of the i-th independent codec area; in generating the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video, the processing unit 602 is further configured to: determine the coordinate system to which the i-th independent codec area belongs according to the resolution of the i-th block video, and configure the value of the coordinate system identification field according to the identifier of that coordinate system.
  • the coordinate information data box includes the size field of the complete video in the coordinate system to which the i-th independent codec area belongs; the size field of the complete video includes the height field of the complete video and the The width field of the complete video;
  • in generating the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video, the processing unit 602 is further configured to:
  • the height of the acquired complete video is configured as the value of the height field of the complete video
  • the width of the acquired complete video is configured as the value of the width field of the complete video
  • the independent codec area data box includes the vertex coordinate field of the i-th independent codec area in the coordinate system to which it belongs and the size field of the i-th independent codec area;
  • the vertex coordinate field includes the abscissa field of the i-th independent codec area in the coordinate system to which it belongs and the ordinate field of the i-th independent codec area in that coordinate system;
  • the size field includes the height field of the i-th independent codec area and the width field of the i-th independent codec area;
  • in generating the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video, the processing unit 602 is further configured to:
  • the height of the obtained i-th independent codec area is configured as the value of the height field of the i-th independent codec area
  • the width of the obtained i-th independent codec area is configured as the value of the width field of the i-th independent codec area.
  • the independent codec area data box includes a non-independent presentation flag field of the i-th independent codec area
  • in generating the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video, the processing unit 602 is further configured to:
  • configure the non-independent presentation flag field of the i-th independent codec area with a valid value if the i-th independent codec area must be presented simultaneously with independent codec areas in other tracks in the track group to which it belongs.
  • the independent codec area data box includes a merge flag field of the i-th independent codec area
  • in generating the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video, the processing unit 602 is further configured to:
  • configure the mergeable flag field of the i-th independent codec area with an invalid value if the code stream of the track to which it belongs can be directly merged with the code streams of other tracks in the track group, and with a valid value otherwise.
  • the independent codec area data box includes a track priority information flag field of the i-th independent codec area
  • in generating the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video, the processing unit 602 is further configured to: configure the track priority information flag field with an invalid value if the priority of each independent codec area in the track group is the same, and with a valid value otherwise.
  • the independent codec area data box further includes the track priority field of the i-th independent codec area;
  • in generating the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video, the processing unit 602 is further configured to:
  • the priority of the i-th independent codec area is configured as the value of the track priority field.
  • the independent codec area data box includes the track overlap information flag field of the i-th independent codec area; in generating the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video, the processing unit 602 is further configured to:
  • if the i-th independent codec area is required not to overlap with independent codec areas in other tracks in the track group to which the i-th independent codec area belongs, configure the track overlap information flag field with an invalid value;
  • if the i-th independent codec area is required to overlap with independent codec areas in other tracks in the track group to which the i-th independent codec area belongs, configure the track overlap information flag field with a valid value;
  • the independent codec area data box further includes the background flag field of the i-th independent codec area; the process of generating the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video further includes:
  • the i-th independent codec area is required to be displayed as a foreground picture of an independent codec area in other tracks in the track group to which the i-th independent codec area belongs, configure the background flag field to an invalid value;
  • if the i-th independent codec area is required to be displayed as the background picture of an independent codec area in other tracks in the track group to which it belongs, the background flag field is configured with a valid value.
  • when the background flag field is configured with a valid value, the independent codec area data box further includes the transparency field of the i-th independent codec area, and the value of the transparency field is greater than or equal to 0; in generating the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video, the processing unit 602 is further configured to:
  • if the i-th independent codec area is required to be displayed as a transparent background picture, configure the value of the transparency field to be 0;
  • if the i-th independent codec area is required to be displayed as a non-transparent background picture, configure the value of the transparency field according to the transparency of the i-th independent codec area.
  • processing unit 602 is further configured to:
  • generate an independent codec area description signaling file according to the encapsulation process of the N block videos of the immersive media, and encapsulate the independent codec area description signaling file in the adaptation set level of the media presentation description file of the immersive media;
  • the independent codec area description signaling file includes the description information of the independent codec area description data box of the i-th independent codec area.
  • FIG. 7 shows a schematic structural diagram of another immersive media data processing apparatus provided by an embodiment of the present application
  • the immersive media data processing apparatus may be a computer program (including program code); for example, the data processing apparatus of the immersive media may be application software in a content playback device.
  • the data processing device of the immersive media includes an acquiring unit 701 and a processing unit 702.
  • the immersive media includes N segmented videos, the N segmented videos are respectively encapsulated into N tracks, and the i-th segmented video is encapsulated in the i-th track;
  • the N tracks belong to the same track group; the data processing device of the immersive media can be used to execute the corresponding steps in the method shown in FIG. 2; then:
  • the obtaining unit 701 is configured to obtain the independent codec area description data box of the i-th independent codec area of the immersive media, where the i-th independent codec area corresponds to the i-th block video; the independent codec area description data box includes an independent codec area data box and a coordinate information data box, where i and N are positive integers, and i ≤ N;
  • the processing unit 702 is configured to display the i-th segmented video of the immersive media according to the independent codec area description data box.
  • the obtaining unit 701 is further configured to obtain an independent codec area description signaling file, where:
  • the independent codec area description signaling file includes the description information of the independent codec area description data box of the i-th independent codec area.
  • the data processing device of the immersive media can be used to execute the corresponding steps in the method shown in FIG. 5; then:
  • the obtaining unit 701 is configured to obtain a packaged file of an immersive media, the immersive media including N segmented videos, the N segmented videos are respectively encapsulated into N tracks, and the i-th segmented video is encapsulated in the th i tracks; the N tracks belong to the same track group; the i-th block video corresponds to the i-th independent codec area; the packed file includes at least the i-th track, and the i-th track contains the i-th track An independent codec area description data box of an independent codec area, where i and N are positive integers, and i ⁇ N;
  • the processing unit 702 is configured to perform unpacking processing on the packed file to obtain an independent codec area description data box of the i-th independent codec area, where the independent codec area description data box includes an independent codec area data box and coordinates Information data box; display the i-th segmented video of the immersive media according to the independent codec area description data box.
  • the obtaining unit 701 is further configured to obtain an independent codec area description signaling file, where:
  • the independent codec area description signaling file is encapsulated in the adaptation set level in the media presentation description file of the immersive media;
  • the independent codec area description signaling file includes the description information of the independent codec area description data box of the i-th independent codec area;
  • the obtaining unit 701 is configured to obtain the packaged file of the immersive media, for example, according to the independent codec area description signaling file.
  • the units in the immersive media data processing devices shown in FIG. 6 and FIG. 7 can be separately or completely combined into one or several other units, or one (or some) of the units can be further divided into multiple functionally smaller units; this can achieve the same operation without affecting the realization of the technical effects of the embodiments of the present application.
  • the above-mentioned units are divided based on logical functions.
  • the function of one unit can also be realized by multiple units, or the functions of multiple units can be realized by one unit.
  • the immersive media data processing device may also include other units. In practical applications, these functions may also be implemented with the assistance of other units, and may be implemented by multiple units in cooperation.
  • a computer program may be run on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM).
  • when the computer program is executed, it implements the data processing method of the immersive media provided in the embodiments of the present application; the computer program may be recorded on, for example, a computer-readable recording medium, which is loaded into the above-mentioned computing device and runs in it.
  • the problem-solving principles and beneficial effects of the data processing device for immersive media provided in the embodiments of this application are similar to those of the data processing method for immersive media in the method embodiments of this application; refer to the principles and beneficial effects of the method embodiments, which will not be repeated here.
  • FIG. 8 shows a schematic structural diagram of a content production device provided by an embodiment of the present application
  • the content production device may refer to a computer device used by a provider of immersive media, and the computer device may be a terminal (such as a PC or a smart mobile device such as a smartphone) or a server.
  • the content production device includes a capture device 801, a processor 802, a memory 803, and a transmitter 804. Among them:
  • the capture device 801 is configured to collect real-world sound-visual scenes to obtain raw data of immersive media (including audio content and video content that are synchronized in time and space).
  • the capture device 801 may include, but is not limited to: audio equipment, camera equipment, and sensor equipment.
  • the audio device may include an audio sensor, a microphone, and so on.
  • the camera equipment may include a normal camera, a stereo camera, a light field camera, and the like.
  • Sensing equipment may include laser equipment, radar equipment, and so on.
  • the processor 802 (or CPU) is the processing core of the content production device.
  • the processor 802 is suitable for implementing one or more program instructions, and is suitable for loading and executing the one or more program instructions to realize the flow of the immersive media data processing method shown in FIG. 2 or FIG. 3.
  • the memory 803 is a memory device in the content production device, and is configured to store programs and media resources. It can be understood that the memory 803 here may include a built-in storage medium in the content production device, or of course, may also include an extended storage medium supported by the content production device. It should be noted that the memory may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as at least one disk memory; or at least one memory located far away from the aforementioned processor. The memory provides storage space for storing the operating system of the content production device.
  • the storage space is also used to store a computer program
  • the computer program includes program instructions
  • the program instructions are adapted to be called and executed by the processor to execute the steps of the data processing method of the immersive media.
  • the memory 803 may also be configured to store an immersive media file formed after processing by the processor, and the immersive media file includes media file resources and media presentation description information.
  • the transmitter 804 is configured to implement the transmission interaction between the content production device and other devices, for example, implement the transmission of immersive media between the content production device and the content playback device. That is, the content production device transmits the relevant media resources of the immersive media to the content playback device through the transmitter 804.
  • the processor 802 may include a converter 821, an encoder 822, and an encapsulator 823; among them:
  • the converter 821 is configured to perform a series of conversion processing on the captured video content, so that the video content becomes content suitable for performing video encoding of the immersive media.
  • the conversion process may include: splicing and projection. In practical applications, the conversion process also includes area encapsulation.
  • the converter 821 can convert the captured 3D video content into a 2D image, and provide it to the encoder for video encoding.
  • the encoder 822 is configured to perform audio encoding on the captured audio content to form an audio code stream of the immersive media. It is also used to perform video encoding on the 2D image converted by the converter 821 to obtain a video stream.
  • the encapsulator 823 is configured to encapsulate the audio code stream and the video code stream in a file container according to the file format of the immersive media (such as ISOBMFF) to form the media file resource of the immersive media.
  • the media file resource can be a media file or media fragments that form a media file of the immersive media.
  • the encapsulated file of the immersive media processed by the encapsulator will be stored in the memory and provided to the content playback device for presentation of the immersive media as needed.
  • the immersive media includes N segmented videos, the N segmented videos are respectively encapsulated into N tracks, and the i-th segmented video is encapsulated in the i-th track;
  • the N tracks belong to the same track group;
  • the processor 802 (that is, each device included in the processor) executes the steps of the immersive media data processing method shown in FIG. 2 by calling one or more instructions in the memory.
  • the memory 803 stores one or more first instructions, and the one or more first instructions are suitable for being loaded by the processor 802 and executing the following steps:
  • obtain the independent codec area description data box of the i-th independent codec area of the immersive media, where the i-th independent codec area corresponds to the i-th block video;
  • the independent codec area description data box includes an independent codec area data box and a coordinate information data box, where i and N are positive integers, and i ≤ N; and display the i-th block video of the immersive media according to the independent codec area description data box;
  • the processor executes the steps of the immersive media data processing method shown in FIG. 3 by calling one or more instructions in the memory 803.
  • the memory stores one or more second instructions, and the one or more second instructions are suitable for being loaded by the processor 802 and executing the following steps:
  • divide the immersive media into N block videos; encapsulate the N block videos into N tracks respectively, where the i-th block video is encapsulated in the i-th track; the i-th block video corresponds to the i-th independent codec area, where i and N are positive integers, and i ≤ N; the N tracks belong to the same track group;
  • generate the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video, where the independent codec area description data box includes an independent codec area data box and a coordinate information data box.
  • Figure 9 shows a schematic structural diagram of a content playback device provided by an exemplary embodiment of the present application
  • the content playback device may refer to a computer device used by a user of immersive media, and the computer device may be a terminal (such as a PC).
  • Smart mobile devices such as smart phones
  • VR devices such as VR helmets, VR glasses, etc.
  • the content playback device includes a receiver 901, a processor 902, a memory 903, and a display/playback device 904. Among them:
  • the receiver 901 is configured to realize transmission interaction between the content playback device and other devices, for example, to realize the transmission of immersive media between the content production device and the content playback device. That is, the content playback device receives, through the receiver 901, the relevant media resources of the immersive media transmitted by the content production device.
  • the processor 902 (or CPU) is the processing core of the content playback device.
  • the processor 902 is suitable for implementing one or more program instructions, and is suitable for loading and executing the one or more program instructions to realize the flow of the immersive media data processing method shown in FIG. 2 or FIG. 5.
  • the memory 903 is a memory device in the content playback device, and is configured to store programs and media resources. It is understandable that the memory 903 here may include a built-in storage medium in the content playback device, or may also include an extended storage medium supported by the content playback device. It should be noted that the memory 903 may be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory; or at least one memory located far away from the aforementioned processor. The memory 903 provides storage space for storing the operating system of the content playback device.
  • the storage space is also used to store a computer program
  • the computer program includes program instructions
  • the program instructions are adapted to be called and executed by the processor to execute the steps of the data processing method of the immersive media.
  • the memory 903 may also be configured to store the three-dimensional image of the immersive media formed after processing by the processor, the audio content corresponding to the three-dimensional image, and the information required for rendering the three-dimensional image and audio content.
  • the display/playing device 904 is configured to output rendered sounds and three-dimensional images.
  • the processor 902 may include a parser 921, a decoder 922, a converter 923, and a renderer 924; among them:
  • the parser 921 is configured to decapsulate the encapsulated file of the immersive media from the content production device, for example, to decapsulate the media file resources according to the file format requirements of the immersive media to obtain the audio code stream and the video code stream;
  • the audio code stream and the video code stream are provided to the decoder 922.
  • the decoder 922 is configured to perform audio decoding on the audio code stream to obtain the audio content and provide it to the renderer for audio rendering. In addition, the decoder 922 decodes the video code stream to obtain a 2D image. According to the metadata provided by the media presentation description information, if the metadata indicates that the immersive media has performed the region encapsulation process, the 2D image refers to the packed image; if the metadata indicates that the immersive media has not performed the region encapsulation process, the 2D image refers to the projected image.
  • the converter 923 is configured to convert the 2D image into a 3D image. If the immersive media has performed the region encapsulation process, the converter 923 first unpacks the packed image to obtain the projected image, and then reconstructs the projected image to obtain the 3D image. If the immersive media has not performed the region encapsulation process, the converter 923 directly reconstructs the projected image to obtain the 3D image.
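  • The converter's branch can be sketched as follows; `unpack_regions` and `reproject` are hypothetical placeholders standing in for the real un-packing and reconstruction steps, so this is only an illustration of the control flow.

```python
def reconstruct_3d(decoded_2d, region_wise_packed):
    """If region-wise packing was applied, un-pack first; then reconstruct
    the projected image into a 3D picture."""
    def unpack_regions(img):   # placeholder: reverse of region-wise packing
        return img
    def reproject(img):        # placeholder: projected image -> 3D picture
        return img
    projected = unpack_regions(decoded_2d) if region_wise_packed else decoded_2d
    return reproject(projected)
```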
  • the renderer 924 is configured to render audio content and 3D images of the immersive media.
  • the audio content and 3D image are rendered according to the metadata related to the rendering and the window in the media presentation description information, and the rendering is completed by the display/playing device for output.
  • the immersive media includes N block videos, the N block videos are encapsulated into N tracks, and the i-th block video is encapsulated in the i-th track; the N tracks belong to the same track group; the processor 902 (that is, each device included in the processor) executes the steps of the immersive media data processing method shown in FIG. 2 by calling one or more instructions in the memory.
  • the memory stores one or more first instructions, and the one or more first instructions are suitable for being loaded by the processor 902 and executing the following steps:
  • obtain the independent codec area description data box of the i-th independent codec area of the immersive media, where the i-th independent codec area corresponds to the i-th block video;
  • the independent codec area description data box includes an independent codec area data box and a coordinate information data box, where i and N are positive integers, and i ≤ N; and display the i-th block video of the immersive media according to the independent codec area description data box;
  • the processor 902 (that is, each device included in the processor) executes the steps of the immersive media data processing method shown in FIG. 5 by calling one or more instructions in the memory.
  • the memory 903 stores one or more second instructions, and the one or more second instructions are suitable for being loaded by the processor 902 and executing the following steps:
  • obtain the packaged file of the immersive media, where the immersive media includes N block videos, the N block videos are respectively encapsulated into N tracks, and the i-th block video is encapsulated in the i-th track;
  • the N tracks belong to the same track group;
  • the i-th block video corresponds to the i-th independent codec area;
  • the packaged file includes at least the i-th track, and the i-th track contains the independent codec area description data box of the i-th independent codec area, where i and N are positive integers, and i ≤ N; decapsulate the packaged file to obtain the independent codec area description data box of the i-th independent codec area, where the independent codec area description data box includes an independent codec area data box and a coordinate information data box; and display the i-th block video of the immersive media according to the independent codec area description data box.

Abstract

Embodiments of this application provide a data processing method, device, apparatus, and computer storage medium for immersive media. The method includes: obtaining the independent codec area description data box of the i-th independent codec area of the immersive media, where the immersive media includes N block videos, the N block videos are encapsulated into N tracks, and the i-th block video is encapsulated in the i-th track; the N tracks belong to the same track group; the i-th block video corresponds to the i-th independent codec area; the independent codec area description data box includes an independent codec area data box and a coordinate information data box, where i and N are positive integers, and i ≤ N; and displaying the i-th block video of the immersive media according to the independent codec area description data box.

Description

Data processing method, apparatus, and device for immersive media, and computer storage medium
Cross-reference to related applications
This application is based on and claims priority to Chinese patent application No. 202010501322.X, filed on June 04, 2020, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of computer technology and the field of virtual reality (VR) technology, and in particular to a data processing method, apparatus, and device for immersive media, and a computer-readable storage medium.
Background
In the related art, the content of immersive media is divided into multiple sub-picture frames, and these sub-picture frames are encapsulated into multiple track groups according to their association, where the association means that the multiple sub-picture frames in the same track group both belong to the same immersive media and have the same resolution. Such an association restricts the encapsulation flexibility of immersive media to a certain extent. For example, in a viewport-adaptive transmission scheme for immersive media, in order to ensure that the corresponding picture can be presented in time when the user's head moves, the content transmitted to the user includes both the high-definition block videos of the user's current viewing viewport and the low-definition block videos surrounding the user's current viewing viewport. These two kinds of videos belong to the same video content but to different resolution versions of the video. In the related art, these two kinds of videos are encapsulated into different track groups, which makes it difficult to indicate the consumption relationship between the two track groups and thus brings inconvenience to presentation on the content playback device.
Summary
Embodiments of this application provide a data processing method, apparatus, and device for immersive media, and a computer-readable storage medium, which can encapsulate multiple block videos (with the same or different resolutions) of different spaces belonging to the same immersive media into the same track group, and use an independent codec area description data box to indicate the consumption relationship between the tracks in the track group, facilitating the presentation of the immersive media.
An embodiment of this application provides a data processing method for immersive media, including:
obtaining the independent codec area description data box of the i-th independent codec area of the immersive media, where the i-th independent codec area corresponds to the i-th block video; the independent codec area description data box includes an independent codec area data box and a coordinate information data box, where i and N are positive integers, and i ≤ N;
displaying the i-th block video of the immersive media according to the independent codec area description data box.
In the embodiments of this application, N block videos (with the same or different resolutions) belonging to the same immersive media are encapsulated into N tracks, and the N tracks are encapsulated into the same track group; at the same time, the concept of an independent codec area corresponding to each block video is introduced, and the independent codec area description data box of the i-th independent codec area indicates the consumption relationship between the i-th track and the other tracks in the track group. When the i-th independent codec area is displayed according to the independent codec area description data box of the i-th independent codec area, a more convenient and accurate presentation effect can be obtained.
An embodiment of this application provides a data processing method for immersive media, including:
dividing the immersive media into N block videos;
encapsulating the N block videos into N tracks respectively, where the i-th block video is encapsulated in the i-th track; the i-th block video corresponds to the i-th independent codec area, where i and N are positive integers, and i ≤ N; the N tracks belong to the same track group;
generating the independent codec area description data box of the i-th independent codec area according to the encapsulation process of the i-th block video, where the independent codec area description data box includes an independent codec area data box and a coordinate information data box.
In the embodiments of this application, N block videos (with the same or different resolutions) belonging to the same immersive media are encapsulated into N tracks, and the N tracks are encapsulated into the same track group; this is applicable to more transmission scenarios, for example, viewport-adaptive transmission scenarios of immersive media, makes the transmission process of the immersive media more reliable, and also avoids the unnecessary memory overhead incurred when the content production device stores different versions of the video. At the same time, the concept of an independent codec area corresponding to each block video is introduced, the independent codec area description data box of the i-th independent codec area is generated according to the encapsulation process of the i-th block video, and the independent codec area description data box indicates the consumption relationship between the i-th track and the other tracks in the track group; then, when the independent codec area description data box is transmitted to the content consumption device side, the content consumption device side can display the i-th independent codec area according to the independent codec area description data box of the i-th independent codec area, so that a more convenient and accurate presentation effect can be obtained.
本申请实施例提供一种沉浸媒体的数据处理方法,包括:
获取沉浸媒体的打包文件,沉浸媒体包括N个分块视频,N个分块视频分别被封装至N个轨道中,第i个分块视频被封装在第i个轨道中;N个轨道属于同一个轨道组;第i个分块视频对应第i个独立编解码区域;打包文件至少包括第i个轨道,第i个轨道中包含第i个独立编解码区域的独立编解码区域描述数据盒,其中i,N为正整数,且i≤N;
对打包文件进行解封处理得到第i个独立编解码区域的独立编解码区域描述数据盒,该独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒;
根据独立编解码区域描述数据盒显示沉浸媒体的第i个分块视频。
本申请实施例将属于同一沉浸媒体的N个分块视频（具备相同分辨率或不同分辨率）封装至N个轨道中，并且该N个轨道被封装至同一轨道组中；同时引入与各个分块视频相对应的独立编解码区域的概念，通过第i个独立编解码区域的独立编解码区域描述数据盒来指示第i个轨道与轨道组中其他轨道之间的消费关系，当根据第i个独立编解码区域的独立编解码区域描述数据盒来对第i个独立编解码区域进行显示时，可以获得更加便捷、准确的呈现效果。
本申请实施例提供一种沉浸媒体的数据处理装置,包括:
获取单元,配置为获取沉浸媒体的第i个独立编解码区域的独立编解码区域描述数据盒,第i个独立编解码区域对应第i个分块视频;该独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒,其中i,N为正整数,且i≤N;
处理单元,配置为根据独立编解码区域描述数据盒显示沉浸媒体的第i个分块视频。
本申请实施例提供另一种沉浸媒体的数据处理装置,包括:
处理单元,配置为将沉浸媒体划分为N个分块视频;分别将N个分块视频封装至N个轨道中,第i个分块视频被封装在第i个轨道中;第i个分块视频对应第i个独立编解码区域,其中i,N为正整数,且i≤N;N个轨道属于同一个轨道组;根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒,该独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒。
本申请实施例提供另一种沉浸媒体的数据处理装置,包括:
获取单元，配置为获取沉浸媒体的打包文件，沉浸媒体包括N个分块视频，N个分块视频分别被封装至N个轨道中，第i个分块视频被封装在第i个轨道中；N个轨道属于同一个轨道组；第i个分块视频对应第i个独立编解码区域；打包文件至少包括第i个轨道，第i个轨道中包含第i个独立编解码区域的独立编解码区域描述数据盒，其中i,N为正整数，且i≤N；
处理单元,配置为对打包文件进行解封处理得到第i个独立编解码区域的独立编解码区域描述数据盒,该独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒;根据独立编解码区域描述数据盒显示沉浸媒体的第i个分块视频。
本申请实施例提供一种沉浸媒体的数据处理设备,包括:
一个或多个处理器和一个或多个存储器,该一个或多个存储器中存储有至少一条程序代码,该至少一条程序代码由该一个或多个处理器加载并执行以实现本申请实施例提供的沉浸媒体的数据处理方法。
本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有至少一条程序代码,该至少一条程序代码由处理器加载并执行以实现本申请实施例提供的沉浸媒体的数据处理方法。
本申请实施例将属于同一沉浸媒体的N个分块视频（具备相同分辨率或不同分辨率）封装至N个轨道中，并且该N个轨道被封装至同一轨道组中；这样可以适用于更多的传输场景，例如适用于沉浸媒体的视角自适应传输场景；并且使得沉浸媒体的传输过程更加可靠，也避免了内容制作设备在存储不同版本视频时带来的不必要内存开销。同时引入与各个分块视频相对应的独立编解码区域的概念，通过第i个独立编解码区域的独立编解码区域描述数据盒来指示第i个轨道与轨道组中其他轨道之间的消费关系，当根据第i个独立编解码区域的独立编解码区域描述数据盒来对第i个独立编解码区域进行显示时，可以获得更加便捷、准确的呈现效果。
附图说明
图1A示出了本申请实施例提供的一种沉浸媒体系统的架构图;
图1B示出了本申请实施例提供的一种沉浸媒体的传输方案流程图;
图1C示出了本申请实施例提供的一种视频编码基本框图;
图1D示出了本申请实施例提供的6DoF的示意图;
图1E示出了本申请实施例提供的3DoF的示意图;
图1F示出了本申请实施例提供的3DoF+的示意图;
图1G示出了本申请实施例提供的一种输入图像划分示意图;
图2示出了本申请实施例提供的一种沉浸媒体的数据处理方法的流程图;
图3示出了本申请实施例提供的另一种沉浸媒体的数据处理方法的流程图;
图4A示出了本申请实施例提供的一种沉浸媒体传输的应用场景图;
图4B示出了本申请实施例提供的另一种沉浸媒体传输的应用场景图;
图5示出了本申请实施例提供的另一种沉浸媒体的数据处理方法的流程图;
图6示出了本申请实施例提供的一种沉浸媒体的数据处理装置的结构示意图;
图7示出了本申请实施例提供的另一种沉浸媒体的数据处理装置的结构示意图;
图8示出了本申请实施例提供的一种内容制作设备的结构示意图;
图9示出了本申请实施例提供的一种内容播放设备的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。
本申请实施例涉及沉浸媒体的数据处理技术。所谓沉浸媒体是指能够提供沉浸式的媒体内容,使沉浸于该媒体内容中的用户能够获得现实世界中视觉、听觉等感官体验的媒体文件。在实际应用中,沉浸媒体可以是3DoF(Degree of Freedom)沉浸媒体,3DoF+沉浸媒体或者6DoF沉浸媒体。沉浸媒体内容包括以各种形式在三维(3-Dimension,3D)空间中表示的视频内容,例如以球面形式表示的三维视频内容。在实际应用中,沉浸媒体内容可以是虚拟现实(Virtual Reality,VR)视频内容、全景视频内容、球面视频内容或360度视频内容;所以,沉浸媒体又可称为VR视频、全景视频、球面视频或360度视频。另外,沉浸媒体内容还包括与三维空间中表示的视频内容相同步的音频内容。
图1A示出了本申请实施例提供的一种沉浸媒体系统的架构图；如图1A所示，沉浸媒体系统包括内容制作设备和内容播放设备，内容制作设备可以是指沉浸媒体的提供者（例如沉浸媒体的内容制作者）所使用的计算机设备，该计算机设备可以是终端（如个人计算机（Personal Computer，PC）、智能移动设备（如智能手机）等）或服务器。内容播放设备可以是指沉浸媒体的使用者（例如用户）所使用的计算机设备，该计算机设备可以是终端（如PC）、智能移动设备（如智能手机）、VR设备（如VR头盔、VR眼镜等）。沉浸媒体的数据处理过程包括在内容制作设备侧的数据处理过程及在内容播放设备侧的数据处理过程。
在内容制作设备端的数据处理过程主要包括：（1）沉浸媒体的媒体内容的获取与制作过程；（2）沉浸媒体的编码及文件封装的过程。在内容播放设备端的数据处理过程主要包括：（3）沉浸媒体的文件解封装及解码的过程；（4）沉浸媒体的渲染过程。另外，内容制作设备与内容播放设备之间涉及沉浸媒体的传输过程，该传输过程可以基于各种传输协议来进行，此处的传输协议可包括但不限于：动态自适应流媒体传输（Dynamic Adaptive Streaming over HTTP，DASH）协议、动态码率自适应传输（HTTP Live Streaming，HLS）协议、智能媒体传输协议（Smart Media Transport Protocol，SMTP）、传输控制协议（Transmission Control Protocol，TCP）等。
图1B示出了本申请实施例提供的一种沉浸媒体的传输方案流程图。如图1B所示,为了解决沉浸媒体自身数据量过大带来的传输带宽负荷问题,在沉浸媒体的处理过程中,通常选择将原始视频在空间上切分为多个分块视频后,分别编码后进行封装,再传输给客户端消费。
下面分别对沉浸媒体的数据处理过程中涉及的各个过程进行详细介绍。
图1C示出了本申请实施例提供的一种视频编码基本框图。结合图1A-图1C对沉浸媒体的数据处理过程中涉及的各个过程进行详细介绍:
一、在内容制作设备端的数据处理过程:
(1)获取沉浸媒体的媒体内容。
从沉浸媒体的媒体内容的获取方式看，可以分为通过捕获设备采集现实世界的声音-视觉场景获得的以及通过计算机生成的两种方式。在一种实现中，捕获设备可以是指设于内容制作设备中的硬件组件，例如捕获设备是指终端的麦克风、摄像头、传感器等。另一种实现中，该捕获设备也可以是与内容制作设备相连接的硬件装置，例如与服务器相连接摄像头；用于为内容制作设备提供沉浸媒体的媒体内容的获取服务。该捕获设备可以包括但不限于：音频设备、摄像设备及传感设备。其中，音频设备可以包括音频传感器、麦克风等。摄像设备可以包括普通摄像头、立体摄像头、光场摄像头等。传感设备可以包括激光设备、雷达设备等。捕获设备的数量可以为多个，这些捕获设备被部署在现实空间中的一些特定位置以同时捕获该空间内不同角度的音频内容和视频内容，捕获的音频内容和视频内容在时间和空间上均保持同步。由于获取的方式不同，不同沉浸媒体的媒体内容对应的压缩编码方式也可能有所区别。
(2)沉浸媒体的媒体内容的制作过程。
捕获到的音频内容本身就是适合被执行沉浸媒体的音频编码的内容。捕获到的视频内容进行一系列制作流程后才可成为适合被执行沉浸媒体的视频编码的内容,该制作流程包括:
①拼接。由于捕获到的视频内容是捕获设备在不同角度下拍摄得到的，拼接就是指将这些各个角度拍摄的视频内容拼接成一个完整的、能够反映现实空间360度视觉全景的视频，即拼接后的视频是一个在三维空间表示的全景视频（或球面视频）。
②投影。投影就是指将拼接形成的一个三维视频映射到一个二维(2-Dimension,2D)图像上的过程,投影形成的2D图像称为投影图像;投影的方式可包括但不限于:经纬图投影、正六面体投影。
需要说明的是,由于采用捕获设备只能捕获到全景视频,这样的视频经内容制作设备处理并传输至内容播放设备进行相应的数据处理后,内容播放设备侧的用户只能通过执行一些特定动作(如头部旋转)来观看360度的视频信息,而执行非特定动作(如移动头部)并不能获得相应的视频变化,VR体验不佳,因此需要额外提供与全景视频相匹配的深度信息,来使用户获得更优的沉浸度和更佳的VR体验,这就涉及多种制作技术,常见的制作技术包括六自由度(Six Degrees of Freedom,6DoF)制作技术。图1D示出了本申请一个示例性实施例提供的6DoF的示意图;6DoF分为窗口6DoF、全方向6DoF和6DoF,其中,窗口6DoF是指用户在X轴、Y轴的旋转移动受限,以及在Z轴的平移受限;例如,用户不能够看到窗户框架外的景象,以及用户无法穿过窗户。全方向6DoF是指用户在X轴、Y轴和Z轴的旋转移动受限,例如,用户在受限的移动区域中不能自由的穿过三维的360度VR内容。6DoF是指用户可以沿着X轴、Y轴、Z轴自由平移,例如,用户可以在三维的360度VR内容中自由的走动。与6DoF相类似的,还有3DoF和3DoF+制作技术。图1E示出了本申请一个示例性实施例提供的3DoF的示意图;如图1E所示,3DoF是指用户在一个三维空间的中心点固定,用户头部沿着X轴、Y轴和Z轴旋转来观看媒体内容提供的画面。图1F示出了本申请一个示例性实施例提供的3DoF+的示意图,如图1F所示,3DoF+是指当沉浸媒体提供的虚拟场景具有一定的深度信息,用户头部可以基于3DoF在一个有限的空间内移动来观看媒体内容提供的画面。
(3)沉浸媒体的媒体内容的编码过程。
投影图像可以被直接进行编码,也可以对投影图像进行区域封装之后再进行编码。现代主流视频编码技术,以国际视频编码标准HEVC(High Efficiency Video Coding),国际视频编码标准VVC(Versatile Video Coding),以及中国国家视频编码标准AVS(Audio Video Coding Standard)为例,采用了混合编码框架,对输入的原始视频信号,进行了如下一系列的操作和处理:
1）块划分结构（block partition structure）：根据处理单元的大小将输入图像划分成若干个不重叠的处理单元，对每个处理单元进行类似的压缩操作。这个处理单元被称作编码树单元（Coding Tree Unit，CTU），或者最大编码单元（Largest Coding Unit，LCU）。CTU可以继续进行更加精细的划分，得到一个或多个基本编码的单元，称之为编码单元（Coding Unit，CU）。每个CU是一个编码环节中最基本的元素。图1G示出了本申请实施例提供的一种输入图像划分示意图。以下描述的是对每一个CU可能采用的各种编码方式（块划分与量化的示意代码参见下文第5）点之后的示例）。
2)预测编码(Predictive Coding):包括了帧内预测和帧间预测等方式,原始视频信号经过选定的已重建视频信号的预测后,得到残差视频信号。内容制作设备需要为当前CU决定在众多可能的预测编码模式中,选择最适合的一种,并告知内容播放设备。
a.帧内预测：预测的信号来自于同一图像内已经编码重建过的区域；
b.帧间预测：预测的信号来自已经编码过的、不同于当前图像的其他图像（称之为参考图像）。
3)变换编码及量化(Transform&Quantization):残差视频信号经过离散傅里叶变换(Discrete Fourier Transform,DFT),离散余弦变换(Discrete Cosine Transform,DCT)等变换操作,将信号转换到变换域中,称之为变换系数。在变换域中的信号,进一步的进行有损的量化操作,丢失掉一定的信息,使得量化后的信号有利于压缩表达。在一些视频编码标准中,可能有多于一种变换方式可以选择,因此,内容制作设备也需要为当前编码CU选择其中的一种变换,并告知内容播放设备。量化的精细程度通常由量化参数(Quantization Parameter,QP)来决定,QP取值较大,表示更大取值范围的系数将被量化为同一个输出,因此通常会带来更大的失真,及较低的码率;相反,QP取值较小,表示较小取值范围的系数将被量化为同一个输出,因此通常会带来较小的失真,同时对应较高的码率。
4）熵编码（Entropy Coding）或统计编码：量化后的变换域信号，将根据各个值出现的频率，进行统计压缩编码，最后输出二值化（0或者1）的压缩码流。同时，编码产生其他信息，例如选择的模式，运动矢量等，也需要进行熵编码以降低码率。统计编码是一种无损编码方式，可以有效地降低表达同样的信号所需要的码率。常见的统计编码方式有变长编码（VLC，Variable Length Coding）或者基于上下文的二值化算术编码（CABAC，Context-based Adaptive Binary Arithmetic Coding）。
5）环路滤波（Loop Filtering）：已经编码过的图像，经过反量化，反变换及预测补偿的操作（上述2~4的反向操作），可获得重建的解码图像。重建图像与原始图像相比，由于存在量化的影响，部分信息与原始图像有所不同，产生失真（Distortion）。对重建图像进行滤波操作，例如去块效应滤波（deblocking），取样自适应偏移（Sample Adaptive Offset，SAO）滤波器或者自适应环路滤波器（Adaptive Loop Filter，ALF）等，可以有效地降低量化所产生的失真程度。由于这些经过滤波后的重建图像，将作为后续编码图像的参考，用于对将来的信号进行预测，所以上述的滤波操作也被称为环路滤波，即在编码环路内的滤波操作。
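为便于理解上述第1）点的块划分与第3）点的量化操作，下面给出一段示意性的Python代码草图。需要说明的是，这只是帮助理解的非规范示例，并非HEVC、VVC或AVS等标准的实际实现；其中QP与量化步长的换算关系Qstep≈2^((QP-4)/6)以HEVC类标准的常见约定为假设：

```python
import numpy as np

def ctu_grid(width, height, ctu=64):
    """第1)点的示意：把输入图像划分为若干不重叠的CTU（边界处的CTU可能不足整块）。"""
    return [(x, y, min(ctu, width - x), min(ctu, height - y))
            for y in range(0, height, ctu)
            for x in range(0, width, ctu)]

def dct_matrix(n):
    """构造n x n的正交DCT-II变换矩阵。"""
    m = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            c = np.sqrt(1.0 / n) if k == 0 else np.sqrt(2.0 / n)
            m[k, i] = c * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    return m

def transform_and_quantize(residual, qp):
    """第3)点的示意：残差块先做2D DCT，再按QP对应的量化步长做标量量化。"""
    d = dct_matrix(residual.shape[0])
    coeff = d @ residual @ d.T             # 2D变换：D·X·D^T
    qstep = 2.0 ** ((qp - 4) / 6)          # 假设的QP->量化步长关系（HEVC类约定）
    return np.round(coeff / qstep).astype(int)

blocks = ctu_grid(1920, 1080, ctu=64)      # 1920x1080的图像划分为30x17=510个CTU
level = transform_and_quantize(np.random.randn(8, 8) * 10.0, qp=32)
```

与文中所述一致，QP取值越大，量化步长越大，更多系数被量化到同一输出，失真越大而码率越低。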
此处需要说明的是,如果采用6DoF制作技术(用户可以在模拟的场景中较自由的移动时,称为6DoF),在视频编码过程中需要采用特定的编码方式(如点云编码)进行编码。
(4)沉浸媒体的封装过程。
将音频码流和视频码流按照沉浸媒体的文件格式（如ISO基媒体文件格式（ISO Base Media File Format，ISOBMFF））封装在文件容器中形成沉浸媒体的媒体文件资源，该媒体文件资源可以是形成沉浸媒体的媒体文件或媒体片段；并按照沉浸媒体的文件格式要求采用媒体呈现描述信息（Media presentation description，MPD）记录该沉浸媒体的媒体文件资源的元数据，此处的元数据是对与沉浸媒体的呈现有关的信息的总称，该元数据可包括对媒体内容的描述信息、对视窗的描述信息以及对媒体内容呈现相关的信令信息等等。如图1A所示，内容制作设备会存储经过数据处理过程之后形成的媒体呈现描述信息和媒体文件资源。
二、在内容播放设备端的数据处理过程:
(1)沉浸媒体的文件解封装及解码的过程;
内容播放设备可以通过内容制作设备的推荐或按照内容播放设备端的用户需求自适应动态从内容制作设备获得沉浸媒体的媒体文件资源和相应的媒体呈现描述信息,例如内容播放设备可根据用户的头部/眼睛/身体的跟踪信息确定用户的朝向和位置,再基于确定的朝向和位置动态向内容制作设备请求获得相应的媒体文件资源。媒体文件资源和媒体呈现描述信息通过传输机制(如DASH、SMT)由内容制作设备传输给内容播放设备。内容播放设备端的文件解封装的过程与内容制作设备端的文件封装过程是相逆的,内容播放设备按照沉浸媒体的文件格式要求对媒体文件资源进行解封装,得到音频码流和视频码流。内容播放设备端的解码过程与内容制作设备端的编码过程是相逆的,内容播放设备对音频码流进行音频解码,还原出音频内容。另外,内容播放设备对视频码流的解码过程包括如下:①对视频码流进行解码,得到平面的投影图像。②根据媒体呈现描述信息将投影图像进行重建处理以转换为3D图像,此处的重建处理是指将二维的投影图像重新投影至3D空间中的处理。
根据上述编码过程可以看出,在内容播放设备端,对于每一个CU,内容播放设备获得压缩码流后,先进行熵解码,获得各种模式信息及量化后的变换系数。各个系数经过反量化及反变换,得到残差信号。另一方面,根据已知的编码模式信息,可获得该CU对应的预测信号,两者相加之后,即可得到重建信号。最后,解码图像的重建值,需要经过环路滤波的操作,产生最终的输出信号。
(2)沉浸媒体的渲染过程。
内容播放设备根据媒体呈现描述信息中与渲染、视窗相关的元数据对音频解码得到的音频内容及视频解码得到的3D图像进行渲染,渲染完成即实现了对该3D图像的播放输出。如果采用3DoF和3DoF+的制作技术,内容播放设备主要基于当前视点、视差、深度信息等对3D图像进行渲染,如果采用6DoF的制作技术,内容播放设备主要基于当前视点对视窗内的3D图像进行渲染。其中,视点指用户的观看位置点,视差是指用户的双目产生的视线差或由于运动产生的视线差,视窗是指观看区域。
沉浸媒体系统支持数据盒(Box),数据盒是指包括元数据的数据块或对象,即数据盒中包含了相应媒体内容的元数据。沉浸媒体可以包括多个数据盒,例如包括旋转数据盒、覆盖信息数据盒、媒体文件格式数据盒等等。
由上述沉浸媒体的处理过程可知,在对沉浸式视频进行编码后,需要对编码后的数据流进行封装并传输给用户。相关沉浸媒体的封装技术中,涉及子图像帧的概念,属于同一沉浸媒体且具备相同分辨率的多个子图像帧被封装至同一轨道组,而属于同一沉浸媒体但具备不同分辨率的子图像帧被封装至不同的轨道组,这些封装信息采用二维空间关系描述数据盒(SpatialRelationship2DDescriptionBox)来记录,其中,二维空间关系描述数据盒(SpatialRelationship2DDescriptionBox)是对现有的轨道组数据盒(TrackGroupTypeBox)进行扩展得到。按照二维空间关系描述数据盒(SpatialRelationship2DDescriptionBox)的定义,所有包含二维空间关系描述数据盒的轨道(track)属于同一个轨道组,即这些轨道包含的视频内容是同一个坐标系下的完整视频的子图像帧。其中,轨道是指一系列有时间属性的按照ISO基本媒体文件格式(ISO base media file format,ISOBMFF)的封装方式的样本,比如视频track,视频track是通过将视频编码器编码每一帧后产生的码流按照ISOBMFF的规范封装后得到的。
在一些实施例中，二维空间关系描述数据盒还包括用于指示原始视频帧的宽和高，以及所属内容的源ID的二维空间关系源数据盒（SpatialRelationship2DSourceBox）和用于指示子图像帧在整体视频帧中的位置的子图像帧区域数据盒（SubPictureRegionBox）。
沉浸媒体的二维空间关系描述数据盒(SpatialRelationship2DDescriptionBox)的语法可参见下述表1:
表1
（表1的语法原文为图片（Figure PCTCN2021085907-appb-000001），未在文本中复现；表中各字段的语义见下文说明。）
上述表1所示语法的语义如下:total_width与total_height指示原始视频帧的宽和高;source_id指示了子图像帧所属完整视频的源ID;object_x与object_y指示了子图像帧左顶点的坐标;object_width与object_height指示了子图像帧的宽和高。track_not_alone_flag指示了该子图像帧是否必须与该轨道组中的其他子图像帧同时呈现。track_not_mergeable_flag指示了该子图像帧对应的轨道所包含的码流是否可以直接与该轨道组中的其他子图像帧包含的码流合并。
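为便于理解表1所述各字段（其语法原文为图片，上文仅保留了字段语义），下面用Python数据类对这些字段的逻辑结构做一个示意性草图（仅为理解用的假设性表示，并非ISOBMFF二进制语法的实际定义）：

```python
from dataclasses import dataclass

@dataclass
class SpatialRelationship2DDescription:
    """表1所述二维空间关系描述数据盒各字段的逻辑示意（非规范二进制语法）。"""
    total_width: int                # 原始视频帧的宽
    total_height: int               # 原始视频帧的高
    source_id: int                  # 子图像帧所属完整视频的源ID
    object_x: int                   # 子图像帧左顶点的横坐标
    object_y: int                   # 子图像帧左顶点的纵坐标
    object_width: int               # 子图像帧的宽
    object_height: int              # 子图像帧的高
    track_not_alone_flag: bool      # 是否必须与同轨道组中其他子图像帧同时呈现
    track_not_mergeable_flag: bool  # 所属轨道码流能否与同组其他子图像帧码流直接合并
```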
结合上述表1可知,现有技术采用的是子图像帧的概念,且对子图像帧的封装过程在一定程度上局限了沉浸媒体的封装灵活性,无法适用于沉浸媒体的多种场景,例如视角自适应传输场景。
基于此，本申请实施例对轨道组数据盒进行扩展得到独立编解码区域描述数据盒（IndependentlyCodedRegionDescriptionBox），使得所有同属一个沉浸媒体（如同一个节目或者同一个内容），且在空间上、清晰度上存在关联关系的轨道均可被定义在同一个轨道组中，即同一视频内容的不同空间分块、不同分辨率视频对应的轨道均属于同一个轨道组。由于不同分辨率版本的视频可能在空间上分别划分，此时不同分辨率的分块视频使用不同的坐标系，由坐标信息数据盒（CoordinateInfoBox）表示。每个分块视频的坐标信息则由独立编解码区域数据盒（IndependentlyCodedRegionBox）表示。该独立编解码区域描述数据盒的语法及语义可参见下述表2：
表2
（表2的语法原文为图片（Figure PCTCN2021085907-appb-000002），未在文本中复现；表中各字段的语义见下文①-⑨的说明。）
上述表2所示语法的语义如下①-⑨：
①一个独立编解码区域对应一个坐标系标识字段coordinate_id。一个独立编解码区域对应一个分块视频，则N个独立编解码区域对应N个分块视频，N个独立编解码区域对应N个坐标系标识字段。第i个独立编解码区域的坐标系标识字段，指示第i个分块视频所属的坐标系，相同分辨率的分块视频属于同一个坐标系，其中i,N为正整数，且i≤N。
②一个独立编解码区域对应一个完整视频的高度字段total_height和一个完整视频的宽度字段total_width，则N个独立编解码区域对应N个完整视频的高度字段和N个完整视频的宽度字段。第i个独立编解码区域的完整视频的高度字段，指示第i个分块视频所属坐标系下的完整视频的高度；第i个独立编解码区域的完整视频的宽度字段，指示第i个分块视频所属坐标系下的完整视频的宽度。可以理解的是，完整视频的尺寸由坐标系标识字段、完整视频的高度和完整视频的宽度共同指示。
③一个独立编解码区域对应一个独立编解码区域的顶点在所属坐标系中的横坐标字段region_vertex_x和纵坐标字段region_vertex_y，则N个独立编解码区域对应N个独立编解码区域的顶点在所属坐标系中的横坐标字段和纵坐标字段。第i个独立编解码区域的顶点在所属坐标系中的横坐标字段和纵坐标字段，指示第i个独立编解码区域的顶点的横坐标和纵坐标。独立编解码区域为矩形区域，该独立编解码区域的顶点可以是指矩形区域的左上顶点、左下顶点、右上顶点或者右下顶点。
④一个独立编解码区域对应一个独立编解码区域的高度字段region_height和一个独立编解码区域的宽度字段region_width，则N个独立编解码区域对应N个独立编解码区域的高度字段和N个独立编解码区域的宽度字段。第i个独立编解码区域的高度字段，指示第i个独立编解码区域的高度；第i个独立编解码区域的宽度字段，指示第i个独立编解码区域的宽度。第i个独立编解码区域在所属坐标系中的位置由独立编解码区域的顶点在所属坐标系中的横坐标字段、纵坐标字段、独立编解码区域的高度字段和独立编解码区域的宽度字段共同指示。
⑤一个独立编解码区域对应一个非独立呈现标志字段track_not_alone_flag,则N个独立编解码区域对应N个非独立呈现标志字段。当第i个独立编解码区域的非独立呈现标志字段为有效值时,指示第i个独立编解码区域与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域同时呈现;当第i个独立编解码区域的非独立呈现标志字段为无效值时,指示第i个独立编解码区域与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域可以不同时呈现。
⑥一个独立编解码区域对应一个合流标志字段track_not_mergeable_flag,则N个独立编解码区域对应N个合流标志字段。当第i个独立编解码区域的合流标志字段为无效值时,指示第i个独立编解码区域所属轨道所包含的码流能够与第i个独立编解码区域所属轨道组中的其他轨道所包含的码流合并;当第i个独立编解码区域的合流标志字段为有效值时,指示第i个独立编解码区域所属轨道所包含的码流不能够与第i个独立编解码区域所属轨道组中的其他轨道所包含的码流合并。
⑦一个独立编解码区域对应一个轨道优先级信息标志字段track_priority_info_flag,则N个独立编解码区域对应N个轨道优先级信息标志字段。当第i个独立编解码区域的轨道优先级信息标志字段为无效值时,指示第i个独立编解码区域所属轨道组中的各个独立编解码区域的优先级相同;当第i个独立编解码区域的轨道优先级信息标志字段为有效值时,第i个独立编解码区域的优先级由轨道优先级字段track_priority指示,轨道优先级字段的值越小,第i个独立编解码区域的优先级越高。当第i个独立编解码区域的清晰度高于第j个独立编解码区域的清晰度时,第i个独立编解码区域的优先级高于第j个独立编解码区域的优先级,其中j为正整数,j≤N且j≠i。
⑧一个独立编解码区域对应一个轨道重叠信息标志字段track_overlap_info_flag,则N个独立编解码区域对应N个轨道重叠信息标志字段。当第i个独立编解码区域的轨道重叠信息标志字段为无效值时,指示第i个独立编解码区域在被显示时不与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域重叠;当第i个独立编解码区域的轨道重叠信息标志字段为有效值时,第i个独立编解码区域的显示方式由背景标志字段background_flag指示。当背景标志字段为无效值时,指示第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域的前景画面被显示;当背景标志字段为有效值时,指示第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域的背景画面被显示。
⑨当第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域的背景画面被显示时，第i个独立编解码区域的透明度字段opacity，指示第i个独立编解码区域作为背景画面被显示时的透明度。当透明度字段的值等于0时，第i个独立编解码区域被显示为透明背景画面；当透明度字段的值大于0时，第i个独立编解码区域被显示为非透明背景画面。
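同样地，表2的语法原文为图片；结合上文①-⑨的字段语义，下面用Python数据类给出独立编解码区域描述数据盒的逻辑示意草图（各字段的二进制排布、位宽及取值编码以原表为准，此处仅帮助理解字段之间的关系）：

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CoordinateInfo:
    """坐标信息数据盒（CoordinateInfoBox）的逻辑示意。"""
    coordinate_id: int  # 所属坐标系标识，相同分辨率的分块视频共用同一坐标系（语义①）
    total_width: int    # 该坐标系下完整视频的宽（语义②）
    total_height: int   # 该坐标系下完整视频的高（语义②）

@dataclass
class IndependentlyCodedRegion:
    """独立编解码区域数据盒（IndependentlyCodedRegionBox）的逻辑示意。"""
    region_vertex_x: int                    # 区域顶点在所属坐标系中的横坐标（语义③）
    region_vertex_y: int                    # 区域顶点在所属坐标系中的纵坐标（语义③）
    region_width: int                       # 区域的宽（语义④）
    region_height: int                      # 区域的高（语义④）
    track_not_alone_flag: bool              # 是否必须与同组其他区域同时呈现（语义⑤）
    track_not_mergeable_flag: bool          # 码流能否与同组其他轨道码流合并（语义⑥）
    track_priority_info_flag: bool          # 各区域优先级是否不同（语义⑦）
    track_priority: Optional[int] = None    # 值越小优先级越高，仅在上一标志有效时存在
    track_overlap_info_flag: bool = False   # 是否可能与其他区域重叠显示（语义⑧）
    background_flag: Optional[bool] = None  # 重叠时作为背景（True）或前景（False）显示
    opacity: Optional[int] = None           # 作为背景显示时的透明度，0表示完全透明（语义⑨）
```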
与独立编解码区域描述数据盒对应的描述信息被存放于本申请实施例提供的独立编解码区域描述信令文件中,该独立编解码区域描述信令文件被封装于沉浸媒体的媒体呈现描述文件中的自适应集层级中。独立编解码区域描述信令文件应包含下表3中定义的元素和属性。
表3
（表3原文为图片（Figure PCTCN2021085907-appb-000003、Figure PCTCN2021085907-appb-000004），未在文本中复现；其定义的元素和属性见下文说明。）
由上述表3可知,本申请实施例中的独立编解码区域描述信令文件中包括元素和属性:IndependentlyCodedRegionGroupId、IndependentlyCodedRegionGroupId@coordinateId、IndependentlyCodedRegionGroupId@trackPriority以及IndependentlyCodedRegionGroupId@backgroundFlag及这些元素和属性的相关描述。
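结合表3，下面的Python片段示意如何在媒体呈现描述文件（MPD）的自适应集层级生成携带上述元素与属性的XML。需要说明的是，该信令在MPD中的具体承载元素、命名空间及属性写法以相关规范为准，此处的XML结构仅为基于表3描述的假设性示意：

```python
import xml.etree.ElementTree as ET

# 基于表3字段的假设性示意：在AdaptationSet下写入独立编解码区域描述信令
adaptation_set = ET.Element("AdaptationSet", id="1")
icr = ET.SubElement(adaptation_set, "IndependentlyCodedRegionGroupId")
icr.text = "1"                    # 轨道组标识
icr.set("coordinateId", "1")      # 所属坐标系标识
icr.set("trackPriority", "0")     # 优先级（可选，值越小优先级越高）
icr.set("backgroundFlag", "1")    # 是否作为背景呈现（可选）

print(ET.tostring(adaptation_set, encoding="unicode"))
# 输出形如：
# <AdaptationSet id="1"><IndependentlyCodedRegionGroupId coordinateId="1"
#   trackPriority="0" backgroundFlag="1">1</IndependentlyCodedRegionGroupId></AdaptationSet>
```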
按照本申请实施例的上述表2所示的独立编解码区域描述数据盒，结合表3所示的独立编解码区域描述信令文件，内容制作设备将同一视频中的多个分块视频的轨道存放在同一个轨道组中，可以支持更多当前主流的沉浸媒体视角自适应传输技术，使得视频传输过程更加可靠，同时也避免了内容制作设备在存储不同版本视频时带来的不必要内存开销。通过生成对应的独立编解码区域描述数据盒，使得内容播放设备在呈现沉浸媒体时更加便捷。
图2示出了本申请实施例提供的一种沉浸媒体的数据处理方法的流程图;该方法可由沉浸媒体系统中的内容制作设备或者内容播放设备来执行,该方法包括以下步骤S201-S202:
S201,获取沉浸媒体的第i个独立编解码区域的独立编解码区域描述数据盒,第i个独立编解码区域对应第i个分块视频;独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒,其中i,N为正整数,且i≤N。
其中，沉浸媒体包括N个分块视频，N个分块视频分别被封装至N个轨道中，第i个分块视频被封装在第i个轨道中；N个轨道属于同一个轨道组。
S202,根据独立编解码区域描述数据盒显示沉浸媒体的第i个分块视频。
步骤S201-S202中，沉浸媒体的独立编解码区域描述数据盒的语法可以参见上述表2。其中，坐标信息数据盒用于指示不同分辨率的分块视频所使用的坐标系，即坐标信息数据盒中各个字段的值是根据沉浸媒体被划分后，不同分辨率的分块视频所使用的坐标系配置的；例如，分辨率为4K（4096×2160像素）的分块视频1~分块视频6所使用的坐标系为坐标系1，分辨率为2K的分块视频7~分块视频12所使用的坐标系为坐标系2。独立编解码区域数据盒用于指示每个分块视频的坐标信息（如分块视频的大小、在所属坐标系中的位置等）及每个分块视频在沉浸媒体中的显示方式，显示方式可以包括但不限于：是否独立显示，显示时是否与其他分块视频重叠，显示时分块视频的透明度等。
对于内容制作设备来说,还可以根据沉浸媒体的N个分块视频的封装过程生成独立编解码区域描述信令文件,独立编解码区域描述信令文件包括独立编解码区域描述数据盒的描述信息。独立编解码区域描述信令文件的语法可参见表3。
相应的,对于内容播放设备来说,在获取沉浸媒体的打包文件之前可先获取沉浸媒体的媒体呈现描述文件,进而从媒体呈现描述文件中的自适应集层级中获取独立编解码区域描述信令文件。内容播放设备根据用户需求(如用户当前视角)及独立编解码区域描述信令文件向内容制作设备请求对应沉浸媒体的打包文件。
本申请实施例将属于同一沉浸媒体的N个分块视频（具备相同分辨率或不同分辨率）封装至N个轨道中，并且该N个轨道被封装至同一轨道组中；同时引入与各个分块视频相对应的独立编解码区域的概念，通过第i个独立编解码区域的独立编解码区域描述数据盒来指示第i个轨道与轨道组中其他轨道之间的消费关系，当根据第i个独立编解码区域的独立编解码区域描述数据盒来对第i个独立编解码区域进行显示时，可以获得更加便捷、准确的呈现效果。
图3示出了本申请实施例提供的另一种沉浸媒体的数据处理方法的流程图；该方法由沉浸媒体系统中的内容制作设备来执行，该方法包括以下步骤S301-S303：
S301,将沉浸媒体划分为N个分块视频。
划分的依据包括以下至少一个：空间、视角及分辨率等；例如，依据用户的视角将沉浸媒体划分为前、后、左、右4个区域，再按照划分规则（如预设的独立编解码区域的尺寸，或者独立编解码区域的数量）对不同视角对应的区域进行进一步划分，得到N个分块视频。
S302,分别将N个分块视频封装至N个轨道中,第i个分块视频被封装在第i个轨道中;第i个分块视频对应第i个独立编解码区域,其中i,N为正整数,且i≤N;N个轨道属于同一个轨道组。
在一种实施方式中,轨道组中各个轨道中的分块视频的分辨率相同。
在另一种实施方式中,轨道组中存在第i个轨道中分块视频的分辨率与第j个轨道中分块视频的分辨率不同,其中j为正整数,j≤N且j≠i;即同一视频内容的不同空间分块、不同分辨率视频对应的轨道被存放于同一个轨道组中。
S303,根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒,该独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒。
结合上述表2,步骤S303生成第i个独立编解码区域的独立编解码区域描述数据盒的过程可包括以下(1)-(8):
(1)坐标信息数据盒包括坐标系标识字段coordinate_id。一个独立编解码区域对应一个坐标系标识字段,根据第i个分块视频的分辨率确定第i个独立编解码区域所属的坐标系,并根据该坐标系的标识配置第i个独立编解码区域的坐标系标识字段的值。当沉浸媒体的第i个分块视频的分辨率与第j个分块视频的分辨率相同时,第i个独立编解码区域与第j个独立编解码区域属于同一坐标系,其中j为正整数,j≤N且j≠i。
(2)坐标信息数据盒包括完整视频的高度字段total_height和完整视频的宽度字段total_width,一个独立编解码区域对应一个完整视频的高度字段和一个完整视频的宽度字段。完整视频是由第i个独立编解码区域所属坐标系下所有独立编解码区域对应的分块视频组成的。获取第i个独立编解码区域所属坐标系下所有独立编解码区域对应的分块视频组成的完整视频的高度和宽度,将获取到的完整视频的高度配置为完整视频的高度字段的值,将获取到的完整视频的宽度配置为完整视频的宽度字段的值。
（3）独立编解码区域数据盒包括独立编解码区域的横坐标字段region_vertex_x和纵坐标字段region_vertex_y，一个独立编解码区域对应一个横坐标字段和一个纵坐标字段。获取第i个独立编解码区域在所属坐标系中顶点的横坐标的值和纵坐标的值，将获取到的第i个独立编解码区域在所属坐标系中横坐标的值配置为第i个独立编解码区域在所属坐标系中的横坐标字段的值，以及将获取到的第i个独立编解码区域在所属坐标系中纵坐标的值配置为第i个独立编解码区域在所属坐标系中的纵坐标字段的值。独立编解码区域为矩形区域，该独立编解码区域的顶点可以是指矩形区域的左上顶点、左下顶点、右上顶点或者右下顶点。
（4）独立编解码区域数据盒包括独立编解码区域的高度字段region_height和独立编解码区域的宽度字段region_width，一个独立编解码区域对应一个独立编解码区域的高度字段和一个独立编解码区域的宽度字段。获取第i个独立编解码区域的高度和宽度，将获取到的第i个独立编解码区域的高度配置为第i个独立编解码区域的高度字段的值，以及将获取到的第i个独立编解码区域的宽度配置为第i个独立编解码区域的宽度字段的值。
(5)独立编解码区域数据盒包括独立编解码区域的非独立呈现标志字段track_not_alone_flag,一个独立编解码区域对应一个非独立呈现标志字段。若第i个独立编解码区域与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域同时呈现时,则将第i个独立编解码区域的非独立呈现标志字段配置为有效值;若第i个独立编解码区域与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域可以不同时呈现时,则将第i个独立编解码区域的非独立呈现标志字段配置为无效值。
(6)独立编解码区域数据盒包括独立编解码区域的合流标志字段track_not_mergeable_flag,一个独立编解码区域对应一个合流标志字段。若第i个独立编解码区域所属轨道所包含的码流能够与第i个独立编解码区域所属轨道组中的其他轨道所包含的码流直接合并(即轨道间编码方式相同),则将第i个独立编解码区域的合流标志字段配置为无效值;若第i个独立编解码区域所属轨道所包含的码流不能与第i个独立编解码区域所属轨道组中的其他轨道所包含的码流直接合并(即轨道间编码方式不同),则将第i个独立编解码区域的合流标志字段配置为有效值。
（7）独立编解码区域数据盒包括独立编解码区域的轨道优先级信息标志字段track_priority_info_flag，一个独立编解码区域对应一个轨道优先级信息标志字段。若第i个独立编解码区域所属轨道组中的各个独立编解码区域的优先级相同，则将第i个独立编解码区域的轨道优先级信息标志字段配置为无效值。若第i个独立编解码区域所属轨道组中的各个独立编解码区域的优先级不相同，则将第i个独立编解码区域的轨道优先级信息标志字段配置为有效值。在第i个独立编解码区域所属轨道组中的各个独立编解码区域的优先级不相同的情况下，独立编解码区域数据盒还包括第i个独立编解码区域的轨道优先级字段track_priority。第i个独立编解码区域的优先级由以下至少一项决定：第i个独立编解码区域的分辨率、第i个独立编解码区域所属轨道的呈现优先级、第i个独立编解码区域所属轨道的传输优先级。将第i个独立编解码区域的优先级配置为第i个独立编解码区域的轨道优先级字段的值。
在一种实施方式中,第i个独立编解码区域的分辨率越高,则配置的第i个独立编解码区域的轨道优先级字段的值越小;同理,第i个独立编解码区域所属轨道的呈现优先级越高,则配置的第i个独立编解码区域的轨道优先级字段的值越小;第i个独立编解码区域所属轨道的传输优先级越高,则配置的第i个独立编解码区域的轨道优先级字段的值越小。
（8）独立编解码区域数据盒包括独立编解码区域的轨道重叠信息标志字段track_overlap_info_flag，一个独立编解码区域对应一个轨道重叠信息标志字段。若要求第i个独立编解码区域不与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域重叠显示，则将第i个独立编解码区域轨道重叠信息标志字段配置为无效值。若要求第i个独立编解码区域与第i个独立编解码区域所属轨道组中的第j个独立编解码区域重叠显示，则将第i个独立编解码区域轨道重叠信息标志字段配置为有效值，其中j为正整数，且j≠i。在第i个独立编解码区域与第i个独立编解码区域所属轨道组中的第j个独立编解码区域重叠显示的情况下，独立编解码区域数据盒还包括第i个独立编解码区域的背景标志字段background_flag。若要求第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的第j个独立编解码区域的前景画面被显示，则将第i个独立编解码区域的背景标志字段配置为无效值。若要求第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的第j个独立编解码区域的背景画面被显示，则将第i个独立编解码区域的背景标志字段配置为有效值。在第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的第j个独立编解码区域的背景画面被显示的情况下，独立编解码区域数据盒还包括第i个独立编解码区域的透明度字段opacity。若第i个独立编解码区域被要求显示为透明背景画面，则将第i个独立编解码区域的透明度字段的值配置为0；若第i个独立编解码区域被要求显示为非透明背景画面，则根据第i个独立编解码区域的透明度配置第i个独立编解码区域的透明度字段的值，其中，第i个独立编解码区域的透明度字段的值大于或等于0。需要说明的是，作为前景画面呈现的不同的两个独立编解码区域之间不能相互重叠。
另外,还可以根据沉浸媒体的N个分块视频的封装过程生成独立编解码区域描述信令文件,独立编解码区域描述信令文件包括独立编解码区域描述数据盒的描述信息。独立编解码区域描述信令文件的语法可参见表3,独立编解码区域描述信令文件中各个字段的配置方式可参考上述独立编解码区域描述数据盒中对应字段的配置方式,在此不再赘述。
例如,如图4A所示,内容制作设备将沉浸媒体划分为6个分块视频,并将分块视频码流1~分块视频码流6分别封装在轨道1~轨道6中。由于轨道1~轨道6中的分块视频是属于同一个视频内容的不同分块视频,因此轨道1~轨道6属于同一个轨道组(trackgroup),假设轨道组标识为1,则配置track_group_id=1。且由于轨道1~轨道6对应的分块视频属于同一分辨率,则轨道1~轨道6共用一个坐标系,假设坐标系ID的值为1,则配置coordinate_id=1。假设完整视频帧的宽高分别为600,200,则配置total_width=600,total_height=200。由此得到轨道1~轨道6的坐标信息数据盒。假设所有坐标系的原点(0,0)为视频帧的左上角,x轴由左向右,y轴由上向下。轨道1~轨道6中独立编解码区域对应的独立编解码区域数据盒中各个独立编解码区域的左上顶点坐标分别为:(0,0)、(200,0)、(400,0)、(0,100)、(200,100)、(400,100),独立编解码区域的宽高分别为200,100,即region_height=100,region_width=200。由于轨道1~轨道6分辨率相同且均为前景画面,因此track_priority_info_flag以及track_overlap_info_flag取值均为0。内容制作设备将独立编解码区域描述信令文件发送给用户,其中:IndependentlyCodedRegionGroupId配置为1;IndependentlyCodedRegionGroupId@coordinateId配置为1;由于轨道1~轨道6的分辨率相同,其优先级都相同,且均作为前景呈现,因此IndependentlyCodedRegionGroupId@trackPriority和IndependentlyCodedRegionGroupId@backgroundFlag均不包含在独立编解码区域描述信令文件中。根据用户观看视角和内容播放设备一次性消费的视野区域大小,内容播放设备向内容制作设备请求轨道2与轨道5对应的视频文件。内容制作设备将轨道2与轨道5打包为沉浸媒体的打包文件,传输给内容播放设备,文件的轨道中包含上述坐标信息数据盒和独立编解码区域数据盒。
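以图4A的例子为基础，下面的Python草图按3列2行的网格推导轨道1~轨道6的坐标信息数据盒与独立编解码区域数据盒取值，并与上文列出的左上顶点坐标相验证（复用前文示意的数据类；track_not_alone_flag等未在本例中明确的字段，此处示意性地取0）：

```python
# 复用前文定义的CoordinateInfo与IndependentlyCodedRegion数据类
coord = CoordinateInfo(coordinate_id=1, total_width=600, total_height=200)

tracks = []
for i in range(6):                        # 3列x2行的网格，对应轨道1~轨道6
    col, row = i % 3, i // 3
    tracks.append(IndependentlyCodedRegion(
        region_vertex_x=col * 200,        # 左上顶点横坐标
        region_vertex_y=row * 100,        # 左上顶点纵坐标
        region_width=200, region_height=100,
        track_not_alone_flag=False,
        track_not_mergeable_flag=False,
        track_priority_info_flag=False,   # 同分辨率，优先级相同，取0
        track_overlap_info_flag=False))   # 均为前景且互不重叠，取0

assert [(t.region_vertex_x, t.region_vertex_y) for t in tracks] == \
       [(0, 0), (200, 0), (400, 0), (0, 100), (200, 100), (400, 100)]
```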
又如，如图4B所示，内容制作设备将沉浸媒体划分为12个分块视频，并将高分辨率（Resolution1）视频的分块视频码流1~分块视频码流6分别封装在轨道1~轨道6中，低分辨率（Resolution2）视频的分块视频码流1~分块视频码流6分别封装在轨道7~轨道12中。由于轨道1~轨道12属于同一个视频内容的不同分块，因此轨道1~轨道12属于同一个轨道组（trackgroup），假设轨道组标识为1，则配置track_group_id=1。且由于轨道1~轨道6对应的分块视频属于同一分辨率，则轨道1~轨道6共用一个坐标系，假设坐标系ID的值为1，则配置coordinate_id=1。同理轨道7~轨道12对应另一个坐标系，假设坐标系ID的值为2，则配置coordinate_id=2。假设完整视频帧的宽高分别为600,200，则配置total_width=600，total_height=200。由此得到轨道1~轨道6的坐标信息数据盒。假设低分辨率完整视频帧的宽高为300,100，则坐标系2对应total_width=300，total_height=100。由此得到轨道7~轨道12的坐标信息数据盒。可见，轨道1~轨道6的坐标信息数据盒相同，轨道7~轨道12的坐标信息数据盒相同。假设所有坐标系的原点(0,0)为视频帧的左上角，x轴由左向右，y轴由上向下。轨道1~轨道6中独立编解码区域对应的独立编解码区域数据盒中各个独立编解码区域的左上顶点坐标分别为：(0,0)、(200,0)、(400,0)、(0,100)、(200,100)、(400,100)，独立编解码区域的宽高分别为200,100，即region_height=100，region_width=200。轨道7~轨道12中独立编解码区域对应的独立编解码区域数据盒中各个独立编解码区域的左上顶点坐标分别为：(0,0)、(100,0)、(200,0)、(0,50)、(100,50)、(200,50)，独立编解码区域的宽高分别为100,50，即region_height=50，region_width=100。内容制作设备将独立编解码区域描述信令文件发送给用户，高分辨率（Resolution1）视频对应的自适应层级（Adaptation Set）中：IndependentlyCodedRegionGroupId取值为1；IndependentlyCodedRegionGroupId@coordinateId取值为1；IndependentlyCodedRegionGroupId@trackPriority取值为0；IndependentlyCodedRegionGroupId@backgroundFlag不包含在独立编解码区域描述信令文件中。低分辨率（Resolution2）视频对应的自适应层级（Adaptation Set）中：IndependentlyCodedRegionGroupId取值为1；IndependentlyCodedRegionGroupId@coordinateId取值为2；IndependentlyCodedRegionGroupId@trackPriority取值为1；IndependentlyCodedRegionGroupId@backgroundFlag取值为1。根据用户观看视角和内容播放设备一次性消费的视野区域大小，内容播放设备向内容制作设备请求轨道2、轨道5以及轨道7、轨道10对应的视频文件。内容制作设备将轨道2、轨道5以及轨道7、轨道10打包为沉浸媒体的打包文件，传输给内容播放设备。此时由于打包文件中包含两种不同分辨率的视频，且低分辨率独立编解码区域作为高分辨率独立编解码区域的背景呈现，因此：由于轨道1~轨道6对应的完整视频的分辨率更高，因此track_priority_info_flag取值为1，且轨道1~轨道6对应的track_priority取值更小且相同，假设为0，轨道7~轨道12对应的track_priority取值更大，假设为1。由于高分辨率独立编解码区域可能和低分辨率独立编解码区域存在重叠，因此对于轨道1~轨道12，其track_overlap_info_flag取值均为1。轨道1~轨道6作为前景画面呈现，因此background_flag取值为0。轨道7~轨道12作为背景画面呈现，因此background_flag取值为1，且假设重叠部分的透明度为100%，则opacity取值为0。文件的轨道中包含上述坐标信息数据盒和独立编解码区域数据盒。
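对于图4B的双分辨率例子，下面的草图按同样方式示意两套坐标系以及优先级、重叠与背景相关字段的取值（复用前文示意的数据类；按上文假设，高分辨率区域优先级取0并作为前景，低分辨率区域优先级取1并作为完全透明的背景，即opacity取0）：

```python
hi_coord = CoordinateInfo(coordinate_id=1, total_width=600, total_height=200)  # 轨道1~轨道6
lo_coord = CoordinateInfo(coordinate_id=2, total_width=300, total_height=100)  # 轨道7~轨道12

def make_region(col, row, w, h, priority, background):
    return IndependentlyCodedRegion(
        region_vertex_x=col * w, region_vertex_y=row * h,
        region_width=w, region_height=h,
        track_not_alone_flag=False, track_not_mergeable_flag=False,
        track_priority_info_flag=True, track_priority=priority,  # 值越小优先级越高
        track_overlap_info_flag=True,                            # 高低分辨率区域可能重叠
        background_flag=background,
        opacity=0 if background else None)                       # 背景重叠部分完全透明

hi_tracks = [make_region(i % 3, i // 3, 200, 100, priority=0, background=False) for i in range(6)]
lo_tracks = [make_region(i % 3, i // 3, 100, 50, priority=1, background=True) for i in range(6)]

assert [(t.region_vertex_x, t.region_vertex_y) for t in lo_tracks] == \
       [(0, 0), (100, 0), (200, 0), (0, 50), (100, 50), (200, 50)]
```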
本申请实施例将属于同一沉浸媒体的N个分块视频（具备相同分辨率或不同分辨率）封装至N个轨道中，并且该N个轨道被封装至同一轨道组中；这样可以适用于更多的传输场景，例如适用于沉浸媒体的视角自适应传输场景；并且使得沉浸媒体的传输过程更加可靠，也避免了内容制作设备在存储不同版本视频时带来的不必要内存开销。同时引入与各个分块视频相对应的独立编解码区域的概念，根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒，通过第i个独立编解码区域的独立编解码区域描述数据盒来指示第i个轨道与轨道组中其他轨道之间的消费关系；那么，当该独立编解码区域描述数据盒被传输至内容消费设备侧时，内容消费设备侧可以根据第i个独立编解码区域的独立编解码区域描述数据盒来对第i个独立编解码区域进行显示，这样可以获得更加便捷、准确的呈现效果。
图5示出了本申请实施例提供的另一种沉浸媒体的数据处理方法的流程图;该方法由沉浸媒体系统中的内容播放设备来执行,该方法包括以下步骤S501-S503:
S501，获取沉浸媒体的打包文件，沉浸媒体包括N个分块视频，N个分块视频分别被封装至N个轨道中，第i个分块视频被封装在第i个轨道中；N个轨道属于同一个轨道组；第i个分块视频对应第i个独立编解码区域；打包文件至少包括第i个轨道，第i个轨道中包含第i个独立编解码区域的独立编解码区域描述数据盒，其中i,N为正整数，且i≤N。
沉浸媒体的打包文件是将同一轨道组中的一个或多个轨道进行封装打包得到的。在一种实施方式中,打包文件的打包策略是由沉浸媒体的内容制作者预先设置的(如根据沉浸媒体的剧情设置)。在另一种实施方式中,打包文件的打包策略是根据内容播放设备的请求动态设置的(如根据不同的用户视角设置)。
S502,对打包文件进行解封处理得到第i个独立编解码区域的独立编解码区域描述数据盒,该独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒。
内容播放设备对打包文件进行解封装处理,得到打包文件中的一个或多个轨道以及各个轨道对应的独立编解码区域描述数据盒。
S503,根据独立编解码区域描述数据盒显示沉浸媒体的第i个分块视频。
结合上述表2,步骤S503中根据独立编解码区域描述数据盒显示沉浸媒体的第i个分块视频的过程可包括以下(1)-(8):
(1)坐标信息数据盒包括坐标系标识字段coordinate_id,一个独立编解码区域对应一个坐标系标识字段,根据第i个独立编解码区域的坐标系标识字段确定第i个独立编解码区域所属的坐标系。当沉浸媒体的第i个分块视频的分辨率与第j个分块视频的分辨率相同时,第i个独立编解码区域与第j个独立编解码区域属于同一坐标系,其中j为正整数,j≤N且j≠i。
(2)坐标信息数据盒包括完整视频的高度字段total_height和完整视频的宽度字段total_width,一个独立编解码区域对应一个完整视频的高度字段和一个完整视频的宽度字段,且一个独立编解码区域对应一个分块视频。完整视频是由第i个独立编解码区域所属坐标系下所有独立编解码区域对应的分块视频组成的。根据第i个分块视频所属坐标系下的完整视频的高度字段和完整视频的宽度字段确定第i个分块视频所属坐标系下的完整视频的尺寸。
（3）独立编解码区域数据盒包括独立编解码区域的横坐标字段region_vertex_x和纵坐标字段region_vertex_y，一个独立编解码区域对应一个横坐标字段和一个纵坐标字段。根据第i个独立编解码区域的横坐标字段和纵坐标字段确定第i个独立编解码区域顶点在第i个独立编解码区域所属坐标系中的坐标。独立编解码区域为矩形区域，该独立编解码区域的顶点可以是指矩形区域的左上顶点、左下顶点、右上顶点或者右下顶点。
(4)独立编解码区域数据盒包括独立编解码区域的高度字段region_height和独立编解码区域的宽度字段region_width,一个独立编解码区域对应一个独立编解码区域的高度字段和一个独立编解码区域的宽度字段。根据第i个独立编解码区域的高度字段和宽度字段确定第i个独立编解码区域的尺寸。
(5)独立编解码区域数据盒包括独立编解码区域的非独立呈现标志字段track_not_alone_flag,一个独立编解码区域对应一个非独立呈现标志字段。当第i个独立编解码区域的非独立呈现标志字段为有效值时,将第i个独立编解码区域与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域同时呈现。当第i个独立编解码区域的非独立呈现标志字段为无效值时,第i个独立编解码区域与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域可以不同时呈现。
（6）独立编解码区域数据盒包括独立编解码区域的合流标志字段track_not_mergeable_flag，一个独立编解码区域对应一个合流标志字段。当第i个独立编解码区域的合流标志字段为无效值时，第i个独立编解码区域所属轨道所包含的码流能够与第i个独立编解码区域所属轨道组中的其他轨道所包含的码流直接合并。当第i个独立编解码区域的合流标志字段为有效值时，第i个独立编解码区域所属轨道所包含的码流不能与第i个独立编解码区域所属轨道组中的其他轨道所包含的码流直接合并。
(7)独立编解码区域数据盒包括独立编解码区域的轨道优先级信息标志字段track_priority_info_flag,一个独立编解码区域对应一个轨道优先级信息标志字段。当第i个独立编解码区域的轨道优先级信息标志字段为无效值时,第i个独立编解码区域所属轨道组中的各个独立编解码区域的优先级相同。在第i个独立编解码区域的轨道优先级信息标志字段为有效值的情况下,独立编解码区域数据盒还包括第i个独立编解码区域的轨道优先级字段track_priority。根据第i个独立编解码区域的轨道优先级字段确定第i个独立编解码区域的分辨率、第i个独立编解码区域所属轨道的呈现优先级、第i个独立编解码区域所属轨道的传输优先级等。
在一种实施方式中,第i个独立编解码区域的轨道优先级字段的值越小,则第i个独立编解码区域的分辨率越高;同理,第i个独立编解码区域的轨道优先级字段的值越小,则第i个独立编解码区域所属轨道的呈现优先级越高;第i个独立编解码区域的轨道优先级字段的值越小,则第i个独立编解码区域所属轨道的传输优先级越高。
（8）独立编解码区域数据盒包括独立编解码区域的轨道重叠信息标志字段track_overlap_info_flag，一个独立编解码区域对应一个轨道重叠信息标志字段。当第i个独立编解码区域轨道重叠信息标志字段为无效值时，第i个独立编解码区域在被显示时不与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域重叠。当第i个独立编解码区域轨道重叠信息标志字段为有效值时，将第i个独立编解码区域与第i个独立编解码区域所属轨道组中的第j个独立编解码区域重叠显示，其中j为正整数，且j≠i。在第i个独立编解码区域轨道重叠信息标志字段为有效值的情况下，独立编解码区域数据盒还包括第i个独立编解码区域的背景标志字段background_flag。当第i个独立编解码区域的背景标志字段为无效值时，将第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的第j个独立编解码区域的前景画面显示。当第i个独立编解码区域的背景标志字段为有效值时，将第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的第j个独立编解码区域的背景画面显示。在第i个独立编解码区域的背景标志字段为有效值的情况下，独立编解码区域数据盒还包括第i个独立编解码区域的透明度字段opacity。当第i个独立编解码区域的透明度字段的值为0时，将第i个独立编解码区域显示为透明背景画面。当第i个独立编解码区域的透明度字段的值大于0时，将第i个独立编解码区域显示为非透明背景画面，第i个独立编解码区域的透明度根据第i个独立编解码区域的透明度字段的值决定。需要说明的是，作为前景画面呈现的不同的两个独立编解码区域之间不能相互重叠。
在一种实施方式中,第i个独立编解码区域的透明度字段的取值范围为[0,100],取值为0表示背景画面完全透明,取值为100表示背景画面完全不透明,大于100的取值保留。
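按上述[0,100]的透明度取值约定，内容播放设备在重叠区域合成前景画面与背景画面时，可以采用类似如下的混合方式。该草图只是对透明度语义的一种示意性理解（有前景像素处显示前景，其余位置按透明度显示背景），实际播放器的渲染实现可能不同：

```python
import numpy as np

def composite(foreground, background, fg_mask, opacity):
    """opacity取值范围[0,100]：0表示背景画面完全透明，100表示完全不透明。"""
    alpha = opacity / 100.0
    canvas = alpha * background                   # 背景画面先按透明度衰减
    return np.where(fg_mask, foreground, canvas)  # 有前景像素处显示前景

fg = np.full((2, 2), 200.0)
bg = np.full((2, 2), 100.0)
mask = np.array([[True, False], [False, True]])   # True处存在前景画面
print(composite(fg, bg, mask, opacity=0))         # 背景完全透明：非前景位置输出0
```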
另外，内容播放设备在获取沉浸媒体的封装文件之前，可先获取沉浸媒体的MPD文件，进而从媒体呈现描述文件中的自适应集层级中获取独立编解码区域描述信令文件。内容播放设备根据用户需求（如用户当前视角）及独立编解码区域描述信令文件向内容制作设备请求对应沉浸媒体的打包文件，并按照上述步骤（1）-步骤（8）的实施方式显示沉浸媒体。
例如,如图4A所示,内容播放设备将收到的沉浸媒体的打包文件解封装,由于轨道2与轨道5均属于同一个轨道组,且轨道组类型为'icrr',内容播放设备因此获悉轨道2与轨道5包含的内容为两个独立编解码区域。内容播放设备将轨道2与轨道5分别解码后,根据独立编解码区域描述数据盒中的坐标信息,呈现视频内容并消费。
又如,如图4B所示,内容播放设备将收到的沉浸媒体的打包文件解封装,由于轨道2、轨道5、轨道7、轨道10均属于同一个轨道组,且轨道组类型为'icrr',客户端因此获悉轨道2、轨道5、轨道7、轨道10包含的内容为四个独立编解码区域。且轨道2、轨道5为同一坐标系,轨道7、轨道10为另一坐标系。由于轨道2、轨道5的background_flag取值为0,其作为前景画面呈现。对应地,轨道7、轨道10作为背景画面呈现。内容播放设备将轨道2、轨道5、轨道7、轨道10分别解码后,根据独立编解码区域描述数据盒中的坐标信息,呈现视频内容并消费。
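结合上述两个消费例子，下面的草图示意内容播放设备在解封装并解码之后，如何利用独立编解码区域描述数据盒中的坐标信息决定各区域的呈现位置与绘制次序：先把各自坐标系下的位置归一化到[0,1]以对齐不同分辨率的坐标系，再按背景画面在先、前景画面在后的顺序绘制（复用前文示意的数据类与变量，仅为示意流程，并非实际播放器实现）：

```python
def render_plan(regions):
    """regions为(CoordinateInfo, IndependentlyCodedRegion)元组的列表，返回归一化的绘制计划。"""
    plan = []
    for coord, r in regions:
        plan.append({
            "background": bool(r.background_flag),
            # 归一化到[0,1]，以便对齐不同分辨率坐标系下的区域位置
            "x": r.region_vertex_x / coord.total_width,
            "y": r.region_vertex_y / coord.total_height,
            "w": r.region_width / coord.total_width,
            "h": r.region_height / coord.total_height,
        })
    return sorted(plan, key=lambda p: not p["background"])  # 背景先绘制，前景后绘制

# 例如图4B例子中请求到的轨道5（前景）与轨道10（背景）：
plan = render_plan([(hi_coord, hi_tracks[4]), (lo_coord, lo_tracks[3])])
```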
本申请实施例将属于同一沉浸媒体的N个分块视频（具备相同分辨率或不同分辨率）封装至N个轨道中，并且该N个轨道被封装至同一轨道组中；同时引入与各个分块视频相对应的独立编解码区域的概念，通过第i个独立编解码区域的独立编解码区域描述数据盒来指示第i个轨道与轨道组中其他轨道之间的消费关系，当根据第i个独立编解码区域的独立编解码区域描述数据盒来对第i个独立编解码区域进行显示时，可以获得更加便捷、准确的呈现效果。
上述详细阐述了本申请实施例的方法,为了便于更好地实施本申请实施例的上述方案,相应地,下面提供了本申请实施例的装置。
请参见图6,图6示出了本申请实施例提供的一种沉浸媒体的数据处理装置的结构示意图;该沉浸媒体的数据处理装置可以是运行于内容制作设备中的一个计算机程序(包括程序代码),例如该沉浸媒体的数据处理装置可以是内容制作设备中的一个应用软件。由图6所示,该沉浸媒体的数据处理装置包括获取单元601和处理单元602。
在一个示例性实施例中，所述沉浸媒体包括N个分块视频，所述N个分块视频分别被封装至N个轨道中，第i个分块视频被封装在第i个轨道中；所述N个轨道属于同一个轨道组；该沉浸媒体的数据处理装置可以用于执行图2所示的方法中的相应步骤；则：
获取单元601,配置为获取沉浸媒体的第i个独立编解码区域的独立编解码区域描述数据盒,所述第i个独立编解码区域对应所述第i个分块视频;所述独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒,其中i,N为正整数,且i≤N;
处理单元602,配置为根据所述独立编解码区域描述数据盒显示所述沉浸媒体的第i个分块视频。
在一种实施方式中,获取单元601还配置为:
获取独立编解码区域描述信令文件,所述独立编解码区域描述信令文件被封装于所述沉浸媒体的媒体呈现描述文件中的自适应集层级中;
所述独立编解码区域描述信令文件包括第i个独立编解码区域的独立编解码区域描述数据盒的描述信息。
在另一个示例性实施例中,该沉浸媒体的数据处理装置可以用于执行图3所示的方法中的相应步骤;则:
处理单元602,配置为将沉浸媒体划分为N个分块视频;以及,
配置为分别将N个分块视频封装至N个轨道中，第i个分块视频被封装在第i个轨道中；第i个分块视频对应第i个独立编解码区域，其中i,N为正整数，且i≤N；所述N个轨道属于同一个轨道组；
配置为根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒,所述独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒。
在一种实施方式中,所述坐标信息数据盒包括第i个独立编解码区域的坐标系标识字段;处理单元602还配置为,根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒;
在一种实施方式中,处理单元602还配置为:
根据第i个分块视频的分辨率确定第i个独立编解码区域所属的坐标系;
根据确定的所述第i个独立编解码区域所属的坐标系配置第i个独立编解码区域的坐标系标识字段的值。
在一种实施方式中,所述坐标信息数据盒包括第i个独立编解码区域所属坐标系下的完整视频的尺寸字段;所述完整视频的尺寸字段包括所述完整视频的高度字段和所述完整视频的宽度字段;
处理单元602还配置为,根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒;
在一种实施方式中,处理单元602还配置为:
获取第i个独立编解码区域所属坐标系下所有独立编解码区域对应的分块视频组成的完整视频的高度和宽度;
将获取到的所述完整视频的高度配置为所述完整视频的高度字段的值,以及将获取到的所述完整视频的宽度配置为所述完整视频的宽度字段的值。
在一种实施方式中,所述独立编解码区域数据盒包括第i个独立编解码区域在所属坐标系中的顶点坐标字段及第i个独立编解码区域的尺寸字段,所述顶点坐标字段包括第i个独立编解码区域在所属坐标系中的横坐标字段和第i个独立编解码区域在所属坐标系中的纵坐标字段,所述尺寸字段包括第i个独立编解码区域的高度字段和第i个独立编解码区域的宽度字段;
所述处理单元602还配置为,根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒;
在一种实施方式中,处理单元602还配置为:
获取第i个独立编解码区域在所属坐标系中顶点的横坐标的值和纵坐标的值;
将所述获取到的第i个独立编解码区域在所属坐标系中横坐标的值配置为第i个独立编解码区域在所属坐标系中的横坐标字段的值,以及将所述获取到的第i个独立编解码区域在所属坐标系中纵坐标的值配置为第i个独立编解码区域在所属坐标系中的纵坐标字段的值;以及
获取第i个独立编解码区域的高度和宽度;
将所述获取到的第i个独立编解码区域的高度配置为第i个独立编解码区域的高度字段的值,以及将所述获取到的第i个独立编解码区域的宽度配置为第i个独立编解码区域的宽度字段的值。
在一种实施方式中,所述独立编解码区域数据盒包括第i个独立编解码区域的非独立呈现标志字段;
处理单元602还配置为,根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒;
在一种实施方式中,处理单元602还配置为:
若第i个独立编解码区域与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域同时呈现，则将第i个独立编解码区域的非独立呈现标志字段配置为有效值。
在一种实施方式中,所述独立编解码区域数据盒包括第i个独立编解码区域的合流标志字段;
处理单元602还配置为,根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒;
在一种实施方式中,处理单元602还配置为:
若第i个独立编解码区域所属轨道所包含的码流能够与第i个独立编解码区域所属轨道组中的其他轨道所包含的码流合并,则将第i个独立编解码区域的合流标志字段配置为无效值。
在一种实施方式中,所述独立编解码区域数据盒包括第i个独立编解码区域的轨道优先级信息标志字段;
处理单元602还配置为,根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒;
在一种实施方式中,处理单元602还配置为:
若第i个独立编解码区域所属轨道组中的各个轨道中的独立编解码区域的优先级相同,则将所述轨道优先级信息标志字段配置为无效值;
若第i个独立编解码区域所属轨道组中的各个轨道中的独立编解码区域的优先级不同,则将所述轨道优先级信息标志字段配置为有效值;
在所述轨道优先级信息标志字段被配置为有效值的情况下,所述独立编解码区域数据盒还包括第i个独立编解码区域的轨道优先级字段;
处理单元602还配置为,根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒;
在一种实施方式中,处理单元602还配置为:
将第i个独立编解码区域的优先级配置为所述轨道优先级字段的值。
在一种实施方式中,所述独立编解码区域数据盒包括第i个独立编解码区域的轨道重叠信息标志字段;处理单元602还配置为,根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒;
在一种实施方式中,处理单元602还配置为:
若要求第i个独立编解码区域不与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域重叠显示,则将所述轨道重叠信息标志字段配置为无效值;
若要求第i个独立编解码区域与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域重叠显示,则将所述轨道重叠信息标志字段配置为有效值;
在所述轨道重叠信息标志字段被配置为有效值的情况下,所述独立编解码区域数据盒还包括第i个独立编解码区域的背景标志字段;所述根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒,还包括:
若要求第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域的前景画面被显示,则将所述背景标志字段配置为无效值;
若要求第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域的背景画面被显示,则将所述背景标志字段配置为有效值。
在一种实施方式中,在所述背景标志字段被配置为有效值的情况下,所述独立编解码区域数据盒还包括第i个独立编解码区域的透明度字段,所述透明度字段的取值大于等于0;处理单元602还配置为,根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒;
在一种实施方式中,处理单元602还配置为:
若第i个独立编解码区域被要求显示为透明背景画面，则将所述透明度字段的值配置为0；
若第i个独立编解码区域被要求显示为非透明背景画面，则根据第i个独立编解码区域的透明度配置所述透明度字段的值。
在一种实施方式中,处理单元602还配置为:
根据所述沉浸媒体的N个分块视频的封装过程生成独立编解码区域描述信令文件,所述独立编解码区域描述信令文件被封装于所述沉浸媒体的媒体呈现描述文件中的自适应集层级中;
所述独立编解码区域描述信令文件包括第i个独立编解码区域的独立编解码区域描述数据盒的描述信息。
请参见图7,图7示出了本申请实施例提供的另一种沉浸媒体的数据处理装置的结构示意图;该沉浸媒体的数据处理装置可以是运行于内容播放设备中的一个计算机程序(包括程序代码),例如该沉浸媒体的数据处理装置可以是内容播放设备中的一个应用软件。由图7所示,该沉浸媒体的数据处理装置包括获取单元701和处理单元702。
在一个示例性实施例中,沉浸媒体包括N个分块视频,所述N个分块视频分别被封装至N个轨道中,第i个分块视频被封装在第i个轨道中;所述N个轨道属于同一个轨道组;该沉浸媒体的数据处理装置可以用于执行图2所示的方法中的相应步骤;则:
获取单元701,配置为获取沉浸媒体的第i个独立编解码区域的独立编解码区域描述数据盒,所述第i个独立编解码区域对应所述第i个分块视频;所述独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒,其中i,N为正整数,且i≤N;
处理单元702,配置为根据所述独立编解码区域描述数据盒显示所述沉浸媒体的第i个分块视频。
在一种实施方式中,获取单元701还配置为:
获取独立编解码区域描述信令文件,所述独立编解码区域描述信令文件被封装于所述沉浸媒体的媒体呈现描述文件中的自适应集层级中;
所述独立编解码区域描述信令文件包括第i个独立编解码区域的独立编解码区域描述数据盒的描述信息。
在另一个示例性实施例中,该沉浸媒体的数据处理装置可以用于执行图5所示的方法中的相应步骤;则:
获取单元701,配置为获取沉浸媒体的打包文件,所述沉浸媒体包括N个分块视频,所述N个分块视频分别被封装至N个轨道中,第i个分块视频被封装在第i个轨道中;所述N个轨道属于同一个轨道组;第i个分块视频对应第i个独立编解码区域;所述打包文件至少包括第i个轨道,第i个轨道中包含第i个独立编解码区域的独立编解码区域描述数据盒,其中i,N为正整数,且i≤N;
处理单元702,配置为对所述打包文件进行解封处理得到第i个独立编解码区域的独立编解码区域描述数据盒,所述独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒;根据所述独立编解码区域描述数据盒显示所述沉浸媒体的第i个分块视频。
在一种实施方式中,获取单元701还配置为:
获取所述沉浸媒体的独立编解码区域描述信令文件，所述独立编解码区域描述信令文件被封装于所述沉浸媒体的媒体呈现描述文件中的自适应集层级中；所述独立编解码区域描述信令文件包括第i个独立编解码区域的独立编解码区域描述数据盒的描述信息；
以及配置为,获取沉浸媒体的打包文件,如:
根据所述独立编解码区域描述信令文件获取所述沉浸媒体的打包文件。
根据本发明的一个实施例,图6及图7所示的沉浸媒体的数据处理装置中的各个单元可以分别或全部合并为一个或若干个另外的单元来构成,或者其中的某个(些)单元还可以再拆分为功能上更小的多个单元来构成,这可以实现同样的操作,而不影响本发明的实施例的技术效果的实现。上述单元是基于逻辑功能划分的,在实际应用中,一个单元的功能也可以由多个单元来实现,或者多个单元的功能由一个单元实现。在本申请的其它实施例中,该沉浸媒体的数据处理装置也可以包括其它单元,在实际应用中,这些功能也可以由其它单元协助实现,并且可以由多个单元协作实现。根据本申请的另一个实施例,可以通过在包括中央处理单元(Central Processing Units,CPU)、随机存取存储介质(Random Access Memory,RAM)、只读存储介质(Read-Only Memory,ROM)等处理元件和存储元件的例如计算机的通用计算设备上运行计算机程序,该计算机程序被执行时用于实现本申请实施例提供的沉浸媒体的数据处理方法;所述计算机程序可以记载于例如计算机可读记录介质上,并通过计算机可读记录介质装载于上述计算设备中,并在其中运行。
基于同一发明构思,本申请实施例中提供沉浸媒体的数据处理装置解决问题的原理与有益效果与本申请方法实施例中沉浸媒体的数据处理方法解决问题的原理和有益效果相似,可以参见方法的实施的原理和有益效果,为简洁描述,在这里不再赘述。
图8示出了本申请实施例提供的一种内容制作设备的结构示意图;该内容制作设备可以是指沉浸媒体的提供者所使用的计算机设备,该计算机设备可以是终端(如PC、智能移动设备(如智能手机)等)或服务器。如图8所示,该内容制作设备包括捕获设备801、处理器802、存储器803和发射器804。其中:
捕获设备801配置为采集现实世界的声音-视觉场景获得沉浸媒体的原始数据(包括在时间和空间上保持同步的音频内容和视频内容)。该捕获设备801可以包括但不限于:音频设备、摄像设备及传感设备。其中,音频设备可以包括音频传感器、麦克风等。摄像设备可以包括普通摄像头、立体摄像头、光场摄像头等。传感设备可以包括激光设备、雷达设备等。
处理器802(或CPU)是内容制作设备的处理核心,该处理器802适于实现一条或多条程序指令,适于加载并执行一条或多条程序指令从而实现图2或图3所示的沉浸媒体的数据处理方法的流程。
存储器803是内容制作设备中的记忆设备,配置为存放程序和媒体资源。可以理解的是,此处的存储器803既可以包括内容制作设备中的内置存储介质,当然也可以包括内容制作设备所支持的扩展存储介质。需要说明的是,存储器可以是高速RAM存储器,也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器;还可以是至少一个位于远离前述处理器的存储器。存储器提供存储空间,该存储空间用于存储内容制作设备的操作系统。并且,在该存储空间中还用于存储计算机程序,该计算机程序包括程序指令,且该程序指令适于被处理器调用并执行,以用来执行沉浸媒体的数据处理方法的各步骤。另外,存储器803还可配置为存储经处理器处理后形成的沉浸媒体文件,该沉浸媒体文件包括媒体文件资源和媒体呈现描述信息。
发射器804配置为实现内容制作设备与其他设备的传输交互，例如实现内容制作设备与内容播放设备之间进行沉浸媒体的传输。即内容制作设备通过发射器804来向内容播放设备传输沉浸媒体的相关媒体资源。
再请参见图8,处理器802可包括转换器821、编码器822和封装器823;其中:
转换器821配置为对捕获到的视频内容进行一系列转换处理,使视频内容成为适合被执行沉浸媒体的视频编码的内容。转换处理可包括:拼接和投影,在实际应用中,转换处理还包括区域封装。转换器821可以将捕获到的3D视频内容转换为2D图像,并提供给编码器进行视频编码。
编码器822配置为对捕获到的音频内容进行音频编码形成沉浸媒体的音频码流。还用于对转换器821转换得到的2D图像进行视频编码,得到视频码流。
封装器823配置为将音频码流和视频码流按照沉浸媒体的文件格式（如ISOBMFF）封装在文件容器中形成沉浸媒体的媒体文件资源，该媒体文件资源可以是形成沉浸媒体的媒体文件或媒体片段；并按照沉浸媒体的文件格式要求采用媒体呈现描述信息记录该沉浸媒体的媒体文件资源的元数据。封装器处理得到的沉浸媒体的封装文件会保存在存储器中，并按需提供给内容播放设备进行沉浸媒体的呈现。
在一个示例性实施例中,沉浸媒体包括N个分块视频,所述N个分块视频分别被封装至N个轨道中,第i个分块视频被封装在第i个轨道中;所述N个轨道属于同一个轨道组;处理器802(即处理器包含的各器件)通过调用存储器中的一条或多条指令来执行图2所示的沉浸媒体的数据处理方法的各步骤。在实际应用中,存储器803存储有一条或多条第一指令,该一条或多条第一指令适于由处理器802加载并执行如下步骤:
获取沉浸媒体的第i个独立编解码区域的独立编解码区域描述数据盒,所述第i个独立编解码区域对应所述第i个分块视频;所述独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒,其中i,N为正整数,且i≤N;
根据所述独立编解码区域描述数据盒显示所述沉浸媒体的第i个分块视频。
在另一个示例性实施例中,处理器通过调用存储器803中的一条或多条指令来执行图3所示的沉浸媒体的数据处理方法的各步骤。在实际应用中,存储器存储有一条或多条第二指令,该一条或多条第二指令适于由处理器802加载并执行如下步骤:
将沉浸媒体划分为N个分块视频;
分别将N个分块视频封装至N个轨道中,第i个分块视频被封装在第i个轨道中;第i个分块视频对应第i个独立编解码区域,其中i,N为正整数,且i≤N;所述N个轨道属于同一个轨道组;
根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒,所述独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒。
图9示出了本申请一个示例性实施例提供的一种内容播放设备的结构示意图;该内容播放设备可以是指沉浸媒体的使用者所使用的计算机设备,该计算机设备可以是终端(如PC、智能移动设备(如智能手机)、VR设备(如VR头盔、VR眼镜等))。如图9所示,该内容播放设备包括接收器901、处理器902、存储器903、显示/播放装置904。其中:
接收器901配置为实现内容播放设备与其他设备的传输交互，例如实现内容制作设备与内容播放设备之间进行沉浸媒体的传输。即内容播放设备通过接收器901来接收内容制作设备传输的沉浸媒体的相关媒体资源。
处理器902（或称CPU）是内容播放设备的处理核心，该处理器902适于实现一条或多条程序指令，适于加载并执行一条或多条程序指令从而实现图2或图5所示的沉浸媒体的数据处理方法的流程。
存储器903是内容播放设备中的记忆设备,配置为存放程序和媒体资源。可以理解的是,此处的存储器903既可以包括内容播放设备中的内置存储介质,当然也可以包括内容播放设备所支持的扩展存储介质。需要说明的是,存储器903可以是高速RAM存储器,也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器;还可以是至少一个位于远离前述处理器的存储器。存储器903提供存储空间,该存储空间用于存储内容播放设备的操作系统。并且,在该存储空间中还用于存储计算机程序,该计算机程序包括程序指令,且该程序指令适于被处理器调用并执行,以用来执行沉浸媒体的数据处理方法的各步骤。另外,存储器903还可配置为存储经处理器处理后形成的沉浸媒体的三维图像、三维图像对应的音频内容及该三维图像和音频内容渲染所需的信息等。
显示/播放装置904配置为输出渲染得到的声音和三维图像。
再请参见图9,处理器902可包括解析器921、解码器922、转换器923和渲染器924;其中:
解析器921配置为对来自内容制作设备的渲染媒体的封装文件进行文件解封装,如按照沉浸媒体的文件格式要求对媒体文件资源进行解封装,得到音频码流和视频码流;并将该音频码流和视频码流提供给解码器922。
解码器922配置为对音频码流进行音频解码,得到音频内容并提供给渲染器进行音频渲染。另外,解码器922对视频码流进行解码得到2D图像。根据媒体呈现描述信息提供的元数据,如果该元数据指示沉浸媒体执行过区域封装过程,该2D图像是指封装图像;如果该元数据指示沉浸媒体未执行过区域封装过程,则该平面图像是指投影图像。
转换器923配置为将2D图像转换为3D图像。如果沉浸媒体执行过区域封装过程,转换器923还会先将封装图像进行区域解封装得到投影图像。再对投影图像进行重建处理得到3D图像。如果渲染媒体未执行过区域封装过程,转换器923会直接将投影图像重建得到3D图像。
渲染器924配置为对沉浸媒体的音频内容和3D图像进行渲染。如根据媒体呈现描述信息中与渲染、视窗相关的元数据对音频内容及3D图像进行渲染,渲染完成交由显示/播放装置进行输出。
在一个示例性实施例中，沉浸媒体包括N个分块视频，所述N个分块视频分别被封装至N个轨道中，第i个分块视频被封装在第i个轨道中；所述N个轨道属于同一个轨道组；处理器902（即处理器包含的各器件）通过调用存储器中的一条或多条指令来执行图2所示的沉浸媒体的数据处理方法的各步骤。在实际应用中，存储器存储有一条或多条第一指令，该一条或多条第一指令适于由处理器902加载并执行如下步骤：
获取沉浸媒体的第i个独立编解码区域的独立编解码区域描述数据盒,所述第i个独立编解码区域对应所述第i个分块视频;所述独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒,其中i,N为正整数,且i≤N;
根据所述独立编解码区域描述数据盒显示所述沉浸媒体的第i个分块视频。
在另一个示例性实施例中,处理器902(即处理器包含的各器件)通过调用存储器中的一条或多条指令来执行图5所示的沉浸媒体的数据处理方法的各步骤。在实际应用中,存储器903存储有一条或多条第二指令,该一条或多条第二指令适于由处理器902加载并执行如下步骤:
获取沉浸媒体的打包文件，所述沉浸媒体包括N个分块视频，所述N个分块视频分别被封装至N个轨道中，第i个分块视频被封装在第i个轨道中；所述N个轨道属于同一个轨道组；第i个分块视频对应第i个独立编解码区域；所述打包文件至少包括第i个轨道，第i个轨道中包含第i个独立编解码区域的独立编解码区域描述数据盒，其中i,N为正整数，且i≤N；
对所述打包文件进行解封处理得到第i个独立编解码区域的独立编解码区域描述数据盒,所述独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒;
根据所述独立编解码区域描述数据盒显示所述沉浸媒体的第i个分块视频。
基于同一发明构思,本申请实施例中提供的内容制作设备及内容播放设备解决问题的原理与有益效果与本申请方法实施例中沉浸媒体的处理方法解决问题的原理和有益效果相似,可以参见方法的实施的原理和有益效果,为简洁描述,在这里不再赘述。
以上所揭露的仅为本申请较佳实施例而已,当然不能以此来限定本申请之权利范围,因此依本申请权利要求所作的等同变化,仍属本申请所涵盖的范围。

Claims (27)

  1. 一种沉浸媒体的数据处理方法,所述沉浸媒体包括N个分块视频,所述N个分块视频分别被封装至N个轨道中,第i个分块视频被封装在第i个轨道中;所述N个轨道属于同一个轨道组;所述方法包括:
    获取沉浸媒体的第i个独立编解码区域的独立编解码区域描述数据盒,所述第i个独立编解码区域对应所述第i个分块视频;所述独立编解码区域描述数据盒包括:独立编解码区域数据盒及坐标信息数据盒,其中i,N为正整数,且i≤N;
    根据所述独立编解码区域描述数据盒,显示所述沉浸媒体的第i个分块视频。
  2. 如权利要求1所述的方法,其中,所述坐标信息数据盒包括第i个独立编解码区域的坐标系标识字段;所述坐标系标识字段用于指示第i个独立编解码区域所属的坐标系;
    当所述沉浸媒体的第i个分块视频的分辨率与第j个分块视频的分辨率相同时,第i个独立编解码区域与第j个独立编解码区域属于同一坐标系,其中j为正整数,j≤N且j≠i;
    当所述沉浸媒体的第i个分块视频的分辨率与第j个分块视频的分辨率不同时,第i个独立编解码区域与第j个独立编解码区域分别属于不同的坐标系。
  3. 如权利要求2所述的方法,其中,所述坐标信息数据盒包括第i个独立编解码区域所属坐标系下的完整视频的尺寸字段;所述完整视频的尺寸字段包括所述完整视频的高度字段和所述完整视频的宽度字段;
    所述完整视频是由第i个独立编解码区域所属坐标系下所有独立编解码区域对应的分块视频组成的。
  4. 如权利要求1所述的方法,其中,所述独立编解码区域数据盒包括第i个独立编解码区域在所属坐标系中的顶点坐标字段及第i个独立编解码区域的尺寸字段;
    所述顶点坐标字段包括第i个独立编解码区域在所属坐标系中的横坐标字段和第i个独立编解码区域在所属坐标系中的纵坐标字段;
    所述尺寸字段包括第i个独立编解码区域的高度字段和第i个独立编解码区域的宽度字段。
  5. 如权利要求1所述的方法,其中,所述独立编解码区域数据盒包括第i个独立编解码区域的非独立呈现标志字段;
    当第i个独立编解码区域的非独立呈现标志字段为有效值时,指示第i个独立编解码区域与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域同时呈现。
  6. 如权利要求1所述的方法,其中,所述独立编解码区域数据盒包括第i个独立编解码区域的合流标志字段;
    当第i个独立编解码区域的合流标志字段为无效值时,指示第i个独立编解码区域所属轨道所包含的码流能够与第i个独立编解码区域所属轨道组中的其他轨道所包含的码流合并。
  7. 如权利要求1所述的方法,其中,所述独立编解码区域数据盒包括第i个独立编解码区域的轨道优先级信息标志字段;
    当所述轨道优先级信息标志字段为无效值时,指示第i个独立编解码区域所属轨道组中的各个轨道中的独立编解码区域的优先级相同;
当所述轨道优先级信息标志字段为有效值时，所述独立编解码区域数据盒还包括第i个独立编解码区域的轨道优先级字段，所述轨道优先级字段用于指示第i个独立编解码区域的优先级；
    所述轨道优先级字段的值越小,第i个独立编解码区域的优先级越高;
    当第i个独立编解码区域的清晰度高于第j个独立编解码区域的清晰度时,第i个独立编解码区域的优先级高于第j个独立编解码区域的优先级,其中j为正整数,j≤N且j≠i。
  8. 如权利要求1所述的方法,其中,所述独立编解码区域数据盒包括第i个独立编解码区域的轨道重叠信息标志字段;
    当所述轨道重叠信息标志字段为无效值时,指示第i个独立编解码区域在被显示时,不与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域重叠;
    当所述轨道重叠信息标志字段为有效值时,所述独立编解码区域数据盒还包括第i个独立编解码区域的背景标志字段;
    当所述背景标志字段为无效值时,指示第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域的前景画面被显示;
    当所述背景标志字段为有效值时,指示第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域的背景画面被显示。
  9. 如权利要求8所述的方法,其中,当所述背景标志字段为有效值时,所述独立编解码区域数据盒还包括第i个独立编解码区域的透明度字段,所述透明度字段用于指示第i个独立编解码区域作为背景画面被显示时的透明度;所述透明度字段的取值大于等于0;
    若所述透明度字段的值等于0，则第i个独立编解码区域被显示为透明背景画面；
    若所述透明度字段的值大于0，则第i个独立编解码区域被显示为非透明背景画面。
  10. 如权利要求1所述的方法,其中,所述方法还包括:
    获取独立编解码区域描述信令文件,所述独立编解码区域描述信令文件被封装于所述沉浸媒体的媒体呈现描述文件中的自适应集层级中;
    所述独立编解码区域描述信令文件包括第i个独立编解码区域的独立编解码区域描述数据盒的描述信息。
  11. 一种沉浸媒体的数据处理方法,所述方法包括:
    将沉浸媒体划分为N个分块视频;
    分别将N个分块视频封装至N个轨道中,第i个分块视频被封装在第i个轨道中;第i个分块视频对应第i个独立编解码区域,其中i,N为正整数,且i≤N;所述N个轨道属于同一个轨道组;
    根据第i个分块视频的封装过程,生成第i个独立编解码区域的独立编解码区域描述数据盒,所述独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒。
  12. 如权利要求11所述的方法,其中,所述坐标信息数据盒包括第i个独立编解码区域的坐标系标识字段;所述根据第i个分块视频的封装过程,生成第i个独立编解码区域的独立编解码区域描述数据盒,包括:
    根据第i个分块视频的分辨率,确定第i个独立编解码区域所属的坐标系;
    根据确定的所述第i个独立编解码区域所属的坐标系,配置第i个独立编解码区域的坐标系标识字段的值。
  13. 如权利要求12所述的方法,其中,所述坐标信息数据盒包括第i个独立编解码区域所属坐标系下的完整视频的尺寸字段;所述完整视频的尺寸字段包括所述完整视频的高度字段和所述完整视频的宽度字段;
    所述根据第i个分块视频的封装过程,生成第i个独立编解码区域的独立编解码区域描述数据盒,包括:
    获取第i个独立编解码区域所属坐标系下所有独立编解码区域对应的分块视频组成的完整视频的高度和宽度;
    将获取到的所述完整视频的高度配置为所述完整视频的高度字段的值,以及将获取到的所述完整视频的宽度配置为所述完整视频的宽度字段的值。
  14. 如权利要求11所述的方法,其中,所述独立编解码区域数据盒包括第i个独立编解码区域在所属坐标系中的顶点坐标字段及第i个独立编解码区域的尺寸字段,所述顶点坐标字段包括第i个独立编解码区域在所属坐标系中的横坐标字段和第i个独立编解码区域在所属坐标系中的纵坐标字段,所述尺寸字段包括第i个独立编解码区域的高度字段和第i个独立编解码区域的宽度字段;
    所述根据第i个分块视频的封装过程,生成第i个独立编解码区域的独立编解码区域描述数据盒,包括:
    获取第i个独立编解码区域在所属坐标系中顶点的横坐标的值和纵坐标的值;
    将所述获取到的第i个独立编解码区域在所属坐标系中横坐标的值配置为第i个独立编解码区域在所属坐标系中的横坐标字段的值,以及将所述获取到的第i个独立编解码区域在所属坐标系中纵坐标的值配置为第i个独立编解码区域在所属坐标系中的纵坐标字段的值;以及
    获取第i个独立编解码区域的高度和宽度;
    将所述获取到的第i个独立编解码区域的高度配置为第i个独立编解码区域的高度字段的值,以及将所述获取到的第i个独立编解码区域的宽度配置为第i个独立编解码区域的宽度字段的值。
  15. 如权利要求11所述的方法,其中,所述独立编解码区域数据盒包括第i个独立编解码区域的非独立呈现标志字段;
    所述根据第i个分块视频的封装过程,生成第i个独立编解码区域的独立编解码区域描述数据盒,包括:
    若第i个独立编解码区域与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域同时呈现,则将第i个独立编解码区域的非独立呈现标志字段配置为有效值。
  16. 如权利要求11所述的方法,其中,所述独立编解码区域数据盒包括第i个独立编解码区域的合流标志字段;
    所述根据第i个分块视频的封装过程,生成第i个独立编解码区域的独立编解码区域描述数据盒,包括:
    若第i个独立编解码区域所属轨道所包含的码流能够与第i个独立编解码区域所属轨道组中的其他轨道所包含的码流合并,则将第i个独立编解码区域的合流标志字段配置为无效值。
  17. 如权利要求11所述的方法,其中,所述独立编解码区域数据盒包括第i个独立编解码区域的轨道优先级信息标志字段;
    所述根据第i个分块视频的封装过程,生成第i个独立编解码区域的独立编解码区域描述数据盒,包括:
    若第i个独立编解码区域所属轨道组中的各个轨道中的独立编解码区域的优先级相同,则将所述轨道优先级信息标志字段配置为无效值;
    若第i个独立编解码区域所属轨道组中的各个轨道中的独立编解码区域的优先级不同,则将所述轨道优先级信息标志字段配置为有效值;
    在所述轨道优先级信息标志字段被配置为有效值的情况下,所述独立编解码区域数据盒还包括第i个独立编解码区域的轨道优先级字段;
    所述方法还包括:将第i个独立编解码区域的优先级配置为所述轨道优先级字段的值。
  18. 如权利要求11所述的方法,其中,所述独立编解码区域数据盒包括第i个独立编解码区域的轨道重叠信息标志字段;
    所述根据第i个分块视频的封装过程,生成第i个独立编解码区域的独立编解码区域描述数据盒,包括:
    若要求第i个独立编解码区域不与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域重叠显示,则将所述轨道重叠信息标志字段配置为无效值;
    若要求第i个独立编解码区域与第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域重叠显示,则将所述轨道重叠信息标志字段配置为有效值;
    在所述轨道重叠信息标志字段被配置为有效值的情况下,所述独立编解码区域数据盒还包括第i个独立编解码区域的背景标志字段;
    所述方法还包括:
    若要求第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域的前景画面被显示,则将所述背景标志字段配置为无效值;
    若要求第i个独立编解码区域作为第i个独立编解码区域所属轨道组中的其他轨道中的独立编解码区域的背景画面被显示,则将所述背景标志字段配置为有效值。
  19. 根据权利要求18所述的方法,其中,在所述背景标志字段被配置为有效值的情况下,所述独立编解码区域数据盒还包括第i个独立编解码区域的透明度字段,所述透明度字段的取值大于等于0;
    所述方法还包括:
    若第i个独立编解码区域被要求显示为透明背景画面，则将所述透明度字段的值配置为0；
    若第i个独立编解码区域被要求显示为非透明背景画面，则根据第i个独立编解码区域的透明度配置所述透明度字段的值。
  20. 如权利要求11所述的方法,其中,所述方法还包括:
    根据所述沉浸媒体的N个分块视频的封装过程生成独立编解码区域描述信令文件,所述独立编解码区域描述信令文件被封装于所述沉浸媒体的媒体呈现描述文件中的自适应集层级中;
    所述独立编解码区域描述信令文件包括第i个独立编解码区域的独立编解码区域描述数据盒的描述信息。
  21. 一种沉浸媒体的数据处理方法,所述方法包括:
    获取沉浸媒体的打包文件,所述沉浸媒体包括N个分块视频,所述N个分块视频分别被封装至N个轨道中,第i个分块视频被封装在第i个轨道中;所述N个轨道属于同一个轨道组;第i个分块视频对应第i个独立编解码区域;所述打包文件至少包括第i个轨道,第i个轨道中包含第i个独立编解码区域的独立编解码区域描述数据盒,其中i,N为正整数,且i≤N;
    对所述打包文件进行解封处理得到第i个独立编解码区域的独立编解码区域描述数据盒,所述独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒;
    根据所述独立编解码区域描述数据盒,显示所述沉浸媒体的第i个分块视频。
  22. 如权利要求21所述的方法,其中,所述方法还包括:
    获取所述沉浸媒体的独立编解码区域描述信令文件,所述独立编解码区域描述信令文件被封装于所述沉浸媒体的媒体呈现描述文件中的自适应集层级中;所述独立编解码区域描述信令文件包括第i个独立编解码区域的独立编解码区域描述数据盒的描述信息;
    所述获取沉浸媒体的打包文件,包括:根据所述独立编解码区域描述信令文件获取所述沉浸媒体的打包文件。
  23. 一种沉浸媒体的数据处理装置,包括:
    获取单元,配置为获取沉浸媒体的第i个独立编解码区域的独立编解码区域描述数据盒,所述沉浸媒体被划分为N个分块视频,第i个分块视频对应第i个独立编解码区域;所述独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒,其中i,N为正整数,且i≤N;
    处理单元,配置为根据所述独立编解码区域描述数据盒显示所述沉浸媒体的第i个分块视频。
  24. 一种沉浸媒体的数据处理装置,包括:
    处理单元,配置为将沉浸媒体划分为N个分块视频;以及,
    配置为分别将N个分块视频封装至N个轨道中,第i个分块视频被封装在第i个轨道中;第i个分块视频对应第i个独立编解码区域,其中i,N为正整数,且i≤N;所述N个轨道属于同一个轨道组;
    配置为根据第i个分块视频的封装过程生成第i个独立编解码区域的独立编解码区域描述数据盒,所述独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒。
  25. 一种沉浸媒体的数据处理装置,包括:
    获取单元,配置为获取沉浸媒体的打包文件,所述沉浸媒体包括N个分块视频,所述N个分块视频分别被封装至N个轨道中,第i个分块视频被封装在第i个轨道中;所述N个轨道属于同一个轨道组;第i个分块视频对应第i个独立编解码区域;所述打包文件至少包括第i个轨道,第i个轨道中包含第i个独立编解码区域的独立编解码区域描述数据盒,其中i,N为正整数,且i≤N;
    处理单元,配置为对所述打包文件进行解封处理得到第i个独立编解码区域的独立编解码区域描述数据盒,所述独立编解码区域描述数据盒包括独立编解码区域数据盒及坐标信息数据盒;根据所述独立编解码区域描述数据盒显示所述沉浸媒体的第i个分块视频。
  26. 一种沉浸媒体的数据处理设备,包括:一个或多个处理器和一个或多个存储器;其中,
    所述一个或多个存储器中存储有至少一条程序代码,所述至少一条程序代码由所述一个或多个处理器加载并执行以实现如权利要求1-22任一项所述的沉浸媒体的数据处理方法。
  27. 一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条程序代码,所述至少一条程序代码由处理器加载并执行以实现如权利要求1-22任一项所述的沉浸媒体的数据处理方法。
PCT/CN2021/085907 2020-06-04 2021-04-08 沉浸媒体的数据处理方法、装置、设备及计算机存储介质 WO2021244132A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21818517.1A EP4124046A4 (en) 2020-06-04 2021-04-08 DATA PROCESSING METHOD, APPARATUS AND APPARATUS FOR IMMERSIVE MEDIA AND COMPUTER STORAGE MEDIUM
US17/731,162 US20220272424A1 (en) 2020-06-04 2022-04-27 Data processing for immersive media

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010501322.X 2020-06-04
CN202010501322.XA CN113766271B (zh) 2020-06-04 2020-06-04 一种沉浸媒体的数据处理方法、装置及设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/731,162 Continuation US20220272424A1 (en) 2020-06-04 2022-04-27 Data processing for immersive media

Publications (1)

Publication Number Publication Date
WO2021244132A1

Family

ID=78783828

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/085907 WO2021244132A1 (zh) 2020-06-04 2021-04-08 沉浸媒体的数据处理方法、装置、设备及计算机存储介质

Country Status (4)

Country Link
US (1) US20220272424A1 (zh)
EP (1) EP4124046A4 (zh)
CN (2) CN115022715B (zh)
WO (1) WO2021244132A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116456166A (zh) * 2022-01-10 2023-07-18 腾讯科技(深圳)有限公司 媒体数据的数据处理方法及相关设备
CN115396646B (zh) * 2022-08-22 2024-04-26 腾讯科技(深圳)有限公司 一种点云媒体的数据处理方法及相关设备


Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5059867B2 (ja) * 2006-10-19 2012-10-31 エルジー エレクトロニクス インコーポレイティド エンコード方法及び装置並びにデコード方法及び装置
JP5392199B2 (ja) * 2010-07-09 2014-01-22 ソニー株式会社 画像処理装置および方法
US10791315B2 (en) * 2013-01-04 2020-09-29 Qualcomm Incorporated Signaling of spatial resolution of depth views in multiview coding file format
GB2519746B (en) * 2013-10-22 2016-12-14 Canon Kk Method, device and computer program for encapsulating scalable partitioned timed media data
KR102170550B1 (ko) * 2016-05-24 2020-10-29 노키아 테크놀로지스 오와이 미디어 콘텐츠를 인코딩하는 방법, 장치 및 컴퓨터 프로그램
JP7022077B2 (ja) * 2016-05-25 2022-02-17 コニンクリーケ・ケイピーエヌ・ナムローゼ・フェンノートシャップ 空間的にタイリングされた全方位ビデオのストリーミング
CN106210549A (zh) * 2016-09-07 2016-12-07 传线网络科技(上海)有限公司 全景视频的播放方法及装置
GB2560921B (en) * 2017-03-27 2020-04-08 Canon Kk Method and apparatus for encoding media data comprising generated content
US20190373245A1 (en) * 2017-03-29 2019-12-05 Lg Electronics Inc. 360 video transmission method, 360 video reception method, 360 video transmission device, and 360 video reception device
GB2563865A (en) * 2017-06-27 2019-01-02 Canon Kk Method, device, and computer program for transmitting media content
US11051040B2 (en) * 2017-07-13 2021-06-29 Mediatek Singapore Pte. Ltd. Method and apparatus for presenting VR media beyond omnidirectional media
CN111587577A (zh) * 2018-01-12 2020-08-25 夏普株式会社 用于针对虚拟现实应用程序发送信号通知子图片组合信息的系统和方法
US10397518B1 (en) * 2018-01-16 2019-08-27 Amazon Technologies, Inc. Combining encoded video streams
US10939086B2 (en) * 2018-01-17 2021-03-02 Mediatek Singapore Pte. Ltd. Methods and apparatus for encoding and decoding virtual reality content
US10944977B2 (en) * 2018-04-03 2021-03-09 Mediatek Singapore Pte. Ltd. Methods and apparatus for encoding and decoding overlay compositions
WO2019203574A1 (ko) * 2018-04-17 2019-10-24 엘지전자 주식회사 360 비디오 데이터의 서브픽처 기반 처리 방법 및 그 장치
GB2575074B (en) * 2018-06-27 2022-09-28 Canon Kk Encapsulating video content with an indication of whether a group of tracks collectively represents a full frame or a part of a frame
US11509878B2 (en) * 2018-09-14 2022-11-22 Mediatek Singapore Pte. Ltd. Methods and apparatus for using track derivations for network based media processing
WO2020189903A1 (ko) * 2019-03-20 2020-09-24 엘지전자 주식회사 포인트 클라우드 데이터 송신 장치, 포인트 클라우드 데이터 송신 방법, 포인트 클라우드 데이터 수신 장치 및 포인트 클라우드 데이터 수신 방법

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9208577B2 (en) * 2012-06-25 2015-12-08 Adobe Systems Incorporated 3D tracked point visualization using color and perspective size
CN107534801A (zh) * 2015-02-10 2018-01-02 诺基亚技术有限公司 用于处理图像序列轨道的方法、装置和计算机程序产品
CN105704581A (zh) * 2016-01-25 2016-06-22 互联天下科技发展(深圳)有限公司 基于mp4文件格式的http实时视频传输方法
CN110225371A (zh) * 2016-01-27 2019-09-10 上海交通大学 一种基于媒体自身属性以支持空间分块的存储与传输方法
CN107948685A (zh) * 2016-10-13 2018-04-20 腾讯科技(北京)有限公司 信息推广方法及信息推广装置
CN110771162A (zh) * 2017-06-23 2020-02-07 联发科技股份有限公司 用轨道分组获取合成轨道之方法及装置
CN111133763A (zh) * 2017-09-26 2020-05-08 Lg 电子株式会社 360视频系统中的叠加处理方法及其设备
WO2019199024A1 (ko) * 2018-04-10 2019-10-17 엘지전자 주식회사 360 영상 데이터의 서브픽처 기반 처리 방법 및 그 장치
TW201946464A (zh) * 2018-04-12 2019-12-01 新加坡商 聯發科技(新加坡)私人有限公司 用於提供二維空間關係的方法以及裝置
US20190379884A1 (en) * 2018-06-06 2019-12-12 Lg Electronics Inc. Method and apparatus for processing overlay media in 360 degree video system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4124046A4

Also Published As

Publication number Publication date
EP4124046A4 (en) 2023-11-29
CN113766271A (zh) 2021-12-07
CN115022715B (zh) 2023-07-25
CN115022715A (zh) 2022-09-06
CN113766271B (zh) 2022-07-12
EP4124046A1 (en) 2023-01-25
US20220272424A1 (en) 2022-08-25

Similar Documents

Publication Publication Date Title
KR102133848B1 (ko) 360 비디오를 전송하는 방법, 360 비디오를 수신하는 방법, 360 비디오 전송 장치, 360 비디오 수신 장치
JP6984841B2 (ja) イメージ処理方法、端末およびサーバ
TW201838407A (zh) 適應性擾動立方體之地圖投影
KR20170132098A (ko) 레거시 및 몰입형 렌더링 디바이스를 위한 몰입형 비디오를 포맷팅하는 방법, 장치 및 스트림
TW201742435A (zh) 具有用於360度視訊之透鏡失真校正之魚眼呈現
CA3015474A1 (en) Truncated square pyramid geometry and frame packing structure for representing virtual reality video content
KR20230153532A (ko) 포인트 클라우드 데이터 부호화 장치, 포인트 클라우드 데이터 부호화 방법, 포인트 클라우드 데이터 복호화 장치 및 포인트 클라우드 데이터 복호화 방법
US20220272424A1 (en) Data processing for immersive media
WO2023029858A1 (zh) 点云媒体文件的封装与解封装方法、装置及存储介质
US20240015197A1 (en) Method, apparatus and device for encapsulating media file, and storage medium
CN115514972A (zh) 视频编解码的方法、装置、电子设备及存储介质
US20230025664A1 (en) Data processing method and apparatus for immersive media, and computer-readable storage medium
CN116456166A (zh) 媒体数据的数据处理方法及相关设备
WO2021244116A1 (zh) 沉浸媒体的数据处理方法、装置、设备及存储介质
TWI796989B (zh) 沉浸媒體的數據處理方法、裝置、相關設備及儲存媒介
US20230421774A1 (en) Packaging and unpackaging method and apparatus for point cloud media file, and storage medium
WO2023016293A1 (zh) 自由视角视频的文件封装方法、装置、设备及存储介质
CN113497928B (zh) 一种沉浸媒体的数据处理方法及相关设备
WO2023169004A1 (zh) 点云媒体的数据处理方法、装置、设备及介质
CN115883871A (zh) 媒体文件封装与解封装方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21818517

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE