WO2023142127A1 - Coding and decoding method, apparatus, device, and storage medium - Google Patents

Coding and decoding method, apparatus, device, and storage medium

Info

Publication number
WO2023142127A1
Authority
WO
WIPO (PCT)
Prior art keywords
mosaic
hybrid
flag
heterogeneous
graph
Prior art date
Application number
PCT/CN2022/075260
Other languages
English (en)
French (fr)
Inventor
虞露
朱志伟
戴震宇
Original Assignee
Zhejiang University
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date
Filing date
Publication date
Application filed by Zhejiang University and Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to PCT/CN2022/075260
Publication of WO2023142127A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119 - Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks

Definitions

  • the present application relates to the technical field of image processing, and in particular to a codec method, device, equipment, and storage medium.
  • Visual media objects with different expression formats may appear in the same scene. For example, in the same 3D scene, the scene background and some characters and objects are expressed as video, while another part of the characters is expressed as a 3D point cloud or a 3D mesh.
  • Using multi-viewpoint video coding, point cloud coding, and mesh coding for compression can preserve the effective information of the original expression formats better than projecting everything into multi-viewpoint video coding, improve the quality of the viewing window rendered during viewing, and improve the overall rate-quality efficiency.
  • However, current encoding and decoding technology encodes and decodes multi-viewpoint video, point clouds, and meshes separately.
  • As a result, a large number of codecs need to be called, which makes the encoding and decoding cost high.
  • Embodiments of the present application provide a codec method, apparatus, device, and storage medium, so as to reduce the number of codecs called in the codec process and reduce the codec cost.
  • In a first aspect, the present application provides an encoding method, including: processing multiple visual media contents to obtain N isomorphic mosaic images, where at least two of the multiple visual media contents correspond to different expression formats, and N is a positive integer greater than 1;
  • In a second aspect, the embodiment of the present application provides a decoding method, including:
  • Multiple reconstructed visual media contents are obtained according to the N reconstructed mosaic images, and at least two of the multiple reconstructed visual media contents correspond to different expression formats.
  • the present application provides an encoding device configured to execute the method in the foregoing first aspect or its various implementation manners.
  • Specifically, the encoding device includes a functional unit configured to execute the method in the above first aspect or each implementation manner thereof.
  • the present application provides a decoding device configured to execute the method in the above second aspect or its various implementation manners.
  • Specifically, the decoding device includes a functional unit configured to execute the method in the above second aspect or each implementation manner thereof.
  • an encoder including a processor and a memory.
  • the memory is used to store a computer program
  • the processor is used to call and run the computer program stored in the memory, so as to execute the method in the above first aspect or its various implementations.
  • a decoder including a processor and a memory.
  • the memory is used to store a computer program, and the processor is used to invoke and run the computer program stored in the memory, so as to execute the method in the above second aspect or its various implementations.
  • a codec system including an encoder and a decoder.
  • the encoder is configured to execute the method in the above first aspect or its various implementations
  • the decoder is configured to execute the method in the above second aspect or its various implementations.
  • The chip includes: a processor, configured to call and run a computer program from the memory, so that a device installed with the chip executes the method in any one of the above first to second aspects or any implementation thereof.
  • a computer-readable storage medium for storing a computer program, and the computer program causes a computer to execute any one of the above-mentioned first to second aspects or the method in each implementation manner thereof.
  • a computer program product including computer program instructions, the computer program instructions cause a computer to execute any one of the above first to second aspects or the method in each implementation manner.
  • a computer program which, when run on a computer, causes the computer to execute any one of the above first to second aspects or the methods in each implementation.
  • a code stream is provided, and the code stream is generated based on the method in the first aspect above.
  • By splicing mosaic images corresponding to visual media contents in different expression formats into one heterogeneous hybrid mosaic image for encoding and decoding, for example by splicing a multi-viewpoint video mosaic image and a point cloud mosaic image into one heterogeneous hybrid mosaic image, the number of two-dimensional video codecs (such as HEVC, VVC, AVC, and AVS codecs) that need to be called is minimized, the encoding and decoding cost is reduced, and ease of use is improved.
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application
  • FIG. 2A is a schematic block diagram of a video encoder involved in an embodiment of the present application.
  • FIG. 2B is a schematic block diagram of a video decoder involved in an embodiment of the present application.
  • Fig. 3A is a framework diagram of the organization and expression of multi-viewpoint video data;
  • Fig. 3B is a schematic diagram of generating a spliced image of multi-viewpoint video data;
  • Fig. 3C is a framework diagram of the organization and expression of point cloud data;
  • Figs. 3D to 3F are schematic diagrams of different types of point cloud data;
  • Fig. 4 is a schematic diagram of multi-viewpoint video encoding;
  • Fig. 5 is a schematic diagram of multi-viewpoint video decoding;
  • FIG. 6 is a schematic flowchart of an encoding method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an encoding process provided by an embodiment of the present application.
  • Figure 8A is a schematic diagram of a heterogeneous hybrid texture mosaic image;
  • Figure 8B is a schematic diagram of a heterogeneous hybrid geometry and occupancy mosaic image;
  • FIG. 9 is a schematic diagram of a hybrid encoding process provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a syntax structure involved in the embodiment of the present application.
  • FIG. 11 is a schematic diagram of an encoding process of the present application.
  • FIG. 12 is a schematic diagram of another syntax structure involved in the embodiment of the present application.
  • Figure 13 is a schematic diagram of another encoding process of the present application.
  • FIG. 14 is a schematic diagram of another syntax structure involved in the embodiment of the present application.
  • Figure 15 is a schematic diagram of another encoding process of the present application.
  • FIG. 16 is a schematic flowchart of a decoding method provided by an embodiment of the present application.
  • FIG. 17 is a schematic diagram of a hybrid decoding process provided by an embodiment of the present application.
  • FIG. 18 is a schematic diagram of a decoding process of the present application.
  • FIG. 19 is a schematic diagram of another decoding process of the present application.
  • FIG. 20 is a schematic diagram of another decoding process of the present application.
  • Fig. 21 is a schematic block diagram of an encoding device provided by an embodiment of the present application.
  • Fig. 22 is a schematic block diagram of a decoding device provided by an embodiment of the present application.
  • Fig. 23 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • The present application can be applied to the fields of image coding and decoding, video coding and decoding, hardware video coding and decoding, dedicated-circuit video coding and decoding, real-time video coding and decoding, and the like.
  • For example, the solution of the present application may be combined with audio and video coding standards such as the Audio Video coding Standard (AVS), the H.264/Advanced Video Coding (AVC) standard, the H.265/High Efficiency Video Coding (HEVC) standard, and the H.266/Versatile Video Coding (VVC) standard.
  • The solutions of the present application may also operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions.
  • According to the task pipeline, the high-degree-of-freedom immersive coding system can be roughly divided into the following steps: data collection, data organization and expression, data coding and compression, data decoding and reconstruction, and data synthesis and rendering, finally presenting the target data to the user.
  • the encoding involved in the embodiment of the present application is mainly video encoding and decoding.
  • the video encoding and decoding system involved in the embodiment of the present application is first introduced with reference to FIG. 1 .
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application. It should be noted that FIG. 1 is only an example, and the video codec system in the embodiment of the present application includes but is not limited to what is shown in FIG. 1 .
  • the video codec system 100 includes an encoding device 110 and a decoding device 120 .
  • the encoding device is used to encode (can be understood as compression) the video data to generate a code stream, and transmit the code stream to the decoding device.
  • the decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
  • the encoding device 110 in the embodiment of the present application can be understood as a device having a video encoding function
  • The decoding device 120 can be understood as a device having a video decoding function; that is, the encoding device 110 and the decoding device 120 in the embodiment of the present application cover a wide range of devices, including smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.
  • the encoding device 110 may transmit the encoded video data (such as code stream) to the decoding device 120 via the channel 130 .
  • Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120 .
  • channel 130 includes one or more communication media that enable encoding device 110 to transmit encoded video data directly to decoding device 120 in real-time.
  • encoding device 110 may modulate the encoded video data according to a communication standard and transmit the modulated video data to decoding device 120 .
  • the communication medium includes a wireless communication medium, such as a radio frequency spectrum.
  • the communication medium may also include a wired communication medium, such as one or more physical transmission lines.
  • the channel 130 includes a storage medium that can store video data encoded by the encoding device 110 .
  • the storage medium includes a variety of local access data storage media, such as optical discs, DVDs, flash memory, and the like.
  • the decoding device 120 may acquire encoded video data from the storage medium.
  • channel 130 may include a storage server that may store video data encoded by encoding device 110 .
  • the decoding device 120 may download the stored encoded video data from the storage server.
  • The storage server may store the encoded video data and may transmit the encoded video data to the decoding device 120; such a server may be, for example, a web server (e.g., for a website), a file transfer protocol (FTP) server, or the like.
  • the encoding device 110 includes a video encoder 112 and an output interface 113 .
  • the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
  • In addition to the video encoder 112 and the output interface 113, the encoding device 110 may include a video source 111.
  • The video source 111 may include at least one of a video capture device (for example, a video camera), a video archive, a video input interface, and a computer graphics system, where the video input interface is used to receive video data from a video content provider, and the computer graphics system is used to generate video data.
  • the video encoder 112 encodes the video data from the video source 111 to generate a code stream.
  • Video data may include one or more pictures or a sequence of pictures.
  • the code stream contains the encoding information of an image or image sequence in the form of a bit stream.
  • Encoding information may include encoded image data and associated data.
  • the associated data may include a sequence parameter set (SPS for short), a picture parameter set (PPS for short) and other syntax structures.
  • An SPS may contain parameters that apply to one or more sequences.
  • a PPS may contain parameters applied to one or more images.
  • the syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the code stream.
  • the video encoder 112 directly transmits encoded video data to the decoding device 120 via the output interface 113 .
  • the encoded video data can also be stored on a storage medium or a storage server for subsequent reading by the decoding device 120 .
  • the decoding device 120 includes an input interface 121 and a video decoder 122 .
  • the decoding device 120 may include a display device 123 in addition to the input interface 121 and the video decoder 122 .
  • the input interface 121 includes a receiver and/or a modem.
  • the input interface 121 can receive encoded video data through the channel 130 .
  • the video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123 .
  • the display device 123 displays the decoded video data.
  • the display device 123 may be integrated with the decoding device 120 or external to the decoding device 120 .
  • the display device 123 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
  • FIG. 1 is only an example, and the technical solutions of the embodiments of the present application are not limited to FIG. 1 .
  • the technology of the present application may also be applied to one-sided video encoding or one-sided video decoding.
  • Fig. 2A is a schematic block diagram of a video encoder involved in an embodiment of the present application. It should be understood that the video encoder 200 can be used to perform lossy compression on images, and can also be used to perform lossless compression on images.
  • the lossless compression may be visually lossless compression or mathematically lossless compression.
  • the video encoder 200 can be applied to image data in luminance-chrominance (YCbCr, YUV) format.
  • the YUV ratio can be 4:2:0, 4:2:2 or 4:4:4, Y means brightness (Luma), Cb (U) means blue chroma, Cr (V) means red chroma, U and V are expressed as chroma (Chroma) for describing color and saturation.
  • 4:2:0 means that every 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr); 4:2:2 means that every 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr); and 4:4:4 means full-resolution chrominance (YYYYCbCrCbCrCbCrCbCr).
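  • To make these sampling ratios concrete, the following sketch (illustrative only; the function name and the frame size in the example are assumptions, not part of this application) counts the luma and per-plane chroma samples a frame carries under each format:

```python
def yuv_sample_counts(width, height, chroma_format):
    """Return (luma_samples, chroma_samples_per_plane) for one frame."""
    luma = width * height
    if chroma_format == "4:2:0":
        chroma = (width // 2) * (height // 2)  # Cb and Cr subsampled 2x horizontally and vertically
    elif chroma_format == "4:2:2":
        chroma = (width // 2) * height         # Cb and Cr subsampled 2x horizontally only
    elif chroma_format == "4:4:4":
        chroma = width * height                # full-resolution chroma
    else:
        raise ValueError("unknown chroma format")
    return luma, chroma

# A 1920x1080 frame in 4:2:0 carries 2,073,600 luma samples and
# 518,400 samples in each of the Cb and Cr planes.
print(yuv_sample_counts(1920, 1080, "4:2:0"))
```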
  • Specifically, the video encoder 200 reads video data and, for each frame of image in the video data, divides the frame into several coding tree units (CTUs).
  • A CTU may also be called a "tree block", a "largest coding unit" (LCU), or a "coding tree block" (CTB).
  • Each CTU may be associated with a pixel block of equal size within the image. Each pixel may correspond to one luminance (luma) sample and two chrominance (chrominance or chroma) samples.
  • each CTU may be associated with one block of luma samples and two blocks of chroma samples.
  • a CTU size is, for example, 128 ⁇ 128, 64 ⁇ 64, 32 ⁇ 32 and so on.
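  • As a simple consequence of the CTU sizes listed above, the number of CTUs covering a frame follows from the frame size and the chosen CTU size; the sketch below (illustrative only, the function name is an assumption) rounds partial CTUs at the right and bottom borders up:

```python
import math

def ctu_grid(frame_width, frame_height, ctu_size=128):
    """Return the number of CTU columns, rows, and total CTUs covering a frame."""
    cols = math.ceil(frame_width / ctu_size)
    rows = math.ceil(frame_height / ctu_size)
    return cols, rows, cols * rows

# A 1920x1080 frame with 128x128 CTUs is covered by 15 x 9 = 135 CTUs.
print(ctu_grid(1920, 1080, 128))
```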
  • a CTU can be further divided into several coding units (Coding Unit, CU) for coding, and the CU can be a rectangular block or a square block.
  • the CU can be further divided into a prediction unit (PU for short) and a transform unit (TU for short), so that encoding, prediction, and transformation are separated, and processing is more flexible.
  • a CTU is divided into CUs in a quadtree manner, and a CU is divided into TUs and PUs in a quadtree manner.
  • the video encoder and video decoder can support various PU sizes. Assuming that the size of a specific CU is 2N ⁇ 2N, video encoders and video decoders may support 2N ⁇ 2N or N ⁇ N PU sizes for intra prediction, and support 2N ⁇ 2N, 2N ⁇ N, N ⁇ 2N, NxN or similarly sized symmetric PUs for inter prediction. The video encoder and video decoder may also support asymmetric PUs of 2NxnU, 2NxnD, nLx2N, and nRx2N for inter prediction.
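  • The PU shapes listed above for a 2N x 2N CU can be enumerated as in the following sketch (illustrative only; the partition labels follow the conventional HEVC-style naming, and each asymmetric mode splits the CU into a quarter-size and a three-quarter-size PU along one direction):

```python
def candidate_pu_partitions(cu_size):
    """Enumerate PU partitions of a cu_size x cu_size CU (cu_size = 2N); each entry lists PU sizes (w, h)."""
    full, half, quarter = cu_size, cu_size // 2, cu_size // 4
    return {
        # intra prediction: 2Nx2N and NxN
        "intra": {"2Nx2N": [(full, full)], "NxN": [(half, half)] * 4},
        # inter prediction, symmetric partitions
        "inter_symmetric": {"2Nx2N": [(full, full)], "2NxN": [(full, half)] * 2,
                            "Nx2N": [(half, full)] * 2, "NxN": [(half, half)] * 4},
        # inter prediction, asymmetric partitions: 2NxnU, 2NxnD, nLx2N, nRx2N
        "inter_asymmetric": {"2NxnU": [(full, quarter), (full, full - quarter)],
                             "2NxnD": [(full, full - quarter), (full, quarter)],
                             "nLx2N": [(quarter, full), (full - quarter, full)],
                             "nRx2N": [(full - quarter, full), (quarter, full)]},
    }

print(candidate_pu_partitions(64)["inter_asymmetric"]["2NxnU"])  # [(64, 16), (64, 48)]
```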
  • The video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filter unit 260, a decoded picture buffer 270, and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer, or different functional components.
  • the current block may be called a current coding unit (CU) or a current prediction unit (PU).
  • a predicted block may also be called a predicted image block or an image predicted block, and a reconstructed image block may also be called a reconstructed block or an image reconstructed image block.
  • the prediction unit 210 includes an inter prediction unit 211 and an intra estimation unit 212 . Because there is a strong correlation between adjacent pixels in a video frame, the intra-frame prediction method is used in video coding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Due to the strong similarity between adjacent frames in video, the inter-frame prediction method is used in video coding and decoding technology to eliminate time redundancy between adjacent frames, thereby improving coding efficiency.
  • the inter-frame prediction unit 211 can be used for inter-frame prediction.
  • The inter-frame prediction can include motion estimation and motion compensation, and can refer to image information of different frames: inter-frame prediction uses motion information to find a reference block in a reference frame and generates a prediction block based on the reference block, in order to eliminate temporal redundancy. Frames used for inter-frame prediction can be P frames and/or B frames, where a P frame is a forward-predicted frame and a B frame is a bidirectionally predicted frame.
  • the motion information includes the reference frame list where the reference frame is located, the reference frame index, and the motion vector.
  • the motion vector can be an integer pixel or a sub-pixel.
  • The block of whole pixels or sub-pixels found in the reference frame according to the motion vector is called the reference block.
  • Some technologies use the reference block directly as the prediction block, while others further process the reference block to generate the prediction block; such further processing can also be understood as taking the reference block as the prediction block and then processing it to generate a new prediction block.
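  • A minimal sketch of this reference-block fetch, assuming a quarter-pel motion vector and a simple bilinear interpolation for sub-pel positions (real codecs use longer interpolation filters; all names here are illustrative assumptions):

```python
import numpy as np

def fetch_reference_block(ref_frame, x, y, w, h, mv_x, mv_y):
    """Fetch a w x h prediction block for the block at (x, y) using motion vector (mv_x, mv_y).

    mv_x and mv_y are in quarter-pel units; fractional positions are
    bilinearly interpolated purely for illustration.
    """
    fx, fy = x + mv_x / 4.0, y + mv_y / 4.0
    x0, y0 = int(np.floor(fx)), int(np.floor(fy))
    ax, ay = fx - x0, fy - y0
    # Clamp so the (h+1) x (w+1) interpolation support stays inside the reference frame.
    H, W = ref_frame.shape
    x0 = min(max(x0, 0), W - w - 1)
    y0 = min(max(y0, 0), H - h - 1)
    p = ref_frame[y0:y0 + h + 1, x0:x0 + w + 1].astype(np.float64)
    top = (1 - ax) * p[:h, :w] + ax * p[:h, 1:w + 1]
    bottom = (1 - ax) * p[1:h + 1, :w] + ax * p[1:h + 1, 1:w + 1]
    return (1 - ay) * top + ay * bottom

ref = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
pred = fetch_reference_block(ref, x=16, y=16, w=8, h=8, mv_x=5, mv_y=-3)  # quarter-pel MV
print(pred.shape)  # (8, 8)
```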
  • The intra-frame estimation unit 212 only refers to information of the same frame image to predict the pixel information within the current image block to be coded, in order to eliminate spatial redundancy.
  • a frame used for intra prediction may be an I frame.
  • The intra prediction modes used by HEVC include the Planar mode, the DC mode, and 33 angular modes, for a total of 35 prediction modes.
  • The intra modes used by VVC include Planar, DC, and 65 angular modes, for a total of 67 prediction modes.
  • With more modes, intra prediction becomes more accurate and better meets the demands of high-definition and ultra-high-definition digital video.
  • The residual unit 220 may generate a residual block of the CU based on the pixel block of the CU and the prediction blocks of the PUs of the CU. For example, the residual unit 220 may generate the residual block of the CU such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in the prediction block of a PU of the CU.
  • Transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with TUs of a CU based on quantization parameter (QP) values associated with the CU. Video encoder 200 may adjust the degree of quantization applied to transform coefficients associated with a CU by adjusting the QP value associated with the CU.
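  • The effect of the QP on the strength of quantization can be illustrated with the common convention that the quantization step roughly doubles every 6 QP units; the sketch below is illustrative only and is not the exact normative scaling of any particular codec:

```python
def quantization_step(qp):
    """Approximate quantization step: doubles every 6 QP units (HEVC-style convention, illustrative)."""
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeff, qp):
    """Quantize a transform coefficient with the step derived from the QP."""
    return round(coeff / quantization_step(qp))

def dequantize(level, qp):
    """Inverse quantize a coefficient level back to a reconstructed value."""
    return level * quantization_step(qp)

# A larger QP gives a coarser step: fewer bits, more distortion.
for qp in (22, 27, 32, 37):
    level = quantize(100.0, qp)
    print(qp, round(quantization_step(qp), 2), level, round(dequantize(level, qp), 1))
```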
  • Inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct a residual block from the quantized transform coefficients.
  • the reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by the prediction unit 210 to generate a reconstructed image block associated with the TU. By reconstructing the sample blocks of each TU of the CU in this way, the video encoder 200 can reconstruct the pixel blocks of the CU.
  • The loop filtering unit 260 is used to process the inversely transformed and inversely quantized pixels, compensate for distortion information, and provide better references for subsequently encoded pixels. For example, a deblocking filtering operation can be performed to reduce blocking artifacts.
  • the loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive compensation/adaptive loop filtering (SAO/ALF) unit, wherein the deblocking filtering unit is used for deblocking, and the SAO/ALF unit Used to remove ringing effects.
  • the decoded image buffer 270 may store reconstructed pixel blocks.
  • Inter prediction unit 211 may use reference pictures containing reconstructed pixel blocks to perform inter prediction on PUs of other pictures.
  • intra estimation unit 212 may use the reconstructed pixel blocks in decoded picture cache 270 to perform intra prediction on other PUs in the same picture as the CU.
  • Entropy encoding unit 280 may receive the quantized transform coefficients from transform/quantization unit 230 . Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.
  • Fig. 2B is a schematic block diagram of a video decoder involved in the embodiment of the present application.
  • the video decoder 300 includes: an entropy decoding unit 310 , a prediction unit 320 , an inverse quantization/transformation unit 330 , a reconstruction unit 340 , a loop filter unit 350 and a decoded image buffer 360 . It should be noted that the video decoder 300 may include more, less or different functional components.
  • the video decoder 300 can receive code streams.
  • the entropy decoding unit 310 may parse the codestream to extract syntax elements from the codestream. As part of parsing the codestream, the entropy decoding unit 310 may parse the entropy-encoded syntax elements in the codestream.
  • the prediction unit 320 , the inverse quantization/transformation unit 330 , the reconstruction unit 340 and the loop filter unit 350 can decode video data according to the syntax elements extracted from the code stream, that is, generate decoded video data.
  • the prediction unit 320 includes an inter prediction unit 321 and an intra estimation unit 322 .
  • Intra estimation unit 322 may perform intra prediction to generate a predictive block for a PU. Intra-estimation unit 322 may use an intra-prediction mode to generate a predictive block for a PU based on pixel blocks of spatially neighboring PUs. Intra estimation unit 322 may also determine the intra prediction mode for the PU from one or more syntax elements parsed from the codestream.
  • the inter prediction unit 321 can construct the first reference picture list (list 0) and the second reference picture list (list 1) according to the syntax elements parsed from the codestream. Furthermore, if the PU is encoded using inter prediction, entropy decoding unit 310 may parse the motion information for the PU. Inter prediction unit 321 may determine one or more reference blocks for the PU according to the motion information of the PU. Inter prediction unit 321 may generate a prediction block for a PU based on one or more reference blocks of the PU.
  • Inverse quantization/transform unit 330 may inverse quantize (ie, dequantize) transform coefficients associated with a TU. Inverse quantization/transform unit 330 may use QP values associated with CUs of the TU to determine the degree of quantization.
  • inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients in order to generate a residual block associated with the TU.
  • Reconstruction unit 340 uses the residual blocks associated with the TUs of the CU and the prediction blocks of the PUs of the CU to reconstruct the pixel blocks of the CU. For example, the reconstruction unit 340 may add the samples of the residual block to the corresponding samples of the prediction block to reconstruct the pixel block of the CU to obtain the reconstructed image block.
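  • The per-sample reconstruction described here is simply prediction plus residual, clipped to the valid sample range; a minimal sketch (the 8-bit sample depth is an assumption for illustration):

```python
import numpy as np

def reconstruct_block(prediction, residual, bit_depth=8):
    """Reconstruct a block by adding the residual to the prediction and clipping to the sample range."""
    max_val = (1 << bit_depth) - 1
    return np.clip(prediction.astype(np.int32) + residual.astype(np.int32), 0, max_val)

pred = np.full((4, 4), 200, dtype=np.int32)
resid = np.array([[60, -10, 0, 5]] * 4, dtype=np.int32)
print(reconstruct_block(pred, resid))  # values that would exceed 255 are clipped
```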
  • Loop filtering unit 350 may perform deblocking filtering operations to reduce blocking artifacts of pixel blocks associated with a CU.
  • Video decoder 300 may store the reconstructed picture of the CU in decoded picture cache 360 .
  • the video decoder 300 may use the reconstructed picture in the decoded picture buffer 360 as a reference picture for subsequent prediction, or transmit the reconstructed picture to a display device for presentation.
  • the basic process of video encoding and decoding is as follows: at the encoding end, a frame of image is divided into blocks, and for the current block, the prediction unit 210 uses intra-frame prediction or inter-frame prediction to generate a prediction block of the current block.
  • the residual unit 220 may calculate a residual block based on the predicted block and the original block of the current block, that is, a difference between the predicted block and the original block of the current block, and the residual block may also be referred to as residual information.
  • the residual block can be transformed and quantized by the transformation/quantization unit 230 to remove information that is not sensitive to human eyes, so as to eliminate visual redundancy.
  • the residual block before being transformed and quantized by the transform/quantization unit 230 may be called a time domain residual block, and the time domain residual block after being transformed and quantized by the transform/quantization unit 230 may be called a frequency residual block or a frequency-domain residual block.
  • The entropy coding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, and may perform entropy coding on the quantized transform coefficients to output a code stream.
  • the entropy coding unit 280 can eliminate character redundancy according to the target context model and the probability information of the binary code stream.
  • the entropy decoding unit 310 can analyze the code stream to obtain the prediction information of the current block, the quantization coefficient matrix, etc., and the prediction unit 320 uses intra prediction or inter prediction for the current block based on the prediction information to generate a prediction block of the current block.
  • the inverse quantization/transformation unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block.
  • the reconstruction unit 340 adds the predicted block and the residual block to obtain a reconstructed block.
  • the reconstructed blocks form a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or based on the block to obtain a decoded image.
  • the encoding end also needs similar operations to the decoding end to obtain the decoded image.
  • The decoded image may also be referred to as a reconstructed image, and the reconstructed image may serve as a reference frame for inter-frame prediction of subsequent frames.
  • the block division information determined by the encoder as well as mode information or parameter information such as prediction, transformation, quantization, entropy coding, and loop filtering, etc., are carried in the code stream when necessary.
  • The decoding end parses the code stream and, based on the available information, determines the same block division information and the same prediction, transform, quantization, entropy coding, loop filtering and other mode information or parameter information as the encoding end, so as to ensure that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.
  • The above is the basic process of the video codec under the block-based hybrid coding framework; with the development of technology, some modules or steps of the framework or process may be optimized. The present application is applicable to the basic process of the video codec under the block-based hybrid coding framework, but is not limited to this framework and process.
  • the current codec methods include at least the following two:
  • Method 1: MPEG (Moving Picture Experts Group) Immersive Video (MIV) technology is used to encode and decode multi-viewpoint video, and Video-based Point Cloud Compression (VPCC) technology is used to encode and decode point clouds.
  • the MIV technology and the VPCC technology are introduced below.
  • In order to reduce the transmission pixel rate while retaining scene information as much as possible, so as to ensure that there is enough information for rendering the target view, MPEG-I adopts the scheme shown in Fig. 3A. A limited number of viewpoints are selected as basic viewpoints so as to express the visible range of the scene as much as possible. The basic viewpoints are transmitted as complete images, and the redundant pixels between the remaining non-basic viewpoints and the basic viewpoints are removed, that is, only effective information that is not repeatedly expressed is retained; the effective information is then extracted as sub-block images and reorganized together with the basic viewpoint images to form a larger rectangular image, which is called a stitched image.
  • Figure 3A and Figure 3B show the schematic process of generating the stitched image.
  • the spliced image is sent to the codec for compression and reconstruction, and the auxiliary data related to the sub-block image splicing information is also sent to the encoder to form a code stream.
  • the encoding method of VPCC is to project the point cloud into a two-dimensional image or video, and convert the three-dimensional information into two-dimensional information encoding.
  • Figure 3C is a coding block diagram of VPCC.
  • the code stream is roughly divided into four parts.
  • the geometric code stream is the code stream generated by the geometric depth image encoding, which is used to represent the geometric information of the point cloud;
  • The attribute code stream is the code stream generated by the texture image encoding, which is used to represent the attribute information of the point cloud;
  • the occupancy code stream is the code stream generated by the occupancy map encoding, which is used to indicate the effective area in the depth map and texture map;
  • These three types of video are encoded and decoded by a video encoder, as shown in Figures 3D to 3F.
  • The auxiliary information code stream is the code stream generated by encoding the auxiliary information of the sub-block images, that is, the part related to the patch data unit in the V3C standard, which indicates information such as the sub-block splicing information.
  • Method 2: both multi-viewpoint video and point clouds are encoded and decoded using the frame packing technology in Visual Volumetric Video-based Coding (V3C).
  • the frame packing technology is introduced below.
  • the encoding end includes the following steps:
  • Step 1: when encoding the acquired multi-view video, after some pre-processing, multi-view video sub-blocks (patches) are generated, and the multi-view video sub-blocks are then organized to generate a multi-view video mosaic image.
  • For example, the multi-view video is input into TMIV for packing, and the multi-view video mosaic image is output.
  • TMIV is the reference software of MIV.
  • the packaging in the embodiment of this application can be understood as splicing.
  • the multi-view video mosaic map includes a multi-view video texture mosaic map and a multi-view video geometric mosaic map, that is, only includes multi-view video sub-blocks.
  • Step 2 input the multi-view video mosaic image into the frame packer, and output the multi-view video hybrid mosaic image.
  • The multi-viewpoint video hybrid mosaic image includes a multi-viewpoint video texture hybrid mosaic image, a multi-viewpoint video geometry hybrid mosaic image, or a multi-viewpoint video texture and geometry hybrid mosaic image.
  • Specifically, the multi-view video mosaic images are frame-packed to generate a multi-view video hybrid mosaic image, and each multi-view video mosaic image occupies a region of the multi-view video hybrid mosaic image.
  • In addition, a flag pin_region_type_id_minus2 should be transmitted in the code stream for each region. This flag records whether the current region belongs to the multi-view video texture mosaic image or the multi-view video geometry mosaic image; this information needs to be used at the decoding end.
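  • The region bookkeeping of this step can be sketched as follows: each mosaic image becomes one region of the packed frame, and a per-region type identifier (written here as plain metadata rather than actual V3C bitstream syntax) records whether the region carries texture (attribute) or geometry data. The function name, the vertical stacking layout, and the metadata fields are assumptions made purely for illustration.

```python
import numpy as np

V3C_AVD, V3C_GVD = "V3C_AVD", "V3C_GVD"  # attribute (texture) / geometry video data

def frame_pack(mosaics):
    """Stack (mosaic_image, type_id) pairs vertically into one packed frame plus region metadata."""
    width = max(img.shape[1] for img, _ in mosaics)
    rows, regions, y = [], [], 0
    for img, type_id in mosaics:
        padded = np.zeros((img.shape[0], width), dtype=img.dtype)
        padded[:, :img.shape[1]] = img
        regions.append({"type_id": type_id, "top_left_y": y,
                        "width": img.shape[1], "height": img.shape[0]})
        rows.append(padded)
        y += img.shape[0]
    return np.vstack(rows), regions

texture_mosaic = np.ones((128, 256), dtype=np.uint8)     # multi-view video texture mosaic
geometry_mosaic = np.full((64, 256), 2, dtype=np.uint8)  # multi-view video geometry mosaic
packed, meta = frame_pack([(texture_mosaic, V3C_AVD), (geometry_mosaic, V3C_GVD)])
print(packed.shape, [r["type_id"] for r in meta])  # the per-region type is what pin_region_type_id_minus2 conveys
```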
  • Step 3 use a video encoder to encode the multi-viewpoint video hybrid mosaic image to obtain a code stream.
  • the decoding end includes the following steps:
  • Step 1 During multi-view video decoding, input the obtained code stream into a video decoder for decoding to obtain a reconstructed multi-view video hybrid mosaic.
  • Step 2: input the reconstructed multi-view video hybrid mosaic image into the frame unpacker, and output the reconstructed multi-view video mosaic images.
  • Specifically, the flag pin_region_type_id_minus2 is obtained from the code stream. If pin_region_type_id_minus2 indicates V3C_AVD, the current region is a multi-viewpoint video texture mosaic image, and the current region is split off and output as a reconstructed multi-viewpoint video texture mosaic image.
  • If pin_region_type_id_minus2 indicates V3C_GVD, the current region is a multi-viewpoint video geometric mosaic image, and the current region is split off and output as a reconstructed multi-viewpoint video geometric mosaic image.
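  • The corresponding unpacking at the decoder reads the same per-region type information and splits the reconstructed hybrid mosaic image back into texture and geometry mosaic images; a sketch under the same illustrative assumptions as above:

```python
import numpy as np

def frame_unpack(packed, regions):
    """Split a reconstructed packed frame back into per-type mosaics using the region metadata."""
    out = {}
    for r in regions:
        y0, h, w = r["top_left_y"], r["height"], r["width"]
        out.setdefault(r["type_id"], []).append(packed[y0:y0 + h, :w])
    return out

# A 192x256 packed frame whose first 128 rows are a texture (V3C_AVD) region
# and whose last 64 rows are a geometry (V3C_GVD) region.
packed = np.zeros((192, 256), dtype=np.uint8)
regions = [{"type_id": "V3C_AVD", "top_left_y": 0, "width": 256, "height": 128},
           {"type_id": "V3C_GVD", "top_left_y": 128, "width": 256, "height": 64}]
unpacked = frame_unpack(packed, regions)
print({k: [m.shape for m in v] for k, v in unpacked.items()})
```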
  • Step 3 decoding the reconstructed multi-viewpoint video mosaic graph to obtain the reconstructed multi-viewpoint video.
  • the multi-viewpoint video texture mosaic map and the multi-viewpoint video geometric mosaic map are decoded to obtain the reconstructed multi-viewpoint video.
  • The above analyzes and introduces the frame packing technology by taking multi-viewpoint video as an example. The frame packing encoding and decoding method for point clouds is basically the same as that of the above multi-viewpoint video, and reference may be made thereto.
  • For example, the point cloud is input into TMC (a VPCC reference software) for packing to obtain point cloud mosaic images; the point cloud mosaic images are input into the frame packer for frame packing to obtain a point cloud hybrid mosaic image, which is then encoded to obtain the point cloud code stream. Details are not repeated here.
  • The V3C unit header syntax is shown in Table 1:
  • The V3C unit header semantics are shown in Table 2:
  • Packed video frames can be divided into one or more rectangular regions.
  • a region should map exactly to an atlas tile.
  • the rectangular areas of packed video frames are not allowed to overlap.
  • pin_codec_id[j] indicates an identifier of a codec for compressing and packing video data for the atlas whose ID is j.
  • pin_codec_id should be in the range 0 to 255, inclusive.
  • the codec may be identified through the Component Codec Mapping SEI message or by means outside of this document.
  • pin_occupancy_present_flag[j] equal to 0 indicates that the packed video frame of the atlas with ID j does not contain regions with occupancy data.
  • pin_occupancy_present_flag[j] equal to 1 indicates that the packed video frame of the atlas with ID j does contain regions with occupancy data.
  • When pin_occupancy_present_flag[j] is not present, it is inferred to be equal to 0.
  • It is a requirement of bitstream conformance that if pin_occupancy_present_flag[j] is equal to 1 for the atlas with atlas ID j, vps_occupancy_video_present_flag[j] shall be equal to 0 for the atlas with the same atlas ID j.
  • pin_geometry_present_flag[j] equal to 0 indicates that the packed video frame of the atlas with ID j does not contain regions with geometry data.
  • pin_geometry_present_flag[j] equal to 1 indicates that the packed video frame of the atlas with ID j does contain regions with geometry data.
  • When pin_geometry_present_flag[j] is not present, it is inferred to be equal to 0.
  • It is a requirement of bitstream conformance that if pin_geometry_present_flag[j] is equal to 1 for the atlas with ID j, vps_geometry_video_present_flag[j] shall be equal to 0 for the atlas with ID j.
  • pin_attributes_present_flag[j] equal to 0 indicates that the packed video frame of the atlas with ID j does not contain regions with attribute data.
  • pin_attributes_present_flag[j] equal to 1 indicates that the packed video frame of the atlas with ID j does contain regions with attribute data.
  • When pin_attributes_present_flag[j] is not present, it is inferred to be equal to 0.
  • It is a requirement of bitstream conformance that if pin_attributes_present_flag[j] is equal to 1 for the atlas with ID j, vps_attribute_video_present_flag[j] shall be equal to 0 for the atlas with ID j.
  • pin_occupancy_2d_bit_depth_minus1[j] plus 1 indicates the nominal 2D bit depth to which the decoded region containing occupancy data of the atlas with ID j should be converted. pin_occupancy_2d_bit_depth_minus1[j] shall be in the range 0 to 31, inclusive.
  • pin_occupancy_MSB_align_flag[j] indicates how the decoded region containing occupancy samples of the atlas with ID j is converted to samples at the nominal occupancy bit depth, as specified in Annex B.
  • pin_lossy_occupancy_compression_threshold[j] indicates the threshold for deriving binary occupancy from a decoded region containing occupancy data for atlas with ID j.
  • pin_lossy_occupancy_compression_threshold[j] should be in the range 0 to 255, inclusive.
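  • In other words, binary occupancy is recovered by comparing each decoded occupancy sample against this threshold; a minimal sketch (the strict greater-than comparison is an assumption for illustration):

```python
import numpy as np

def derive_binary_occupancy(decoded_occupancy, threshold):
    """Derive a binary occupancy map from a lossily decoded occupancy region."""
    return (decoded_occupancy > threshold).astype(np.uint8)

decoded = np.array([[0, 40, 200], [130, 10, 255]], dtype=np.uint8)
print(derive_binary_occupancy(decoded, threshold=128))
```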
  • pin_geometry_2d_bit_depth_minus1[j] plus 1 indicates the nominal 2D bit depth to which the decoded region containing geometry data of the atlas with ID j should be converted.
  • pin_geometry_2d_bit_depth_minus1[j] should be in the range 0 to 31, inclusive.
  • pin_geometry_MSB_align_flag[j] indicates how the decoded region containing geometry samples of the atlas with ID j is converted to samples at the nominal geometry bit depth, as described in Annex B.
  • pin_geometry_3d_coordinates_bit_depth_minus1[j] plus 1 indicates the bit depth of the geometric coordinates of the reconstructed stereo content of the atlas with ID j.
  • pin_geometry_3d_coordinates_bit_depth_minus1[j] should be in the range 0 to 31, inclusive.
  • pin_attribute_count[j] indicates the number of attributes with a unique attribute type present in the packed video frame of the atlas with ID j.
  • pin_attribute_type_id[j][i] represents the i-th attribute type of the attribute area of the packed video frame of the atlas whose ID is j.
  • Table 3 describes the list of supported attribute types.
  • pin_attribute_2d_bit_depth_minus1[j][k] plus 1 indicates the nominal 2D bit depth to which the region containing the attribute with attribute index k should be converted, for an atlas with ID j.
  • pin_attribute_2d_bit_depth_minus1[j][k] shall be in the range 0 to 31, inclusive.
  • pin_attribute_MSB_align_flag[j][k] indicates how to convert a decoded region (for an atlas with ID j) containing an attribute of attribute type k to samples of the nominal attribute bit depth, as described in Annex B.
  • pin_attribute_map_absolute_coding_persistence_flag[j][k] equal to 1 indicates that the decoded region contains the attribute map of the attribute with index k, corresponding to the atlas with ID j, coded without any form of map prediction.
  • pin_attribute_map_absolute_coding_persistence_flag[j][i] equal to 0 indicates that the decoded region contains the attribute map of the attribute with index k, corresponding to the atlas with ID j, and that the same map prediction method as used for the geometry component of the atlas with ID j should be used. If pin_attribute_map_absolute_coding_persistence_flag[j][i] is not present, its value shall be inferred to be equal to 1.
  • the 3D array AttributeMapAbsoluteCodingEnabledFlag indicates whether a specific mapping of an attribute is to be encoded, with or without prediction, obtained as follows:
  • pin_attribute_dimension_minus1[j][k] plus 1 indicates the total dimension (that is, the number of channels) of the region containing the attribute with index k in the atlas with ID j.
  • pin_attribute_dimension_minus1[j][i] shall be in the range 0 to 63, inclusive.
  • pin_attribute_dimension_partitions_minus1[j][k] plus 1 indicates the number of partition groups that should be grouped by the attribute channel of the region containing the attribute with index k for the atlas with ID j.
  • pin_attribute_dimension_partitions_minus1[j][k] shall be in the range 0 to 63, inclusive.
  • pin_attribute_partition_channels_minus1[j][k][l] plus 1 indicates the number of channels allocated to the dimension partition group with index l for the region containing the attribute with index k in the atlas with ID j.
  • ai_attribute_partition_channels_minus1[j][k][l] should be in the range of 0 to ai_attribute_dimension_minus1[j][k] for all dimension partition groups.
  • pin_regions_count_minus1[j] plus 1 indicates the number of regions of the atlas with ID j packaged in one video frame.
  • pin_regions_count_minus1 should be in the range 0 to 7, inclusive. When absent, the value of pin_regions_count_minus1 is inferred to be equal to 0.
  • pin_region_tile_id[j][i] indicates the tile ID of the region whose index is i in the atlas with ID j.
  • pin_region_type_id_minus2[j][i] plus 2 indicates the type ID of the region with index i for the atlas with ID j.
  • the value of pin_region_type_id_minus2[j][i] shall be in the range 0 to 2, inclusive.
  • pin_region_top_left_x[j][i] specifies, in units of luma samples in the packed video component frame, the horizontal position of the top-left sample of the region with index i of the atlas with ID j. When not present, the value of pin_region_top_left_x[j][i] is inferred to be equal to 0.
  • pin_region_top_left_y[j][i] specifies, in units of luma samples in the packed video component frame, the vertical position of the top-left sample of the region with index i of the atlas with ID j. When not present, the value of pin_region_top_left_y[j][i] is inferred to be equal to 0.
  • pin_region_width_minus1[j][i] plus 1 specifies the width, in units of luma samples, of the region with index i of the atlas with ID j.
  • pin_region_height_minus1[j][i] plus 1 specifies the height, in units of luma samples, of the region with index i of the atlas with ID j.
  • pin_region_unpack_top_left_x[j][i] specifies, in units of luma samples in the unpacked video component frame, the horizontal position of the top-left sample of the region with index i of the atlas with ID j. When not present, the value of pin_region_unpack_top_left_x[j][i] is inferred to be equal to 0.
  • pin_region_unpack_top_left_y[j][i] specifies, in units of luma samples in the unpacked video component frame, the vertical position of the top-left sample of the region with index i of the atlas with ID j. When not present, the value of pin_region_unpack_top_left_y[j][i] is inferred to be equal to 0.
  • pin_region_rotation_flag[j][i] 0 indicates that the region with index i of the atlas with ID j is not to be rotated.
  • pin_region_rotation_flag[j][i] 1 indicates that the region with index i of the atlas with ID j is rotated by 90 degrees.
  • pin_region_map_index[j][i] specifies the map index of the region with index i of the atlas with ID j.
  • pin_region_auxiliary_data_flag[j][i] equal to 1 indicates that the region with index i of the atlas with ID j contains only RAW and/or EOM codepoints.
  • pin_region_auxiliary_data_flag[j][i] equal to 0 indicates that the region with index i of the atlas with ID j may contain RAW and/or EOM codepoints.
  • pin_region_attr_type_id[j][i] indicates the attribute type of the region with index i of the atlas with ID j.
  • Table 3 describes the list of supported attributes.
  • pin_region_attr_partition_index[j][i] indicates the attribute partition index of the region with index i of the atlas with ID j. When not present, the value of pin_region_attr_partition_index[j][i] is inferred to be equal to 0.
  • the decoding process of the packed video component of the atlas whose ID is DecAtlasID is performed as follows.
  • The codec is first determined using the profile defined in Annex A, or the value of pin_codec_id[DecAtlasID] and the Component Codec Mapping SEI message specified in subclause F.2.11, if present. Then, according to the corresponding coding specification, the packed video decoding process is invoked using the packed video sub-bitstream present in the V3C bitstream as input.
  • DecPckFrames, the decoded packed video frames, where the dimensions correspond to the decoded packed video frame index, component index, row index, and column index, respectively;
  • DecPckChromaSamplingPosition, indicating the video chroma sampling position as specified in ISO/IEC 23091-2;
  • DecPckFullRange, indicating the full range of video codepoints as specified in ISO/IEC 23091-2;
  • DecPckTransferCharacteristics, indicating the transfer characteristics specified in ISO/IEC 23091-2;
  • DecPckMatrixCoeffs, indicating the matrix coefficients specified in ISO/IEC 23091-2;
  • DecPckOutOrdIdx, indicating the packed video output order index;
  • DecPckCompTime, indicating the packed video composition time;
  • where the dimension corresponds to the decoded packed video frame index.
  • If DecPckTransferCharacteristics is missing or set to the value 2, i.e. unspecified, these elements shall be set to 8, i.e. linear.
  • the maximum allowed value may be further restricted by the application toolset configuration file.
  • An optimized implementation of the unpacking process can determine the appropriate value of this variable according to the syntax elements in the packing_information() syntax structure.
  • The variable NumRegions and the arrays RegionTypeId, RegionPackedOffsetX, RegionPackedOffsetY, RegionWidth, RegionHeight, RegionUnpackedOffsetX, RegionUnpackedOffsetY, RegionMapIdx, RegionRotationFlag, RegionAuxilaryDataFlag, RegionAttrTypeID, RegionAttrPatritionIdx, and RegionAttrPatritionChannels are set as follows:
  • RegionTypeId[i] = pin_region_type_id_minus2[ConvAtlasID][i] + 2
  • RegionPackedOffsetX[i] = pin_region_top_left_x[ConvAtlasID][i]
  • RegionPackedOffsetY[i] = pin_region_top_left_y[ConvAtlasID][i]
  • RegionWidth[i] = pin_region_width_minus1[ConvAtlasID][i] + 1
  • RegionHeight[i] = pin_region_height_minus1[ConvAtlasID][i] + 1
  • RegionUnpackedOffsetX[i] = pin_region_unpack_top_left_x[ConvAtlasID][i]
  • RegionUnpackedOffsetY[i] = pin_region_unpack_top_left_y[ConvAtlasID][i]
  • RegionMapIdx[i] = pin_region_map_index[ConvAtlasID][i]
  • RegionRotationFlag[i] = pin_region_rotation_flag[ConvAtlasID][i]
  • RegionAuxilaryDataFlag[i] = pin_region_auxiliary_data_flag[ConvAtlasID][i]
  • RegionAttrTypeID[i] = pin_region_attr_type_id[ConvAtlasID][i]
  • the unpacking process is defined as follows: Section B.4.2 is invoked to calculate the resolution of the unpacked video component.
  • the outputs of this process are the variables unpckOccWidth, unpckOccHeight, unpckGeoAuxWidth, and unpckGeoAuxHeight, the 1D arrays unpckGeoWidth and unpckGeoHeight, the 2D arrays unpckAttrAuxWidth and unpckAttrAuxHeight, and the 3D arrays unpckAttrWidth and unpckAttrHeight.
  • Invoke subclause B.4.3 to initialize unpacked video component frames.
  • the inputs to this procedure are the variables unpckOccWidth, unpckOccHeight, unpckGeoAuxWidth and unpckGeoAuxHeight, the 1D arrays unpckGeoWidth and unpckGeoHeight, the 2D arrays unpckAttrAuxWidth and unpckAttrAuxHeight, and the 3D arrays unpckAttrWidth and unpckAttrHeight.
  • The outputs of this process are the 4D array unpckOccFrames, the 5D array unpckGeoFrames, the 4D array unpckGeoAuxFrames, the 7D array unpckAttrFrames, and the 6D array unpckAttrAuxFrames.
  • Data is then copied into the unpacked video component frames by invoking subclause B.4.4.
  • The inputs to this process are the 4D array unpckOccFrames, the 5D array unpckGeoFrames, the 4D array unpckGeoAuxFrames, the 7D array unpckAttrFrames, and the 6D array unpckAttrAuxFrames.
  • The outputs of this process are the updated 4D array unpckOccFrames, the updated 5D array unpckGeoFrames, the updated 4D array unpckGeoAuxFrames, the updated 7D array unpckAttrFrames, and the updated 6D array unpckAttrAuxFrames.
  • the unpacked video component frames as output of subclause B.4.4 may be passed as input to the nominal format conversion process defined in subclause B.2.
  • the input to this process is:
  • The 4D array unpckOccFrames represents the unpacked occupancy frames, where the dimensions correspond to the occupancy video frame index, component index, row index, and column index.
  • the 4D array unpckGeoAuxFrames represents the decompressed auxiliary geometry video frame, where the dimensions correspond to the decoded auxiliary geometry video frame index, component index, row index, and column index respectively.
  • the 7D array unpckAttrFrames represents the unpacked attribute video frame, where the dimensions correspond to attribute index, attribute partition index, map index, decoded attribute video frame index, component index, row index, and column index, respectively.
  • unpckAttrAuxFrames which represents the unpacked auxiliary attribute video frame, where the dimensions correspond to attribute index, attribute partition index, decoding attribute video frame index, component index, row index and column index.
  • The outputs of this process are: the updated 4D array unpckOccFrames, the updated 5D array unpckGeoFrames, the updated 4D array unpckGeoAuxFrames, the updated 7D array unpckAttrFrames, and the updated 6D array unpckAttrAuxFrames. The following applies:
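  • The normative copying equations of subclause B.4.4 are not reproduced here; purely as an illustration, the step can be paraphrased as iterating over the regions and moving each packed rectangle to its unpacked position, undoing the 90-degree rotation where the rotation flag is set. The sketch below operates on a single-component frame, and the dictionary keys and the rotation direction are assumptions.

```python
import numpy as np

def copy_regions_to_unpacked(packed_frame, regions, unpacked_frames):
    """Copy each region of the packed frame into the unpacked frame selected by its type."""
    for r in regions:
        y, x = r["RegionPackedOffsetY"], r["RegionPackedOffsetX"]
        h, w = r["RegionHeight"], r["RegionWidth"]
        block = packed_frame[y:y + h, x:x + w]
        if r["RegionRotationFlag"]:
            block = np.rot90(block)  # undo the 90-degree rotation (direction assumed for illustration)
            h, w = w, h
        uy, ux = r["RegionUnpackedOffsetY"], r["RegionUnpackedOffsetX"]
        unpacked_frames[r["RegionTypeId"]][uy:uy + h, ux:ux + w] = block

packed = np.arange(64, dtype=np.uint8).reshape(8, 8)
regions = [{"RegionTypeId": "occupancy", "RegionPackedOffsetX": 0, "RegionPackedOffsetY": 0,
            "RegionWidth": 8, "RegionHeight": 4, "RegionUnpackedOffsetX": 0,
            "RegionUnpackedOffsetY": 0, "RegionRotationFlag": 0}]
unpacked = {"occupancy": np.zeros((4, 8), dtype=np.uint8)}
copy_regions_to_unpacked(packed, regions, unpacked)
print(unpacked["occupancy"])
```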
  • the visual media contents of multiple different expression formats are encoded and decoded separately.
  • Specifically, the current packing technology compresses the point cloud to form a point cloud compression code stream (that is, one V3C code stream), and compresses the multi-viewpoint video information to obtain a multi-viewpoint video compression code stream (that is, another V3C code stream); the system layer then multiplexes the compressed code streams to obtain a multiplexed code stream of the fused three-dimensional scene.
  • During decoding, the point cloud compressed code stream and the multi-viewpoint video compressed code stream are decoded separately. It can be seen that in the prior art, when encoding and decoding visual media contents with different expression formats, many codecs are used, and the cost of encoding and decoding is high.
  • In view of this, the embodiment of the present application splices mosaic images corresponding to visual media contents in different expression formats into a heterogeneous hybrid mosaic image, for example splicing a multi-viewpoint video mosaic image and a point cloud mosaic image into one heterogeneous hybrid mosaic image, and performs encoding and decoding on this hybrid mosaic image. This minimizes the number of two-dimensional video codecs, such as HEVC, VVC, AVC, and AVS codecs, that need to be called, reduces the cost of encoding and decoding, and improves usability.
  • the video encoding method provided by the embodiment of the present application will be introduced below with reference to FIG. 6 and by taking the encoding end as an example.
  • Fig. 6 is a schematic flow diagram of an encoding method provided by an embodiment of the present application. As shown in Fig. 6, the method of the embodiment of the present application includes:
  • At least two visual media contents correspond to different expression formats, and N is a positive integer greater than 1.
  • visual media objects with different expression formats may appear in the same scene. For example, in the same 3D scene, the scene background and some characters and objects are expressed as video, while another part of the characters is expressed as a 3D point cloud or a 3D mesh.
  • the multiple visual media contents in this embodiment of the present application include media contents such as multi-viewpoint video, point cloud, and grid.
  • the above multiple visual media contents are media contents presented simultaneously in the same three-dimensional space.
  • the above-mentioned multiple visual media contents are media contents presented at different times in the same three-dimensional space.
  • the above multiple visual media contents may also be media contents in different three-dimensional spaces.
  • the expression formats corresponding to at least two visual media contents among the plurality of visual media contents in the embodiment of the present application are different.
  • the multiple visual media contents in the embodiment of the present application have different expression formats, for example, the multiple visual media contents include point cloud and multi-viewpoint video.
  • the expression formats of some visual media contents in the multiple visual media contents in the embodiment of the present application are the same, and the expression formats of some visual media contents are different, for example, multiple visual media contents include two point clouds and one multi-viewpoint video.
  • the embodiment of the present application processes the multiple visual media contents, for example through packaging (also called splicing) processing, to obtain the mosaic graph corresponding to each of the multiple visual media contents.
  • for example, if the multiple visual media contents include a point cloud and a multi-view video, the point cloud is processed to obtain a point cloud mosaic graph, and the multi-view video is processed to obtain a multi-view video mosaic graph.
  • the isomorphic (homogeneous) mosaic graph described in the embodiment of the present application means that the expression format corresponding to every sub-block in the mosaic graph is the same, for example, every sub-block in an isomorphic mosaic graph is a multi-view video sub-block, or every sub-block is a point cloud sub-block; that is, all sub-blocks have the same expression format.
  • the above S601 includes the following steps:
  • for the multi-view video, a limited number of viewpoints are selected as basic viewpoints so as to cover the visual range of the scene as much as possible.
  • the basic viewpoints are transmitted as complete images, while redundant pixels are removed from the remaining non-basic viewpoints, that is, only the effective information that is not repeatedly expressed is retained; the effective information is then extracted into sub-block images and reorganized together with the basic view images to form a larger rectangular image, which is called a multi-view video mosaic graph.
  • for the point cloud, the three-dimensional point cloud is projected in parallel to obtain a two-dimensional point cloud.
  • the connected points in the two-dimensional point cloud form point cloud sub-blocks, and these point cloud sub-blocks are spliced to obtain the point cloud mosaic graph.
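  • The following is a minimal, hypothetical sketch of how extracted sub-blocks might be stitched into one rectangular mosaic graph; the naive shelf-packing strategy and all names are illustrative assumptions, not the packing actually mandated by this application.
```python
import numpy as np

def splice_subblocks(subblocks, mosaic_width):
    """Stitch 2D sub-block arrays into one rectangular mosaic (naive shelf packing).

    Assumes every sub-block is no wider than mosaic_width.
    """
    x = y = shelf_h = 0
    placements = []                      # (x, y, block) for each sub-block
    for blk in subblocks:
        h, w = blk.shape[:2]
        if x + w > mosaic_width:         # start a new shelf when the current row is full
            x, y = 0, y + shelf_h
            shelf_h = 0
        placements.append((x, y, blk))
        x += w
        shelf_h = max(shelf_h, h)
    mosaic = np.zeros((y + shelf_h, mosaic_width, *subblocks[0].shape[2:]),
                      dtype=subblocks[0].dtype)
    for px, py, blk in placements:
        mosaic[py:py + blk.shape[0], px:px + blk.shape[1]] = blk
    return mosaic
```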
  • in the embodiment of the present application, a plurality of visual media contents are processed separately (that is, packaged) during encoding to obtain N isomorphic mosaic graphs.
  • the N isomorphic mosaic graphs with different expression formats are spliced into one heterogeneous hybrid mosaic graph, and the heterogeneous hybrid mosaic graph is encoded to obtain a code stream.
  • in this way, the video encoder may be called only once during encoding, which minimizes the number of two-dimensional video encoders (such as HEVC, VVC, AVC, and AVS encoders) that need to be called, reduces the encoding cost, and improves usability.
  • in the embodiment of the present application, the process of stitching the N homogeneous mosaic graphs into the heterogeneous hybrid mosaic graph is called region packing.
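  • A minimal sketch of region packing, under the simplifying assumption that the N homogeneous mosaics are stacked vertically and that each placement is recorded as a region together with its expression format type; the field names echo the flags discussed later but are otherwise hypothetical.
```python
import numpy as np

def region_pack(homogeneous_mosaics):
    """Stack homogeneous mosaics vertically into one heterogeneous hybrid mosaic.

    homogeneous_mosaics: list of (mosaic_2d_array, format_type) pairs, where
    format_type is e.g. 0 for multi-view video and 1 for point cloud.
    Assumes single-component (2D) mosaics.
    """
    width = max(m.shape[1] for m, _ in homogeneous_mosaics)
    regions, rows, y = [], [], 0
    for mosaic, format_type in homogeneous_mosaics:
        h, w = mosaic.shape[:2]
        padded = np.zeros((h, width), dtype=mosaic.dtype)   # pad narrower mosaics
        padded[:, :w] = mosaic
        rows.append(padded)
        regions.append({"top_left_x": 0, "top_left_y": y,
                        "width": w, "height": h,
                        "format_type_id": format_type})
        y += h
    hybrid = np.vstack(rows)
    return hybrid, regions
```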
  • the above S603 includes, using a video encoder to encode the heterogeneous hybrid mosaic image to obtain a video code stream.
  • a step of encoding the mixed splicing information is also included, that is, the above S603 includes the following steps:
  • the video encoder used to perform video encoding on the heterogeneous hybrid mosaic image to obtain the video compressed sub-stream may be the video encoder shown in FIG. 2A above. That is, in the embodiment of the present application, the heterogeneous hybrid mosaic image is treated as one frame of image: it is first divided into blocks, the predicted value of each coding block is then obtained by intra-frame or inter-frame prediction, the predicted value is subtracted from the original value to obtain the residual value, and the residual value is transformed and quantized to obtain the video compressed sub-code stream.
  • the mixed mosaic information of the heterogeneous mixed mosaic graph is encoded to obtain a sub-code stream of the mixed mosaic information.
  • the embodiment of the present application does not limit the encoding method of the mixed splicing information, for example, the conventional data compression encoding method such as equal-length encoding or variable-length encoding is used for compression.
  • the video compression sub-stream and the mixed splicing information sub-stream are written in the same code stream to obtain the final code stream.
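  • The sketch below shows one possible (purely hypothetical) way of writing the video compressed sub-stream and the hybrid mosaic information sub-stream into a single code stream, using a simple length-prefixed container; the real V3C unit syntax is considerably richer.
```python
import struct

def mux_substreams(video_substream: bytes, mix_info_substream: bytes) -> bytes:
    """Concatenate sub-streams, each preceded by a 1-byte type and a 4-byte length."""
    out = bytearray()
    for unit_type, payload in ((0, mix_info_substream), (1, video_substream)):
        out += struct.pack(">BI", unit_type, len(payload))
        out += payload
    return bytes(out)

def demux_substreams(stream: bytes):
    """Inverse of mux_substreams: yield (unit_type, payload) pairs."""
    pos = 0
    while pos < len(stream):
        unit_type, length = struct.unpack_from(">BI", stream, pos)
        pos += 5
        yield unit_type, stream[pos:pos + length]
        pos += length
```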
  • the embodiment of the present application not only supports heterogeneous source formats such as video, point cloud, and mesh in the same compressed code stream, but also allows mosaic graphs of different expression formats, such as a multi-viewpoint video mosaic graph and a point cloud (or mesh) mosaic graph, to exist in one heterogeneous hybrid mosaic graph at the same time, which minimizes the number of video encoders that need to be called, reduces the implementation cost, and improves ease of use.
  • the heterogeneous hybrid mosaic graph in the embodiment of the present application includes a multi-attribute heterogeneous hybrid mosaic graph and a single-attribute heterogeneous hybrid mosaic graph.
  • the multi-attribute heterogeneous hybrid mosaic graph refers to a heterogeneous hybrid mosaic graph that includes isomorphic mosaic graphs with at least two different kinds of attribute information, for example, a heterogeneous hybrid mosaic graph that includes both an isomorphic mosaic graph of attribute information and an isomorphic mosaic graph of geometric information.
  • for example, a multi-attribute heterogeneous hybrid mosaic graph includes a multi-view video texture mosaic graph and a point cloud geometry mosaic graph; or it includes a multi-view video texture mosaic graph, a point cloud geometry mosaic graph, and a multi-view video geometry mosaic graph; or it includes a multi-view video geometry mosaic graph, a point cloud geometry mosaic graph, a point cloud texture mosaic graph, and so on.
  • the single-attribute heterogeneous hybrid mosaic graph refers to a heterogeneous hybrid mosaic graph in which all the isomorphic mosaic graphs included have the same attribute information.
  • for example, a single-attribute heterogeneous hybrid mosaic graph includes only isomorphic mosaic graphs of attribute information, or includes only isomorphic mosaic graphs of geometric information.
  • for example, a single-attribute heterogeneous hybrid mosaic graph includes only a multi-view video texture mosaic graph and a point cloud texture mosaic graph, or includes only a multi-view video geometry mosaic graph and a point cloud geometry mosaic graph.
  • the embodiment of the present application does not limit the expression format of the N isomorphic mosaic graphs.
  • the N isomorphic mosaics include at least two of a multi-view video mosaic, a point cloud mosaic, and a mesh mosaic.
  • the point cloud, the multi-viewpoint video, and the mesh in the embodiment of the present application each include multiple attributes, for example geometric attributes and texture attributes.
  • mosaic graphs under at least two attributes (or under any two attributes) of the point cloud, the multi-viewpoint video, and the mesh are spliced into one graph to obtain the heterogeneous hybrid mosaic graph.
  • the N homogeneous mosaic graphs are spliced to generate a heterogeneous hybrid mosaic graph, including:
  • S602-A Concatenate at least the single-attribute mosaic graph in the first expression format and the single-attribute mosaic graph in the second expression format to obtain a heterogeneous hybrid mosaic graph.
  • both the first expression format and the second expression format are any one of multi-view video, point cloud, and mesh, and the first expression format is different from the second expression format.
  • the mosaic of a single attribute of the multi-view video includes at least one of the texture mosaic of the multi-view video and the geometric mosaic of the multi-view video.
  • the single-attribute mosaic of the point cloud includes at least one of a point cloud texture mosaic, a point cloud geometry mosaic, and a point cloud occupancy mosaic.
  • the single-attribute mosaic of the mesh includes at least one of a mesh texture mosaic, a mesh geometry mosaic, and a mesh occupancy mosaic.
  • At least two of the multi-view video geometric mosaic graph, the point cloud geometric mosaic graph, and the mesh geometric mosaic graph are stitched into one graph to obtain a heterogeneous hybrid mosaic graph.
  • This heterogeneous hybrid mosaic is called a single-attribute heterogeneous hybrid mosaic.
  • At least two of the multi-viewpoint video texture mosaic map, the point cloud texture mosaic map, and the grid texture mosaic map are stitched into one image to obtain a heterogeneous hybrid mosaic image.
  • This heterogeneous hybrid mosaic is called a single-attribute heterogeneous hybrid mosaic.
  • the multi-viewpoint video texture mosaic image and at least one of the point cloud geometric mosaic image and the mesh geometric mosaic image are stitched into one image to obtain a heterogeneous hybrid mosaic image.
  • This heterogeneous hybrid mosaic is called a multi-attribute heterogeneous hybrid mosaic.
  • the multi-view video geometric mosaic image, at least one of the point cloud texture mosaic image and the grid texture mosaic image are stitched into one image to obtain a heterogeneous hybrid mosaic image.
  • This heterogeneous hybrid mosaic is called a multi-attribute heterogeneous hybrid mosaic.
  • the point cloud texture mosaic image and at least one of the multi-viewpoint video geometric mosaic image and the mesh geometric mosaic image are stitched into one image to obtain a heterogeneous hybrid mosaic image.
  • This heterogeneous hybrid mosaic is called a multi-attribute heterogeneous hybrid mosaic.
  • the point cloud geometric mosaic image, at least one of the multi-viewpoint video texture mosaic image and the grid texture mosaic image are stitched into one image to obtain a heterogeneous hybrid mosaic image.
  • This heterogeneous hybrid mosaic is called a multi-attribute heterogeneous hybrid mosaic.
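  • The distinction between single-attribute and multi-attribute heterogeneous hybrid mosaics can be expressed compactly, as in the hypothetical sketch below, which assumes each constituent homogeneous mosaic is described by an (expression_format, attribute_kind) pair.
```python
def classify_hybrid(mosaics):
    """mosaics: list of (expression_format, attribute_kind) pairs, e.g.
    [("multiview", "texture"), ("pointcloud", "geometry")]."""
    attribute_kinds = {kind for _, kind in mosaics}
    return "single-attribute" if len(attribute_kinds) == 1 else "multi-attribute"

# Examples:
# classify_hybrid([("multiview", "texture"), ("pointcloud", "texture")])   -> "single-attribute"
# classify_hybrid([("multiview", "texture"), ("pointcloud", "geometry")])  -> "multi-attribute"
```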
  • a single-attribute mosaic for multi-view video includes a multi-view video texture mosaic and a multi-view video geometry mosaic.
  • the single attribute mosaic of point cloud includes point cloud texture mosaic, point cloud geometry mosaic, and point cloud occupancy mosaic.
  • hybrid splicing methods of S602-A above include but are not limited to the following:
  • Method 1: the multi-view video texture mosaic graph, the multi-view video geometry mosaic graph, the point cloud texture mosaic graph, the point cloud geometry mosaic graph, and the point cloud occupancy mosaic graph are all stitched into one heterogeneous hybrid mosaic graph.
  • Method 2: the multi-view video texture mosaic graph, the multi-view video geometry mosaic graph, the point cloud texture mosaic graph, the point cloud geometry mosaic graph, and the point cloud occupancy mosaic graph are stitched according to a preset hybrid splicing method to obtain M heterogeneous hybrid mosaic graphs.
  • when the multi-viewpoint video mosaic graph includes a multi-viewpoint video texture mosaic graph and a multi-viewpoint video geometry mosaic graph, and the point cloud mosaic graph includes a point cloud texture mosaic graph, a point cloud geometry mosaic graph, and a point cloud occupancy mosaic graph, mixing and splicing them to obtain M heterogeneous hybrid mosaic graphs includes at least the following examples:
  • Example 1: the multi-viewpoint video texture mosaic and the point cloud texture mosaic are stitched together to obtain a heterogeneous mixed texture mosaic, and the multi-view video geometry mosaic, the point cloud geometry mosaic, and the point cloud occupancy mosaic are stitched together to obtain a heterogeneous mixed geometry and occupancy mosaic.
  • the multi-viewpoint video is processed to obtain a mosaic map of multi-viewpoint video, wherein the mosaic map of multi-viewpoint video includes multi-viewpoint video texture mosaic graph and multi-view video geometric mosaic.
  • Point cloud 1 is processed to obtain point cloud texture mosaic map 1, point cloud geometric mosaic map 1, and occupancy mosaic map of point cloud 1.
  • the point cloud 2 is processed to obtain the point cloud texture mosaic map 2A, the point cloud geometric mosaic map 2A, and the occupancy mosaic map of the point cloud 2.
  • the occupancy mosaic of point cloud 1 and the occupancy mosaic of point cloud 2 may be merged into one occupancy mosaic of point cloud.
  • the multi-view video texture mosaic image, the point cloud texture mosaic image 1 and the point cloud texture mosaic image 2A are mixed and stitched to obtain a heterogeneous mixed texture mosaic image, as shown in FIG. 8A .
  • the multi-viewpoint video geometric mosaic diagram, the point cloud geometric mosaic diagram 1, the point cloud geometric mosaic diagram 2A and the point cloud occupancy mosaic diagram are stitched together to obtain a heterogeneous mixed geometry and occupancy mosaic diagram, as shown in Figure 8B for example.
  • Example 2: the multi-viewpoint video texture mosaic and the point cloud texture mosaic are stitched together to obtain a heterogeneous mixed texture mosaic, and the multi-view video geometry mosaic and the point cloud geometry mosaic are stitched together to obtain a heterogeneous mixed geometry mosaic.
  • the point cloud occupancy mosaic is treated separately as one hybrid mosaic.
  • Example 3: the multi-view video texture mosaic, the point cloud texture mosaic, and the point cloud occupancy mosaic are stitched to obtain one heterogeneous hybrid mosaic, and the multi-view video geometry mosaic and the point cloud geometry mosaic are stitched to obtain another heterogeneous hybrid mosaic.
  • Example 4: the multi-view video texture mosaic, the point cloud texture mosaic, the multi-view video geometry mosaic, the point cloud geometry mosaic, and the point cloud occupancy mosaic are all stitched into one heterogeneous hybrid mosaic.
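  • Examples 1 to 4 above can be viewed as different groupings of the same five homogeneous mosaics into M heterogeneous hybrid mosaics; a hypothetical configuration for Example 1 and Example 4 might be written as follows.
```python
# Each inner list names the homogeneous mosaics packed into one hybrid mosaic.
EXAMPLE_1 = [  # M = 2
    ["mv_texture", "pc_texture"],                                    # hybrid texture mosaic
    ["mv_geometry", "pc_geometry", "pc_occupancy"],                  # hybrid geometry + occupancy mosaic
]
EXAMPLE_4 = [  # M = 1
    ["mv_texture", "pc_texture", "mv_geometry", "pc_geometry", "pc_occupancy"],
]
```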
  • after the multi-view video texture mosaic graph, the multi-view video geometry mosaic graph, the point cloud texture mosaic graph, the point cloud geometry mosaic graph, and the point cloud occupancy mosaic graph are spliced according to the preset hybrid splicing method to obtain the M heterogeneous hybrid mosaic graphs, video encoding is performed on the M heterogeneous hybrid mosaic graphs respectively to obtain the video compressed sub-code streams.
  • for example, a video encoder is used to encode each of the M heterogeneous hybrid mosaic images to obtain the video compressed sub-code streams.
  • for example, each heterogeneous hybrid mosaic image among the M heterogeneous hybrid mosaic images may be used as one frame of image for video encoding to obtain a video compressed sub-code stream.
  • a video encoder is used to separately encode the heterogeneous mixed texture mosaic shown in FIG. 8A and the heterogeneous mixed geometry and occupancy mosaic shown in FIG. 8B to obtain video compressed sub-code streams.
  • hybrid mosaic information corresponding to each heterogeneous hybrid mosaic graph in the M heterogeneous hybrid mosaic graphs is generated.
  • the mixed mosaic information of the M heterogeneous mixed mosaic graphs is encoded to obtain a sub-code stream of the mixed mosaic information of the M heterogeneous hybrid mosaic graphs.
  • for example, the hybrid mosaic information corresponding to each heterogeneous hybrid mosaic graph among the M heterogeneous hybrid mosaic graphs is combined to form complete hybrid mosaic information, and the complete hybrid mosaic information is then encoded to obtain the hybrid mosaic information sub-stream.
  • the multi-view video is processed, for example, by TMIV packaging technology, to obtain a multi-view video texture mosaic map and a multi-view video geometric mosaic map.
  • the point cloud is processed, for example, by the TMC2 packaging technology, to obtain a point cloud texture mosaic map, a point cloud geometry mosaic map, and a point cloud occupancy mosaic map.
  • the preset hybrid splicing method is then used to splice the multi-view video texture mosaic, the multi-view video geometry mosaic, the point cloud texture mosaic, the point cloud geometry mosaic, and the point cloud occupancy mosaic to obtain M heterogeneous hybrid mosaics.
  • for example, the region packing technology is used to stitch the multi-view video texture mosaic map and the point cloud texture mosaic map to obtain a heterogeneous mixed texture mosaic map, and to stitch the multi-view video geometry mosaic map, the point cloud geometry mosaic map, and the point cloud occupancy mosaic map to obtain a heterogeneous mixed geometry and occupancy mosaic map.
  • a video encoder is used to encode the heterogeneous mixed texture mosaic and the heterogeneous mixed geometry and occupancy mosaic to obtain a video compression sub-stream, and to encode the mixed mosaic information to obtain a mixed mosaic information sub-stream.
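  • Putting the steps above together, a highly simplified and hypothetical orchestration of the encoding side could look as follows; pack_multiview, pack_pointcloud, video_encode and encode_mix_info are undefined placeholders standing in for TMIV packing, TMC2 packing, the 2D video encoder and the hybrid mosaic information encoder, while region_pack and mux_substreams refer to the sketches given earlier.
```python
def encode_scene(multiview_video, point_cloud):
    # 1. Homogeneous packing (e.g. TMIV for multi-view video, TMC2 for the point cloud).
    mv_tex, mv_geo = pack_multiview(multiview_video)
    pc_tex, pc_geo, pc_occ = pack_pointcloud(point_cloud)

    # 2. Region packing into M = 2 heterogeneous hybrid mosaics
    #    (format type 0: multi-view video, 1: point cloud).
    hybrid_tex, regions_tex = region_pack([(mv_tex, 0), (pc_tex, 1)])
    hybrid_geo, regions_geo = region_pack([(mv_geo, 0), (pc_geo, 1), (pc_occ, 1)])

    # 3. One call into the 2D video encoder for the hybrid mosaic sequences.
    video_substream = video_encode([hybrid_tex, hybrid_geo])

    # 4. Encode the hybrid mosaic information and multiplex both sub-streams.
    mix_info_substream = encode_mix_info([regions_tex, regions_geo])
    return mux_substreams(video_substream, mix_info_substream)
```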
  • because frame packing in the original V3C standard only supports stitching isomorphic texture, geometry, and occupancy mosaics into one mixed mosaic, that is, it only supports packing multi-viewpoint video mosaics into a multi-view hybrid mosaic, or packing point cloud mosaics into a point cloud hybrid mosaic, the packing information (stitching information) defined by V3C only includes flags for judging whether each region of the mosaic (packed video) carries texture, geometry, or occupancy data, but does not include an indicator of whether the current region belongs to a point cloud or to a multi-view video.
  • the hybrid mosaic information in this embodiment of the present application includes a first flag, which is used to indicate the expression format type corresponding to the i-th region in the heterogeneous hybrid mosaic graph, where i is a positive integer.
  • pin_region_format_type_id may be used to indicate the first flag.
  • the expression format type corresponding to the i-th region in the heterogeneous hybrid mosaic graph is indicated by setting different values for the first flag.
  • the embodiment of the present application further includes: if the mosaic graph of the i-th region is a multi-viewpoint video mosaic graph, the value of the first flag is set to the first value; if the mosaic graph of the i-th region is a point cloud mosaic graph, the value of the first flag is set to the second value.
  • the embodiment of the present application does not limit specific values of the first value and the second value.
  • for example, the first value is 0 and the second value is 1.
  • since the heterogeneous hybrid mosaic graph includes at least two mosaic graphs with different expression formats, when the heterogeneous hybrid mosaic graph is encoded, the first flag is added to the hybrid mosaic information in order to improve the decoding accuracy of the decoding end, and the expression format type corresponding to each region in the heterogeneous hybrid mosaic graph is indicated by the first flag.
  • the grammatical structure of the scheme 1 is shown in FIG. 10 , wherein A: attribute mosaic graph, G: geometric mosaic graph, O: occupancy mosaic graph, P: point cloud, M: multi-viewpoint video.
  • the mixed splicing information after the first flag is added is shown in Table 5. It should be noted that, in this example, the mixed splicing information reuses the splicing information shown in Table 3, and the first flag is added to the splicing information shown in Table 3 to obtain Table 5.
  • pin_region_format_type_id[j][i] indicates the expression format type of the region whose index is i of the atlas with ID j. If pin_region_format_type_id[j][i] is equal to 0, it means that the expression format of the current region is multi-viewpoint video; if pin_region_format_type_id[j][i] is equal to 1, it means that the expression format of the current region is point cloud.
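  • A hypothetical sketch of how a parser might read the region loop of Table 5, including the added pin_region_format_type_id; the bit widths, the field subset and the BitReader helper are assumptions made only for illustration.
```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (illustrative only)."""
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def read_bits(self, n: int) -> int:
        val = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            val = (val << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return val

def parse_region_formats(reader: BitReader, region_count: int):
    """Read, for each region, the first flag plus a few placement fields (Table 5 sketch)."""
    regions = []
    for _ in range(region_count):
        regions.append({
            "pin_region_format_type_id": reader.read_bits(1),   # 0: multi-view video, 1: point cloud
            "pin_region_top_left_x": reader.read_bits(16),
            "pin_region_top_left_y": reader.read_bits(16),
            "pin_region_width_minus1": reader.read_bits(16),
            "pin_region_height_minus1": reader.read_bits(16),
        })
    return regions
```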
  • the following takes N visual media contents as multi-viewpoint video and point cloud as an example, and combines the above scheme 1 to introduce the encoding method of the embodiment of the present application, as shown in FIG. 11 , the present application
  • the encoding method of the embodiment includes the following steps:
  • Step 11: for the multi-viewpoint video, use inter-viewpoint projection to erase repetitions and remove redundancy, connect the non-repeated pixels into sub-blocks, and stitch the sub-blocks into a multi-viewpoint video mosaic map; for the point cloud, use parallel projection, connect the connected pixels in the projection plane into sub-blocks, and splice the sub-blocks into a point cloud mosaic map.
  • Step 12 stitching the multi-viewpoint video mosaic image and the point cloud mosaic image to generate a heterogeneous hybrid mosaic image.
  • if the current region added to the heterogeneous hybrid mosaic is a multi-view video mosaic, set pin_region_format_type_id[j][i] to 0 in the hybrid mosaic information; if it is a point cloud mosaic, set pin_region_format_type_id[j][i] to 1.
  • step 13 perform video coding on the heterogeneous mixed mosaic image to obtain a video compressed sub-stream.
  • Step 14: encode the hybrid mosaic information describing how the multi-viewpoint video mosaic image and the point cloud mosaic image are spliced into the heterogeneous hybrid mosaic image, to form a hybrid mosaic information sub-stream;
  • step 15 the video compressed code stream and the mixed splicing information code stream are written into the compressed code stream.
  • the coding end is used to indicate the expression format type of the mosaic graph of the i-th region in the heterogeneous hybrid mosaic graph by adding the first flag (pin_region_format_type_id) in the hybrid mosaic information.
  • in this way, the decoding end can accurately determine, according to the first flag in the hybrid mosaic information, the expression format type of the mosaic image in the current region of the heterogeneous hybrid mosaic image: if the value of the first flag is the first value, the decoding end determines that the mosaic image in the current region is a multi-viewpoint video mosaic image, and if the value of the first flag is the second value, the decoding end determines that the mosaic image in the current region is a point cloud mosaic image, so that the decoding end can realize accurate decoding according to the first flag.
  • the hybrid mosaic information in this embodiment of the present application includes a second flag, which is used to indicate whether the current hybrid mosaic graph is a heterogeneous hybrid mosaic graph.
  • in one example, the second flag is a brand-new flag.
  • the second flag may reuse the existing vuh_unit_type, that is, in this embodiment of the present application, assign different values to vuh_unit_type to indicate whether the current hybrid mosaic is a heterogeneous hybrid mosaic.
  • when it is determined that the current hybrid mosaic is a heterogeneous hybrid mosaic, the first flag is written in the hybrid mosaic information, and the first flag is used to indicate whether the mosaic in the current region of the heterogeneous hybrid mosaic is a multi-viewpoint video mosaic, a point cloud mosaic, or a mosaic of another expression format.
  • if the mosaic of the current region is a multi-viewpoint video mosaic, the value of the first flag is set to the first value; if the mosaic of the current region is a point cloud mosaic, the value of the first flag is set to the second value.
  • when decoding, the decoder first decodes the second flag, and if the value of the second flag is the preset value, it continues to decode the first flag, so that the mosaic image of the current decoding region in the heterogeneous hybrid mosaic map is decoded as a mosaic of the corresponding expression format, such as a multi-view video mosaic or a point cloud mosaic, to achieve accurate decoding.
  • if the current hybrid mosaic is not a heterogeneous hybrid mosaic, the encoder sets the value of the second flag to a non-preset value.
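  • In Scheme 2, the decoder-side decision can be sketched as below, assuming V3C_MVD is the preset vuh_unit_type value that signals a heterogeneous hybrid mosaic; the numeric value and the region structure are hypothetical simplifications.
```python
V3C_MVD = 6  # hypothetical preset value of the second flag (vuh_unit_type)

def decode_region_formats(unit_type, mix_info_regions):
    """Decide, per region, which expression format the region should be decoded as."""
    if unit_type != V3C_MVD:
        # Not a heterogeneous hybrid mosaic: no first flag is present,
        # so all regions share the same expression format.
        return ["homogeneous"] * len(mix_info_regions)
    formats = []
    for region in mix_info_regions:
        if region["format_type_id"] == 0:       # first value
            formats.append("multi-view video")
        else:                                    # second value
            formats.append("point cloud")
    return formats
```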
  • the embodiment of the present application does not limit the specific writing position of the second mark in the hybrid splicing information.
  • the second flag is located in the unit header of the mixed splicing information.
  • the syntax elements of Scheme 2 are as shown in FIG. 12 .
  • the second flag is added to the V3C unit header syntax shown in Table 1 above, and the new V3C unit header syntax is obtained as shown in Table 6:
  • vps_mixed_information_present_flag equal to 1 specifies that one or more mixed_information(j) syntax structure instances are present in the v3c_parameter_set() syntax structure.
  • vps_mixed_information_present_flag equal to 0 specifies that this syntax structure is not present. When not present, the value of vps_mixed_information_present_flag is inferred to be equal to 0.
  • the embodiment of the present application provides the syntax structure of the mixed information (mixed_information), as shown in Table 10, which redefines the mixed information relative to the splicing information shown in Table 3 above.
  • Use min_region_format_type_id to indicate the first flag.
  • the mixed video frame can be divided into one or more rectangular regions.
  • a region should map exactly to an atlas tile. Rectangular areas of blended video frames are not allowed to overlap.
  • min_codec_id[j] indicates an identifier of a codec for compressing mixed video data for an atlas whose ID is j.
  • min_codec_id should be in the range 0 to 255, inclusive.
  • the codec may be identified through the Component Codec Mapping SEI message or by means outside of this document.
  • min_occupancy_present_flag[j] equal to 0 indicates that the mixed video frame of the atlas with ID j does not contain regions with occupancy data.
  • min_occupancy_present_flag[j] equal to 1 indicates that the mixed video frame of the atlas with ID j does contain regions with occupancy data.
  • when min_occupancy_present_flag[j] is not present, it is inferred to be equal to 0.
  • it is a requirement of bitstream conformance that, if min_occupancy_present_flag[j] is equal to 1 for the atlas with atlas ID j, vps_occupancy_video_present_flag[j] shall be equal to 0 for the atlas with the same atlas ID j.
  • min_geometry_present_flag[j] equal to 0 indicates that the mixed video frame of the atlas with ID j does not contain regions with geometry data.
  • min_geometry_present_flag[j] equal to 1 indicates that the mixed video frame of the atlas with ID j does contain regions with geometry data.
  • when min_geometry_present_flag[j] is not present, it is inferred to be equal to 0.
  • it is a requirement of bitstream conformance that, if min_geometry_present_flag[j] is equal to 1 for the atlas with ID j, vps_geometry_video_present_flag[j] shall be equal to 0 for the atlas with ID j.
  • min_attributes_present_flag[j] equal to 0 indicates that the mixed video frame of the atlas with ID j does not contain regions with attribute data.
  • min_attributes_present_flag[j] equal to 1 indicates that the mixed video frame of the atlas with ID j does contain regions with attribute data.
  • when min_attributes_present_flag[j] is not present, it is inferred to be equal to 0.
  • it is a requirement of bitstream conformance that, if min_attributes_present_flag[j] is equal to 1 for the atlas with ID j, vps_attribute_video_present_flag[j] shall be equal to 0 for the atlas with ID j.
  • min_occupancy_2d_bit_depth_minus1[j] plus 1 indicates the nominal 2D bit depth to which the decoded region containing occupancy data of the atlas with ID j shall be converted. min_occupancy_2d_bit_depth_minus1[j] shall be in the range of 0 to 31, inclusive.
  • min_occupancy_MSB_align_flag[j] indicates how the decoded region containing occupancy samples of the atlas with ID j is converted to samples of the nominal occupancy bit depth, as specified in Annex B.
  • min_lossy_occupancy_compression_threshold[j] indicates the threshold for deriving binary occupancy from a decoded region containing occupancy data for atlas with ID j.
  • min_lossy_occupancy_compression_threshold[j] should be in the range 0 to 255, inclusive.
  • min_geometry_2d_bit_depth_minus1[j] plus 1 indicates the nominal 2D bit depth to which the decoded region containing geometry data of the atlas with ID j shall be converted.
  • min_geometry_2d_bit_depth_minus1[j] should be in the range 0 to 31, inclusive.
  • min_geometry_MSB_align_flag[j] indicates how the decoded region containing geometry samples of the atlas with ID j is converted to samples of the nominal geometry bit depth, as described in Annex B.
  • min_geometry_3d_coordinates_bit_depth_minus1[j] plus 1 indicates the bit depth of the geometry coordinates of the reconstructed stereo content of the atlas with ID j.
  • min_geometry_3d_coordinates_bit_depth_minus1[j] should be in the range 0 to 31, inclusive.
  • min_attribute_count[j] indicates the number of attributes with a unique attribute type present in the mixed video frame of the atlas with ID j.
  • min_attribute_type_id[j][i] represents the ith attribute type of the attribute area of the mixed video frame of the atlas whose ID is j.
  • Table 3 describes the list of supported attribute types.
  • min_attribute_2d_bit_depth_minus1[j][k] plus 1 indicates the nominal 2D bit depth to which the region containing the attribute with attribute index k should be converted, for an atlas with ID j.
  • min_attribute_2d_bit_depth_minus1[j][k] shall be in the range 0 to 31, inclusive.
  • min_attribute_MSB_align_flag[j][k] indicates how to convert a decoded region (for an atlas with ID j) containing an attribute of attribute type k into samples of the nominal attribute bit depth, as described in Annex B.
  • min_attribute_map_absolute_coding_persistence_flag[j][k] equal to 1 indicates that the decoded region containing the attribute map of the attribute with index k, corresponding to the atlas with ID j, is coded without any form of map prediction.
  • min_attribute_map_absolute_coding_persistence_flag[j][i] equal to 0 indicates that the decoded region containing the attribute map of the attribute with index k, corresponding to the atlas with ID j, shall use the same map prediction method as used for the geometry component of the atlas with ID j. If min_attribute_map_absolute_coding_persistence_flag[j][i] is not present, its value is inferred to be equal to 1.
  • the 3D array AttributeMapAbsoluteCodingEnabledFlag indicates whether a specific mapping of an attribute is to be encoded, with or without prediction, obtained as follows:
  • min_attribute_dimension_minus1[j][k] plus 1 indicates the total dimension (that is, the number of channels) of the region containing the attribute with index k in the atlas with ID j.
  • min_attribute_dimension_minus1[j][i] shall be in the range 0 to 63, inclusive.
  • min_attribute_dimension_partitions_minus1[j][k] plus 1 indicates the number of partition groups that should be grouped by the attribute channel of the region containing the attribute with index k for the atlas with ID j.
  • min_attribute_dimension_partitions_minus1[j][k] shall be in the range 0 to 63, inclusive.
  • min_attribute_partition_channels_minus1[j][k][l] plus 1 indicates the number of channels assigned to the dimension partition group with index l for the region containing the attribute with index k in the atlas with ID j.
  • min_attribute_partition_channels_minus1[j][k][l] shall be in the range of 0 to min_attribute_dimension_minus1[j][k] for all dimension partition groups.
  • min_regions_count_minus1[j] plus 1 indicates the number of regions in the mixed video frame of the atlas with ID j.
  • min_regions_count_minus1[j] shall be in the range of 0 to 7, inclusive. When not present, the value of min_regions_count_minus1[j] is inferred to be equal to 0.
  • min_region_tile_id[j][i] indicates the tile ID of the region whose index is i in the atlas with ID j.
  • min_region_format_type_id[j][i] indicates the expression format type of the region with index i of the atlas with ID j.
  • min_region_format_type_id[j][i] equal to 0 indicates that the expression format of the region is multi-view video; min_region_format_type_id[j][i] equal to 1 indicates that the expression format of the region is point cloud.
  • min_region_type_id_minus2[j][i] plus 2 indicates the type of the region with index i for the atlas with ID j.
  • the value of min_region_type_id_minus2[j][i] shall be in the range of 0 to 2, inclusive.
  • min_region_top_left_x[j][i] specifies the horizontal position of the top-left sample of the region with index i of the atlas with ID j, in units of luma samples in the mixed video component frame. When not present, the value of min_region_top_left_x[j][i] is inferred to be equal to 0.
  • min_region_top_left_y[j][i] specifies the vertical position of the top-left sample of the region with index i of the atlas with ID j, in units of luma samples in the mixed video component frame. When not present, the value of min_region_top_left_y[j][i] is inferred to be equal to 0.
  • min_region_width_minus1[j][i] plus 1 specifies the width of the region with index i of the atlas with ID j, in units of luma samples.
  • min_region_height_minus1[j][i] plus 1 specifies the height of the region with index i of the atlas with ID j, in units of luma samples.
  • min_region_unpack_top_left_x[j][i] specifies the horizontal position of the upper left sample for the region with index i of ID j's atlas in units of luma samples in the decompressed video component frame. When absent, the value of min_region_unpack_top_left_x[j][i] is inferred to be equal to 0.
  • min_region_unpack_top_left_y[j][i] specifies the vertical position of the top left sample for the region with index i of the atlas of ID j in units of luma samples in the decompressed video component frame. When absent, the value of min_region_unpack_top_left_y[j][i] is inferred to be equal to 0.
  • min_region_rotation_flag[j][i] 0 indicates that the region with index i of the atlas with ID j is not to be rotated.
  • min_region_rotation_flag[j][i] 1 means that the region with index i of the atlas with ID j is rotated by 90 degrees.
  • min_region_map_index[j][i] specifies the map index of the region whose atlas index is i with ID j.
  • min_region_auxiliary_data_flag[j][i] 1 indicates that the region with index i in the atlas with ID j contains only RAW and/or EOM codepoints.
  • min_region_auxiliary_data_flag 0 indicates that the region with index i in atlas with ID j may contain RAW and/or EOM codepoints.
  • min_region_attr_type_id[j][i] indicates the attribute type of the region whose ID is j and whose atlas index is i.
  • Table 3 describes the list of supported attributes.
  • min_region_attr_partition_index[j][i] indicates the attribute partition index of the region whose atlas index is i with ID j. When absent, the value of min_region_attr_partition_index[j][i] is inferred to be equal to 0.
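  • The placement fields above are sufficient to copy each region out of the decoded hybrid frame and back into the corresponding unpacked component frame; the sketch below is a simplified illustration that ignores the map index, auxiliary data and attribute partitions, and the rotation direction is an assumption.
```python
import numpy as np

def unpack_region(hybrid_frame, region):
    """Copy one region out of the decoded hybrid frame according to its min_region_* fields."""
    x, y = region["min_region_top_left_x"], region["min_region_top_left_y"]
    w = region["min_region_width_minus1"] + 1
    h = region["min_region_height_minus1"] + 1
    block = hybrid_frame[y:y + h, x:x + w]
    if region["min_region_rotation_flag"]:
        block = np.rot90(block)          # 90-degree rotation; the direction is an assumption
    ux = region["min_region_unpack_top_left_x"]
    uy = region["min_region_unpack_top_left_y"]
    return (ux, uy), block               # caller writes block into the unpacked frame at (ux, uy)
```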
  • the decoding process of the mixed video component of the atlas whose ID is DecAtlasID is as follows.
  • the codec is first determined using the profile defined in Annex A or the value of mix_codec_id[DecAtlasID] and the component codec mapping SEI message specified in subclause F.2.11, if present. Then, according to the corresponding encoding specification, the hybrid video decoding process is invoked using the hybrid video sub-bitstream present in the V3C bitstream as input.
  • DecMixChromaSamplingPosition indicates the video chroma sampling position as specified in ISO/IEC 23091-2,
  • DecMixFullRange indicates the full range of video codepoints specified in ISO/IEC 23091-2
  • DecMixTransferCharacteristics indicates the transfer characteristics specified in ISO/IEC 23091-2,
  • DecMixMatrixCoeffs indicates the matrix coefficients specified in ISO/IEC 23091-2
  • DecMixCompTime indicating the mixing video compositing time.
  • dimension corresponds to the decoded blended video frame index.
  • if DecMixTransferCharacteristics are missing or set to the value 2, i.e. unspecified, these elements shall be set to 8, i.e. linear.
  • any existing video coding specification, such as ISO/IEC 14496-10 or ISO/IEC 23008-2, or any video coding specification defined in the future, can be used, provided it is indicated by min_mixed_codec_id.
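  • As a hypothetical illustration of the codec selection described above, a decoder might map the codec identifier to an external video decoder as follows; the ID-to-codec mapping shown is an assumption, since the actual mapping comes from the profile in Annex A or from the Component Codec Mapping SEI message.
```python
# Hypothetical mapping; in practice it is given by the profile in Annex A
# or by the Component Codec Mapping SEI message.
CODEC_TABLE = {0: "AVC", 1: "HEVC", 2: "VVC"}

def decode_mixed_video_component(mix_codec_id, hybrid_video_sub_bitstream, video_decoders):
    """Dispatch the hybrid video sub-bitstream to the decoder identified by mix_codec_id."""
    codec_name = CODEC_TABLE.get(mix_codec_id)
    if codec_name is None or codec_name not in video_decoders:
        raise ValueError(f"unsupported codec id {mix_codec_id}")
    return video_decoders[codec_name](hybrid_video_sub_bitstream)
```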
  • the following takes N visual media contents as multi-viewpoint video and point cloud as an example, and combines the above scheme 2 to introduce the encoding method of the embodiment of the present application, as shown in FIG. 13 , the present application
  • the encoding method of the embodiment includes the following steps:
  • Step 21: the multi-viewpoint video is projected between viewpoints, duplicates are erased to remove redundancy, the non-repeated pixels are connected into sub-blocks, and the sub-blocks are spliced into a multi-viewpoint video mosaic map; the point cloud is projected in parallel, the connected pixels in the projection plane are formed into sub-blocks, and the sub-blocks are spliced into a point cloud mosaic map.
  • Step 22: if the current region added to the heterogeneous hybrid mosaic is a multi-view video mosaic, set min_region_format_type_id[j][i] to 0 in the hybrid mosaic information; if it is a point cloud mosaic, set min_region_format_type_id[j][i] to 1.
  • Step 23 performing video coding on the heterogeneous hybrid mosaic image to obtain a video compressed sub-stream
  • Step 24: encode the hybrid mosaic information describing how the multi-viewpoint video mosaic image and the point cloud mosaic image are spliced into the heterogeneous hybrid mosaic image, to form a hybrid mosaic information sub-stream;
  • Step 25 write the video compressed code stream and the mixed splicing information code stream into the compressed code stream.
  • the decoder when decoding, the decoder first decodes the second flag, and if the second flag indicates that the current hybrid mosaic is a heterogeneous hybrid mosaic, the decoder then decodes the first flag to determine the heterogeneous hybrid mosaic The expression format type of the mosaic image of the current region, and then realize accurate decoding.
  • this scheme defines four new v3c unit types (V3C_MAVD, V3C_MGVD, V3C_MOVD, V3C_MPVD) based on the original four v3c unit types (V3C_AVD, V3C_GVD, V3C_OVD, V3C_PVD), so that the decoder can judge, at the v3c unit header level, whether the current mosaic is a heterogeneous hybrid mosaic according to the v3c unit type.
  • if the v3c unit type is one of the above four newly defined v3c unit types, it indicates that the current mosaic graph is a heterogeneous hybrid mosaic graph, and the expression format of each region in the heterogeneous hybrid mosaic graph is then distinguished by a flag similar to the one designed in Scheme 1.
  • a third flag is written in the hybrid mosaic information, and the third flag is used to indicate whether the current hybrid mosaic graph is a heterogeneous hybrid mosaic graph, and, if so, which type of heterogeneous hybrid mosaic graph it is.
  • the heterogeneous hybrid mosaic graph includes the following types: heterogeneous hybrid occupancy mosaic graph, heterogeneous hybrid geometric mosaic graph, heterogeneous hybrid attribute mosaic graph, and heterogeneous hybrid package mosaic graph.
  • the method of the embodiment of the present application also includes the following examples:
  • Example 1 if the encoder determines that the current heterogeneous hybrid mosaic is a heterogeneous hybrid occupancy mosaic, then set the value of the third flag to a first preset value, such as V3C_MAVD.
  • Example 2 if the encoding side determines that the current heterogeneous hybrid mosaic is a heterogeneous hybrid geometric mosaic, set the value of the third flag to a second preset value, such as V3C_MGVD.
  • Example 3: if the encoder determines that the current heterogeneous hybrid mosaic is a heterogeneous hybrid attribute mosaic, it sets the value of the third flag to a third preset value, such as V3C_MOVD.
  • Example 4: if the encoder determines that the current heterogeneous hybrid mosaic is a heterogeneous hybrid packed mosaic, it sets the value of the third flag to a fourth preset value, such as V3C_MPVD.
  • the embodiment of the present application adds at least one of the following syntax elements in the mixed information: V3C_MAVD, V3C_MGVD, V3C_MOVD, V3C_MPVD.
  • V3C_MAVD is used to indicate that the current mixed mosaic is a heterogeneous mixed occupancy mosaic. For example, it is indicated that the current mixed mosaic only includes the occupancy mosaic of the multi-view video and the occupancy mosaic of the point cloud.
  • V3C_MGVD is used to indicate that the current hybrid mosaic is a heterogeneous hybrid geometric mosaic. For example, it is indicated that the current hybrid mosaic only includes the geometric mosaic of the multi-view video and the geometric mosaic of the point cloud.
  • V3C_MOVD is used to indicate that the current mixed mosaic is a heterogeneous mixed attribute mosaic. For example, it is indicated that the current mixed mosaic only includes the texture mosaic of the multi-viewpoint video and the texture mosaic of the point cloud.
  • V3C_MPVD is used to indicate that the current hybrid mosaic is a heterogeneous hybrid packed mosaic.
  • the heterogeneous hybrid package mosaic graph may also be called a full-attribute heterogeneous hybrid mosaic graph.
  • for example, it is indicated that the current hybrid mosaic includes an occupancy mosaic of the multi-view video and an occupancy mosaic of the point cloud, a geometry mosaic of the multi-view video and a geometry mosaic of the point cloud, and a texture mosaic of the multi-view video and a texture mosaic of the point cloud.
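  • A sketch of how the encoder of this scheme might choose the third flag when it reuses vuh_unit_type; the numeric values are placeholders, and only the correspondence between hybrid mosaic kind and unit type is taken from the description above.
```python
# Placeholder numeric values; the actual code points would be assigned in the V3C unit type table.
V3C_MAVD, V3C_MGVD, V3C_MOVD, V3C_MPVD = 10, 11, 12, 13

def third_flag_for(hybrid_kind: str) -> int:
    """Map the kind of heterogeneous hybrid mosaic to a reused vuh_unit_type value,
    following the correspondence stated in the text above."""
    return {
        "occupancy": V3C_MAVD,   # heterogeneous hybrid occupancy mosaic
        "geometry":  V3C_MGVD,   # heterogeneous hybrid geometry mosaic
        "attribute": V3C_MOVD,   # heterogeneous hybrid attribute mosaic
        "packed":    V3C_MPVD,   # heterogeneous hybrid packed (full-attribute) mosaic
    }[hybrid_kind]
```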
  • in one example, the above-mentioned third flag is a brand-new flag.
  • in another example, the above-mentioned third flag reuses the existing vuh_unit_type.
  • the following description takes the case in which the third flag reuses the existing vuh_unit_type as an example.
  • the above-mentioned third flag may be located in the unit header of the mixed splicing information.
  • the syntax elements of Scheme 3 are as shown in FIG. 14 .
  • the encoder when the encoder determines that the third flag indicates that the current hybrid mosaic is a heterogeneous hybrid mosaic, it writes the first flag into the hybrid mosaic information.
  • the encoder determines that the third flag indicates that the current hybrid mosaic is not a heterogeneous hybrid mosaic, then writing the first flag in the hybrid mosaic information is skipped.
  • the semantics of the V3C unit header shown in Table 12 above are shown in Table 13; compared with Table 2 above, Table 13 adds the semantics of V3C_MVD.
  • V3C unit payload syntax is shown in Table 14:
  • V3C general parameter set syntax is shown in Table 15
  • vps_mixed_occupancy_video_present_flag[j] equal to 0 indicates that the mixed packed video frame of the atlas with ID j does not contain regions with occupancy data.
  • vps_mixed_occupancy_video_present_flag[j] equal to 1 indicates that the mixed packed video frame of the atlas with ID j does contain regions with occupancy data.
  • when vps_mixed_occupancy_video_present_flag[j] is not present, it is inferred to be equal to 0.
  • it is a requirement of bitstream conformance that, if vps_mixed_occupancy_video_present_flag[j] is equal to 1 for the atlas with atlas ID j, vps_occupancy_video_present_flag[j] shall be equal to 0 for the atlas with the same atlas ID j.
  • vps_mixed_occupancy_information_present_flag equal to 1 specifies that one or more mixed information(j) syntax structure instances are present in the v3c_parameter_set() syntax structure.
  • vps_mixed_occupancy_information_present_flag equal to 0 specifies that this syntax structure is not present. When not present, the value of vps_mixed_occupancy_information_present_flag is inferred to be equal to 0.
  • vps_mixed_geometry_video_present_flag[j] 0 indicates that the mixed packed video frame of atlas with ID j does not contain regions with geometry data.
  • vps_mixed_geometry_video_present_flag[j] 1 indicates that the mixed packed video frame of the atlas with ID j does contain regions with geometry data.
  • vps_mixed_geometry_video_present_flag[j] is not present, it is inferred to be equal to 0.
  • bitstream consistency is that if vps_mixed_geometry_video_present_flag[j] is equal to 1 for atlas with ID j, then vps_geometry_video_present_flag[j] should be equal to 0 for atlas with ID j.
  • vps_mixed_geometry_information_present_flag 1 specifies that one or more mixed information (j) syntax structure instances are present in the v3c_parameter_set() syntax structure.
  • vps_mixed_geometry_information_present_flag 0 indicates that the syntax structure does not exist. When absent, the value of vps_mixed_geometry_information_present_flag is inferred to be equal to 0.
  • vps_mixed_attribute_video_present_flag[j] 0 indicates that the mixed packed video frame of the atlas with ID j does not contain a region with attribute data.
  • vps_mixed_attribute_video_present_flag[j] 1 indicates that the mixed packed video frame for atlas with ID j does contain regions with attribute data.
  • vps_mixed_attribute_video_present_flag[j] is not present, it is inferred to be equal to 0.
  • bitstream consistency is that if vps_mixed_attribute_video_present_flag[j] is equal to 1 for atlas with ID j, vps_attribute_video_present_flag[j] shall be equal to 0 for atlas with ID j.
  • vps_mixed_attribute_information_present_flag 1 specifies that one or more mixed information (j) syntax structure instances are present in the v3c_parameter_set() syntax structure.
  • vps_mixed_attribute_information_present_flag 0 indicates that the syntax structure does not exist. When absent, the value of vps_mixed_attribute_information_present_flag is inferred to be equal to 0.
  • vps_mixed_packing_information_present_flag 1 specifies that one or more mixed information (j) syntax structure instances are present in the v3c_parameter_set() syntax structure.
  • vps_mixed_packing_information_present_flag 0 indicates that the syntax structure does not exist. When absent, the value of vps_mixed_packing_information_present_flag is inferred to be equal to 0.
  • the mixed occupancy information syntax is shown in Table 16:
  • the mixed occupancy video frame can be divided into one or more rectangular regions.
  • a region should map exactly to an atlas tile. Rectangular areas of mixed occupancy video frames are not allowed to overlap.
  • moi_codec_id[j] indicates the identifier of the codec used to compress mixed occupancy video data for the atlas with ID j.
  • moi_codec_id shall be in the range 0 to 255, inclusive.
  • the codec may be identified through the Component Codec Mapping SEI message or by means outside of this document.
  • moi_occupancy_2d_bit_depth_minus1[j] plus 1 indicates the nominal 2D bit depth to which the decoded region containing occupancy data of the atlas with ID j shall be converted. moi_occupancy_2d_bit_depth_minus1[j] shall be in the range of 0 to 31, inclusive.
  • moi_occupancy_MSB_align_flag[j] indicates how the decoded region containing occupancy samples of the atlas with ID j is converted to samples of the nominal occupancy bit depth, as specified in Annex B.
  • moi_lossy_occupancy_compression_threshold[j] indicates the threshold for deriving binary occupancy from a decoded region containing occupancy data for atlas with ID j. moi_lossy_occupancy_compression_threshold[j] should be in the range 0 to 255, inclusive.
  • moi_regions_count_minus1[j] plus 1 indicates the number of regions in the mixed occupancy video frame of the atlas with ID j. moi_regions_count_minus1 shall be in the range of 0 to 7, inclusive. When not present, the value of moi_regions_count_minus1 is inferred to be equal to 0.
  • moi_region_tile_id[j][i] indicates the tile ID of the region whose index is i in the atlas with ID j.
  • moi_region_format_type_id[j][i] indicates the format type of the region whose ID is j and whose atlas index is i. moi_region_format_type_id[j][i] is equal to 0, indicating that the region format is multi-view video; equal to 1, the region format is point cloud.
  • moi_region_top_left_x[j][i] specifies the horizontal position of the upper left sample for the region with index i of the atlas with ID j, in units of luma samples in the mixed occupancy video component frame. When absent, the value of moi_region_top_left_x[j][i] is inferred to be equal to 0.
  • moi_region_top_left_y[j][i] specifies the vertical position of the top left sample for the region with index i of the atlas with ID j, in units of luma samples in the mixed occupancy video component frame. When absent, the value of moi_region_top_left_y[j][i] is inferred to be equal to 0.
  • moi_region_width_minus1[j][i]plus 1 specifies the width for the region with index i of the atlas with ID j, in units of luminance samples.
  • moi_region_height_minus1[j][i] plus 1 specifies the height of the region whose index is i in the atlas with ID j, in units of brightness samples.
  • moi_region_unpack_top_left_x[j][i] specifies the horizontal position of the upper left sample for the region with index i of ID j's atlas in units of luma samples in the decompressed video component frame. When absent, the value of moi_region_unpack_top_left_x[j][i] is inferred to be equal to 0.
  • moi_region_unpack_top_left_y[j][i] specifies the vertical position of the top left sample for the region with index i of the atlas of ID j in units of luma samples in the decompressed video component frame. When absent, the value of moi_region_unpack_top_left_y[j][i] is inferred to be equal to 0.
  • moi_region_rotation_flag[j][i] 0 indicates that no rotation is performed on the region with index i of the atlas with ID j.
  • moi_region_rotation_flag[j][i] 1 means that the region with index i of the atlas with ID j is rotated by 90 degrees.
  • the mixed geometry information semantics are as follows:
  • the blended geometric video frame can be divided into one or more rectangular regions.
  • a region should map exactly to an atlas tile. Rectangular areas of mixed geometry video frames are not allowed to overlap.
  • mgi_codec_id[j] indicates an identifier of a codec for compressing mixed geometry video data for atlas with ID j.
  • mgi_codec_id shall be in the range 0 to 255, inclusive.
  • the codec may be identified through the Component Codec Mapping SEI message or by means outside of this document.
  • mgi_geometry_2d_bit_depth_minus1[j] plus 1 indicates the nominal 2D bit depth to which the decoded region containing geometry data of the atlas with ID j shall be converted.
  • mgi_geometry_2d_bit_depth_minus1[j] shall be in the range of 0 to 31, inclusive.
  • mgi_geometry_MSB_align_flag[j] indicates how the decoded region containing geometry samples of the atlas with ID j is converted to samples of the nominal geometry bit depth, as described in Annex B.
  • mgi_geometry_3d_coordinates_bit_depth_minus1[j] plus 1 indicates the bit depth of the geometric coordinates of the reconstructed stereo content of the atlas with ID j.
  • mgi_geometry_3d_coordinates_bit_depth_minus1[j] should be in the range 0 to 31, inclusive.
  • mgi_regions_count_minus1[j] plus 1 indicates the number of regions in the mixed geometry video frame of the atlas with ID j.
  • mgi_regions_count_minus1 shall be in the range of 0 to 7, inclusive. When not present, the value of mgi_regions_count_minus1 is inferred to be equal to 0.
  • mgi_region_tile_id[j][i] indicates the tile ID of the region whose index is i in the atlas with ID j.
  • mgi_region_format_type_id[j][i] indicates the format type of the region whose ID is j and whose atlas index is i.
  • mgi_region_format_type_id[j][i] is equal to 0, indicating that the region format is multi-view video; equal to 1, the region format is point cloud.
  • mgi_region_top_left_x[j][i] specifies the horizontal position of the top left sample for the region with index i of the atlas with ID j, in units of luminance samples in the mixed geometry video component frame. When absent, the value of mgi_region_top_left_x[j][i] is inferred to be equal to 0.
  • mgi_region_top_left_y[j][i] specifies the vertical position of the top left sample for the region with index i of the atlas with ID j, in units of luma samples in the mixed geometry video component frame. When absent, the value of mgi_region_top_left_y[j][i] is inferred to be equal to 0.
  • mgi_region_width_minus1[j][i] plus 1 specifies the width of the region with index i of the atlas with ID j, in units of luma samples.
  • mgi_region_height_minus1[j][i] plus 1 specifies the height of the region with index i of the atlas with ID j, in units of luma samples.
  • mgi_region_unpack_top_left_x[j][i] specifies the horizontal position of the upper left sample for the region with index i of ID j's atlas in units of luma samples in the decompressed video component frame. When absent, the value of mgi_region_unpack_top_left_x[j][i] is inferred to be equal to 0.
  • mgi_region_unpack_top_left_y[j][i] specifies the vertical position of the top left sample for the region with index i of ID j's atlas in units of luma samples in the decompressed video component frame. When absent, the value of mgi_region_unpack_top_left_y[j][i] is inferred to be equal to 0.
  • mgi_region_rotation_flag[j][i] 0 indicates that no rotation is performed on the region with index i of the atlas with ID j.
  • mgi_region_rotation_flag[j][i] 1 means that the region with index i of the atlas with ID j is rotated by 90 degrees.
  • mgi_region_map_index[j][i] specifies the map index of the region whose ID is j and whose atlas index is i.
  • mgi_region_auxiliary_data_flag[j][i] 1 indicates that the region with ID j atlas index i contains only RAW and/or EOM codepoints.
  • mgi_region_auxiliary_data_flag 0 indicates that the region with index i in atlas with ID j may contain RAW and/or EOM codepoints.
  • the blended attribute video frame can be divided into one or more rectangular regions.
  • a region should map exactly to an atlas tile. Rectangular regions of mixed attribute video frames are not allowed to overlap.
  • mai_codec_id[j] indicates an identifier of a codec for compressing mixed attribute video data for an atlas whose ID is j.
  • mai_codec_id should be in the range 0 to 255, inclusive.
  • the codec may be identified through the Component Codec Mapping SEI message or by means outside of this document.
  • mai_attribute_count[j] indicates the number of attributes with a unique attribute type present in the mixed attribute video frame of the atlas with ID j.
  • mai_attribute_type_id[j][i] represents the ith attribute type of the attribute area of the mixed attribute video frame of the atlas whose ID is j.
  • Table 3 describes the list of supported attribute types.
  • mai_attribute_2d_bit_depth_minus1[j][k] plus 1 indicates the nominal 2D bit depth to which the region containing the attribute with attribute index k should be converted, for an atlas with ID j.
  • mai_attribute_2d_bit_depth_minus1[j][k] shall be in the range 0 to 31, inclusive.
  • mai_attribute_MSB_align_flag[j][k] indicates how to convert a decoded region (for an atlas with ID j) containing an attribute of attribute type k to samples of the nominal attribute bit depth, as described in Annex B.
  • mai_attribute_map_absolute_coding_persistence_flag[j][k] 1 indicates that the decoding region contains the attribute map of the attribute with index k, corresponding to the atlas with ID j, encoded without any form of map prediction.
  • mai_attribute_map_absolute_coding_persistence_flag[j][i] 0 indicates that the decoded region contains the attribute map of the attribute with index k, corresponding to the atlas with ID j, and that the same map prediction method as used for the geometric component of the atlas with ID j shall be used. If mai_attribute_map_absolute_coding_persistence_flag[j][i] is absent, its value shall be inferred to be equal to 1.
  • the 3D array AttributeMapAbsoluteCodingEnabledFlag indicates whether a specific mapping of an attribute is to be encoded, with or without prediction, obtained as follows:
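  • The derivation referred to above is not reproduced in this text; purely as a non-normative sketch of how such a 3D array could be filled from the persistence flag (with the geometry map prediction decision as an assumed additional input), one might write:

```python
def derive_attribute_map_absolute_coding(attribute_count, map_count,
                                         attr_persistence_flag,
                                         geometry_absolute_flag):
    """Sketch of filling AttributeMapAbsoluteCodingEnabledFlag[k][m] for one atlas.

    attr_persistence_flag[k]  : mai_attribute_map_absolute_coding_persistence_flag
                                of attribute k (0 or 1).
    geometry_absolute_flag[m] : whether geometry map m of the same atlas is coded
                                without map prediction (an assumed input here)."""
    flags = [[0] * map_count for _ in range(attribute_count)]
    for k in range(attribute_count):
        for m in range(map_count):
            if attr_persistence_flag[k] == 1:
                # Coded without any form of map prediction.
                flags[k][m] = 1
            else:
                # Follow the same map prediction decision as the geometry component.
                flags[k][m] = geometry_absolute_flag[m]
    return flags
```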
  • mai_attribute_dimension_minus1[j][k] plus 1 indicates the total dimension (that is, the number of channels) of the region containing the attribute with index k in the atlas with ID j.
  • mai_attribute_dimension_minus1[j][i] shall be in the range 0 to 63, inclusive.
  • mai_attribute_dimension_partitions_minus1[j][k] plus 1 indicates the number of partition groups into which the attribute channels of the region containing the attribute with index k should be grouped, for the atlas with ID j.
  • mai_attribute_dimension_partitions_minus1[j][k] shall be in the range 0 to 63, inclusive.
  • mai_attribute_partition_channels_minus1[j][k][l] plus 1 indicates the number of channels assigned to the dimension partition group with index l for the area containing the attribute with index k in the atlas with ID j.
  • ai_attribute_partition_channels_minus1[j][k][l] should be in the range of 0 to ai_attribute_dimension_minus1[j][k] for all dimension partition groups.
  • mai_regions_count_minus1[j] plus 1 indicates the number of regions in which the atlas with ID j is mixed in one video frame.
  • mai_regions_count_minus1 should be in the range 0 to 7, inclusive. When absent, the value of mai_regions_count_minus1 is inferred to be equal to 0.
  • mai_region_tile_id[j][i] indicates the tile ID of the region whose index is i in the atlas whose ID is j.
  • mai_region_format_type_id[j][i] indicates the format type of the region whose ID is j and whose atlas index is i.
  • mai_region_format_type_id[j][i] is equal to 0, indicating that the region format is multi-view video; equal to 1, the region format is point cloud.
  • mai_region_top_left_x[j][i] takes the luminance sample in the mixed attribute video component frame as a unit, and specifies the horizontal position of the upper left sample for the region with the index i of the atlas with ID j. When absent, the value of mai_region_top_left_x[j][i] is inferred to be equal to 0.
  • mai_region_top_left_y[j][i] specifies the vertical position of the upper left sample for the region with the index i of the atlas with the ID j in units of luminance samples in the mixed attribute video component frame. When absent, the value of mai_region_top_left_y[j][i] is inferred to be equal to 0.
  • mai_region_width_minus1[j][i] plus 1 specifies the width of the region with index i in the atlas with ID j, in units of luma samples.
  • mai_region_height_minus1[j][i] plus 1 specifies the height of the region with index i in the atlas with ID j, in units of luma samples.
  • mai_region_unpack_top_left_x[j][i] specifies the horizontal position of the top left sample for the region with index i of ID j's atlas in units of luminance samples in the decompressed video component frame. When absent, the value of mai_region_unpack_top_left_x[j][i] is inferred to be equal to 0.
  • mai_region_unpack_top_left_y[j][i] specifies the vertical position of the top left sample for the region with index i of ID j's atlas in units of luma samples in the decompressed video component frame. When absent, the value of mai_region_unpack_top_left_y[j][i] is inferred to be equal to 0.
  • mai_region_rotation_flag[j][i] 0 indicates that the region with index i of the atlas with ID j is not to be rotated.
  • mai_region_rotation_flag[j][i] 1 means that the region with index i of the atlas with ID j is rotated by 90 degrees.
  • mai_region_map_index[j][i] specifies the map index of the region whose ID is j and whose atlas index is i.
  • mai_region_auxiliary_data_flag[j][i] 1 indicates that the region with index i in atlas with ID j contains only RAW and/or EOM codepoints.
  • mai_region_auxiliary_data_flag 0 indicates that the region with index i in atlas with ID j may contain RAW and/or EOM codepoints.
  • mai_region_attr_type_id[j][i] indicates the attribute type of the region whose ID is j and whose atlas index is i.
  • Table 3 describes the list of supported attributes.
  • mai_region_attr_partition_index[j][i] indicates the attribute partition index of the region whose atlas index is i with ID j. When absent, the value of mai_region_attr_partition_index[j][i] is inferred to be equal to 0.
  • the mixed packed video frame can be divided into one or more rectangular regions.
  • a region should map exactly to an atlas tile. Rectangular regions of mixed packed video frames are not allowed to overlap.
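  • The non-overlap constraint on the rectangular regions can be checked mechanically; the following sketch is a straightforward pairwise test over region rectangles and is not part of the syntax itself.

```python
def regions_overlap(a, b):
    """True if two rectangles (x, y, width, height) share at least one luma sample."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def check_no_overlap(regions):
    """Raise if any two regions of a mixed packed video frame overlap."""
    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            if regions_overlap(regions[i], regions[j]):
                raise ValueError(f"regions {i} and {j} overlap")
```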
  • mpi_codec_id[j] indicates an identifier of a codec for compressing mixed packed video data for the atlas whose ID is j.
  • mpi_codec_id should be in the range 0 to 255, inclusive.
  • the codec may be identified through the Component Codec Mapping SEI message or by means outside of this document.
  • mpi_occupancy_present_flag[j] 0 indicates that the mixed packed video frame of the atlas with ID j does not contain a region with occupancy data.
  • mpi_occupancy_present_flag[j] 1 indicates that the mixed packed video frame of the atlas with ID j does contain regions with occupancy data.
  • When mpi_occupancy_present_flag[j] is not present, it is inferred to be equal to 0.
  • It is a requirement of bitstream conformance that, if mpi_occupancy_present_flag[j] is equal to 1 for the atlas with atlas ID j, vps_occupancy_video_present_flag[j] shall be equal to 0 for the atlas with the same atlas ID j.
  • mpi_geometry_present_flag[j] 0 indicates that the mixed packed video frame of atlas with ID j does not contain regions with geometry data.
  • mpi_geometry_present_flag[j] 1 indicates that the mixed packed video frame of the atlas with ID j does contain regions with geometry data.
  • When mpi_geometry_present_flag[j] is not present, it is inferred to be equal to 0.
  • It is a requirement of bitstream conformance that, if mpi_geometry_present_flag[j] is equal to 1 for the atlas with ID j, vps_geometry_video_present_flag[j] shall be equal to 0 for the atlas with ID j.
  • mpi_attributes_present_flag[j] 0 indicates that the mixed packed video frame of atlas with ID j does not contain regions with attribute data.
  • mpi_attributes_present_flag[j] 1 indicates that the mixed packed video frame for the atlas with ID j does contain regions with attribute data.
  • When mpi_attributes_present_flag[j] is not present, it is inferred to be equal to 0.
  • It is a requirement of bitstream conformance that, if mpi_attribute_present_flag[j] is equal to 1 for the atlas with ID j, vps_attribute_video_present_flag[j] shall be equal to 0 for the atlas with ID j.
  • mpi_occupancy_2d_bit_depth_minus1[j] plus 1 indicates the nominal 2D bit depth to which the decoded region of the atlas with ID j containing occupancy data should be converted.
  • mpi_occupancy_2d_bit_depth_minus1[j] shall be in the range of 0 to 31, inclusive.
  • mpi_occupancy_MSB_align_flag[j] indicates how the decoded region containing the occupied samples of the atlas with ID j is converted to samples of the nominal occupied bit depth, as specified in Annex B.
  • mpi_lossy_occupancy_compression_threshold[j] indicates the threshold for deriving binary occupancy from a decoded region containing occupancy data for atlas with ID j.
  • mpi_lossy_occupancy_compression_threshold[j] should be in the range 0 to 255, inclusive.
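  • A minimal sketch of how this threshold could be applied when deriving binary occupancy from a decoded occupancy region follows; the greater-than comparison is an assumed convention, not a normative rule.

```python
def binarize_occupancy(decoded_region, threshold):
    """Map decoded occupancy samples to 0/1 using the lossy occupancy threshold.

    decoded_region is a 2-D list of decoded sample values; a sample is treated
    as occupied when its value exceeds the threshold (assumption)."""
    return [[1 if sample > threshold else 0 for sample in row]
            for row in decoded_region]
```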
  • mpi_geometry_2d_bit_depth_minus1[j] plus 1 indicates the nominal 2D bit depth to which the decoded region containing geometry data of the atlas with ID j should be converted.
  • mpi_geometry_2d_bit_depth_minus1[j] should be in the range 0 to 31, inclusive.
  • mpi_geometry_MSB_align_flag[j] indicates how to convert the decoded region containing geometry samples of the atlas with ID j to samples of the nominal occupied bit depth, as described in Annex B.
  • mpi_geometry_3d_coordinates_bit_depth_minus1[j] plus 1 indicates the bit depth of the geometric coordinates of the reconstructed stereo content of the atlas with ID j.
  • mpi_geometry_3d_coordinates_bit_depth_minus1[j] should be in the range 0 to 31, inclusive.
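  • The 2D bit depth and MSB alignment fields above together describe how decoded samples are rescaled to the nominal bit depth; the exact conversion is defined in Annex B and is not reproduced here, but a common form of such a conversion is sketched below (shift when MSB-aligned, clip otherwise; this is an illustrative assumption only).

```python
def to_nominal_bit_depth(sample, decoded_bit_depth, nominal_bit_depth, msb_align):
    """Convert one decoded sample to the nominal 2D bit depth.

    nominal_bit_depth corresponds to mpi_*_2d_bit_depth_minus1 + 1 and msb_align
    to the corresponding *_MSB_align_flag; the normative rule lives in Annex B,
    so this function is only an approximation for illustration."""
    if msb_align:
        shift = nominal_bit_depth - decoded_bit_depth
        return sample << shift if shift >= 0 else sample >> -shift
    # Without MSB alignment, simply clip the value into the nominal range.
    return min(sample, (1 << nominal_bit_depth) - 1)
```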
  • mpi_attribute_count[j] indicates the number of attributes with a unique attribute type present in the mixed packed video frame of the atlas with ID j.
  • mpi_attribute_type_id[j][i] represents the i-th attribute type of the attribute area of the mixed packed video frame of the atlas whose ID is j.
  • Table 3 describes the list of supported attribute types.
  • mpi_attribute_2d_bit_depth_minus1[j][k] plus 1 indicates the nominal 2D bit depth to which the region containing the attribute with attribute index k should be converted, for an atlas with ID j.
  • mpi_attribute_2d_bit_depth_minus1[j][k] shall be in the range 0 to 31, inclusive.
  • mpi_attribute_MSB_align_flag[j][k] indicates how to convert a decoded region (for an atlas with ID j) containing an attribute of attribute type k to samples of the nominal attribute bit depth, as described in Annex B.
  • mpi_attribute_map_absolute_coding_persistence_flag[j][k] 1 indicates that the decoded region contains the attribute map of the attribute with index k, corresponding to the atlas with ID j, encoded without any form of map prediction.
  • mpi_attribute_map_absolute_coding_persistence_flag[j][i] 0 indicates that the decoded region contains the attribute map of the attribute with index k, corresponding to the atlas with ID j, and that the same map prediction method as used for the geometric component of the atlas with ID j shall be used. If mpi_attribute_map_absolute_coding_persistence_flag[j][i] is absent, its value shall be inferred to be equal to 1.
  • the 3D array AttributeMapAbsoluteCodingEnabledFlag indicates whether a specific mapping of an attribute is to be encoded, with or without prediction, obtained as follows:
  • mpi_attribute_dimension_minus1[j][k] plus 1 indicates the total dimension (that is, the number of channels) of the region containing the attribute with index k in the atlas with ID j.
  • mpi_attribute_dimension_minus1[j][i] shall be in the range 0 to 63, inclusive.
  • mpi_attribute_dimension_partitions_minus1[j][k] plus 1 indicates the number of partition groups into which the attribute channels of the region containing the attribute with index k should be grouped, for the atlas with ID j.
  • mpi_attribute_dimension_partitions_minus1[j][k] shall be in the range 0 to 63, inclusive.
  • mpi_attribute_partition_channels_minus1[j][k][l] plus 1 indicates the number of channels assigned to the dimension partition group with index l for the area containing the attribute with index k in the atlas with ID j.
  • ai_attribute_partition_channels_minus1[j][k][l] should be in the range of 0 to ai_attribute_dimension_minus1[j][k] for all dimension partition groups.
  • mpi_regions_count_minus1[j] plus 1 indicates the number of regions in which the atlas with ID j is mixed in one video frame. mpi_regions_count_minus1 should be in the range 0 to 7, inclusive. When absent, the value of mpi_regions_count_minus1 is inferred to be equal to 0.
  • mpi_region_tile_id[j][i] indicates the tile ID of the region whose index is i in the atlas whose ID is j.
  • mpi_region_format_type_id[j][i] indicates the format type of the region whose ID is j and whose atlas index is i.
  • mpi_region_format_type_id[j][i] is equal to 0, indicating that the region format is multi-view video; equal to 1, the region format is point cloud.
  • mpi_region_type_id_minus2[j][i] plus 2 specifies, for the atlas with ID j, the type ID of the region with index i.
  • the value of mpi_region_type_id_minus2[j][i] shall be in the range 0 to 2, inclusive.
  • mpi_region_top_left_x[j][i] specifies the horizontal position of the top left sample of the region with index i of the atlas with ID j, in units of luma samples in the mixed packed video component frame.
  • When absent, the value of mpi_region_top_left_x[j][i] is inferred to be equal to 0.
  • mpi_region_top_left_y[j][i] specifies the vertical position of the top left sample of the region with index i of the atlas with ID j, in units of luma samples in the mixed packed video component frame.
  • When absent, the value of mpi_region_top_left_y[j][i] is inferred to be equal to 0.
  • mpi_region_width_minus1[j][i] plus 1 specifies the width of the region with index i of the atlas with ID j, in units of luma samples.
  • mpi_region_height_minus1[j][i] plus 1 specifies the height of the region with index i of the atlas with ID j, in units of luma samples.
  • mpi_region_unpack_top_left_x[j][i] specifies the horizontal position of the top left sample of the region with index i of the atlas with ID j, in units of luma samples in the decompressed video component frame.
  • When absent, the value of mpi_region_unpack_top_left_x[j][i] is inferred to be equal to 0.
  • mpi_region_unpack_top_left_y[j][i] specifies the vertical position of the top left sample of the region with index i of the atlas with ID j, in units of luma samples in the decompressed video component frame.
  • When absent, the value of mpi_region_unpack_top_left_y[j][i] is inferred to be equal to 0.
  • mpi_region_rotation_flag[j][i] 0 indicates that no rotation is performed on the region with index i of the atlas with ID j.
  • mpi_region_rotation_flag[j][i] 1 means that the region with index i of the atlas with ID j is rotated by 90 degrees.
  • mpi_region_map_index[j][i] specifies the map index of the region whose atlas index is i with ID j.
  • mpi_region_auxiliary_data_flag[j][i] 1 indicates that the region with ID j atlas index i contains only RAW and/or EOM codepoints.
  • mpi_region_auxiliary_data_flag 0 indicates that the region with index i in atlas with ID j may contain RAW and/or EOM codepoints.
  • mpi_region_attr_type_id[j][i] indicates the attribute type of the region whose ID is j and whose atlas index is i.
  • Table 3 describes the list of supported attributes.
  • mpi_region_attr_partition_index[j][i] indicates the attribute partition index of the region whose atlas index is i with ID j. When absent, the value of mpi_region_attr_partition_index[j][i] is inferred to be equal to 0.
  • the hybrid packed video decoding process is as follows:
  • the decoding process of the mixed video component of the atlas whose ID is DecAtlasID is as follows.
  • the codec is first determined using the profile defined in Annex A or the value of mpi_codec_id[DecAtlasID] and the component codec mapping SEI message specified in subclause F.2.11, if present. Then, according to the corresponding encoding specification, the hybrid video decoding process is invoked using the hybrid video sub-bitstream present in the V3C bitstream as input.
  • DecMpkFrames, a 4D array of the decoded mixed video frames, where the dimensions correspond to the decoded mixed video frame index, component index, row index, and column index, respectively, and
  • DecMpkChromaSamplingPosition, indicating the video chroma sampling position as specified in ISO/IEC 23091-2,
  • DecMpkFullRange, indicating the full-range video codepoints flag as specified in ISO/IEC 23091-2,
  • DecMpkTransferCharacteristics, indicating the transfer characteristics as specified in ISO/IEC 23091-2,
  • DecMpkMatrixCoeffs, indicating the matrix coefficients as specified in ISO/IEC 23091-2,
  • DecMpkCompTime, indicating the composition time of the decoded mixed video, where the dimension corresponds to the decoded mixed video frame index.
  • If DecMpkTransferCharacteristics is missing or set to the value 2, i.e. unspecified, it shall be set to 8, i.e. linear.
  • DecMpkChromaSamplingPosition, DecMpkColourPrimaries, DecMpkMatrixCoeffs, DecMpkFullRange and DecMpkTransferCharacteristics should not be applied in any further processing of decoded mixed frame regions whose region type corresponds to V3C_OVD or V3C_GVD, or to V3C_AVD regions whose mpi_region_attr_type_id is equal to ATTR_MATERIAL_ID or ATTR_NORMAL.
  • any existing video coding specification such as ISO/IEC 14496-10 or ISO/IEC 23008-2 or any future defined video coding specification can be used if included in mix_packed_codec_id.
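  • Putting the above together, the hybrid packed video decoding step amounts to: resolve the codec from mpi_codec_id (or from the component codec mapping SEI message), run that decoder over the hybrid video sub-bitstream, and expose the result as the 4D array DecMpkFrames plus the associated colour metadata. The sketch below only outlines that flow; the decoder and sub-bitstream objects are assumed interfaces, not real library APIs.

```python
def decode_hybrid_packed_video(decoder, hybrid_video_sub_bitstream):
    """Outline of the hybrid packed video decoding process for one atlas.

    `decoder` stands for the video decoder resolved from mpi_codec_id (or from
    the component codec mapping SEI message); `hybrid_video_sub_bitstream` is
    the hybrid video sub-bitstream extracted from the V3C bitstream.  Both are
    assumed interfaces, not real library objects."""
    decoded = decoder.decode(hybrid_video_sub_bitstream)
    return {
        # 4D array indexed by [frame index][component index][row index][column index].
        "DecMpkFrames": decoded.frames,
        "DecMpkChromaSamplingPosition": decoded.chroma_sampling_position,
        "DecMpkFullRange": decoded.full_range,
        "DecMpkTransferCharacteristics": decoded.transfer_characteristics,
        "DecMpkMatrixCoeffs": decoded.matrix_coeffs,
        "DecMpkCompTime": decoded.composition_time,
    }
```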
  • Taking N visual media contents being a multi-viewpoint video and a point cloud as an example, and in combination with the above scheme 3, the encoding method of the embodiment of the present application is introduced. As shown in Figure 15, the encoding method of this embodiment includes the following steps:
  • Step 31 the multi-viewpoint video is projected between viewpoints and de-duplicated to remove redundancy, the non-repeated pixels are connected into sub-blocks, and the sub-blocks are spliced into a multi-viewpoint video mosaic map; the point cloud is parallel-projected, connected pixels in the projection plane are formed into sub-blocks, and the sub-blocks are spliced into a point cloud mosaic map.
  • If the current region added to the heterogeneous hybrid mosaic is from the multi-viewpoint video packed mosaic, set mpi_region_format_type_id[j][i] to 0 in the hybrid mosaic information.
  • If the current region added to the heterogeneous hybrid mosaic is from the point cloud packed mosaic, set mpi_region_format_type_id[j][i] to 1 in the hybrid mosaic information.
  • Step 33 performing video encoding on the heterogeneous hybrid mosaic image to obtain a video compressed sub-stream
  • Step 34 splicing the multi-viewpoint video package mosaic image and the point cloud package mosaic image into a heterogeneous hybrid package mosaic image and encoding the hybrid mosaic information to form a hybrid mosaic information sub-stream;
  • Step 35 writing the video compression sub-code stream and the hybrid splicing information sub-code stream into the compressed code stream. A sketch of this overall flow is given below.
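  • The following Python sketch reads steps 31 to 35 as one pipeline. All callables (build_multiview_mosaic, build_point_cloud_mosaic, video_encode, encode_mosaic_info) are hypothetical stand-ins for the processing described above, not functions of any actual encoder.

```python
def encode_heterogeneous_content(multiview_video, point_cloud,
                                 build_multiview_mosaic, build_point_cloud_mosaic,
                                 video_encode, encode_mosaic_info):
    """Sketch of the encoding flow of Figure 15 (steps 31-35)."""
    # Step 31: build the per-format (isomorphic) mosaics.
    mv_mosaic = build_multiview_mosaic(multiview_video)
    pc_mosaic = build_point_cloud_mosaic(point_cloud)

    # Steps 32/34: splice both mosaics into one heterogeneous hybrid mosaic and
    # record, per region, which expression format it came from (the first flag).
    hybrid_regions = []
    mosaic_info = []
    for region in mv_mosaic.regions:
        hybrid_regions.append(region)
        mosaic_info.append({"mpi_region_format_type_id": 0})  # multi-view video
    for region in pc_mosaic.regions:
        hybrid_regions.append(region)
        mosaic_info.append({"mpi_region_format_type_id": 1})  # point cloud

    # Step 33: video-encode the heterogeneous hybrid mosaic.
    video_sub_stream = video_encode(hybrid_regions)

    # Step 34: encode the hybrid mosaic information into its own sub-stream.
    info_sub_stream = encode_mosaic_info(mosaic_info)

    # Step 35: write both sub-streams into the compressed code stream
    # (modelled here as simple concatenation of bytes-like objects).
    return video_sub_stream + info_sub_stream
```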
  • When decoding, the decoder first decodes the third flag; if the third flag indicates that the current hybrid mosaic is a certain type of heterogeneous hybrid mosaic, the decoder then decodes the first flag to determine the expression format type of the mosaic in the current region of the heterogeneous hybrid mosaic, thereby achieving accurate decoding.
  • The encoding end processes the N visual media contents separately to obtain N mosaic diagrams, where at least two of the N visual media contents have different expression formats and N is a positive integer greater than 1; splices the N mosaic graphs to generate a heterogeneous hybrid mosaic graph; and encodes the heterogeneous hybrid mosaic graph to obtain a code stream.
  • This application splices mosaic images corresponding to visual media content in different expression formats into one heterogeneous hybrid mosaic image, for example, splicing multi-viewpoint video mosaic images and point cloud mosaic images into one heterogeneous hybrid mosaic image for encoding and decoding, which minimizes the number of two-dimensional video encoders such as HEVC, VVC, AVC, and AVS that need to be called, reduces the encoding cost, and improves usability.
  • the encoding method of the present application is described above by taking the encoding end as an example, and the description is described below by taking the decoding end as an example.
  • FIG. 16 is a schematic flowchart of a decoding method provided by an embodiment of the present application. As shown in Figure 16, the decoding method of the embodiment of the present application includes:
  • the expression formats corresponding to at least two reconstructed visual media contents among the plurality of reconstructed visual media contents are different.
  • The decoding end decodes the code stream to obtain the reconstructed heterogeneous hybrid mosaic graph, and then splits the reconstructed heterogeneous hybrid mosaic graph to obtain N reconstructed isomorphic mosaic graphs, where at least two of the N reconstructed isomorphic mosaic graphs correspond to different expression formats.
  • the decoder performs reconstruction and other processing on the split N reconstructed isomorphic mosaic graphs to obtain multiple reconstructed visual media contents.
  • Multiple isomorphic mosaic graphs of different expression formats are spliced into one heterogeneous hybrid mosaic graph, so that when decoding, the number of two-dimensional video decoders such as HEVC, VVC, AVC, and AVS that need to be called can be reduced as much as possible, which reduces the decoding cost and improves ease of use.
  • the above-mentioned code stream includes a video compression sub-code stream.
  • the above-mentioned S701 includes the following steps:
  • the code stream in the embodiment of the present application includes the video compression sub-code stream and may also include other content.
  • the decoding end parses the code stream to obtain the video compression sub-code stream included in the code stream.
  • the compressed video sub-stream is decoded to obtain a reconstructed heterogeneous hybrid mosaic graph.
  • the compressed video sub-stream is input into the video decoder shown in FIG. 2B for decoding to obtain a reconstructed heterogeneous hybrid mosaic graph.
  • The encoding end writes the hybrid splicing information into the code stream; that is to say, in addition to the above-mentioned video compression sub-code stream, the code stream in the embodiment of the present application also includes the hybrid splicing information sub-code stream.
  • the decoding method in the embodiment of the present application further includes: decoding the mixed splicing information sub-stream to obtain the mixed splicing information.
  • The decoding end parses the code stream to obtain the video compression sub-code stream and the hybrid splicing information sub-code stream, and then the decoding end decodes the video compression sub-code stream to obtain a reconstructed heterogeneous hybrid splicing graph.
  • the mixed and spliced information sub-streams are decoded to obtain the mixed and spliced information.
  • the reconstructed heterogeneous hybrid mosaic graph is split to obtain N reconstructed isomorphic mosaic graphs.
  • The reconstructed heterogeneous hybrid mosaic includes a multi-attribute reconstructed heterogeneous hybrid mosaic and a single-attribute reconstructed heterogeneous hybrid mosaic.
  • the N reconstruction isomorphic mosaics include at least two of a multi-view video reconstruction mosaic, a point cloud reconstruction mosaic, and a mesh reconstruction mosaic.
  • The first expression format and the second expression format are each any one of multi-view video, point cloud and mesh, and the first expression format and the second expression format are different.
  • the above S702-A1 includes the following examples:
  • Example 1: if the reconstructed heterogeneous hybrid mosaic is a reconstructed heterogeneous mixed texture mosaic, then according to the hybrid mosaic information, the reconstructed heterogeneous mixed texture mosaic is split to obtain a multi-viewpoint video texture reconstruction mosaic and a point cloud texture reconstruction mosaic.
  • Example 2: if the reconstructed heterogeneous hybrid mosaic is a reconstructed heterogeneous mixed geometry and occupancy mosaic, then according to the hybrid mosaic information, the reconstructed heterogeneous mixed geometry and occupancy mosaic is split to obtain a multi-viewpoint video geometry reconstruction mosaic, a point cloud geometry reconstruction mosaic, and a point cloud occupancy reconstruction mosaic.
  • The reconstructed multi-viewpoint video mosaic can be obtained according to the reconstructed multi-viewpoint video texture mosaic and the reconstructed multi-viewpoint video geometry mosaic;
  • The reconstructed point cloud mosaic can be obtained according to the reconstructed point cloud texture mosaic, the reconstructed point cloud geometry mosaic and the reconstructed point cloud occupancy mosaic.
  • The decoding end inputs the code stream into the video decoder; the decoder decodes the video compression sub-code stream to obtain the reconstructed heterogeneous mixed texture mosaic map and the reconstructed heterogeneous mixed geometry and occupancy mosaic map, and decodes the hybrid splicing information sub-code stream to obtain the hybrid splicing information. Then, the reconstructed heterogeneous mixed texture mosaic is split according to the hybrid mosaic information, for example using the region unpacking technology, to obtain the reconstructed multi-viewpoint video texture mosaic and the reconstructed point cloud texture mosaic.
  • The reconstructed heterogeneous mixed geometry and occupancy mosaic is split according to the hybrid mosaic information, for example using the region unpacking technology, to obtain the reconstructed multi-view video geometry mosaic, the reconstructed point cloud geometry mosaic, and the reconstructed point cloud occupancy mosaic. Then, according to the reconstructed multi-viewpoint video texture mosaic map and the reconstructed multi-viewpoint video geometry mosaic map, the reconstructed multi-viewpoint video mosaic map is obtained.
  • The reconstructed point cloud mosaic map is obtained by processing the reconstructed point cloud texture mosaic map, the reconstructed point cloud geometry mosaic map and the reconstructed point cloud occupancy mosaic map, for example using the TMC2 unpacking technology. A sketch of this region-based splitting is given below.
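  • As a purely illustrative sketch, the splitting can be seen as a dispatch on the per-region format type carried in the hybrid mosaic information; the region records and the crop helper below are assumptions, not part of the bitstream syntax.

```python
def split_reconstructed_hybrid_mosaic(hybrid_frame, mosaic_info, crop):
    """Split a reconstructed heterogeneous hybrid mosaic into per-format mosaics.

    `mosaic_info` is a list of region records carrying the first flag and the
    region placement; `crop(frame, region)` is an assumed helper that cuts the
    region rectangle out of the frame.  Both are illustrative only."""
    multiview_regions = []
    point_cloud_regions = []
    for region in mosaic_info:
        patch = crop(hybrid_frame, region)
        if region["format_type_id"] == 0:       # first flag == first value
            multiview_regions.append(patch)     # multi-viewpoint video region
        else:                                   # first flag == second value
            point_cloud_regions.append(patch)   # point cloud region
    return multiview_regions, point_cloud_regions
```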
  • The hybrid mosaic information in the embodiment of the present application includes a first flag, which is used to indicate the expression format type corresponding to the i-th region in the heterogeneous hybrid mosaic map, where i is a positive integer.
  • S702-A includes the following steps of S702-A2 and S702-A3:
  • the above S702-A3 includes the following steps:
  • the first value is 0.
  • the second value is 1.
  • the decoding process when the mixed splicing information includes the first flag is introduced below through specific embodiments. Specifically, as shown in Figure 18, the decoding process includes the following steps:
  • Step 41 extracting the mixed splicing information sub-code stream and the video compression sub-code stream respectively from the compressed code stream.
  • Step 42 Decode the mixed splicing information substream to obtain the mixed splicing information.
  • Step 43 Input the compressed video sub-stream to the video decoder, and output the reconstructed heterogeneous hybrid mosaic after decoding.
  • Step 44 according to the first flag in the hybrid mosaic information, the reconstructed heterogeneous hybrid mosaic image is split, and the reconstructed multi-viewpoint video mosaic image and the reconstructed point cloud mosaic image are output.
  • the first flag pin_region_format_type_id[j][i] is obtained from the hybrid splicing information.
  • step 45 the reconstructed multi-view video mosaic is decoded to generate a reconstructed multi-view video, and the reconstructed point cloud mosaic is decoded to generate a reconstructed point cloud.
  • When decoding, the decoder can accurately determine, according to the first flag in the hybrid mosaic information, the expression format type of the mosaic graph in the current region of the heterogeneous hybrid mosaic graph.
  • For example, if the value of the first flag is the first value, the decoder determines that the mosaic image in the current region of the heterogeneous hybrid mosaic image is a multi-view video mosaic image; if the value of the first flag is the second value, the decoding end determines that the mosaic image of the current region of the heterogeneous hybrid mosaic image is a point cloud mosaic image. The first flag thus enables the decoding end to realize accurate decoding.
  • the hybrid mosaic information includes a second flag used to indicate whether the current hybrid mosaic is a heterogeneous hybrid mosaic.
  • the second flag is located in the unit header of the mixed splicing information.
  • The embodiment of the present application first obtains the second flag from the hybrid mosaic information, and determines, according to the second flag, whether the first flag exists in the hybrid splicing information.
  • The decoding end obtains the second flag from the hybrid splicing information. If the value of the second flag is a preset value, it indicates that the current hybrid mosaic image is a heterogeneous hybrid mosaic image. In this case, the decoding end reads the first flag corresponding to the i-th region from the hybrid splicing information and determines the expression format type corresponding to the mosaic image of the i-th region according to the value of the first flag; for example, when the value of the first flag is the first value, it is determined that the i-th region is a multi-viewpoint video mosaic image, and if the value of the first flag is the second value, it is determined that the i-th region is a point cloud mosaic image.
  • Otherwise, the decoding end skips the step of obtaining the first flag corresponding to the i-th region from the hybrid mosaic information, as in the parsing sketch below.
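  • The presence test driven by the second flag amounts to a conditional parse. A minimal sketch follows; the bitstream reader interface and the concrete preset value are hypothetical.

```python
def parse_region_format_flags(reader, region_count, preset_value=1):
    """Parse first flags only when the second flag marks a heterogeneous hybrid mosaic.

    `reader` is an assumed bitstream reader exposing read_flag(); the preset
    value identifying a heterogeneous hybrid mosaic is taken to be 1 purely for
    illustration."""
    second_flag = reader.read_flag()
    format_type_ids = []
    if second_flag == preset_value:
        # Heterogeneous hybrid mosaic: a first flag is present for every region.
        for _ in range(region_count):
            format_type_ids.append(reader.read_flag())  # 0: multi-view, 1: point cloud
    # Otherwise the first flags are absent and parsing them is skipped.
    return second_flag, format_type_ids
```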
  • the decoding process when the mixed splicing information includes the second flag is introduced below through specific embodiments. Specifically, as shown in Figure 19, the decoding process includes the following steps:
  • Step 52 Decode the mixed splicing information sub-stream to obtain the mixed splicing information.
  • Step 53 Input the compressed video sub-stream to the video decoder, and output the reconstructed heterogeneous hybrid mosaic after decoding.
  • Step 54 According to the first flag in the hybrid mosaic information, the reconstructed heterogeneous hybrid mosaic image is split, and the reconstructed multi-viewpoint video mosaic image and the reconstructed point cloud mosaic image are output.
  • the decoding end acquires the first flag min_region_format_type_id[j][i] from the hybrid splicing information.
  • step 55 the reconstructed multi-view video mosaic is decoded to generate a reconstructed multi-view video, and the reconstructed point cloud mosaic is decoded to generate a reconstructed point cloud.
  • When decoding, the decoder first decodes the second flag; if the second flag indicates that the current hybrid mosaic is a heterogeneous hybrid mosaic, the decoding end then decodes the first flag to determine the expression format type of the mosaic in the current area of the heterogeneous hybrid mosaic, thereby realizing accurate decoding.
  • the hybrid mosaic information includes a third flag, which is used to indicate whether the current hybrid mosaic is a heterogeneous hybrid mosaic, and which heterogeneous hybrid mosaic it belongs to.
  • the heterogeneous hybrid mosaic graph includes the following types: heterogeneous hybrid occupancy mosaic graph, heterogeneous hybrid geometric mosaic graph, heterogeneous hybrid attribute mosaic graph, and heterogeneous hybrid package mosaic graph.
  • the third flag is located in the unit header of the mixed splicing information.
  • The embodiment of the present application first obtains the third flag from the hybrid mosaic information, and determines, according to the third flag, whether the first flag exists in the mixed splicing information.
  • The decoding end obtains the third flag from the mixed splicing information. If the value of the third flag is the first preset value, the second preset value, the third preset value or the fourth preset value, the first flag corresponding to the i-th region is obtained from the hybrid mosaic information. The first preset value is used to indicate that the current hybrid mosaic is a heterogeneous hybrid occupancy mosaic, the second preset value is used to indicate that the current hybrid mosaic is a heterogeneous hybrid geometric mosaic, the third preset value is used to indicate that the current hybrid mosaic is a heterogeneous hybrid attribute mosaic, and the fourth preset value is used to indicate that the current hybrid mosaic is a heterogeneous hybrid packed mosaic.
  • the value of the first flag determines the expression format type corresponding to the mosaic of the i-th region, for example, when the value of the first flag is the first value, determine that the i-th region is a multi-viewpoint video mosaic, If the value of the first flag is the second value, it is determined that the i-th area is a point cloud mosaic image.
  • the decoder skips the step of obtaining the first flag corresponding to the i-th region from the mixed splicing information.
  • The third flag is written in the v3c unit header, so that the decoder can judge, at the v3c unit header level according to the v3c unit type, whether the current mosaic is a heterogeneous hybrid mosaic and which type of heterogeneous hybrid mosaic it is.
  • If the third flag, i.e. the v3c unit type, is V3C_MAVD, V3C_MGVD, V3C_MOVD, or V3C_MPVD, it indicates that the current mosaic is a heterogeneous hybrid mosaic, and the first flag min_region_format_type_id[j][i] in the hybrid mosaic information is then used to distinguish whether a region of the heterogeneous hybrid mosaic is a multi-view video mosaic or a point cloud mosaic, as in the sketch below.
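  • When the third flag is carried as the v3c unit type, the check reduces to membership in the set of heterogeneous hybrid unit types. In the sketch below the type names follow the text, but the numeric code points are placeholders, since their actual values are not given here.

```python
# Placeholder numeric codes for the heterogeneous hybrid unit types named in the
# text; the real code points would be assigned by the specification.
V3C_MOVD, V3C_MGVD, V3C_MAVD, V3C_MPVD = 100, 101, 102, 103
HETEROGENEOUS_HYBRID_UNIT_TYPES = {V3C_MOVD, V3C_MGVD, V3C_MAVD, V3C_MPVD}

def is_heterogeneous_hybrid_unit(vuh_unit_type):
    """True if the v3c unit header marks a heterogeneous hybrid mosaic.

    Only in that case does the decoder go on to read min_region_format_type_id
    for each region of the mosaic."""
    return vuh_unit_type in HETEROGENEOUS_HYBRID_UNIT_TYPES
```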
  • the decoding process includes the following steps:
  • step 62 the mixed splicing information sub-stream is decoded to obtain the mixed splicing information.
  • Step 63 Input the compressed video sub-stream to the video decoder, and output the reconstructed heterogeneous hybrid mosaic after decoding.
  • Step 64 According to the first flag in the hybrid mosaic information, the reconstructed heterogeneous hybrid mosaic image is split, and the reconstructed multi-viewpoint video mosaic image and the reconstructed point cloud mosaic image are output.
  • the decoding end acquires the first flag min_region_format_type_id[j][i] from the hybrid splicing information.
  • step 65 the reconstructed multi-view video mosaic is decoded to generate a reconstructed multi-view video, and the reconstructed point cloud mosaic is decoded to generate a reconstructed point cloud.
  • When decoding, the decoder first decodes the third flag; if the third flag indicates that the current hybrid mosaic is a heterogeneous hybrid mosaic, the decoder then decodes the first flag to determine the expression format type of the mosaic in the current region of the heterogeneous hybrid mosaic, so as to realize accurate decoding.
  • the decoding end obtains the reconstructed heterogeneous hybrid mosaic graph by decoding the code stream; splits the reconstructed heterogeneous hybrid mosaic graph to obtain N reconstructed mosaic graphs, where N is a positive integer greater than 1;
  • the N reconstructed mosaic images are respectively decoded to obtain N reconstructed visual media contents, and at least two of the N reconstructed visual media contents correspond to different expression formats. That is, in the embodiment of the present application, multiple mosaics of different expression formats are stitched into a heterogeneous hybrid mosaic, so that when decoding, the rendering advantages of data (point clouds, etc.) from different expression formats are preserved, and the synthesis quality of the image is improved.
  • the number of two-dimensional video decoders such as HEVC, VVC, AVC, and AVS that needs to be called can be reduced as much as possible, which reduces the decoding cost and improves usability.
  • The sequence numbers of the above-mentioned processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
  • the term "and/or" is only an association relationship describing associated objects, indicating that there may be three relationships. Specifically, A and/or B may mean: A exists alone, A and B exist simultaneously, and B exists alone.
  • the character "/" in this application generally indicates that the contextual objects are an "or" relationship.
  • FIG. 21 is a schematic block diagram of an encoding device provided by an embodiment of the present application.
  • The encoding device 10 is applied to the above-mentioned video encoding end.
  • the encoding device 10 includes:
  • the first splicing unit 11 is configured to process a plurality of visual media contents to obtain N isomorphic mosaic graphs, where at least two of the plurality of visual media contents correspond to different expression formats, and N is a positive integer greater than 1;
  • the second splicing unit 12 is configured to splice the N homogeneous mosaic graphs to generate a heterogeneous hybrid mosaic graph
  • the encoding unit 13 is configured to encode the heterogeneous hybrid mosaic graph to obtain a code stream.
  • the encoding unit 13 is specifically configured to call a video encoder to perform video encoding on the heterogeneous hybrid mosaic graph to obtain a video compression sub-code stream; perform hybrid mosaic information encoding on the heterogeneous hybrid mosaic graph to obtain a hybrid splicing information sub-code stream; and write the video compression sub-code stream and the hybrid splicing information sub-code stream into the code stream.
  • the heterogeneous hybrid mosaic includes a multi-attribute heterogeneous hybrid mosaic and a single-attribute heterogeneous hybrid mosaic.
  • the N homogeneous mosaics include at least two of a multi-view video mosaic, a point cloud mosaic, and a grid mosaic.
  • the second splicing unit 12 is specifically configured to splice at least the single-attribute mosaic graph of the first expression format and the single-attribute mosaic graph of the second expression format to obtain the heterogeneous hybrid mosaic graph, where the first expression format and the second expression format are each any one of multi-viewpoint video, point cloud, and mesh, and the first expression format and the second expression format are different.
  • the second stitching unit 12 is specifically configured to splice the multi-viewpoint video texture mosaic map and the point cloud texture mosaic map to obtain a heterogeneous mixed texture mosaic map; or, to splice the multi-view video geometry mosaic map, the point cloud geometry mosaic map and the point cloud occupancy mosaic map to obtain a heterogeneous mixed geometry and occupancy mosaic map.
  • the hybrid mosaic information includes a first flag, and the first flag is used to indicate the expression format type corresponding to the i-th region in the heterogeneous hybrid mosaic graph, where i is a positive integer.
  • the second stitching unit 12 is further configured to: if the mosaic image of the i-th region is the multi-view video mosaic, set the value of the first flag to the first value; if the mosaic of the i-th region is the point cloud mosaic, set the value of the first flag to the second value.
  • the hybrid mosaic information includes a second flag, and the second flag is used to indicate whether the current hybrid mosaic is a heterogeneous hybrid mosaic.
  • the second stitching unit 12 is further configured to set the second flag to a preset value if the current hybrid mosaic graph is the heterogeneous hybrid mosaic graph
  • the second splicing unit 12 is further configured to write the first flag in the hybrid splicing information if it is determined that the value of the second flag is the preset value.
  • the second splicing unit 12 is further configured to skip writing the first flag in the hybrid splicing information if it is determined that the value of the second flag is not the preset value .
  • the second flag is located in the unit header of the hybrid splicing information.
  • the first stitching unit 11 is specifically configured to perform projection and de-redundancy processing on the acquired multi-viewpoint videos, connect the non-repeating pixels into video sub-blocks, and splice the video sub-blocks into the multi-viewpoint video mosaic graph; and to perform parallel projection on the obtained point cloud, form point cloud sub-blocks from the connected points in the projection surface, and splice the point cloud sub-blocks into the point cloud splicing graph.
  • the N visual media contents are media contents presented simultaneously in the same three-dimensional space.
  • the hybrid mosaic information includes a third flag, and the third flag is used to indicate whether the current hybrid mosaic is a heterogeneous hybrid mosaic, and which heterogeneous hybrid mosaic it belongs to.
  • the heterogeneous hybrid mosaic graph includes the following types: heterogeneous hybrid occupancy mosaic graph, heterogeneous hybrid geometric mosaic graph, heterogeneous hybrid attribute mosaic graph, and heterogeneous hybrid package mosaic graph.
  • the second mosaic unit 12 is specifically configured to set the third flag to a first preset value if the current hybrid mosaic is the heterogeneous hybrid occupancy mosaic; If the current hybrid mosaic is the heterogeneous hybrid geometric mosaic, set the third flag to a second preset value; if the current hybrid mosaic is the heterogeneous hybrid attribute mosaic, then Set the third flag to a third preset value; if the current hybrid mosaic is the heterogeneous hybrid packed mosaic, set the third flag to a fourth preset value.
  • the second mosaic unit 12 is further configured to write the first flag in the hybrid mosaic information if it is determined that the third flag indicates that the current hybrid mosaic graph is a heterogeneous hybrid mosaic graph.
  • the second mosaic unit 12 is further configured to skip writing the first flag in the hybrid mosaic information if it is determined that the third flag indicates that the current hybrid mosaic graph is not a heterogeneous hybrid mosaic graph.
  • the third flag is located in the unit header of the hybrid splicing information
  • the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
  • the device 10 shown in FIG. 21 can execute the coding method at the coding end of the embodiment of the present application, and the aforementioned and other operations and/or functions of each unit in the device 10 are to realize the above-mentioned coding method at the coding side and other methods. For the sake of brevity, the corresponding process will not be repeated here.
  • Fig. 22 is a schematic block diagram of a decoding device provided by an embodiment of the present application, and the decoding device is applied to the above-mentioned decoding end.
  • the decoding device 20 may include:
  • the decoding unit 21 is used to decode the code stream to obtain the reconstructed heterogeneous hybrid mosaic graph
  • the first splitting unit 22 is configured to split the reconstructed heterogeneous hybrid mosaic graph to obtain N reconstructed isomorphic mosaic graphs, where N is a positive integer greater than 1;
  • the processing unit 23 is configured to obtain a plurality of reconstructed visual media contents according to the N isomorphic reconstructed mosaic graphs, and at least two reconstructed visual media contents in the plurality of reconstructed visual media contents have different expression formats.
  • the code stream includes a video compression sub-code stream
  • the decoding unit 21 is specifically configured to call a video decoder to decode the video compression sub-code stream to obtain the reconstructed heterogeneous hybrid mosaic image .
  • the code stream further includes a hybrid splicing information sub-code stream
  • the decoding unit 21 is further configured to decode the hybrid splicing information sub-code stream to obtain hybrid splicing information
  • the first splitting unit 22 is specifically configured to split the reconstructed heterogeneous hybrid mosaic graph according to the hybrid mosaic information, to obtain the N reconstructed isomorphic mosaic graphs.
  • the reconstructed heterogeneous hybrid mosaic includes a multi-attribute reconstructed heterogeneous hybrid mosaic and a single-attribute reconstructed heterogeneous hybrid mosaic.
  • the N reconstruction isomorphic mosaics include at least two of a multi-view video reconstruction mosaic, a point cloud reconstruction mosaic, and a mesh reconstruction mosaic.
  • the first splitting unit 22 is specifically configured to split the reconstructed heterogeneous hybrid mosaic graph according to the hybrid mosaic information, to obtain at least a single-attribute reconstruction mosaic graph in the first expression format and a single-attribute reconstruction mosaic graph in the second expression format, where the first expression format and the second expression format are each any one of multi-viewpoint video, point cloud and mesh, and the first expression format and the second expression format are different.
  • the first splitting unit 22 is specifically configured to: if the reconstructed heterogeneous hybrid mosaic image is a reconstructed heterogeneous mixed texture mosaic, split the reconstructed heterogeneous mixed texture mosaic according to the hybrid mosaic information to obtain a multi-viewpoint video texture reconstruction mosaic and a point cloud texture reconstruction mosaic; if the reconstructed heterogeneous hybrid mosaic graph is a reconstructed heterogeneous mixed geometry and occupancy mosaic graph, split the reconstructed heterogeneous mixed geometry and occupancy mosaic graph according to the hybrid mosaic information to obtain a multi-viewpoint video geometry reconstruction mosaic, a point cloud geometry reconstruction mosaic, and a point cloud occupancy reconstruction mosaic.
  • the hybrid mosaic information includes a first flag, and the first flag is used to indicate the expression format type corresponding to the i-th region in the heterogeneous hybrid mosaic graph, where i is a positive integer.
  • the first splitting unit 22 is specifically configured to, for the i-th region in the reconstructed heterogeneous hybrid mosaic graph, obtain the first flag corresponding to the i-th region from the hybrid mosaic information, and, according to the first flag corresponding to the i-th region, split the i-th region into a reconstructed mosaic of the visual media expression format type corresponding to the i-th region.
  • the first splitting unit 22 is specifically configured to: if the value of the first flag is the first value, split the i-th region into the reconstructed multi-viewpoint video mosaic; if the value of the first flag is the second value, split the i-th region into the reconstructed point cloud mosaic.
  • the hybrid mosaic information includes a second flag, and the second flag is used to indicate whether the current hybrid mosaic is a heterogeneous hybrid mosaic.
  • the first splitting unit 22 is further configured to obtain the second flag; if the value of the second flag is a preset value, the first flag corresponding to the i-th region is obtained from the mixed splicing information, where the preset value is used to indicate that the current hybrid mosaic is a heterogeneous hybrid mosaic.
  • the first splitting unit 22 is further configured to skip the step of obtaining the first flag corresponding to the i-th region from the mixed splicing information if the value of the second flag is not the preset value.
  • the second flag is located in the unit header of the hybrid splicing information.
  • the hybrid mosaic information includes a third flag, and the third flag is used to indicate whether the current hybrid mosaic is a heterogeneous hybrid mosaic, and which heterogeneous hybrid mosaic it belongs to.
  • the heterogeneous hybrid mosaic graph includes the following types: heterogeneous hybrid occupancy mosaic graph, heterogeneous hybrid geometric mosaic graph, heterogeneous hybrid attribute mosaic graph, and heterogeneous hybrid package mosaic graph.
  • the first splitting unit 22 is further configured to obtain the third flag; if the value of the third flag is the first preset value, the second preset value, the third preset value or the fourth preset value, obtain the first flag corresponding to the i-th region, where the first preset value is used to indicate that the current hybrid mosaic is the heterogeneous hybrid occupancy mosaic, the second preset value is used to indicate that the current hybrid mosaic is the heterogeneous hybrid geometric mosaic, the third preset value is used to indicate that the current hybrid mosaic is the heterogeneous hybrid attribute mosaic, and the fourth preset value is used to indicate that the current hybrid mosaic is the heterogeneous hybrid packaged mosaic.
  • the first splitting unit 22 is further configured to skip the step of obtaining the first flag corresponding to the i-th region from the mixed splicing information if the value of the third flag is not the first preset value, the second preset value, the third preset value or the fourth preset value.
  • the third flag is located in the unit header of the hybrid splicing information.
  • the N visual media contents are media contents presented simultaneously in the same three-dimensional space.
  • the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
  • the device 20 shown in FIG. 22 may correspond to the corresponding subject in the prediction method at the decoding end of the embodiment of the present application, and the aforementioned and other operations and/or functions of each unit in the device 20 are to realize the corresponding processes in the respective methods at the decoding end; for brevity, details are not repeated here.
  • the functional unit may be implemented in the form of hardware, may also be implemented by instructions in the form of software, and may also be implemented by a combination of hardware and software units.
  • each step of the method embodiments in the embodiments of the present application can be completed by an integrated logic circuit of hardware in the processor and/or by instructions in the form of software, and the steps of the methods disclosed in the embodiments of the present application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software units in a decoding processor.
  • the software unit may be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, and registers.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps in the above method embodiments in combination with its hardware.
  • Fig. 23 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 30 may be the video encoder or video decoder described in the embodiment of the present application, and the electronic device 30 may include:
  • a memory 33 and a processor 32, where the memory 33 is used to store a computer program 34 and transmit the program code 34 to the processor 32.
  • the processor 32 can call and run the computer program 34 from the memory 33 to implement the method in the embodiment of the present application.
  • the processor 32 can be used to execute the steps in the above-mentioned method 200 according to the instructions in the computer program 34 .
  • the processor 32 may include, but is not limited to:
  • DSP Digital Signal Processor
  • ASIC Application Specific Integrated Circuit
  • FPGA Field Programmable Gate Array
  • the memory 33 includes but is not limited to:
  • non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electronically programmable Erase Programmable Read-Only Memory (Electrically EPROM, EEPROM) or Flash.
  • the volatile memory can be Random Access Memory (RAM), which acts as external cache memory.
  • Static Random Access Memory (SRAM)
  • Dynamic Random Access Memory (DRAM)
  • Synchronous Dynamic Random Access Memory (SDRAM)
  • Double Data Rate Synchronous Dynamic Random Access Memory (Double Data Rate SDRAM, DDR SDRAM)
  • Enhanced Synchronous Dynamic Random Access Memory (Enhanced SDRAM, ESDRAM)
  • Synchlink Dynamic Random Access Memory (SLDRAM)
  • Direct Rambus Random Access Memory (Direct Rambus RAM, DR RAM)
  • the computer program 34 can be divided into one or more units, and the one or more units are stored in the memory 33 and executed by the processor 32 to complete the present application.
  • the one or more units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30 .
  • the electronic device 30 may also include:
  • a transceiver 33, where the transceiver 33 can be connected to the processor 32 or the memory 33.
  • the processor 32 can control the transceiver 33 to communicate with other devices, specifically, can send information or data to other devices, or receive information or data sent by other devices.
  • Transceiver 33 may include a transmitter and a receiver.
  • the transceiver 33 may further include antennas, and the number of antennas may be one or more.
  • bus system includes not only a data bus, but also a power bus, a control bus and a status signal bus.
  • the present application also provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a computer, the computer can execute the methods of the above method embodiments.
  • the embodiments of the present application further provide a computer program product including instructions, and when the instructions are executed by a computer, the computer executes the methods of the foregoing method embodiments.
  • the present application also provides a code stream, which is generated according to the above encoding method.
  • the code stream includes the above-mentioned first flag, or includes the first flag and the second flag.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, from one website, computer, server or data center to another website, computer, server or data center in a wired manner (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, microwave, etc.).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a digital video disc (digital video disc, DVD)), or a semiconductor medium (such as a solid state disk (solid state disk, SSD)), etc.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

本申请提供一种编解码方法、装置、设备、及存储介质,本申请通过将多种不同表达格式的视觉媒体内容对应的同构拼接图拼接在一张异构混合拼接图中,例如将多视点视频拼接图和点云拼接图拼接在一张异构混合拼接图中进行编解码,这样尽量减少了所需要调用的HEVC,VVC,AVC,AVS等二维视频编解码器的个数,减少了编解码代价,提高易用性。

Description

编解码方法、装置、设备、及存储介质 技术领域
本申请涉及图像处理技术领域,尤其涉及一种编解码方法、装置、设备、及存储介质。
背景技术
在三维应用场景中,例如虚拟现实(Virtual Reality,VR)、增强现实(Augmented Reality,AR)、混合现实(Mix Reality,MR)等应用场景中,在同一个场景中可能出现表达格式不同的视觉媒体对象。例如在同一个三维场景中,以视频表达场景背景与部分人物和物件,以三维点云或三维网格表达另一部分人物。
在压缩编码时分别采用多视点视频编码、点云编码、网格编码,会比全部投影成多视点视频编码更能保持原表达格式的有效信息,提高观看时所渲染的观看视窗的质量,提高码率-质量的综合效率。
但是,目前的编解码技术是对多视点视频、点云和网格分别进行编解码,其编解码过程中需要调用的编解码器个数较多,使得编解码代价大。
发明内容
本申请实施例提供了一种编解码方法、装置、设备、及存储介质,以降低编解码过程所调用的编解码器个数,降低编解码代价。
第一方面,本申请提供了一种编码方法,包括:
对多个视觉媒体内容分别进行处理,得到N个同构拼接图,所述多个视觉媒体内容中至少两个视觉媒体内容对应的表达格式不同,所述N为大于1的正整数;
将所述N个同构拼接图进行拼接,生成异构混合拼接图;
对所述异构混合拼接图进行编码,得到码流。
第二方面,本申请实施例提供一种解码方法,包括:
解码码流,得到重建异构混合拼接图;
对所述重建异构混合拼接图进行拆分,得到N个重建同构拼接图,所述N为大于1的正整数;
根据所述N个重建拼接图,得到多个重建视觉媒体内容,所述多个重建视觉媒体内容中至少两个重建视觉媒体内容对应的表达格式不同。
第三方面,本申请提供了一种编码装置,用于执行上述第一方面或其各实现方式中的方法。具体地,该预测装置包括用于执行上述第一方面或其各实现方式中的方法的功能单元。
第四方面,本申请提供了一种解码装置,用于执行上述第二方面或其各实现方式中的方法。具体地,该预测装置包括用于执行上述第二方面或其各实现方式中的方法的功能单元。
第五方面,提供了一种编码器,包括处理器和存储器。该存储器用于存储计算机程序,该处理器用于调用并运行该存储器中存储的计算机程序,以执行上述第一方面或其各实现方式中的方法。
第六方面,提供了一种解码器,包括处理器和存储器。该存储器用于存储计算机程序,该处理器用于调用并运行该存储器中存储的计算机程序,以执行上述第二方面或其各实现方式中的方法。
第七方面,提供了一种编解码系统,包括编码器和解码器。编码器用于执行上述第一方面或其各实现方式中的方法,解码器用于执行上述第二方面或其各实现方式中的方法。
第八方面,提供了一种芯片,用于实现上述第一方面至第二方面中的任一方面或其各实现方式中的方法。具体地,该芯片包括:处理器,用于从存储器中调用并运行计算机程序,使得安装有该芯片的设备执行如上述第一方面至第二方面中的任一方面或其各实现方式中的方法。
第九方面,提供了一种计算机可读存储介质,用于存储计算机程序,该计算机程序使得计算机执行上述第一方面至第二方面中的任一方面或其各实现方式中的方法。
第十方面,提供了一种计算机程序产品,包括计算机程序指令,该计算机程序指令使得计算机执行上述第一方面至第二方面中的任一方面或其各实现方式中的方法。
第十一方面,提供了一种计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面至第二方面中的任 一方面或其各实现方式中的方法。
第十二方面,提供了一种码流,码流是基于上述第一方面的方法生成的。
基于以上技术方案,通过将多种不同表达格式的视觉媒体内容对应的拼接图拼接在一张异构混合拼接图中,例如将多视点视频拼接图和点云拼接图拼接在一张异构混合拼接图中进行编解码,这样尽量减少了所需要调用的HEVC,VVC,AVC,AVS等二维视频编解码器的个数,减少了编解码代价,提高易用性。
附图说明
图1为本申请实施例涉及的一种视频编解码系统的示意性框图;
图2A是本申请实施例涉及的视频编码器的示意性框图;
图2B是本申请实施例涉及的视频解码器的示意性框图;
图3A是多视点视频数据的组织和表达框架图;
图3B是多视点视频数据的拼接图像生成示意图;
图3C是点云数据的组织和表达框架图;
图3D至图3F为不同类型的点云数据示意图;
图4为多视点视频的编码示意图;
图5为多视点视频的解码示意图;
图6为本申请一实施例提供的编码方法流程示意图;
图7为本申请一实施例提供的编码过程示意图;
图8A为异构混合纹理拼接图;
图8B为异构混合几何和占用情况拼接图;
图9为本申请一实施例提供的混合编码过程示意图;
图10为本申请实施例涉及的一种语法结构示意图;
图11为本申请一编码过程示意图;
图12为本申请实施例涉及的另一种语法结构示意图;
图13为本申请另一编码过程示意图;
图14为本申请实施例涉及的另一种语法结构示意图;
图15为本申请另一编码过程示意图;
图16为本申请实一施例提供的解码方法流程示意图;
图17为本申请一实施例提供的混合解码过程示意图;
图18为本申请一解码过程示意图;
图19为本申请另一解码过程示意图;
图20为本申请另一解码过程示意图;
图21是本申请一实施例提供的编码装置的示意性框图;
图22是本申请一实施例提供的解码装置的示意性框图;
图23是本申请实施例提供的电子设备的示意性框图。
具体实施方式
本申请可应用于图像编解码领域、视频编解码领域、硬件视频编解码领域、专用电路视频编解码领域、实时视频编解码领域等。例如,本申请的方案可结合至音视频编码标准(audio video coding standard,简称AVS),例如,H.264/音视频编码(audio video coding,简称AVC)标准,H.265/高效视频编码(high efficiency video coding,简称HEVC)标准以及H.266/多功能视频编码(versatile video coding,简称VVC)标准。或者,本申请的方案可结合至其它专属或行业标准而操作,所述标准包含ITU-TH.261、ISO/IECMPEG-1Visual、ITU-TH.262或ISO/IECMPEG-2Visual、ITU-TH.263、ISO/IECMPEG-4Visual,ITU-TH.264(还称为ISO/IECMPEG-4AVC),包含可分级视频编解码(SVC)及多视图视频编解码(MVC)扩展。应理解,本申请的技术不限于任何特定编解码标准或技术。
高自由度沉浸式编码系统根据任务线可大致分为以下几个环节:数据采集、数据的组织与表达、数据编码压缩、数据解码重建、数据合成渲染,最终将目标数据呈现给用户。
本申请实施例涉及的编码主要为视频编解码,为了便于理解,首先结合图1对本申请实施例涉及的视频编解码系统进行介绍。
图1为本申请实施例涉及的一种视频编解码系统的示意性框图。需要说明的是,图1只是一种示例,本申请实施例的视频编解码系统包括但不限于图1所示。如图1所示,该视频编解码系统100包含编码设备110和解码设备120。其中编码设备用于对视频数据进行编码(可以理解成压缩)产生码流,并将码流传输给解码设备。解码设备对编码设备编码产生的码流进行解码,得到解码后的视频数据。
本申请实施例的编码设备110可以理解为具有视频编码功能的设备,解码设备120可以理解为具有视频解码功能的设备,即本申请实施例对编码设备110和解码设备120包括更广泛的装置,例如包含智能手机、台式计算机、移动计算装置、笔记本(例如,膝上型)计算机、平板计算机、机顶盒、电视、相机、显示装置、数字媒体播放器、视频游戏控制台、车载计算机等。
在一些实施例中,编码设备110可以经由信道130将编码后的视频数据(如码流)传输给解码设备120。信道130可以包括能够将编码后的视频数据从编码设备110传输到解码设备120的一个或多个媒体和/或装置。
在一个实例中,信道130包括使编码设备110能够实时地将编码后的视频数据直接发射到解码设备120的一个或多个通信媒体。在此实例中,编码设备110可根据通信标准来调制编码后的视频数据,且将调制后的视频数据发射到解码设备120。其中通信媒体包含无线通信媒体,例如射频频谱,可选的,通信媒体还可以包含有线通信媒体,例如一根或多根物理传输线。
在另一实例中,信道130包括存储介质,该存储介质可以存储编码设备110编码后的视频数据。存储介质包含多种本地存取式数据存储介质,例如光盘、DVD、快闪存储器等。在该实例中,解码设备120可从该存储介质中获取编码后的视频数据。
在另一实例中,信道130可包含存储服务器,该存储服务器可以存储编码设备110编码后的视频数据。在此实例中,解码设备120可以从该存储服务器中下载存储的编码后的视频数据。可选的,该存储服务器可以存储编码后的视频数据且可以将该编码后的视频数据发射到解码设备120,例如web服务器(例如,用于网站)、文件传送协议(FTP)服务器等。
一些实施例中,编码设备110包含视频编码器112及输出接口113。其中,输出接口113可以包含调制器/解调器(调制解调器)和/或发射器。
在一些实施例中,编码设备110除了包括视频编码器112和输入接口113外,还可以包括视频源111。
视频源111可包含视频采集装置(例如,视频相机)、视频存档、视频输入接口、计算机图形系统中的至少一个,其中,视频输入接口用于从视频内容提供者处接收视频数据,计算机图形系统用于产生视频数据。
视频编码器112对来自视频源111的视频数据进行编码,产生码流。视频数据可包括一个或多个图像(picture)或图像序列(sequence of pictures)。码流以比特流的形式包含了图像或图像序列的编码信息。编码信息可以包含编码图像数据及相关联数据。相关联数据可包含序列参数集(sequence parameter set,简称SPS)、图像参数集(picture parameter set,简称PPS)及其它语法结构。SPS可含有应用于一个或多个序列的参数。PPS可含有应用于一个或多个图像的参数。语法结构是指码流中以指定次序排列的零个或多个语法元素的集合。
视频编码器112经由输出接口113将编码后的视频数据直接传输到解码设备120。编码后的视频数据还可存储于存储介质或存储服务器上,以供解码设备120后续读取。
在一些实施例中,解码设备120包含输入接口121和视频解码器122。
在一些实施例中,解码设备120除包括输入接口121和视频解码器122外,还可以包括显示装置123。
其中,输入接口121包含接收器及/或调制解调器。输入接口121可通过信道130接收编码后的视频数据。
视频解码器122用于对编码后的视频数据进行解码,得到解码后的视频数据,并将解码后的视频数据传输至显示装置123。
显示装置123显示解码后的视频数据。显示装置123可与解码设备120整合或在解码设备120外部。显示装置123可包括多种显示装置,例如液晶显示器(LCD)、等离子体显示器、有机发光二极管(OLED)显示器或其它类型的显示装置。
此外,图1仅为实例,本申请实施例的技术方案不限于图1,例如本申请的技术还可以应用于单侧的视频编码或单侧的视频解码。
下面对本申请实施例涉及的视频编码框架进行介绍。
图2A是本申请实施例涉及的视频编码器的示意性框图。应理解,该视频编码器200可用于对图像进行有损压缩(lossy compression),也可用于对图像进行无损压缩(lossless compression)。该无损压缩可以是视觉无损压缩(visually lossless compression),也可以是数学无损压缩(mathematically lossless compression)。
该视频编码器200可应用于亮度色度(YCbCr,YUV)格式的图像数据上。例如,YUV比例可以为4:2:0、4:2:2或者4:4:4,Y表示明亮度(Luma),Cb(U)表示蓝色色度,Cr(V)表示红色色度,U和V表示为色度(Chroma)用于描述色彩及饱和度。例如,在颜色格式上,4:2:0表示每4个像素有4个亮度分量,2个色度分量(YYYYCbCr),4:2:2表示每4个像素有4个亮度分量,4个色度分量(YYYYCbCrCbCr),4:4:4表示全像素显示(YYYYCbCrCbCrCbCrCbCr)。
例如,该视频编码器200读取视频数据,针对视频数据中的每帧图像,将一帧图像划分成若干个编码树单元(coding tree unit,CTU),在一些例子中,CTB可被称作“树型块”、“最大编码单元”(Largest Coding unit,简称LCU)或“编码树型块”(coding tree block,简称CTB)。每一个CTU可以与图像内的具有相等大小的像素块相关联。每一像素可对应一个亮度(luminance或luma)采样及两个色度(chrominance或chroma)采样。因此,每一个CTU可与一个亮度采样块及两个色度采样块相关联。一个CTU大小例如为128×128、64×64、32×32等。一个CTU又可以继续被划分成若干个编码单元(Coding Unit,CU)进行编码,CU可以为矩形块也可以为方形块。CU可以进一步划分为预测单元(prediction Unit,简称PU)和变换单元(transform unit,简称TU),进而使得编码、预测、变换分离,处理的时候更灵活。在一种示例中,CTU以四叉树方式划分为CU,CU以四叉树方式划分为TU、PU。
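作为对上述CTU与CU划分方式的直观说明,下面给出一个简化的Python示意:先把一帧图像按固定大小划分为CTU,再把CTU按四叉树递归划分为CU(仅为示意,其中split_decision为本示例假设的划分判决函数,并非标准定义):
    def partition_frame_into_ctus(frame_width, frame_height, ctu_size=64):
        # 按光栅扫描顺序枚举每个CTU的左上角位置与尺寸
        ctus = []
        for y in range(0, frame_height, ctu_size):
            for x in range(0, frame_width, ctu_size):
                w = min(ctu_size, frame_width - x)
                h = min(ctu_size, frame_height - y)
                ctus.append((x, y, w, h))
        return ctus

    def quadtree_split(x, y, size, min_cu_size, split_decision):
        # split_decision(x, y, size)返回True表示继续四叉划分(假设的判决函数)
        if size <= min_cu_size or not split_decision(x, y, size):
            return [(x, y, size, size)]  # 不再划分,作为一个CU输出
        half = size // 2
        cus = []
        for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
            cus += quadtree_split(x + dx, y + dy, half, min_cu_size, split_decision)
        return cus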
视频编码器及视频解码器可支持各种PU大小。假定特定CU的大小为2N×2N,视频编码器及视频解码器可支持2N×2N或N×N的PU大小以用于帧内预测,且支持2N×2N、2N×N、N×2N、N×N或类似大小的对称PU以用于帧间预测。视频编码器及视频解码器还可支持2N×nU、2N×nD、nL×2N及nR×2N的不对称PU以用于帧间预测。
在一些实施例中,如图2A所示,该视频编码器200可包括:预测单元210、残差单元220、变换/量化单元230、反变换/量化单元240、重建单元250、环路滤波单元260、解码图像缓存270和熵编码单元280。需要说明的是,视频编码器200可包含更多、更少或不同的功能组件。
可选的,在本申请中,当前块(current block)可以称为当前编码单元(CU)或当前预测单元(PU)等。预测块也可称为预测图像块或图像预测块,重建图像块也可称为重建块或图像重建图像块。
在一些实施例中,预测单元210包括帧间预测单元211和帧内估计单元212。由于视频的一个帧中的相邻像素之间存在很强的相关性,在视频编解码技术中使用帧内预测的方法消除相邻像素之间的空间冗余。由于视频中的相邻帧之间存在着很强的相似性,在视频编解码技术中使用帧间预测方法消除相邻帧之间的时间冗余,从而提高编码效率。
帧间预测单元211可用于帧间预测,帧间预测可以包括运动估计(motion estimation)和运动补偿(motion compensation),可以参考不同帧的图像信息,帧间预测使用运动信息从参考帧中找到参考块,根据参考块生成预测块,用于消除时间冗余;帧间预测所使用的帧可以为P帧和/或B帧,P帧指的是向前预测帧,B帧指的是双向预测帧。帧间预测使用运动信息从参考帧中找到参考块,根据参考块生成预测块。运动信息包括参考帧所在的参考帧列表,参考帧索引,以及运动矢量。运动矢量可以是整像素的或者是分像素的,如果运动矢量是分像素的,那么需要在参考帧中使用插值滤波做出所需的分像素的块,这里把根据运动矢量找到的参考帧中的整像素或者分像素的块叫参考块。有的技术会直接把参考块作为预测块,有的技术会在参考块的基础上再处理生成预测块。在参考块的基础上再处理生成预测块也可以理解为把参考块作为预测块然后再在预测块的基础上处理生成新的预测块。
帧内估计单元212只参考同一帧图像的信息,预测当前码图像块内的像素信息,用于消除空间冗余。帧内预测所使用的帧可以为I帧。
帧内预测有多种预测模式,以国际数字视频编码标准H系列为例,H.264/AVC标准有8种角度预测模式和1种非角度预测模式,H.265/HEVC扩展到33种角度预测模式和2种非角度预测模式。HEVC使用的帧内预测模式有平面模式(Planar)、DC和33种角度模式,共35种预测模式。VVC使用的帧内模式有Planar、DC和65种角度模式,共67种预测模式。
需要说明的是,随着角度模式的增加,帧内预测将会更加精确,也更加符合对高清以及超高清数字视频发展的需求。
残差单元220可基于CU的像素块及CU的PU的预测块来产生CU的残差块。举例来说,残差单元220可产生CU的残差块,使得残差块中的每一采样具有等于以下两者之间的差的值:CU的像素块中的采样,及CU的PU的预测块中的对应采样。
变换/量化单元230可量化变换系数。变换/量化单元230可基于与CU相关联的量化参数(QP)值来量化与CU的TU相关联的变换系数。视频编码器200可通过调整与CU相关联的QP值来调整应用于与CU相关联的变换系数的量 化程度。
反变换/量化单元240可分别将逆量化及逆变换应用于量化后的变换系数,以从量化后的变换系数重建残差块。
重建单元250可将重建后的残差块的采样加到预测单元210产生的一个或多个预测块的对应采样,以产生与TU相关联的重建图像块。通过此方式重建CU的每一个TU的采样块,视频编码器200可重建CU的像素块。
环路滤波单元260用于对反变换与反量化后的像素进行处理,弥补失真信息,为后续编码像素提供更好的参考,例如可执行消块滤波操作以减少与CU相关联的像素块的块效应。
在一些实施例中,环路滤波单元260包括去块滤波单元和样点自适应补偿/自适应环路滤波(SAO/ALF)单元,其中去块滤波单元用于去方块效应,SAO/ALF单元用于去除振铃效应。
解码图像缓存270可存储重建后的像素块。帧间预测单元211可使用含有重建后的像素块的参考图像来对其它图像的PU执行帧间预测。另外,帧内估计单元212可使用解码图像缓存270中的重建后的像素块来对在与CU相同的图像中的其它PU执行帧内预测。
熵编码单元280可接收来自变换/量化单元230的量化后的变换系数。熵编码单元280可对量化后的变换系数执行一个或多个熵编码操作以产生熵编码后的数据。
图2B是本申请实施例涉及的视频解码器的示意性框图。
如图2B所示,视频解码器300包含:熵解码单元310、预测单元320、反量化/变换单元330、重建单元340、环路滤波单元350及解码图像缓存360。需要说明的是,视频解码器300可包含更多、更少或不同的功能组件。
视频解码器300可接收码流。熵解码单元310可解析码流以从码流提取语法元素。作为解析码流的一部分,熵解码单元310可解析码流中的经熵编码后的语法元素。预测单元320、反量化/变换单元330、重建单元340及环路滤波单元350可根据从码流中提取的语法元素来解码视频数据,即产生解码后的视频数据。
在一些实施例中,预测单元320包括帧间预测单元321和帧内估计单元322。
帧内估计单元322可执行帧内预测以产生PU的预测块。帧内估计单元322可使用帧内预测模式以基于空间相邻PU的像素块来产生PU的预测块。帧内估计单元322还可根据从码流解析的一个或多个语法元素来确定PU的帧内预测模式。
帧间预测单元321可根据从码流解析的语法元素来构造第一参考图像列表(列表0)及第二参考图像列表(列表1)。此外,如果PU使用帧间预测编码,则熵解码单元310可解析PU的运动信息。帧间预测单元321可根据PU的运动信息来确定PU的一个或多个参考块。帧间预测单元321可根据PU的一个或多个参考块来产生PU的预测块。
反量化/变换单元330可逆量化(即,解量化)与TU相关联的变换系数。反量化/变换单元330可使用与TU的CU相关联的QP值来确定量化程度。
在逆量化变换系数之后,反量化/变换单元330可将一个或多个逆变换应用于逆量化变换系数,以便产生与TU相关联的残差块。
重建单元340使用与CU的TU相关联的残差块及CU的PU的预测块以重建CU的像素块。例如,重建单元340可将残差块的采样加到预测块的对应采样以重建CU的像素块,得到重建图像块。
环路滤波单元350可执行消块滤波操作以减少与CU相关联的像素块的块效应。
视频解码器300可将CU的重建图像存储于解码图像缓存360中。视频解码器300可将解码图像缓存360中的重建图像作为参考图像用于后续预测,或者,将重建图像传输给显示装置呈现。
视频编解码的基本流程如下:在编码端,将一帧图像划分成块,针对当前块,预测单元210使用帧内预测或帧间预测产生当前块的预测块。残差单元220可基于预测块与当前块的原始块计算残差块,即预测块和当前块的原始块的差值,该残差块也可称为残差信息。该残差块经由变换/量化单元230变换与量化等过程,可以去除人眼不敏感的信息,以消除视觉冗余。可选的,经过变换/量化单元230变换与量化之前的残差块可称为时域残差块,经过变换/量化单元230变换与量化之后的时域残差块可称为频率残差块或频域残差块。熵编码单元280接收到变换/量化单元230输出的量化后的变换系数,可对该量化后的变换系数进行熵编码,输出码流。例如,熵编码单元280可根据目标上下文模型以及二进制码流的概率信息消除字符冗余。
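为直观说明上述编码端与解码端对单个块的基本处理,下面给出一个极简的Python示意(仅为示意:用二维FFT和简单的除法量化代替真实的变换、量化与熵编码,参数与接口均为本示例假设):
    import numpy as np

    def encode_block(orig_block, pred_block, qstep=8):
        residual = orig_block.astype(np.int32) - pred_block.astype(np.int32)  # 残差块 = 原始块 - 预测块
        coeffs = np.fft.fft2(residual)        # 以FFT示意"变换"
        return np.round(coeffs / qstep)       # 以除法取整示意"量化"

    def decode_block(q_coeffs, pred_block, qstep=8):
        residual = np.real(np.fft.ifft2(q_coeffs * qstep))          # 反量化、反变换得到残差
        recon = np.round(pred_block.astype(np.float64) + residual)  # 重建块 = 预测块 + 残差
        return np.clip(recon, 0, 255).astype(np.uint8)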
在解码端,熵解码单元310可解析码流得到当前块的预测信息、量化系数矩阵等,预测单元320基于预测信息对当前块使用帧内预测或帧间预测产生当前块的预测块。反量化/变换单元330使用从码流得到的量化系数矩阵,对量化系数矩阵进行反量化、反变换得到残差块。重建单元340将预测块和残差块相加得到重建块。重建块组成重建图像,环路滤波单元350基于图像或基于块对重建图像进行环路滤波,得到解码图像。编码端同样需要和解码端类似的操作 获得解码图像。该解码图像也可以称为重建图像,重建图像可以为后续的帧作为帧间预测的参考帧。
需要说明的是,编码端确定的块划分信息,以及预测、变换、量化、熵编码、环路滤波等模式信息或者参数信息等在必要时携带在码流中。解码端通过解析码流及根据已有信息进行分析确定与编码端相同的块划分信息,预测、变换、量化、熵编码、环路滤波等模式信息或者参数信息,从而保证编码端获得的解码图像和解码端获得的解码图像相同。
上述是基于块的混合编码框架下的视频编解码器的基本流程,随着技术的发展,该框架或流程的一些模块或步骤可能会被优化,本申请适用于该基于块的混合编码框架下的视频编解码器的基本流程,但不限于该框架及流程。
在一些应用场景中,在同一个三维场景中同时出现多种异构内容,例如出现多视点视频和点云。对于这种情况,目前的编解码方式至少包括如下两种:
方式一,对于多视点视频采用MPEG(Moving Picture Experts Group,动态图像专家组)沉浸式视频(MPEG Immersive Video,简称MIV)技术进行编解码,对于点云则采用点云视频压缩(Video based Point Cloud Compression,简称VPCC)技术进行编解码。
下面对MIV技术和VPCC技术进行介绍。
MIV技术:为了降低传输像素率的同时尽可能保留场景信息,以便保证有足够的信息用于渲染目标视图,MPEG-I采用的方案如图3A所示,选择有限数量视点作为基础视点且尽可能表达场景的可视范围,基础视点作为完整图像传输,去除剩余非基础视点与基础视点之间的冗余像素,即仅保留非重复表达的有效信息,再将有效信息提取为子块图像与基础视点图像进行重组织,形成更大的矩形图像,该矩形图像称为拼接图像,图3A和图3B给出拼接图像的生成示意过程。将拼接图像送入编解码器压缩重建,并且子块图像拼接信息有关的辅助数据也一并送入编码器形成码流。
VPCC的编码方法是将点云投影成二维图像或视频,将三维信息转换成二维信息编码。图3C是VPCC的编码框图,码流大致分为四个部分,几何码流是几何深度图编码产生的码流,用来表示点云的几何信息;属性码流是纹理图编码产生的码流,用来表示点云的属性信息;占用码流是占用图编码产生的码流,用来指示深度图和纹理图中的有效区域;这三种类型的视频都使用视频编码器进行编解码,如图3D至图3F所示。辅助信息码流是子块图像的附属信息编码产生的码流,即V3C标准中的patchdataunit相关的部分,指示了每个子块图像的位置和大小等信息。
方式二,多视点视频和点云均使用可视体视频编码(Visual Volumetric Video-based Coding,简称V3C)中的帧打包(frame packing)技术进行编解码。
下面对frame packing技术进行介绍。
以多视点视频为例,示例性的,如图4所示,编码端包括如下步骤:
步骤1,对获取的多视点视频进行编码时,经过一些前处理,生成多视点视频子块(patch),接着,将多视点视频子块进行组织,生成多视点视频拼接图。
例如,如图4所示,将多视点视频输入TMIV中进行打包,输出多视点视频拼接图。TMIV为一种MIV的参考软件。本申请实施例的打包可以理解为拼接。
其中,多视点视频拼接图包括多视点视频纹理拼接图、多视点视频几何拼接图,即只包含多视点视频子块。
步骤2,将多视点视频拼接图输入帧打包器,输出多视点视频混合拼接图。
其中,多视点视频混合拼接图包括多视点视频纹理混合拼接图,多视点视频几何混合拼接图,多视点视频纹理与几何混合拼接图。
具体的,如图4所示,将多视点视频拼接图进行帧打包(framepacking),生成多视点视频混合拼接图,每个多视点视频拼接图占用多视点视频混合拼接图的一个区域(region)。相应地,在码流中要为每个区域传送一个标志pin_region_type_id_minus2,这个标志记录了当前区域属于多视点视频纹理拼接图还是多视点视频几何拼接图的信息,在解码端需要利用该信息。
步骤3,使用视频编码器对多视点视频混合拼接图进行编码,得到码流。
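为直观说明上述步骤1至步骤3,下面给出一个简化的帧打包Python示意:把多视点视频纹理拼接图与几何拼接图按区域排入同一张混合拼接图,并为每个区域记录类型标志(仅为示意,区域按自上而下排布,字典字段与常量取值为本示例假设,实际取值以标准中的定义为准):
    import numpy as np

    V3C_GVD, V3C_AVD = 3, 4  # 几何、属性(纹理)视频数据的类型取值,仅为示意

    def frame_pack(atlases):
        # atlases: [(单通道图像数组, 区域类型), ...],如[(纹理拼接图, V3C_AVD), (几何拼接图, V3C_GVD)]
        width = max(img.shape[1] for img, _ in atlases)
        height = sum(img.shape[0] for img, _ in atlases)
        packed = np.zeros((height, width), dtype=np.uint8)
        regions, y = [], 0
        for img, region_type in atlases:
            h, w = img.shape
            packed[y:y + h, 0:w] = img
            regions.append({
                "pin_region_type_id_minus2": region_type - 2,  # 解码端据此判断区域是纹理还是几何
                "pin_region_top_left_x": 0,
                "pin_region_top_left_y": y,
                "pin_region_width_minus1": w - 1,
                "pin_region_height_minus1": h - 1,
            })
            y += h
        return packed, regions  # packed送入视频编码器,regions写入拼接信息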
示例性的,如图5所示,解码端包括如下步骤:
步骤1,在多视点视频解码时,将获取的码流输入视频解码器中进行解码,得到重建多视点视频混合拼接图。
步骤2,将重建多视点视频混合拼接图输入帧解打包器中,输出重建多视点视频拼接图。
具体的,首先,从码流中获取标志pin_region_type_id_minus2,若确定该pin_region_type_id_minus2是V3C_AVD,则表示当前区域是多视点视频纹理拼接图,则将该当前区域拆分并输出为重建多视点视频纹理拼接图。
若确定该pin_region_type_id_minus2是V3C_GVD,则表示当前区域是多视点视频几何拼接图,将该当前区域拆 分并输出为重建多视点视频几何拼接图。
步骤3,对重建多视点视频拼接图进行解码,得到重建多视点视频。
具体是,对多视点视频纹理拼接图和多视点视频几何拼接图进行解码,得到重建多视点视频。
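与上述解码端步骤2、步骤3相对应,区域拆分(解打包)的简化Python示意如下(仅为示意,沿用上文示意中的字段名与常量,拼接图假设为单通道数组):
    def frame_unpack(decoded_packed, regions, v3c_gvd=3, v3c_avd=4):
        texture_atlases, geometry_atlases = [], []
        for r in regions:
            x, y = r["pin_region_top_left_x"], r["pin_region_top_left_y"]
            w = r["pin_region_width_minus1"] + 1
            h = r["pin_region_height_minus1"] + 1
            patch = decoded_packed[y:y + h, x:x + w]
            region_type = r["pin_region_type_id_minus2"] + 2
            if region_type == v3c_avd:
                texture_atlases.append(patch)    # 输出为重建多视点视频纹理拼接图
            elif region_type == v3c_gvd:
                geometry_atlases.append(patch)   # 输出为重建多视点视频几何拼接图
        return texture_atlases, geometry_atlases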
上面以多视点视频为例对framepacking技术进行解析介绍,对点云进行framepacking编解码的方式与上述多视点视频基本相同,参照即可,例如使用TMC(一种VPCC的参考软件)对点云进行打包,得到点云拼接图,将点云拼接图输入帧打包器进行帧打包,得到点云混合拼接图,对点云混合拼接图进行编码,得到点云码流,在此不再赘述。
下面对标准中与framepacking相关的语法进行介绍。
V3C单元头语法如表1所示:
表1
Figure PCTCN2022075260-appb-000001
V3C单元头语义,如表2所示:
表2:V3C单元类型
Figure PCTCN2022075260-appb-000002
Figure PCTCN2022075260-appb-000003
拼接信息语法如表3所示:
表3
Figure PCTCN2022075260-appb-000004
Figure PCTCN2022075260-appb-000005
拼接信息语义:
打包后的视频帧可以划分为一个或多个矩形区域。一个区域应精确映射到一个地图集图块。打包视频帧的矩形区域不允许重叠。
pin_codec_id[j]表示用于对ID为j的图集压缩打包视频数据的编解码器的标识符。pin_codec_id应在0到255的范围内,包括0到255。该编解码器可以通过组件编解码器映射SEI消息或通过本文档之外的方式来识别。
pin_occupancy_present_flag[j]等于0表示ID为j的图集的打包视频帧不包含具有占用数据的区域。pin_occupancy_present_flag[j]等于1表示ID为j的图集的打包视频帧确实包含具有占用数据的区域。当pin_occupancy_present_flag[j]不存在时,推断为等于0。
比特流一致性的要求是,如果pin_occupancy_present_flag[j]对于atlas ID j的atlas等于1,vps_occupancy_video_present_flag[j]对于atlas ID j相同的atlas应等于0。
pin_geometry_present_flag[j]等于0表示ID为j的图集的打包视频帧不包含具有几何数据的区域。pin_geometry_present_flag[j]等于1表示ID为j的图集的打包视频帧确实包含具有几何数据的区域。当pin_geometry_present_flag[j]不存在时,推断为等于0。
比特流一致性的要求是,如果pin_geometry_present_flag[j]对于ID为j的图集等于1,则vps_geometry_video_present_flag[j]对于ID为j的图集应等于0。
pin_attributes_present_flag[j]等于0表示ID为j的图集的打包视频帧不包含具有属性数据的区域。pin_attributes_present_flag[j]等于1表示ID为j的图集的打包视频帧确实包含具有属性数据的区域。当pin_attributes_present_flag[j]不存在时,推断为等于0。
比特流一致性的要求是,如果pin_attribute_present_flag[j]对于ID为j的图集等于1,vps_attribute_video_present_flag[j]对于ID为j的图集应等于0。
pin_occupancy_2d_bit_depth_minus1[j]加1表示标称2D位深度,包含占用数据的ID为j的图集的解码区域应转换到该标称2D位深度。pin_occupancy_2d_bit_depth_minus1[j]应在0到31的范围内,包括0和31。
pin_occupancy_MSB_align_flag[j]指示ID为j的图集的包含占用样本的解码区域如何转换为标称占用比特深度的样本,如附件B中所指定。
pin_lossy_occupancy_compression_threshold[j]指示用于从包含ID为j的图集的占用数据的解码区域导出二进制占用的阈值。pin_lossy_occupancy_compression_threshold[j]应在0到255的范围内,包括0和255。
pin_geometry_2d_bit_depth_minus1[j]加1表示标称2D位深度,ID为j的图集的包含几何数据的解码区域应转换到的标称2D位深度。pin_geometry_2d_bit_depth_minus1[j]应在0到31的范围内,包括0和31。
pin_geometry_MSB_align_flag[j]指示如何将ID为j的图集的包含几何样本的解码区域转换为标称占用位深度的样本,如附件B中所述。
pin_geometry_3d_coordinates_bit_depth_minus1[j]加1表示ID为j的图集的重建立体内容的几何坐标的位深度。pin_geometry_3d_coordinates_bit_depth_minus1[j]应在0到31的范围内,包括0和31。
pin_attribute_count[j]表示ID为j的图集的打包视频帧中存在的具有唯一属性类型的属性的数量。
pin_attribute_type_id[j][i]表示为ID为j的图集的打包视频帧的属性区域的第i个属性类型。表3描述了支持的属性类型列表。
pin_attribute_2d_bit_depth_minus1[j][k]加1表示对于ID为j的图集,包含属性索引为k的属性的区域应转换到的标称2D位深度。pin_attribute_2d_bit_depth_minus1[j][k]应在0到31的范围内,包括0和31。
pin_attribute_MSB_align_flag[j][k]指示如何将包含属性类型为k的属性的解码区域(对于ID为j的图集)转换为标称属性位深度的样本,如附件B中所述。
pin_attribute_map_absolute_coding_persistence_flag[j][k]等于1表示解码区域包含索引为k的属性的属性图,对应于ID为j的图集,在没有任何形式的地图预测的情况下进行编码。pin_attribute_map_absolute_coding_persistence_flag[j][i]等于0表示解码区域包含索引为k的属性的属性图,对应于ID为j的图集,应使用与用于ID为j的图集的几何分量相同的地图预测方法。如果pin_attribute_map_absolute_coding_persistence_flag[j][i]不存在,则应推断其值等于1。
3D数组AttributeMapAbsoluteCodingEnabledFlag指示是否要对属性的特定映射进行编码,有或没有预测,获得如下:
Figure PCTCN2022075260-appb-000006
Figure PCTCN2022075260-appb-000007
pin_attribute_dimension_minus1[j][k]加1表示ID为j的图集的包含索引为k的属性的区域的总维数(即通道数)。pin_attribute_dimension_minus1[j][i]应在0到63的范围内,包括0和63。
pin_attribute_dimension_partitions_minus1[j][k]加1表示对于ID为j的图集,包含索引为k的属性的区域的属性通道应分组的分区组数。pin_attribute_dimension_partitions_minus1[j][k]应在0到63的范围内,包括0到63。
pin_attribute_partition_channels_minus1[j][k][l]加1表示对于ID为j的图集的包含索引为k的属性的区域,分配给索引为l的维度分区组的通道数。对于所有维度分区组,ai_attribute_partition_channels_minus1[j][k][l]应在0到ai_attribute_dimension_minus1[j][k]的范围内。
pin_regions_count_minus1[j]加1表示ID为j的图集打包在一个视频帧中的区域数。pin_regions_count_minus1应在0到7的范围内,包括0到7。当不存在时,pin_regions_count_minus1的值被推断为等于0。
pin_region_tile_id[j][i]表示ID为j的图集的索引为i的区域的图块ID。
pin_region_type_id_minus2[j][i]加2表示对于ID为j的图集,索引为i的区域的ID。pin_region_type_id_minus2[j][i]的值应在0到2的范围内,包括0到2。
pin_region_top_left_x[j][i]以打包视频分量帧中的亮度样本为单位,为ID为j的图集的索引为i的区域指定左上样本的水平位置。当不存在时,pin_region_top_left_x[j][i]的值被推断为等于0。
pin_region_top_left_y[j][i]以打包视频分量帧中的亮度样本为单位,为ID为j的图集的索引为i的区域指定左上样本的垂直位置。当不存在时,pin_region_top_left_y[j][i]的值被推断为等于0。
pin_region_width_minus1[j][i]plus 1为ID为j的图集的索引为i的区域指定宽度,以亮度样本为单位。
pin_region_height_minus1[j][i]加1指定ID为j的图集的索引为i的区域指定高度,以亮度样本为单位。
pin_region_unpack_top_left_x[j][i]以解压缩视频分量帧中的亮度样本为单位,为ID j的图集的索引为i的区域指定左上样本的水平位置。当不存在时,pin_region_unpack_top_left_x[j][i]的值被推断为等于0。
pin_region_unpack_top_left_y[j][i]以解压缩视频分量帧中的亮度样本为单位,为ID j的图集的索引为i的区域指定左上样本的垂直位置。当不存在时,pin_region_unpack_top_left_y[j][i]的值被推断为等于0。
pin_region_rotation_flag[j][i]等于0表示不对ID为j的图集的索引为i的区域执行旋转。pin_region_rotation_flag[j][i]等于1表示ID为j的图集的索引为i的区域旋转了90度。
pin_region_map_index[j][i]指定ID为j的的图集索引为i的区域的地图索引。
pin_region_auxiliary_data_flag[j][i]等于1表示ID为j的图集索引为i的区域仅包含RAW和/或EOM编码点。pin_region_auxiliary_data_flag等于0表示ID为j的图集索引为i的区域可能包含RAW和/或EOM编码点。
pin_region_attr_type_id[j][i]表示ID为j的图集索引为i的区域的属性类型。表3描述了支持的属性列表。
pin_region_attr_partition_index[j][i]表示ID为j的图集索引为i的区域的属性分区索引。当不存在时,pin_region_attr_partition_index[j][i]的值被推断为等于0。
打包视频解码过程(Packed video decoding process):
ID为DecAtlasID的图集的打包视频分量的解码过程执行如下。
对于打包的视频分量,首先使用附件A中定义的配置文件或pin_codec_id[DecAtlasID]的值和子条款F.2.11中指定的分量编解码器映射SEI消息(如果存在)来确定编解码器。然后,根据相应的编码规范,使用存在于V3C比特流中的打包视频子比特流作为输入来调用打包视频解码过程。
这个过程的输出是:
– NumDecPckFrames,表示解码后打包视频帧的数量,
– 一个4D数组DecPckFrames,解码的打包视频帧,其中维度分别对应于解码的打包视频帧索引、组件索引、 行索引和列索引,以及
– 以下一维数组:
– DecPckBitDepth,表示打包的视频位深度,
– DecPckHeight,表示打包后的视频高度,
– DecPckWidth,表示打包后的视频宽度,
– DecPckChromaFormat,表示属性色度格式,
– DecPckChromaSamplingPosition,如果存在,指示ISO/IEC 23091-2中规定的视频色度采样位置,
– DecPckFullRange,如果存在,指示ISO/IEC 23091-2中规定的视频全范围代码点,
– DecPckColourPrimaries,如果存在,指示ISO/IEC 23091-2中规定的源原色的色度坐标,
– DecPckTransferCharacteristics,如果存在,指示ISO/IEC 23091-2中规定的传输特性,
– DecPckMatrixCoeffs,如果存在,指示ISO/IEC 23091-2中规定的矩阵系数,
– DecPckOutOrdIdx,表示打包的视频输出顺序索引,以及
– DecPckCompTime,表示打包的视频合成时间。
其中维度对应于解码的打包视频帧索引。
如果数组DecPckFullRange缺失,则其所有元素都应设置为1。
如果数组DecPckTransferCharacteristics的任何元素缺失或设置为值2,即未指定,则这些元素应设置为8,即线性。
如果数组DecPckChromaSamplingPosition缺失,则其所有元素都应设置为0。
如果数组DecPckColourPrimaries缺失,则其所有元素都应设置为2。
如果数组DecPckMatrixCoeffs缺失,则其所有元素都应设置为2。
数组DecPckChromaSamplingPosition、DecPckColourPrimaries、DecPckMatrixCoeffs、DecPckFullRange和DecPckTransferCharacteristics的值不得用于pin_region_type_id_minus2等于V3C_OVD、V3C_GVD和V3C_AVD且pin_region_attr_type_id等于ATTR_MATERIAL_ID的解码打包帧区域的任何进一步处理,或ATTR_NORMAL。
这些值应根据ISO/IEC 23091-2中相应的编码点进行解释。
注—任何现有的视频编码规范,如ISO/IEC 14496-10或ISO/IEC 23008-2或任何未来定义的视频编码规范,如果包含在pin_packed_codec_id中,都可以使用。
B.4解码拼接视频的拆解过程(Unpacking process of a decoded packed video)
B.4.1 一般的
当色度格式DecPckChromaFormat为4:4:4时,适用B.4节中的过程。其他chroma formats的过程超出了本文档的范围。让变量NumRegions、NumAttributes、NumPartitions、NumMaps设置如下:
NumRegions=pin_regions_count_minus1[ConvAtlasID]
NumAttributes=pin_attribute_count[ConvAtlasID]
NumPartitions=64
NumMaps=vps_map_count_minus1[ConvAtlasID]
注—为了简化解包过程的描述,变量NumPartitions设置为最大允许值。最大允许值可能会受到应用程序工具集配置文件的进一步限制。解包过程的优化实现可以根据packing_information()语法结构中的语法元素确定该变量的适当值。让大小为NumRegions的一维数组RegionTypeId、RegionPackedOffsetX、RegionPackedOffsetY、RegionWidth、RegionHeight、RegionUnpackedOffsetX、RegionUnpackedOffsetY、RegionMapIdx、RegionRotationFlag、RegionAuxilaryDataFlag、RegionAttrTypeID、RegionAttrPatritionIdx和RegionAttrPatritionChannels设置如下:
for(i=0;i<NumRegions;i++){
RegionTypeId[i]=pin_region_type_id_minus2[ConvAtlasID][i]+2
RegionPackedOffsetX[i]=pin_region_top_left_x[ConvAtlasID][i]
RegionPackedOffsetY[i]=pin_region_top_left_y[ConvAtlasID][i]
RegionWidth[i]=pin_region_width_minus1[ConvAtlasID][i]+1
RegionHeight[i]=pin_region_height_minus1[ConvAtlasID][i]+1
RegionUnpackedOffsetX[i]=pin_region_unpacked_top_left_x[ConvAtlasID][i]
RegionUnpackedOffsetY[i]=pin_region_unpacked_top_left_y[ConvAtlasID][i]
RegionMapIdx[i]=pin_region_map_index[ConvAtlasID][i]
RegionRotationFlag[i]=pin_region_rotation_flag[ConvAtlasID][i]
RegionAuxilaryDataFlag[i]=pin_region_auxiliary_data_flag[j][i]
RegionAttrTypeID[i]=pin_region_attr_type_id[ConvAtlasID][i]
— 解包过程定义如下:调用B.4.2节来计算解包视频分量的分辨率。此过程的输出是变量unpckOccWidth、unpckOccHeight、unpckGeoAuxWidth和unpckGeoAuxHeight、一维数组unpckGeoWidth和unpckGeoHeight、二维数组unpckAttrAuxWidth和unpckAttrAuxHeight,以及3D数组unpckAttrWidth和unpckAttrHeight。
– 调用B.4.3子条款来初始化解包的视频分量帧。此过程的输入是变量unpckOccWidth、unpckOccHeight、unpckGeoAuxWidth和unpckGeoAuxHeight、一维数组unpckGeoWidth和unpckGeoHeight、二维数组unpckAttrAuxWidth和unpckAttrAuxHeight,以及3D数组unpckAttrWidth和unpckAttrHeight。该过程的输出是4D数组unpckOccFrames、5D数组unpckGeoFrames、4D数组unpckGeoAuxFrames、7D数组unpckAttrFrames、6D数组unpckAttrAuxFrames。–将数据复制到解包的视频分量帧,调用B.4.4子节。该过程的输入是4D数组unpckOccFrames、5D数组unpckGeoFrames、4D数组unpckGeoAuxFrames、7D数组unpckAttrFrames、6D数组unpckAttrAuxFrames。该过程的输出被更新为4D数组unpckOccFrames、5D数组unpckGeoFrames、4D数组unpckGeoAuxFrames、7D数组unpckAttrFrames、6D数组unpckAttrAuxFrames。作为子条款B.4.4输出的解包视频分量帧可以作为输入传递给子条款B.2中定义的标称格式转换过程。
B.4.2 计算解包视频分量分辨率
此过程计算解包视频组件的分辨率。这个过程的输出是:
– 变量unpckOccWidth、unpckOccHeight、unpckGeoAuxWidth和unpckGeoAuxHeight。
– 一维数组unpckGeoWidth和unpckGeoHeight,大小为NumMaps。
– 2D数组unpckAttrAuxWidth和unpckAttrAuxHeight,大小为NumAttributes×NumPartitions。
– 3D数组unpckAttrWidth和unpckAttrHeight,大小为NumAttributes×NumPartitions×NumMaps。
让变量unpckOccWidth、unpckOccHeight、unpckGeoAuxWidth和unpckGeoAuxHeight初始化如下:
unpckOccWidth=0
unpckOccHeight=0
unpckGeoAuxWidth=0
unpckGeoAuxHeight=0
让大小为NumMaps的一维数组unpckGeoWidth、unpckGeoHeight初始化如下:
Figure PCTCN2022075260-appb-000008
让大小为NumAttributes×NumPartitions的二维数组unpckAttrAuxWidth和unpckAttrAuxHeight初始化如下:
Figure PCTCN2022075260-appb-000009
Let 3D arrays unpckAttrWidth,unpckAttrHeight,of size NumAttributes×NumPartitions×NumMaps,be initialized as follows:
Figure PCTCN2022075260-appb-000010
让包含解压缩视频分量维度的变量和数组计算如下:
Figure PCTCN2022075260-appb-000011
Figure PCTCN2022075260-appb-000012
B.4.3 初始化解包视频分量帧
此过程初始化解压缩的视频分量帧。
这个过程的输入是:
– 变量unpckOccWidth、unpckOccHeight、unpckGeoAuxWidth和unpckGeoAuxHeight。
– 一维数组unpckGeoWidth和unpckGeoHeight,大小为NumMaps。
– 2D数组unpckAttrAuxWidth和unpckAttrAuxHeight,大小为NumAttributes×NumPartitions。
– 3D数组unpckAttrWidth和unpckAttrHeight,大小为NumAttributes×NumPartitions×NumMaps。这个过程的输出是:
– 4D数组unpckOccFrames,表示解压缩的占用帧,其中维度分别对应占用视频帧索引、组件索引、行索引和列索引。
– 5D数组unpckGeoFrames,表示解压缩的几何视频帧,其中维度分别对应于地图索引、解码的几何视频帧索引、组件索引、行索引和列索引。
– 4D数组unpckGeoAuxFrames,表示解压后的辅助几何视频帧,其中维度分别对应解码后的辅助几何视频帧索引、分量索引、行索引和列索引。
– 7D数组unpckAttrFrames,表示解包后的属性视频帧,其中维度对应属性索引、属性分区索引、地图索引、解码后的属性视频帧索引、分量索引、行索引、列索引,分别。
– 6D数组unpckAttrAuxFrames,表示解包后的辅助属性视频帧,其中维度分别对应属性索引、属性分区索引、解码属性视频帧索引、分量索引、行索引和列索引让unpckOccFrames、unpckGeoAuxFrames、unpckAttrFrames和 unpckAttrAuxFrames初始化如下:
Figure PCTCN2022075260-appb-000013
B.4.4 将数据从打包区域复制到解包视频分量帧过程
此过程根据区域类型将区域数据从打包帧复制到未打包帧结构。这个过程的输入是:
– 4D数组unpckOccFrames,表示解压缩的占用帧,其中维度分别对应占用视频帧索引、组件索引、行索引和列索引。
– 5D数组unpckGeoFrames,表示解压缩的几何视频帧,其中维度分别对应于地图索引、解码的几何视频帧索引、组件索引、行索引和列索引。
– 4D数组unpckGeoAuxFrames,表示解压后的辅助几何视频帧,其中维度分别对应解码后的辅助几何视频帧索引、分量索引、行索引和列索引。
– 7D数组unpckAttrFrames,表示解包后的属性视频帧,其中维度分别对应属性索引、属性分区索引、地图索引、解码后的属性视频帧索引、分量索引、行索引和列索引。
– 6D数组unpckAttrAuxFrames,表示解包后的辅助属性视频帧,其中维度分别对应属性索引、属性分区索引、解码属性视频帧索引、分量索引、行索引和列索引。
这个过程的输出是:
– 更新的4D数组unpckOccFrames。
– 更新的5D数组unpckGeoFrames。
– 更新的4D数组unpckGeoAuxFrames。
– 更新的7D数组unpckAttrFrames。
– 更新的6D数组unpckAttrAuxFrames。
以下适用:
Figure PCTCN2022075260-appb-000014
Figure PCTCN2022075260-appb-000015
目前,如果在同一个三维场景中同时出现多种不同表达格式的视觉媒体内容时,则对多种不同表达格式的视觉媒体内容分别进行编解码。例如,对于同一个三维场景中同时出现点云和多视点视频的情况,目前的打包技术是,对点云进行压缩,形成点云压缩码流(即一种V3C码流),对多视点视频信息压缩,得到多视点视频压缩码流(即另一种V3C码流),然后由系统层对压缩码流进行复接,得到融合的三维场景复接码流。解码时,对点云压缩码流和多视点视频压缩码流分别进行解码。由此可知,现有技术在对多种不同表达格式的视觉媒体内容进行编解码时,使用的编解码器多,编解码代价高。
为了解决上述技术问题,本申请实施例通过将多种不同表达格式的视觉媒体内容对应的拼接图拼接在一张异构混合拼接图中,例如将多视点视频拼接图和点云拼接图拼接在一张异构混合拼接图中进行编解码,这样尽量减少了所需要调用的HEVC,VVC,AVC,AVS等二维视频编码器的个数,减少了编解码代价,提高易用性。
下面结合图6,以编码端为例,对本申请实施例提供的视频编码方法进行介绍。
图6为本申请一实施例提供的编码方法流程示意图,如图6所示,本申请实施例的方法包括:
S601、对多个视觉媒体内容分别进行处理,得到N个同构拼接图。
其中,多个视觉媒体内容中至少两个视觉媒体内容对应的表达格式不同,N为大于1的正整数。
在三维应用场景中,例如虚拟现实(Virtual Reality,VR)、增强现实(Augmented Reality,AR)、混合现实(Mix Reality,MR)等应用场景中,在同一个场景中可能出现表达格式不同的视觉媒体对象,例如在同一个三维场景中存在,以视频表达场景背景与部分人物和物件、以三维点云或三维网格表达了另一部分人物。
本申请实施例的多个视觉媒体内容包括多视点视频、点云、网格等媒体内容。
在一些实施例中,上述多个视觉媒体内容为同一个三维空间中同时呈现的媒体内容。
在一些实施例中,上述多个视觉媒体内容为同一个三维空间中不同时间呈现媒体内容。
在一些实施例中,上述多个视觉媒体内容还可以是不同三维空间的媒体内容。
即本申请实施例中,对上述多个视觉媒体内容不做具体限制。
也就是说,本申请实施例的多个视觉媒体内容中至少两个视觉媒体内容对应的表达格式不同。
在一些实施例中,本申请实施例的多个视觉媒体内容的表达格式均不相同,例如,多个视觉媒体内容中包括点云和多视点视频。
在一些实施例中,本申请实施例的多个视觉媒体内容中部分视觉媒体内容的表达格式相同,部分视觉媒体内容的表达格式不同,例如多个视觉媒体内容包括两个点云和一个多视点视频。
为了提高压缩效率,本申请实施例在获得多个视觉媒体内容后,对这多个视觉媒体内容进行处理,例如打包(也称为拼接)处理,得到多个视觉媒体内容中每个视觉媒体内容对应的拼接图。
例如,多个视觉媒体内容中包括点云和多视点视频,对点云进行处理,得到点云拼接图,对多视点视频进行处理,得到多视点视频拼接图。
本申请实施例对多个视觉媒体内容分别进行处理,得到N个同构拼接图的方式不做限制。
本申请实施例所述的同构拼接图是指该拼接图中每个子块对应的表达格式均相同,例如一张同构拼接图中的各子块均为多视点视频子块,或者均为点云子块等同一表达格式的子块。
在一种可能的实现方式中,若多个拼接图包括多视点视频拼接图和点云拼接图,上述S601包括如下步骤:
S601-A、对获取的多视点视频进行投影和去冗余处理后,将不重复像素点连通成视频子块,且将视频子块拼接成多视点视频拼接图。
具体的,对于多视点视频,以MPEG-I为例,选择有限数量视点作为基础视点且尽可能表达场景的可视范围,基础视点作为完整图像传输,去除剩余非基础视点与基础视点之间的冗余像素,即仅保留非重复表达的有效信息,再将有效信息提取为子块图像与基础视点图像进行重组织,形成更大的矩形图像,该矩形图像称为多视点视频拼接图。
S601-B、对获取的点云进行平行投影,将投影面中的连通点组成点云子块,且将点云子块拼接成点云拼接图。
具体的,对于点云,将三维点云进行平行投影,得到二维点云,在投影面中,将二维点云中连通点组成点云子块,在将这些点云子块进行拼接,得到点云拼接图。
根据上述方法,得到N个同构拼接图后,执行如下S602和S603。
S602、将N个同构拼接图进行拼接,生成异构混合拼接图。
S603、对异构混合拼接图进行编码,得到码流。
如图7所示,本申请实施例中,为了减少编码器的个数,降低编码代价,在编码时,首先将多个视觉媒体内容分别进行处理(即打包),得到N个同构拼接图。接着,将表达格式不完全相同的N个同构拼接图拼接成一张异构混合拼接图,对该异构混合拼接图进行编码,得到码流。也就是说,本申请实施例通过将不同表达格式的同构拼接图拼接在一张异构混合拼接图中进行编码,在编码时,可以只调用一次视频编码器进行编码,进而减少了所需要调用的HEVC,VVC,AVC,AVS等二维视频编码器的个数,减少了编码代价,提高易用性。
为了与帧打包区分,本申请实施例中将N个同构拼接图拼接为异构混合拼接图的过程称为区域打包。
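下面给出一个与图7流程对应的简化Python示意,概括S601至S603:先得到N个同构拼接图,再经区域打包得到一张异构混合拼接图,最后只调用一次视频编码器(仅为示意,其中pack_fn、region_pack_fn、video_encoder、info_encoder均为本示例假设的可调用对象):
    def encode_heterogeneous(contents, pack_fn, region_pack_fn, video_encoder, info_encoder):
        # contents: 多个视觉媒体内容,其中至少两个的表达格式不同
        atlases = [pack_fn(c) for c in contents]             # S601: 得到N个同构拼接图
        mixed_atlas, mixing_info = region_pack_fn(atlases)   # S602: 区域打包为一张异构混合拼接图
        video_substream = video_encoder(mixed_atlas)         # S603-A: 只调用一次视频编码器
        info_substream = info_encoder(mixing_info)           # S603-B: 编码混合拼接信息
        return video_substream, info_substream               # S603-C: 两个子码流写入同一码流(封装见后文示意)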
在一些实施例中,上述S603包括,采用视频编码器对异构混合拼接图进行编码,得到视频码流。
本申请实施例,将N个同构拼接图拼接拼接为异构混合拼接图时,生成混合拼接信息。这些混合编码信息在解码时需要,因此,需要将这些混编码信息进行编码。
在一些实施例中,本申请实施例中,还包括对混合拼接信息进行编码的步骤,即上述S603包括如下步骤:
S603-A、调用视频编码器,对异构混合拼接图进行视频编码,得到视频压缩子码流;
S603-B、对异构混合拼接图的混合拼接信息进行编码,得到混合拼接信息子码流;
S603-C、将视频压缩子码流和混合拼接信息子码流写入码流。
本申请实施例中,对异构混合拼接图进行视频编码,得到视频压缩子码流所使用的视频编码器,可以为上述图2A所示的视频编码器。也就是说,本申请实施例将异构混合拼接图作为一帧图像,首先进行块划分,接着使用帧内或帧间预测得到编码块的预测值,编码块的预测值和原始值进行相减,得到残差值,对残差值进行变换和量化处理后,得到视频压缩子码流。
同时,对异构混合拼接图的混合拼接信息进行编码,得到混合拼接信息子码流。本申请实施例对混合拼接信息进行编码的方式不做限制,例如使用等长编码或变长编码等常规数据压缩编码方式进行压缩。
最后,视频压缩子码流和混合拼接信息子码流写在同一个码流中,得到最终的码流。
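将两个子码流写入同一码流的封装方式有多种,下面给出一个采用长度前缀的极简Python示意(仅为示意,并非标准规定的码流结构):
    import struct

    def write_bitstream(video_substream: bytes, info_substream: bytes) -> bytes:
        out = b""
        for sub in (video_substream, info_substream):
            out += struct.pack(">I", len(sub)) + sub   # 每个子码流前写入4字节长度
        return out

    def split_bitstream(bitstream: bytes):
        subs, pos = [], 0
        while pos + 4 <= len(bitstream):
            (length,) = struct.unpack(">I", bitstream[pos:pos + 4])
            subs.append(bitstream[pos + 4:pos + 4 + length])
            pos += 4 + length
        return subs  # [视频压缩子码流, 混合拼接信息子码流]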
也就是说,本申请实施例不仅实现在同一压缩码流中支持视频、点云、网格等异构信源格式,而且实现多视点视频拼接图和点云(或网络)拼接图等不同表达格式的拼接图同时存在于一张异构混合拼接图中,这样尽量减少了所需要调用的视频编码器的个数,减少了实现代价,提高易用性。
本申请实施例的异构混合拼接图包括多属性异构混合拼接图和单属性异构混合拼接图。
其中,多属性异构混合拼接图是指所包括的同构拼接图中至少两个同构拼接图的属性信息不同的异构混合拼接图,例如一张多属性异构混合拼接图中即包括属性信息的同构拼接图,又包括几何信息的同构拼接图。例如,一张多属性异构混合拼接图包括多视点视频纹理拼接图和点云几何拼接图,或者一张多属性异构混合拼接图中包括多视点视频纹理拼接图、点云几何拼接图、多视点视频几何拼接图,或者,一张多属性异构混合拼接图中包括多视点视频几何拼接图、点云几何纹理拼接图、点云纹理拼接图,等等。
其中,单属性异构混合拼接图是指包括的所有同构拼接图的属性信息均相同的异构混合拼接图。例如,一张单属性异构混合拼接图只包括属性信息的同构拼接图,或者一张单属性异构混合拼接图只包括几何信息的同构拼接图。例如,一张单属性异构混合拼接图只包括多视点视频纹理拼接图和点云纹理拼接图,或者一张单属性异构混合拼接图只包括多视点视频几何拼接图和点云几何拼接图。
本申请实施例对N个同构拼接图的表达格式不做限制。
在一些实施例中,N个同构拼接图包括多视点视频拼接图、点云拼接图和网格拼接图中的至少两个。
本申请实施例的点云、多视点视频和网格包括多个属性,例如包括几何属性和纹理属性,本申请实施例,将点云、多视点视频和网格中至少两个的任意一个属性或任意两个属性下的拼接图拼接在一张图中,得到异构混合拼接图。
即上述S602中,将N个同构拼接图进行拼接,生成异构混合拼接图,包括:
S602-A、对至少第一表达格式的单一属性拼接图和第二表达格式的单一属性拼接图进行拼接,得到异构混合拼接图。
其中,第一表达格式和第二表达格式均为多视点视频、点云和网络中的任意一个,且第一表达格式和所述第二表达格式不同。
多视点视频的单一属性拼接图包括多视点视频纹理拼接图和多视点视频几何拼接图等中的至少一个。
点云的单一属性拼接图包括点云纹理拼接图、点云几何拼接图和点云占用情况拼接图等中的至少一个。
网格的点云属性拼接图包括网格纹理拼接图、网格几何拼接图和网格占用情况拼接图等中的至少一个。
例如,将多视点视频几何拼接图、点云几何拼接图、网格几何拼接图中的至少两个拼接在一张图中,得到一张异构混合拼接图。该异构混合拼接图称为单属性异构混合拼接图。
再例如,将多视点视频纹理拼接图、点云纹理拼接图、网格纹理拼接图中的至少两个拼接在一张图中,得到一张异构混合拼接图。该异构混合拼接图称为单属性异构混合拼接图。
再例如,将多视点视频纹理拼接图,与点云几何拼接图和网格几何拼接图中的至少一个拼接在一张图中,得到一张异构混合拼接图。该异构混合拼接图称为多属性异构混合拼接图。
再例如,将多视点视频几何拼接图,与点云纹理拼接图、网格纹理拼接图中的至少一个拼接在一张图中,得到一张异构混合拼接图。该异构混合拼接图称为多属性异构混合拼接图。
再例如,将点云纹理拼接图,与多视点视频几何拼接图和网格几何拼接图中的至少一个拼接在一张图中,得到一张异构混合拼接图。该异构混合拼接图称为多属性异构混合拼接图。
再例如,将点云几何拼接图,与多视点视频纹理拼接图、网格纹理拼接图中的至少一个拼接在一张图中,得到一张异构混合拼接图。该异构混合拼接图称为多属性异构混合拼接图。
下面以第一表达格式为多视点视频,第二表达格式为点云为例,对上述S602-A进行介绍。
假设多视点视频的单一属性拼接图包括多视点视频纹理拼接图和多视点视频几何拼接图。
假设,点云的单一属性拼接图包括点云纹理拼接图、点云几何拼接图、点云占用情况拼接图。
本申请实施例中,上述S602-A的混合拼接方式包括但不限于如下几种:
方式一,将多视点视频纹理拼接图、多视点视频几何拼接图、点云纹理拼接图、点云几何拼接图和点云占用情况拼接图,均拼接在一张异构混合拼接图中。
方式二、按照预设的混合拼接方式,将多视点视频纹理拼接图、多视点视频几何拼接图、点云纹理拼接图、点云几何拼接图和点云占用情况拼接图进行拼接,得到M个异构混合拼接图。
本申请实施例中,将上述多视点视频拼接图包括多视点视频纹理拼接图和多视点视频几何拼接图,点云拼接图包括点云纹理拼接图、点云几何拼接图和点云占用情况拼接图进行混合拼接,得到M个异构混合拼接图至少包括如下几种示例:
示例1,将多视点视频纹理拼接图和点云纹理拼接图进行拼接,得到异构混合纹理拼接图,将多视点视频几何拼接图、点云几何拼接图和点云占用情况拼接图进行拼接,得到异构混合几何和占用情况拼接图。
举例说明,假设多个视觉媒体内容包括多视点视频、点云1和点云2,对多视点视频进行处理,得到多视点视频的拼接图,其中多视点视频的拼接图包括多视点视频纹理拼接图和多视点视频几何拼接图。对点云1进行处理,得到点云纹理拼接图1、点云几何拼接图1,点云1占用情况拼接图。对点云2进行处理,得到点云纹理拼接图2A、点云几何拼接图2A,点云2占用情况拼接图。可选的,可以将点云1占用情况拼接图和点云2占用情况拼接图合并为一个点云占用情况拼接图。
接着,将多视点视频纹理拼接图、点云纹理拼接图1和点云纹理拼接图2A进行混合拼接,得到异构混合纹理拼接图,如图8A所示。
将多视点视频几何拼接图、点云几何拼接图1、点云几何拼接图2A和点云占用情况拼接图进行拼接,得到异构混合几何和占用情况拼接图,例如图8B所示。
示例2,将多视点视频纹理拼接图和点云纹理拼接图进行拼接,得到异构混合纹理拼接图,将多视点视频几何拼接图和点云几何拼接图,得到异构混合几何拼接图,将点云占用情况拼接图单独作为一张混合拼接图。
示例3,将多视点视频纹理拼接图和点云纹理拼接图和点云占用情况拼接图进行拼接,得到一张子异构混合拼接图,将将多视点视频几何拼接图和点云几何拼接图进行拼接,得到另一张子异构混合拼接图。
示例4,将多视点视频纹理拼接图、点云纹理拼接图、多视点视频几何拼接图、点云几何拼接图和点云占用情况拼接图拼接在一张异构混合拼接图中。
需要说明的是,上述示例1至示例4只是一部分混合拼接方式,本申请实施例的混合拼接方式包括但不限于上述示例1至示例4。
在上述方式二中,根据上述方法,将按照预设的混合拼接方式,将多视点视频纹理拼接图、多视点视频几何拼接图、点云纹理拼接图、点云几何拼接图和点云占用情况拼接图进行拼接,得到M个异构混合拼接图后,对M个异构混合拼接图分别进行视频编码,得到视频压缩子码流。
例如,使用视频编码器对M子异构混合拼接图分别进行编码,得到视频压缩子码流。可选的,可以将M子异构混合拼接图中每一张异构混合拼接图作为一帧图像进行视频编码,得到视频压缩子码流。例如,使用视频编码器,对图8A所示的异构混合纹理拼接图和图8B所示异构混合几何和占用情况拼接图分别进行编码,得到视频压缩子码流。
本申请实施例中,在生成M个异构混合拼接图的同时,生成M个异构混合拼接图中每个异构混合拼接图对应的混合拼接信息。对M个异构混合拼接图的混合拼接信息进行编码,得到M个异构混合拼接图的混合拼接信息子码流。
例如,将M个异构混合拼接图中每个异构混合拼接图对应的混合拼接信息进行组合,形成一个完整的混合拼接信息,接着,对该完整的混合拼接信息进行编码,得到混合拼接信息子码流。
举例说明,如图9所示,以多视点视频和点云为例,对多视点视频进行处理,例如通过TMIV打包技术,得到多视点视频纹理拼接图和多视点视频几何拼接图。对点云进行处理,例如通过TMC2打包技术,得到点云纹理拼接图、点云几何拼接图和点云占用情况拼接图。接着,使用预设的混合拼接方式,将多视点视频纹理拼接图、多视点视频几何拼接图、点云纹理拼接图、点云几何拼接图和点云占用情况拼接图进行拼接,得到M个子异构混合拼接图。例如,使用区域打包技术,将多视点视频纹理拼接图和点云纹理拼接图进行拼接,得到异构混合纹理拼接图;将多视点视频几何拼接图、点云几何拼接图和点云占用情况拼接图进行拼接,得到异构混合几何和占用情况拼接图。然后,使用视频编码器,对异构混合纹理拼接图和异构混合几何和占用情况拼接图进行编码,得到视频压缩子码流,对混合拼接信息进行编码,得到混合拼接信息子码流。最后,将视频压缩子码流和混合拼接信息子码流写入同一个压缩码流中。
由于原来V3C标准中的framepacking仅支持将同构的纹理、几何、占用情况拼接图拼成一个混合的拼接图,也就是说仅支持将多视点视频拼接图打包成多视点混合拼接图,或者将点云拼接图打包成点云混合拼接图,因此原来V3C定义的packinformation(拼接信息)只包括判断拼接图(packed video)的每个区域属于纹理、几何或占用情况的标志位,而没有判断当前区域属于点云还是多视点视频的标志。因此,要想让V3C支持将多视点视频拼接图和点云拼接图打包成同一个异构混合拼接图,则需要在packinformation中增加新的语法元素,比如增加表示每个region是点云还是多视点图像的语法元素。
方案1,本申请实施例的混合拼接信息包括第一标志,该一标志用于指示异构混合拼接图中的第i个区域对应的表达格式类型,其中i为正整数。
可选的,可以使用pin_region_format_type_id表示第一标志。
本申请实施例中,通过对第一标志置不同的值来指示异构混合拼接图中的第i个区域对应的表达格式类型。
以N个同构拼接图包括多视点视频拼接图和点云拼接图为例,则本申请实施例还包括:若第i个区域的拼接图为多视点视频拼接图,则将第一标志的值置为第一数值。若第i个区域的拼接图为点云拼接图,则将第一标志的值置为第二数值。
本申请实施例对第一数值和第二数值的具体取值不做限制。
可选的,第一数值为0
可选的,第二数值为1
示例性的,第一标志的取值与表达格式类型之间的对应关系如表4所示:
表4
第一标志的取值 表达格式类型
0 多视点视频
1 点云
…… ……
即本申请实施例中,由于异构混合拼接图包括至少两个表达格式不同的拼接图,因此,在异构混合拼接图进行编码时,为了提高解码端的解码准确性,则在混合拼接信息中添加第一标志,通过该第一标志来指示异构混合拼接图中每个区域对应的表达格式类型。
在一些实施例中,该方案1的语法结构如图10所示,其中A:属性拼接图,G:几何拼接图,O:占用情况拼接图,P:点云,M:多视点视频。
在一种示例中,添加第一标志后的混合拼接信息如表5所示,需要说明的是,在该示例中,混合拼接信息复用表3所示的拼接信息,并在表3所示的拼接信息中添加第一标志,具体如表5所示。
表5
Figure PCTCN2022075260-appb-000016
Figure PCTCN2022075260-appb-000017
Figure PCTCN2022075260-appb-000018
其中,pin_region_format_type_id[j][i]指示了ID为j的图集的索引为i的区域的表达格式类型。pin_region_format_type_id[j][i]等于0,则表示当前区域的表达格式为多视点视频;pin_region_format_type_id[j][i]等于1,则表示当前区域的表达格式为点云。
在该方案1中,在packing information中根据pin_region_format_type_id[j][i]来区分异构混合拼接图的某个region是多视点视频拼接图还是点云拼接图。
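下面给出方案1编码端写入混合拼接信息的简化Python示意(仅为示意:writer.write_bits为本示例假设的比特写入接口,比特宽度与字段顺序仅用于说明,并非标准规定的语法描述符):
    def write_packing_information(writer, regions):
        writer.write_bits(len(regions) - 1, 8)  # 区域个数(示意对应pin_regions_count_minus1)
        for r in regions:
            # 第一标志:0表示多视点视频拼接图,1表示点云拼接图
            writer.write_bits(0 if r["format"] == "multiview" else 1, 8)  # pin_region_format_type_id
            writer.write_bits(r["type_id"] - 2, 8)   # pin_region_type_id_minus2(纹理/几何/占用)
            writer.write_bits(r["x"], 16)            # pin_region_top_left_x
            writer.write_bits(r["y"], 16)            # pin_region_top_left_y
            writer.write_bits(r["w"] - 1, 16)        # pin_region_width_minus1
            writer.write_bits(r["h"] - 1, 16)        # pin_region_height_minus1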
为了进一步说明本申请实施例的方法,下面以N个视觉媒体内容为多视点视频和点云为例,结合上述方案1,对本申请实施例的编码方法进行介绍,如图11所示,本申请实施例的编码方法包括如下步骤:
步骤11,对多视点视频通过视点间投影,擦除重复去冗余,将不重复像素连通成子块、子块拼接为多视点视频拼接图;点云通过平行投影,将投影面中的连通像素形成子块,子块拼接成点云拼接图。
步骤12,将多视点视频拼接图和点云拼接图进行拼接,生成异构混合拼接图。
进一步的,如果加入异构混合拼接图的当前区域的是多视点视频拼接图,则在混合拼接信息中将pin_region_format_type_id[j][i]置为0。
如果加入异构混合拼接图的当前区域的是点云拼接图,则在混合拼接信息中将pin_region_format_type_id[j][i]置为1。
步骤13,对异构混合拼接图进行视频编码,获得视频压缩子码流。
步骤14,多视点视频拼接图和点云拼接图拼接成异构混合拼接图的混合拼接信息编码形成混合拼接信息子码流;
步骤15,视频压缩码流和混合拼接信息码流写入压缩码流。
在该方案1中,编码端通过在混合拼接信息中的添加第一标志(pin_region_format_type_id),用于指示异构混合拼接图中的第i个区域的拼接图的表达格式类型。这样,解码端在解码时,可以根据混合拼接信息中的第一标志准确确定出异构混合拼接图中的当前区域的拼接图的表达格式类型,例如若第一标志的取值为第一数值时,则解码端确定异构混合拼接图中的当前区域的拼接图为多视点视频拼接图,若第一标志的取值为第二数值时,则解码端确定异构混合拼接图中的当前区域的拼接图为点云拼接图,进而使得解码端根据该第一标志实现准确解码。
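对应的解码端处理可示意如下:先从混合拼接信息中解析第一标志,再据此把异构混合拼接图中的各区域分发给多视点视频或点云的重建流程(仅为示意,reader.read_bits为本示例假设的比特读取接口,比特宽度需与编码端一致):
    def parse_and_dispatch_regions(reader, mixed_atlas):
        num_regions = reader.read_bits(8) + 1
        multiview_parts, pointcloud_parts = [], []
        for _ in range(num_regions):
            fmt_id = reader.read_bits(8)        # 第一标志pin_region_format_type_id
            type_id = reader.read_bits(8) + 2   # 区域是纹理、几何还是占用情况
            x, y = reader.read_bits(16), reader.read_bits(16)
            w, h = reader.read_bits(16) + 1, reader.read_bits(16) + 1
            patch = mixed_atlas[y:y + h, x:x + w]
            if fmt_id == 0:
                multiview_parts.append((type_id, patch))   # 交给多视点视频重建
            else:
                pointcloud_parts.append((type_id, patch))  # 交给点云重建
        return multiview_parts, pointcloud_parts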
方案2,在一些实施例中,此方案通过定义一种全新的v3c unit type使解码器在v3c unit header级别就可以根据v3c unit type来判断当前拼接图是否为异构混合拼接图。若v3c unit type=V3C_MVD,则表明当前拼接图是异构混合拼接图,后续再通过类似方案1中设计的标志,区分异构混合拼接图每个区域的格式。相比方案1,方案2在V3C语法的更高层就标识了当前拼接图是否为异构混合拼接图,这对系统设计可能更为有利。
也就是说,本申请实施例的混合拼接信息包括第二标志,该第二标志用于指示当前混合拼接图是否为异构混合拼接图。
可选的,第二标志为全新标志。
可选的,第二标志可以复用已有的vuh_unit_type,也就是说,本申请实施例通过为vuh_unit_type赋不同的值,来指示当前混合拼接图是否为异构混合拼接图。
在一些实施例中,若当前混合拼接图为异构混合拼接图,则将第二标志置为预设值,例如令v3c unit type=V3C_MVD。
在一些实施例中,若确定第二标志的值为预设值,则在混合拼接信息中写入第一标志。也就是说,在确定当前混合拼接图为异构混合拼接图时,在混合拼接信息中写入第一标志,用于指示该异构混合拼接图的当前区域的拼接图为多视点视频拼接图或点云拼接图等不同表达格式的拼接图。
例如,若第i个区域的拼接图为多视点视频拼接图,则将第一标志的值置为第一数值;
再例如,若第i个区域的拼接图为点云拼接图,则将第一标志的值置为第二数值。
这样解码端在解码时,首先解码得到第二标志,若第二标志的取值为预设值时,才继续解码,得到第一标志,以将异构混合拼接图中的当前解码区域的拼接图解码为多视点视频拼接图或点云拼接图等不同表达格式的拼接图,实现准确解码。
在一些实施例中,若确定第二标志的值不为预设值,则跳过在混合拼接信息中写入第一标志。也就是说,若当前混合拼接图不是本申请实施例所述的异构混合拼接图时,则编码端将第二标志的值置为非预设值。以第二标志为vuh_unit_type为例,若当前混合拼接图不是异构混合拼接图时,则可以根据实际情况,确定第二标志的值,例如若当前混合拼接图为属性视频数据,则令vuh_unit_type==V3C_AVD,若当前混合拼接图为几何视频数据,则令vuh_unit_type==V3C_GVD等。
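方案2编码端的条件写入逻辑可示意如下(仅为示意,V3C_MVD的具体取值以标准码表为准,写入接口为本示例假设):
    V3C_MVD = "V3C_MVD"  # 第二标志的预设值,指示当前混合拼接图为异构混合拼接图(取值仅为示意)

    def write_scheme2(writer, vuh_unit_type, regions):
        writer.write_unit_type(vuh_unit_type)        # 第二标志写在V3C单元头中
        if vuh_unit_type == V3C_MVD:                 # 仅当为异构混合拼接图时写入第一标志
            writer.write_bits(len(regions) - 1, 8)
            for r in regions:
                flag = 0 if r["format"] == "multiview" else 1
                writer.write_bits(flag, 8)           # min_region_format_type_id
        # 否则跳过第一标志,按原有同构拼接图的方式写入其余语法元素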
本申请实施例对第二标志在混合拼接信息中的具体写入位置不做限制。
在一种可能的实现方式中,第二标志位于混合拼接信息的单元头中。
在一些实施例中,方案2的语法元素如图12所示。
示例性的,本申请实施例将第二标志添加在上述表1所示的V3C单元头语法中,得到新的V3C单元头语法如表6所示:
表6 V3C单元头语法
Figure PCTCN2022075260-appb-000019
Figure PCTCN2022075260-appb-000020
上述表6所示的V3C单元标头的语义如表7所示,其中表7相比于上述表2,增加了V3C_MVD的语义。
表7:V3C单元类型
Figure PCTCN2022075260-appb-000021
表8 V3C单元有效负载语法
Figure PCTCN2022075260-appb-000022
表9 V3C通用参数集语法
Figure PCTCN2022075260-appb-000023
Figure PCTCN2022075260-appb-000024
Figure PCTCN2022075260-appb-000025
对V3C通用参数集语法中的下方语句修改:
vps_extension_present_flag equal to 1 specifies that the syntax elements vps_packing_information_present_flag, vps_miv_extension_present_flag, and vps_extension_6bits are present in the v3c_parameter_set() syntax structure. vps_extension_present_flag equal to 0 specifies that these syntax elements are not present.
vps_extension_present_flag等于1指定语法元素vps_packing_information_present_flag、vps_miv_extension_present_flag和vps_extension_6bits存在于v3c_parameter_set()语法结构中。vps_extension_present_flag等于0表示这些语法元素不存在。
修改为:
vps_extension_present_flag equal to 1 specifies that the syntax elements vps_packing_information_present_flag, vps_mixed_information_present_flag, vps_miv_extension_present_flag, and vps_extension_6bits are present in the v3c_parameter_set() syntax structure. vps_extension_present_flag equal to 0 specifies that these syntax elements are not present.
vps_extension_present_flag等于1指定语法元素vps_packing_information_present_flag,vps_mixed_information_present_flag,vps_miv_extension_present_flag和vps_extension_6bits存在于v3c_parameter_set()语法结构中。vps_extension_present_flag等于0表示这些语法元素不存在。
并新增如下语句:
vps_mixed_information_present_flag equal to 1 specifies that one or more instances of the mixed_information(j) syntax structure are present in the v3c_parameter_set() syntax structure. vps_mixed_information_present_flag equal to 0 specifies that this syntax structure is not present. When not present, the value of vps_mixed_information_present_flag is inferred to be equal to 0.
vps_mixed_information_present_flag等于1指定在v3c_parameter_set()语法结构中存在一个或多个混合信息(j)语法结构实例。vps_mixed_information_present_flag等于0表示该语法结构不存在。当不存在时,vps_mixed_information_present_flag的值被推断为等于0。
相应的,本申请实施例给出了混合信息(Mixed_information)的语法结构,如表10所示,表10相对于上述表3所示的拼接信息,对混合信息进行重新定义,在表10中,使用min_region_format_type_id表示第一标志。
表10:混合信息语法
Figure PCTCN2022075260-appb-000026
Figure PCTCN2022075260-appb-000027
表10所示的混合信息的语义如下所示:
混合后的视频帧可以划分为一个或多个矩形区域。一个区域应精确映射到一个地图集图块。混合视频帧的矩形区域不允许重叠。
min_codec_id[j]表示用于对ID为j的图集压缩混合视频数据的编解码器的标识符。min_codec_id应在0到255的范围内,包括0到255。该编解码器可以通过组件编解码器映射SEI消息或通过本文档之外的方式来识别。
min_occupancy_present_flag[j]等于0表示ID为j的图集的混合视频帧不包含具有占用数据的区域。 min_occupancy_present_flag[j]等于1表示ID为j的图集的混合视频帧确实包含具有占用数据的区域。当min_occupancy_present_flag[j]不存在时,推断为等于0。
比特流一致性的要求是,如果min_occupancy_present_flag[j]对于atlas ID j的atlas等于1,vps_occupancy_video_present_flag[j]对于atlas ID j相同的atlas应等于0。
min_geometry_present_flag[j]等于0表示ID为j的图集的混合视频帧不包含具有几何数据的区域。min_geometry_present_flag[j]等于1表示ID为j的图集的混合视频帧确实包含具有几何数据的区域。当min_geometry_present_flag[j]不存在时,推断为等于0。
比特流一致性的要求是,如果min_geometry_present_flag[j]对于ID为j的图集等于1,则vps_geometry_video_present_flag[j]对于ID为j的图集应等于0。
min_attributes_present_flag[j]等于0表示ID为j的图集的混合视频帧不包含具有属性数据的区域。min_attributes_present_flag[j]等于1表示ID为j的图集的混合视频帧确实包含具有属性数据的区域。当min_attributes_present_flag[j]不存在时,推断为等于0。
比特流一致性的要求是,如果min_attribute_present_flag[j]对于ID为j的图集等于1,vps_attribute_video_present_flag[j]对于ID为j的图集应等于0。
min_occupancy_2d_bit_depth_minus1[j]加1表示标称2D位深度,包含占用数据的ID为j的图集的解码区域应转换到该标称2D位深度。min_occupancy_2d_bit_depth_minus1[j]应在0到31的范围内,包括0和31。
min_occupancy_MSB_align_flag[j]指示ID为j的图集的包含占用样本的解码区域如何转换为标称占用比特深度的样本,如附件B中所指定。
min_lossy_occupancy_compression_threshold[j]指示用于从包含ID为j的图集的占用数据的解码区域导出二进制占用的阈值。min_lossy_occupancy_compression_threshold[j]应在0到255的范围内,包括0和255。
min_geometry_2d_bit_depth_minus1[j]加1表示标称2D位深度,ID为j的图集的包含几何数据的解码区域应转换到的标称2D位深度。min_geometry_2d_bit_depth_minus1[j]应在0到31的范围内,包括0和31。
min_geometry_MSB_align_flag[j]指示如何将ID为j的图集的包含几何样本的解码区域转换为标称占用位深度的样本,如附件B中所述。
min_geometry_3d_coordinates_bit_depth_minus1[j]加1表示ID为j的图集的重建立体内容的几何坐标的位深度。min_geometry_3d_coordinates_bit_depth_minus1[j]应在0到31的范围内,包括0和31。
min_attribute_count[j]表示ID为j的图集的混合视频帧中存在的具有唯一属性类型的属性的数量。
min_attribute_type_id[j][i]表示为ID为j的图集的混合视频帧的属性区域的第i个属性类型。表3描述了支持的属性类型列表。
min_attribute_2d_bit_depth_minus1[j][k]加1表示对于ID为j的图集,包含属性索引为k的属性的区域应转换到的标称2D位深度。min_attribute_2d_bit_depth_minus1[j][k]应在0到31的范围内,包括0和31。
min_attribute_MSB_align_flag[j][k]指示如何将包含属性类型为k的属性的解码区域(对于ID为j的图集)转换为标称属性位深度的样本,如附件B中所述。
min_attribute_map_absolute_coding_persistence_flag[j][k]等于1表示解码区域包含索引为k的属性的属性图,对应于ID为j的图集,在没有任何形式的地图预测的情况下进行编码。min_attribute_map_absolute_coding_persistence_flag[j][i]等于0表示解码区域包含索引为k的属性的属性图,对应于ID为j的图集,应使用与用于ID为j的图集的几何分量相同的地图预测方法。如果min_attribute_map_absolute_coding_persistence_flag[j][i]不存在,则应推断其值等于1。
3D数组AttributeMapAbsoluteCodingEnabledFlag指示是否要对属性的特定映射进行编码,有或没有预测,获得如下:
Figure PCTCN2022075260-appb-000028
Figure PCTCN2022075260-appb-000029
min_attribute_dimension_minus1[j][k]加1表示ID为j的图集的包含索引为k的属性的区域的总维数(即通道数)。min_attribute_dimension_minus1[j][i]应在0到63的范围内,包括0和63。
min_attribute_dimension_partitions_minus1[j][k]加1表示对于ID为j的图集,包含索引为k的属性的区域的属性通道应分组的分区组数。min_attribute_dimension_partitions_minus1[j][k]应在0到63的范围内,包括0到63。
min_attribute_partition_channels_minus1[j][k][l]加1表示对于ID为j的图集的包含索引为k的属性的区域,分配给索引为l的维度分区组的通道数。对于所有维度分区组,ai_attribute_partition_channels_minus1[j][k][l]应在0到ai_attribute_dimension_minus1[j][k]的范围内。
min_regions_count_minus1[j]加1表示ID为j的图集混合在一个视频帧中的区域数。min_regions_count_minus1应在0到7的范围内,包括0到7。当不存在时,min_regions_count_minus1的值被推断为等于0。
min_region_tile_id[j][i]表示ID为j的图集的索引为i的区域的图块ID。
min_region_format_type_id[j][i]表示ID为j的图集索引为i的区域的格式类型。min_region_format_type_id[j][i]等于0,表示区域格式为多视角视频;等于1,区域格式为点云。
min_region_type_id_minus2[j][i]加2表示对于ID为j的图集,索引为i的区域的ID。min_region_type_id_minus2[j][i]的值应在0到2的范围内,包括0到2。
min_region_top_left_x[j][i]以混合视频分量帧中的亮度样本为单位,为ID为j的图集的索引为i的区域指定左上样本的水平位置。当不存在时,min_region_top_left_x[j][i]的值被推断为等于0。
min_region_top_left_y[j][i]以混合视频分量帧中的亮度样本为单位,为ID为j的图集的索引为i的区域指定左上样本的垂直位置。当不存在时,min_region_top_left_y[j][i]的值被推断为等于0。
min_region_width_minus1[j][i]plus 1为ID为j的图集的索引为i的区域指定宽度,以亮度样本为单位。
min_region_height_minus1[j][i]加1指定ID为j的图集的索引为i的区域指定高度,以亮度样本为单位。
min_region_unpack_top_left_x[j][i]以解压缩视频分量帧中的亮度样本为单位,为ID j的图集的索引为i的区域指定左上样本的水平位置。当不存在时,min_region_unpack_top_left_x[j][i]的值被推断为等于0。
min_region_unpack_top_left_y[j][i]以解压缩视频分量帧中的亮度样本为单位,为ID j的图集的索引为i的区域指定左上样本的垂直位置。当不存在时,min_region_unpack_top_left_y[j][i]的值被推断为等于0。
min_region_rotation_flag[j][i]等于0表示不对ID为j的图集的索引为i的区域执行旋转。min_region_rotation_flag[j][i]等于1表示ID为j的图集的索引为i的区域旋转了90度。
min_region_map_index[j][i]指定ID为j的的图集索引为i的区域的地图索引。
min_region_auxiliary_data_flag[j][i]等于1表示ID为j的图集索引为i的区域仅包含RAW和/或EOM编码点。min_region_auxiliary_data_flag等于0表示ID为j的图集索引为i的区域可能包含RAW和/或EOM编码点。
min_region_attr_type_id[j][i]表示ID为j的图集索引为i的区域的属性类型。表3描述了支持的属性列表。
min_region_attr_partition_index[j][i]表示ID为j的图集索引为i的区域的属性分区索引。当不存在时,min_region_attr_partition_index[j][i]的值被推断为等于0。
混合视频解码过程:
ID为DecAtlasID的图集的混合视频分量的解码过程如下。
对于混合视频分量,首先使用附件A中定义的配置文件或mix_codec_id[DecAtlasID]的值和子条款F.2.11中指定的分量编解码器映射SEI消息(如果存在)来确定编解码器。然后,根据相应的编码规范,使用存在于V3C比特流中的混合视频子比特流作为输入来调用混合视频解码过程。
这个过程的输出是:
– NumDecMixFrames,指示解码混合视频帧的数量
– a 4D array DecMixFrames,解码后的混合视频帧,其中维度分别对应于解码后的混合视频帧索引、分量索引、行索引和列索引,以及
– 以下的一维数组:
– DecMixBitDepth,指示混合视频的位宽,
– DecMixHeight,指示混合视频的高度,
– DecMixWidth,指示混合视频的宽度,
– DecMixChromaFormat,指示属性色度格式,
– DecMixChromaSamplingPosition,如果存在,指示ISO/IEC 23091-2中规定的视频色度采样位置,
– DecMixFullRange,如果存在,指示ISO/IEC 23091-2中规定的视频全范围代码点,
– DecMixColourPrimaries,如果存在,指示ISO/IEC 23091-2中规定的源原色的色度坐标,
– DecMixTransferCharacteristics,如果存在,指示ISO/IEC 23091-2中规定的传输特性,
– DecMixMatrixCoeffs,如果存在,指示ISO/IEC 23091-2中规定的矩阵系数,
– DecMixOutOrdIdx,指示混合视频输出顺序索引,以及
– DecMixCompTime,指示混合视频合成时间。
其中维度对应于解码的混合视频帧索引。
如果数组DecMixFullRange缺失,则其所有元素都应设置为1。
如果数组DecMixTransferCharacteristics的任何元素缺失或设置为值2,即未指定,则这些元素应设置为8,即线性。
如果数组DecMixChromaSamplingPosition缺失,则其所有元素都应设置为0。
如果数组DecMixColourPrimaries缺失,则其所有元素都应设置为2。
如果数组DecMixMatrixCoeffs缺失,则其所有元素都应设置为2。
数组DecMixChromaSamplingPosition、DecMixColourPrimaries、DecMixMatrixCoeffs、DecMixFullRange和DecMixTransferCharacteristics的值不应用于min_region_type_id_minus2等于V3C_OVD、V3C_GVD和V3C_AVD的min_region_attr_type_id等于ATTR_MATERIAL_ID的解码混合帧区域的任何进一步处理,或ATTR_NORMAL。
这些值应根据ISO/IEC 23091-2中相应的编码点进行解释。
需要注意的是,任何现有的视频编码规范,如ISO/IEC 14496-10或ISO/IEC 23008-2或任何未来定义的视频编码规范,如果包含在min_mixed_codec_id中,都可以使用。
在该方案2中,在v3c unit header级别就可以根据v3c unit type来判断当前拼接图是否为异构混合拼接图。若v3c unit type=V3C_MVD,则表明当前拼接图是异构混合拼接图,后续再通过mixed information中的min_region_format_type_id[j][i]来区分异构混合拼接图的某个region是多视点视频拼接图还是点云拼接图。
为了进一步说明本申请实施例的方法,下面以N个视觉媒体内容为多视点视频和点云为例,结合上述方案2,对本申请实施例的编码方法进行介绍,如图13所示,本申请实施例的编码方法包括如下步骤:
步骤21,多视点视频通过视点间投影、擦除重复去冗余、不重复像素连通成子块、子块拼接为多视点视频拼接图,点云通过平行投影、投影面中的连通像素形成子块,子块拼接成点云拼接图。
步骤22,将多视点视频拼接图和点云拼接图进行拼接,生成异构混合拼接图,同时,令其v3c unit type=V3C_MVD。
进一步的,如果加入异构混合拼接图的当前区域的是多视点视频拼接图,则在混合拼接信息中将min_region_format_type_id[j][i]置为0。
如果加入异构混合拼接图的当前区域的是点云拼接图,则在混合拼接信息中将min_region_format_type_id[j][i]置为1。
步骤23,对异构混合拼接图进行视频编码,获得视频压缩子码流;
步骤24,将多视点视频拼接图和点云拼接图拼接成异构混合拼接图的混合拼接信息编码形成混合拼接信息子码流;
步骤25,将视频压缩码流和混合拼接信息码流写入压缩码流。
在该方案2中,通过设置更高一级的第二标志,用于指示当前混合拼接图是否为本申请实施例提出的异构混合拼接图,在确定当前混合拼接图为异构混合拼接图后,在混合拼接信息中写入第一标志,通过第一标志用于指示异构混合拼接图中的当前区域的拼接图的表达格式类型。这样,解码端在解码时,首先对第二标志进行解码,若第二标志指示当前混合拼接图为异构混合拼接图时,解码端再对第一标志进行解码,以确定异构混合拼接图的当前区域的拼接图的表达格式类型,进而实现准确解码。
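方案2解码端“先解第二标志、再解第一标志”的过程可示意如下(仅为示意,legacy_decoder表示原有同构拼接图的解码流程,为本示例假设的回调):
    V3C_MVD = "V3C_MVD"  # 取值仅为示意,与编码端一致

    def decode_scheme2(reader, mixed_atlas, legacy_decoder=None):
        unit_type = reader.read_unit_type()          # 先解码第二标志
        if unit_type != V3C_MVD:
            # 不是异构混合拼接图:交由原有流程处理
            return legacy_decoder(reader, mixed_atlas) if legacy_decoder else None
        region_formats = []
        for _ in range(reader.read_bits(8) + 1):
            fmt_id = reader.read_bits(8)             # 再解码第一标志
            region_formats.append("multiview" if fmt_id == 0 else "pointcloud")
        return region_formats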
方案3,此方案通过在原有四种v3c unit type(V3C_AVD,V3C_GVD,V3C_OVD,V3C_PVD)的基础上,示例性地定义四种全新的v3c unit type(V3C_MAVD,V3C_MGVD,V3C_MOVD,V3C_MPVD),使解码器在v3c unit header级别就可以根据v3c unit type来判断当前拼接图是否为异构混合拼接图。若v3c unit type为上述四种新定义的v3c unit type中的一种,则表明当前拼接图是异构混合拼接图,后续再通过类似方案1中设计的标志,区分异构混合拼接图每个区域的格式。
本申请实施例中,在混合拼接信息中写入第三标志,该第三标志用于指示用于指示当前混合拼接图是否为异构混合拼接图,以及属于哪一种异构混合拼接图。
示例性的,异构混合拼接图包括如下几种类型:异构混合占用情况拼接图、异构混合几何拼接图、异构混合属性拼接图、异构混合打包拼接图。
基于此,本申请实施例的方法还包括如下示例:
示例1,若编码端确定当前异构混合拼接图为异构混合占用情况拼接图,则令第三标志的取值为第一预设值,例如V3C_MOVD。
示例2,若编码端确定当前异构混合拼接图为异构混合几何拼接图,则令第三标志的取值为第二预设值,例如V3C_MGVD。
示例3,若编码端确定当前异构混合拼接图为异构混合属性拼接图,则令第三标志的取值为第三预设值,例如V3C_MAVD。
示例4,若编码端确定当前异构混合拼接图为异构混合打包拼接图,则令第三标志的取值为第四预设值,例如V3C_MPVD。
也就是说,在该方案3中,本申请实施例在混合信息中增加如下语法元素中的至少一个:V3C_MAVD,V3C_MGVD,V3C_MOVD,V3C_MPVD。
其中,V3C_MOVD用于指示当前混合拼接图为异构混合占用情况拼接图。例如指示该当前混合拼接图只包括多视点视频的占用情况拼接图和点云的占用情况拼接图。
V3C_MGVD用于指示当前混合拼接图为异构混合几何拼接图。例如指示该当前混合拼接图只包括多视点视频的几何拼接图和点云的几何拼接图。
V3C_MAVD用于指示当前混合拼接图为异构混合属性拼接图。例如指示该当前混合拼接图只包括多视点视频的纹理拼接图和点云的纹理拼接图。
V3C_MPVD用于指示当前混合拼接图为异构混合打包拼接图。可选的,异构混合打包拼接图也可以称为全属性异构混合拼接图。
例如指示该当前混合拼接图包括多视点视频的占用情况拼接图和点云的占用情况拼接图、多视点视频的几何拼接图和点云的几何拼接图,以及多视点视频的纹理拼接图和点云的纹理拼接图。
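方案3中按异构混合拼接图的种类选取第三标志(v3c unit type)的对应关系可示意如下(仅为示意,枚举取值以标准码表为准):
    # 四种新定义的v3c unit type,这里用字符串示意
    V3C_MOVD, V3C_MGVD, V3C_MAVD, V3C_MPVD = "V3C_MOVD", "V3C_MGVD", "V3C_MAVD", "V3C_MPVD"

    def select_third_flag(mixed_atlas_kind):
        mapping = {
            "occupancy": V3C_MOVD,   # 异构混合占用情况拼接图
            "geometry":  V3C_MGVD,   # 异构混合几何拼接图
            "attribute": V3C_MAVD,   # 异构混合属性拼接图
            "packed":    V3C_MPVD,   # 异构混合打包拼接图(全属性)
        }
        return mapping[mixed_atlas_kind]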
本申请实施例对上述第三标志的具体表示方式不做限制。
在一种示例中,上述第三标志为全新的一种标志。
在另一种示例中,上述第三标志复用已有的vuh_unit_type。
下面以第三标志复用已有的vuh_unit_type为例进行说明。
可选的,上述第三标志可以位于混合拼接信息的单元头中。
在一些实施例中,方案3的语法元素如图14所示。
本申请实施例中,编码端在确定第三标志指示所述当前混合拼接图为异构混合拼接图时,则在所述混合拼接信息中写入第一标志。
在一些实施例中,若编码端确定第三标志指示所述当前混合拼接图不是异构混合拼接图时,则跳过在混合拼接信息中写入第一标志。
添加第三标志的V3C单元头语法,如表12所示:
表12 V3C单元头语法
Figure PCTCN2022075260-appb-000030
Figure PCTCN2022075260-appb-000031
上述表12所示的V3C单元标头的语义如表13所示,其中表13相比于上述表2,增加了V3C_MVD的语义。V3C单元头语义如表13所示:
表13 V3C单元类型
Figure PCTCN2022075260-appb-000032
Figure PCTCN2022075260-appb-000033
V3C单元有效负载语法如表14所示:
表14 V3C unit payload syntax
Figure PCTCN2022075260-appb-000034
V3C通用参数集语法如表15所示
表15 General V3C parameter set syntax
Figure PCTCN2022075260-appb-000035
Figure PCTCN2022075260-appb-000036
Figure PCTCN2022075260-appb-000037
对V3C通用参数集语义中的下方语句修改:
vps_extension_present_flag等于1指定语法元素vps_packing_information_present_flag、vps_miv_extension_present_flag和vps_extension_6bits存在于v3c_parameter_set()语法结构中。vps_extension_present_flag等于0表示这些语法元素不存在。
修改为:
vps_extension_present_flag等于1指定语法元素vps_packing_information_present_flag,vps_mixed_occupancy_information_present_flag,vps_mixed_geometry_information_present_flag,vps_mixed_attribute_information_present_flag,vps_mixed_packing_information_present_flag,vps_miv_extension_present_flag和vps_extension_6bits存在于v3c_parameter_set()语法结构中。vps_extension_present_flag等于0表示这些语法元素不存在。
并新增如下语句:
vps_mixed_occupancy_video_present_flag[j]等于0表示ID为j的图集的混合打包视频帧不包含具有占用数据的区域。vps_mixed_occupancy_video_present_flag[j]等于1表示ID为j的图集的混合打包视频帧确实包含具有占用数据的区域。当vps_mixed_occupancy_video_present_flag[j]不存在时,推断为等于0。
比特流一致性的要求是,如果vps_mixed_occupancy_video_present_flag[j]对于atlas ID j的atlas等于1,vps_occupancy_video_present_flag[j]对于atlas ID j相同的atlas应等于0。
vps_mixed_occupancy_information_present_flag等于1指定在v3c_parameter_set()语法结构中存在一个或多个混合信息(j)语法结构实例。vps_mixed_occupancy_information_present_flag等于0表示该语法结构不存在。当不存在时,vps_mixed_occupancy_information_present_flag的值被推断为等于0。
vps_mixed_geometry_video_present_flag[j]等于0表示ID为j的图集的混合打包视频帧不包含具有几何数据的区域。vps_mixed_geometry_video_present_flag[j]等于1表示ID为j的图集的混合打包视频帧确实包含具有几何数据的区域。当vps_mixed_geometry_video_present_flag[j]不存在时,推断为等于0。
比特流一致性的要求是,如果vps_mixed_geometry_video_present_flag[j]对于ID为j的图集等于1,则vps_geometry_video_present_flag[j]对于ID为j的图集应等于0。
vps_mixed_geometry_information_present_flag等于1指定在v3c_parameter_set()语法结构中存在一个或多个混合信息(j)语法结构实例。vps_mixed_geometry_information_present_flag等于0表示该语法结构不存在。当不存在时,vps_mixed_geometry_information_present_flag的值被推断为等于0。
vps_mixed_attribute_video_present_flag[j]等于0表示ID为j的图集的混合打包视频帧不包含具有属性数据的区域。vps_mixed_attribute_video_present_flag[j]等于1表示ID为j的图集的混合打包视频帧确实包含具有属性数据的区域。当vps_mixed_attribute_video_present_flag[j]不存在时,推断为等于0。
比特流一致性的要求是,如果vps_mixed_attribute_video_present_flag[j]对于ID为j的图集等于1,vps_attribute_video_present_flag[j]对于ID为j的图集应等于0。
vps_mixed_attribute_information_present_flag等于1指定在v3c_parameter_set()语法结构中存在一个或多个混合信息(j)语法结构实例。vps_mixed_attribute_information_present_flag等于0表示该语法结构不存在。当不存在时,vps_mixed_attribute_information_present_flag的值被推断为等于0。
vps_mixed_packing_information_present_flag等于1指定在v3c_parameter_set()语法结构中存在一个或多个混合信息(j)语法结构实例。vps_mixed_packing_information_present_flag等于0表示该语法结构不存在。当不存在时,vps_mixed_packing_information_present_flag的值被推断为等于0。
混合占用信息语法如表16所示:
表16 Mixed occupancy information syntax
Figure PCTCN2022075260-appb-000038
混合占用信息语义(Mixed attribute information semantics)如下所示:
混合后的占用情况视频帧可以划分为一个或多个矩形区域。一个区域应精确映射到一个地图集图块。混合占用情况视频帧的矩形区域不允许重叠。
moi_codec_id[j]表示用于对ID为j的图集压缩混合占用情况视频数据的编解码器的标识符。moi_codec_id应在0到255的范围内,包括0到255。该编解码器可以通过组件编解码器映射SEI消息或通过本文档之外的方式来识别。
moi_occupancy_2d_bit_depth_minus1[j]加1表示标称2D位深度,包含占用数据的ID为j的图集的解码区域应转换到该标称2D位深度。moi_occupancy_2d_bit_depth_minus1[j]应在0到31的范围内,包括0和31。
moi_occupancy_MSB_align_flag[j]指示ID为j的图集的包含占用样本的解码区域如何转换为标称占用比特深度的样本,如附件B中所指定。
moi_lossy_occupancy_compression_threshold[j]指示用于从包含ID为j的图集的占用数据的解码区域导出二进制占用的阈值。moi_lossy_occupancy_compression_threshold[j]应在0到255的范围内,包括0和255。
moi_regions_count_minus1[j]加1表示ID为j的图集混合在一个视频帧中的区域数。moi_regions_count_minus1应在0到7的范围内,包括0到7。当不存在时,moi_regions_count_minus1的值被推断为等于0。
moi_region_tile_id[j][i]表示ID为j的图集的索引为i的区域的图块ID。
moi_region_format_type_id[j][i]表示ID为j的图集索引为i的区域的格式类型。moi_region_format_type_id[j][i]等于0,表示区域格式为多视角视频;等于1,区域格式为点云。
moi_region_top_left_x[j][i]以混合占用情况视频分量帧中的亮度样本为单位,为ID为j的图集的索引为i的区域指定左上样本的水平位置。当不存在时,moi_region_top_left_x[j][i]的值被推断为等于0。
moi_region_top_left_y[j][i]以混合占用情况视频分量帧中的亮度样本为单位,为ID为j的图集的索引为i的区域指定左上样本的垂直位置。当不存在时,moi_region_top_left_y[j][i]的值被推断为等于0。
moi_region_width_minus1[j][i]plus 1为ID为j的图集的索引为i的区域指定宽度,以亮度样本为单位。
moi_region_height_minus1[j][i]加1指定ID为j的图集的索引为i的区域指定高度,以亮度样本为单位。
moi_region_unpack_top_left_x[j][i]以解压缩视频分量帧中的亮度样本为单位,为ID j的图集的索引为i的区域指定左上样本的水平位置。当不存在时,moi_region_unpack_top_left_x[j][i]的值被推断为等于0。
moi_region_unpack_top_left_y[j][i]以解压缩视频分量帧中的亮度样本为单位,为ID j的图集的索引为i的区域指定左上样本的垂直位置。当不存在时,moi_region_unpack_top_left_y[j][i]的值被推断为等于0。
moi_region_rotation_flag[j][i]等于0表示不对ID为j的图集的索引为i的区域执行旋转。moi_region_rotation_flag[j][i]等于1表示ID为j的图集的索引为i的区域旋转了90度。
混合几何信息语法(Mixed geometry information syntax)如表17所示:
表17 Mixed geometry information syntax
Figure PCTCN2022075260-appb-000039
混合几何信息语义如下所示:
混合后的几何视频帧可以划分为一个或多个矩形区域。一个区域应精确映射到一个地图集图块。混合几何视频帧的矩形区域不允许重叠。
mgi_codec_id[j]表示用于对ID为j的图集压缩混合几何视频数据的编解码器的标识符。mgi_codec_id应在0到255的范围内,包括0到255。该编解码器可以通过组件编解码器映射SEI消息或通过本文档之外的方式来识别。
mgi_geometry_2d_bit_depth_minus1[j]加1表示标称2D位深度,ID为j的图集的包含几何数据的解码区域应转换到的标称2D位深度。mgi_geometry_2d_bit_depth_minus1[j]应在0到31的范围内,包括0和31。
mgi_geometry_MSB_align_flag[j]指示如何将ID为j的图集的包含几何样本的解码区域转换为标称占用位深度的样本,如附件B中所述。
mgi_geometry_3d_coordinates_bit_depth_minus1[j]加1表示ID为j的图集的重建立体内容的几何坐标的位深度。mgi_geometry_3d_coordinates_bit_depth_minus1[j]应在0到31的范围内,包括0和31。
mgi_regions_count_minus1[j]加1表示ID为j的图集混合在一个视频帧中的区域数。mgi_regions_count_minus1应在0到7的范围内,包括0到7。当不存在时,mgi_regions_count_minus1的值被推断为等于0。
mgi_region_tile_id[j][i]表示ID为j的图集的索引为i的区域的图块ID。
mgi_region_format_type_id[j][i]表示ID为j的图集索引为i的区域的格式类型。mgi_region_format_type_id[j][i]等于0,表示区域格式为多视角视频;等于1,区域格式为点云。
mgi_region_top_left_x[j][i]以混合几何视频分量帧中的亮度样本为单位,为ID为j的图集的索引为i的区域指定左上样本的水平位置。当不存在时,mgi_region_top_left_x[j][i]的值被推断为等于0。
mgi_region_top_left_y[j][i]以混合几何视频分量帧中的亮度样本为单位,为ID为j的图集的索引为i的区域指定左上样本的垂直位置。当不存在时,mgi_region_top_left_y[j][i]的值被推断为等于0。
mgi_region_width_minus1[j][i]plus 1为ID为j的图集的索引为i的区域指定宽度,以亮度样本为单位。
mgi_region_height_minus1[j][i]加1指定ID为j的图集的索引为i的区域指定高度,以亮度样本为单位。
mgi_region_unpack_top_left_x[j][i]以解压缩视频分量帧中的亮度样本为单位,为ID j的图集的索引为i的区域指定左上样本的水平位置。当不存在时,mgi_region_unpack_top_left_x[j][i]的值被推断为等于0。
mgi_region_unpack_top_left_y[j][i]以解压缩视频分量帧中的亮度样本为单位,为ID j的图集的索引为i的区域指定左上样本的垂直位置。当不存在时,mgi_region_unpack_top_left_y[j][i]的值被推断为等于0。
mgi_region_rotation_flag[j][i]等于0表示不对ID为j的图集的索引为i的区域执行旋转。mgi_region_rotation_flag[j][i]等于1表示ID为j的图集的索引为i的区域旋转了90度。
mgi_region_map_index[j][i]指定ID为j的的图集索引为i的区域的地图索引。
mgi_region_auxiliary_data_flag[j][i]等于1表示ID为j的图集索引为i的区域仅包含RAW和/或EOM编码点。mgi_region_auxiliary_data_flag等于0表示ID为j的图集索引为i的区域可能包含RAW和/或EOM编码点。
混合属性信息语法(Mixed attribute information syntax)如表18所示:
表18 Mixed attribute information syntax
Figure PCTCN2022075260-appb-000040
Figure PCTCN2022075260-appb-000041
混合属性信息语义(Mixed attribute information semantics)
混合后的属性视频帧可以划分为一个或多个矩形区域。一个区域应精确映射到一个地图集图块。混合属性视频帧的矩形区域不允许重叠。
mai_codec_id[j]表示用于对ID为j的图集压缩混合属性视频数据的编解码器的标识符。mai_codec_id应在0到255的范围内,包括0到255。该编解码器可以通过组件编解码器映射SEI消息或通过本文档之外的方式来识别。
mai_attribute_count[j]表示ID为j的图集的混合属性视频帧中存在的具有唯一属性类型的属性的数量。
mai_attribute_type_id[j][i]表示为ID为j的图集的混合属性视频帧的属性区域的第i个属性类型。表3描述了支持的属性类型列表。
mai_attribute_2d_bit_depth_minus1[j][k]加1表示对于ID为j的图集,包含属性索引为k的属性的区域应转换到的标称2D位深度。mai_attribute_2d_bit_depth_minus1[j][k]应在0到31的范围内,包括0和31。
mai_attribute_MSB_align_flag[j][k]指示如何将包含属性类型为k的属性的解码区域(对于ID为j的图集)转换为标称属性位深度的样本,如附件B中所述。
mai_attribute_map_absolute_coding_persistence_flag[j][k]等于1表示解码区域包含索引为k的属性的属性图, 对应于ID为j的图集,在没有任何形式的地图预测的情况下进行编码。mai_attribute_map_absolute_coding_persistence_flag[j][i]等于0表示解码区域包含索引为k的属性的属性图,对应于ID为j的图集,应使用与用于ID为j的图集的几何分量相同的地图预测方法。如果mai_attribute_map_absolute_coding_persistence_flag[j][i]不存在,则应推断其值等于1。
3D数组AttributeMapAbsoluteCodingEnabledFlag指示是否要对属性的特定映射进行编码,有或没有预测,获得如下:
Figure PCTCN2022075260-appb-000042
mai_attribute_dimension_minus1[j][k]加1表示ID为j的图集的包含索引为k的属性的区域的总维数(即通道数)。mai_attribute_dimension_minus1[j][i]应在0到63的范围内,包括0和63。
mai_attribute_dimension_partitions_minus1[j][k]加1表示对于ID为j的图集,包含索引为k的属性的区域的属性通道应分组的分区组数。mai_attribute_dimension_partitions_minus1[j][k]应在0到63的范围内,包括0到63。
mai_attribute_partition_channels_minus1[j][k][l]加1表示对于ID为j的图集的包含索引为k的属性的区域,分配给索引为l的维度分区组的通道数。对于所有维度分区组,ai_attribute_partition_channels_minus1[j][k][l]应在0到ai_attribute_dimension_minus1[j][k]的范围内。
mai_regions_count_minus1[j]加1表示ID为j的图集混合在一个视频帧中的区域数。mai_regions_count_minus1应在0到7的范围内,包括0到7。当不存在时,mai_regions_count_minus1的值被推断为等于0。
mai_region_tile_id[j][i]表示ID为j的图集的索引为i的区域的图块ID。
mai_region_format_type_id[j][i]表示ID为j的图集索引为i的区域的格式类型。mai_region_format_type_id[j][i]等于0,表示区域格式为多视角视频;等于1,区域格式为点云。
mai_region_top_left_x[j][i]以混合属性视频分量帧中的亮度样本为单位,为ID为j的图集的索引为i的区域指定左上样本的水平位置。当不存在时,mai_region_top_left_x[j][i]的值被推断为等于0。
mai_region_top_left_y[j][i]以混合属性视频分量帧中的亮度样本为单位,为ID为j的图集的索引为i的区域指定左上样本的垂直位置。当不存在时,mai_region_top_left_y[j][i]的值被推断为等于0。
mai_region_width_minus1[j][i]plus 1为ID为j的图集的索引为i的区域指定宽度,以亮度样本为单位。
mai_region_height_minus1[j][i]加1指定ID为j的图集的索引为i的区域指定高度,以亮度样本为单位。
mai_region_unpack_top_left_x[j][i]以解压缩视频分量帧中的亮度样本为单位,为ID j的图集的索引为i的区域指定左上样本的水平位置。当不存在时,mai_region_unpack_top_left_x[j][i]的值被推断为等于0。
mai_region_unpack_top_left_y[j][i]以解压缩视频分量帧中的亮度样本为单位,为ID j的图集的索引为i的区域指定左上样本的垂直位置。当不存在时,mai_region_unpack_top_left_y[j][i]的值被推断为等于0。
mai_region_rotation_flag[j][i]等于0表示不对ID为j的图集的索引为i的区域执行旋转。mai_region_rotation_flag[j][i]等于1表示ID为j的图集的索引为i的区域旋转了90度。
mai_region_map_index[j][i]指定ID为j的的图集索引为i的区域的地图索引。
mai_region_auxiliary_data_flag[j][i]等于1表示ID为j的图集索引为i的区域仅包含RAW和/或EOM编码点。mai_region_auxiliary_data_flag等于0表示ID为j的图集索引为i的区域可能包含RAW和/或EOM编码点。
mai_region_attr_type_id[j][i]表示ID为j的图集索引为i的区域的属性类型。表3描述了支持的属性列表。
mai_region_attr_partition_index[j][i]表示ID为j的图集索引为i的区域的属性分区索引。当不存在时,mai_region_attr_partition_index[j][i]的值被推断为等于0。
混合打包信息语法(Mixed packing information syntax)如表19所示:
表19 Mixed packing information syntax
Figure PCTCN2022075260-appb-000043
Figure PCTCN2022075260-appb-000044
混合打包信息语义如下:
混合后的打包视频帧可以划分为一个或多个矩形区域。一个区域应精确映射到一个地图集图块。混合打包视频帧的矩形区域不允许重叠。
mpi_codec_id[j]表示用于对ID为j的图集压缩混合打包视频数据的编解码器的标识符。mpi_codec_id应在0到255的范围内,包括0到255。该编解码器可以通过组件编解码器映射SEI消息或通过本文档之外的方式来识别。
mpi_occupancy_present_flag[j]等于0表示ID为j的图集的混合打包视频帧不包含具有占用数据的区域。mpi_occupancy_present_flag[j]等于1表示ID为j的图集的混合打包视频帧确实包含具有占用数据的区域。当mpi_occupancy_present_flag[j]不存在时,推断为等于0。
比特流一致性的要求是,如果mpi_occupancy_present_flag[j]对于atlas ID j的atlas等于1,vps_occupancy_video_present_flag[j]对于atlas ID j相同的atlas应等于0。
mpi_geometry_present_flag[j]等于0表示ID为j的图集的混合打包视频帧不包含具有几何数据的区域。mpi_geometry_present_flag[j]等于1表示ID为j的图集的混合打包视频帧确实包含具有几何数据的区域。当mpi_geometry_present_flag[j]不存在时,推断为等于0。
比特流一致性的要求是,如果mpi_geometry_present_flag[j]对于ID为j的图集等于1,则vps_geometry_video_present_flag[j]对于ID为j的图集应等于0。
mpi_attributes_present_flag[j]等于0表示ID为j的图集的混合打包视频帧不包含具有属性数据的区域。mpi_attributes_present_flag[j]等于1表示ID为j的图集的混合打包视频帧确实包含具有属性数据的区域。当mpi_attributes_present_flag[j]不存在时,推断为等于0。
比特流一致性的要求是,如果mpi_attribute_present_flag[j]对于ID为j的图集等于1,vps_attribute_video_present_flag[j]对于ID为j的图集应等于0。
mpi_occupancy_2d_bit_depth_minus1[j]加1表示标称2D位深度,包含占用数据的ID为j的图集的解码区域应转换到该标称2D位深度。mpi_occupancy_2d_bit_depth_minus1[j]应在0到31的范围内,包括0和31。
mpi_occupancy_MSB_align_flag[j]指示ID为j的图集的包含占用样本的解码区域如何转换为标称占用比特深度的样本,如附件B中所指定。
mpi_lossy_occupancy_compression_threshold[j]指示用于从包含ID为j的图集的占用数据的解码区域导出二进制占用的阈值。mpi_lossy_occupancy_compression_threshold[j]应在0到255的范围内,包括0和255。
mpi_geometry_2d_bit_depth_minus1[j]加1表示标称2D位深度,ID为j的图集的包含几何数据的解码区域应转换到的标称2D位深度。mpi_geometry_2d_bit_depth_minus1[j]应在0到31的范围内,包括0和31。
mpi_geometry_MSB_align_flag[j]指示如何将ID为j的图集的包含几何样本的解码区域转换为标称占用位深度的样本,如附件B中所述。
mpi_geometry_3d_coordinates_bit_depth_minus1[j]加1表示ID为j的图集的重建立体内容的几何坐标的位深度。mpi_geometry_3d_coordinates_bit_depth_minus1[j]应在0到31的范围内,包括0和31。
mpi_attribute_count[j]表示ID为j的图集的混合打包视频帧中存在的具有唯一属性类型的属性的数量。
mpi_attribute_type_id[j][i]表示为ID为j的图集的混合打包视频帧的属性区域的第i个属性类型。表3描述了支持的属性类型列表。
mpi_attribute_2d_bit_depth_minus1[j][k]加1表示对于ID为j的图集,包含属性索引为k的属性的区域应转换到的标称2D位深度。mpi_attribute_2d_bit_depth_minus1[j][k]应在0到31的范围内,包括0和31。
mpi_attribute_MSB_align_flag[j][k]指示如何将包含属性类型为k的属性的解码区域(对于ID为j的图集)转换为标称属性位深度的样本,如附件B中所述。
mpi_attribute_map_absolute_coding_persistence_flag[j][k]等于1表示解码区域包含索引为k的属性的属性图,对应于ID为j的图集,在没有任何形式的地图预测的情况下进行编码。mpi_attribute_map_absolute_coding_persistence_flag[j][i]等于0表示解码区域包含索引为k的属性的属性图,对应于ID为j的图集,应使用与用于ID为j的图集的几何分量相同的地图预测方法。如果mpi_attribute_map_absolute_coding_persistence_flag[j][i]不存在,则应推断其值等于1。
3D数组AttributeMapAbsoluteCodingEnabledFlag指示是否要对属性的特定映射进行编码,有或没有预测,获得 如下:
Figure PCTCN2022075260-appb-000045
mpi_attribute_dimension_minus1[j][k]加1表示ID为j的图集的包含索引为k的属性的区域的总维数(即通道数)。mpi_attribute_dimension_minus1[j][i]应在0到63的范围内,包括0和63。
mpi_attribute_dimension_partitions_minus1[j][k]加1表示对于ID为j的图集,包含索引为k的属性的区域的属性通道应分组的分区组数。mpi_attribute_dimension_partitions_minus1[j][k]应在0到63的范围内,包括0到63。
mpi_attribute_partition_channels_minus1[j][k][l]加1表示对于ID为j的图集的包含索引为k的属性的区域,分配给索引为l的维度分区组的通道数。对于所有维度分区组,ai_attribute_partition_channels_minus1[j][k][l]应在0到ai_attribute_dimension_minus1[j][k]的范围内。
mpi_regions_count_minus1[j]加1表示ID为j的图集混合在一个视频帧中的区域数。mpi_regions_count_minus1应在0到7的范围内,包括0到7。当不存在时,mpi_regions_count_minus1的值被推断为等于0。
mpi_region_tile_id[j][i]表示ID为j的图集的索引为i的区域的图块ID。
mpi_region_format_type_id[j][i]表示ID为j的图集索引为i的区域的格式类型。mpi_region_format_type_id[j][i]等于0,表示区域格式为多视角视频;等于1,区域格式为点云。
mpi_region_type_id_minus2[j][i]加2表示对于ID为j的图集,索引为i的区域的ID。mpi_region_type_id_minus2[j][i]的值应在0到2的范围内,包括0到2。
mpi_region_top_left_x[j][i]以混合打包视频分量帧中的亮度样本为单位,为ID为j的图集的索引为i的区域指定左上样本的水平位置。当不存在时,mpi_region_top_left_x[j][i]的值被推断为等于0。
mpi_region_top_left_y[j][i]以混合打包视频分量帧中的亮度样本为单位,为ID为j的图集的索引为i的区域指定左上样本的垂直位置。当不存在时,mpi_region_top_left_y[j][i]的值被推断为等于0。
mpi_region_width_minus1[j][i]加1指定ID为j的图集的索引为i的区域的宽度，以亮度样本为单位。
mpi_region_height_minus1[j][i]加1指定ID为j的图集的索引为i的区域的高度，以亮度样本为单位。
mpi_region_unpack_top_left_x[j][i]以解压缩视频分量帧中的亮度样本为单位，为ID为j的图集的索引为i的区域指定左上样本的水平位置。当不存在时，mpi_region_unpack_top_left_x[j][i]的值被推断为等于0。
mpi_region_unpack_top_left_y[j][i]以解压缩视频分量帧中的亮度样本为单位，为ID为j的图集的索引为i的区域指定左上样本的垂直位置。当不存在时，mpi_region_unpack_top_left_y[j][i]的值被推断为等于0。
mpi_region_rotation_flag[j][i]等于0表示不对ID为j的图集的索引为i的区域执行旋转。mpi_region_rotation_flag[j][i]等于1表示ID为j的图集的索引为i的区域旋转了90度。
mpi_region_map_index[j][i]指定ID为j的图集的索引为i的区域的地图(map)索引。
mpi_region_auxiliary_data_flag[j][i]等于1表示ID为j的图集的索引为i的区域仅包含RAW和/或EOM编码点。mpi_region_auxiliary_data_flag[j][i]等于0表示ID为j的图集的索引为i的区域可能包含RAW和/或EOM编码点。
mpi_region_attr_type_id[j][i]表示ID为j的图集索引为i的区域的属性类型。表3描述了支持的属性列表。
mpi_region_attr_partition_index[j][i]表示ID为j的图集索引为i的区域的属性分区索引。当不存在时,mpi_region_attr_partition_index[j][i]的值被推断为等于0。
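上述mpi_region_*语法元素共同描述了混合打包视频帧中每个区域的位置、尺寸、解打包位置与旋转方式。下面给出一个示意性的区域解打包草图(Python)，其中MixedRegion数据结构、90度旋转的具体方向以及帧以二维亮度样本数组表示的方式均为说明所作的假设，并非规范性实现。

```python
from dataclasses import dataclass


@dataclass
class MixedRegion:
    # 对应上文mpi_region_*语法元素(以图集j、区域i为例),字段名为示意
    top_left_x: int
    top_left_y: int
    width_minus1: int
    height_minus1: int
    unpack_top_left_x: int
    unpack_top_left_y: int
    rotation_flag: int       # 1表示旋转90度
    format_type_id: int      # 0:多视点视频, 1:点云


def unpack_region(packed_frame, region):
    """从混合打包视频帧(二维亮度样本数组)中取出一个区域并给出其解打包位置(示意)。"""
    w, h = region.width_minus1 + 1, region.height_minus1 + 1
    block = [row[region.top_left_x:region.top_left_x + w]
             for row in packed_frame[region.top_left_y:region.top_left_y + h]]
    if region.rotation_flag == 1:
        # 假设按90度旋转恢复(具体旋转方向以标准规定为准)
        block = [list(col) for col in zip(*block[::-1])]
    return region.unpack_top_left_x, region.unpack_top_left_y, block
```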
混合打包视频解码过程如下所示:
ID为DecAtlasID的图集的混合视频分量的解码过程如下。
对于混合视频分量,首先使用附件A中定义的配置文件或mpi_codec_id[DecAtlasID]的值和子条款F.2.11中指定的分量编解码器映射SEI消息(如果存在)来确定编解码器。然后,根据相应的编码规范,使用存在于V3C比特流中的混合视频子比特流作为输入来调用混合视频解码过程。
这个过程的输出是:
– NumDecMpkFrames,指示解码混合视频帧的数量
– 一个4D数组DecMpkFrames，即解码后的混合视频帧，其中各维度分别对应于解码后的混合视频帧索引、分量索引、行索引和列索引，以及
– 以下的一维数组:
– DecMpkBitDepth,指示混合视频的位宽,
– DecMpkHeight,指示混合视频的高度,
– DecMpkWidth,指示混合视频的宽度,
– DecMpkChromaFormat,指示属性色度格式,
– DecMpkChromaSamplingPosition,如果存在,指示ISO/IEC 23091-2中规定的视频色度采样位置,
– DecMpkFullRange,如果存在,指示ISO/IEC 23091-2中规定的视频全范围代码点,
– DecMpkColourPrimaries,如果存在,指示ISO/IEC 23091-2中规定的源原色的色度坐标,
– DecMpkTransferCharacteristics,如果存在,指示ISO/IEC 23091-2中规定的传输特性,
– DecMpkMatrixCoeffs,如果存在,指示ISO/IEC 23091-2中规定的矩阵系数,
– DecMpkOutOrdIdx,指示混合视频输出顺序索引,以及
– DecMpkCompTime,指示混合视频合成时间。
其中维度对应于解码的混合视频帧索引。
如果数组DecMpkFullRange缺失,则其所有元素都应设置为1。
如果数组DecMpkTransferCharacteristics的任何元素缺失或设置为值2,即未指定,则这些元素应设置为8,即线性。
如果数组DecMpkChromaSamplingPosition缺失,则其所有元素都应设置为0。
如果数组DecMpkColourPrimaries缺失,则其所有元素都应设置为2。
如果数组DecMpkMatrixCoeffs缺失,则其所有元素都应设置为2。
数组DecMpkChromaSamplingPosition、DecMpkColourPrimaries、DecMpkMatrixCoeffs、DecMpkFullRange和DecMpkTransferCharacteristics的值，不应用于对mpi_region_type_id_minus2加2等于V3C_OVD或V3C_GVD的解码混合帧区域，以及mpi_region_type_id_minus2加2等于V3C_AVD且mpi_region_attr_type_id等于ATTR_MATERIAL_ID或ATTR_NORMAL的解码混合帧区域的任何进一步处理。
这些值应根据ISO/IEC 23091-2中相应的编码点进行解释。
注:任何现有的视频编码规范,如ISO/IEC 14496-10或ISO/IEC 23008-2或任何未来定义的视频编码规范,如果包含在mix_packed_codec_id中,都可以使用。
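上文对DecMpkFullRange、DecMpkTransferCharacteristics等数组缺失时的推断规则，可用如下示意代码概括(Python)，其中用字典保存各输出数组仅为说明所作的假设。

```python
def fill_missing_mpk_defaults(dec_info):
    """按上文推断规则为缺失的混合视频解码输出数组填入默认值(示意)。"""
    n = dec_info["NumDecMpkFrames"]
    defaults = {
        "DecMpkFullRange": 1,
        "DecMpkChromaSamplingPosition": 0,
        "DecMpkColourPrimaries": 2,
        "DecMpkMatrixCoeffs": 2,
    }
    for name, value in defaults.items():
        if name not in dec_info:           # 数组缺失:所有元素设为默认值
            dec_info[name] = [value] * n
    # DecMpkTransferCharacteristics缺失或元素为2(未指定)时,相应元素设为8(线性)
    tc = dec_info.get("DecMpkTransferCharacteristics", [2] * n)
    dec_info["DecMpkTransferCharacteristics"] = [8 if v == 2 else v for v in tc]
    return dec_info


# 用法示例:只给出帧数时,其余数组按默认值补齐
info = fill_missing_mpk_defaults({"NumDecMpkFrames": 2})
assert info["DecMpkTransferCharacteristics"] == [8, 8]
```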
在该方案3中,在v3c unit header级别就可以根据v3c unit type来判断当前混合拼接图是否为异构混合拼接图。异构混合拼接图可能出现的情况有以下四种:
1)若v3c unit type=V3C_MOVD，则表明当前混合拼接图是异构混合占用情况拼接图，后续再通过mixed occupancy information中的moi_region_format_type_id[j][i]来区分异构混合拼接图的某个region是多视点视频拼接图还是点云拼接图。
2)若v3c unit type=V3C_MGVD，则表明当前混合拼接图是异构混合几何拼接图，后续再通过mixed geometry information中的mgi_region_format_type_id[j][i]来区分异构混合拼接图的某个region是多视点视频拼接图还是点云拼接图。
3)若v3c unit type=V3C_MAVD，则表明当前混合拼接图是异构混合属性拼接图，后续再通过mixed attribute information中的mai_region_format_type_id[j][i]来区分异构混合拼接图的某个region是多视点视频拼接图还是点云拼接图。
4)若v3c unit type=V3C_MPVD，则表明当前混合拼接图是异构混合打包拼接图，后续再通过mixed packing information中的mpi_region_format_type_id[j][i]来区分异构混合拼接图的某个region是多视点视频拼接图还是点云拼接图。
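上述四种情况的判断逻辑可以概括为如下示意草图(Python)：先根据v3c unit type确定异构混合拼接图的类型，再读取对应信息结构中的region格式标志。其中以字符串表示各取值以及mixed_info的组织方式均为说明所作的假设。

```python
# 假设用字符串常量表示v3c unit type的四个取值(具体编码以标准为准)
REGION_FORMAT_FIELD = {
    "V3C_MOVD": "moi_region_format_type_id",   # 异构混合占用情况拼接图
    "V3C_MGVD": "mgi_region_format_type_id",   # 异构混合几何拼接图
    "V3C_MAVD": "mai_region_format_type_id",   # 异构混合属性拼接图
    "V3C_MPVD": "mpi_region_format_type_id",   # 异构混合打包拼接图
}


def region_format(v3c_unit_type, mixed_info, j, i):
    """返回图集j的第i个区域的表达格式:0为多视点视频,1为点云;非异构混合拼接图时返回None。"""
    field = REGION_FORMAT_FIELD.get(v3c_unit_type)
    if field is None:
        return None                      # 当前混合拼接图不是异构混合拼接图
    return mixed_info[field][j][i]
```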
为了进一步说明本申请实施例的方法，下面以N个视觉媒体内容为多视点视频和点云为例，结合上述方案3，对本申请实施例的编码方法进行介绍，如图15所示，本申请实施例的编码方法包括如下步骤：
步骤31，多视点视频通过视点间投影、擦除重复去冗余、将不重复像素连通成子块、再将子块拼接为多视点视频拼接图；点云通过平行投影、由投影面中的连通像素形成子块，再将子块拼接成点云拼接图。
步骤32,将多视点视频拼接图和点云拼接图进行拼接,生成异构混合拼接图,同时,令其v3c unit type=V3C_MPVD。
进一步的，如果加入异构混合拼接图的当前区域是多视点视频打包拼接图，则在混合拼接信息中将mpi_region_format_type_id[j][i]置为0。
如果加入异构混合拼接图的当前区域是点云打包拼接图，则在混合拼接信息中将mpi_region_format_type_id[j][i]置为1。
步骤33,对异构混合拼接图进行视频编码,获得视频压缩子码流;
步骤34，对将多视点视频打包拼接图和点云打包拼接图拼接成异构混合打包拼接图所使用的混合拼接信息进行编码，形成混合拼接信息子码流；
步骤35，将视频压缩子码流和混合拼接信息子码流写入压缩码流。
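结合步骤32至步骤34，下面给出一个填写混合拼接信息中v3c unit type与mpi_region_format_type_id的示意草图(Python)，其中区域来源的表示方式("multiview"/"pointcloud")与字典组织方式均为说明所作的假设。

```python
def build_mixed_packing_info(region_sources, atlas_id=0):
    """按区域来源填写混合拼接信息中的v3c unit type与格式标志(示意)。

    region_sources: 按区域顺序给出的来源列表,取值为"multiview"或"pointcloud"(假设的表示方式)。
    """
    mixed_info = {
        "v3c_unit_type": "V3C_MPVD",                      # 步骤32:标记为异构混合打包拼接图
        "mpi_region_format_type_id": {atlas_id: []},
    }
    for src in region_sources:
        # 多视点视频打包拼接图 -> 0;点云打包拼接图 -> 1
        mixed_info["mpi_region_format_type_id"][atlas_id].append(0 if src == "multiview" else 1)
    return mixed_info


# 用法示例:前两个区域来自多视点视频,第三个区域来自点云
info = build_mixed_packing_info(["multiview", "multiview", "pointcloud"])
assert info["mpi_region_format_type_id"][0] == [0, 0, 1]
```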
在该方案3中,通过设置更高一级的第三标志,用于指示当前混合拼接图是哪一种异构混合拼接图,在确定当前混合拼接图为某一种异构混合拼接图后,在混合拼接信息中写入第一标志,通过第一标志用于指示异构混合拼接图中的当前区域的拼接图的表达格式类型。这样,解码端在解码时,首先对第三标志进行解码,若第三标志指示当前混合拼接图为某一种异构混合拼接图时,解码端再对第一标志进行解码,以确定异构混合拼接图的当前区域的拼接图的表达格式类型,进而实现准确解码。
本申请实施例提供的编码方法,编码端通过对N个视觉媒体内容分别进行处理,得到N个拼接图,N个视觉媒体内容中至少两个视觉媒体内容对应的表达格式不同,N为大于1的正整数;将N个拼接图进行拼接,生成异构混合拼接图;对异构混合拼接图进行编码,得到码流。即本申请通过将多种不同表达格式的视觉媒体内容对应的拼接图拼接在一张异构混合拼接图中,例如将多视点视频拼接图和点云拼接图拼接在一张异构混合拼接图中进行编解码,这样尽量减少了所需要调用的HEVC,VVC,AVC,AVS等二维视频编码器的个数,减少了编码代价,提高易用性。
上文以编码端为例对本申请的编码方法进行介绍,下面以解码端为例进行说明。
图16为本申请一实施例提供的解码方法流程示意图。如图16所示，本申请实施例的解码方法包括：
S701、解码码流,得到重建异构混合拼接图。
S702、对重建异构混合拼接图进行拆分,得到N个重建同构拼接图,N为大于1的正整数;
S703、根据N个重建同构拼接图,得到多个重建视觉媒体内容。
其中,多个重建视觉媒体内容中至少两个重建视觉媒体内容对应的表达格式不同。
由上述可知,在编码时,将具有不同表达格式的拼接图拼接在一张异构混合拼接图进行编码。因此,解码端在获得码流后,对码流进行解码,得到重建异构混合拼接图,接着,对该重建异构混合拼接图进行拆分,得到N个重建同构拼接图,则N个重建同构拼接图中至少有两个拼接图对应的表达格式不同。最后,解码端对拆分后的N个重建同构拼接图进行重建等处理,得到多个重建视觉媒体内容。
本申请实施例中,将不同表达格式的多个同构拼接图拼接在一张异构混合拼接图中,这样在解码时,可以尽量减少所需要调用的HEVC,VVC,AVC,AVS等二维视频解码器的个数,减少了解码代价,提高易用性。
在一些实施例中,上述码流中包括视频压缩子码流,此时,上述S701包括如下步骤:
S701-A、解码视频压缩子码流,得到重建异构混合拼接图。
也就是说，本申请实施例的码流包括视频压缩子码流，还可能包括其他内容。解码端获得码流后，对码流进行解析，得到码流所包括的视频压缩子码流。接着，对该视频压缩子码流进行解码，得到重建异构混合拼接图，例如，将该视频压缩子码流输入图2B所示的视频解码器中进行解码，得到重建异构混合拼接图。
为了提高解码的准确性,由上述可知,编码端将混合拼接信息写入码流,也就是说,本申请实施例的码流中除了上述视频压缩子码流外,还包括混合拼接信息子码流,此时,本申请实施例的解码方法还包括:解码混合拼接信息子码流,得到混合拼接信息。
对应的,上述S702包括如下步骤:
S702-A、根据混合拼接信息,对重建异构混合拼接图进行拆分,得到N个重建拼接图。
即本申请实施例，解码端对码流进行解析，得到视频压缩子码流和混合拼接信息子码流，接着，解码端对视频压缩子码流进行解码，得到重建异构混合拼接图，对混合拼接信息子码流进行解码，得到混合拼接信息。最后，使用混合拼接信息，对重建异构混合拼接图进行拆分，得到N个重建同构拼接图。
在一些实施例中,重建异构混合拼接图包括多属性重建异构混合拼接图和单属性重建异构混合拼接图。
在一些实施例中,N个重建同构拼接图包括多视点视频重建拼接图、点云重建拼接图和网格重建拼接图中的至少两个。
基于此,上述S702-A包括如下步骤:
S702-A1、根据混合拼接信息,对重建异构混合拼接图进行拆分,至少得到第一表达格式的单一属性重建拼接图和第二表达格式的单一属性重建拼接图。
其中，第一表达格式和第二表达格式均为多视点视频、点云和网格中的任意一个，且第一表达格式和第二表达格式不同。
在一些实施例中,若第一表达格式为多视点视频,第二表达格式为点云,则上述S702-A1包括如下示例:
示例1,若重建异构混合拼接图为重建异构混合纹理拼接图,则根据混合拼接信息,对重建异构混合纹理拼接图进行拆分,得到多视点视频纹理重建拼接图和点云纹理重建拼接图。
示例2，若重建异构混合拼接图为重建异构混合几何和占用情况拼接图，则根据混合拼接信息，对重建异构混合几何和占用情况拼接图进行拆分，得到多视点视频几何重建拼接图、点云几何重建拼接图和点云占用情况重建拼接图。
根据上述方法,得到重建多视点视频纹理拼接图和重建点云纹理拼接图,以及重建多视点视频几何拼接图、重建点云几何拼接图和重建点云占用情况拼接图后,可以根据重建多视点视频纹理拼接图和重建多视点视频几何拼接图,得到重建多视点视频拼接图;
根据重建点云纹理拼接图、重建点云几何拼接图和重建点云占用情况拼接图,得到重建点云拼接图。
举例说明，如图17所示，解码端将码流输入视频解码器中，解码器对视频压缩子码流进行解码，得到重建异构混合纹理拼接图和重建异构混合几何和占用情况拼接图，对混合拼接信息子码流进行解码，得到混合拼接信息。接着，根据混合拼接信息对重建异构混合纹理拼接图进行拆分，例如使用区域解打包技术对重建异构混合纹理拼接图进行拆分，得到重建多视点视频纹理拼接图和重建点云纹理拼接图。根据混合拼接信息对重建异构混合几何和占用情况拼接图进行拆分，例如使用区域解打包技术，对重建异构混合几何和占用情况拼接图进行拆分，得到重建多视点视频几何拼接图、重建点云几何拼接图和重建点云占用情况拼接图。然后，根据重建多视点视频纹理拼接图和重建多视点视频几何拼接图，得到重建多视点视频拼接图，例如，使用TMIV解打包技术，对重建多视点视频纹理拼接图和重建多视点视频几何拼接图进行处理，得到重建多视点视频拼接图。根据重建点云纹理拼接图、重建点云几何拼接图和重建点云占用情况拼接图，得到重建点云拼接图，例如使用TMC2解打包技术，对重建点云纹理拼接图、重建点云几何拼接图和重建点云占用情况拼接图进行处理，得到重建点云拼接图。
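图17所示的拆分与重组过程可以概括为如下示意草图(Python)：先得到各单一属性重建拼接图，再按表达格式重新组合为重建多视点视频拼接图与重建点云拼接图。其中split_maps中的键名仅为说明所作的假设。

```python
def regroup_reconstructed_atlases(split_maps):
    """把拆分得到的各单一属性重建拼接图按表达格式重新组合(示意)。

    split_maps: 字典,键名为说明所作的假设:
      mv_texture / mv_geometry               -- 多视点视频的纹理、几何重建拼接图
      pc_texture / pc_geometry / pc_occupancy -- 点云的纹理、几何、占用情况重建拼接图
    """
    multiview_atlas = {                  # 重建多视点视频拼接图:纹理 + 几何
        "texture": split_maps["mv_texture"],
        "geometry": split_maps["mv_geometry"],
    }
    point_cloud_atlas = {                # 重建点云拼接图:纹理 + 几何 + 占用情况
        "texture": split_maps["pc_texture"],
        "geometry": split_maps["pc_geometry"],
        "occupancy": split_maps["pc_occupancy"],
    }
    return multiview_atlas, point_cloud_atlas
```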
由于本申请实施例的异构混合拼接图中的至少两个拼接图的表达格式不同,因此,为了提高解码准确性,本申请实施例的混合拼接信息包括第一标志,该第一标志用于指示异构混合拼接图中的第i个区域对应的表达格式类型,i为正整数。
此时,上述S702-A包括如下S702-A2和S702-A3的步骤:
S702-A2、针对重建异构混合拼接图中的第i个区域,从混合拼接信息中获取第i个区域对应的第一标志;
S702-A3、根据第i个区域对应的第一标志,将第i区域拆分为第i个区域对应的视觉媒体表达格式类型的重建拼接图。
以N个重建同构拼接图包括重建多视点视频拼接图和重建点云拼接图为例,上述S702-A3包括如下步骤:
S702-A31、若第一标志的取值为第一数值,则将第i区域拆分为重建多视点视频拼接图;
S702-A32、若第一标志的取值为第二数值,则将第i区域拆分为重建点云拼接图。
可选的,第一数值为0。
可选的,第二数值为1。
进一步的,下面通过具体实施例,对混合拼接信息中包括第一标志时的解码过程进行介绍。具体的,如图18所示,解码过程包括如下步骤:
步骤41,从压缩码流中,分别提取混合拼接信息子码流和视频压缩子码流。
步骤42,将混合拼接信息子码流解码后得到混合拼接信息。
步骤43,将视频压缩子码流输入到视频解码器,解码后输出重建异构混合拼接图。
步骤44,根据混合拼接信息中的第一标志,将重建异构混合拼接图,拆分并输出重建多视点视频拼接图和重建点云拼接图。
具体的,从混合拼接信息中获取第一标志pin_region_format_type_id[j][i]。
若确定pin_region_format_type_id[j][i]==0,则表示重建异构混合拼接图中的第i个区域(region)是属于多视点视频的,则将该第i个区域拆分并输出为重建多视点视频拼接图。
若确定pin_region_format_type_id[j][i]==1,则表示重建异构混合拼接图中的第i个区域(region)是属于点云的,则将该第i个区域拆分并输出为重建点云拼接图。
步骤45,重建多视点视频拼接图通过多视点视频解码生成重建多视点视频,重建点云拼接图通过点云解码生成重建点云。
本申请实施例中,通过在混合拼接信息中写入第一标志,解码端在解码时,可以根据混合拼接信息中的第一标志准确确定出异构混合拼接图中的当前区域的拼接图的表达格式类型,例如若第一标志的取值为第一数值时,则解码端确定异构混合拼接图中的当前区域的拼接图为多视点视频拼接图,若第一标志的取值为第二数值时,则解码端确定异构混合拼接图中的当前区域的拼接图为点云拼接图,进而使得解码端根据该第一标志实现准确解码。
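步骤44中根据第一标志拆分区域的逻辑可用如下示意草图(Python)表示，其中区域的切分方式与标志取值表的组织方式为说明所作的假设，标志语义与上文一致（0为多视点视频，1为点云）。

```python
def split_by_first_flag(rec_regions, region_format_type_id, j=0):
    """步骤44的示意:根据第一标志把各区域归入重建多视点视频拼接图或重建点云拼接图。

    rec_regions: 图集j中已按区域切出的重建图块列表(切分方式为假设)。
    region_format_type_id: 对应pin_region_format_type_id[j][i]取值的二维列表(组织方式为假设)。
    """
    rec_multiview, rec_pointcloud = [], []
    for i, region in enumerate(rec_regions):
        if region_format_type_id[j][i] == 0:      # 第一数值:该区域属于多视点视频
            rec_multiview.append(region)
        elif region_format_type_id[j][i] == 1:    # 第二数值:该区域属于点云
            rec_pointcloud.append(region)
    return rec_multiview, rec_pointcloud
```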
在一些实施例中,混合拼接信息包括第二标志,该第二标志用于指示当前混合拼接图是否为异构混合拼接图。
可选的,第二标志位于混合拼接信息的单元头中。
此时,在执行上述S702-A2中从所述混合拼接信息中获取所述第i个区域对应的第一标志之前,本申请实施例首先从混合拼接信息中获取第二标志,并根据该第二标志,确定混合拼接信息中是否存在第一标志。
例如,解码端从混合拼接信息中获得第二标志,若该第二标志的取值为预设值,则说明当前混合拼接图为异构混合拼接图,此时,解码端从混合拼接信息中读取第i个区域对应的第一标志,并根据该第一标志的取值,确定第i个区域的拼接图对应的表达格式类型,例如第一标志的取值为第一数值时,确定第i个区域为多视点视频拼接图,若第一标志的取值为第二数值时,则确定第i个区域为点云拼接图。
在一些实施例中,若第二标志的取值不为所述预设值时,则说明当前混合拼接图不是异构混合拼接图,此时解码端跳过从混合拼接信息中获取第i个区域对应的第一标志的步骤。
本申请实施例中，在v3c unit header中写入第二标志，使得解码端在v3c unit header级别就可以根据v3c unit type来判断当前拼接图是否为异构混合拼接图。若第二标志v3c unit type=V3C_MVD，则表明当前拼接图是异构混合拼接图，后续再通过混合拼接信息(mixed information)中的第一标志min_region_format_type_id[j][i]来区分异构混合拼接图的某个region是多视点视频拼接图还是点云拼接图。
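第二标志的门控逻辑可以概括为如下示意草图(Python)：只有第二标志指示当前拼接图为异构混合拼接图时，才继续解析第一标志，否则跳过。其中以字符串"V3C_MVD"表示预设值以及mixed_info的组织方式均为说明所作的假设。

```python
def parse_first_flags_if_heterogeneous(v3c_unit_type, mixed_info, j=0):
    """示意:仅当第二标志指示异构混合拼接图时才解析第一标志,否则跳过。"""
    if v3c_unit_type != "V3C_MVD":
        return None                                   # 不是异构混合拼接图:跳过第一标志
    # min_region_format_type_id[j][i]: 0表示多视点视频,1表示点云
    return list(mixed_info["min_region_format_type_id"][j])
```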
进一步的,下面通过具体实施例,对混合拼接信息中包括第二标志时的解码过程进行介绍。具体的,如图19所示,解码过程包括如下步骤:
步骤51,解析压缩码流,得到第二标志v3c unit type,若v3c unit type=V3C_MVD,则表明当前混合拼接图为异构混合拼接图,接着,从压缩码流中分别提取混合拼接信息子码流和视频压缩子码流。
步骤52,将混合拼接信息子码流解码后得到混合拼接信息。
步骤53,将视频压缩子码流输入到视频解码器,解码后输出重建异构混合拼接图。
步骤54,根据混合拼接信息中的第一标志,将重建异构混合拼接图,拆分并输出重建多视点视频拼接图和重建点云拼接图。
具体的,解码端从混合拼接信息中获取第一标志min_region_format_type_id[j][i]。
若确定min_region_format_type_id[j][i]==0,则表示重建异构混合拼接图的第i个区域(region)是属于多视点视频的,则将该第i个区域拆分并输出为重建多视点视频拼接图。
若确定min_region_format_type_id[j][i]==1,则表示重建异构混合拼接图的第i个区域(region)是属于点云的,则将该第i个区域拆分并输出为重建点云拼接图。
步骤55,重建多视点视频拼接图通过多视点视频解码生成重建多视点视频,重建点云拼接图通过点云解码生成重建点云。
本申请实施例,通过设置更高一级的第二标志,用于指示当前混合拼接图是否为本申请实施例提出的异构混合拼接图,这样,解码端在解码时,首先对第二标志进行解码,若第二标志指示当前混合拼接图为异构混合拼接图时,解码端再对第一标志进行解码,以确定异构混合拼接图的当前区域的拼接图的表达格式类型,进而实现准确解码。
在一些实施例中,混合拼接信息包括第三标志,该第三标志用于指示当前混合拼接图是否为异构混合拼接图,以及属于哪一种异构混合拼接图。
异构混合拼接图包括如下几种类型:异构混合占用情况拼接图、异构混合几何拼接图、异构混合属性拼接图、异构混合打包拼接图。
可选的,第三标志位于混合拼接信息的单元头中。
此时,在执行上述S702-A2中从所述混合拼接信息中获取所述第i个区域对应的第一标志之前,本申请实施例首先从混合拼接信息中获取第三标志,并根据该第三标志,确定混合拼接信息中是否存在第一标志。
例如,解码端从混合拼接信息中获得第三标志,若该第三标志的取值为第一预设值、第二预设值、第三预设值或第四预设值时,则从混合拼接信息中获取第i个区域对应的第一标志,第一预设值用于指示当前混合拼接图为异构混合占用情况拼接图,第二预设值用于指示当前混合拼接图为异构混合几何拼接图,第三预设值用于指示当前混合拼接图为异构混合属性拼接图,第四预设值用于指示当前混合拼接图为异构混合打包拼接图。接着,根据该第一标志的取值,确定第i个区域的拼接图对应的表达格式类型,例如第一标志的取值为第一数值时,确定第i个区域为多视点视频拼接图,若第一标志的取值为第二数值时,则确定第i个区域为点云拼接图。
在一些实施例中,若第三标志的取值不为第一预设值、第二预设值、第三预设值或第四预设值时,则说明当前混合拼接图不是异构混合拼接图,此时解码端跳过从混合拼接信息中获取第i个区域对应的第一标志的步骤。
本申请实施例中，在v3c unit header中写入第三标志，使得解码端在v3c unit header级别就可以根据v3c unit type来判断当前拼接图是否为异构混合拼接图，以及异构混合拼接图的类型。若第三标志v3c unit type为V3C_MAVD、V3C_MGVD、V3C_MOVD或V3C_MPVD，则表明当前拼接图是异构混合拼接图，后续再通过混合拼接信息(mixed information)中的第一标志min_region_format_type_id[j][i]来区分异构混合拼接图的某个region是多视点视频拼接图还是点云拼接图。
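与第二标志类似，第三标志既指示是否为异构混合拼接图，也指示其类型，相应的门控逻辑可用如下示意草图(Python)表示，其中常量写法与返回形式均为说明所作的假设。

```python
def parse_with_third_flag(v3c_unit_type, mixed_info, j=0):
    """示意:第三标志既指示是否为异构混合拼接图,也指示其具体类型。"""
    kinds = {
        "V3C_MOVD": "异构混合占用情况拼接图",   # 第一预设值
        "V3C_MGVD": "异构混合几何拼接图",       # 第二预设值
        "V3C_MAVD": "异构混合属性拼接图",       # 第三预设值
        "V3C_MPVD": "异构混合打包拼接图",       # 第四预设值
    }
    if v3c_unit_type not in kinds:
        return None, None                       # 不是异构混合拼接图:跳过第一标志
    return kinds[v3c_unit_type], list(mixed_info["min_region_format_type_id"][j])
```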
进一步的,下面通过具体实施例,对混合拼接信息中包括第三标志时的解码过程进行介绍。具体的,如图20所示,解码过程包括如下步骤:
步骤61,解析压缩码流,得到第三标志v3c unit type,若v3c unit type=V3C_MPVD,则表明当前拼接图为异构混合打包拼接图,从压缩码流中分别提取混合拼接信息子码流和异构混合打包拼接图视频压缩子码流。
步骤62,将混合拼接信息子码流解码后得到混合拼接信息。
步骤63,将视频压缩子码流输入到视频解码器,解码后输出重建异构混合拼接图。
步骤64,根据混合拼接信息中的第一标志,将重建异构混合拼接图,拆分并输出重建多视点视频拼接图和重建点云拼接图。
具体的,解码端从混合拼接信息中获取第一标志min_region_format_type_id[j][i]。
若确定min_region_format_type_id[j][i]==0,则表示重建异构混合拼接图的第i个区域(region)是属于多视点视频的,则将该第i个区域拆分并输出为重建多视点视频拼接图。
若确定min_region_format_type_id[j][i]==1,则表示重建异构混合拼接图的第i个区域(region)是属于点云的,则将该第i个区域拆分并输出为重建点云拼接图。
步骤65,重建多视点视频拼接图通过多视点视频解码生成重建多视点视频,重建点云拼接图通过点云解码生成重建点云。
本申请实施例,通过设置更高一级的第三标志,用于指示当前混合拼接图是否为本申请实施例提出的异构混合拼接图,以及属于哪一种异构混合拼接图,这样,解码端在解码时,首先对第三标志进行解码,若第三标志指示当前混合拼接图为异构混合拼接图时,解码端再对第一标志进行解码,以确定异构混合拼接图的当前区域的拼接图的表达格式类型,进而实现准确解码。
本申请实施例提供的解码方法,解码端通过解码码流,得到重建异构混合拼接图;对重建异构混合拼接图进行拆分,得到N个重建拼接图,N为大于1的正整数;对N个重建拼接图分别进行解码,得到N个重建视觉媒体内容,N个重建视觉媒体内容中至少两个重建视觉媒体内容对应的表达格式不同。即本申请实施例,将不同表达格式的多个拼接图拼接在一张异构混合拼接图中,这样在解码时,保留来自不同表达格式的数据(点云等)的渲染优点,提高图像的合成质量的同时,可以尽量减少所需要调用的HEVC,VVC,AVC,AVS等二维视频解码器的个数,减少了解码代价,提高易用性。
应理解，图14至图20仅为本申请的示例，不应理解为对本申请的限制。
以上结合附图详细描述了本申请的优选实施方式，但是，本申请并不限于上述实施方式中的具体细节，在本申请的技术构思范围内，可以对本申请的技术方案进行多种简单变型，这些简单变型均属于本申请的保护范围。例如，在上述具体实施方式中所描述的各个具体技术特征，在不矛盾的情况下，可以通过任何合适的方式进行组合，为了避免不必要的重复，本申请对各种可能的组合方式不再另行说明。又例如，本申请的各种不同的实施方式之间也可以进行任意组合，只要其不违背本申请的思想，其同样应当视为本申请所公开的内容。
还应理解,在本申请的各种方法实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。另外,本申请实施例中,术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系。具体地,A和/或B可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本申请中字符“/”,一般表示前后关联对象是一种“或”的关系。
上文结合图6至图20，详细描述了本申请的方法实施例，下文结合图21至图23详细描述本申请的装置实施例。
图21为本申请一实施例提供的编码装置的示意性框图，该编码装置10应用于上述视频编码端。
如图21所示,编码装置10包括:
第一拼接单元11,用于对多个视觉媒体内容进行处理,得到N个同构拼接图,所述多个视觉媒体内容中至少两个视觉媒体内容对应的表达格式不同,所述N为大于1的正整数;
第二拼接单元12,用于将所述N个同构拼接图进行拼接,生成异构混合拼接图;
编码单元13,用于对所述异构混合拼接图进行编码,得到码流。
在一些实施例中,编码单元13,具体用于调用视频编码器,对所述异构混合拼接图进行视频编码,得到视频压缩子码流;对所述异构混合拼接图的混合拼接信息进行编码,得到混合拼接信息子码流;将所述视频压缩子码流和所述混合拼接信息子码流写入所述码流。
在一些实施例中,异构混合拼接图包括多属性异构混合拼接图和单属性异构混合拼接图。
在一些实施例中,所述N个同构拼接图包括多视点视频拼接图、点云拼接图和网格拼接图中的至少两个。
在一些实施例中，第二拼接单元12，具体用于对至少第一表达格式的单一属性拼接图和第二表达格式的单一属性拼接图进行拼接，得到所述异构混合拼接图，所述第一表达格式和所述第二表达格式均为多视点视频、点云和网格中的任意一个，且所述第一表达格式和所述第二表达格式不同。
在一些实施例中,若所述第一表达格式为多视点视频,所述第二表达格式为点云,所述第二拼接单元12,具体用于将多视点视频纹理拼接图和点云纹理拼接图进行拼接,得到异构混合纹理拼接图;或者,将多视点视频几何拼接图、点云几何拼接图和点云占用情况拼接图进行拼接,得到异构混合几何和占用情况拼接图。
在一些实施例中,所述混合拼接信息包括第一标志,所述第一标志用于指示所述异构混合拼接图中的第i个区域对应的表达格式类型,所述i为正整数。
在一些实施例中，若所述N个拼接图包括多视点视频拼接图和点云拼接图，所述第二拼接单元12，还用于：若所述第i个区域的拼接图为所述多视点视频拼接图，则将所述第一标志的值置为第一数值；若所述第i个区域的拼接图为所述点云拼接图，则将所述第一标志的值置为第二数值。
在一些实施例中,所述混合拼接信息包括第二标志,所述第二标志用于指示当前混合拼接图是否为异构混合拼接图。
在一些实施例中，所述第二拼接单元12，还用于若所述当前混合拼接图为所述异构混合拼接图，则将所述第二标志置为预设值。
在一些实施例中,所述第二拼接单元12,还用于若确定所述第二标志的值为所述预设值,则在所述混合拼接信息中写入第一标志。
在一些实施例中,所述第二拼接单元12,还用于若确定所述第二标志的值不为所述预设值时,则跳过在所述混合拼接信息中写入第一标志。
可选的,所述第二标志位于所述混合拼接信息的单元头中。
在一些实施例中,若所述N个拼接图包括多视点视频拼接图和点云拼接图,所述第一拼接单元11,具体用于对获取的多视点视频进行投影和去冗余处理后,将不重复像素点连通成视频子块,且将所述视频子块拼接成所述多视点视频拼接图;对获取的点云进行平行投影,将投影面中的连通点组成点云子块,且将所述点云子块拼接成所述点云拼接图。
可选的,所述N个视觉媒体内容为同一个三维空间中同时呈现的媒体内容。
在一些实施例中，所述混合拼接信息包括第三标志，所述第三标志用于指示当前混合拼接图是否为异构混合拼接图，以及属于哪一种异构混合拼接图。
在一些实施例中,所述异构混合拼接图包括如下几种类型:异构混合占用情况拼接图、异构混合几何拼接图、异构混合属性拼接图、异构混合打包拼接图。
在一些实施例中,所述第二拼接单元12,具体用于若所述当前混合拼接图为所述异构混合占用情况拼接图,则将所述第三标志置为第一预设值;若所述当前混合拼接图为所述异构混合几何拼接图,则将所述第三标志置为第二预设值;若所述当前混合拼接图为所述异构混合属性拼接图,则将所述第三标志置为第三预设值;若所述当前混合拼接图为所述异构混合打包拼接图,则将所述第三标志置为第四预设值。
在一些实施例中,第二拼接单元12,还用于若确定所述第三标志指示所述当前混合拼接图为异构混合拼接图时,则在所述混合拼接信息中写入所述第一标志。
在一些实施例中,第二拼接单元12,还用于若确定所述第三标志指示所述当前混合拼接图不是异构混合拼接图时,则跳过在所述混合拼接信息中写入所述第一标志。
可选的，所述第三标志位于所述混合拼接信息的单元头中。
应理解,装置实施例与方法实施例可以相互对应,类似的描述可以参照方法实施例。为避免重复,此处不再赘述。具体地,图21所示的装置10可以执行本申请实施例的编码端的编码方法,并且装置10中的各个单元的前述和其它操作和/或功能分别为了实现上述编码端的编码方法等各个方法中的相应流程,为了简洁,在此不再赘述。
图22是本申请一实施例提供的解码装置的示意性框图,该解码装置应用于上述解码端。
如图22所示,该解码装置20可以包括:
解码单元21,用于解码码流,得到重建异构混合拼接图;
第一拆分单元22,用于对所述重建异构混合拼接图进行拆分,得到N个重建同构拼接图,所述N为大于1的正整数;
处理单元23，用于根据所述N个重建同构拼接图，得到多个重建视觉媒体内容，所述多个重建视觉媒体内容中至少两个重建视觉媒体内容对应的表达格式不同。
在一些实施例中,所述码流包括视频压缩子码流,所述解码单元21,具体用于调用视频解码器对所述视频压缩子码流进行解码,得到所述重建异构混合拼接图。
在一些实施例中,所述码流还包括混合拼接信息子码流,所述解码单元21,还用于解码所述混合拼接信息子码流,得到混合拼接信息;
对应的,第一拆分单元22,具体用于根据所述混合拼接信息,对所述重建异构混合拼接图进行拆分,得到所述N个重建同构拼接图。
在一些实施例中,所述重建异构混合拼接图包括多属性重建异构混合拼接图和单属性重建异构混合拼接图。
在一些实施例中,所述N个重建同构拼接图包括多视点视频重建拼接图、点云重建拼接图和网格重建拼接图中的至少两个。
在一些实施例中，第一拆分单元22，具体用于根据所述混合拼接信息，对所述重建异构混合拼接图进行拆分，至少得到第一表达格式的单一属性重建拼接图和第二表达格式的单一属性重建拼接图，所述第一表达格式和所述第二表达格式均为多视点视频、点云和网格中的任意一个，且所述第一表达格式和所述第二表达格式不同。
在一些实施例中，若所述第一表达格式为多视点视频，所述第二表达格式为点云，则第一拆分单元22，具体用于若所述重建异构混合拼接图为重建异构混合纹理拼接图，则根据所述混合拼接信息，对所述重建异构混合纹理拼接图进行拆分，得到多视点视频纹理重建拼接图和点云纹理重建拼接图；若所述重建异构混合拼接图为重建异构混合几何和占用情况拼接图，则根据所述混合拼接信息，对所述重建异构混合几何和占用情况拼接图进行拆分，得到多视点视频几何重建拼接图、点云几何重建拼接图和点云占用情况重建拼接图。
在一些实施例中,所述混合拼接信息包括第一标志,所述第一标志用于指示所述异构混合拼接图中的第i个区域对应的表达格式类型,所述i为正整数。
在一些实施例中,所述第一拆分单元22,具体用于针对所述重建异构混合拼接图中的第i个区域,从所述混合拼接信息中获取所述第i个区域对应的第一标志;根据所述第i个区域对应的第一标志,将所述第i区域拆分为所述第i个区域对应的视觉媒体表达格式类型的重建拼接图。
在一些实施例中,若所述N个重建同构拼接图包括重建多视点视频拼接图和重建点云拼接图,所述第一拆分单元22,具体用于若所述第一标志的取值为第一数值,则将所述第i区域拆分为所述重建多视点视频拼接图;若所述第一标志的取值为第二数值,则将所述第i区域拆分为所述重建点云拼接图。
在一些实施例中,所述混合拼接信息包括第二标志,所述第二标志用于指示当前混合拼接图是否为异构混合拼接图。
在一些实施例中,所述从所述混合拼接信息中获取所述第i个区域对应的第一标志之前,所述第一拆分单元22,还用于从所述混合拼接信息中获取所述第二标志;若所述第二标志的取值为预设值时,则从所述混合拼接信息中获取所述第i个区域对应的第一标志,所述预设值用于指示当前混合拼接图为异构混合拼接图。
在一些实施例中,所述第一拆分单元22,还用于若所述第二标志的取值不为所述预设值时,则跳过从所述混合拼接信息中获取所述第i个区域对应的第一标志的步骤。
可选的,所述第二标志位于所述混合拼接信息的单元头中。
在一些实施例中,所述混合拼接信息包括第三标志,所述第三标志用于指示当前混合拼接图是否为异构混合拼接图,以及属于哪一种异构混合拼接图。
在一些实施例中,所述异构混合拼接图包括如下几种类型:异构混合占用情况拼接图、异构混合几何拼接图、异构混合属性拼接图、异构混合打包拼接图。
在一些实施例中,所述从所述混合拼接信息中获取所述第i个区域对应的第一标志之前,所述第一拆分单元22,还用于从所述混合拼接信息中获取所述第三标志;若所述第三标志的取值为第一预设值、第二预设值、第三预设值或第四预设值时,则从所述混合拼接信息中获取所述第i个区域对应的第一标志,所述第一预设值用于指示所述当前混合拼接图为所述异构混合占用情况拼接图,所述第二预设值用于指示所述当前混合拼接图为所述异构混合几何拼接图,所述第三预设值用于指示所述当前混合拼接图为所述异构混合属性拼接图,所述第四预设值用于指示所述当前混合拼接图为所述异构混合打包拼接图。
在一些实施例中，所述第一拆分单元22，还用于若所述第三标志的取值不为所述第一预设值、第二预设值、第三预设值或第四预设值时，则跳过从所述混合拼接信息中获取所述第i个区域对应的第一标志的步骤。
可选的,所述第三标志位于所述混合拼接信息的单元头中。
可选的,所述N个视觉媒体内容为同一个三维空间中同时呈现的媒体内容。
应理解，装置实施例与方法实施例可以相互对应，类似的描述可以参照方法实施例。为避免重复，此处不再赘述。具体地，图22所示的装置20可以对应于执行本申请实施例的解码端的解码方法中的相应主体，并且装置20中的各个单元的前述和其它操作和/或功能分别为了实现解码端的解码方法等各个方法中的相应流程，为了简洁，在此不再赘述。
上文中结合附图从功能单元的角度描述了本申请实施例的装置和系统。应理解,该功能单元可以通过硬件形式实现,也可以通过软件形式的指令实现,还可以通过硬件和软件单元组合实现。具体地,本申请实施例中的方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路和/或软件形式的指令完成,结合本申请实施例公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件单元组合执行完成。可选地,软件单元可以位于随机存储器,闪存、只读存储器、可编程只读存储器、电可擦写可编程存储器、寄存器等本领域的成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法实施例中的步骤。
图23是本申请实施例提供的电子设备的示意性框图。
如图23所示,该电子设备30可以为本申请实施例所述的视频编码器,或者视频解码器,该电子设备30可包括:
存储器33和处理器32，该存储器33用于存储计算机程序34，并将该计算机程序34传输给该处理器32。换言之，该处理器32可以从存储器33中调用并运行计算机程序34，以实现本申请实施例中的方法。
例如,该处理器32可用于根据该计算机程序34中的指令执行上述方法200中的步骤。
在本申请的一些实施例中,该处理器32可以包括但不限于:
通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等等。
在本申请的一些实施例中,该存储器33包括但不限于:
易失性存储器和/或非易失性存储器。其中，非易失性存储器可以是只读存储器（Read-Only Memory，ROM）、可编程只读存储器（Programmable ROM，PROM）、可擦除可编程只读存储器（Erasable PROM，EPROM）、电可擦除可编程只读存储器（Electrically EPROM，EEPROM）或闪存。易失性存储器可以是随机存取存储器（Random Access Memory，RAM），其用作外部高速缓存。通过示例性但不是限制性说明，许多形式的RAM可用，例如静态随机存取存储器（Static RAM，SRAM）、动态随机存取存储器（Dynamic RAM，DRAM）、同步动态随机存取存储器（Synchronous DRAM，SDRAM）、双倍数据速率同步动态随机存取存储器（Double Data Rate SDRAM，DDR SDRAM）、增强型同步动态随机存取存储器（Enhanced SDRAM，ESDRAM）、同步连接动态随机存取存储器（synch link DRAM，SLDRAM）和直接内存总线随机存取存储器（Direct Rambus RAM，DR RAM）。
在本申请的一些实施例中,该计算机程序34可以被分割成一个或多个单元,该一个或者多个单元被存储在该存储器33中,并由该处理器32执行,以完成本申请提供的方法。该一个或多个单元可以是能够完成特定功能的一系列计算机程序指令段,该指令段用于描述该计算机程序34在该电子设备30中的执行过程。
如图23所示,该电子设备30还可包括:
收发器33,该收发器33可连接至该处理器32或存储器33。
其中,处理器32可以控制该收发器33与其他设备进行通信,具体地,可以向其他设备发送信息或数据,或接收其他设备发送的信息或数据。收发器33可以包括发射机和接收机。收发器33还可以进一步包括天线,天线的数量可以为一个或多个。
应当理解,该电子设备30中的各个组件通过总线系统相连,其中,总线系统除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。
本申请还提供了一种计算机存储介质,其上存储有计算机程序,该计算机程序被计算机执行时使得该计算机能够执行上述方法实施例的方法。或者说,本申请实施例还提供一种包含指令的计算机程序产品,该指令被计算机执行时使得计算机执行上述方法实施例的方法。
本申请还提供了一种码流,该码流是根据上述编码方法生成的,可选的,该码流中包括上述第一标志,或者包括第一标志和第二标志。
当使用软件实现时，可以全部或部分地以计算机程序产品的形式实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行该计算机程序指令时，全部或部分地产生按照本申请实施例所述的流程或功能。该计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。该计算机指令可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一个计算机可读存储介质传输，例如，该计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线（例如同轴电缆、光纤、数字用户线（digital subscriber line，DSL））或无线（例如红外、无线、微波等）方式向另一个网站站点、计算机、服务器或数据中心进行传输。该计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。该可用介质可以是磁性介质（例如，软盘、硬盘、磁带）、光介质（例如数字视频光盘（digital video disc，DVD））、或者半导体介质（例如固态硬盘（solid state disk，SSD））等。
本领域普通技术人员可以意识到,结合本申请中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,该单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。例如,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
以上内容,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以该权利要求的保护范围为准。

Claims (45)

  1. 一种编码方法,其特征在于,包括:
    对多个视觉媒体内容进行处理,得到N个同构拼接图,所述多个视觉媒体内容中至少两个视觉媒体内容对应的表达格式不同,所述N为大于1的正整数;
    将所述N个同构拼接图进行拼接,生成异构混合拼接图;
    对所述异构混合拼接图进行编码,得到码流。
  2. 根据权利要求1所述的方法,其特征在于,所述对所述异构混合拼接图进行编码,得到码流,包括:
    调用视频编码器,对所述异构混合拼接图进行视频编码,得到视频压缩子码流;
    对所述异构混合拼接图的混合拼接信息进行编码,得到混合拼接信息子码流;
    将所述视频压缩子码流和所述混合拼接信息子码流写入所述码流。
  3. 根据权利要求2所述的方法,其特征在于,所述异构混合拼接图包括多属性异构混合拼接图和单属性异构混合拼接图。
  4. 根据权利要求3所述的方法,其特征在于,所述N个同构拼接图包括多视点视频拼接图、点云拼接图和网格拼接图中的至少两个。
  5. 根据权利要求4所述的方法,其特征在于,所述将所述N个同构拼接图进行拼接,生成异构混合拼接图,包括:
    对至少第一表达格式的单一属性拼接图和第二表达格式的单一属性拼接图进行拼接，得到所述异构混合拼接图，所述第一表达格式和所述第二表达格式均为多视点视频、点云和网格中的任意一个，且所述第一表达格式和所述第二表达格式不同。
  6. 根据权利要求5所述的方法,其特征在于,若所述第一表达格式为多视点视频,所述第二表达格式为点云,则所述对至少第一表达格式的单一属性拼接图和第二表达格式的单一属性拼接图进行拼接,得到所述异构混合拼接图,包括:
    将多视点视频纹理拼接图和点云纹理拼接图进行拼接,得到异构混合纹理拼接图;或者,
    将多视点视频几何拼接图、点云几何拼接图和点云占用情况拼接图进行拼接,得到异构混合几何和占用情况拼接图。
  7. 根据权利要求2-6任一项所述的方法,其特征在于,所述混合拼接信息包括第一标志,所述第一标志用于指示所述异构混合拼接图中的第i个区域对应的表达格式类型,所述i为正整数。
  8. 根据权利要求7所述的方法,其特征在于,若所述N个拼接图包括多视点视频拼接图和点云拼接图,所述方法还包括:
    若所述第i个区域的拼接图为所述多视点视频拼接图,则将所述第一标志的值置为第一数值;
    若所述第i个区域的拼接图为所述点云拼接图,则将所述第一标志的值置为第二数值。
  9. 根据权利要求7所述的方法,其特征在于,所述混合拼接信息包括第二标志,所述第二标志用于指示当前混合拼接图是否为异构混合拼接图。
  10. 根据权利要求9所述的方法,其特征在于,所述方法还包括:
    若所述当前混合拼接图为所述异构混合拼接图,则将所述第二标志置为预设值。
  11. 根据权利要求10所述的方法,其特征在于,所述方法还包括:
    若确定所述第二标志的值为所述预设值,则在所述混合拼接信息中写入所述第一标志。
  12. 根据权利要求10所述的方法,其特征在于,所述方法还包括:
    若确定所述第二标志的值不为所述预设值时,则跳过在所述混合拼接信息中写入所述第一标志。
  13. 根据权利要求9所述的方法,其特征在于,所述第二标志位于所述混合拼接信息的单元头中。
  14. 根据权利要求7所述的方法,其特征在于,所述混合拼接信息包括第三标志,所述第三标志用于指示当前混合拼接图是否为异构混合拼接图,以及属于哪一种异构混合拼接图。
  15. 根据权利要求14所述的方法,其特征在于,所述异构混合拼接图包括如下几种类型:异构混合占用情况拼接图、异构混合几何拼接图、异构混合属性拼接图、异构混合打包拼接图。
  16. 根据权利要求15所述的方法,其特征在于,所述方法还包括:
    若所述当前混合拼接图为所述异构混合占用情况拼接图,则将所述第三标志置为第一预设值;
    若所述当前混合拼接图为所述异构混合几何拼接图,则将所述第三标志置为第二预设值;
    若所述当前混合拼接图为所述异构混合属性拼接图,则将所述第三标志置为第三预设值;
    若所述当前混合拼接图为所述异构混合打包拼接图,则将所述第三标志置为第四预设值。
  17. 根据权利要求14所述的方法,其特征在于,所述方法还包括:
    若确定所述第三标志指示所述当前混合拼接图为异构混合拼接图时,则在所述混合拼接信息中写入所述第一标志。
  18. 根据权利要求17所述的方法,其特征在于,所述方法还包括:
    若确定所述第三标志指示所述当前混合拼接图不是异构混合拼接图时,则跳过在所述混合拼接信息中写入所述第一标志。
  19. 根据权利要求14所述的方法,其特征在于,所述第三标志位于所述混合拼接信息的单元头中。
  20. 根据权利要求1-6任一项所述的方法,其特征在于,所述N个视觉媒体内容为同一个三维空间中同时呈现的媒体内容。
  21. 一种图像解码方法,其特征在于,包括:
    解码码流,得到重建异构混合拼接图;
    对所述重建异构混合拼接图进行拆分,得到N个重建同构拼接图,所述N为大于1的正整数;
    根据所述N个重建同构拼接图，得到多个重建视觉媒体内容，所述多个重建视觉媒体内容中至少两个重建视觉媒体内容对应的表达格式不同。
  22. 根据权利要求21所述的方法,其特征在于,所述码流包括视频压缩子码流,所述解码码流,得到重建异构混合拼接图,包括:
    调用视频解码器对所述视频压缩子码流进行解码,得到所述重建异构混合拼接图。
  23. 根据权利要求22所述的方法,其特征在于,所述码流还包括混合拼接信息子码流,所述方法还包括:
    解码所述混合拼接信息子码流,得到混合拼接信息;
    所述对所述重建异构混合拼接图进行拆分,得到N个重建同构拼接图,包括:
    根据所述混合拼接信息,对所述重建异构混合拼接图进行拆分,得到所述N个重建同构拼接图。
  24. 根据权利要求23所述的方法,其特征在于,所述重建异构混合拼接图包括多属性重建异构混合拼接图和单属性重建异构混合拼接图。
  25. 根据权利要求24所述的方法,其特征在于,所述N个重建同构拼接图包括多视点视频重建拼接图、点云重建拼接图和网格重建拼接图中的至少两个。
  26. 根据权利要求25所述的方法,其特征在于,所述根据所述混合拼接信息,对所述重建异构混合拼接图进行拆分,得到所述N个重建同构拼接图,包括:
    根据所述混合拼接信息，对所述重建异构混合拼接图进行拆分，至少得到第一表达格式的单一属性重建拼接图和第二表达格式的单一属性重建拼接图，所述第一表达格式和所述第二表达格式均为多视点视频、点云和网格中的任意一个，且所述第一表达格式和所述第二表达格式不同。
  27. 根据权利要求26所述的方法,其特征在于,若所述第一表达格式为多视点视频,所述第二表达格式为点云,则所述根据所述混合拼接信息,对所述重建异构混合拼接图进行拆分,至少得到第一表达格式的单一属性重建拼接图和第二表达格式的单一属性重建拼接图,包括:
    若所述重建异构混合拼接图为重建异构混合纹理拼接图,则根据所述混合拼接信息,对所述重建异构混合纹理拼接图进行拆分,得到多视点视频纹理重建拼接图和点云纹理重建拼接图;
    若所述重建异构混合拼接图为重建异构混合几何和占用情况拼接图，则根据所述混合拼接信息，对所述重建异构混合几何和占用情况拼接图进行拆分，得到多视点视频几何重建拼接图、点云几何重建拼接图和点云占用情况重建拼接图。
  28. 根据权利要求21-27任一项所述的方法,其特征在于,所述混合拼接信息包括第一标志,所述第一标志用于指示所述重建异构混合拼接图中的第i个区域对应的表达格式类型,所述i为正整数。
  29. 根据权利要求28所述的方法,其特征在于,所述根据所述混合拼接信息,对所述重建异构混合拼接图进行拆分,得到所述N个重建同构拼接图,包括:
    针对所述重建异构混合拼接图中的第i个区域,从所述混合拼接信息中获取所述第i个区域对应的第一标志;
    根据所述第i个区域对应的第一标志,将所述第i区域拆分为所述第i个区域对应的视觉媒体表达格式类型的重建拼接图。
  30. 根据权利要求29所述的方法,其特征在于,若所述N个重建同构拼接图包括重建多视点视频拼接图和重建点云拼接图,所述根据所述第i个区域对应的第一标志,将所述第i区域拆分为所述第i个区域对应的视觉媒体表达格式类型的重建拼接图,包括:
    若所述第一标志的取值为第一数值,则将所述第i区域拆分为所述重建多视点视频拼接图;
    若所述第一标志的取值为第二数值,则将所述第i区域拆分为所述重建点云拼接图。
  31. 根据权利要求29所述的方法,其特征在于,所述混合拼接信息包括第二标志,所述第二标志用于指示当前混合拼接图是否为异构混合拼接图。
  32. 根据权利要求31所述的方法,其特征在于,所述从所述混合拼接信息中获取所述第i个区域对应的第一标志之前,所述方法还包括:
    从所述混合拼接信息中获取所述第二标志;
    所述从所述混合拼接信息中获取所述第i个区域对应的第一标志,包括:
    若所述第二标志的取值为预设值时,则从所述混合拼接信息中获取所述第i个区域对应的第一标志,所述预设值用于指示当前混合拼接图为异构混合拼接图。
  33. 根据权利要求32所述的方法,其特征在于,所述方法还包括:
    若所述第二标志的取值不为所述预设值时,则跳过从所述混合拼接信息中获取所述第i个区域对应的第一标志的步骤。
  34. 根据权利要求31所述的方法,其特征在于,所述第二标志位于所述混合拼接信息的单元头中。
  35. 根据权利要求29所述的方法,其特征在于,所述混合拼接信息包括第三标志,所述第三标志用于指示当前混合拼接图是否为异构混合拼接图,以及属于哪一种异构混合拼接图。
  36. 根据权利要求35所述的方法,其特征在于,所述异构混合拼接图包括如下几种类型:异构混合占用情况拼接图、异构混合几何拼接图、异构混合属性拼接图、异构混合打包拼接图。
  37. 根据权利要求36所述的方法,其特征在于,所述从所述混合拼接信息中获取所述第i个区域对应的第一标志之前,所述方法还包括:
    从所述混合拼接信息中获取所述第三标志;
    所述从所述混合拼接信息中获取所述第i个区域对应的第一标志,包括:
    若所述第三标志的取值为第一预设值、第二预设值、第三预设值或第四预设值时,则从所述混合拼接信息中获取所述第i个区域对应的第一标志,所述第一预设值用于指示所述当前混合拼接图为所述异构混合占用情况拼接图,所述第二预设值用于指示所述当前混合拼接图为所述异构混合几何拼接图,所述第三预设值用于指示所述当前混合拼接图为所述异构混合属性拼接图,所述第四预设值用于指示所述当前混合拼接图为所述异构混合打包拼接图。
  38. 根据权利要求37所述的方法,其特征在于,所述方法还包括:
    若所述第三标志的取值不为所述第一预设值、第二预设值、第三预设值或第四预设值时,则跳过从所述混合拼接信息中获取所述第i个区域对应的第一标志的步骤。
  39. 根据权利要求35所述的方法,其特征在于,所述第三标志位于所述混合拼接信息的单元头中。
  40. 根据权利要求21-27任一项所述的方法,其特征在于,所述N个视觉媒体内容为同一个三维空间中同时呈现的媒体内容。
  41. 一种编码装置,其特征在于,包括:
    第一拼接单元,用于对多个视觉媒体内容分别进行处理,得到N个同构拼接图,所述多个视觉媒体内容中至少两个视觉媒体内容对应的表达格式不同,所述N为大于1的正整数;
    第二拼接单元,用于将所述N个同构拼接图进行拼接,生成异构混合拼接图;
    编码单元,用于对所述异构混合拼接图进行编码,得到码流。
  42. 一种解码装置,其特征在于,包括:
    解码单元,用于解码码流,得到重建异构混合拼接图;
    第一拆分单元,用于对所述重建异构混合拼接图进行拆分,得到N个重建同构拼接图,所述N为大于1的正整数;
    处理单元,用于根据所述N个重建同构拼接图,得到多个重建视觉媒体内容,所述多个重建视觉媒体内容中至少两个重建视觉媒体内容对应的表达格式不同。
  43. 一种电子设备,其特征在于,包括处理器和存储器;
    所述存储器用于存储计算机程序；
    所述处理器用于调用并运行所述存储器中存储的计算机程序,以实现上述权利要求1至20或21至40任一项所述的方法。
  44. 一种计算机可读存储介质,其特征在于,用于存储计算机程序;
    所述计算机程序使得计算机执行如上述权利要求1至20或21至40任一项所述的方法。
  45. 一种码流,其特征在于,所述码流是基于如上述权利要求1至20任一项所述的方法生成的。
PCT/CN2022/075260 2022-01-30 2022-01-30 编解码方法、装置、设备、及存储介质 WO2023142127A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/075260 WO2023142127A1 (zh) 2022-01-30 2022-01-30 编解码方法、装置、设备、及存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/075260 WO2023142127A1 (zh) 2022-01-30 2022-01-30 编解码方法、装置、设备、及存储介质

Publications (1)

Publication Number Publication Date
WO2023142127A1 true WO2023142127A1 (zh) 2023-08-03

Family

ID=87470278

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/075260 WO2023142127A1 (zh) 2022-01-30 2022-01-30 编解码方法、装置、设备、及存储介质

Country Status (1)

Country Link
WO (1) WO2023142127A1 (zh)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190208234A1 (en) * 2015-08-20 2019-07-04 Koninklijke Kpn N.V. Forming One Or More Tile Streams On The Basis Of One Or More Video Streams
CN112188180A (zh) * 2019-07-05 2021-01-05 浙江大学 一种处理子块图像的方法及装置
US20210090301A1 (en) * 2019-09-24 2021-03-25 Apple Inc. Three-Dimensional Mesh Compression Using a Video Encoder
CN112598572A (zh) * 2019-10-01 2021-04-02 浙江大学 一种筛选子块图像与处理单元的方法及装置

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116916172A (zh) * 2023-09-11 2023-10-20 腾讯科技(深圳)有限公司 一种远程控制方法和相关装置
CN116916172B (zh) * 2023-09-11 2024-01-09 腾讯科技(深圳)有限公司 一种远程控制方法和相关装置
CN117579843A (zh) * 2024-01-17 2024-02-20 淘宝(中国)软件有限公司 视频编码处理方法及电子设备
CN117579843B (zh) * 2024-01-17 2024-04-02 淘宝(中国)软件有限公司 视频编码处理方法及电子设备


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22922951

Country of ref document: EP

Kind code of ref document: A1