WO2024077806A1 - A coding and decoding method, apparatus, encoder, decoder and storage medium - Google Patents

A coding and decoding method, apparatus, encoder, decoder and storage medium

Info

Publication number: WO2024077806A1
Authority: WIPO (PCT)
Prior art keywords: graph, syntax element, isomorphic, sub, spliced
Application number: PCT/CN2023/071083
Other languages: English (en), French (fr)
Inventors: 虞露, 金峡钶, 朱志伟, 戴震宇
Original Assignees: Zhejiang University (浙江大学); Guangdong OPPO Mobile Telecommunications Co., Ltd. (Oppo广东移动通信有限公司)
Application filed by Zhejiang University and Guangdong OPPO Mobile Telecommunications Co., Ltd.
Publication of WO2024077806A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements

Definitions

  • the present application relates to the field of image processing technology, and in particular to a coding and decoding method, device, encoder, decoder and storage medium.
  • In applications such as virtual reality (VR), augmented reality (AR), and mixed reality (MR), visual media objects with different expression formats may appear in the same scene. For example, the scene background and some characters and objects are expressed by video, while another part of the characters are expressed by 3D point cloud or 3D mesh.
  • the current coding and decoding technology encodes and decodes multi-viewpoint video, point cloud, and mesh separately, which requires calling a large number of codecs and makes coding and decoding costly.
  • the embodiments of the present application provide a coding and decoding method, device, encoder, decoder and storage medium.
  • In a first aspect, the present application provides a decoding method, comprising: decoding a bit stream to obtain a splicing graph and splicing graph information; when the splicing graph is a heterogeneous mixed splicing graph, obtaining at least two types of isomorphic blocks and isomorphic block information according to the splicing graph and the splicing graph information, wherein different types of isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information; when the splicing graph is an isomorphic splicing graph, obtaining one type of isomorphic block and isomorphic block information according to the splicing graph and the splicing graph information; and obtaining visual media content in at least two expression formats according to the isomorphic blocks and the isomorphic block information.
  • In a second aspect, the present application provides an encoding method, comprising: processing visual media content in at least two expression formats to obtain at least two isomorphic blocks; splicing the at least two isomorphic blocks to obtain a splicing graph and splicing graph information, wherein when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph includes at least two types of isomorphic blocks, and different types of isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information; and encoding the splicing graph and the splicing graph information to obtain a code stream.
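  • The following is a minimal, non-normative Python sketch of the encoding flow summarized above. All names (IsomorphicBlock, encode, the format strings) are illustrative assumptions rather than structures defined by the present application; the sketch only shows how blocks of different expression formats can be gathered into one heterogeneous mixed splicing graph so that a single video encoder call suffices.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class IsomorphicBlock:
    expression_format: str      # "multi_view_video", "point_cloud" or "mesh"
    sub_patches: List[bytes] = field(default_factory=list)

def encode(contents: Dict[str, List[bytes]]) -> Tuple[List[IsomorphicBlock], dict]:
    # Process each expression format into its own isomorphic block
    # (a real encoder would first project and de-redundancy each format).
    blocks = [IsomorphicBlock(fmt, patches) for fmt, patches in contents.items()]
    # Splice the blocks and record the splicing graph information, including
    # whether the graph is heterogeneous (blocks of two or more formats).
    graph_info = {
        "heterogeneous": len({b.expression_format for b in blocks}) >= 2,
        "block_formats": [b.expression_format for b in blocks],
    }
    # In a real system the spliced picture would now be passed to a single
    # 2D video encoder together with graph_info; the structures stand in here.
    return blocks, graph_info

blocks, info = encode({"multi_view_video": [b"mvv_patch"],
                       "point_cloud": [b"pc_patch"]})
assert info["heterogeneous"]
```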
  • the present application provides a decoding device, including:
  • a decoding unit configured to decode the bitstream to obtain a splicing graph and splicing graph information
  • the splitting unit is configured to obtain at least two types of isomorphic blocks and isomorphic block information according to the splicing graph and the splicing graph information when the splicing graph is a heterogeneous mixed splicing graph; wherein different types of isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information;
  • the splitting unit is configured to obtain an isomorphic block and isomorphic block information according to the splicing graph and the splicing graph information when the splicing graph is an isomorphic splicing graph;
  • the processing unit is configured to obtain visual media content in at least two expression formats according to the isomorphic blocks and the isomorphic block information.
  • the present application provides an encoding device, applied to an encoder, comprising:
  • a processing unit configured to process the visual media content in at least two expression formats to obtain at least two isomorphic blocks
  • a splicing unit configured to splice the at least two isomorphic blocks to obtain a splicing graph and splicing graph information, wherein when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph includes at least two isomorphic blocks, and different types of isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information;
  • the encoding unit is configured to encode the splicing graph and the splicing graph information to obtain a code stream.
  • a decoder comprising a first memory and a first processor; the first memory stores a computer program executable on the first processor to execute the method in the above-mentioned first aspect or its various implementations.
  • an encoder comprising a second memory and a second processor; the second memory stores a computer program that can be run on the second processor to execute the method in the above-mentioned second aspect or its various implementation methods.
  • a coding and decoding system including an encoder and a decoder.
  • the encoder is used to execute the method in the second aspect or its respective implementations
  • the decoder is used to execute the method in the first aspect or its respective implementations.
  • a chip for implementing the method in any one of the first to second aspects or their respective implementations.
  • the chip includes: a processor for calling and running a computer program from a memory, so that a device equipped with the chip executes the method in any one of the first to second aspects or their respective implementations.
  • a computer-readable storage medium for storing a computer program, wherein the computer program enables a computer to execute the method of any one of the first to second aspects or any of their implementations.
  • a computer program product comprising computer program instructions, which enable a computer to execute the method in any one of the first to second aspects or their respective implementations.
  • a computer program which, when executed on a computer, enables the computer to execute the method in any one of the first to second aspects or in each of their implementations.
  • a code stream is provided, which is generated based on the encoding method of the second aspect.
  • isomorphic blocks of different expression formats are spliced into a heterogeneous mixed mosaic, and data of different expression formats are mixed and encoded, which can reduce the number of encoders and decoders called, reduce the implementation cost, and improve ease of use.
  • some high-level parameters of blocks of different expression formats can be unequal, so that heterogeneous data provides more appropriate high-level parameters, which can effectively improve the encoding efficiency, that is, reduce the bit rate or improve the quality of reconstructed multi-view video or point cloud video.
  • FIG1 is a schematic block diagram of a video encoding and decoding system according to an embodiment of the present application.
  • FIG2A is a schematic block diagram of a video encoder according to an embodiment of the present application.
  • FIG2B is a schematic block diagram of a video decoder according to an embodiment of the present application.
  • FIG3A is a diagram showing the organization and expression framework of multi-view video data
  • FIG3B is a schematic diagram of generating a stitched image of multi-view video data
  • FIG3C is a diagram showing the organization and expression framework of point cloud data
  • FIG3D to FIG3F are schematic diagrams of different types of point cloud data
  • FIG4 is a schematic diagram of encoding of a multi-view video
  • FIG5 is a schematic diagram of decoding of a multi-view video
  • FIG6 is a schematic diagram of a coding method flow chart provided by an embodiment of the present application.
  • FIG7 is a schematic diagram of a heterogeneous mixed splicing diagram provided in an embodiment of the present application.
  • FIG8 is a schematic diagram of an isomorphic splicing graph provided by an embodiment of the present application.
  • FIG9 is a schematic flow chart of a decoding method provided in an embodiment of the present application.
  • FIG10 is a schematic diagram of a V3C bitstream structure provided in an embodiment of the present application.
  • FIG11 is a schematic block diagram of an encoding device provided in an embodiment of the present application.
  • FIG12 is a schematic block diagram of a decoding device provided in an embodiment of the present application.
  • FIG13 is a schematic block diagram of an encoder provided in an embodiment of the present application.
  • FIG14 is a schematic block diagram of a decoder provided in an embodiment of the present application.
  • FIG15 is a schematic diagram of the composition structure of a coding and decoding system provided in an embodiment of the present application.
  • the present application can be applied to the field of image coding and decoding, the field of video coding and decoding, the field of hardware video coding and decoding, the field of dedicated circuit video coding and decoding, the field of real-time video coding and decoding, etc.
  • the scheme of the present application can be combined with audio and video coding standards (AVS), such as the H.264/advanced video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard, and the H.266/versatile video coding (VVC) standard.
  • the scheme of the present application can be combined with other proprietary or industry standards for operation, and the standards include ITU-TH.261, ISO/IEC MPEG-1 Visual, ITU-TH.262 or ISO/IEC MPEG-2 Visual, ITU-TH.263, ISO/IEC MPEG-4 Visual, ITU-TH.264 (also known as ISO/IEC MPEG-4 AVC), including scalable video coding (SVC) and multi-view video coding (MVC) extensions.
  • the high-degree-of-freedom immersive coding system can be roughly divided into the following links according to the task line: data collection, data organization and expression, data encoding and compression, data decoding and reconstruction, data synthesis and rendering, and finally presenting the target data to the user.
  • the encoding involved in the embodiment of the present application is mainly video encoding and decoding.
  • the video encoding and decoding system involved in the embodiment of the present application is first introduced in conjunction with Figure 1.
  • FIG1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application. It should be noted that FIG1 is only an example, and the video encoding and decoding system of the embodiment of the present application includes but is not limited to that shown in FIG1.
  • the video encoding and decoding system 100 includes an encoding device 110 and a decoding device 120.
  • the encoding device is used to encode (which can be understood as compression) the video data to generate a code stream, and transmit the code stream to the decoding device.
  • the decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
  • the encoding device 110 of the embodiment of the present application can be understood as a device with a video encoding function
  • the decoding device 120 can be understood as a device with a video decoding function, that is, the embodiment of the present application includes a wider range of devices for the encoding device 110 and the decoding device 120, such as smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, etc.
  • the encoding device 110 may transmit the encoded video data (eg, a code stream) to the decoding device 120 via the channel 130.
  • the channel 130 may include one or more media and/or devices capable of transmitting the encoded video data from the encoding device 110 to the decoding device 120.
  • the channel 130 includes one or more communication media that enable the encoding device 110 to transmit the encoded video data directly to the decoding device 120 in real time.
  • the encoding device 110 can modulate the encoded video data according to the communication standard and transmit the modulated video data to the decoding device 120.
  • the communication medium includes a wireless communication medium, such as a radio frequency spectrum, and optionally, the communication medium may also include a wired communication medium, such as one or more physical transmission lines.
  • the channel 130 includes a storage medium, which can store the video data encoded by the encoding device 110.
  • the storage medium includes a variety of locally accessible data storage media, such as optical disks, DVDs, flash memories, etc.
  • the decoding device 120 can obtain the encoded video data from the storage medium.
  • the channel 130 may include a storage server that can store the video data encoded by the encoding device 110.
  • the decoding device 120 can download the stored encoded video data from the storage server.
  • the storage server can store the encoded video data and transmit the encoded video data to the decoding device 120, such as a web server (e.g., for a website), a file transfer protocol (FTP) server, etc.
  • the encoding device 110 includes a video encoder 112 and an output interface 113.
  • the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
  • the encoding device 110 may further include a video source 111 in addition to the video encoder 112 and the output interface 113.
  • the video source 111 may include at least one of a video acquisition device (eg, a video camera), a video archive, a video input interface, and a computer graphics system, wherein the video input interface is used to receive video data from a video content provider, and the computer graphics system is used to generate video data.
  • the video encoder 112 encodes the video data from the video source 111 to generate a bitstream.
  • the video data may include one or more pictures or a sequence of pictures.
  • the bitstream contains the encoding information of the picture or the sequence of pictures in the form of a bitstream.
  • the encoding information may include the encoded picture data and associated data.
  • the associated data may include a sequence parameter set (SPS for short), a picture parameter set (PPS for short) and other syntax structures.
  • the syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the bitstream.
  • the video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113.
  • the encoded video data may also be stored in a storage medium or a storage server for subsequent reading by the decoding device 120.
  • the decoding device 120 includes an input interface 121 and a video decoder 122. In some embodiments, the decoding device 120 may include a display device 123 in addition to the input interface 121 and the video decoder 122.
  • the input interface 121 includes a receiver and/or a modem.
  • the input interface 121 can receive the encoded video data through the channel 130 .
  • the video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123 .
  • the decoded video data is displayed on the display device 123.
  • the display device 123 may be integrated with the decoding device 120 or external to the decoding device 120.
  • the display device 123 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
  • FIG1 is only an example, and the technical solution of the embodiment of the present application is not limited to FIG1 .
  • the technology of the present application can also be applied to video encoding only or video decoding only.
  • FIG2A is a schematic block diagram of a video encoder according to an embodiment of the present application. It should be understood that the video encoder 200 can be used for lossy compression of an image, or lossless compression of an image.
  • the lossless compression can be visually lossless compression or mathematically lossless compression.
  • the video encoder 200 can be applied to image data in luminance and chrominance (YCbCr, YUV) format.
  • the YUV ratio can be 4:2:0, 4:2:2 or 4:4:4, Y represents brightness (Luma), Cb (U) represents blue chrominance, Cr (V) represents red chrominance, and U and V represent chrominance (Chroma) for describing color and saturation.
  • 4:2:0 means that every 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr)
  • 4:2:2 means that every 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr)
  • 4:4:4 means full pixel display (YYYYCbCrCbCrCbCrCbCr).
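  • As a concrete illustration of the three sampling formats above, the following minimal Python sketch computes the luma and chroma plane dimensions of a frame; the function name plane_sizes is chosen purely for illustration.

```python
def plane_sizes(width: int, height: int, fmt: str):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for one chroma plane."""
    luma = (width, height)
    if fmt == "4:2:0":      # chroma halved horizontally and vertically
        chroma = (width // 2, height // 2)
    elif fmt == "4:2:2":    # chroma halved horizontally only
        chroma = (width // 2, height)
    elif fmt == "4:4:4":    # full chroma resolution
        chroma = (width, height)
    else:
        raise ValueError(fmt)
    return luma, chroma

assert plane_sizes(1920, 1080, "4:2:0") == ((1920, 1080), (960, 540))
assert plane_sizes(1920, 1080, "4:2:2") == ((1920, 1080), (960, 1080))
```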
  • the video encoder 200 reads video data, and for each frame of the video data, divides the frame into a number of coding tree units (CTUs).
  • A CTU may also be referred to as a "tree block", "largest coding unit" (LCU), or "coding tree block" (CTB).
  • Each CTU may be associated with a pixel block of equal size within the image.
  • Each pixel may correspond to a luminance (luminance or luma) sample and two chrominance (chrominance or chroma) samples. Therefore, each CTU may be associated with a luminance sample block and two chrominance sample blocks.
  • the size of a CTU is, for example, 128 ⁇ 128, 64 ⁇ 64, 32 ⁇ 32, etc.
  • a CTU may be further divided into a number of coding units (CUs) for encoding, and a CU may be a rectangular block or a square block.
  • CU can be further divided into prediction unit (PU) and transform unit (TU), which makes encoding, prediction and transformation separate and more flexible in processing.
  • CTU is divided into CU in quadtree mode
  • CU is divided into TU and PU in quadtree mode.
  • the video encoder and video decoder may support various PU sizes. Assuming that the size of a particular CU is 2N ⁇ 2N, the video encoder and video decoder may support PU sizes of 2N ⁇ 2N or N ⁇ N for intra-frame prediction, and support symmetric PUs of 2N ⁇ 2N, 2N ⁇ N, N ⁇ 2N, N ⁇ N or similar sizes for inter-frame prediction. The video encoder and video decoder may also support asymmetric PUs of 2N ⁇ nU, 2N ⁇ nD, nL ⁇ 2N, and nR ⁇ 2N for inter-frame prediction.
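  • The following minimal Python sketch enumerates the PU partition shapes listed above for a 2N x 2N CU; the function name and dictionary layout are illustrative assumptions. The asymmetric modes (2NxnU, 2NxnD, nLx2N, nRx2N) split one dimension at a 1/4 : 3/4 ratio.

```python
def pu_partitions(cu: int):
    """Return the (width, height) of each PU for the modes described above."""
    n = cu // 2
    return {
        "intra": [(cu, cu), (n, n)],                       # 2Nx2N or NxN
        "inter_symmetric": [(cu, cu), (cu, n), (n, cu), (n, n)],
        "2NxnU": [(cu, cu // 4), (cu, 3 * cu // 4)],       # top quarter, rest below
        "2NxnD": [(cu, 3 * cu // 4), (cu, cu // 4)],
        "nLx2N": [(cu // 4, cu), (3 * cu // 4, cu)],       # left quarter, rest right
        "nRx2N": [(3 * cu // 4, cu), (cu // 4, cu)],
    }

print(pu_partitions(64)["2NxnU"])   # [(64, 16), (64, 48)]
```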
  • the video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filter unit 260, a decoded image buffer 270, and an entropy coding unit 280. It should be noted that the video encoder 200 may include more, fewer, or different functional components.
  • the current block may be referred to as a current coding unit (CU) or a current prediction unit (PU), etc.
  • the prediction block may also be referred to as a prediction image block or an image prediction block
  • the reconstructed image block may also be referred to as a reconstructed block or an image reconstructed image block.
  • the prediction unit 210 includes an inter-frame prediction unit 211 and an intra-frame estimation unit 212. Since there is a strong correlation between adjacent pixels in a frame of a video, the intra-frame prediction method is used in the video coding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Since there is a strong similarity between adjacent frames in a video, the inter-frame prediction method is used in the video coding and decoding technology to eliminate the temporal redundancy between adjacent frames, thereby improving the coding efficiency.
  • the inter-frame prediction unit 211 can be used for inter-frame prediction.
  • Inter-frame prediction can include motion estimation and motion compensation. It can refer to the image information of different frames.
  • Inter-frame prediction uses motion information to find a reference block from a reference frame, and generates a prediction block based on the reference block to eliminate temporal redundancy.
  • the frames used for inter-frame prediction can be P frames and/or B frames. P frames refer to forward prediction frames, and B frames refer to bidirectional prediction frames.
  • Inter-frame prediction uses motion information to find a reference block from a reference frame, and generates a prediction block based on the reference block.
  • the motion information includes a reference frame list where the reference frame is located, a reference frame index, and a motion vector.
  • the motion vector can point to an integer pixel or a sub-pixel position. If the motion vector points to a sub-pixel position, interpolation filtering must be applied in the reference frame to generate the required sub-pixel block.
  • the integer pixel or sub-pixel block in the reference frame found according to the motion vector is called a reference block.
  • Some technologies will directly use the reference block as a prediction block, and some technologies will generate a prediction block based on the reference block. Reprocessing the prediction block based on the reference block can also be understood as using the reference block as a prediction block and then processing the prediction block to generate a new prediction block.
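  • The following minimal Python sketch illustrates motion compensation with a fractional motion vector. For brevity it uses bilinear interpolation; actual codecs such as HEVC use longer separable interpolation filters, so the filter choice and the function names are assumptions made purely for illustration.

```python
def sample_at(ref, x: float, y: float) -> float:
    """Bilinearly interpolate a sample at a (possibly fractional) position."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    # weight the four surrounding integer-pixel samples
    return ((1 - fx) * (1 - fy) * ref[y0][x0] +
            fx * (1 - fy) * ref[y0][x0 + 1] +
            (1 - fx) * fy * ref[y0 + 1][x0] +
            fx * fy * ref[y0 + 1][x0 + 1])

def predict_block(ref, top_left, size, mv):
    """Fetch a size x size prediction block displaced by motion vector mv."""
    bx, by = top_left
    mvx, mvy = mv                       # may be fractional, e.g. (0.5, 0.0)
    return [[sample_at(ref, bx + j + mvx, by + i + mvy)
             for j in range(size)] for i in range(size)]

ref = [[i * 16 + j for j in range(16)] for i in range(16)]
print(predict_block(ref, (4, 4), 2, (0.5, 0.0)))   # half-pel prediction
```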
  • the intra-frame estimation unit 212 only refers to the information of the same frame image to predict the pixel information in the current code image block to eliminate spatial redundancy.
  • the frame used for intra-frame prediction can be an I frame.
  • the intra-frame prediction modes used by HEVC are Planar, DC, and 33 angle modes, for a total of 35 prediction modes.
  • the intra-frame modes used by VVC are Planar, DC, and 65 angle modes, for a total of 67 prediction modes.
  • the residual unit 220 may generate a residual block of the CU based on the pixel blocks of the CU and the prediction blocks of the PUs of the CU. For example, the residual unit 220 may generate a residual block of the CU so that each sample in the residual block has a value equal to the difference between the following two: a sample in the pixel blocks of the CU and a corresponding sample in the prediction blocks of the PUs of the CU.
  • the transform/quantization unit 230 may quantize the transform coefficients.
  • the transform/quantization unit 230 may quantize the transform coefficients associated with the TUs of the CU based on a quantization parameter (QP) value associated with the CU.
  • QP quantization parameter
  • the video encoder 200 may adjust the degree of quantization applied to the transform coefficients associated with the CU by adjusting the QP value associated with the CU.
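  • The relation between QP and the quantization step is often approximated as the step doubling every 6 QP (Qstep ~ 2^((QP-4)/6)); the following minimal Python sketch uses that commonly cited approximation for illustration, whereas actual codecs implement quantization with integer scaling tables.

```python
def quantize(coeffs, qp: int):
    qstep = 2 ** ((qp - 4) / 6)            # step roughly doubles every 6 QP
    return [round(c / qstep) for c in coeffs]

def dequantize(levels, qp: int):
    qstep = 2 ** ((qp - 4) / 6)
    return [level * qstep for level in levels]

levels = quantize([100.0, -37.0, 8.0], qp=28)     # qstep = 16 here
print(levels, dequantize(levels, qp=28))          # larger QP -> coarser levels
```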
  • the inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct a residual block from the quantized transform coefficients.
  • the reconstruction unit 250 may add the samples of the reconstructed residual block to the corresponding samples of one or more prediction blocks generated by the prediction unit 210 to generate a reconstructed image block associated with the TU. By reconstructing the sample blocks of each TU of the CU in this manner, the video encoder 200 may reconstruct the pixel blocks of the CU.
  • the loop filter unit 260 is used to process the inverse transformed and inverse quantized pixels to compensate for distortion information and provide a better reference for subsequent coded pixels. For example, a deblocking filter operation may be performed to reduce the blocking effect of the pixel blocks associated with the CU.
  • the loop filter unit 260 includes a deblocking filter unit and a sample adaptive offset/adaptive loop filter (SAO/ALF) unit, wherein the deblocking filter unit is used to remove the block effect, and the SAO/ALF unit is used to remove the ringing effect.
  • SAO/ALF sample adaptive offset/adaptive loop filter
  • the decoded image buffer 270 may store the reconstructed pixel blocks.
  • the inter prediction unit 211 may use the reference image containing the reconstructed pixel blocks to perform inter prediction on PUs of other images.
  • the intra estimation unit 212 may use the reconstructed pixel blocks in the decoded image buffer 270 to perform intra prediction on other PUs in the same image as the CU.
  • the entropy encoding unit 280 may receive the quantized transform coefficients from the transform/quantization unit 230.
  • the entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy-encoded data.
  • FIG. 2B is a schematic block diagram of a video decoder according to an embodiment of the present application.
  • the video decoder 300 includes an entropy decoding unit 310, a prediction unit 320, an inverse quantization/transformation unit 330, a reconstruction unit 340, a loop filter unit 350, and a decoded image buffer 360. It should be noted that the video decoder 300 may include more, fewer, or different functional components.
  • the video decoder 300 may receive a bitstream.
  • the entropy decoding unit 310 may parse the bitstream to extract syntax elements from the bitstream. As part of parsing the bitstream, the entropy decoding unit 310 may parse the syntax elements in the bitstream that have been entropy encoded.
  • the prediction unit 320, the inverse quantization/transformation unit 330, the reconstruction unit 340, and the loop filter unit 350 may decode the video data according to the syntax elements extracted from the bitstream, that is, generate decoded video data.
  • the prediction unit 320 includes an inter-frame prediction unit 321 and an intra-frame estimation unit 322 .
  • the intra estimation unit 322 may perform intra prediction to generate a prediction block for the PU.
  • the intra estimation unit 322 may use an intra prediction mode to generate a prediction block for the PU based on pixel blocks of spatially neighboring PUs.
  • the intra estimation unit 322 may also determine the intra prediction mode for the PU according to one or more syntax elements parsed from the code stream.
  • the inter prediction unit 321 may construct a first reference image list (list 0) and a second reference image list (list 1) according to the syntax elements parsed from the code stream.
  • the entropy decoding unit 310 may parse the motion information of the PU.
  • the inter prediction unit 321 may determine one or more reference blocks of the PU according to the motion information of the PU.
  • the inter prediction unit 321 may generate a prediction block of the PU according to one or more reference blocks of the PU.
  • the inverse quantization/transform unit 330 may inversely quantize (i.e., dequantize) the transform coefficients associated with the TU.
  • the inverse quantization/transform unit 330 may use the QP value associated with the CU of the TU to determine the degree of quantization.
  • the inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients in order to generate a residual block associated with the TU.
  • the reconstruction unit 340 uses the residual block associated with the TU of the CU and the prediction block of the PU of the CU to reconstruct the pixel block of the CU. For example, the reconstruction unit 340 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the pixel block of the CU to obtain a reconstructed image block.
  • the loop filtering unit 350 may perform a deblocking filtering operation to reduce blocking effects of pixel blocks associated with a CU.
  • the video decoder 300 may store the reconstructed image of the CU in the decoded image buffer 360.
  • the video decoder 300 may use the reconstructed image in the decoded image buffer 360 as a reference image for subsequent prediction, or transmit the reconstructed image to a display device for presentation.
  • the basic process of video encoding and decoding is as follows: at the encoding end, a frame of image is divided into blocks, and for the current block, the prediction unit 210 uses intra-frame prediction or inter-frame prediction to generate a prediction block of the current block.
  • the residual unit 220 can calculate a residual block based on the prediction block and the original block of the current block, that is, the difference between the original block of the current block and the prediction block; the residual block is also called residual information.
  • through the transform and quantization process of the transform/quantization unit 230, information to which the human eye is not sensitive can be removed from the residual block, so as to eliminate visual redundancy.
  • the residual block before transformation and quantization by the transformation/quantization unit 230 can be called a time domain residual block, and the time domain residual block after transformation and quantization by the transformation/quantization unit 230 can be called a frequency residual block or a frequency domain residual block.
  • the entropy coding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, and can entropy encode the quantized transform coefficients and output a bit stream. For example, the entropy coding unit 280 can eliminate character redundancy according to the target context model and the probability information of the binary bit stream.
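  • The following minimal Python sketch ties together the per-block steps above: the residual is the sample-wise difference between the original block and the prediction block, and reconstruction adds the residual back to the prediction. Transform, quantization, and entropy coding are omitted for brevity, so the round trip here is lossless by construction.

```python
def residual_block(original, prediction):
    """Per-sample difference between original and prediction (the residual)."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]

def reconstruct_block(prediction, residual):
    """Decoder-side reconstruction: prediction plus (de-quantized) residual."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]

orig = [[52, 55], [61, 59]]
pred = [[50, 50], [60, 60]]
res = residual_block(orig, pred)               # [[2, 5], [1, -1]]
assert reconstruct_block(pred, res) == orig    # lossless without quantization
```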
  • the entropy decoding unit 310 can parse the code stream to obtain the prediction information, quantization coefficient matrix, etc. of the current block.
  • the prediction unit 320 uses intra-frame prediction or inter-frame prediction to generate a prediction block of the current block based on the prediction information.
  • the inverse quantization/transformation unit 330 performs inverse quantization and inverse transformation on the quantization coefficient matrix obtained from the code stream to obtain a residual block.
  • the reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstructed block.
  • the reconstructed blocks constitute a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or on the block to obtain a decoded image.
  • the encoding end also requires similar operations as the decoding end to obtain a decoded image.
  • the decoded image can also be called a reconstructed image, and the reconstructed image can be used as a reference frame for inter-frame prediction
  • the block division information determined by the encoder as well as the mode information or parameter information such as prediction, transformation, quantization, entropy coding, loop filtering, etc., are carried in the bitstream when necessary.
  • the decoder parses the bitstream and determines the same block division information, prediction, transformation, quantization, entropy coding, loop filtering, etc. mode information or parameter information as the encoder by analyzing the existing information, thereby ensuring that the decoded image obtained by the encoder is the same as the decoded image obtained by the decoder.
  • the above is the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of the framework or process may be optimized. The present application is applicable to the basic process of the video codec under the block-based hybrid coding framework, but is not limited to the framework and process.
  • the current encoding and decoding methods include at least the following two:
  • Method 1: multi-view video is encoded and decoded using the MPEG immersive video (MIV) technology of the Moving Picture Experts Group (MPEG), and point clouds are encoded and decoded using the video-based point cloud compression (VPCC) technology.
  • In order to reduce the transmission pixel rate while retaining scene information as much as possible, so as to ensure that there is enough information for rendering the target view, the solution adopted by MPEG-I is shown in FIG3A.
  • a limited number of viewpoints are selected as basic viewpoints and the visible range of the scene is expressed as much as possible.
  • the basic viewpoint is transmitted as a complete image, and the redundant pixels between the remaining non-basic viewpoints and the basic viewpoint are removed, that is, only the effective information that is not repeatedly expressed is retained, and then the effective information is extracted as a sub-block image and reorganized with the basic viewpoint image to form a larger rectangular image, which is called a spliced image.
  • Figures 3A and 3B show the schematic process of generating a spliced image.
  • the spliced image is sent to the codec for compression and reconstruction, and the auxiliary data related to the splicing information of the sub-block image is also sent to the encoder to form a bit stream.
  • the encoding method of VPCC is to project the point cloud into a two-dimensional image or video, and convert the three-dimensional information into two-dimensional information encoding.
  • Figure 3C is the encoding block diagram of VPCC.
  • the code stream is roughly divided into four parts.
  • the geometric code stream is the code stream generated by the geometric depth map encoding, which is used to represent the geometric information of the point cloud;
  • the attribute code stream is the code stream generated by the texture map encoding, which is used to represent the attribute information of the point cloud;
  • the occupancy code stream is the code stream generated by the occupancy map encoding, which is used to indicate the valid area in the depth map and texture map;
  • the auxiliary information code stream is the code stream generated by encoding the auxiliary information of the sub-block images, that is, the part related to the patch data unit in the V3C standard, which indicates the position and size of each sub-block image.
  • The first three types of code streams (geometry, attribute, and occupancy videos) are encoded and decoded using a video encoder, as shown in Figures 3D to 3F.
  • Method 2: both multi-viewpoint video and point cloud are encoded and decoded using the frame packing technology in visual volumetric video-based coding (V3C).
  • the encoding end includes the following steps:
  • Step 1: when encoding the acquired multi-view video, after some pre-processing, multi-view video sub-blocks (patches) are generated, and then the multi-view video sub-blocks are organized to generate a multi-view video splicing graph.
  • Exemplarily, a multi-view video is input into TMIV for packing, and a multi-view video splicing image is output. TMIV is a reference software for MIV.
  • the packaging in the embodiment of the present application can be understood as splicing.
  • the multi-view video mosaic map includes a multi-view video texture mosaic map and a multi-view video geometry mosaic map, that is, it only includes multi-view video sub-blocks.
  • Step 2: input the multi-view video mosaic image into the frame packer, and output the multi-view video mixed mosaic image.
  • the multi-view video mixed mosaic image includes a multi-view video texture mixed mosaic image, a multi-view video geometry mixed mosaic image, and a multi-view video texture and geometry mixed mosaic image.
  • the multi-view video mosaic is frame packed to generate a multi-view video mixed mosaic, and each multi-view video mosaic occupies a region of the multi-view video mixed mosaic. Accordingly, a flag pin_region_type_id_minus2 is transmitted for each region in the bitstream. This flag records the information of whether the current region belongs to a multi-view video texture mosaic or a multi-view video geometric mosaic, and the information needs to be used at the decoding end.
  • Step 3: use a video encoder to encode the multi-view video mixed splicing image to obtain a bit stream.
  • the decoding end includes the following steps:
  • Step 1: when decoding a multi-view video, the acquired code stream is input into a video decoder for decoding to obtain a reconstructed multi-view video mixed splicing image.
  • Step 2: input the reconstructed multi-view video mixed mosaic image into the frame depacketizer, and output the reconstructed multi-view video mosaic image.
  • If pin_region_type_id_minus2 is V3C_AVD, it means that the current region is a multi-view video texture mosaic map, and the current region is split and output as a reconstructed multi-view video texture mosaic map.
  • If pin_region_type_id_minus2 is V3C_GVD, it means that the current region is a multi-view video geometric mosaic, and the current region is split and output as a reconstructed multi-view video geometric mosaic.
  • Step 3: decode the reconstructed multi-view video mosaic to obtain the reconstructed multi-view video.
  • the multi-view video texture mosaic map and the multi-view video geometric mosaic map are decoded to obtain the reconstructed multi-view video.
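  • The following minimal Python sketch illustrates the de-packing decision above. The region record layout and the numeric constants (which mirror the V3C unit-type numbering V3C_OVD=2, V3C_GVD=3, V3C_AVD=4) are assumptions made for illustration.

```python
V3C_OVD, V3C_GVD, V3C_AVD = 2, 3, 4    # assumed V3C unit-type numbering

def crop(img, x, y, w, h):
    return [row[x:x + w] for row in img[y:y + h]]

def unpack_regions(mixed_mosaic, regions):
    """Split a mixed mosaic into texture and geometry regions by type id."""
    texture, geometry = [], []
    for r in regions:
        region_type = r["pin_region_type_id_minus2"] + 2
        pixels = crop(mixed_mosaic, r["x"], r["y"], r["w"], r["h"])
        if region_type == V3C_AVD:     # texture (attribute) mosaic region
            texture.append(pixels)
        elif region_type == V3C_GVD:   # geometry mosaic region
            geometry.append(pixels)
    return texture, geometry

mosaic = [list(range(8)) for _ in range(4)]
tex, geo = unpack_regions(mosaic, [
    {"pin_region_type_id_minus2": V3C_AVD - 2, "x": 0, "y": 0, "w": 4, "h": 4},
    {"pin_region_type_id_minus2": V3C_GVD - 2, "x": 4, "y": 0, "w": 4, "h": 4},
])
assert len(tex) == 1 and len(geo) == 1
```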
  • the above uses multi-view video as an example to analyze and introduce the frame packing technology.
  • In some embodiments, the frame packing encoding and decoding method for a point cloud is basically the same as that of the above-mentioned multi-view video, and reference may be made to it. For example, a point cloud is input into TMC (a reference software for VPCC) for packing, and the resulting point cloud mixed mosaic map is encoded to obtain a point cloud code stream, which will not be repeated here.
  • the visual media contents in the multiple expression formats are encoded and decoded separately.
  • the current packaging technology is to compress the point cloud to form a point cloud compressed code stream (i.e., a V3C code stream), compress the multi-view video information to obtain a multi-view video compressed code stream (i.e., another V3C code stream), and then the system layer multiplexes the compressed code stream to obtain a fused three-dimensional scene multiplexed code stream.
  • the point cloud compressed code stream and the multi-view video compressed code stream are decoded separately.
  • the existing technology uses many codecs and the encoding and decoding cost is high.
  • data in different expression formats are mixed and encoded, which can reduce the number of encoders and decoders called, reduce the implementation cost, and improve ease of use.
  • some high-level parameters of blocks in different expression formats can be unequal, so that heterogeneous data can provide more appropriate high-level parameters, which can effectively improve the encoding efficiency, that is, reduce the bit rate or improve the quality of reconstructed multi-view video or point cloud video.
  • the video encoding method provided in the embodiment of the present application is introduced by taking the encoding end as an example.
  • FIG6 is a schematic diagram of a flow chart of an encoding method provided in an embodiment of the present application. As shown in FIG6 , the encoding method includes:
  • Step 601: processing visual media contents in at least two expression formats to obtain at least two isomorphic blocks.
  • In applications such as virtual reality (VR), augmented reality (AR), and mixed reality (MR), visual media objects with different expression formats may appear in the same scene. For example, the scene background and some characters and objects are expressed by video, while another part of the characters are expressed by three-dimensional point clouds or three-dimensional meshes.
  • the visual media content includes visual media content in at least two expression formats, such as multi-viewpoint video, point cloud, and mesh.
  • the multi-viewpoint video may include multiple viewpoint videos and/or single viewpoint videos.
  • One isomorphic block corresponds to one expression format.
  • Different isomorphic blocks correspond to different expression formats.
  • the expression formats corresponding to at least two isomorphic blocks include at least two of the following: multi-viewpoint video, point cloud, and mesh.
  • In some embodiments, each type of isomorphic block may include at least one isomorphic block of the same expression format. For example, point-cloud-format isomorphic blocks include one or more point cloud blocks; multi-view-video-format isomorphic blocks include one or more multi-view video blocks; and mesh-format isomorphic blocks include one or more mesh blocks.
  • the visual media content in the first expression format is processed to obtain isomorphic blocks in the first expression format; and the visual media content in the second expression format is processed to obtain isomorphic blocks in the second expression format.
  • the first expression format is one of multi-view video, point cloud, and mesh
  • the second expression format is one of multi-view video, point cloud, and mesh.
  • the first expression format and the second expression format are different expression formats.
  • a block may be a spliced region with a specific shape, such as a rectangular region with a specific length and/or height.
  • a block includes at least one sub-block, and at least one sub-block is spliced in order, such as from large to small according to the area of the sub-block, or from large to small according to the length and/or height of the sub-block, to obtain a block corresponding to the visual media content.
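  • The following minimal Python sketch illustrates the ordered splicing just described, sorting sub-blocks by area (largest first) before placement; the shelf-style placement and all names are simplifying assumptions for illustration.

```python
def splice_in_order(sub_blocks):
    """Place sub-blocks left to right after sorting by area, largest first."""
    ordered = sorted(sub_blocks, key=lambda b: b["w"] * b["h"], reverse=True)
    x, placed = 0, []
    for b in ordered:
        placed.append({**b, "x": x, "y": 0})   # record the splicing position
        x += b["w"]
    return placed

print(splice_in_order([{"w": 2, "h": 2}, {"w": 8, "h": 4}, {"w": 4, "h": 4}]))
```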
  • a block can be accurately mapped to a tile in a mosaic (atlas).
  • a block may also be referred to as a strip, that is, a point cloud block may also be referred to as a point cloud strip, a multi-view video block may also be referred to as a multi-view video strip, and a mesh block may also be referred to as a mesh strip.
  • each sub-tile in a block may have a patch ID to distinguish different sub-tiles in the same block.
  • the same block may include sub-tile 1 (patch 1), sub-tile 2 (patch 2), and sub-tile 3 (patch 3).
  • a homogeneous block refers to a block in which each sub-block has the same expression format.
  • each sub-block in a homogeneous block is a multi-view video sub-block, or a point cloud sub-block or other sub-block in the same expression format.
  • the expression format corresponding to each sub-block in a homogeneous block is the expression format corresponding to the homogeneous block.
  • homogeneous blocks may have a block identifier (tileID) to distinguish different blocks of the same expression format.
  • point cloud blocks may include point cloud block 1 and point cloud block 2.
  • multiple visual media contents include point clouds and multi-viewpoint videos, and the point clouds are processed to obtain point cloud blocks, and point cloud block 1 includes point cloud sub-blocks 1 to sub-blocks 3; the multi-viewpoint videos are processed to obtain multi-viewpoint video blocks, and the multi-viewpoint video blocks include multi-viewpoint video sub-blocks 1 to sub-blocks 4.
  • For one visual media content, a homogeneous block of its expression format is obtained; for at least two visual media contents, at least two homogeneous blocks of the corresponding expression formats are obtained.
  • the embodiment of the present application processes the at least two visual media contents, such as packaging (also known as splicing) processing, to obtain a block corresponding to each visual media content in the at least two visual media contents.
  • the sub-patches corresponding to the at least two visual media contents can be spliced to obtain a block.
  • the embodiment of the present application processes the at least two visual media contents separately, and the method of obtaining the blocks is not limited.
  • the visual media content includes visual media content in two expression formats, namely, multi-view video and point cloud.
  • the visual media content in at least two expression formats is processed to obtain at least two isomorphic blocks, including: after projecting and de-redundancy processing on the acquired multi-view video, non-repeated pixels are connected into multi-view video sub-blocks, and the multi-view video sub-blocks are spliced into multi-view video blocks; and the acquired point cloud is parallel projected, connected points in the projection plane are formed into point cloud sub-blocks, and the point cloud sub-blocks are spliced into point cloud blocks.
  • a limited number of viewpoints are selected as basic viewpoints and the visible range of the scene is expressed as much as possible.
  • the basic viewpoints are transmitted as complete images, and the redundant pixels between the remaining non-basic viewpoints and the basic viewpoints are removed, that is, only the valid information expressed without repetition is retained, and then the valid information is extracted into sub-block images and reorganized with the basic viewpoint images to form a larger strip-shaped image, which is called a multi-viewpoint video block.
  • the visual media content is media content presented simultaneously in the same three-dimensional space. In some embodiments, the visual media content is media content presented at different times in the same three-dimensional space. In some embodiments, the visual media content can also be media content in different three-dimensional spaces. That is, in the embodiments of the present application, no specific restrictions are made to the at least two visual media contents.
  • Step 602: splicing the at least two isomorphic blocks to obtain a splicing graph and splicing graph information, wherein when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph includes at least two types of isomorphic blocks, and different types of isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information.
  • When the splicing graph is a homogeneous splicing graph, the splicing graph includes homogeneous blocks of one expression format, and a homogeneous block corresponds to one visual media content expression format.
  • heterogeneous splicing is performed on isomorphic blocks of at least two expression formats to generate a heterogeneous mixed splicing graph and splicing graph information; isomorphic splicing is performed on isomorphic blocks of the same expression format to generate an isomorphic splicing graph and splicing graph information.
  • a heterogeneous mixed splicing graph is formed by splicing isomorphic blocks of at least two expression formats, and a isomorphic splicing graph is formed by splicing isomorphic blocks of one expression format.
  • Exemplarily, isomorphic splicing is performed on isomorphic blocks in a first expression format to obtain a first isomorphic splicing graph and its splicing graph information; isomorphic splicing is performed on isomorphic blocks in a second expression format to obtain a second isomorphic splicing graph and its splicing graph information; and heterogeneous splicing is performed on isomorphic blocks in the first expression format together with isomorphic blocks in the second expression format to obtain a heterogeneous mixed splicing graph and its splicing graph information.
  • a homogeneous mosaic graph may include one homogeneous block or multiple homogeneous blocks of the same expression format, and a heterogeneous mixed mosaic graph includes at least two homogeneous blocks of at least two expression formats.
  • the first expression format is one of multi-view video, point cloud, and mesh
  • the second expression format is one of multi-view video, point cloud, and mesh
  • the first expression format and the second expression format are different expression formats.
  • multi-view video block 1, multi-view video block 2, and point cloud block 1 are spliced to obtain a heterogeneous mixed mosaic graph.
  • the first expression format is a multi-view video
  • the second expression format is a point cloud.
  • a portion of the multi-view video blocks and a portion of the point cloud blocks are spliced into a heterogeneous mixed splicing image; another portion of the multi-view video blocks are spliced into a multi-view splicing image; and another portion of the point cloud blocks are spliced into a point cloud splicing image.
  • the mosaic information is used to reconstruct the mosaic.
  • the mosaic information includes at least mosaic type information, mosaic information of homogeneous blocks and homogeneous block information.
  • the mosaic information includes a first syntax element, and the first syntax element is used to indicate that the mosaic is a heterogeneous mixed mosaic or a homogeneous mosaic.
  • the first syntax element is a syntax element of the atlas sequence parameter set (ASPS) and/or a syntax element of the atlas frame parameter set (AFPS). The ASPS and/or AFPS are parsed to determine the splicing graph type.
  • the first syntax element includes a first sub-syntax element and a second sub-syntax element; determining the splicing graph as a heterogeneous mixed splicing graph or a homogeneous splicing graph according to the first syntax element includes: if the value of the first sub-syntax element is equal to the value of the second sub-syntax element, determining the splicing graph as a heterogeneous mixed splicing graph or a homogeneous splicing graph according to the values.
  • the first syntax element includes a first sub-syntax element and a second sub-syntax element; determining whether the splicing graph is a heterogeneous mixed splicing graph or a homogeneous splicing graph according to the first syntax element includes: determining whether the splicing graph is a heterogeneous mixed splicing graph or a homogeneous splicing graph according to the value of the first sub-syntax element; determining whether the splicing graph is a heterogeneous mixed splicing graph or a homogeneous splicing graph according to the value of the second sub-syntax element; and when the two judgment results are consistent, determining the splicing graph type.
  • the code stream needs to ensure the absolute consistency of the first sub-syntax element and the second sub-syntax element. Only when the two sub-syntax elements are consistent, the type of the splicing map can be determined. Exemplarily, the consistency of the two sub-syntax elements can be compared first, and then the type of the splicing map can be determined according to the value of one of the sub-syntax elements, or the type of the splicing map can be determined according to each sub-syntax element first, and the absolute consistency can be ensured by comparing whether the splicing map types are the same.
  • the determining whether the mosaic is a heterogeneous mixed mosaic or a homogeneous mosaic according to the value includes: if the value is a first preset value, the mosaic is determined to be a heterogeneous mixed mosaic; if the value is a second preset value, the mosaic is determined to be a homogeneous mosaic. That is, two values or two types of values can be set to identify a heterogeneous mixed mosaic and a homogeneous mosaic. Exemplarily, the first preset value is 1, and the second preset value is 0.
  • In some embodiments, when the first syntax element is not included in the splicing graph information, the splicing graph is determined to be an isomorphic splicing graph.
  • In some embodiments, when the first syntax element is not included in the splicing graph information, it is inferred that the first syntax element takes the second preset value.
  • In some embodiments, when the first sub-syntax element is not included in the splicing graph information, the splicing graph is determined to be an isomorphic splicing graph and the first sub-syntax element is inferred to take the second preset value; when the second sub-syntax element is not included in the splicing graph information, the splicing graph is determined to be an isomorphic splicing graph and the second sub-syntax element is inferred to take the second preset value.
  • the determining whether the mosaic is a heterogeneous mixed mosaic or a homogeneous mosaic according to the value includes: if the value is a third preset value, the mosaic is determined to be a heterogeneous mixed mosaic of homogeneous blocks in a first expression format and a second expression format, wherein the first expression format and the second expression format are different expression formats; if the value is a fourth preset value, the mosaic is determined to be an homogeneous mosaic of homogeneous blocks in the first expression format; if the value is a fifth preset value, the mosaic is determined to be an homogeneous mosaic of homogeneous blocks in the second expression format.
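  • The following minimal Python sketch illustrates both value schemes described above; the preset values (1/0 and 2/1/0) follow the examples given in the text, while the function names and return strings are assumptions for illustration.

```python
def graph_type_two_values(first_sub: int, second_sub: int) -> str:
    """Two-value scheme: 1 = heterogeneous mixed, 0 = isomorphic."""
    if first_sub != second_sub:                # bitstream must keep both equal
        raise ValueError("inconsistent ASPS/AFPS sub-syntax elements")
    return "heterogeneous_mixed" if first_sub == 1 else "isomorphic"

def graph_type_three_values(value: int) -> str:
    """Three-value scheme that also identifies the expression formats."""
    return {
        2: "heterogeneous_mixed(first + second format)",
        1: "isomorphic(first format)",
        0: "isomorphic(second format)",
    }[value]

assert graph_type_two_values(1, 1) == "heterogeneous_mixed"
assert graph_type_three_values(0) == "isomorphic(second format)"
```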
  • multiple values can also be set to identify the expression formats of heterogeneous mixed mosaics and homogeneous mosaics, and even to identify which homogeneous blocks of which expression formats are included in the heterogeneous mixed mosaic.
  • the third preset value is 2
  • the fourth preset value is 1
  • the fifth preset value is 0.
  • In some embodiments, the first sub-syntax element is a syntax element of the atlas sequence parameter set (ASPS), and the second sub-syntax element is a syntax element of the atlas frame parameter set (AFPS).
  • the first sub-syntax element may be a newly added syntax element in ASPS, or the first sub-syntax element may be a syntax element obtained by a logical operation of at least two syntax elements in ASPS.
  • the second sub-syntax element may be a newly added syntax element in AFPS, or the second sub-syntax element may be a syntax element obtained by a logical operation of at least two syntax elements in AFPS.
  • the first sub-syntax element is a syntax element obtained by an AND operation of two syntax elements in ASPS
• the second sub-syntax element is a syntax element obtained by an AND operation of two syntax elements in AFPS.
  • asps_heterogeneous_miv_extension_present_flag represents the first sub-syntax element
  • afps_heterogeneous_miv_extension_present_flag represents the second sub-syntax element
  • the method includes: parsing a first sub-syntax element in ASPS; determining whether the spliced graph is a heterogeneous mixed spliced graph or a homogeneous spliced graph according to the value of the first sub-syntax element; parsing a second sub-syntax element in AFPS; determining whether the spliced graph is a heterogeneous mixed spliced graph or a homogeneous spliced graph according to the value of the second sub-syntax element.
  • the method includes: determining that the bitstream includes at least two expression formats of visual media content in the bitstream, parsing the first sub-syntax element in ASPS; determining that the splicing graph is a heterogeneous mixed splicing graph or a homogeneous splicing graph according to the value of the first sub-syntax element; parsing the second sub-syntax element in AFPS; determining that the splicing graph is a heterogeneous mixed splicing graph or a homogeneous splicing graph according to the value of the second sub-syntax element.
• when it is determined that the bitstream includes visual media content in at least two expression formats, the first sub-syntax element in ASPS is parsed; when the first sub-syntax element exists, the value of the first sub-syntax element is analyzed to determine whether the splicing graph is a heterogeneous mixed splicing graph or a homogeneous splicing graph; when the first sub-syntax element does not exist or is 0, the splicing graph is determined to be a homogeneous splicing graph.
  • the splicing graph is determined to be a homogeneous splicing graph.
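• A minimal decoder-side sketch of the parse order just described, assuming hypothetical parameter-set structs; an absent or zero flag means a homogeneous mosaic, and the ASPS and AFPS flags must agree:

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical parsed parameter sets; only the flags needed here are modelled.
struct Asps { std::optional<bool> heterogeneous_miv_extension_present_flag; };
struct Afps { std::optional<bool> heterogeneous_miv_extension_present_flag; };

// Returns true if the current mosaic is a heterogeneous mixed mosaic. The
// flags are consulted only when the bitstream carries at least two
// expression formats; ASPS and AFPS flags must be absolutely consistent.
bool IsHeterogeneousMosaic(bool streamHasTwoFormats,
                           const Asps& asps, const Afps& afps) {
  if (!streamHasTwoFormats) return false;  // all mosaics are homogeneous
  const bool fromAsps =
      asps.heterogeneous_miv_extension_present_flag.value_or(false);
  const bool fromAfps =
      afps.heterogeneous_miv_extension_present_flag.value_or(false);
  if (fromAsps != fromAfps)  // bitstream conformance: absolute consistency
    throw std::runtime_error("ASPS/AFPS heterogeneous flags are inconsistent");
  return fromAsps;
}
```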
• when the spliced graph is a heterogeneous mixed spliced graph, the spliced graph information further includes a second syntax element, and the expression format of the homogeneous blocks in the spliced graph is determined according to the second syntax element.
• the method further includes: parsing a first sub-syntax element in ASPS; determining whether the spliced graph is a heterogeneous mixed spliced graph or a homogeneous spliced graph according to the value of the first sub-syntax element; parsing a second sub-syntax element in AFPS; determining whether the spliced graph is a heterogeneous mixed spliced graph or a homogeneous spliced graph according to the value of the second sub-syntax element; when the spliced graph is a heterogeneous mixed spliced graph, parsing the second syntax element in AFPS, and determining the expression format of the homogeneous blocks in the spliced graph according to the second syntax element.
• when the spliced graph is determined to be a heterogeneous mixed spliced graph according to the first syntax element, the second syntax element of each homogeneous block in the heterogeneous mixed spliced graph is further parsed to determine the homogeneous block type.
  • the expression format type corresponding to the i-th block in the spliced graph can be indicated by setting different values for the second syntax element.
  • the expression format of the isomorphic blocks in the spliced graph determined according to the second syntax element includes: when the value of the second syntax element of the i-th block is the sixth preset value, the expression format of the i-th block is determined to be the first expression format; when the value of the second syntax element of the i-th block is the seventh preset value, the expression format of the i-th block is determined to be the second expression format.
  • the first expression format is point cloud and the second expression format is multi-view video.
  • the sixth preset value is 0 and the seventh preset value is 1.
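• Using the example preset values above, the per-block dispatch may be sketched as follows (names are hypothetical; 0 selects point cloud, 1 selects multi-view video):

```cpp
// Hypothetical per-block format query for a heterogeneous mixed mosaic.
enum class BlockFormat { kPointCloud, kMultiViewVideo };

BlockFormat ExpressionFormatOfBlock(int secondSyntaxElement) {
  return secondSyntaxElement == 0 ? BlockFormat::kPointCloud
                                  : BlockFormat::kMultiViewVideo;
}
```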
• if the expression format of the i-th block is the first expression format, the i-th block is encoded using a coding method corresponding to the first expression format; if the expression format of the i-th block is the second expression format, the i-th block is encoded using a coding method corresponding to the second expression format.
  • the first sub-syntax element and the second sub-syntax element are extended syntax elements for multiple views in a heterogeneous mixed spliced graph, and the second syntax element is a shared syntax element of multiple expression formats.
  • the second syntax element includes: a third sub-syntax element and a fourth sub-syntax element.
• the first sub-syntax element is parsed in ASPS; the splicing graph is determined to be a heterogeneous mixed splicing graph or a homogeneous splicing graph according to the value of the first sub-syntax element; the second sub-syntax element is parsed in AFPS; the splicing graph is determined to be a heterogeneous mixed splicing graph or a homogeneous splicing graph according to the value of the second sub-syntax element; the third sub-syntax element is parsed in AFPS, and the splicing graph is determined to be a heterogeneous mixed splicing graph according to the value of the third sub-syntax element; the fourth sub-syntax element is parsed in AFPS, and the expression format of the homogeneous blocks in the splicing graph is determined according to the value of the fourth sub-syntax element.
• the first sub-syntax element is parsed in ASPS, and whether the splicing graph is a heterogeneous mixed splicing graph or a homogeneous splicing graph is determined according to the value of the first sub-syntax element; the second sub-syntax element and the third sub-syntax element are parsed in AFPS, and whether the splicing graph is a heterogeneous mixed splicing graph or a homogeneous splicing graph is determined according to the value of the second sub-syntax element; when the splicing graph is determined to be a heterogeneous mixed splicing graph according to the value of the third sub-syntax element, the expression format of the homogeneous blocks in the splicing graph is determined according to the fourth sub-syntax element.
  • the type of the spliced graph is indicated by setting different values for the third sub-syntax element
  • the type of the expression format corresponding to the i-th block in the spliced graph is indicated by setting different values for the fourth sub-syntax element.
• the splicing graph is determined to be an isomorphic splicing graph.
• when the mosaic is a heterogeneous mixed mosaic, the mosaic information includes at least two types of isomorphic block information, wherein isomorphic blocks in different expression formats correspond to different isomorphic block information.
  • the isomorphic block information includes reconstruction information of the isomorphic blocks and other supplementary information, which is used to decode and reconstruct the isomorphic blocks.
  • the isomorphic block information includes syntax elements of ASPS and syntax elements of AFPS; different isomorphic block information corresponds to different syntax elements of the ASPS and syntax elements of the AFPS.
  • ASPS and AFPS of isomorphic blocks of different expression formats are at least partially different, that is, the ASPS and AFPS of isomorphic blocks of different expression formats are not exactly the same.
  • the high-level information (ASPS and AFPS) of blocks of different expression formats in the heterogeneous mixed splicing graph are not correspondingly equal.
  • high-level parameters that are more suitable for heterogeneous mixed splicing graphs are achieved, which can effectively improve the coding efficiency, that is, reduce the bit rate or improve the quality of reconstructed multi-view video or point cloud video.
• when the isomorphic block is a multi-view video block, it corresponds to the first isomorphic block information, and when the isomorphic block is a point cloud block, it corresponds to the second isomorphic block information; the first isomorphic block information and the second isomorphic block information include the shared syntax elements of the ASPS parameter set and the syntax elements of the AFPS parameter set; the first isomorphic block information also includes the extended syntax elements of the ASPS parameter set and the extended syntax elements of the AFPS parameter set.
  • the extended syntax elements of the ASPS parameter set and the extended syntax elements of the AFPS parameter set are added to the multi-view video blocks of the heterogeneous mixed splicing image to represent the ASPS parameters and AFPS parameters that are not equal to the point cloud blocks, so as to improve the decoding efficiency of the multi-view video blocks. It should be noted that when decoding and reconstructing the point cloud blocks and the multi-view video blocks, these ASPS parameters and AFPS parameters may have the same functions but have unequal values.
  • the second homogeneous block information includes an extended syntax element of the ASPS parameter set and an extended syntax element of the AFPS parameter set, that is, the extended syntax element of the ASPS parameter set and the extended syntax element of the AFPS parameter set can also be added for the point cloud video block of the heterogeneous mixed mosaic to improve the decoding efficiency of the point cloud block.
• both the first homogeneous block information and the second homogeneous block information may include extended syntax elements.
  • the extended syntax elements of the ASPS parameter set include: ashm_geometry_3d_bit_depth_minus1 is used to indicate the bit depth of the geometric coordinates of the reconstructed geometric content. ashm_geometry_2d_bit_depth_minus1 is used to indicate the bit depth of the geometry when projected onto a 2D image. ashm_log2_max_atlas_frame_order_cnt_lsb_minus4 is used to determine the variable value used for the mosaic frame order count during the decoding process.
  • the extended syntax elements of the AFPS parameter set include: afhm_additional_lt_afoc_lsb_len is used to determine the value of the variable MaxLtAtlasFrmOrderCntLsbForMiv used in the reference mosaic frame list during the decoding process.
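• The extension syntax elements just listed can be gathered into illustrative containers as sketched below; the field names follow this document's identifiers, while the structs themselves are hypothetical and not part of any standard:

```cpp
#include <cstdint>

// Illustrative container for the ASPS extension syntax elements.
struct AspsHeterogeneousMivExtension {
  uint8_t ashm_geometry_3d_bit_depth_minus1;               // range 0..31
  uint8_t ashm_geometry_2d_bit_depth_minus1;               // range 0..31
  uint8_t ashm_log2_max_atlas_frame_order_cnt_lsb_minus4;  // range 0..12
};

// Illustrative container for the AFPS extension syntax elements.
struct AfpsHeterogeneousMivExtension {
  uint8_t afhm_additional_lt_afoc_lsb_len;
};
```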
  • the ASPS parameters and AFPS parameters of the multi-view video blocks represented by these syntax elements are not completely equal to the ASPS parameters and AFPS parameters of the point cloud blocks.
  • the first homogeneous block information and the second homogeneous block information include the shared syntax elements of the ASPS parameter set and the syntax elements of the AFPS parameter set; the first homogeneous block information also includes the first extended syntax element of the ASPS parameter set, which is used to indicate the bit depth of the geometric coordinates of the reconstructed geometric content.
  • the naming of the syntax elements in the embodiments of the present application is mainly for the convenience of understanding and writing, and may be modified in actual applications and standard texts, but their semantic contents should be consistent or similar.
  • ashm_geometry_3d_bit_depth_minus1 and asps_geometry_3d_bit_depth_minus1_for_miv both represent the first extended syntax element, and the first extended syntax element can also be understood as a newly added syntax element.
  • the spliced graph information includes homogeneous block information, which is used to decode and reconstruct the homogeneous blocks in the spliced graph.
  • the heterogeneous mixed mosaic graph of the embodiments of the present application includes at least one of the following: a single-attribute heterogeneous mixed mosaic graph and a multi-attribute heterogeneous mixed mosaic graph.
  • a single-attribute heterogeneous mixed mosaic image refers to a heterogeneous mixed mosaic image in which the attribute information of all homogeneous blocks is the same.
  • a single-attribute heterogeneous mixed mosaic image only includes homogeneous blocks of attribute information, such as only multi-view video texture blocks and point cloud texture blocks.
  • a single-attribute heterogeneous mixed mosaic image only includes homogeneous blocks of geometric information, such as only multi-view video geometry blocks and point cloud geometry blocks.
  • a multi-attribute heterogeneous mixed mosaic graph refers to a heterogeneous mixed mosaic graph including at least two isomorphic blocks with different attribute information.
  • a multi-attribute heterogeneous mixed mosaic graph includes both isomorphic blocks of attribute information and isomorphic blocks of geometric information.
  • blocks under any one attribute or any two attributes of at least two of the point cloud, multi-view video, and mesh can be spliced into one graph to obtain a heterogeneous mixed mosaic graph. This application does not limit this.
  • a single attribute isomorphic block in a first expression format and a single attribute block in a second expression format are spliced to obtain a heterogeneous mixed splicing graph, wherein the first expression format and the second expression format are any one of multi-view video, point cloud and mesh, and the first expression format and the second expression format are different, and the attribute information of the first expression format and the second expression format are the same.
  • the single attribute homogeneous block of the multi-view video includes at least one of a multi-view video texture block and a multi-view video geometry block.
  • the single attribute isomorphic block of the point cloud includes at least one of a point cloud texture block, a point cloud geometry block, a point cloud occupancy status block, and the like.
  • the single attribute homogeneous block of the mesh includes at least one of a mesh texture block and a mesh geometry block.
  • At least two of the multi-view video geometry blocks, point cloud geometry blocks, and mesh geometry blocks are spliced into one image to obtain a heterogeneous mixed spliced image.
  • the heterogeneous mixed spliced image is called a single attribute heterogeneous mixed spliced image.
  • at least two of the multi-view video texture blocks, point cloud texture blocks, and mesh texture blocks are spliced into one image to obtain a heterogeneous mixed spliced image.
  • the heterogeneous mixed spliced image is called a single attribute heterogeneous mixed spliced image.
  • a multi-attribute isomorphic block in a first expression format and a multi-attribute isomorphic block in a second expression format are spliced to obtain a heterogeneous mixed splicing graph, wherein the first expression format and the second expression format are any one of multi-viewpoint video, point cloud and mesh, and the first expression format and the second expression format are different, and the attribute information of the first expression format and the second expression format are not completely the same.
  • a multi-view video texture block is spliced with at least one of a point cloud geometry block and a mesh geometry block in one image to obtain a heterogeneous mixed splicing image.
  • a multi-view video geometry block is spliced with at least one of a point cloud texture block and a mesh texture block in one image to obtain a heterogeneous mixed splicing image.
  • a point cloud texture block is spliced with at least one of a multi-view video geometry block and a mesh geometry block in one image to obtain a heterogeneous mixed splicing image.
  • a point cloud geometry block is spliced with at least one of a multi-view video texture block and a mesh texture block in one image to obtain a heterogeneous mixed splicing image.
• a point cloud geometry block, a multi-view video geometry block and a multi-view video texture block are spliced in one image to obtain a heterogeneous mixed splicing image.
• a point cloud geometry block, a point cloud texture block, a multi-view video geometry block and a multi-view video texture block are spliced in one image to obtain a heterogeneous mixed splicing image.
  • the obtained heterogeneous mixed mosaic graph is called a multi-attribute heterogeneous mixed mosaic graph.
  • the isomorphic mosaic graph of the embodiment of the present application includes at least one of the following: a single-attribute isomorphic mosaic graph and a multi-attribute isomorphic mosaic graph.
  • the first attribute isomorphic blocks of the first expression format are spliced to obtain an isomorphic mosaic graph.
  • the first attribute isomorphic blocks and the second attribute isomorphic blocks of the first expression format are spliced to obtain an isomorphic mosaic graph.
  • a single attribute isomorphic mosaic refers to an isomorphic mosaic including all isomorphic blocks with the same expression format and the same attribute information.
  • a single attribute isomorphic mosaic only includes isomorphic blocks with attribute information in a certain expression format, such as a single attribute isomorphic mosaic only includes multi-view video texture blocks, or only includes point cloud texture blocks.
  • a single attribute isomorphic mosaic only includes isomorphic blocks with geometric information, such as only multi-view video geometry blocks, or only includes point cloud geometry blocks.
  • a multi-attribute isomorphic mosaic graph refers to an isomorphic mosaic graph including at least two isomorphic blocks with the same expression format but different attribute information.
  • a multi-attribute isomorphic mosaic graph includes both isomorphic blocks of attribute information and isomorphic blocks of geometric information.
• a multi-attribute isomorphic mosaic graph includes a multi-view video texture block and a multi-view video geometry block.
  • a multi-attribute isomorphic mosaic graph includes a point cloud geometry block and a point cloud texture block. As shown in FIG8 , a multi-attribute isomorphic mosaic graph includes a point cloud texture block 1, a point cloud geometry block 1, and a point cloud geometry block 2.
  • the splicing graph information may further include syntax elements, and the splicing graph is determined to be a single-attribute heterogeneous mixed splicing graph, a multi-attribute heterogeneous mixed splicing graph, a single-attribute homogeneous splicing graph, or a multi-attribute homogeneous splicing graph according to the syntax elements.
  • Step 603 Encode the splicing graph and the splicing graph information to obtain a bit stream.
• the parameter set sub-codestream of the codestream includes a third syntax element, and the codestream corresponding to visual media content in at least one expression format is determined in the codestream according to the third syntax element.
  • the parameter set of the codestream is V3C_VPS
  • the third syntax element may be ptl_profile_toolset_idc in V3C_VPS.
  • the codestream corresponding to the visual media content in at least one expression format is indicated by setting the third syntax element to different values.
  • the method of determining the codestream corresponding to the visual media content in at least one expression format in the codestream according to the third syntax element includes: the third syntax element is a first value, and the codestream includes both the codestream corresponding to the visual media content in the first expression format and the codestream corresponding to the visual media content in the second expression format; the third syntax element is a second value, and the codestream includes the codestream corresponding to the visual media content in the first expression format; the third syntax element is a third value, and the codestream includes the codestream corresponding to the visual media content in the second expression format.
• when the third syntax element is set to the first value, the first value is used to indicate that the code stream contains both a multi-view video code stream and a point cloud code stream.
• when the third syntax element is set to the second value, the second value is used to indicate that the code stream only contains a point cloud code stream.
• when the third syntax element is set to the third value, the third value is used to indicate that the code stream only contains a multi-view video code stream.
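• A sketch of interpreting the third syntax element; the three preset values are passed in as parameters because the document leaves them open at this point, and all names are illustrative:

```cpp
#include <optional>

enum class StreamContent { kPointCloudAndMultiView, kPointCloudOnly, kMultiViewOnly };

// Returns the content indicated by the third syntax element, or nullopt for
// values outside the three preset values (out of scope for this sketch).
std::optional<StreamContent> ContentFromThirdSyntaxElement(
    int element, int firstValue, int secondValue, int thirdValue) {
  if (element == firstValue)  return StreamContent::kPointCloudAndMultiView;
  if (element == secondValue) return StreamContent::kPointCloudOnly;
  if (element == thirdValue)  return StreamContent::kMultiViewOnly;
  return std::nullopt;
}
```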
  • the bitstream includes a video compression substream and a mosaic information substream.
  • the encoding of the mosaic and the mosaic information to obtain the bitstream includes: encoding the mosaic to obtain a video compression substream; encoding the mosaic information of the mosaic to obtain a mosaic information substream; and synthesizing the video compression substream and the mosaic information substream into the bitstream.
  • encoding the mosaic image and the mosaic image information to obtain a code stream includes: if the expression format of the i-th block is a first expression format, determining that the sub-image block in the i-th block is encoded using a coding standard corresponding to the first expression format, and obtaining a code stream corresponding to the visual media content in the first expression format; if the expression format of the i-th block is a second expression format, determining that the sub-image block in the i-th block is encoded using a coding standard corresponding to the second expression format, and obtaining a code stream corresponding to the visual media content in the second expression format.
• if the second syntax element of the i-th block is 1, it is determined that the current sub-block is encoded using the multi-view video coding standard; if the second syntax element of the i-th block is 0, it is determined that the current sub-block is encoded using the point cloud coding standard.
  • the video encoder used for performing video encoding on the heterogeneous mixed mosaic and the homogeneous mosaic to obtain the video compression sub-stream may be the video encoder shown in FIG2A above. That is, the embodiment of the present application takes the heterogeneous mixed mosaic or the homogeneous mosaic as a frame image, first performs block division, then uses intra-frame or inter-frame prediction to obtain the predicted value of the coding block, subtracts the predicted value of the coding block from the original value to obtain the residual value, and transforms and quantizes the residual value to obtain the video compression sub-stream.
  • the splice information corresponding to each splice is generated.
  • the splice information is encoded to obtain a splice information sub-code stream.
  • the splice information includes a first syntax element for indicating the type of the splice, and a second syntax element for expressing the format of each isomorphic block in the splice.
  • the embodiment of the present application does not limit the way of encoding the splice information, for example, it is compressed using a conventional data compression encoding method such as equal-length encoding or variable-length encoding.
  • the video compression sub-stream and the mosaic information sub-stream are written into the same stream to obtain the final stream. That is to say, the embodiment of the present application not only supports heterogeneous source formats such as video, point cloud, and mesh, but also homogeneous source formats in the same compressed stream.
  • the method further includes: encoding the parameter set of the code stream to obtain a code stream parameter set sub-code stream.
  • the encoder combines the video compression sub-code stream, the splicing graph information sub-code stream and the parameter set sub-code stream into a code stream.
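• A minimal sketch of this synthesis step, combining the three sub-code streams by concatenation; a real V3C multiplexer wraps each substream in V3C units with headers, which is omitted here:

```cpp
#include <cstdint>
#include <vector>

using SubStream = std::vector<uint8_t>;

// Concatenates the parameter set, mosaic information, and video compression
// sub-code streams into one code stream (framing omitted).
SubStream SynthesizeCodeStream(const SubStream& parameterSetSubStream,
                               const SubStream& mosaicInfoSubStream,
                               const SubStream& videoCompressionSubStream) {
  SubStream out;
  out.insert(out.end(), parameterSetSubStream.begin(), parameterSetSubStream.end());
  out.insert(out.end(), mosaicInfoSubStream.begin(), mosaicInfoSubStream.end());
  out.insert(out.end(), videoCompressionSubStream.begin(), videoCompressionSubStream.end());
  return out;
}
```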
  • the parameter set sub-code stream of the code stream includes a third syntax element, and the code stream corresponding to the visual media content in at least one expression format is determined according to the third syntax element. That is, the encoder sends the third syntax element to indicate whether the code stream contains at least two expression formats of visual media content at the same time.
  • the encoder processes the visual media content in one expression format to obtain a homogeneous block, and splices the homogeneous block to obtain a homogeneous splicing graph.
  • the encoder obtains at least two homogeneous blocks from the visual media content in at least two expression formats, and splices the at least two homogeneous blocks to obtain a homogeneous splicing graph and/or a heterogeneous mixed splicing graph.
• the method includes: performing isomorphic splicing on isomorphic blocks of a first expression format to obtain a first isomorphic splicing graph, and performing isomorphic splicing on isomorphic blocks of a second expression format to obtain a second isomorphic splicing graph; or, performing heterogeneous splicing on isomorphic blocks of the first expression format and isomorphic blocks of the second expression format to obtain a heterogeneous mixed splicing graph; or, performing isomorphic splicing on isomorphic blocks of the first expression format to obtain a first isomorphic splicing graph, and performing heterogeneous splicing on isomorphic blocks of the first expression format and isomorphic blocks of the second expression format to obtain a heterogeneous mixed splicing graph; or, performing isomorphic splicing on isomorphic blocks of the second expression format to obtain a second isomorphic splicing graph, and performing heterogeneous splicing on isomorphic blocks of the first expression format and isomorphic blocks of the second expression format to obtain a heterogeneous mixed splicing graph.
  • the visual media content is first processed separately (i.e., packaged) to obtain multiple isomorphic blocks. Then, at least two isomorphic blocks with different expression formats are spliced into a heterogeneous mixed splicing graph, and at least one isomorphic block with exactly the same expression format is spliced into an isomorphic splicing graph.
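• A hypothetical packing sketch of this step: blocks of a single expression format form a homogeneous splicing graph, while a mix of formats yields a heterogeneous mixed splicing graph (placement of blocks on the canvas is omitted):

```cpp
#include <vector>

enum class Format { kMultiView, kPointCloud, kMesh };
struct IsoBlock { Format format; /* samples, position, size ... */ };
struct SplicingGraph { std::vector<IsoBlock> blocks; bool heterogeneous = false; };

// Marks the resulting graph as heterogeneous when the input blocks do not
// all share one expression format.
SplicingGraph Splice(const std::vector<IsoBlock>& blocks) {
  SplicingGraph graph{blocks};
  for (const IsoBlock& b : blocks)
    if (b.format != blocks.front().format) { graph.heterogeneous = true; break; }
  return graph;
}
```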
  • the heterogeneous mixed splicing graph and the isomorphic splicing graph are encoded to obtain a video compression sub-stream, and the splicing graph information is encoded to obtain a splicing information sub-stream; the video compression stream and the splicing information stream are synthesized into a compressed stream.
  • some high-level parameters of blocks in different expression formats may be unequal, which can retain more effective information of blocks in different expression formats, improve the synthesis quality of the image, and improve the overall efficiency of bit rate-quality.
  • the above introduces the encoding method of the present application by taking the encoding end as an example.
  • the following describes the video decoding method provided in the embodiment of the present application by taking the decoding end as an example.
  • FIG9 is a schematic flow chart of a decoding method provided in an embodiment of the present application. As shown in FIG9 , the decoding method in the embodiment of the present application includes:
  • Step 901 Decode the bitstream to obtain a splicing graph and splicing graph information
  • the bitstream includes a video compression substream and a splicing graph information substream
  • the decoding of the bitstream to obtain the splicing graph and the splicing graph information includes: extracting the splicing graph information substream and the video compression substream respectively; decoding the video compression substream to obtain the splicing graph; decoding the splicing graph information substream to obtain the splicing graph information.
  • the video compression sub-stream is decoded to obtain a heterogeneous mixed splicing map, a multi-view splicing map and a point cloud splicing map;
  • the splicing map information sub-stream is decoded to obtain heterogeneous mixed splicing map information, multi-view splicing map information and point cloud splicing map information.
  • the code stream also includes a parameter set sub-code stream; the parameter set sub-code stream includes a third syntax element; and the code stream corresponding to the visual media content in at least one expression format is determined in the code stream according to the third syntax element. That is to say, during the decoding process, the code stream is first determined according to the third syntax element to determine the code stream corresponding to the visual media content in several expression formats contained in the code stream.
• when it is determined according to the third syntax element of the V3C code stream layer that the code stream contains visual media content in one expression format, it is determined that all the splicing graphs are isomorphic splicing graphs; when it is determined according to the third syntax element that the code stream contains visual media content in two expression formats, the splicing graph may contain heterogeneous mixed splicing graphs, and it is necessary to further determine whether each splicing graph is an isomorphic splicing graph or a heterogeneous mixed splicing graph.
  • the type of the splicing graph is determined according to the first syntax element, and when it is determined that the splicing graph is a heterogeneous mixed splicing graph, the type of the isomorphic block is determined according to the second syntax element.
  • the codestream corresponding to the visual media content in at least one expression format is indicated by setting the third syntax element to different values.
  • the method of determining the codestream corresponding to the visual media content in at least one expression format in the codestream according to the third syntax element includes: the third syntax element is a first value, and the codestream includes both the codestream corresponding to the visual media content in the first expression format and the codestream corresponding to the visual media content in the second expression format; the third syntax element is a second value, and the codestream includes the codestream corresponding to the visual media content in the first expression format; the third syntax element is a third value, and the codestream includes the codestream corresponding to the visual media content in the second expression format.
• when the third syntax element is set to the first value, the first value is used to indicate that the code stream contains both a multi-view video code stream and a point cloud code stream.
• when the third syntax element is set to the second value, the second value is used to indicate that the code stream only contains a point cloud code stream.
• when the third syntax element is set to the third value, the third value is used to indicate that the code stream only contains a multi-view video code stream.
• Step 902 When the mosaic is a heterogeneous mixed mosaic, obtain at least two types of isomorphic blocks and isomorphic block information according to the mosaic and the mosaic information, wherein different types of isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information.
  • the mosaic information is used to reconstruct the mosaic.
  • the mosaic information includes at least mosaic type information, mosaic information of homogeneous blocks and homogeneous block information.
  • the mosaic information includes a first syntax element, and the mosaic is determined to be a heterogeneous mixed mosaic or a homogeneous mosaic according to the first syntax element.
  • the first syntax element is a syntax element of a mosaic sequence parameter set ASPS and a syntax element of a mosaic frame parameter set AFPS. The ASPS and AFPS are parsed to determine the type of the mosaic.
• when the mosaic is a heterogeneous mixed mosaic, the mosaic is split to obtain at least two isomorphic blocks; and according to the expression format of the at least two isomorphic blocks, isomorphic block information corresponding to the at least two isomorphic blocks is obtained from the mosaic information.
  • the heterogeneous mixed mosaic is split according to the heterogeneous mixed mosaic information, and the reconstructed multi-view video isomorphic blocks and isomorphic block information, as well as the reconstructed point cloud isomorphic blocks and isomorphic block information are output.
  • the first syntax element includes a first sub-syntax element and a second sub-syntax element; determining the splicing graph as a heterogeneous mixed splicing graph or a homogeneous splicing graph according to the first syntax element includes: if the value of the first sub-syntax element is equal to the value of the second sub-syntax element, determining the splicing graph as a heterogeneous mixed splicing graph or a homogeneous splicing graph according to the values.
  • the code stream needs to ensure the absolute consistency of the first sub-syntax element and the second sub-syntax element.
  • the type of the splice can be determined only when the two sub-syntax elements are consistent. Exemplarily, the consistency of the two sub-syntax elements can be compared first, and then the type of the splice can be determined according to the value of one of the sub-syntax elements, or the type of the splice can be determined according to each sub-syntax element first, and the absolute consistency can be ensured by comparing whether the splice types are the same.
  • the determining, according to the value, whether the mosaic is a heterogeneous mixed mosaic or a homogeneous mosaic includes: if the value is a first preset value, the mosaic is determined to be a heterogeneous mixed mosaic; if the value is a second preset value, the mosaic is determined to be a homogeneous mosaic. That is, two values or two types of values can be set to identify a heterogeneous mixed mosaic and a homogeneous mosaic. Exemplarily, the first preset value is 1, and the second preset value is 0.
  • the first syntax element is not included in the splicing graph information, and the splicing graph is determined to be an isomorphic splicing graph.
  • the first syntax element is not included in the splicing graph information, and it is inferred that the first syntax element takes a second preset value.
  • the first sub-syntax element is not included in the splicing graph information, and the splicing graph is determined to be an isomorphic splicing graph, and the first sub-syntax element is inferred to take a second preset value; the second sub-syntax element is not included in the splicing graph information, and the splicing graph is determined to be an isomorphic splicing graph, and the second sub-syntax element is inferred to take a second preset value.
  • the determining whether the mosaic is a heterogeneous mixed mosaic or a homogeneous mosaic according to the value includes: if the value is a third preset value, the mosaic is determined to be a heterogeneous mixed mosaic of homogeneous blocks in a first expression format and a second expression format, wherein the first expression format and the second expression format are different expression formats; if the value is a fourth preset value, the mosaic is determined to be an homogeneous mosaic of homogeneous blocks in the first expression format; if the value is a fifth preset value, the mosaic is determined to be an homogeneous mosaic of homogeneous blocks in the second expression format.
  • multiple values can also be set to identify the expression formats of heterogeneous mixed mosaics and homogeneous mosaics, and even to identify which homogeneous blocks of which expression formats are included in the heterogeneous mixed mosaic.
• Exemplarily, the third preset value is 2, the fourth preset value is 1, and the fifth preset value is 0.
• the first sub-syntax element is a syntax element of the mosaic sequence parameter set (ASPS)
• the second sub-syntax element is a syntax element of the mosaic frame parameter set (AFPS).
  • asps_heterogeneous_miv_extension_present_flag represents the first sub-syntax element
  • afps_heterogeneous_miv_extension_present_flag represents the second sub-syntax element.
• when the mosaic is a heterogeneous mixed mosaic, the mosaic information further includes a second syntax element; and the expression format of the homogeneous blocks in the mosaic is determined according to the second syntax element.
  • the second syntax element of the homogeneous block is further parsed to determine the homogeneous block type.
  • the second syntax element is a syntax element of the mosaic frame parameter set AFPS.
  • the expression format type corresponding to the i-th block in the spliced graph can be indicated by setting different values for the second syntax element.
  • the expression format of the isomorphic blocks in the spliced graph determined according to the second syntax element includes: when the value of the second syntax element of the i-th block is the sixth preset value, the expression format of the i-th block is determined to be the first expression format; when the value of the second syntax element of the i-th block is the seventh preset value, the expression format of the i-th block is determined to be the second expression format.
  • the first expression format is point cloud and the second expression format is multi-view video.
  • the sixth preset value is 0 and the seventh preset value is 1.
• if the expression format of the i-th block is the first expression format, the i-th block is decoded using a decoding method corresponding to the first expression format; if the expression format of the i-th block is the second expression format, the i-th block is decoded using a decoding method corresponding to the second expression format.
• when the mosaic is a heterogeneous mixed mosaic, the mosaic information includes at least two types of isomorphic block information, wherein isomorphic blocks in different expression formats correspond to different isomorphic block information.
  • the isomorphic block information includes reconstruction information of the isomorphic blocks and other supplementary information, which is used to decode and reconstruct the isomorphic blocks.
  • the isomorphic block information includes syntax elements of ASPS and syntax elements of AFPS; different isomorphic block information corresponds to different syntax elements of the ASPS and syntax elements of the AFPS.
  • ASPS and AFPS of isomorphic blocks of different expression formats are at least partially different, that is, the ASPS and AFPS of isomorphic blocks of different expression formats are not exactly the same.
  • the high-level information (ASPS and AFPS) of blocks of different expression formats in the heterogeneous mixed splicing graph are not correspondingly equal.
  • high-level parameters that are more suitable for heterogeneous mixed splicing graphs are achieved, which can effectively improve the coding efficiency, that is, reduce the bit rate or improve the quality of reconstructed multi-view video or point cloud video.
• when the isomorphic block is a multi-view video block, it corresponds to the first isomorphic block information; when the isomorphic block is a point cloud block, it corresponds to the second isomorphic block information; the first isomorphic block information and the second isomorphic block information include the shared syntax elements of the ASPS parameter set and the syntax elements of the AFPS parameter set; the first isomorphic block information also includes the extended syntax elements of the ASPS parameter set and the extended syntax elements of the AFPS parameter set.
• the expression format is multi-view video, point cloud or mesh.
  • One isomorphic block corresponds to one expression format.
  • Different isomorphic blocks correspond to different expression formats.
• the expression formats corresponding to at least two isomorphic blocks include at least two of the following: multi-view video, point cloud, mesh.
• each type of isomorphic block may include at least one isomorphic block with the same expression format.
  • the isomorphic block in the point cloud format includes one or more point cloud blocks
• the isomorphic block in the multi-view video format includes one or more multi-view video blocks
• the isomorphic block in the mesh format includes one or more mesh blocks.
• Step 903 When the mosaic is a homogeneous mosaic, obtain a homogeneous block and homogeneous block information according to the mosaic and the mosaic information.
• when the mosaic is an isomorphic mosaic, the mosaic is split to obtain an isomorphic block; and the isomorphic block information is obtained from the mosaic information.
  • the isomorphic mosaic is split according to the isomorphic mosaic information of the multi-view video, and the reconstructed multi-view video isomorphic blocks and isomorphic block information are output.
  • the isomorphic mosaic is split according to the isomorphic mosaic information of the point cloud, and the reconstructed point cloud isomorphic blocks and isomorphic block information are output.
  • the spliced graph information includes homogeneous block information, which is used to decode and reconstruct the homogeneous blocks in the spliced graph.
  • the heterogeneous mixed mosaic graph of the embodiments of the present application includes at least one of the following: a single-attribute heterogeneous mixed mosaic graph and a multi-attribute heterogeneous mixed mosaic graph.
  • Step 904 Obtain visual media content in at least two expression formats according to the isomorphic blocks and the isomorphic block information.
  • the method of obtaining visual media contents in at least two expression formats based on the isomorphic blocks and the isomorphic block information includes: if the expression format of the ith block is a first expression format, determining that a sub-block in the ith block is decoded and reconstructed using a decoding method corresponding to the first expression format to obtain visual media content in the first expression format; if the expression format of the ith block is a second expression format, determining that a sub-block in the ith block is decoded and reconstructed using a decoding method corresponding to the second expression format to obtain visual media content in the second expression format.
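• A sketch of this reconstruction dispatch; the two Decode* functions are hypothetical stand-ins for the format-specific decoding and reconstruction described above:

```cpp
#include <vector>

enum class ExprFormat { kPointCloud, kMultiView };
struct Block { ExprFormat format; /* mosaic samples, block information ... */ };

void DecodePointCloudBlock(const Block&) { /* point cloud reconstruction */ }
void DecodeMultiViewBlock(const Block&) { /* multi-view video reconstruction */ }

// Each isomorphic block is reconstructed with the decoding method matching
// its expression format, yielding visual media content in each format.
void ReconstructAll(const std::vector<Block>& blocks) {
  for (const Block& b : blocks) {
    if (b.format == ExprFormat::kPointCloud) DecodePointCloudBlock(b);
    else                                     DecodeMultiViewBlock(b);
  }
}
```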
  • FIG10 is a schematic diagram of a V3C bitstream structure provided in an embodiment of the present application.
  • the V3C parameter set () (V3C_parameter_set()) of the V3C_VPS may include a third syntax element (ptl_profile_toolset_idc), and if ptl_profile_toolset_idc is 128 to 133, it means that the current bitstream contains both a point cloud bitstream (such as VPCC basic or VPCC extended, etc.) and a multi-view video bitstream (such as MIV main or MIV Extended or MIV Geometry Absent, etc.).
  • the ASPS parameter set may include a first sub-syntax element (asps_heterogeneous_miv_extension_present_flag).
• when ptl_profile_toolset_idc is 128 to 133, the current splice type is determined according to asps_heterogeneous_miv_extension_present_flag.
  • the AFPS parameter set may include a second sub-syntax element (afps_heterogeneous_miv_extension_present_flag).
• when ptl_profile_toolset_idc is 128 to 133, the current mosaic type is determined according to afps_heterogeneous_miv_extension_present_flag.
• the AFPS parameter set also includes a second syntax element (afps_heterogeneous_frame_tile_toolset_miv_present_flag) for determining the slice type, so that during parsing and decoding it can be determined whether the current slice belongs to multi-view or point cloud.
  • the code stream must ensure absolute consistency between afps_heterogeneous_miv_extension_present_flag and asps_heterogeneous_miv_extension_present_flag.
• the VPS is obtained from the V3C code stream, and the third syntax element ptl_profile_toolset_idc is parsed from the VPS; according to the value of ptl_profile_toolset_idc, it is determined whether the current code stream is decoded using the point cloud decoding standard, the multi-view video decoding standard, or both.
• the method further includes: first parsing asps_vpcc_extension_present_flag, asps_miv_extension_present_flag and asps_extension_6bits in ASPS, and then deriving HeterogeneousPresentFlag from their values.
  • the following decoding operation is performed only if HeterogeneousPresentFlag is true.
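• One plausible reading of this derivation is sketched below; the exact combination of flags is an assumption and not quoted from any standard text, chosen to be consistent with the earlier statement that the first sub-syntax element may be obtained by an AND operation of two ASPS syntax elements:

```cpp
// Assumption: the mosaic is treated as heterogeneous when both the V-PCC
// and the MIV ASPS extensions are signalled.
bool DeriveHeterogeneousPresentFlag(bool asps_vpcc_extension_present_flag,
                                    bool asps_miv_extension_present_flag) {
  return asps_vpcc_extension_present_flag && asps_miv_extension_present_flag;
}
```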
  • the current mosaic ASPS auxiliary high-level information is split into two sub-information sets, that is, one sub-set is used for multi-view strips to realize decoding, and the other sub-set is used for point cloud strips to realize decoding.
  • the auxiliary information required for the point cloud strips can be obtained by parsing Part 8 of Standard 23090-5; the auxiliary information required for the multi-viewpoint strips can be obtained by parsing Part 8 of Standard 23090-5 and the newly added asps_heterogeneous_miv_extension and Part 8 of Standard 23090-12.
  • the method further includes: first parsing afps_miv_extension_present_flag and afps_extension_7bits in AFPS.
  • the following decoding operation can only be performed if HeterogeneousPresentFlag is true.
  • the current mosaic AFPS auxiliary high-level information is split into two sub-information sets, that is, one sub-set is used for multi-view strips to realize decoding, and the other sub-set is used for point cloud strips to realize decoding.
  • the auxiliary information required for the point cloud strips can be obtained by parsing Part 8 of Standard 23090-5; the auxiliary information required for the multi-viewpoint strips can be obtained by parsing Part 8 of Standard 23090-5 and the newly added afps_heterogeneous_miv_extension and Part 8 of Standard 23090-12.
  • the second syntax element may also include only afps_heterogeneous_frame_tile_toolset_miv_present_flag[i].
  • the code stream must ensure the absolute consistency of afps_heterogeneous_miv_extension_present_flag and asps_heterogeneous_miv_extension_present_flag.
  • the ptl_profile_toolset_idc number is used to indicate whether there is a heterogeneous mixed mosaic, and asps_heterogeneous_miv_extension_present_flag and afps_heterogeneous_miv_extension_present_flag are added to determine whether each mosaic should belong to point cloud/multi-view/point cloud + multi-view.
  • Table 9-1-1 and Table 9-1-2 respectively indicate the restrictions on the syntax of the toolbox-level components for multi-view under the integrated code stream and the restrictions on the syntax of the toolbox-level components for heterogeneous data.
• the type of each slice in the current mosaic is described through the newly added syntax element afps_heterogeneous_frame_tile_toolset_miv_present_flag, so as to ensure that during parsing and decoding it can be determined whether the current slice belongs to multi-view or point cloud.
  • This scheme can ensure that no matter in multi-viewpoint parsing or point cloud parsing, there is only one usable mosaic level parameter (AFPS) and mosaic sequence level parameter (ASPS), and it can achieve that the AFPS and ASPS of multi-viewpoints are not completely equal to the AFPS and ASPS of point cloud.
• AFPS is the mosaic frame level parameter set
• ASPS is the mosaic sequence level parameter set
  • Table 1 shows an example of available toolset profile components.
• Table 1 provides a list of toolset profile components defined for V3C and their corresponding identification syntax element values, such as ptl_profile_toolset_idc and ptc_one_v3c_frame_only_flag, as applicable to this document.
  • the syntax element ptl_profile_toolset_idc provides the main definition of the toolset profile, and additional syntax elements such as ptc_one_v3c_frame_only_flag can specify additional features or restrictions of the defined profile.
  • ptc_one_v3c_frame_only_flag can be used to support only a single V3C frame.
  • Table 2 shows the RBSP syntax of the general atlas sequence parameter set, which can be used by ISO/IEC 23090-5.
  • the extended syntax element asps_heterogeneous_miv_extension_present_flag in the atlas sequence parameter set is used to indicate the type of atlas. Specifically, the value of the syntax element determines whether the atlas belongs to point cloud/multi-viewpoint/point cloud + multi-viewpoint.
  • Table 3 shows the ASPS heterogeneous multi-view extension syntax elements (Atlas sequence parameter set heterogeneous MIV extension syntax), which can be used by ISO/IEC 23090-5.
  • ashm_geometry_3d_bit_depth_minus1 is used to indicate the bit depth of the geometry coordinates of the reconstructed geometry content.
  • ashm_geometry_2d_bit_depth_minus1 is used to indicate the bit depth of the geometry when projected onto a 2D image.
  • ashm_log2_max_atlas_frame_order_cnt_lsb_minus4 is used to determine the variable value used for the splicing frame order count during the decoding process.
  • Table 4 shows the RBSP syntax of the general atlas frame parameter set, which can be used by ISO/IEC 23090-5.
  • the extended syntax element afps_heterogeneous_miv_extension_present_flag in the atlas frame parameter set is used to indicate the type of the atlas. Specifically, the value of the syntax element determines whether the atlas belongs to point cloud/multi-view/point cloud + multi-view.
  • Table 5 shows the AFPS heterogeneous MIV extension syntax elements (Atlas frame parameter set heterogeneous MIV extension syntax), which can be used by ISO/IEC 23090-5.
  • afhm_additional_lt_afoc_lsb_len is used to determine the value of the variable MaxLtAtlasFrmOrderCntLsbForMiv used in the decoding process of the reference splicing frame list.
• asps_extension_6bits equal to 0 indicates that asps_extension_data_flag is not present in the ASPS RBSP syntax structure. If present, the value of asps_extension_6bits shall be 0 or 1 in code streams conforming to this version of this document; values other than 0 and 1 are reserved for future use by ISO/IEC. Decoders shall allow values of asps_extension_6bits other than 0 or 1 and shall ignore all asps_extension_data_flag syntax elements in the ASPS. When not present, the value of asps_extension_6bits is inferred to be equal to 0.
• asps_heterogeneous_miv_extension_present_flag equal to 1 indicates that the asps_heterogeneous_miv_extension() syntax structure is present in the ASPS syntax structure.
• asps_heterogeneous_miv_extension_present_flag equal to 0 indicates that this syntax structure is not present.
• when not present, the value of asps_heterogeneous_miv_extension_present_flag is inferred to be equal to 0.
• ashm_geometry_3d_bit_depth_minus1 plus 1 indicates the bit depth of the geometry coordinates of the reconstructed volumetric content. ashm_geometry_3d_bit_depth_minus1 shall be in the range of 0 to 31, inclusive.
• ashm_geometry_2d_bit_depth_minus1 plus 1 represents the bit depth of the geometry when projected onto a 2D image. ashm_geometry_2d_bit_depth_minus1 shall be in the range of 0 to 31, inclusive.
• ashm_log2_max_atlas_frame_order_cnt_lsb_minus4 plus 4 specifies the value of the variables Log2MaxAtlasFrmOrderCntLsbForMiv and MaxAtlasFrmOrderCntLsbForMiv used for the mosaic frame order count during decoding, as shown below:
• Log2MaxAtlasFrmOrderCntLsbForMiv = ashm_log2_max_atlas_frame_order_cnt_lsb_minus4 + 4
• MaxAtlasFrmOrderCntLsbForMiv = 2^Log2MaxAtlasFrmOrderCntLsbForMiv
• ashm_log2_max_atlas_frame_order_cnt_lsb_minus4 shall be in the range of 0 to 12, inclusive.
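• As a non-normative sketch, the frame order count derivation above can be written as follows (the shortened parameter name is illustrative):

```cpp
#include <cassert>
#include <cstdint>

struct OrderCntVars {
  uint32_t log2MaxLsb;  // Log2MaxAtlasFrmOrderCntLsbForMiv
  uint32_t maxLsb;      // MaxAtlasFrmOrderCntLsbForMiv
};

// Derives the order count variables from
// ashm_log2_max_atlas_frame_order_cnt_lsb_minus4 (abbreviated below).
OrderCntVars DeriveOrderCntVars(uint32_t ashm_log2_max_minus4) {
  assert(ashm_log2_max_minus4 <= 12);  // permitted range 0..12
  const uint32_t log2MaxLsb = ashm_log2_max_minus4 + 4;
  return {log2MaxLsb, 1u << log2MaxLsb};  // 2^Log2MaxAtlasFrmOrderCntLsbForMiv
}
```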
• afps_extension_7bits equal to 0 specifies that the afps_extension_data_flag syntax element is not present in the AFPS RBSP syntax structure. If present, afps_extension_7bits shall be equal to 0 or 1 in code streams conforming to this version of this document. Values of afps_extension_7bits other than 0 and 1 are reserved by ISO/IEC for future use. A decoder shall allow values of afps_extension_7bits other than 0 or 1 and shall ignore the afps_extension_data_flag syntax elements in the AFPS. When afps_extension_7bits is not present, its value is inferred to be equal to 0.
• afps_heterogeneous_miv_extension_present_flag equal to 1 indicates that the afps_heterogeneous_miv_extension() syntax structure is present in the AFPS syntax structure.
• afps_heterogeneous_miv_extension_present_flag equal to 0 indicates that this syntax structure is not present.
• when not present, the value of afps_heterogeneous_miv_extension_present_flag is inferred to be equal to 0.
• afps_heterogeneous_miv_extension_present_flag and asps_heterogeneous_miv_extension_present_flag shall be consistent, i.e., they shall both be present and have the same value.
• afps_heterogeneous_frame_tile_toolset_miv_present_flag[i] equal to 1 indicates that the i-th slice in the heterogeneous hybrid mosaic is a mosaic slice belonging to MIV (i.e., a multi-view video slice).
• afps_heterogeneous_frame_tile_toolset_miv_present_flag[i] equal to 0 specifies that the i-th slice in the heterogeneous hybrid mosaic is a mosaic slice belonging to VPCC (i.e., a point cloud slice).
• when afps_heterogeneous_frame_tile_toolset_miv_present_flag[i] is not present, its value is inferred to be equal to 0.
• afhm_additional_lt_afoc_lsb_len specifies the value of the variable MaxLtAtlasFrmOrderCntLsbForMiv used in the reference mosaic frame list decoding process, as shown below:
• MaxLtAtlasFrmOrderCntLsbForMiv = 2^(Log2MaxAtlasFrmOrderCntLsbForMiv + afhm_additional_lt_afoc_lsb_len)
• afhm_additional_lt_afoc_lsb_len shall be in the range of 0 to 32 - Log2MaxAtlasFrmOrderCntLsbForMiv, inclusive.
• ath_atlas_frm_order_cnt_lsb specifies, for the current mosaic slice, the mosaic frame order count modulo MaxAtlasFrmOrderCntLsb. If afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 0 for the current mosaic slice, the length of the ath_atlas_frm_order_cnt_lsb syntax element is equal to Log2MaxAtlasFrmOrderCntLsb bits, and the value of ath_atlas_frm_order_cnt_lsb shall be in the range of 0 to MaxAtlasFrmOrderCntLsb - 1, inclusive.
• if afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 1 for the current mosaic slice, the length of the ath_atlas_frm_order_cnt_lsb syntax element is equal to Log2MaxAtlasFrmOrderCntLsbForMiv bits, and the value of ath_atlas_frm_order_cnt_lsb shall be in the range of 0 to MaxAtlasFrmOrderCntLsbForMiv - 1, inclusive.
• ath_additional_afoc_lsb_val[j] specifies the value of FullAtlasFrmOrderCntLsbLt[RlsIdx][j] for the current mosaic strip. If afps_heterogeneous_frame_tile_toolset_miv_present_flag of the current mosaic strip is equal to 0, ath_additional_afoc_lsb_val[j] is represented by afps_additional_lt_afoc_lsb_len bits. When afps_additional_lt_afoc_lsb_len is not present, the value of ath_additional_afoc_lsb_val[j] is inferred to be equal to 0.
• ath_raw_3d_offset_axis_bit_count_minus1 plus 1 indicates the fixed bit width of the values of the three syntax elements rpdu_3d_offset_u[tileID][p], rpdu_3d_offset_v[tileID][p] and rpdu_3d_offset_d[tileID][p], where p is the sub-tile index and tileID identifies the slice whose slice ID is equal to tileID in which the sub-tile is located.
• RawShift = asps_geometry_3d_bit_depth_minus1 - ath_raw_3d_offset_axis_bit_count_minus1 (for point cloud slices)
• RawShift = ashm_geometry_3d_bit_depth_minus1 - ath_raw_3d_offset_axis_bit_count_minus1 (for multi-view slices)
  • pdu_3d_offset_u[tileID][p] represents the offset of the reconstructed sub-tile along the tangent axis.
  • the current sub-tile belongs to the sub-tile index p in the strip with the strip index tileID.
• pdu_3d_offset_u[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1+1) - 1, inclusive.
• the number of bits used to represent pdu_3d_offset_u[tileID][p] shall be asps_geometry_3d_bit_depth_minus1 + 1.
• pdu_3d_offset_u[tileID][p] shall be in the range of 0 to 2^(ashm_geometry_3d_bit_depth_minus1+1) - 1, inclusive.
• the number of bits used to represent pdu_3d_offset_u[tileID][p] is ashm_geometry_3d_bit_depth_minus1 + 1.
  • pdu_3d_offset_v[tileID][p] specifies the offset of the reconstructed sub-tile along the bitangent axis, where the current sub-tile has sub-tile index p in the strip with strip index tileID.
  • For a point cloud tile, the value of pdu_3d_offset_v[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive, and the number of bits used to represent pdu_3d_offset_v[tileID][p] is asps_geometry_3d_bit_depth_minus1 + 1.
  • For a multi-view tile, the value of pdu_3d_offset_v[tileID][p] shall be in the range of 0 to 2^(ashm_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive, and the number of bits used to represent pdu_3d_offset_v[tileID][p] is ashm_geometry_3d_bit_depth_minus1 + 1.
  • pdu_3d_offset_d[tileID][p] specifies the offset of the reconstructed sub-tile along the normal axis, where the current sub-tile has sub-tile index p in the strip with strip index tileID.
  • Pdu3dOffsetD[tileID][p] is defined as follows: Pdu3dOffsetD[tileID][p] = pdu_3d_offset_d[tileID][p] << ath_pos_min_d_quantizer
  • For a point cloud tile, the value of Pdu3dOffsetD[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive, and the number of bits used to represent pdu_3d_offset_d[tileID][p] is (asps_geometry_3d_bit_depth_minus1 - ath_pos_min_d_quantizer + 1).
  • For a multi-view tile, the value of Pdu3dOffsetD[tileID][p] shall be in the range of 0 to 2^(ashm_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive, and the number of bits used to represent pdu_3d_offset_d[tileID][p] is (ashm_geometry_3d_bit_depth_minus1 - ath_pos_min_d_quantizer + 1).
  • pdu_3d_range_d[tileID][p] indicates the nominal maximum value of the offset expected in the reconstructed bit depth sub-tile geometry samples along the normal axis, after conversion to the nominal representation, where the current sub-tile has sub-tile index p in the strip with strip index tileID.
  • Pdu3dRangeD[tileID][p] is defined as follows, where the variable rangeDBitDepth takes the value: rangeDBitDepth = Min(ashm_geometry_2d_bit_depth_minus1, ashm_geometry_3d_bit_depth_minus1) + 1
  • When pdu_3d_range_d[tileID][p] is not present, Pdu3dRangeD[tileID][p] is inferred to be 2^rangeDBitDepth - 1. When present, the value of Pdu3dRangeD[tileID][p] shall be in the range of 0 to 2^rangeDBitDepth - 1, inclusive.
  • The number of bits used to represent pdu_3d_range_d[tileID][p] is equal to (rangeDBitDepth - ath_pos_delta_max_d_quantizer).
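A small sketch of the rangeDBitDepth derivation and the inferred Pdu3dRangeD value, assuming only the bullets above (the function names are illustrative):

    #include <algorithm>
    #include <cstdint>

    // rangeDBitDepth = Min(2D bit depth minus 1, 3D bit depth minus 1) + 1
    int32_t rangeDBitDepth(int32_t geom2dBitDepthMinus1, int32_t geom3dBitDepthMinus1) {
        return std::min(geom2dBitDepthMinus1, geom3dBitDepthMinus1) + 1;
    }

    // When pdu_3d_range_d is absent, Pdu3dRangeD is inferred as 2^rangeDBitDepth - 1.
    int64_t inferredPdu3dRangeD(int32_t bitDepth) {
        return (int64_t{1} << bitDepth) - 1;
    }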
  • mpdu_3d_offset_u[tileID][p] specifies the offset difference along the tangent axis to be applied when reconstructing two sub-tiles, where the two sub-tiles are the sub-tile with strip index tileID and sub-tile index p in the current mosaic and the sub-tile with strip index tileID and sub-tile index RefPatchIdx in the current mosaic.
  • For a point cloud tile, mpdu_3d_offset_u[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • For a multi-view tile, mpdu_3d_offset_u[tileID][p] shall be in the range of (-2^(ashm_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(ashm_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • mpdu_3d_offset_v[tileID][p] specifies the offset difference along the bitangent axis to be applied when reconstructing two sub-tiles, where the two sub-tiles are the sub-tile with strip index tileID and sub-tile index p in the current mosaic and the sub-tile with strip index tileID and sub-tile index RefPatchIdx in the current mosaic.
  • For a point cloud tile, mpdu_3d_offset_v[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • For a multi-view tile, mpdu_3d_offset_v[tileID][p] shall be in the range of (-2^(ashm_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(ashm_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • mpdu_3d_offset_d[tileID][p] specifies the offset difference along the normal axis to be applied when reconstructing two sub-tiles, where the two sub-tiles are the sub-tile with strip index tileID and sub-tile index p in the current mosaic and the sub-tile with strip index tileID and sub-tile index RefPatchIdx in the current mosaic.
  • Mpdu3dOffsetD[tileID][p] is defined as follows: Mpdu3dOffsetD[tileID][p] = mpdu_3d_offset_d[tileID][p] << ath_pos_min_d_quantizer
  • For a point cloud tile, mpdu_3d_offset_d[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • For a multi-view tile, mpdu_3d_offset_d[tileID][p] shall be in the range of (-2^(ashm_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(ashm_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • ipdu_3d_offset_v[tileID][p] specifies the offset difference along the bitangent axis to be applied when reconstructing two sub-tiles, where the two sub-tiles are the sub-tile with strip index tileID and sub-tile index p in the current mosaic and the sub-tile with strip index tileID and sub-tile index RefPatchIdx in the current mosaic.
  • For a point cloud tile, ipdu_3d_offset_v[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • For a multi-view tile, ipdu_3d_offset_v[tileID][p] shall be in the range of (-2^(ashm_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(ashm_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • ipdu_3d_offset_d[tileID][p] specifies the offset difference along the normal axis to be applied when reconstructing two sub-tiles, where the two sub-tiles are the sub-tile with strip index tileID and sub-tile index p in the current mosaic and the sub-tile with strip index tileID and sub-tile index RefPatchIdx in the current mosaic.
  • Ipdu3dOffsetD[tileID][p] is defined as follows: Ipdu3dOffsetD[tileID][p] = ipdu_3d_offset_d[tileID][p] << ath_pos_min_d_quantizer
  • For a point cloud tile, ipdu_3d_offset_d[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • For a multi-view tile, ipdu_3d_offset_d[tileID][p] shall be in the range of (-2^(ashm_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(ashm_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
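The Mpdu3dOffsetD and Ipdu3dOffsetD definitions above both recover an offset from its quantized coded value by a left shift; a one-line sketch (the "<<" reading of the garbled operator is our assumption):

    #include <cstdint>

    // Illustrative: Xpdu3dOffsetD = xpdu_3d_offset_d << ath_pos_min_d_quantizer.
    int64_t dequantizedOffsetD(int64_t codedOffsetD, int32_t ath_pos_min_d_quantizer) {
        return codedOffsetD << ath_pos_min_d_quantizer;
    }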
  • Bitstream conformance requires that asps_geometry_3d_bit_depth_minus1 and asps_geometry_2d_bit_depth_minus1 are equal to gi_geometry_3d_coordinates_bit_depth_minus1 and gi_geometry_2d_bit_depth_minus1, respectively.
  • When asps_heterogeneous_miv_extension_present_flag is equal to 1, gi_geometry_3d_coordinates_bit_depth_minus1 and gi_geometry_2d_bit_depth_minus1 refer specifically to ISO/IEC 23090-5, and ashm_geometry_3d_bit_depth_minus1 and ashm_geometry_2d_bit_depth_minus1 do not have to be equal to gi_geometry_3d_coordinates_bit_depth_minus1 and gi_geometry_2d_bit_depth_minus1.
  • Sub-tile data unit multi-view extension syntax and semantics
  • pdu_depth_occ_threshold[tileID][p] specifies, for the sub-tile with index equal to p in the strip with strip index equal to tileID, the threshold below which a depth value is considered unoccupied (i.e., the occupancy value is set to unoccupied).
  • If afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 0 for the current mosaic strip, the number of bits of pdu_depth_occ_threshold[tileID][p] is equal to asps_geometry_2d_bit_depth_minus1 + 1.
  • If afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 1 for the current mosaic strip, the number of bits of pdu_depth_occ_threshold[tileID][p] is equal to ashm_geometry_2d_bit_depth_minus1 + 1.
  • When not present, pdu_depth_occ_threshold[tileID][p] is inferred to be dq_depth_occ_threshold_default[pdu_projection_id[tileID][p]]. Note that pdu_projection_id[tileID][p] corresponds to the view ID of the sub-tile with index equal to p in the strip indexed by tileID.
  • the output of this process is AtlasFrmOrderCntVal, the mosaic frame order count for the current mosaic slice.
  • the mosaic frame order count is used to identify the output order of the mosaic frames, and for decoder consistency checking.
  • Each encoded mosaic frame is associated with a mosaic frame order count variable, denoted AtlasFrmOrderCntVal.
  • Let prevAtlasFrm be the previous mosaic frame in decoding order whose TemporalId is equal to 0 and which is not a RASL, RADL or SLNR coded mosaic frame.
  • the variable prevAtlasFrmOrderCntLsb is set equal to ath_atlas_frm_order_cnt_lsb of prevAtlasFrm, the mosaic frame order count LSB value of prevAtlasFrm.
  • the variable prevAtlasFrmOrderCntMsb is set equal to AtlasFrmOrderCntMsb of prevAtlasFrm.
  • the variable AtlasFrmOrderCntMsb of the current mosaic frame is derived from prevAtlasFrmOrderCntLsb, prevAtlasFrmOrderCntMsb, ath_atlas_frm_order_cnt_lsb and MaxAtlasFrmOrderCntLsb (a sketch of this derivation is given below), and AtlasFrmOrderCntVal is then derived as follows:
  • AtlasFrmOrderCntVal = AtlasFrmOrderCntMsb + ath_atlas_frm_order_cnt_lsb
  • AtlasFrmOrderCntVal takes values in the range of -2^31 to 2^31 - 1, inclusive. Within one CAS, any two mosaic frames with the same nal_layer_id value shall have different AtlasFrmOrderCntVal values.
  • the AtlasFrmOrderCnt(aFrmX) function is defined as follows: AtlasFrmOrderCnt(aFrmX) = AtlasFrmOrderCntVal of the atlas frame aFrmX
  • the DiffAtlasFrmOrderCnt(aFrmA, aFrmB) function is defined as follows: DiffAtlasFrmOrderCnt(aFrmA, aFrmB) = AtlasFrmOrderCnt(aFrmA) - AtlasFrmOrderCnt(aFrmB)
  • the bitstream shall not contain data that would cause the value of DiffAtlasFrmOrderCnt(aFrmA, aFrmB) used in the decoding process to be outside the range of -2^15 to 2^15 - 1, inclusive.
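The bullets above say AtlasFrmOrderCntMsb "is derived as follows" but the derivation itself did not survive extraction; the sketch below assumes the usual HEVC/V3C-style LSB wraparound derivation, with names mirroring the text (not a normative transcription):

    #include <cstdint>

    struct PrevAtlasFrmState {
        int64_t prevAtlasFrmOrderCntLsb;  // LSB value of prevAtlasFrm
        int64_t prevAtlasFrmOrderCntMsb;  // MSB value of prevAtlasFrm
    };

    // Derive AtlasFrmOrderCntVal = AtlasFrmOrderCntMsb + ath_atlas_frm_order_cnt_lsb,
    // handling LSB wraparound against the previous frame's order count.
    int64_t atlasFrmOrderCntVal(int64_t athAtlasFrmOrderCntLsb,
                                int64_t maxAtlasFrmOrderCntLsb,
                                const PrevAtlasFrmState& prev) {
        int64_t msb;
        if (athAtlasFrmOrderCntLsb < prev.prevAtlasFrmOrderCntLsb &&
            prev.prevAtlasFrmOrderCntLsb - athAtlasFrmOrderCntLsb >= maxAtlasFrmOrderCntLsb / 2)
            msb = prev.prevAtlasFrmOrderCntMsb + maxAtlasFrmOrderCntLsb;  // wrapped forward
        else if (athAtlasFrmOrderCntLsb > prev.prevAtlasFrmOrderCntLsb &&
                 athAtlasFrmOrderCntLsb - prev.prevAtlasFrmOrderCntLsb > maxAtlasFrmOrderCntLsb / 2)
            msb = prev.prevAtlasFrmOrderCntMsb - maxAtlasFrmOrderCntLsb;  // wrapped backward
        else
            msb = prev.prevAtlasFrmOrderCntMsb;
        return msb + athAtlasFrmOrderCntLsb;
    }

    // DiffAtlasFrmOrderCnt(aFrmA, aFrmB) = AtlasFrmOrderCnt(aFrmA) - AtlasFrmOrderCnt(aFrmB)
    int64_t diffAtlasFrmOrderCnt(int64_t afocValA, int64_t afocValB) { return afocValA - afocValB; }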
  • This procedure is called at the beginning of the decoding process, for each mosaic slice of a mosaic frame.
  • Reference mosaic frames are handled via reference indices, which are indices into reference atlas frame lists (RAFL).
  • a single reference mosaic frame list, RefAtlasFrmList, is used to decode the mosaic strip data.
  • the RAFL RefAtlasFrmList is derived and is used for reference mosaic frame marking or mosaic strip data decoding as specified in subclause 9.2.4.4.
  • RefAtlasFrmList may be used for bitstream consistency checking, but its derivation is not required for decoding of the current mosaic frame or of mosaic frames that follow the current mosaic frame in decoding order.
  • the reference mosaic frame list RefAtlasFrmList is constructed as follows:
  • the first NumRefIdxActive entries in RefAtlasFrmList are called the active entries in RefAtlasFrmList, and the other entries in RefAtlasFrmList are called the inactive entries in RefAtlasFrmList.
  • the array RefAtduTotalNumPatches is set to the array AtduTotalNumPatches corresponding to the first entry in RefAtlasFrmList, RefAtlasFrmList[0].
  • the mosaic frame referenced by each active entry in RefAtlasFrmList shall exist in the DAB and its temporal ID shall be less than or equal to the temporal ID of the current mosaic frame.
  • no entry in RefAtlasFrmList shall reference the current mosaic frame.
  • the short-term reference mosaic frame entries and the long-term reference mosaic frame entries in the RefAtlasFrmList of a mosaic strip shall not refer to the same mosaic frame.
  • Let setOfRefAtlasFrms be the set of unique mosaic frames referenced by all entries in RefAtlasFrmList that have the same nal_layer_id as the current mosaic frame.
  • the number of mosaic frames in setOfRefAtlasFrms shall be less than or equal to asps_max_dec_atlas_frame_buffering_minus1, and setOfRefAtlasFrms shall be the same for all mosaic strips of the mosaic frame.
  • the mosaic frame referenced by each active entry in RefAtlasFrmList shall have exactly the same number of strips as the current mosaic frame.
  • the RefAtlasFrmList of all strips in the current mosaic frame shall contain the same reference mosaic frames, but there is no restriction on the ordering of the reference mosaic frames.
  • the mosaic referenced by the entry in RefAtlasFrmList shall not precede any previous IRAP coded mosaic (in decoding order, when nal_layer_id equals layerID) in either output order or decoding order.
  • no active entry in RefAtlasFrmList shall reference a mosaic frame generated by the decoding process for generating unavailable reference mosaic frames for the CRA coded mosaic associated with the current mosaic.
  • the mosaic frame referenced by the active entry in RefAtlasFrmList shall not precede the IRAP coded mosaic in either output order or decoding order.
  • entries in RefAtlasFrmList shall not reference mosaics preceding the IRAP coded mosaic in output order or decoding order.
  • TilePatch3dOffsetU[tileID][p] specifies the offset along the tangent axis for reconstructing the sub-tile, where the current sub-tile has sub-tile index p in the strip with strip index tileID. For a point cloud tile, TilePatch3dOffsetU[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive; for a multi-view tile, it shall be in the range of 0 to 2^(ashm_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive.
  • TilePatch3dOffsetV[tileID][p] specifies the offset along the bitangent axis for reconstructing the sub-tile, where the current sub-tile has sub-tile index p in the strip with strip index tileID. For a point cloud tile, TilePatch3dOffsetV[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive; for a multi-view tile, it shall be in the range of 0 to 2^(ashm_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive.
  • TilePatch3dOffsetD[tileID][p] specifies the offset along the normal axis for reconstructing the sub-tile, where the current sub-tile has sub-tile index p in the strip with strip index tileID. For a point cloud tile, TilePatch3dOffsetD[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive; for a multi-view tile, it shall be in the range of 0 to 2^(ashm_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive.
  • TilePatch3dRangeD[tileID][p] specifies the nominal maximum value of the offset expected in the reconstructed bit depth sub-tile geometry samples along the normal axis after conversion to the nominal representation, where the current sub-tile has sub-tile index p in the strip with strip index tileID.
  • rangeDBitDepth = Min(ashm_geometry_2d_bit_depth_minus1, ashm_geometry_3d_bit_depth_minus1) + 1
  • TilePatch3dRangeD[tileID][p] takes values in the range of 0 to 2^rangeDBitDepth - 1, inclusive.
  • Table 6 shows the maximum allowed syntax element values for the V-PCC toolset profile components.
  • Table 8 shows the maximum allowed syntax element values for the MIV toolset profile components.
  • Table 9-1-1 shows the allowed values of syntax elements for the heterogeneous toolset profile components (extended).
  • For each bitstream conformance test, all of the following conditions shall be met:
  • for each coded atlas access unit n (with n greater than 0) associated with a buffering period SEI message, the variable deltaTime90k[n] is specified as follows:
  • deltaTime90k[n] = 90000 * (AuNominalRemovalTime[n] - AuFinalArrivalTime[n-1])
  • CAB overflow is specified as the case where the total number of bits in the CAB is greater than the CAB size. The CAB shall not overflow.
  • CAB underflow is specified as the condition that the nominal CAB removal time AuNominalRemovalTime[n] of coded atlas access unit n is less than the final CAB arrival time AuFinalArrivalTime[n] of coded atlas access unit n, for at least one value of n.
  • the nominal removal time of mosaic frames from the CAB shall satisfy the constraints on AuNominalRemovalTime[n] and AuCabRemovalTime[n] in Annex A.
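A minimal sketch of the deltaTime90k computation and the CAB overflow/underflow conditions described above (illustrative helpers; times are taken in seconds):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // deltaTime90k[n] = 90000 * (AuNominalRemovalTime[n] - AuFinalArrivalTime[n-1])
    double deltaTime90k(double auNominalRemovalTimeN, double auFinalArrivalTimePrev) {
        return 90000.0 * (auNominalRemovalTimeN - auFinalArrivalTimePrev);
    }

    // Overflow: total bits in the CAB exceed the CAB size.
    bool cabOverflows(int64_t bitsInCab, int64_t cabSizeBits) { return bitsInCab > cabSizeBits; }

    // Underflow: nominal removal time earlier than final arrival time for some n.
    bool cabUnderflows(const std::vector<double>& nominalRemoval,
                       const std::vector<double>& finalArrival) {
        for (std::size_t n = 0; n < nominalRemoval.size(); ++n)
            if (nominalRemoval[n] < finalArrival[n]) return true;
        return false;
    }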
  • the number of mosaic frames in the DAB, after the procedure for removing mosaic frames from the DAB has been invoked as specified, counting all mosaic frames n that are marked as "used for reference" or that have AtlasFrameOutputFlag equal to 1 and AuCabRemovalTime[n] less than AuCabRemovalTime[currAtlasFrame] (where currAtlasFrame is the current mosaic frame), shall be less than or equal to asps_max_dec_atlas_frame_buffering_minus1.
  • the value of maxAtlasFrameOrderCnt - minAtlasFrameOrderCnt shall be less than MaxAtlasFrmOrderCntLsb/2. If afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 1 for the current mosaic strip, the value of maxAtlasFrameOrderCnt - minAtlasFrameOrderCnt shall be less than MaxAtlasFrmOrderCntLsbForMiv/2.
  • DabOutputInterval[n], the difference between the output time of a mosaic frame with AtlasFrameOutputFlag equal to 1 and the output time of the first mosaic frame following it in output order, shall satisfy the constraint specified for the profile, tier and level of the bitstream and the specified decoding process.
  • recovery_afoc_cnt specifies the recovery point of decoded mosaic frames in output order. If there is a mosaic frame aFrmA in the CAS that follows the current mosaic frame (i.e., the mosaic frame associated with the current SEI message) in decoding order and whose AtlasFrmOrderCntVal is equal to the AtlasFrmOrderCntVal of the current mosaic frame plus the value of recovery_afoc_cnt, then the mosaic frame aFrmA is called the recovery point mosaic frame.
  • the recovery point mosaic frame shall not precede the current mosaic frame in decoding order. Starting from the output order position of the recovery point mosaic frame, all decoded mosaic frames displayed in output order are correct or approximately correct in content.
  • the value of recovery_afoc_cnt shall be in the range of -MaxAtlasFrmOrderCntLsb/2 to MaxAtlasFrmOrderCntLsb/2 - 1, inclusive. If afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 1, recovery_afoc_cnt shall be in the range of -MaxAtlasFrmOrderCntLsbForMiv/2 to MaxAtlasFrmOrderCntLsbForMiv/2 - 1, inclusive.
  • ASPSCommonByteString(stringByte,posByte) function is defined as follows:
  • vui_display_box_origin[d] specifies the offset along axis d relative to the origin of the coordinate system. When an element of vui_display_box_origin[d] is not present, its value shall be inferred to be equal to 0. If afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 0 for the current mosaic strip, the number of bits used to represent vui_display_box_origin[d] is asps_geometry_3d_bit_depth_minus1 + 1.
  • If afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 1 for the current mosaic strip, the number of bits used to represent vui_display_box_origin[d] is ashm_geometry_3d_bit_depth_minus1 + 1. Values of d equal to 0, 1 and 2 correspond to the X, Y and Z axes, respectively.
  • vui_display_box_size[d] specifies the size of the display box sampled along axis d. When an element of vui_display_box_size[d] is not present, its value is unknown. If afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 0 for the current mosaic strip, the number of bits used to represent vui_display_box_size[d] is asps_geometry_3d_bit_depth_minus1 + 1. If afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 1 for the current mosaic strip, the number of bits used to represent vui_display_box_size[d] is ashm_geometry_3d_bit_depth_minus1 + 1.
  • vui_anchor_point[d] specifies the position of the anchor point along axis d. If afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 0 for the current mosaic strip, the value of vui_anchor_point[d] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1. If vui_anchor_point[d] is not present, it shall be inferred to be equal to 0. The number of bits used to represent vui_anchor_point[d] is asps_geometry_3d_bit_depth_minus1 + 1.
  • If afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 1 for the current mosaic strip, the value of vui_anchor_point[d] shall be in the range of 0 to 2^(ashm_geometry_3d_bit_depth_minus1 + 1) - 1. If vui_anchor_point[d] is not present, it shall be inferred to be equal to 0. The number of bits used to represent vui_anchor_point[d] is ashm_geometry_3d_bit_depth_minus1 + 1. Values of d equal to 0, 1 and 2 correspond to the X, Y and Z axes, respectively.
  • Multi-viewpoint standard available for use with ISO/IEC 23090-12
  • This process expands the integer depth values of the mosaic into floating point depth values in scene coordinates (e.g., meters).
  • Integer depth values may be scaled to an implementation-defined bit depth and range 0..maxSampleD; otherwise, maxSampleD is set to 2^(asps_geometry_2d_bit_depth_minus1 + 1) - 1.
  • This process decodes the reconstructed volume frames and reconstructs the MPI frames from the bitstream where ptc_restricted_geometry_flag is equal to 1.
  • Inputs to this process include:
  • variable atlasID which is the mosaic ID
  • AspsFrameHeight[atlasID] and AspsFrameWidth[atlasID] represent the number of rows and columns of the mosaic frame, respectively;
  • the variable maxDepthSampleValue indicates the maximum value of the coded geometry sample and is set to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1; as a special case, if asps_heterogeneous_miv_extension_present_flag is equal to 1, maxDepthSampleValue is set to 2^(ashm_geometry_3d_bit_depth_minus1 + 1) - 1.
  • maxNbLayers indicates the maximum number of depth layers of MPI, which is set to maxDepthSampleValue+1.
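A sketch of how the MPI depth parameters above could be derived (the struct and function names are ours, not from the standard):

    #include <cstdint>

    struct MpiDepthParams {
        int64_t maxDepthSampleValue;  // maximum coded geometry sample value
        int64_t maxNbLayers;          // maximum number of MPI depth layers
    };

    MpiDepthParams deriveMpiDepthParams(bool heterogeneousMivExtensionPresent,
                                        int32_t asps_geometry_3d_bit_depth_minus1,
                                        int32_t ashm_geometry_3d_bit_depth_minus1) {
        const int32_t d = heterogeneousMivExtensionPresent ? ashm_geometry_3d_bit_depth_minus1
                                                           : asps_geometry_3d_bit_depth_minus1;
        const int64_t maxDepth = (int64_t{1} << (d + 1)) - 1;  // 2^(d+1) - 1
        return {maxDepth, maxDepth + 1};                       // maxNbLayers = max value + 1
    }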
  • In the ASPS, asps_vpcc_extension_present_flag, asps_miv_extension_present_flag, asps_heterogeneous_miv_extension_present_flag (the first sub-syntax element) and asps_extension_5bits are parsed first.
  • asps_vpcc_extension_present_flag is the flag for point cloud content;
  • asps_miv_extension_present_flag is the flag for multi-view content;
  • asps_heterogeneous_miv_extension_present_flag is the flag for point cloud + multi-view content.
  • Alternatively, the value of asps_heterogeneous_miv_extension_present_flag can be obtained by performing an AND operation on the value of asps_vpcc_extension_present_flag and the value of asps_miv_extension_present_flag; that is, the above can be replaced as follows: in the ASPS, asps_vpcc_extension_present_flag, asps_miv_extension_present_flag and asps_extension_6bits are parsed first, and asps_vpcc_extension_present_flag and asps_miv_extension_present_flag are ANDed to obtain asps_heterogeneous_miv_extension_present_flag. Both alternatives are sketched below.
  • the ASPS auxiliary high-level information of the current mosaic image is split into two sub-information sets: one subset is used to decode the multi-view strips, and the other subset is used to decode the point cloud strips.
  • the auxiliary information required for the point cloud strips can be obtained by parsing Part 8 of standard 23090-5; the auxiliary information required for the multi-view strips can be obtained by parsing Part 8 of standard 23090-5, the newly added syntax element asps_geometry_3d_bit_depth_minus1_for_miv, and Part 8 of standard 23090-12.
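A sketch of the two ASPS signaling alternatives just described: parsing an explicit heterogeneous flag plus asps_extension_5bits, or deriving the flag as the AND of the vpcc and miv flags with asps_extension_6bits. The BitReader here is a minimal illustrative stand-in for a real bitstream reader:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Minimal MSB-first bit reader, included only to keep the sketch self-contained.
    struct BitReader {
        const std::vector<uint8_t>& buf;
        std::size_t pos = 0;  // absolute bit position
        uint32_t u(int n) {   // read n bits, most significant bit first
            uint32_t v = 0;
            for (int i = 0; i < n; ++i, ++pos)
                v = (v << 1) | ((buf[pos >> 3] >> (7 - (pos & 7))) & 1u);
            return v;
        }
    };

    struct AspsExtensionFlags { bool vpcc = false, miv = false, heterogeneousMiv = false; };

    // Alternative 1: the heterogeneous flag is coded explicitly, then asps_extension_5bits.
    AspsExtensionFlags parseAspsFlagsExplicit(BitReader& r) {
        AspsExtensionFlags f;
        f.vpcc = r.u(1) != 0;              // asps_vpcc_extension_present_flag
        f.miv = r.u(1) != 0;               // asps_miv_extension_present_flag
        f.heterogeneousMiv = r.u(1) != 0;  // asps_heterogeneous_miv_extension_present_flag
        r.u(5);                            // asps_extension_5bits (ignored here)
        return f;
    }

    // Alternative 2: only the vpcc/miv flags are coded (then asps_extension_6bits)
    // and the heterogeneous flag is derived as their AND.
    AspsExtensionFlags parseAspsFlagsDerived(BitReader& r) {
        AspsExtensionFlags f;
        f.vpcc = r.u(1) != 0;
        f.miv = r.u(1) != 0;
        r.u(6);                            // asps_extension_6bits (ignored here)
        f.heterogeneousMiv = f.vpcc && f.miv;
        return f;
    }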
  • In the AFPS, afps_miv_extension_present_flag, afps_heterogeneous_type_extension_present_flag (the third sub-syntax element in the second syntax element), afps_heterogeneous_miv_extension_present_flag (the second sub-syntax element) and afps_extension_5bits are parsed first.
  • Alternatively, the value of afps_heterogeneous_miv_extension_present_flag can be obtained by performing an AND operation on the value of asps_vpcc_extension_present_flag and the value of asps_miv_extension_present_flag; that is, the above can be replaced as follows: in the AFPS, afps_miv_extension_present_flag, afps_heterogeneous_type_extension_present_flag and afps_extension_6bits are parsed first, and asps_vpcc_extension_present_flag and asps_miv_extension_present_flag are ANDed to obtain afps_heterogeneous_miv_extension_present_flag.
  • afps_heterogeneous_miv_extension_present_flag indicates whether the current mosaic image is homogeneous content (all strips of the point cloud type, or all of the multi-view type) or heterogeneous mixed content;
  • the AFPS auxiliary high-level information of the current mosaic image is split into two sub-information sets: one subset is used to decode the multi-view strips, and the other subset is used to decode the point cloud strips.
  • the auxiliary information required for the point cloud strips can be obtained by parsing Part 8 of standard 23090-5; the auxiliary information required for the multi-view strips can be obtained by parsing Part 8 of standard 23090-5 and Part 8 of standard 23090-12.
  • afps_heterogeneous_type_extension_present_flag and afps_heterogeneous_frame_tile_toolset_miv_present_flag both indicate whether the current mosaic is a heterogeneous mixed mosaic;
  • afps_heterogeneous_tile_type[i] and afps_heterogeneous_frame_tile_toolset_miv_present_flag[i] both indicate the strip type of the i-th strip.
  • the bitstream shall keep afps_heterogeneous_miv_extension_present_flag and asps_heterogeneous_miv_extension_present_flag strictly consistent.
  • Table 1 shows an example of available toolset profile components.
  • Table 1 provides a list of toolset profile components defined for V3C and their corresponding identifying syntax element values, such as ptl_profile_toolset_idc and ptc_one_v3c_frame_only_flag, for use with this document.
  • the syntax element ptl_profile_toolset_idc provides the main definition of the toolset profile.
  • Additional syntax elements such as ptc_one_v3c_frame_only_flag can specify additional features or restrictions of the defined profile.
  • ptc_one_v3c_frame_only_flag can be used to support only a single V3C frame.
  • Table 2-1 shows the RBSP syntax of the general atlas sequence parameter set, which can be used by ISO/IEC 23090-5.
  • the extended syntax element asps_vpcc_extension_present_flag in the atlas sequence parameter set is used to indicate that the atlas belongs to the point cloud
  • asps_miv_extension_present_flag is used to indicate that the atlas belongs to multi-view
  • asps_heterogeneous_miv_extension_present_flag is used to indicate the type of the atlas.
  • the value of the syntax element determines whether the atlas belongs to the point cloud/multi-view/point cloud + multi-view.
  • the first extended syntax element asps_geometry_3d_bit_depth_minus1_for_miv is used to determine the bit depth of the geometric coordinates of the reconstructed geometric content.
  • Decoding case 4 only needs to add the first extended syntax element to obtain the auxiliary information required for the multi-view strip.
  • Table 4-1 is the RBSP syntax of the general atlas frame parameter set (General atlas frame parameter set RBSP syntax), which can be used by ISO/IEC 23090-5.
  • afps_miv_extension_present_flag is used to indicate that the mosaic belongs to multi-view
  • afps_heterogeneous_miv_extension_present_flag is used to indicate the type of the mosaic. Specifically, the mosaic should be determined to belong to point cloud/multi-view/point cloud + multi-view according to the value of this syntax element.
  • afps_heterogeneous_type_extension_present_flag is used to determine whether all strips of the current mosaic are of the same type. If not, each strip is traversed according to afps_heterogeneous_tile_type[i] to determine the strip type.
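A sketch of the per-strip type determination just described; the 0/1 mapping of afps_heterogeneous_tile_type[i] to point cloud/multi-view is an assumption for illustration:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    enum class TileType { PointCloud, MultiView };

    std::vector<TileType> deriveTileTypes(bool afps_heterogeneous_type_extension_present_flag,
                                          std::size_t tileCount, TileType uniformType,
                                          const std::vector<uint8_t>& afps_heterogeneous_tile_type) {
        std::vector<TileType> types(tileCount, uniformType);     // all strips share one type
        if (afps_heterogeneous_type_extension_present_flag)      // mixed-type mosaic:
            for (std::size_t i = 0; i < tileCount; ++i)          // traverse every strip
                types[i] = afps_heterogeneous_tile_type[i] ? TileType::MultiView
                                                           : TileType::PointCloud;  // assumed mapping
        return types;
    }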
  • asps_extension_present_flag equal to 1 specifies that the syntax elements asps_vpcc_extension_present_flag, asps_miv_extension_present_flag, asps_heterogeneous_miv_extension_present_flag and asps_extension_5bits are present in the atlas_sequence_parameter_set_rbsp() syntax structure.
  • asps_extension_present_flag equal to 0 specifies that these syntax elements are not present.
  • asps_heterogeneous_miv_extension_present_flag equal to 1 specifies that the asps_geometry_3d_bit_depth_minus1_for_miv syntax element is present in the atlas_sequence_parameter_set_rbsp() syntax structure.
  • asps_heterogeneous_miv_extension_present_flag equal to 0 indicates that this syntax element is not present. When not present, the value of asps_heterogeneous_miv_extension_present_flag is inferred to be equal to 0.
  • asps_extension_5bits 0 specifies that the syntax element asps_extension_data_flag is not present in the asps RBSP syntax structure. If present, asps_extension_5bits shall be equal to 0 in bitstreams conforming to this version of this document. Values of asps_extension_5bits not equal to 0 are reserved for future use by ISO/IEC. A decoder shall allow values of asps_extension_5bits not equal to 0 and shall ignore all asps_extension_data_flag syntax elements in ASPS NAL units. When not present, the value of asps_extension_5bits is inferred to be equal to 0.
  • asps_geometry_3d_bit_depth_minus1_for_miv indicates the bit depth of the geometry coordinates of the reconstructed volume content.
  • the value of asps_geometry_3d_bit_depth_minus1_for_miv shall be in the range of 0 to 31, inclusive.
  • afps_extension_present_flag equal to 1 specifies that the syntax elements afps_miv_extension_present_flag, afps_heterogeneous_type_extension_present_flag, afps_heterogeneous_miv_extension_present_flag and afps_extension_5bits are present in the atlas_frame_parameter_set_rbsp() syntax structure.
  • afps_extension_present_flag equal to 0 specifies that these syntax elements are not present.
  • afps_heterogeneous_type_extension_present_flag equal to 1 specifies that the slices referencing this AFPS include heterogeneous types.
  • afps_heterogeneous_type_extension_present_flag equal to 0 specifies that all slices referencing this AFPS are of the same type. When not present, the value of afps_heterogeneous_type_extension_present_flag is inferred to be equal to 0.
  • afps_heterogeneous_miv_extension_present_flag equal to 1 specifies that the XXXXX syntax element is present in the atlas_frame_parameter_set_rbsp() syntax structure.
  • afps_heterogeneous_miv_extension_present_flag equal to 0 specifies that the XXXXX syntax element is not present. When not present, the value of afps_heterogeneous_miv_extension_present_flag is inferred to be equal to 0.
  • afps_heterogeneous_miv_extension_present_flag and asps_heterogeneous_miv_extension_present_flag shall both be present.
  • afps_extension_5bits 0 specifies that the syntax element afps_extension_data_flag is not present in the afps RBSP syntax structure. If present, afps_extension_5bits shall be equal to 0 in bitstreams conforming to this version of this document. Values of afps_extension_5bits not equal to 0 are reserved for future use by ISO/IEC. A decoder shall allow values of afps_extension_5bits not equal to 0 and shall ignore all afps_extension_data_flag syntax elements in AFPS NAL units. When not present, the value of afps_extension_5bits is inferred to be equal to 0.
  • afps_heterogeneous_tile_type[i] indicates the slice type with tileID equal to i as specified in Table VI. Values indicated as reserved are reserved for future use by ISO/IEC and shall not appear in bitstreams conforming to this version of this document. Decoders conforming to this document shall ignore such reserved slice types.
  • ath_raw_3d_offset_axis_bit_count_minus1 plus 1 specifies the fixed bit width of the three syntax elements rpdu_3d_offset_u[tileID][p], rpdu_3d_offset_v[tileID][p] and rpdu_3d_offset_d[tileID][p], where p is the sub-tile index and tileID is the ID of the tile containing the sub-tile.
  • For a point cloud tile, the length of the syntax element used to represent ath_raw_3d_offset_axis_bit_count_minus1 is equal to Floor(Log2(asps_geometry_3d_bit_depth_minus1 + 1)), and RawShift = asps_geometry_3d_bit_depth_minus1 - ath_raw_3d_offset_axis_bit_count_minus1.
  • For a multi-view tile, the length of the syntax element used to represent ath_raw_3d_offset_axis_bit_count_minus1 is equal to Floor(Log2(asps_geometry_3d_bit_depth_minus1_for_miv + 1)).
  • pdu_3d_offset_u[tileID][p] specifies the offset of the reconstructed sub-tile along the tangent axis, where the current sub-tile has sub-tile index p in the strip with strip index tileID.
  • For a point cloud tile, the value of pdu_3d_offset_u[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive, and the number of bits used to represent pdu_3d_offset_u[tileID][p] is asps_geometry_3d_bit_depth_minus1 + 1.
  • For a multi-view tile, the value of pdu_3d_offset_u[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) - 1, inclusive, and the number of bits used to represent pdu_3d_offset_u[tileID][p] is asps_geometry_3d_bit_depth_minus1_for_miv + 1.
  • pdu_3d_offset_v[tileID][p] specifies the offset of the reconstructed sub-tile along the bitangent axis, where the current sub-tile has sub-tile index p in the strip with strip index tileID.
  • For a point cloud tile, the value of pdu_3d_offset_v[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive, and the number of bits used to represent pdu_3d_offset_v[tileID][p] is asps_geometry_3d_bit_depth_minus1 + 1.
  • For a multi-view tile, the value of pdu_3d_offset_v[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) - 1, inclusive, and the number of bits used to represent pdu_3d_offset_v[tileID][p] is asps_geometry_3d_bit_depth_minus1_for_miv + 1.
  • pdu_3d_offset_d[tileID][p] specifies the offset of the reconstructed sub-tile along the normal axis, where the current sub-tile has sub-tile index p in the strip with strip index tileID.
  • Pdu3dOffsetD[tileID][p] is defined as follows: Pdu3dOffsetD[tileID][p] = pdu_3d_offset_d[tileID][p] << ath_pos_min_d_quantizer
  • For a point cloud tile, the value of Pdu3dOffsetD[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive, and the number of bits used to represent pdu_3d_offset_d[tileID][p] is (asps_geometry_3d_bit_depth_minus1 - ath_pos_min_d_quantizer + 1).
  • For a multi-view tile, the value of Pdu3dOffsetD[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) - 1, inclusive, and the number of bits used to represent pdu_3d_offset_d[tileID][p] is (asps_geometry_3d_bit_depth_minus1_for_miv - ath_pos_min_d_quantizer + 1).
  • pdu_3d_range_d[tileID][p] indicates the nominal maximum value of the offset expected in the reconstructed bit depth sub-tile geometry samples along the normal axis, after conversion to the nominal representation, where the current sub-tile has sub-tile index p in the strip with strip index tileID.
  • Pdu3dRangeD[tileID][p] is defined as follows, where the variable rangeDBitDepth takes the value: rangeDBitDepth = Min(asps_geometry_2d_bit_depth_minus1, asps_geometry_3d_bit_depth_minus1_for_miv) + 1
  • When pdu_3d_range_d[tileID][p] is not present, Pdu3dRangeD[tileID][p] is inferred to be 2^rangeDBitDepth - 1. When present, the value of Pdu3dRangeD[tileID][p] shall be in the range of 0 to 2^rangeDBitDepth - 1, inclusive.
  • The number of bits used to represent pdu_3d_range_d[tileID][p] is equal to (rangeDBitDepth - ath_pos_delta_max_d_quantizer).
  • mpdu_3d_offset_u[tileID][p] specifies the offset difference along the tangent axis to be applied when reconstructing two sub-tiles, where the two sub-tiles are the sub-tile with strip index tileID and sub-tile index p in the current mosaic and the sub-tile with strip index tileID and sub-tile index RefPatchIdx in the current mosaic.
  • For a point cloud tile, mpdu_3d_offset_u[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • For a multi-view tile, mpdu_3d_offset_u[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) - 1), inclusive.
  • mpdu_3d_offset_v[tileID][p] specifies the offset difference along the bitangent axis to be applied when reconstructing two sub-tiles, where the two sub-tiles are the sub-tile with strip index tileID and sub-tile index p in the current mosaic and the sub-tile with strip index tileID and sub-tile index RefPatchIdx in the current mosaic.
  • For a point cloud tile, mpdu_3d_offset_v[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • For a multi-view tile, mpdu_3d_offset_v[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) - 1), inclusive.
  • mpdu_3d_offset_d[tileID][p] specifies the offset difference along the normal axis to be applied when reconstructing two sub-tiles, where the two sub-tiles are the sub-tile with strip index tileID and sub-tile index p in the current mosaic and the sub-tile with strip index tileID and sub-tile index RefPatchIdx in the current mosaic.
  • Mpdu3dOffsetD[tileID][p] is defined as follows: Mpdu3dOffsetD[tileID][p] = mpdu_3d_offset_d[tileID][p] << ath_pos_min_d_quantizer
  • For a point cloud tile, mpdu_3d_offset_d[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • For a multi-view tile, mpdu_3d_offset_d[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) - 1), inclusive.
  • ipdu_3d_offset_v[tileID][p] specifies the offset difference along the bitangent axis to be applied when reconstructing two sub-tiles, where the two sub-tiles are the sub-tile with strip index tileID and sub-tile index p in the current mosaic and the sub-tile with strip index tileID and sub-tile index RefPatchIdx in the current mosaic.
  • For a point cloud tile, ipdu_3d_offset_v[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • For a multi-view tile, ipdu_3d_offset_v[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) - 1), inclusive.
  • ipdu_3d_offset_d[tileID][p] specifies the offset difference along the normal axis to be applied when reconstructing two sub-tiles, where the two sub-tiles are the sub-tile with strip index tileID and sub-tile index p in the current mosaic and the sub-tile with strip index tileID and sub-tile index RefPatchIdx in the current mosaic.
  • Ipdu3dOffsetD[tileID][p] is defined as follows: Ipdu3dOffsetD[tileID][p] = ipdu_3d_offset_d[tileID][p] << ath_pos_min_d_quantizer
  • For a point cloud tile, ipdu_3d_offset_d[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1 + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1), inclusive.
  • For a multi-view tile, ipdu_3d_offset_d[tileID][p] shall be in the range of (-2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) + 1) to (2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) - 1), inclusive.
  • Bitstream conformance requires that asps_geometry_3d_bit_depth_minus1 and asps_geometry_2d_bit_depth_minus1 are equal to gi_geometry_3d_coordinates_bit_depth_minus1 and gi_geometry_2d_bit_depth_minus1, respectively.
  • When asps_heterogeneous_miv_extension_present_flag is equal to 1, gi_geometry_3d_coordinates_bit_depth_minus1 and gi_geometry_2d_bit_depth_minus1 refer specifically to ISO/IEC 23090-5, and asps_geometry_3d_bit_depth_minus1_for_miv does not have to be equal to gi_geometry_3d_coordinates_bit_depth_minus1 or gi_geometry_2d_bit_depth_minus1.
  • TilePatch3dOffsetU[tileID][p] specifies the offset along the tangent axis for reconstructing the sub-tile, where the current sub-tile has sub-tile index p in the strip with strip index tileID. For a point cloud tile, TilePatch3dOffsetU[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive; for a multi-view tile, it shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) - 1, inclusive.
  • TilePatch3dOffsetV[tileID][p] specifies the offset along the bitangent axis for reconstructing the sub-tile, where the current sub-tile has sub-tile index p in the strip with strip index tileID. For a point cloud tile, TilePatch3dOffsetV[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive; for a multi-view tile, it shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) - 1, inclusive.
  • TilePatch3dOffsetD[tileID][p] specifies the offset along the normal axis for reconstructing the sub-tile, where the current sub-tile has sub-tile index p in the strip with strip index tileID. For a point cloud tile, TilePatch3dOffsetD[tileID][p] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1, inclusive; for a multi-view tile, it shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) - 1, inclusive.
  • TilePatch3dRangeD[tileID][p] specifies the nominal maximum value of the offset expected in the reconstructed bit depth sub-tile geometry samples along the normal axis after conversion to the nominal representation, where the current sub-tile has sub-tile index p in the strip with strip index tileID.
  • rangeDBitDepth = Min(asps_geometry_2d_bit_depth_minus1, asps_geometry_3d_bit_depth_minus1_for_miv) + 1
  • TilePatch3dRangeD[tileID][p] takes values in the range of 0 to 2^rangeDBitDepth - 1, inclusive.
  • Table 9-1-1-1 shows the allowed values of syntax elements for the heterogeneous toolset profile components (extended).
  • the syntax elements in the ASPS, except asps_geometry_3d_bit_depth_minus1, shall have the same value for MIV and V-PCC.
  • if asps_vpcc_extension_present_flag is equal to 1, then asps_heterogeneous_miv_extension_present_flag, afps_heterogeneous_miv_extension_present_flag and afps_heterogeneous_type_extension_present_flag are present and their values shall be equal to 1.
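The profile constraint in the last bullet, restated as a conformance check (a sketch; the struct and field names are ours):

    struct HeterogeneousProfileFlags {
        bool asps_vpcc_extension_present_flag;
        bool asps_heterogeneous_miv_extension_present_flag;
        bool afps_heterogeneous_miv_extension_present_flag;
        bool afps_heterogeneous_type_extension_present_flag;
    };

    // When the vpcc flag is 1, the three heterogeneous flags shall all be 1.
    bool conformsToHeterogeneousProfile(const HeterogeneousProfileFlags& f) {
        if (!f.asps_vpcc_extension_present_flag) return true;  // constraint not triggered
        return f.asps_heterogeneous_miv_extension_present_flag &&
               f.afps_heterogeneous_miv_extension_present_flag &&
               f.afps_heterogeneous_type_extension_present_flag;
    }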
  • ASPSCommonByteString(stringByte,posByte) function is defined as follows:
  • vui_display_box_origin[d] specifies the offset along axis d relative to the origin of the coordinate system. When an element of vui_display_box_origin[d] is not present, its value shall be inferred to be equal to 0. If afps_heterogeneous_tile_type of the current mosaic strip is equal to 0, the number of bits used to represent vui_display_box_origin[d] is asps_geometry_3d_bit_depth_minus1 + 1.
  • If afps_heterogeneous_tile_type of the current mosaic strip is equal to 1, the number of bits used to represent vui_display_box_origin[d] is asps_geometry_3d_bit_depth_minus1_for_miv + 1. Values of d equal to 0, 1 and 2 correspond to the X, Y and Z axes, respectively.
  • vui_display_box_size[d] specifies the size of the display box sampled along axis d. When an element of vui_display_box_size[d] is not present, its value is unknown. If afps_heterogeneous_tile_type of the current mosaic strip is equal to 0, the number of bits used to represent vui_display_box_size[d] is asps_geometry_3d_bit_depth_minus1 + 1. If afps_heterogeneous_tile_type of the current mosaic strip is equal to 1, the number of bits used to represent vui_display_box_size[d] is asps_geometry_3d_bit_depth_minus1_for_miv + 1.
  • vui_anchor_point_present_flag equal to 1 indicates that the vui_anchor_point[d] syntax element is present in the vui_parameters() syntax structure; vui_anchor_point_present_flag equal to 0 indicates that it is not present.
  • vui_anchor_point[d] specifies the position of the anchor point along axis d. If afps_heterogeneous_tile_type of the current mosaic strip is equal to 0, the value of vui_anchor_point[d] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1. If vui_anchor_point[d] is not present, it shall be inferred to be equal to 0. The number of bits used to represent vui_anchor_point[d] is asps_geometry_3d_bit_depth_minus1 + 1.
  • If afps_heterogeneous_tile_type of the current mosaic strip is equal to 1, the value of vui_anchor_point[d] shall be in the range of 0 to 2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) - 1. If vui_anchor_point[d] is not present, it shall be inferred to be equal to 0. The number of bits used to represent vui_anchor_point[d] is asps_geometry_3d_bit_depth_minus1_for_miv + 1. Values of d equal to 0, 1 and 2 correspond to the X, Y and Z axes, respectively.
  • Multi-viewpoint standard available for use with ISO/IEC 23090-12
  • This process expands the integer depth values of the mosaic into floating point depth values in scene coordinates (e.g., meters).
  • Integer depth values may be scaled to an implementation-defined bit depth and range 0..maxSampleD; otherwise, maxSampleD is set to 2^(asps_geometry_2d_bit_depth_minus1 + 1) - 1.
  • This process decodes the reconstructed volume frames and reconstructs the MPI frames from the bitstream where ptc_restricted_geometry_flag is equal to 1.
  • Inputs to this process include:
  • variable atlasID which is the mosaic ID
  • AspsFrameHeight[atlasID] and AspsFrameWidth[atlasID] represent the number of rows and columns of the mosaic frame, respectively;
  • the size of the 3D array texFrame is 3 × AspsFrameHeight[atlasID] × AspsFrameWidth[atlasID];
  • the variable maxDepthSampleValue indicates the maximum value of the coded geometry sample and is set to 2^(asps_geometry_3d_bit_depth_minus1 + 1) - 1; as a special case, if asps_heterogeneous_miv_extension_present_flag is equal to 1, maxDepthSampleValue is set to 2^(asps_geometry_3d_bit_depth_minus1_for_miv + 1) - 1.
  • maxNbLayers indicates the maximum number of depth layers of MPI, which is set to maxDepthSampleValue+1.
  • each strip is a collection of sub-block images of multi-viewpoints or a collection of sub-block images of point clouds.
  • the existing standard only allows one type of strip to exist in a spliced image. Therefore, the relevant standards need to be extended to distinguish whether multi-view type strips and point cloud type strips exist in a spliced image at the same time.
  • decoding case four parses the heterogeneity-related ASPS and AFPS syntax elements added for multi-view in the ASPS and AFPS, while decoding case three packages these related syntax elements in a new parameter set for parsing.
  • decoding case 4 uses a newly added syntax element afps_heterogeneous_type_extension_present_flag to indicate whether the slice type needs to be determined for each slice.
  • the embodiment of the present application is used to implement the coding and decoding scheme of multi-viewpoint mosaics, point cloud mosaics, and heterogeneous mixed mosaics in the code stream, and expands the relevant standards. It has the following advantages: 1) For application scenarios composed of data in different formats, this method can be used to provide real-time immersive video interaction services for data in different formats (such as 3D grids, 3D point clouds, multi-view images, etc.), promoting the development of VR/AR/MR industries; 2) Compared with encoding the multi-viewpoint video images and point cloud format data separately and calling their respective decoders to independently decode the multiple signals, the number of decoders to be called is small, the processing pixel rate of the decoders is fully utilized, and the hardware requirements are reduced; 3) The rendering advantages of data from different formats (point clouds, etc.) are retained to improve the synthesis quality of the image; 4) The reconstruction quality and coding performance of heterogeneous data are further improved.
  • FIG11 is a schematic block diagram of an encoding device provided by an embodiment of the present application.
  • the encoding device 110 is applied to an encoder. As shown in FIG11 , the encoding device 110 includes:
  • the processing unit 1101 is configured to process the visual media content in at least two expression formats to obtain at least two isomorphic blocks;
  • the splicing unit 1102 is configured to splice the at least two isomorphic blocks to obtain a splicing graph and splicing graph information, wherein when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph includes at least two isomorphic blocks, and different isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information;
  • the encoding unit 1103 is configured to encode the splicing graph and the splicing graph information to obtain a code stream.
  • the spliced graph information includes a first syntax element, and it is determined according to the first syntax element whether the spliced graph is a heterogeneous mixed spliced graph or a homogeneous spliced graph.
  • the first syntax element includes a first sub-syntax element and a second sub-syntax element; and determining, according to the first syntax element, that the spliced graph is a heterogeneous mixed spliced graph or a homogeneous spliced graph includes:
  • when the value of the first sub-syntax element is equal to the value of the second sub-syntax element, it is determined according to that value whether the spliced graph is a heterogeneous mixed spliced graph or a homogeneous spliced graph.
  • determining whether the spliced image is a heterogeneous mixed spliced image or a homogeneous spliced image according to the value includes: if the value is a first preset value, determining that the spliced image is a heterogeneous mixed spliced image; if the value is a second preset value, determining that the spliced image is a homogeneous spliced image.
  • determining, according to the value, whether the spliced graph is a heterogeneous mixed spliced graph or a homogeneous spliced graph includes (see the sketch after this list):
  • if the value is a third preset value, determining that the spliced graph is a heterogeneous mixed spliced graph including isomorphic blocks in a first expression format and a second expression format, wherein the first expression format and the second expression format are different expression formats;
  • if the value is a fourth preset value, determining that the spliced graph is an isomorphic spliced graph including isomorphic blocks in the first expression format;
  • if the value is a fifth preset value, determining that the spliced graph is an isomorphic spliced graph including isomorphic blocks in the second expression format.
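A sketch of the value-based determination above; the concrete preset values are illustrative placeholders, since the text leaves them open:

    enum class MosaicType {
        HeterogeneousMixed,      // third preset value
        IsomorphicFirstFormat,   // fourth preset value
        IsomorphicSecondFormat   // fifth preset value
    };

    MosaicType classifyMosaic(int firstSyntaxElementValue) {
        constexpr int kThirdPreset = 2;   // assumed placeholder values,
        constexpr int kFourthPreset = 0;  // not normative
        constexpr int kFifthPreset = 1;
        switch (firstSyntaxElementValue) {
            case kThirdPreset:  return MosaicType::HeterogeneousMixed;
            case kFourthPreset: return MosaicType::IsomorphicFirstFormat;
            case kFifthPreset:  return MosaicType::IsomorphicSecondFormat;
            default:            return MosaicType::IsomorphicSecondFormat;  // fallback
        }
    }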
  • the first sub-syntax element is a syntax element of a mosaic image sequence parameter set ASPS
  • the second sub-syntax element is a syntax element of a mosaic image frame parameter set AFPS.
  • the method includes: parsing a first sub-syntax element in ASPS; determining whether the spliced graph is a heterogeneous mixed spliced graph or a homogeneous spliced graph according to a value of the first sub-syntax element;
  • the second sub-syntax element is parsed in AFPS; and the splicing graph is determined to be a heterogeneous mixed splicing graph or a homogeneous splicing graph according to the value of the second sub-syntax element.
  • when the splicing graph information does not include the first syntax element, the splicing graph is determined to be a homogeneous splicing graph.
  • when the spliced graph is a heterogeneous mixed spliced graph, the spliced graph information further includes a second syntax element, and the expression format of the homogeneous blocks in the spliced graph is determined according to the second syntax element.
  • determining the expression format of the isomorphic blocks in the splicing graph according to the second grammatical element includes: when the value of the second grammatical element of the i-th block is a sixth preset value, determining that the expression format of the i-th block is the first expression format; when the value of the second grammatical element of the i-th block is a seventh preset value, determining that the expression format of the i-th block is the second expression format.
  • the second syntax element includes: a third sub-syntax element and a fourth sub-syntax element;
  • the method includes: parsing a first sub-syntax element in ASPS; determining whether the spliced graph is a heterogeneous mixed spliced graph or a homogeneous spliced graph according to the value of the first sub-syntax element; parsing a second sub-syntax element and a third sub-syntax element in AFPS; determining whether the spliced graph is a heterogeneous mixed spliced graph or a homogeneous spliced graph according to the value of the second sub-syntax element; when the spliced graph is determined to be a heterogeneous mixed spliced graph according to the value of the third sub-syntax element, determining the expression format of the homogeneous blocks in the spliced graph according to the fourth sub-syntax element.
  • the fourth sub-syntax element of each homogeneous block is parsed, and the expression format of each homogeneous block is determined according to the value of the fourth sub-syntax element.
  • when the mosaic graph is a heterogeneous mixed mosaic graph, the mosaic graph information includes at least two types of isomorphic block information, wherein isomorphic blocks in different expression formats correspond to different isomorphic block information.
  • the isomorphic block information includes syntax elements of ASPS and syntax elements of AFPS; different isomorphic block information corresponds to different syntax elements of ASPS and syntax elements of AFPS.
  • when the isomorphic block is a multi-view video block, it corresponds to first isomorphic block information; when the isomorphic block is a point cloud block, it corresponds to second isomorphic block information. The first isomorphic block information and the second isomorphic block information include shared syntax elements of the ASPS parameter set and syntax elements of the AFPS parameter set; the first isomorphic block information also includes extended syntax elements of the ASPS parameter set and extended syntax elements of the AFPS parameter set.
  • Alternatively, when the isomorphic block is a multi-view video block, it corresponds to first isomorphic block information, and when the isomorphic block is a point cloud block, it corresponds to second isomorphic block information, where the first isomorphic block information and the second isomorphic block information include shared syntax elements of the ASPS parameter set and syntax elements of the AFPS parameter set, and the first isomorphic block information also includes a first extended syntax element of the ASPS parameter set, which is used to represent the bit depth of the geometry coordinates of the reconstructed geometric content.
  • the parameter set sub-codestream of the codestream includes a third syntax element, and the codestream corresponding to visual media content in at least one expression format is determined according to the third syntax element.
  • determining, according to the third syntax element, the codestream corresponding to visual media content in at least one expression format includes (see the sketch after this list): if the third syntax element is a first numerical value, determining that the codestream simultaneously includes the codestream corresponding to visual media content in the first expression format and the codestream corresponding to visual media content in the second expression format; if the third syntax element is a second numerical value, determining that the codestream includes the codestream corresponding to visual media content in the first expression format; if the third syntax element is a third numerical value, determining that the codestream includes the codestream corresponding to visual media content in the second expression format.
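A sketch of the third-syntax-element interpretation above; again the numeric codes are placeholders, since the text does not fix them:

    enum class StreamContent { BothFormats, FirstFormatOnly, SecondFormatOnly };

    StreamContent classifyStream(int thirdSyntaxElement) {
        constexpr int kFirstValue = 3;   // assumed placeholder codes,
        constexpr int kSecondValue = 1;  // not normative
        if (thirdSyntaxElement == kFirstValue)  return StreamContent::BothFormats;
        if (thirdSyntaxElement == kSecondValue) return StreamContent::FirstFormatOnly;
        return StreamContent::SecondFormatOnly;  // third numerical value
    }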
  • the encoding unit 1103 is configured to encode the splicing graph to obtain a video compression sub-stream; encode the splicing graph information to obtain a splicing graph information sub-stream; and synthesize the video compression sub-stream and the splicing graph information sub-stream into the stream.
  • the representation format is a multi-view video, a point cloud, or a mesh.
  • the heterogeneous mixed mosaic graph is at least one of the following: a single-attribute heterogeneous mixed mosaic graph and a multi-attribute heterogeneous mixed mosaic graph;
  • the homogeneous mosaic graph includes at least one of the following: a single-attribute homogeneous mosaic graph and a multi-attribute homogeneous mosaic graph.
  • FIG12 is a schematic block diagram of a decoding device provided by an embodiment of the present application.
  • the decoding device 120 is applied to a decoder. As shown in FIG12 , the decoding device 120 includes:
  • a decoding unit 1201 is configured to decode the bitstream to obtain a splicing graph and splicing graph information
  • the splitting unit 1202 is configured to obtain at least two types of isomorphic blocks and isomorphic block information according to the mosaic and the mosaic information when the mosaic is a heterogeneous mixed mosaic; wherein different types of isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information;
  • the splitting unit 1202 is further configured to obtain one type of isomorphic block and isomorphic block information according to the splicing graph and the splicing graph information when the splicing graph is an isomorphic splicing graph;
  • the processing unit 1203 is configured to obtain visual media content in at least two expression formats according to the isomorphic blocks and the isomorphic block information.
  • the splicing graph information includes a first syntax element, and it is determined according to the first syntax element whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph.
  • the first syntax element includes a first sub-syntax element and a second sub-syntax element; determining, according to the first syntax element, whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph includes: if the value of the first sub-syntax element is equal to the value of the second sub-syntax element, determining according to that value whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph (see the parsing sketch following this list).
  • determining, according to the value, whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph includes: if the value is a first preset value, determining that the splicing graph is a heterogeneous mixed splicing graph; if the value is a second preset value, determining that the splicing graph is an isomorphic splicing graph.
  • determining, according to the value, whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph includes: if the value is a third preset value, determining that the splicing graph is a heterogeneous mixed splicing graph including isomorphic blocks in a first expression format and a second expression format, wherein the first expression format and the second expression format are different expression formats; if the value is a fourth preset value, determining that the splicing graph is an isomorphic splicing graph including isomorphic blocks in the first expression format; if the value is a fifth preset value, determining that the splicing graph is an isomorphic splicing graph including isomorphic blocks in the second expression format.
  • the first sub-syntax element is a syntax element of the atlas sequence parameter set (ASPS);
  • the second sub-syntax element is a syntax element of the atlas frame parameter set (AFPS).
  • when the splicing graph information does not include the first syntax element, the splicing graph is determined to be an isomorphic splicing graph.
  • when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph information further includes a second syntax element, and the expression format of the isomorphic blocks in the splicing graph is determined according to the second syntax element.
  • determining the expression format of the isomorphic blocks in the splicing graph according to the second syntax element includes: when the value of the second syntax element of the i-th block is a sixth preset value, determining that the expression format of the i-th block is the first expression format; when the value of the second syntax element of the i-th block is a seventh preset value, determining that the expression format of the i-th block is the second expression format.
  • the second syntax element includes a third sub-syntax element and a fourth sub-syntax element;
  • the method includes: parsing the first sub-syntax element in the ASPS; determining, according to the value of the first sub-syntax element, whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; parsing the second sub-syntax element and the third sub-syntax element in the AFPS; determining, according to the value of the second sub-syntax element, whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; and, when it is determined according to the value of the third sub-syntax element that the splicing graph is a heterogeneous mixed splicing graph, determining the expression format of the isomorphic blocks in the splicing graph according to the fourth sub-syntax element.
  • when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph information includes at least two types of isomorphic block information, wherein isomorphic blocks in different expression formats correspond to different isomorphic block information.
  • when the splicing graph is a heterogeneous mixed splicing graph, obtaining at least two types of isomorphic blocks and isomorphic block information according to the splicing graph and the splicing graph information includes: splitting the splicing graph into at least two types of isomorphic blocks; and obtaining, from the splicing graph information and according to the expression formats of the at least two types of isomorphic blocks, the isomorphic block information corresponding to the at least two types of isomorphic blocks.
  • the isomorphic block information includes ASPS syntax elements and AFPS syntax elements; different isomorphic block information corresponds to different ASPS syntax elements and AFPS syntax elements.
  • when the isomorphic block is a multi-view video block it corresponds to first isomorphic block information, and when the isomorphic block is a point cloud block it corresponds to second isomorphic block information; the first isomorphic block information and the second isomorphic block information include shared syntax elements of the ASPS and of the AFPS.
  • the first isomorphic block information further includes extended syntax elements of the ASPS and extended syntax elements of the AFPS.
  • the first isomorphic block information further includes a first extended syntax element of the ASPS, which is used to indicate the bit depth of the geometry coordinates of the reconstructed geometry content.
  • when the isomorphic block is a multi-view video block it corresponds to first isomorphic block information, and when the isomorphic block is a point cloud block it corresponds to second isomorphic block information; the first isomorphic block information and the second isomorphic block information include shared syntax elements of the ASPS and of the AFPS; the first isomorphic block information further includes a first extended syntax element of the ASPS, which is used to represent the bit depth of the geometry coordinates of the reconstructed geometry content.
  • the parameter set sub-stream of the bitstream includes a third syntax element, and the sub-streams corresponding to visual media content in at least one expression format contained in the bitstream are determined according to the third syntax element (see the sketch following this list).
  • determining, according to the third syntax element, the sub-streams corresponding to visual media content in at least one expression format contained in the bitstream includes: when the third syntax element is a first value, determining that the bitstream contains both the sub-stream corresponding to visual media content in the first expression format and the sub-stream corresponding to visual media content in the second expression format; when the third syntax element is a second value, determining that the bitstream contains the sub-stream corresponding to visual media content in the first expression format; when the third syntax element is a third value, determining that the bitstream contains the sub-stream corresponding to visual media content in the second expression format.
  • decoding the bitstream to obtain the splicing graph and the splicing graph information includes: determining, according to the second syntax element, the sub-streams corresponding to visual media content in at least two expression formats in the bitstream, and decoding them to obtain the heterogeneous mixed splicing graph and the splicing graph information.
  • the decoding unit 1201 is configured to decode the video compression sub-stream to obtain the splicing graph; and decode the splicing graph information sub-stream to obtain the splicing graph information.
  • the expression format is multi-view video, a point cloud, or a mesh.
  • the heterogeneous mixed splicing graph is at least one of the following: a single-attribute heterogeneous mixed splicing graph and a multi-attribute heterogeneous mixed splicing graph;
  • the isomorphic splicing graph includes at least one of the following: a single-attribute isomorphic splicing graph and a multi-attribute isomorphic splicing graph.
  • the functional units can be implemented in hardware, by instructions in software form, or by a combination of hardware and software units.
  • the steps of the method embodiments in the embodiments of the present application can be completed by integrated logic circuits of hardware in the processor and/or instructions in software form; the steps of the methods disclosed in the embodiments of the present application can be directly performed by a hardware decoding processor, or performed by a combination of the hardware and software units in a decoding processor.
  • the software unit can be located in a storage medium that is mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.
  • FIG13 is a schematic block diagram of an encoder provided by an embodiment of the present application. As shown in FIG13 , the encoder 1310 includes:
  • FIG14 is a schematic block diagram of a decoder provided in an embodiment of the present application.
  • a decoder 1410 includes:
  • the processor may include but is not limited to:
  • a digital signal processor (DSP)
  • an application-specific integrated circuit (ASIC)
  • a field programmable gate array (FPGA)
  • the memory includes but is not limited to:
  • Non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM) or flash memory.
  • the volatile memory can be random access memory (RAM), which is used as an external cache.
  • random access memory (RAM)
  • static RAM (SRAM)
  • dynamic RAM (DRAM)
  • synchronous DRAM (SDRAM)
  • double data rate synchronous dynamic random access memory (DDR SDRAM)
  • enhanced synchronous dynamic random access memory (ESDRAM)
  • synchronous link DRAM (SLDRAM)
  • direct Rambus RAM (DR RAM)
  • each functional module in this embodiment can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or software functional modules.
  • Figure 15 shows a schematic diagram of the composition structure of a coding and decoding system provided in an embodiment of the present application.
  • the coding and decoding system 150 may include an encoder 1501 and a decoder 1502.
  • the encoder 1501 may be a device integrated with the encoding device described in the above embodiment;
  • the decoder 1502 may be a device integrated with the decoding device described in the above embodiment.
  • both the encoder 1501 and the decoder 1502 can use the color component information of adjacent reference pixels and of the pixels to be predicted to calculate the weighting coefficients corresponding to the pixels to be predicted, and different reference pixels can have different weighting coefficients. Applying these weighting coefficients to the chroma prediction of the pixels to be predicted in the current block can not only improve the accuracy of the chroma prediction and save bit rate, but also improve the encoding and decoding performance.
  • the embodiment of the present application also provides a chip for implementing the above encoding and decoding method.
  • the chip includes: a processor for calling and running a computer program from a memory, so that an electronic device equipped with the chip executes the above encoding and decoding method.
  • the embodiment of the present application also provides a computer storage medium, in which a computer program is stored, and when the computer program is executed by the second processor, the encoding method of the encoder is implemented; or when the computer program is executed by the first processor, the decoding method of the decoder is implemented.
  • the embodiment of the present application also provides a computer program product containing instructions, and when the instructions are executed by the computer, the computer executes the method of the above method embodiment.
  • the present application also provides a bitstream, which is generated according to the above encoding method.
  • the bitstream includes the above first syntax element, or includes the second syntax element and the third syntax element.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions can be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave).
  • the computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media.
  • the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a digital video disc (digital video disc, DVD)), or a semiconductor medium (e.g., a solid state drive (solid state disk, SSD)), etc.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of units is only a division by logical function.
  • another point is that the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical or in other forms.
  • each functional unit in each embodiment of the present application may be integrated into a processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • although the terms first, second, third, etc. may be used in this application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other and do not necessarily describe a specific order or sequence.
  • first information may also be referred to as second information;
  • second information may also be referred to as first information;
  • the second information may appear before, after, or at the same time as the first information.
  • the present application provides a coding and decoding method, a device, an encoder, a decoder and a storage medium.
  • isomorphic blocks of different expression formats are spliced into one heterogeneous mixed splicing graph;
  • splicing isomorphic blocks of different expression formats into one heterogeneous mixed splicing graph for coding and decoding can reduce the number of encoders and decoders that need to be invoked, reduce the implementation cost, and improve ease of use.
  • certain high-level parameters of blocks of different expression formats may be unequal, so that the heterogeneous data are given more suitable high-level parameters, which can effectively improve coding efficiency, that is, reduce the bit rate or improve the quality of the reconstructed multi-view video or point cloud video.
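As a hedged illustration only, the following minimal Python sketch pulls together the decision logic described by the items above. The function names are hypothetical; the concrete values (ptl_profile_toolset_idc of 0/1, 64/65/66 and 128..133, and the 0/1 flag values) are the examples quoted in the description below, not normative definitions.

```python
def stream_formats(ptl_profile_toolset_idc: int) -> set:
    # Third syntax element: which expression formats the bitstream carries (example values).
    if 128 <= ptl_profile_toolset_idc <= 133:
        return {"point_cloud", "multi_view"}   # first value: both formats present
    if ptl_profile_toolset_idc in (0, 1):
        return {"point_cloud"}                 # second value: point cloud only
    if ptl_profile_toolset_idc in (64, 65, 66):
        return {"multi_view"}                  # third value: multi-view only
    raise ValueError("reserved toolset idc value")

def atlas_is_heterogeneous(asps_flag, afps_flag) -> bool:
    # First syntax element = (first, second) sub-syntax elements; an absent flag is inferred to 0.
    a = 0 if asps_flag is None else asps_flag
    b = 0 if afps_flag is None else afps_flag
    if a != b:
        raise ValueError("bitstream must keep the ASPS and AFPS flags consistent")
    return a == 1  # first preset value 1: heterogeneous mixed; second preset value 0: isomorphic

def tile_format(miv_tile_flag: int) -> str:
    # Second syntax element per block i: 0 -> point cloud, 1 -> multi-view (example mapping).
    return "multi_view" if miv_tile_flag == 1 else "point_cloud"
```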

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application provides a coding and decoding method, a device, an encoder, a decoder and a storage medium. For application scenarios involving visual media content in one or more expression formats, isomorphic blocks of different expression formats are spliced into one heterogeneous mixed splicing graph, isomorphic blocks of the same expression format are spliced into one isomorphic splicing graph, the resulting splicing graphs and splicing graph information are written into the bitstream, and, for a heterogeneous mixed splicing graph, isomorphic blocks of different expression formats are allowed to correspond to different information. In this way, data of different expression formats are encoded jointly, which reduces the number of encoders and decoders that need to be invoked, lowers the implementation cost, and improves ease of use. Moreover, in a heterogeneous mixed splicing graph, certain high-level parameters of blocks of different expression formats may be unequal, so that the heterogeneous data are given more suitable high-level parameters, which effectively improves coding efficiency, that is, reduces the bit rate or improves the quality of the reconstructed multi-view video or point cloud video.

Description

Coding and decoding method, device, encoder, decoder and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is filed on the basis of the earlier PCT international application No. PCT/CN2022/125525, filed on October 14, 2022 and entitled "Coding and decoding method, device, encoder, decoder and storage medium", and claims priority to that earlier PCT international application, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application relates to the field of image processing technology, and in particular to a coding and decoding method, a device, an encoder, a decoder and a storage medium.
BACKGROUND
In three-dimensional application scenarios, such as virtual reality (VR), augmented reality (AR) and mixed reality (MR), visual media objects with different expression formats may appear in the same scene. For example, in the same three-dimensional scene, the scene background and some characters and objects are expressed by video, while another part of the characters is expressed by a three-dimensional point cloud or a three-dimensional mesh.
Using multi-view video coding, point cloud coding and mesh coding respectively during compression coding preserves the effective information of the original expression formats better than projecting everything into multi-view video for coding, improves the quality of the viewport rendered during viewing, and improves the overall rate-quality efficiency.
However, current coding and decoding technology encodes and decodes multi-view video, point clouds and meshes separately, which requires invoking a large number of codecs and makes coding and decoding costly.
SUMMARY
The embodiments of the present application provide a coding and decoding method, a device, an encoder, a decoder and a storage medium.
In a first aspect, the present application provides a decoding method, including: decoding a bitstream to obtain a splicing graph and splicing graph information; when the splicing graph is a heterogeneous mixed splicing graph, obtaining at least two types of isomorphic blocks and isomorphic block information according to the splicing graph and the splicing graph information, wherein different types of isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information; when the splicing graph is an isomorphic splicing graph, obtaining one type of isomorphic block and isomorphic block information according to the splicing graph and the splicing graph information; and obtaining visual media content in at least two expression formats according to the isomorphic blocks and the isomorphic block information.
In a second aspect, the present application provides an encoding method, including: processing visual media content in at least two expression formats to obtain at least two types of isomorphic blocks; splicing the at least two types of isomorphic blocks to obtain a splicing graph and splicing graph information, wherein, when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph includes at least two types of isomorphic blocks, and different types of isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information; and encoding the splicing graph and the splicing graph information to obtain a bitstream.
In a third aspect, the present application provides a decoding device, including:
a decoding unit configured to decode a bitstream to obtain a splicing graph and splicing graph information;
a splitting unit configured to, when the splicing graph is a heterogeneous mixed splicing graph, obtain at least two types of isomorphic blocks and isomorphic block information according to the splicing graph and the splicing graph information, wherein different types of isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information;
the splitting unit being further configured to, when the splicing graph is an isomorphic splicing graph, obtain one type of isomorphic block and isomorphic block information according to the splicing graph and the splicing graph information;
and a processing unit configured to obtain visual media content in at least two expression formats according to the isomorphic blocks and the isomorphic block information.
In a fourth aspect, the present application provides an encoding device applied to an encoder, including:
a processing unit configured to process visual media content in at least two expression formats to obtain at least two types of isomorphic blocks;
a splicing unit configured to splice the at least two types of isomorphic blocks to obtain a splicing graph and splicing graph information, wherein, when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph includes at least two types of isomorphic blocks, and different types of isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information;
and an encoding unit configured to encode the splicing graph and the splicing graph information to obtain a bitstream.
In a fifth aspect, a decoder is provided, including a first memory and a first processor; the first memory stores a computer program executable on the first processor to perform the method of the first aspect or its implementations.
In a sixth aspect, an encoder is provided, including a second memory and a second processor; the second memory stores a computer program executable on the second processor to perform the method of the second aspect or its implementations.
In a seventh aspect, a coding and decoding system is provided, including an encoder and a decoder. The encoder is configured to perform the method of the second aspect or its implementations, and the decoder is configured to perform the method of the first aspect or its implementations.
In an eighth aspect, a chip is provided for implementing the method of any one of the first to second aspects or their implementations. Specifically, the chip includes a processor configured to call and run a computer program from a memory, so that a device equipped with the chip performs the method of any one of the first to second aspects or their implementations.
In a ninth aspect, a computer-readable storage medium is provided for storing a computer program, the computer program causing a computer to perform the method of any one of the first to second aspects or their implementations.
In a tenth aspect, a computer program product is provided, including computer program instructions, the computer program instructions causing a computer to perform the method of any one of the first to second aspects or their implementations.
In an eleventh aspect, a computer program is provided which, when run on a computer, causes the computer to perform the method of any one of the first to second aspects or their implementations.
In a twelfth aspect, a bitstream is provided, the bitstream being generated based on the encoding method of the second aspect.
Based on the above technical solutions, for application scenarios involving visual media content in one or more expression formats, isomorphic blocks of different expression formats are spliced into one heterogeneous mixed splicing graph, and data of different expression formats are encoded jointly, which reduces the number of encoders and decoders that need to be invoked, lowers the implementation cost, and improves ease of use. Moreover, in a heterogeneous mixed splicing graph, certain high-level parameters of blocks of different expression formats may be unequal, so that the heterogeneous data are given more suitable high-level parameters, which effectively improves coding efficiency, that is, reduces the bit rate or improves the quality of the reconstructed multi-view video or point cloud video.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic block diagram of a video coding and decoding system involved in an embodiment of the present application;
FIG. 2A is a schematic block diagram of a video encoder involved in an embodiment of the present application;
FIG. 2B is a schematic block diagram of a video decoder involved in an embodiment of the present application;
FIG. 3A is a framework diagram of the organization and expression of multi-view video data;
FIG. 3B is a schematic diagram of spliced-image generation for multi-view video data;
FIG. 3C is a framework diagram of the organization and expression of point cloud data;
FIG. 3D to FIG. 3F are schematic diagrams of different types of point cloud data;
FIG. 4 is a schematic diagram of the encoding of multi-view video;
FIG. 5 is a schematic diagram of the decoding of multi-view video;
FIG. 6 is a schematic flowchart of an encoding method provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a heterogeneous mixed splicing graph provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of an isomorphic splicing graph provided by an embodiment of the present application;
FIG. 9 is a schematic flowchart of a decoding method provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a V3C bitstream structure provided by an embodiment of the present application;
FIG. 11 is a schematic block diagram of an encoding device provided by an embodiment of the present application;
FIG. 12 is a schematic block diagram of a decoding device provided by an embodiment of the present application;
FIG. 13 is a schematic block diagram of an encoder provided by an embodiment of the present application;
FIG. 14 is a schematic block diagram of a decoder provided by an embodiment of the present application;
FIG. 15 is a schematic diagram of the composition structure of a coding and decoding system provided by an embodiment of the present application.
DETAILED DESCRIPTION
The present application can be applied to the fields of image coding and decoding, video coding and decoding, hardware video coding and decoding, dedicated-circuit video coding and decoding, real-time video coding and decoding, and the like. For example, the solutions of the present application may be combined with the audio video coding standard (AVS), for example the H.264/audio video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard and the H.266/versatile video coding (VVC) standard. Alternatively, the solutions of the present application may operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including the scalable video coding (SVC) and multi-view video coding (MVC) extensions. It should be understood that the techniques of the present application are not limited to any particular coding standard or technique.
Along its task line, a high-degree-of-freedom immersive coding system can be roughly divided into the following stages: data acquisition, data organization and expression, data encoding and compression, data decoding and reconstruction, and data synthesis and rendering, finally presenting the target data to the user.
The coding involved in the embodiments of the present application is mainly video coding and decoding. For ease of understanding, the video coding and decoding system involved in the embodiments of the present application is first introduced with reference to FIG. 1.
FIG. 1 is a schematic block diagram of a video coding and decoding system involved in an embodiment of the present application. It should be noted that FIG. 1 is only an example, and the video coding and decoding system of the embodiments of the present application includes but is not limited to what is shown in FIG. 1. As shown in FIG. 1, the video coding and decoding system 100 includes an encoding device 110 and a decoding device 120. The encoding device is configured to encode (which can be understood as compressing) video data to generate a bitstream and transmit the bitstream to the decoding device. The decoding device decodes the bitstream generated by the encoding device to obtain decoded video data.
The encoding device 110 of the embodiments of the present application can be understood as a device with a video encoding function, and the decoding device 120 can be understood as a device with a video decoding function; that is, the encoding device 110 and the decoding device 120 cover a wide range of devices, including, for example, smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, and the like.
In some embodiments, the encoding device 110 may transmit the encoded video data (e.g., the bitstream) to the decoding device 120 via a channel 130. The channel 130 may include one or more media and/or devices capable of transmitting the encoded video data from the encoding device 110 to the decoding device 120.
In one example, the channel 130 includes one or more communication media that enable the encoding device 110 to transmit the encoded video data directly to the decoding device 120 in real time. In this example, the encoding device 110 may modulate the encoded video data according to a communication standard and transmit the modulated video data to the decoding device 120. The communication media include wireless communication media, such as the radio frequency spectrum; optionally, the communication media may also include wired communication media, such as one or more physical transmission lines.
In another example, the channel 130 includes a storage medium that can store the video data encoded by the encoding device 110. The storage media include a variety of locally accessible data storage media, such as optical discs, DVDs, flash memory, and the like. In this example, the decoding device 120 may obtain the encoded video data from the storage medium.
In another example, the channel 130 may include a storage server that can store the video data encoded by the encoding device 110. In this example, the decoding device 120 may download the stored encoded video data from the storage server. Optionally, the storage server may store the encoded video data and may transmit the encoded video data to the decoding device 120; it may be, for example, a web server (e.g., for a website), a file transfer protocol (FTP) server, or the like.
In some embodiments, the encoding device 110 includes a video encoder 112 and an output interface 113. The output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
In some embodiments, the encoding device 110 may further include a video source 111 in addition to the video encoder 112 and the output interface 113.
The video source 111 may include at least one of a video capture device (e.g., a video camera), a video archive, a video input interface and a computer graphics system, where the video input interface is configured to receive video data from a video content provider and the computer graphics system is configured to generate video data.
The video encoder 112 encodes the video data from the video source 111 to generate a bitstream. The video data may include one or more pictures or a sequence of pictures. The bitstream contains the coding information of the picture or picture sequence in the form of a bit stream. The coding information may include coded picture data and associated data. The associated data may include a sequence parameter set (SPS), a picture parameter set (PPS) and other syntax structures. The SPS may contain parameters applied to one or more sequences. The PPS may contain parameters applied to one or more pictures. A syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the bitstream.
The video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113. The encoded video data may also be stored on a storage medium or a storage server for subsequent reading by the decoding device 120.
In some embodiments, the decoding device 120 includes an input interface 121 and a video decoder 122. In some embodiments, the decoding device 120 may further include a display device 123 in addition to the input interface 121 and the video decoder 122.
The input interface 121 includes a receiver and/or a modem. The input interface 121 may receive the encoded video data through the channel 130.
The video decoder 122 is configured to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123.
The display device 123 displays the decoded video data. The display device 123 may be integrated with the decoding device 120 or external to the decoding device 120. The display device 123 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
In addition, FIG. 1 is only an example, and the technical solutions of the embodiments of the present application are not limited to FIG. 1; for example, the techniques of the present application can also be applied to one-sided video encoding or one-sided video decoding.
The video coding framework involved in the embodiments of the present application is introduced below.
FIG. 2A is a schematic block diagram of a video encoder involved in an embodiment of the present application. It should be understood that the video encoder 200 can be used for lossy compression of pictures or for lossless compression of pictures. The lossless compression may be visually lossless compression or mathematically lossless compression.
The video encoder 200 can be applied to picture data in the luminance-chrominance (YCbCr, YUV) format. For example, the YUV ratio can be 4:2:0, 4:2:2 or 4:4:4, where Y denotes luminance (Luma), Cb (U) denotes blue chrominance, Cr (V) denotes red chrominance, and U and V together denote chrominance (Chroma), which describes color and saturation. For example, in terms of color format, 4:2:0 means that every 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr), 4:2:2 means that every 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr), and 4:4:4 means full-pixel display (YYYYCbCrCbCrCbCrCbCr).
For example, the video encoder 200 reads video data and, for each picture in the video data, partitions the picture into several coding tree units (CTUs); in some examples, a CTB may be called a "tree block", a "largest coding unit" (LCU) or a "coding tree block" (CTB). Each CTU may be associated with a pixel block of equal size within the picture. Each pixel may correspond to one luminance (luma) sample and two chrominance (chroma) samples; thus, each CTU may be associated with one luminance sample block and two chrominance sample blocks. The size of a CTU is, for example, 128×128, 64×64, 32×32, and so on. A CTU can be further partitioned into several coding units (CUs) for coding, and a CU can be a rectangular or a square block. A CU can be further divided into a prediction unit (PU) and a transform unit (TU), so that coding, prediction and transform are separated and processing is more flexible. In one example, a CTU is partitioned into CUs in a quadtree manner, and a CU is partitioned into TUs and PUs in a quadtree manner.
The video encoder and video decoder can support various PU sizes. Assuming that the size of a particular CU is 2N×2N, the video encoder and video decoder may support a PU size of 2N×2N or N×N for intra prediction, and support symmetric PUs of 2N×2N, 2N×N, N×2N, N×N or similar sizes for inter prediction. The video encoder and video decoder may also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N and nR×2N for inter prediction.
In some embodiments, as shown in FIG. 2A, the video encoder 200 may include a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, an in-loop filtering unit 260, a decoded picture buffer 270 and an entropy coding unit 280. It should be noted that the video encoder 200 may include more, fewer or different functional components.
Optionally, in the present application, the current block may be called the current coding unit (CU) or the current prediction unit (PU), etc. A prediction block may also be called a predicted picture block or a picture prediction block, and a reconstructed picture block may also be called a reconstruction block or a picture reconstructed block.
In some embodiments, the prediction unit 210 includes an inter prediction unit 211 and an intra estimation unit 212. Because there is a strong correlation between adjacent pixels within one frame of a video, intra prediction is used in video coding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Because there is a strong similarity between adjacent frames in a video, inter prediction is used to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
The inter prediction unit 211 can be used for inter prediction. Inter prediction may include motion estimation and motion compensation; it can refer to picture information of different frames, and it uses motion information to find a reference block in a reference frame and generates a prediction block from the reference block, so as to eliminate temporal redundancy. The frames used for inter prediction may be P frames and/or B frames, where P frames are forward-predicted frames and B frames are bidirectionally predicted frames. The motion information includes the reference frame list in which the reference frame is located, a reference frame index, and a motion vector. The motion vector can be of integer-pixel or sub-pixel precision; if the motion vector is of sub-pixel precision, interpolation filtering needs to be used in the reference frame to produce the required sub-pixel block. Here, the integer-pixel or sub-pixel block found in the reference frame according to the motion vector is called the reference block. Some techniques use the reference block directly as the prediction block, while other techniques further process the reference block to generate the prediction block; further processing the reference block to generate a prediction block can also be understood as taking the reference block as a prediction block and then processing it to generate a new prediction block.
The intra estimation unit 212 predicts the pixel information within the current picture block by referring only to information of the same picture, so as to eliminate spatial redundancy. The frames used for intra prediction may be I frames.
There are multiple prediction modes for intra prediction. Taking the international digital video coding standard H series as an example, the H.264/AVC standard has 8 angular prediction modes and 1 non-angular prediction mode, and H.265/HEVC extends this to 33 angular prediction modes and 2 non-angular prediction modes. The intra prediction modes used by HEVC include the planar mode (Planar), DC and 33 angular modes, 35 prediction modes in total. The intra modes used by VVC include Planar, DC and 65 angular modes, 67 prediction modes in total.
It should be noted that, as the number of angular modes increases, intra prediction becomes more accurate and better meets the demands of the development of high-definition and ultra-high-definition digital video.
The residual unit 220 may generate a residual block of a CU based on the pixel block of the CU and the prediction blocks of the PUs of the CU. For example, the residual unit 220 may generate the residual block of the CU such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in the prediction block of a PU of the CU.
The transform/quantization unit 230 may quantize the transform coefficients. The transform/quantization unit 230 may quantize the transform coefficients associated with the TUs of a CU based on a quantization parameter (QP) value associated with the CU. The video encoder 200 may adjust the degree of quantization applied to the transform coefficients associated with the CU by adjusting the QP value associated with the CU.
The inverse transform/quantization unit 240 may apply inverse quantization and inverse transform, respectively, to the quantized transform coefficients to reconstruct a residual block from the quantized transform coefficients.
The reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by the prediction unit 210, so as to generate the reconstructed picture block associated with a TU. By reconstructing the sample block of each TU of the CU in this way, the video encoder 200 can reconstruct the pixel block of the CU.
The in-loop filtering unit 260 is configured to process the inversely transformed and inversely quantized pixels to compensate for distortion information and provide a better reference for subsequently coded pixels; for example, it may perform a deblocking filtering operation to reduce the blocking artifacts of the pixel block associated with the CU.
In some embodiments, the in-loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive offset/adaptive loop filtering (SAO/ALF) unit, where the deblocking filtering unit is configured to remove blocking artifacts and the SAO/ALF unit is configured to remove ringing artifacts.
The decoded picture buffer 270 may store the reconstructed pixel blocks. The inter prediction unit 211 may use reference pictures containing the reconstructed pixel blocks to perform inter prediction on PUs of other pictures. In addition, the intra estimation unit 212 may use the reconstructed pixel blocks in the decoded picture buffer 270 to perform intra prediction on other PUs in the same picture as the CU.
The entropy coding unit 280 may receive the quantized transform coefficients from the transform/quantization unit 230 and may perform one or more entropy coding operations on the quantized transform coefficients to generate entropy-coded data.
FIG. 2B is a schematic block diagram of a video decoder involved in an embodiment of the present application.
As shown in FIG. 2B, the video decoder 300 includes an entropy decoding unit 310, a prediction unit 320, an inverse quantization/transform unit 330, a reconstruction unit 340, an in-loop filtering unit 350 and a decoded picture buffer 360. It should be noted that the video decoder 300 may include more, fewer or different functional components.
The video decoder 300 may receive a bitstream. The entropy decoding unit 310 may parse the bitstream to extract syntax elements from it. As part of parsing the bitstream, the entropy decoding unit 310 may parse the entropy-coded syntax elements in the bitstream. The prediction unit 320, the inverse quantization/transform unit 330, the reconstruction unit 340 and the in-loop filtering unit 350 may decode the video data according to the syntax elements extracted from the bitstream, i.e., generate decoded video data.
In some embodiments, the prediction unit 320 includes an inter prediction unit 321 and an intra estimation unit 322.
The intra estimation unit 322 may perform intra prediction to generate the prediction block of a PU. The intra estimation unit 322 may use an intra prediction mode to generate the prediction block of the PU based on the pixel blocks of spatially neighboring PUs. The intra estimation unit 322 may also determine the intra prediction mode of the PU according to one or more syntax elements parsed from the bitstream.
The inter prediction unit 321 may construct a first reference picture list (list 0) and a second reference picture list (list 1) according to the syntax elements parsed from the bitstream. In addition, if a PU is coded using inter prediction, the entropy decoding unit 310 may parse the motion information of the PU. The inter prediction unit 321 may determine one or more reference blocks of the PU according to the motion information of the PU, and generate the prediction block of the PU according to its one or more reference blocks.
The inverse quantization/transform unit 330 may inversely quantize (i.e., dequantize) the transform coefficients associated with a TU. The inverse quantization/transform unit 330 may use the QP value associated with the CU of the TU to determine the degree of quantization.
After inversely quantizing the transform coefficients, the inverse quantization/transform unit 330 may apply one or more inverse transforms to the inversely quantized transform coefficients in order to generate the residual block associated with the TU.
The reconstruction unit 340 uses the residual blocks associated with the TUs of a CU and the prediction blocks of the PUs of the CU to reconstruct the pixel block of the CU. For example, the reconstruction unit 340 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the pixel block of the CU and obtain the reconstructed picture block.
The in-loop filtering unit 350 may perform a deblocking filtering operation to reduce the blocking artifacts of the pixel block associated with the CU.
The video decoder 300 may store the reconstructed picture of the CU in the decoded picture buffer 360. The video decoder 300 may use the reconstructed picture in the decoded picture buffer 360 as a reference picture for subsequent prediction, or transmit the reconstructed picture to a display device for presentation.
The basic flow of video coding and decoding is as follows. At the encoding end, a picture is partitioned into blocks, and for the current block, the prediction unit 210 uses intra prediction or inter prediction to generate the prediction block of the current block. The residual unit 220 may calculate a residual block based on the prediction block and the original block of the current block, i.e., the difference between the prediction block and the original block of the current block; this residual block may also be called residual information. Through processes such as transform and quantization by the transform/quantization unit 230, the residual block can have information to which the human eye is insensitive removed, so as to eliminate visual redundancy. Optionally, the residual block before transform and quantization by the transform/quantization unit 230 may be called a time-domain residual block, and the time-domain residual block after transform and quantization by the transform/quantization unit 230 may be called a frequency residual block or a frequency-domain residual block. The entropy coding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, may entropy-code the quantized transform coefficients, and outputs a bitstream. For example, the entropy coding unit 280 may eliminate character redundancy according to a target context model and probability information of the binary bitstream.
At the decoding end, the entropy decoding unit 310 may parse the bitstream to obtain the prediction information, the quantized coefficient matrix, etc. of the current block, and the prediction unit 320 uses intra prediction or inter prediction on the current block based on the prediction information to generate the prediction block of the current block. The inverse quantization/transform unit 330 uses the quantized coefficient matrix obtained from the bitstream and performs inverse quantization and inverse transform on it to obtain the residual block. The reconstruction unit 340 adds the prediction block and the residual block to obtain the reconstruction block. The reconstruction blocks form the reconstructed picture, and the in-loop filtering unit 350 performs in-loop filtering on the reconstructed picture on a picture or block basis to obtain the decoded picture. The encoding end likewise needs operations similar to those of the decoding end to obtain the decoded picture. The decoded picture may also be called a reconstructed picture, and the reconstructed picture can serve as a reference frame for inter prediction of subsequent frames.
It should be noted that the block partition information determined at the encoding end, as well as mode information or parameter information for prediction, transform, quantization, entropy coding, in-loop filtering and so on, are carried in the bitstream when necessary. By parsing the bitstream and analyzing the existing information, the decoding end determines the same block partition information and the same mode information or parameter information for prediction, transform, quantization, entropy coding, in-loop filtering and so on as the encoding end, thereby ensuring that the decoded picture obtained at the encoding end is the same as the decoded picture obtained at the decoding end.
The above is the basic flow of a video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of this framework or flow may be optimized. The present application is applicable to the basic flow of a video codec under the block-based hybrid coding framework, but is not limited to this framework and flow.
In some application scenarios, multiple types of heterogeneous content appear simultaneously in the same three-dimensional scene, for example multi-view video and point clouds. For such cases, current coding and decoding approaches include at least the following two:
Approach 1: the multi-view video is coded and decoded with MPEG (Moving Picture Experts Group) immersive video (MIV) technology, while the point cloud is coded and decoded with video-based point cloud compression (VPCC) technology.
The MIV and VPCC technologies are introduced below.
MIV technology: in order to reduce the transmitted pixel rate while preserving as much scene information as possible, so as to guarantee that there is enough information for rendering the target view, the scheme adopted by MPEG-I is shown in FIG. 3A. A limited number of views are selected as basic views that express the visible range of the scene as far as possible; the basic views are transmitted as complete pictures, and the redundant pixels between the remaining non-basic views and the basic views are removed, i.e., only the non-repeatedly expressed effective information is kept. The effective information is then extracted as sub-block pictures and reorganized together with the basic-view pictures into a larger rectangular picture, which is called the spliced image; FIG. 3A and FIG. 3B show the schematic process of generating the spliced image. The spliced image is sent to a codec for compression and reconstruction, and the auxiliary data related to the sub-block splicing information is also sent to the encoder to form the bitstream.
The VPCC encoding method projects the point cloud into two-dimensional pictures or video, converting the three-dimensional information into two-dimensional information for coding. FIG. 3C is the VPCC encoding block diagram. The bitstream is roughly divided into four parts: the geometry bitstream is generated by coding the geometry depth maps and represents the geometry information of the point cloud; the attribute bitstream is generated by coding the texture maps and represents the attribute information of the point cloud; the occupancy bitstream is generated by coding the occupancy maps and indicates the valid regions in the depth maps and texture maps. These three types of video are all coded and decoded with a video codec, as shown in FIG. 3D to FIG. 3F. The auxiliary information bitstream is generated by coding the ancillary information of the sub-block pictures, i.e., the patch data unit related part of the V3C standard, which indicates the position, size and other information of each sub-block picture.
Approach 2: both the multi-view video and the point cloud are coded and decoded with the frame packing technique of visual volumetric video-based coding (V3C).
The frame packing technique is introduced below.
Taking multi-view video as an example, as illustrated in FIG. 4, the encoding end includes the following steps:
Step 1: when encoding the acquired multi-view video, after some pre-processing, multi-view video sub-blocks (patches) are generated; the multi-view video sub-blocks are then organized to generate a multi-view video splicing graph.
For example, as shown in FIG. 4, the multi-view video is input to TMIV for packing, and a multi-view video splicing graph is output. TMIV is a piece of MIV reference software. The packing in the embodiments of the present application can be understood as splicing.
The multi-view video splicing graphs include multi-view video texture splicing graphs and multi-view video geometry splicing graphs, i.e., they contain only multi-view video sub-blocks.
Step 2: the multi-view video splicing graphs are input to a frame packer, and a multi-view video mixed splicing graph is output.
The multi-view video mixed splicing graphs include multi-view video texture mixed splicing graphs, multi-view video geometry mixed splicing graphs, and multi-view video texture-and-geometry mixed splicing graphs.
Specifically, as shown in FIG. 4, the multi-view video splicing graphs are frame-packed to generate a multi-view video mixed splicing graph, and each multi-view video splicing graph occupies one region of the multi-view video mixed splicing graph. Correspondingly, a flag pin_region_type_id_minus2 has to be transmitted in the bitstream for each region; this flag records whether the current region belongs to a multi-view video texture splicing graph or a multi-view video geometry splicing graph, and this information is needed at the decoding end.
Step 3: the multi-view video mixed splicing graph is encoded with a video encoder to obtain the bitstream.
As illustrated in FIG. 5, the decoding end includes the following steps:
Step 1: when decoding the multi-view video, the obtained bitstream is input to the video decoder for decoding, and a reconstructed multi-view video mixed splicing graph is obtained.
Step 2: the reconstructed multi-view video mixed splicing graph is input to a frame unpacker, which outputs reconstructed multi-view video splicing graphs.
Specifically, first, the flag pin_region_type_id_minus2 is obtained from the bitstream. If it is determined that pin_region_type_id_minus2 is V3C_AVD, the current region is a multi-view video texture splicing graph, and the current region is split off and output as a reconstructed multi-view video texture splicing graph.
If it is determined that pin_region_type_id_minus2 is V3C_GVD, the current region is a multi-view video geometry splicing graph, and the current region is split off and output as a reconstructed multi-view video geometry splicing graph.
Step 3: the reconstructed multi-view video splicing graphs are decoded to obtain the reconstructed multi-view video.
Specifically, the multi-view video texture splicing graph and the multi-view video geometry splicing graph are decoded to obtain the reconstructed multi-view video.
The frame packing technique is analyzed above with multi-view video as an example. The frame-packing coding and decoding of point clouds is basically the same as for the above multi-view video and can be done by analogy; for example, TMC (a piece of VPCC reference software) is used to pack the point cloud to obtain point cloud splicing graphs, the point cloud splicing graphs are input to the frame packer for frame packing to obtain a point cloud mixed splicing graph, and the point cloud mixed splicing graph is encoded to obtain the point cloud bitstream, which will not be repeated here.
At present, if visual media content in multiple different expression formats appears simultaneously in the same three-dimensional scene, the visual media content in the multiple different expression formats is coded and decoded separately. For example, for the case where a point cloud and multi-view video appear simultaneously in the same three-dimensional scene, the current packing technique compresses the point cloud to form a point cloud compressed bitstream (i.e., one kind of V3C bitstream) and compresses the multi-view video information to obtain a multi-view video compressed bitstream (i.e., another kind of V3C bitstream), and the system layer then multiplexes the compressed bitstreams to obtain a fused three-dimensional scene multiplexed bitstream. When decoding, the point cloud compressed bitstream and the multi-view video compressed bitstream are decoded separately. However, when the prior art codes and decodes visual media content in multiple different expression formats, it uses many codecs and the coding and decoding cost is high.
To solve the above technical problem, data of different expression formats are encoded jointly, which can reduce the number of encoders and decoders that need to be invoked, lower the implementation cost and improve ease of use. Moreover, in a heterogeneous mixed splicing graph, certain high-level parameters of blocks of different expression formats may be unequal, so that the heterogeneous data are given more suitable high-level parameters, which can effectively improve coding efficiency, that is, reduce the bit rate or improve the quality of the reconstructed multi-view video or point cloud video.
The video encoding method provided by the embodiments of the present application is introduced below with reference to FIG. 6, taking the encoding end as an example.
FIG. 6 is a schematic flowchart of the encoding method provided by an embodiment of the present application. As shown in FIG. 6, the encoding method includes:
Step 601: process visual media content in at least two expression formats to obtain at least two types of isomorphic blocks.
In three-dimensional application scenarios, such as virtual reality (VR), augmented reality (AR) and mixed reality (MR), visual media objects with different expression formats may appear in the same scene; for example, in the same three-dimensional scene, the scene background and some characters and objects are expressed by video, while another part of the characters is expressed by a three-dimensional point cloud or a three-dimensional mesh.
In some embodiments, the visual media content includes visual media content in at least two expression formats, such as multi-view video, point cloud and mesh. The multi-view video may include multiple-view video and/or single-view video. One type of isomorphic block corresponds to one expression format, and different types of isomorphic blocks correspond to different expression formats. Exemplarily, the expression formats corresponding to the at least two types of isomorphic blocks include at least two of the following: multi-view video, point cloud, mesh.
It should be noted that, in the embodiments of the present application, each type of isomorphic block may include at least one isomorphic block with the same expression format. Exemplarily, the isomorphic blocks in the point cloud format include one or more point cloud blocks, the isomorphic blocks in the multi-view video format include one or more multi-view video blocks, and the isomorphic blocks in the mesh format include one or more mesh blocks.
Specifically, visual media content in a first expression format is processed to obtain isomorphic blocks in the first expression format, and visual media content in a second expression format is processed to obtain isomorphic blocks in the second expression format, where the first expression format is one of multi-view video, point cloud and mesh, the second expression format is one of multi-view video, point cloud and mesh, and the first expression format and the second expression format are different expression formats.
It should be noted that a block may be a spliced region with a specific shape, for example a rectangular region with a specific length and/or height. For example, a block includes at least one sub-block, and the at least one sub-block is spliced in an ordered manner, e.g., from large to small by sub-block area, or from large to small by sub-block length and/or height, to obtain the block corresponding to the visual media content. Optionally, one block can be precisely mapped to one tile in the atlas. In practical applications, a block may also be called a tile; that is, a point cloud block may also be called a point cloud tile, a multi-view video block may also be called a multi-view video tile, and a mesh block may also be called a mesh tile.
In some embodiments, the sub-blocks in a block may have patch identifiers (patchID) to distinguish different sub-blocks in the same block. For example, the same block may include sub-block 1 (patch1), sub-block 2 (patch2) and sub-block 3 (patch3).
Further, an isomorphic block refers to a block in which the expression format corresponding to every sub-block is the same; for example, the sub-blocks in an isomorphic block are all multi-view video sub-blocks, or all point cloud sub-blocks, or sub-blocks of some other single expression format. The expression format corresponding to every sub-block in an isomorphic block is the expression format corresponding to that isomorphic block.
In some embodiments, isomorphic blocks may have block identifiers (tileID) to distinguish different blocks with the same expression format. For example, the point cloud blocks may include point cloud block 1 or point cloud block 2. For example, the multiple pieces of visual media content include a point cloud and multi-view video; the point cloud is processed to obtain point cloud blocks, and point cloud block 1 includes point cloud sub-blocks 1 to 3; the multi-view video is processed to obtain multi-view video blocks, and a multi-view video block includes multi-view video sub-blocks 1 to 4.
When visual media content in one expression format needs to be processed, isomorphic blocks in one expression format are obtained. When at least two pieces of visual media content need to be processed, isomorphic blocks in at least two expression formats are obtained. In order to improve compression efficiency, the embodiments of the present application process, e.g., pack (also called splice), these at least two pieces of visual media content to obtain the block corresponding to each of the at least two pieces of visual media content. For example, the sub-blocks (patches) corresponding to the at least two pieces of visual media content can be spliced into blocks. It should be noted that the embodiments of the present application do not limit the manner in which the at least two pieces of visual media content are processed separately to obtain the blocks.
In a possible implementation, the visual media content includes visual media content in the two expression formats of multi-view video and point cloud, and processing visual media content in at least two expression formats to obtain at least two types of isomorphic blocks includes: after projecting and de-redundancy-processing the acquired multi-view video, connecting the non-repeated pixels into multi-view video sub-blocks and splicing the multi-view video sub-blocks into multi-view video blocks; and performing parallel projection on the acquired point cloud, forming point cloud sub-blocks from the connected points in the projection planes, and splicing the point cloud sub-blocks into point cloud blocks.
Specifically, for multi-view video, taking MPEG-I as an example, a limited number of views are selected as basic views that express the visible range of the scene as far as possible; the basic views are transmitted as complete pictures, the redundant pixels between the remaining non-basic views and the basic views are removed, i.e., only the non-repeatedly expressed effective information is kept, and the effective information is then extracted as sub-block pictures and reorganized together with the basic-view pictures into larger strip-shaped pictures, which are called multi-view video blocks.
In some embodiments, the above visual media content is media content presented simultaneously in the same three-dimensional space. In some embodiments, the above visual media content is media content presented at different times in the same three-dimensional space. In some embodiments, the above visual media content may also be media content of different three-dimensional spaces. That is, the embodiments of the present application place no specific restriction on the above at least two pieces of visual media content.
Step 602: splice the at least two types of isomorphic blocks to obtain a splicing graph and splicing graph information, where, when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph includes at least two types of isomorphic blocks, and different types of isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information.
When the splicing graph is an isomorphic splicing graph, the splicing graph includes one type of isomorphic block, and one type of isomorphic block corresponds to one visual media content expression format.
Specifically, isomorphic blocks in at least two expression formats are heterogeneously spliced to generate a heterogeneous mixed splicing graph and splicing graph information, and isomorphic blocks of the same expression format are homogeneously spliced to generate an isomorphic splicing graph and splicing graph information. A heterogeneous mixed splicing graph is spliced from isomorphic blocks in at least two expression formats, while an isomorphic splicing graph is spliced from isomorphic blocks in one expression format.
Exemplarily, isomorphic blocks in the first expression format are homogeneously spliced to obtain a first isomorphic splicing graph and splicing graph information, and isomorphic blocks in the second expression format are homogeneously spliced to obtain a second isomorphic splicing graph and splicing graph information; or, isomorphic blocks in the first expression format and isomorphic blocks in the second expression format are heterogeneously spliced to obtain a heterogeneous mixed splicing graph and splicing graph information; or, isomorphic blocks in the first expression format are homogeneously spliced to obtain a first isomorphic splicing graph and splicing graph information, and isomorphic blocks in the first expression format and isomorphic blocks in the second expression format are heterogeneously spliced to obtain a heterogeneous mixed splicing graph and splicing graph information; or, isomorphic blocks in the second expression format are homogeneously spliced to obtain a second isomorphic splicing graph and splicing graph information, and isomorphic blocks in the first expression format and isomorphic blocks in the second expression format are heterogeneously spliced to obtain a heterogeneous mixed splicing graph and splicing graph information.
That is to say, an isomorphic splicing graph may include one isomorphic block or multiple isomorphic blocks of the same expression format, while a heterogeneous mixed splicing graph includes at least two isomorphic blocks in at least two expression formats. In the embodiments of the present application, the first expression format is one of multi-view video, point cloud and mesh, the second expression format is one of multi-view video, point cloud and mesh, and the first expression format and the second expression format are different expression formats. As shown in FIG. 7, multi-view video block 1, multi-view video block 2 and point cloud block 1 are spliced into one kind of heterogeneous mixed splicing graph.
Exemplarily, the first expression format is multi-view video and the second expression format is point cloud. Part of the multi-view video blocks and part of the point cloud blocks are spliced into a heterogeneous mixed splicing graph; another part of the multi-view video blocks is spliced into a multi-view splicing graph; another part of the point cloud blocks is spliced into a point cloud splicing graph.
The splicing graph information is used to reconstruct the splicing graph. Exemplarily, the splicing graph information includes at least splicing graph type information, splicing information of the isomorphic blocks, and isomorphic block information. In some embodiments, the splicing graph information includes a first syntax element, and the first syntax element is used to indicate that the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph. In some embodiments, the first syntax element is a syntax element of the atlas sequence parameter set (ASPS) and/or a syntax element of the atlas frame parameter set (AFPS); the ASPS and/or the AFPS are parsed to determine the splicing graph type.
In some embodiments, the first syntax element includes a first sub-syntax element and a second sub-syntax element, and determining according to the first syntax element whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph includes: if the value of the first sub-syntax element is equal to the value of the second sub-syntax element, determining according to that value whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph.
In some embodiments, the first syntax element includes a first sub-syntax element and a second sub-syntax element, and determining according to the first syntax element whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph includes: determining according to the value of the first sub-syntax element whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; determining according to the value of the second sub-syntax element whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; and, when the two determination results are consistent, determining the splicing graph type.
That is to say, the bitstream must guarantee the absolute consistency of the first sub-syntax element and the second sub-syntax element, and the splicing graph type can be determined only when the two sub-syntax elements are consistent. Exemplarily, the consistency of the two sub-syntax elements can be compared first and the splicing graph type then determined according to the value of one of them; or the splicing graph type can first be determined according to each sub-syntax element, with absolute consistency guaranteed by comparing whether the resulting splicing graph types are the same.
Exemplarily, determining according to the value whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph includes: if the value is a first preset value, determining that the splicing graph is a heterogeneous mixed splicing graph; if the value is a second preset value, determining that the splicing graph is an isomorphic splicing graph. That is to say, two values or two classes of values can be set to identify heterogeneous mixed splicing graphs and isomorphic splicing graphs. Exemplarily, the first preset value is 1 and the second preset value is 0.
Exemplarily, in some embodiments, when the splicing graph information does not include the first syntax element, the splicing graph is determined to be an isomorphic splicing graph. In some embodiments, when the splicing graph information does not include the first syntax element, the value of the first syntax element is inferred to be the second preset value. Exemplarily, when the splicing graph information does not include the first sub-syntax element, the splicing graph is determined to be an isomorphic splicing graph and the value of the first sub-syntax element is inferred to be the second preset value; when the splicing graph information does not include the second sub-syntax element, the splicing graph is determined to be an isomorphic splicing graph and the value of the second sub-syntax element is inferred to be the second preset value.
Exemplarily, determining according to the value whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph includes: if the value is a third preset value, determining that the splicing graph is a heterogeneous mixed splicing graph including isomorphic blocks in a first expression format and a second expression format, where the first expression format and the second expression format are different expression formats; if the value is a fourth preset value, determining that the splicing graph is an isomorphic splicing graph including isomorphic blocks in the first expression format; if the value is a fifth preset value, determining that the splicing graph is an isomorphic splicing graph including isomorphic blocks in the second expression format. That is to say, multiple values can also be set to identify the heterogeneous mixed splicing graph and the expression format of the isomorphic splicing graph, and even to identify which expression formats of isomorphic blocks the heterogeneous mixed splicing graph contains. Exemplarily, the third preset value is 2, the fourth preset value is 1 and the fifth preset value is 0.
Exemplarily, the first sub-syntax element is a syntax element of the atlas sequence parameter set (ASPS), and the second sub-syntax element is a syntax element of the atlas frame parameter set (AFPS). In some embodiments, the first sub-syntax element may be a newly added syntax element in the ASPS, or may be a syntax element obtained by a logical operation on at least two syntax elements in the ASPS. In some embodiments, the second sub-syntax element may be a newly added syntax element in the AFPS, or may be a syntax element obtained by a logical operation on at least two syntax elements in the AFPS. Exemplarily, the first sub-syntax element is a syntax element obtained by an AND operation on two syntax elements in the ASPS, and the second sub-syntax element is a syntax element obtained by an AND operation on two syntax elements in the AFPS.
In the embodiments of the present application, asps_heterogeneous_miv_extension_present_flag denotes the first sub-syntax element, and afps_heterogeneous_miv_extension_present_flag denotes the second sub-syntax element.
In some embodiments, the method includes: parsing the first sub-syntax element in the ASPS; determining according to the value of the first sub-syntax element whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; parsing the second sub-syntax element in the AFPS; and determining according to the value of the second sub-syntax element whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph. That is to say, the first sub-syntax element is parsed directly in the ASPS; when the first sub-syntax element is present, its value is analyzed to determine whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; when the first sub-syntax element is absent or equal to 0, the splicing graph is determined to be an isomorphic splicing graph.
In some embodiments, the method includes: determining that the bitstream includes sub-streams corresponding to visual media content in at least two expression formats; parsing the first sub-syntax element in the ASPS; determining according to the value of the first sub-syntax element whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; parsing the second sub-syntax element in the AFPS; and determining according to the value of the second sub-syntax element whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph. That is to say, when it is determined that the bitstream includes sub-streams corresponding to visual media content in at least two expression formats, the first sub-syntax element is parsed in the ASPS; when the first sub-syntax element is present, its value is analyzed to determine whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; when the first sub-syntax element is absent or equal to 0, the splicing graph is determined to be an isomorphic splicing graph. When it is determined that the bitstream includes the sub-stream corresponding to visual media content in one expression format, the splicing graph is determined to be an isomorphic splicing graph.
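A minimal sketch of the parsing order just described, assuming hypothetical helper names and a dict-based representation of the parameter sets; absent flags are inferred to 0 (the second preset value), and the ASPS and AFPS flags must agree before the splicing graph type can be trusted:

```python
def parse_atlas_type(asps: dict, afps: dict, formats_in_stream: int) -> str:
    # With only one expression format in the bitstream, the graph is isomorphic.
    if formats_in_stream < 2:
        return "isomorphic"
    # Absent sub-syntax elements are inferred to 0 (the second preset value).
    a = asps.get("asps_heterogeneous_miv_extension_present_flag", 0)
    f = afps.get("afps_heterogeneous_miv_extension_present_flag", 0)
    if a != f:
        raise ValueError("non-conforming bitstream: ASPS/AFPS flags disagree")
    return "heterogeneous_mixed" if a == 1 else "isomorphic"
```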
Exemplarily, in some embodiments, when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph information further includes a second syntax element, and the expression format of the isomorphic blocks in the splicing graph is determined according to the second syntax element.
In some embodiments, the method further includes: parsing the first sub-syntax element in the ASPS; determining according to the value of the first sub-syntax element whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; parsing the second sub-syntax element in the AFPS; determining according to the value of the second sub-syntax element whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; and, when the splicing graph is a heterogeneous mixed splicing graph, parsing the second syntax element in the AFPS and determining the expression format of the isomorphic blocks in the splicing graph according to the second syntax element. In some embodiments, after the splicing graph is determined to be a heterogeneous mixed splicing graph according to the first syntax element, the second syntax element of each isomorphic block in the heterogeneous mixed splicing graph is further parsed to determine the isomorphic block type.
Specifically, the second syntax element can be set to different values to indicate the expression format type corresponding to the i-th block in the splicing graph. Exemplarily, determining the expression format of the isomorphic blocks in the splicing graph according to the second syntax element includes: when the value of the second syntax element of the i-th block is a sixth preset value, determining that the expression format of the i-th block is the first expression format; when the value of the second syntax element of the i-th block is a seventh preset value, determining that the expression format of the i-th block is the second expression format. Take the first expression format being point cloud and the second expression format being multi-view video as an example. Optionally, the sixth preset value is 0 and the seventh preset value is 1.
Further, when the expression format of the i-th block is the first expression format, the i-th block is encoded with the encoding method corresponding to the first expression format; when the expression format of the i-th block is the second expression format, the i-th block is encoded with the encoding method corresponding to the second expression format.
In some embodiments, the first sub-syntax element and the second sub-syntax element are extension syntax elements for the multi-view content in the heterogeneous mixed splicing graph, and the second syntax element is a syntax element shared by multiple expression formats.
Further, in some embodiments, the second syntax element includes a third sub-syntax element and a fourth sub-syntax element. The first sub-syntax element is parsed in the ASPS; according to the value of the first sub-syntax element it is determined whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; the second sub-syntax element is parsed in the AFPS; according to the value of the second sub-syntax element it is determined whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; the third sub-syntax element is parsed in the AFPS, and when it is determined according to the value of the third sub-syntax element that the splicing graph is a heterogeneous mixed splicing graph, the fourth sub-syntax element is parsed in the AFPS and the expression format of the isomorphic blocks in the splicing graph is determined according to the fourth sub-syntax element.
In some embodiments, the first sub-syntax element is parsed in the ASPS; according to the value of the first sub-syntax element it is determined whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; the second sub-syntax element and the third sub-syntax element are parsed in the AFPS; according to the value of the second sub-syntax element it is determined whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph; and, when it is determined according to the value of the third sub-syntax element that the splicing graph is a heterogeneous mixed splicing graph, the expression format of the isomorphic blocks in the splicing graph is determined according to the fourth sub-syntax element.
Specifically, the third sub-syntax element is set to different values to indicate the splicing graph type, and the fourth sub-syntax element is set to different values to indicate the expression format type corresponding to the i-th block in the splicing graph.
In some embodiments, when the third sub-syntax element is absent, the splicing graph is determined to be an isomorphic splicing graph.
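The per-block dispatch described above can be sketched as follows; the dict keys and the name tile_miv_flags are hypothetical storage choices, while the 0/1 meanings follow the example in this paragraph (0: point cloud, 1: multi-view):

```python
def tile_expression_formats(afps: dict, num_tiles: int) -> list:
    # Third sub-syntax element absent -> all blocks share one type (isomorphic graph).
    if not afps.get("afps_heterogeneous_frame_tile_toolset_miv_present_flag", 0):
        return ["same_type"] * num_tiles
    # Fourth sub-syntax element, one flag per block i: 0 -> point cloud, 1 -> multi-view.
    flags = afps["tile_miv_flags"]  # hypothetical storage of ..._miv_present_flag[i]
    return ["multi_view" if flags[i] == 1 else "point_cloud" for i in range(num_tiles)]
```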
Exemplarily, in some embodiments, when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph information includes at least two types of isomorphic block information, where isomorphic blocks in different expression formats correspond to different isomorphic block information. The isomorphic block information includes reconstruction information of the isomorphic blocks and other supplementary information, and is used for decoding and reconstructing the isomorphic blocks.
Exemplarily, the isomorphic block information includes ASPS syntax elements and AFPS syntax elements, and different isomorphic block information corresponds to different ASPS syntax elements and AFPS syntax elements. In some embodiments, for a heterogeneous mixed splicing graph, the ASPS and AFPS of isomorphic blocks in different expression formats are at least partially different, i.e., the ASPS and AFPS of isomorphic blocks in different expression formats are not identical. When encoding a heterogeneous mixed splicing graph, this accommodates the case where the high-level information (ASPS and AFPS) of blocks in different expression formats in the heterogeneous mixed splicing graph is not correspondingly equal. High-level parameters better suited to the heterogeneous mixed splicing graph are thus achieved, which can effectively improve coding efficiency, i.e., reduce the bit rate or improve the quality of the reconstructed multi-view video or point cloud video.
Exemplarily, when an isomorphic block is a multi-view video block it corresponds to first isomorphic block information, and when an isomorphic block is a point cloud block it corresponds to second isomorphic block information; the first isomorphic block information and the second isomorphic block information include shared ASPS syntax elements and AFPS syntax elements; the first isomorphic block information further includes extension syntax elements of the ASPS and extension syntax elements of the AFPS. Here, extension syntax elements of the ASPS and of the AFPS are newly added for the multi-view video blocks of the heterogeneous mixed splicing graph, to represent ASPS parameters and AFPS parameters that are not equal to those of the point cloud blocks, so as to improve the decoding efficiency of the multi-view video blocks. It should be noted that, when the point cloud blocks and the multi-view video blocks are decoded and reconstructed, these ASPS parameters and AFPS parameters may have the same function but unequal values.
Exemplarily, the second isomorphic block information includes extension syntax elements of the ASPS and extension syntax elements of the AFPS; that is, extension syntax elements of the ASPS and of the AFPS can also be newly added for the point cloud blocks of the heterogeneous mixed splicing graph, to improve the decoding efficiency of the point cloud blocks, as the extension syntax elements of the first isomorphic block information and the second isomorphic block information.
Exemplarily, the extension syntax elements of the ASPS include: ashm_geometry_3d_bit_depth_minus1, used to represent the bit depth of the geometry coordinates of the reconstructed geometry content; ashm_geometry_2d_bit_depth_minus1, used to represent the bit depth of the geometry when projected onto the 2D picture; and ashm_log2_max_atlas_frame_order_cnt_lsb_minus4, used to determine the value of the variable used for atlas frame order counting in the decoding process. The extension syntax elements of the AFPS include: afhm_additional_lt_afoc_lsb_len, used to determine the value of the variable MaxLtAtlasFrmOrderCntLsbForMiv used by the reference atlas frame list in the decoding process. The ASPS parameters and AFPS parameters of the multi-view video blocks represented by these syntax elements are not completely equal to the ASPS parameters and AFPS parameters of the point cloud blocks.
Exemplarily, the first isomorphic block information and the second isomorphic block information include shared ASPS syntax elements and AFPS syntax elements; the first isomorphic block information further includes a first extension syntax element of the ASPS, used to represent the bit depth of the geometry coordinates of the reconstructed geometry content. It should be noted that the naming of syntax elements in the embodiments of the present application mainly serves understanding and writing; modifications can be made in practical applications and in standard texts, but the semantic content should be the same or similar. For example, ashm_geometry_3d_bit_depth_minus1 and asps_geometry_3d_bit_depth_minus1_for_miv both denote the first extension syntax element, and the first extension syntax element can also be understood as a newly added syntax element.
Exemplarily, when the splicing graph is an isomorphic splicing graph, the splicing graph information includes one type of isomorphic block information, used for decoding and reconstructing the isomorphic blocks in the splicing graph.
In some embodiments, the heterogeneous mixed splicing graph of the embodiments of the present application includes at least one of the following: a single-attribute heterogeneous mixed splicing graph and a multi-attribute heterogeneous mixed splicing graph.
A single-attribute heterogeneous mixed splicing graph is a heterogeneous mixed splicing graph in which all the included isomorphic blocks have the same attribute information. For example, one single-attribute heterogeneous mixed splicing graph includes only isomorphic blocks of attribute information, e.g., only multi-view video texture blocks and point cloud texture blocks. As another example, one single-attribute heterogeneous mixed splicing graph includes only isomorphic blocks of geometry information, e.g., only multi-view video geometry blocks and point cloud geometry blocks.
A multi-attribute heterogeneous mixed splicing graph is a heterogeneous mixed splicing graph in which at least two of the included isomorphic blocks have different attribute information; for example, one multi-attribute heterogeneous mixed splicing graph includes both isomorphic blocks of attribute information and isomorphic blocks of geometry information. As an example, blocks under any one attribute or any two attributes of at least two of point cloud, multi-view video and mesh can be spliced in one picture to obtain a heterogeneous mixed splicing graph. The present application does not limit this.
In some embodiments, single-attribute isomorphic blocks in the first expression format and single-attribute blocks in the second expression format are spliced to obtain a heterogeneous mixed splicing graph, where the first expression format and the second expression format are each any one of multi-view video, point cloud and mesh, the first expression format differs from the second expression format, and the attribute information of the first expression format and the second expression format is the same.
The single-attribute isomorphic blocks of multi-view video include at least one of multi-view video texture blocks, multi-view video geometry blocks, and the like.
The single-attribute isomorphic blocks of point clouds include at least one of point cloud texture blocks, point cloud geometry blocks, point cloud occupancy blocks, and the like.
The single-attribute isomorphic blocks of meshes include at least one of mesh texture blocks and mesh geometry blocks.
For example, at least two of multi-view video geometry blocks, point cloud geometry blocks and mesh geometry blocks are spliced in one picture to obtain a heterogeneous mixed splicing graph, which is called a single-attribute heterogeneous mixed splicing graph. As another example, at least two of multi-view video texture blocks, point cloud texture blocks and mesh texture blocks are spliced in one picture to obtain a heterogeneous mixed splicing graph, which is likewise called a single-attribute heterogeneous mixed splicing graph.
In some embodiments, multi-attribute isomorphic blocks in the first expression format and multi-attribute isomorphic blocks in the second expression format are spliced to obtain a heterogeneous mixed splicing graph, where the first expression format and the second expression format are each any one of multi-view video, point cloud and mesh, the first expression format differs from the second expression format, and the attribute information of the first expression format and the second expression format is not completely the same.
For example, multi-view video texture blocks and at least one of point cloud geometry blocks and mesh geometry blocks are spliced in one picture to obtain a heterogeneous mixed splicing graph. As another example, multi-view video geometry blocks and at least one of point cloud texture blocks and mesh texture blocks are spliced in one picture to obtain a heterogeneous mixed splicing graph. As another example, point cloud texture blocks and at least one of multi-view video geometry blocks and mesh geometry blocks are spliced in one picture to obtain a heterogeneous mixed splicing graph. As another example, point cloud geometry blocks and at least one of multi-view video texture blocks and mesh texture blocks are spliced in one picture to obtain a heterogeneous mixed splicing graph. As another example, point cloud geometry blocks, multi-view video texture blocks and multi-view video geometry blocks are spliced in one picture to obtain a heterogeneous mixed splicing graph. As another example, point cloud geometry blocks, point cloud texture blocks, multi-view video texture blocks and multi-view video geometry blocks are spliced in one picture to obtain a heterogeneous mixed splicing graph. Here, the resulting heterogeneous mixed splicing graphs are called multi-attribute heterogeneous mixed splicing graphs.
In some embodiments, the isomorphic splicing graph of the embodiments of the present application includes at least one of the following: a single-attribute isomorphic splicing graph and a multi-attribute isomorphic splicing graph. In some embodiments, first-attribute isomorphic blocks in the first expression format are spliced to obtain an isomorphic splicing graph; or, first-attribute isomorphic blocks and second-attribute isomorphic blocks in the first expression format are spliced to obtain an isomorphic splicing graph.
A single-attribute isomorphic splicing graph is an isomorphic splicing graph in which all the included isomorphic blocks have the same expression format and the same attribute information. For example, one single-attribute isomorphic splicing graph includes only isomorphic blocks of attribute information in the first expression format, e.g., only multi-view video texture blocks, or only point cloud texture blocks. As another example, one single-attribute isomorphic splicing graph includes only isomorphic blocks of geometry information, e.g., only multi-view video geometry blocks, or only point cloud geometry blocks.
A multi-attribute isomorphic splicing graph is an isomorphic splicing graph in which at least two of the included isomorphic blocks have the same expression format but different attribute information; for example, one multi-attribute isomorphic splicing graph includes both isomorphic blocks of attribute information and isomorphic blocks of geometry information. As an example, one multi-attribute isomorphic splicing graph includes multi-view video texture blocks and multi-view video geometry blocks. As another example, one multi-attribute isomorphic splicing graph includes point cloud geometry blocks and point cloud texture blocks; as shown in FIG. 8, one multi-attribute isomorphic splicing graph includes point cloud texture block 1, point cloud geometry block 1 and point cloud geometry block 2.
In some embodiments, the splicing graph information may further include a syntax element, according to which the splicing graph is determined to be a single-attribute heterogeneous mixed splicing graph, a multi-attribute heterogeneous mixed splicing graph, a single-attribute isomorphic splicing graph or a multi-attribute isomorphic splicing graph.
Step 603: encode the splicing graph and the splicing graph information to obtain a bitstream.
In some embodiments, the parameter set sub-stream of the bitstream includes a third syntax element, and the sub-streams corresponding to visual media content in at least one expression format contained in the bitstream are determined according to the third syntax element. Exemplarily, the parameter set of the bitstream is the V3C_VPS, and the third syntax element may be ptl_profile_toolset_idc in the V3C_VPS.
In some embodiments, the third syntax element is set to different values to indicate the sub-streams corresponding to visual media content in at least one expression format contained in the bitstream. Exemplarily, determining according to the third syntax element the sub-streams corresponding to visual media content in at least one expression format contained in the bitstream includes: when the third syntax element is a first value, determining that the bitstream contains both the sub-stream corresponding to visual media content in the first expression format and the sub-stream corresponding to visual media content in the second expression format; when the third syntax element is a second value, determining that the bitstream contains the sub-stream corresponding to visual media content in the first expression format; when the third syntax element is a third value, determining that the bitstream contains the sub-stream corresponding to visual media content in the second expression format.
Exemplarily, taking the first expression format being multi-view video and the second expression format being point cloud as an example, when the third syntax element is set to the first value, the first value is used to indicate that the bitstream contains both a multi-view video bitstream and a point cloud bitstream; as a specific example, when ptl_profile_toolset_idc = X with X being 128/129/130/131/132/133, the current bitstream contains both point cloud and multi-view bitstreams. As another example, when the third syntax element is set to the second value, the second value indicates that the bitstream contains only a point cloud bitstream; as a specific example, when ptl_profile_toolset_idc = X with X being 0/1, the current bitstream contains only a point cloud bitstream. As another example, when the third syntax element is set to the third value, the third value indicates that the bitstream contains only a multi-view video bitstream; as a specific example, when ptl_profile_toolset_idc = X with X being 64/65/66, the current bitstream contains only a multi-view video bitstream. It should be understood that the above first, second and third values are only examples, and the embodiments of the present application are not limited thereto.
In some embodiments, the bitstream includes a video compression sub-stream and a splicing graph information sub-stream. Encoding the splicing graph and the splicing graph information to obtain the bitstream includes: encoding the splicing graph to obtain the video compression sub-stream; encoding the splicing graph information of the splicing graph to obtain the splicing graph information sub-stream; and combining the video compression sub-stream and the splicing graph information sub-stream into the bitstream. In this way, heterogeneous source formats such as video, point cloud and mesh are supported in the same compressed bitstream, multi-view video splicing graphs, point cloud video splicing graphs, mesh splicing graphs and heterogeneous mixed splicing graphs can exist simultaneously in the compressed bitstream, and the number of two-dimensional video encoders such as HEVC, VVC, AVC and AVS that need to be invoked is reduced, which lowers the implementation cost and improves ease of use.
In some embodiments, encoding the splicing graph and the splicing graph information to obtain the bitstream includes: if the expression format of the i-th block is the first expression format, determining that the sub-blocks in the i-th block are encoded with the coding standard corresponding to the first expression format, obtaining the sub-stream corresponding to the visual media content in the first expression format; if the expression format of the i-th block is the second expression format, determining that the sub-blocks in the i-th block are encoded with the coding standard corresponding to the second expression format, obtaining the sub-stream corresponding to the visual media content in the second expression format.
Exemplarily, given that the second syntax element of the i-th block is 1, it is determined that the current sub-block is encoded with the multi-view video coding standard; given that the second syntax element of the i-th block is 0, it is determined that the current sub-block is encoded with the point cloud coding standard.
In the embodiments of the present application, the video encoder used for video-encoding the heterogeneous mixed splicing graph and the isomorphic splicing graph to obtain the video compression sub-stream may be the video encoder shown in FIG. 2A above. That is to say, the embodiments of the present application treat the heterogeneous mixed splicing graph or the isomorphic splicing graph as one frame of picture: block partitioning is first performed, intra or inter prediction is then used to obtain the prediction values of the coding blocks, the prediction values are subtracted from the original values of the coding blocks to obtain residual values, and the residual values are transformed and quantized to obtain the video compression sub-stream.
In the embodiments of the present application, while generating at least one splicing graph, the splicing graph information corresponding to each splicing graph is generated, and the splicing graph information is encoded to obtain the splicing graph information sub-stream. The splicing graph information includes the first syntax element used to indicate the splicing graph type and the second syntax element for the expression format of each isomorphic block in the splicing graph. The embodiments of the present application do not limit the way the splicing graph information is encoded; for example, it is compressed with conventional data compression coding methods such as fixed-length coding or variable-length coding.
Finally, the video compression sub-stream and the splicing graph information sub-stream are written into the same bitstream to obtain the final bitstream. That is to say, the embodiments of the present application support both heterogeneous source formats such as video, point cloud and mesh and homogeneous source formats in the same compressed bitstream.
In some embodiments, the method further includes: encoding the parameter set of the bitstream to obtain a parameter set sub-stream. Specifically, the encoding end combines the video compression sub-stream, the splicing graph information sub-stream and the parameter set sub-stream into the bitstream. The parameter set sub-stream of the bitstream includes the third syntax element, and the sub-streams corresponding to visual media content in at least one expression format contained in the bitstream are determined according to the third syntax element. That is to say, by sending the third syntax element, the encoding end indicates whether the bitstream contains visual media content in at least two expression formats simultaneously. Exemplarily, when the third syntax element indicates that the bitstream contains the sub-stream corresponding to visual media content in one expression format, it can be understood that the encoding end processes visual media content in one expression format to obtain one type of isomorphic block and splices that type of isomorphic block to obtain an isomorphic splicing graph. When the third syntax element indicates that the bitstream contains the sub-streams corresponding to visual media content in at least two expression formats, it can be understood that the encoding end processes visual media content in at least two expression formats to obtain at least two types of isomorphic blocks and splices the at least two types of isomorphic blocks to obtain isomorphic splicing graphs and/or heterogeneous mixed splicing graphs.
Exemplarily, when the third syntax element indicates that the bitstream contains the sub-streams corresponding to visual media content in at least two expression formats, the method includes: homogeneously splicing isomorphic blocks in the first expression format to obtain a first isomorphic splicing graph, and homogeneously splicing isomorphic blocks in the second expression format to obtain a second isomorphic splicing graph; or heterogeneously splicing isomorphic blocks in the first expression format and isomorphic blocks in the second expression format to obtain a heterogeneous mixed splicing graph; or homogeneously splicing isomorphic blocks in the first expression format to obtain a first isomorphic splicing graph, and heterogeneously splicing isomorphic blocks in the first expression format and isomorphic blocks in the second expression format to obtain a heterogeneous mixed splicing graph; or homogeneously splicing isomorphic blocks in the second expression format to obtain a second isomorphic splicing graph, and heterogeneously splicing isomorphic blocks in the first expression format and isomorphic blocks in the second expression format to obtain a heterogeneous mixed splicing graph.
In the embodiments of the present application, in order to reduce the number of encoders and lower the encoding cost, when encoding, the visual media content is first processed (i.e., packed) separately to obtain multiple isomorphic blocks. Then, at least two isomorphic blocks whose expression formats are not completely the same are spliced into a heterogeneous mixed splicing graph, at least one isomorphic block of a completely identical expression format is spliced into an isomorphic splicing graph, the heterogeneous mixed splicing graph and the isomorphic splicing graph are encoded to obtain the video compression sub-stream, the splicing graph information is encoded to obtain the splicing information sub-stream, and the video compression sub-stream and the splicing information sub-stream are combined into the compressed bitstream. This makes the encoding method applicable to application scenarios with visual media content in multiple expression formats and extends the scope of application; moreover, by jointly encoding data of different expression formats, the video encoder may be invoked only once during encoding, thereby reducing the number of two-dimensional video encoders such as HEVC, VVC, AVC and AVS that need to be invoked, reducing the encoding cost and improving ease of use. Furthermore, when encoding the heterogeneous mixed splicing graph, certain high-level parameters of blocks in different expression formats may be unequal, which can retain more effective information of the blocks in different expression formats, improve the synthesis quality of the pictures, and improve the overall rate-quality efficiency.
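As an illustration of the sub-stream structure produced by step 603 only, the following runnable sketch shows the three sub-streams being combined; the class and field names are hypothetical, and the plain concatenation is a simplification, since the real V3C bitstream uses V3C units and a sample-stream format, as FIG. 10 suggests:

```python
from dataclasses import dataclass

@dataclass
class SubStreams:
    parameter_set: bytes  # carries the third syntax element (e.g. ptl_profile_toolset_idc)
    video: bytes          # compressed splicing graphs, from one 2D video codec invocation
    atlas_info: bytes     # compressed splicing graph information

def compose_bitstream(s: SubStreams) -> bytes:
    # Step 603 (sketch): the three sub-streams are written into one combined bitstream.
    return s.parameter_set + s.video + s.atlas_info

# usage (dummy payloads): compose_bitstream(SubStreams(b"\x80", b"video", b"info"))
```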
The encoding method of the present application is introduced above taking the encoding end as an example; the video decoding method provided by the embodiments of the present application is described below taking the decoding end as an example.
FIG. 9 is a schematic flowchart of a decoding method provided by an embodiment of the present application. As shown in FIG. 9, the decoding method of the embodiments of the present application includes:
Step 901: decode the bitstream to obtain a splicing graph and splicing graph information.
Exemplarily, the bitstream includes a video compression sub-stream and a splicing graph information sub-stream, and decoding the bitstream to obtain the splicing graph and the splicing graph information includes: extracting the splicing graph information sub-stream and the video compression sub-stream separately; decoding the video compression sub-stream to obtain the splicing graph; and decoding the splicing graph information sub-stream to obtain the splicing graph information.
Exemplarily, the video compression sub-stream is decoded to obtain the heterogeneous mixed splicing graph, the multi-view splicing graph and the point cloud splicing graph; the splicing graph information sub-stream is decoded to obtain the heterogeneous mixed splicing graph information, the multi-view splicing graph information and the point cloud splicing graph information.
Exemplarily, in some embodiments, the bitstream further includes a parameter set sub-stream; the parameter set sub-stream includes the third syntax element; and the sub-streams corresponding to visual media content in at least one expression format contained in the bitstream are determined according to the third syntax element. That is to say, in the decoding process, it is first judged according to the third syntax element how many expression formats of visual media content the bitstream contains: when it is judged according to the third syntax element at the V3C bitstream level that the bitstream contains visual media content in one expression format, all splicing graphs are determined to be isomorphic splicing graphs; when it is judged according to the third syntax element that the bitstream contains visual media content in two expression formats, the splicing graphs may include heterogeneous mixed splicing graphs, and it is necessary to further judge whether a splicing graph is an isomorphic splicing graph or a heterogeneous mixed splicing graph. Further, the splicing graph type is judged according to the first syntax element, and when the splicing graph is determined to be a heterogeneous mixed splicing graph, the isomorphic block type is then judged according to the second syntax element.
In some embodiments, the third syntax element is set to different values to indicate the sub-streams corresponding to visual media content in at least one expression format contained in the bitstream. Exemplarily, determining according to the third syntax element the sub-streams corresponding to visual media content in at least one expression format contained in the bitstream includes: when the third syntax element is a first value, determining that the bitstream contains both the sub-stream corresponding to visual media content in the first expression format and the sub-stream corresponding to visual media content in the second expression format; when the third syntax element is a second value, determining that the bitstream contains the sub-stream corresponding to visual media content in the first expression format; when the third syntax element is a third value, determining that the bitstream contains the sub-stream corresponding to visual media content in the second expression format.
Exemplarily, taking the first expression format being multi-view video and the second expression format being point cloud as an example, when the third syntax element is set to the first value, the first value indicates that the bitstream contains both a multi-view video bitstream and a point cloud bitstream; as a specific example, when ptl_profile_toolset_idc = X with X being 128/129/130/131/132/133, the current bitstream contains both point cloud and multi-view bitstreams. As another example, when the third syntax element is set to the second value, the second value indicates that the bitstream contains only a point cloud bitstream; as a specific example, when ptl_profile_toolset_idc = X with X being 0/1, the current bitstream contains only a point cloud bitstream. As another example, when the third syntax element is set to the third value, the third value indicates that the bitstream contains only a multi-view video bitstream; as a specific example, when ptl_profile_toolset_idc = X with X being 64/65/66, the current bitstream contains only a multi-view video bitstream. It should be understood that the above first, second and third values are only examples, and the embodiments of the present application are not limited thereto.
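A minimal sketch of the decoder-side dispatch order described in step 901, assuming a dict-based representation of each atlas; the idc values are the examples quoted above, not a normative definition:

```python
def classify_atlases(ptl_profile_toolset_idc: int, atlases: list) -> list:
    both = 128 <= ptl_profile_toolset_idc <= 133  # example values: both formats present
    kinds = []
    for atlas in atlases:  # each atlas: {"asps_flag": 0 or 1, "afps_flag": 0 or 1}
        if not both:
            kinds.append("isomorphic")            # single-format bitstream
        elif atlas.get("asps_flag", 0) != atlas.get("afps_flag", 0):
            raise ValueError("ASPS/AFPS heterogeneous flags must match")
        else:
            kinds.append("heterogeneous_mixed" if atlas.get("asps_flag", 0) == 1
                         else "isomorphic")
    return kinds
```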
Step 902: when the splicing graph is a heterogeneous mixed splicing graph, obtain at least two types of isomorphic blocks and isomorphic block information according to the splicing graph and the splicing graph information, where different types of isomorphic blocks correspond to different visual media content expression formats and different isomorphic block information.
The splicing graph information is used to reconstruct the splicing graph. Exemplarily, the splicing graph information includes at least splicing graph type information, splicing information of the isomorphic blocks, and isomorphic block information. In some embodiments, the splicing graph information includes a first syntax element, and it is determined according to the first syntax element whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph. In some embodiments, the first syntax element is a syntax element of the atlas sequence parameter set (ASPS) and a syntax element of the atlas frame parameter set (AFPS), and the ASPS and the AFPS are parsed to determine the splicing graph type.
Exemplarily, when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph is split to obtain at least two types of isomorphic blocks, and the isomorphic block information corresponding to the at least two types of isomorphic blocks is obtained from the splicing graph information according to their expression formats. Exemplarily, the heterogeneous mixed splicing graph is split according to the heterogeneous mixed splicing graph information, and the reconstructed multi-view video isomorphic blocks and isomorphic block information, as well as the reconstructed point cloud isomorphic blocks and isomorphic block information, are output.
In some embodiments, the first syntax element includes a first sub-syntax element and a second sub-syntax element, and determining according to the first syntax element whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph includes: if the value of the first sub-syntax element is equal to the value of the second sub-syntax element, determining according to that value whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph.
That is to say, the bitstream must guarantee the absolute consistency of the first sub-syntax element and the second sub-syntax element, and the splicing graph type can be determined only when the two sub-syntax elements are consistent. Exemplarily, the consistency of the two sub-syntax elements can be compared first and the splicing graph type then determined according to the value of one of them; or the splicing graph type can first be determined according to each sub-syntax element, with absolute consistency guaranteed by comparing whether the resulting splicing graph types are the same.
Exemplarily, determining according to the value whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph includes: if the value is a first preset value, determining that the splicing graph is a heterogeneous mixed splicing graph; if the value is a second preset value, determining that the splicing graph is an isomorphic splicing graph. That is to say, two values or two classes of values can be set to identify heterogeneous mixed splicing graphs and isomorphic splicing graphs. Exemplarily, the first preset value is 1 and the second preset value is 0.
Exemplarily, in some embodiments, when the splicing graph information does not include the first syntax element, the splicing graph is determined to be an isomorphic splicing graph. In some embodiments, when the splicing graph information does not include the first syntax element, the value of the first syntax element is inferred to be the second preset value. Exemplarily, when the splicing graph information does not include the first sub-syntax element, the splicing graph is determined to be an isomorphic splicing graph and the value of the first sub-syntax element is inferred to be the second preset value; when the splicing graph information does not include the second sub-syntax element, the splicing graph is determined to be an isomorphic splicing graph and the value of the second sub-syntax element is inferred to be the second preset value.
Exemplarily, determining according to the value whether the splicing graph is a heterogeneous mixed splicing graph or an isomorphic splicing graph includes: if the value is a third preset value, determining that the splicing graph is a heterogeneous mixed splicing graph including isomorphic blocks in a first expression format and a second expression format, where the first expression format and the second expression format are different expression formats; if the value is a fourth preset value, determining that the splicing graph is an isomorphic splicing graph including isomorphic blocks in the first expression format; if the value is a fifth preset value, determining that the splicing graph is an isomorphic splicing graph including isomorphic blocks in the second expression format. That is to say, multiple values can also be set to identify the heterogeneous mixed splicing graph and the expression format of the isomorphic splicing graph, and even to identify which expression formats of isomorphic blocks the heterogeneous mixed splicing graph contains. Exemplarily, the third preset value is 2, the fourth preset value is 1 and the fifth preset value is 0.
Exemplarily, the first sub-syntax element is a syntax element of the ASPS and the second sub-syntax element is a syntax element of the AFPS. In the embodiments of the present application, asps_heterogeneous_miv_extension_present_flag denotes the first sub-syntax element, and afps_heterogeneous_miv_extension_present_flag denotes the second sub-syntax element.
Exemplarily, in some embodiments, when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph information further includes a second syntax element, and the expression format of the isomorphic blocks in the splicing graph is determined according to the second syntax element. It can be understood that, after the splicing graph is determined to be a heterogeneous mixed splicing graph according to the first syntax element, the second syntax element of the isomorphic blocks is further parsed to determine the isomorphic block type. Exemplarily, the second syntax element is a syntax element of the AFPS.
Specifically, the second syntax element can be set to different values to indicate the expression format type corresponding to the i-th block in the splicing graph. Exemplarily, determining the expression format of the isomorphic blocks in the splicing graph according to the second syntax element includes: when the value of the second syntax element of the i-th block is a sixth preset value, determining that the expression format of the i-th block is the first expression format; when the value of the second syntax element of the i-th block is a seventh preset value, determining that the expression format of the i-th block is the second expression format. Take the first expression format being point cloud and the second expression format being multi-view video as an example. Optionally, the sixth preset value is 0 and the seventh preset value is 1.
Further, when the expression format of the i-th block is the first expression format, the i-th block is decoded with the decoding method corresponding to the first expression format; when the expression format of the i-th block is the second expression format, the i-th block is decoded with the decoding method corresponding to the second expression format.
Exemplarily, in some embodiments, when the splicing graph is a heterogeneous mixed splicing graph, the splicing graph information includes at least two types of isomorphic block information, where isomorphic blocks in different expression formats correspond to different isomorphic block information. The isomorphic block information includes reconstruction information of the isomorphic blocks and other supplementary information, and is used for decoding and reconstructing the isomorphic blocks.
Exemplarily, the isomorphic block information includes ASPS syntax elements and AFPS syntax elements, and different isomorphic block information corresponds to different ASPS syntax elements and AFPS syntax elements. In some embodiments, for a heterogeneous mixed splicing graph, the ASPS and AFPS of isomorphic blocks in different expression formats are at least partially different, i.e., the ASPS and AFPS of isomorphic blocks in different expression formats are not identical. When encoding a heterogeneous mixed splicing graph, this accommodates the case where the high-level information (ASPS and AFPS) of blocks in different expression formats in the heterogeneous mixed splicing graph is not correspondingly equal, and thus achieves high-level parameters better suited to the heterogeneous mixed splicing graph, which can effectively improve coding efficiency, i.e., reduce the bit rate or improve the quality of the reconstructed multi-view video or point cloud video.
Exemplarily, when an isomorphic block is a multi-view video block it corresponds to first isomorphic block information, and when an isomorphic block is a point cloud block it corresponds to second isomorphic block information; the first isomorphic block information and the second isomorphic block information include shared ASPS syntax elements and AFPS syntax elements; the first isomorphic block information further includes extension syntax elements of the ASPS and extension syntax elements of the AFPS.
In some embodiments, the expression format is multi-view video, point cloud or mesh. One type of isomorphic block corresponds to one expression format, and different types of isomorphic blocks correspond to different expression formats. Exemplarily, the expression formats corresponding to the at least two types of isomorphic blocks include at least two of the following: multi-view video, point cloud, mesh. It should be noted that, in the embodiments of the present application, each type of isomorphic block may include at least one isomorphic block with the same expression format. Exemplarily, the isomorphic blocks in the point cloud format include one or more point cloud blocks, the isomorphic blocks in the multi-view video format include one or more multi-view video blocks, and the isomorphic blocks in the mesh format include one or more mesh blocks.
Step 903: when the splicing graph is an isomorphic splicing graph, obtain one type of isomorphic block and isomorphic block information according to the splicing graph and the splicing graph information.
Exemplarily, when the splicing graph is an isomorphic splicing graph, the splicing graph is split to obtain one type of isomorphic block, and the isomorphic block information is obtained from the splicing graph information. Exemplarily, the isomorphic splicing graph of the multi-view video is split according to its isomorphic splicing graph information, and the reconstructed multi-view video isomorphic blocks and isomorphic block information are output; the isomorphic splicing graph of the point cloud is split according to its isomorphic splicing graph information, and the reconstructed point cloud isomorphic blocks and isomorphic block information are output.
Exemplarily, when the splicing graph is an isomorphic splicing graph, the splicing graph information includes one type of isomorphic block information, used for decoding and reconstructing the isomorphic blocks in the splicing graph.
In some embodiments, the heterogeneous mixed splicing graph of the embodiments of the present application includes at least one of the following: a single-attribute heterogeneous mixed splicing graph and a multi-attribute heterogeneous mixed splicing graph.
Step 904: obtain visual media content in at least two expression formats according to the isomorphic blocks and the isomorphic block information.
Exemplarily, obtaining visual media content in at least two expression formats according to the isomorphic blocks and the isomorphic block information includes: if the expression format of the i-th block is the first expression format, determining that the sub-blocks in the i-th block are decoded and reconstructed with the decoding method corresponding to the first expression format, obtaining the visual media content in the first expression format; if the expression format of the i-th block is the second expression format, determining that the sub-blocks in the i-th block are decoded and reconstructed with the decoding method corresponding to the second expression format, obtaining the visual media content in the second expression format.
With the above technical solution, since isomorphic blocks of different expression formats are spliced into one heterogeneous mixed splicing graph for coding, the number of two-dimensional video codecs such as HEVC, VVC, AVC and AVS that need to be invoked can be reduced, the implementation cost lowered and ease of use improved. Moreover, when decoding the heterogeneous mixed splicing graph, certain high-level parameters of blocks in different expression formats may be unequal, which can retain more effective information of the blocks in different expression formats, improve the synthesis quality of the pictures, and improve the overall rate-quality efficiency.
The decoding method provided by the embodiments of the present application is further illustrated below.
FIG. 10 is a schematic diagram of the V3C bitstream structure provided by an embodiment of the present application. The V3C parameter set (v3c_parameter_set()) of the V3C_VPS may include the third syntax element (ptl_profile_toolset_idc); a ptl_profile_toolset_idc of 128 to 133 indicates that the current bitstream contains both a point cloud bitstream (e.g., VPCC basic or VPCC extended) and a multi-view video bitstream (e.g., MIV main, MIV Extended or MIV Geometry Absent).
The ASPS may include the first sub-syntax element (asps_heterogeneous_miv_extension_present_flag); when ptl_profile_toolset_idc is 128 to 133, the current splicing graph type is judged according to asps_heterogeneous_miv_extension_present_flag.
The AFPS may include the second sub-syntax element (afps_heterogeneous_miv_extension_present_flag); when ptl_profile_toolset_idc is 128 to 133, the current splicing graph type is judged according to afps_heterogeneous_miv_extension_present_flag. The AFPS further includes the second syntax element (afps_heterogeneous_frame_tile_toolset_miv_present_flag), used to judge the tile type, so as to establish during parsing and decoding whether the current tile belongs to multi-view or to point cloud.
The bitstream must guarantee the absolute consistency of afps_heterogeneous_miv_extension_present_flag and asps_heterogeneous_miv_extension_present_flag.
Decoding case 1
1. When decoding, the VPS is parsed from the V3C bitstream, and ptl_profile_toolset_idc (the third syntax element) is parsed from the VPS. If ptl_profile_toolset_idc = 0/1, only a point cloud bitstream exists in the current bitstream;
2. The current bitstream is decoded according to the point cloud decoding standard.
Decoding case 2
1. When decoding, the VPS is parsed from the V3C bitstream, and ptl_profile_toolset_idc is parsed from the VPS. If ptl_profile_toolset_idc = 64/65/66, only a multi-view bitstream exists in the current bitstream;
2. The current bitstream is decoded according to the multi-view decoding standard.
Decoding case 3
1. When parsing, the VPS is parsed from the V3C bitstream, and ptl_profile_toolset_idc is parsed from the VPS. If ptl_profile_toolset_idc = 128 to 133, the current bitstream contains both point cloud and multi-view bitstreams;
2. The high-level syntax such as the ASPS and AFPS is parsed for each splicing graph:
Referring to Table 2, before step a), the method further includes: first parsing asps_vpcc_extension_present_flag, asps_miv_extension_present_flag and asps_extension_6bits in the ASPS, and then obtaining HeterogeneousPresentFlag by judgment. The following decoding operations continue only if HeterogeneousPresentFlag is true.
a) Parse asps_heterogeneous_miv_extension_present_flag (the first sub-syntax element) in the ASPS:
i. If asps_heterogeneous_miv_extension_present_flag is absent, i.e., asps_extension_6bits = 0, the current splicing graph is homogeneous content (all tiles are of the point cloud type or of the multi-view type); ii. if asps_heterogeneous_miv_extension_present_flag is present and equal to 0, the current splicing graph is homogeneous content (all tiles are of the point cloud type or of the multi-view type); iii. if asps_heterogeneous_miv_extension_present_flag is present and equal to 1, the current splicing graph is heterogeneous content, and point cloud tiles and multi-view tiles exist simultaneously. The auxiliary high-level ASPS information of the current splicing graph is therefore split into two sub-sets of information, i.e., one sub-set used for decoding the multi-view tiles and the other sub-set used for decoding the point cloud tiles. The auxiliary information needed by the point cloud tiles can be obtained by parsing according to clause 8 of standard 23090-5; the auxiliary information needed by the multi-view tiles can be obtained by parsing according to clause 8 of standard 23090-5, the newly added asps_heterogeneous_miv_extension, and clause 8 of standard 23090-12.
Referring to Table 4, before step b), the method further includes: first parsing afps_miv_extension_present_flag and afps_extension_7bits in the AFPS. The following decoding operations can proceed only if HeterogeneousPresentFlag is true.
b) Parse afps_heterogeneous_miv_extension_present_flag (the second sub-syntax element) in the AFPS:
i. If afps_heterogeneous_miv_extension_present_flag is absent, i.e., afps_extension_7bits = 0, the current splicing graph is homogeneous content (all tiles are of the point cloud type or of the multi-view type); ii. if afps_heterogeneous_miv_extension_present_flag is present and equal to 0, the current splicing graph is homogeneous content (all tiles are of the point cloud type or of the multi-view type); iii. if afps_heterogeneous_miv_extension_present_flag is present and equal to 1, the current splicing graph is heterogeneous content, and point cloud tiles and multi-view tiles exist simultaneously. The auxiliary high-level AFPS information of the current splicing graph is therefore split into two sub-sets of information, i.e., one sub-set used for decoding the multi-view tiles and the other sub-set used for decoding the point cloud tiles. The auxiliary information needed by the point cloud tiles can be obtained by parsing according to clause 8 of standard 23090-5; the auxiliary information needed by the multi-view tiles can be obtained by parsing according to clause 8 of standard 23090-5, the newly added afps_heterogeneous_miv_extension, and clause 8 of standard 23090-12.
c) Parse afps_heterogeneous_frame_tile_toolset_miv_present_flag (the third sub-syntax element of the second syntax element) in the AFPS:
i. If afps_heterogeneous_frame_tile_toolset_miv_present_flag is absent, all tiles of the current splicing graph are of the same type; ii. traverse all tiles: if afps_heterogeneous_frame_tile_toolset_miv_present_flag[i] of the i-th tile is 0, the current tile is a point cloud tile, where afps_heterogeneous_frame_tile_toolset_miv_present_flag[i] is the fourth sub-syntax element of the second syntax element; if afps_heterogeneous_frame_tile_toolset_miv_present_flag[i] of the i-th tile is 1, the current tile is a multi-view tile. That is to say, when afps_heterogeneous_frame_tile_toolset_miv_present_flag is present (e.g., equal to 1), all tiles are traversed according to afps_heterogeneous_frame_tile_toolset_miv_present_flag[i] and the tile type of the i-th tile is judged. In some embodiments, the second syntax element may also include only afps_heterogeneous_frame_tile_toolset_miv_present_flag[i].
The bitstream must guarantee the absolute consistency of afps_heterogeneous_miv_extension_present_flag and asps_heterogeneous_miv_extension_present_flag.
3. The sub-block information patch_data_unit in each tile is then parsed; given that the current tile uses the multi-view auxiliary information, it is determined that the current sub-block is decoded according to the multi-view video decoding standard; given that the current tile uses the point cloud auxiliary information, it is determined that the current sub-block is decoded according to the point cloud video decoding standard.
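Pulling steps a) to c) together, a hedged sketch of decoding case 3; the syntax element names are those above, while the dict-based representation and the tile_miv_flags key are hypothetical storage choices:

```python
def parse_atlas_high_level(asps: dict, afps: dict, num_tiles: int):
    # Step a): ASPS-level flag; absent (asps_extension_6bits == 0) is treated as 0.
    asps_het = asps.get("asps_heterogeneous_miv_extension_present_flag", 0)
    # Step b): AFPS-level flag; must match the ASPS flag in a conforming bitstream.
    afps_het = afps.get("afps_heterogeneous_miv_extension_present_flag", 0)
    if asps_het != afps_het:
        raise ValueError("non-conforming: ASPS/AFPS heterogeneous flags differ")
    if not asps_het:
        return "homogeneous", None  # all tiles point cloud, or all multi-view
    # Step c): per-tile type from the fourth sub-syntax element (0: point cloud, 1: MIV).
    flags = afps.get("tile_miv_flags", [0] * num_tiles)
    return "heterogeneous", ["multi_view" if f else "point_cloud" for f in flags]
```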
Whether a heterogeneous mixed splicing graph exists is indicated by the numbering of ptl_profile_toolset_idc, and asps_heterogeneous_miv_extension_present_flag and afps_heterogeneous_miv_extension_present_flag are newly added to judge whether each splicing graph belongs to point cloud / multi-view / point cloud + multi-view. At the same time, in order to be compatible with the previous standard, the following newly added syntax and semantics, as well as constraints on the old semantics, are implemented. Tables 9-1-1 and 9-1-2 respectively show, for the integrated bitstream, the restrictions on the syntax related to the toolset profile components for multi-view and the restrictions on the syntax related to the toolset profile components for heterogeneous data.
In the high-level part (splicing graph level and splicing graph sequence level), the embodiments of the present application use the newly added syntax element afps_heterogeneous_frame_tile_toolset_miv_present_flag to describe the type of each tile in the current splicing graph, so as to establish during parsing and decoding whether the current tile belongs to multi-view or to point cloud.
This solution can guarantee that, whether for multi-view parsing or for point cloud parsing, there is one and only one usable splicing-graph-level parameter set (AFPS) and splicing-graph-sequence-level parameter set (ASPS), and that the multi-view AFPS and ASPS need not be completely equal to the point cloud AFPS and ASPS.
The parsing of the specific syntax elements is set out in Tables 1 to 9-1-2 below.
Table 1 shows an example of the available toolset profile components. Table 1 provides the list of toolset profile components defined for V3C and their corresponding identifying syntax element values, such as ptl_profile_toolset_idc and ptc_one_v3c_frame_only_flag; this definition may be for use by this document only. The syntax element ptl_profile_toolset_idc provides the main definition of the toolset profile, and additional syntax elements such as ptc_one_v3c_frame_only_flag can specify additional features or restrictions of a defined profile. ptc_one_v3c_frame_only_flag may be used only to support a single V3C frame. It should be noted that the values 2..63, 67..127 and 134..255 of ptl_profile_toolset_idc are reserved and not yet defined; the standards organization may specify them in future versions of the standard. The profile types defined in Table 1 may include dynamic (Dynamic) or static (Static).
Table 1: available toolset profile components
[Table 1 is provided as images in the original publication: Figure PCTCN2023071083-appb-000001 and Figure PCTCN2023071083-appb-000002.]
Table 2 shows the general atlas sequence parameter set RBSP syntax, which may be used by ISO/IEC 23090-5. The extension syntax element asps_heterogeneous_miv_extension_present_flag in the atlas sequence parameter set indicates the splicing graph type; specifically, the value of this syntax element determines whether the splicing graph belongs to point cloud / multi-view / point cloud + multi-view.
Table 2: general atlas sequence parameter set RBSP syntax
[Table 2 is provided as images in the original publication: Figure PCTCN2023071083-appb-000003 and Figure PCTCN2023071083-appb-000004.]
Table 3 shows the ASPS heterogeneous multi-view extension syntax (atlas sequence parameter set heterogeneous MIV extension syntax), which may be used by ISO/IEC 23090-5. ashm_geometry_3d_bit_depth_minus1 is used to represent the bit depth of the geometry coordinates of the reconstructed geometry content. ashm_geometry_2d_bit_depth_minus1 is used to represent the bit depth of the geometry when projected onto the 2D picture. ashm_log2_max_atlas_frame_order_cnt_lsb_minus4 is used to determine the value of the variable used for atlas frame order counting in the decoding process.
Table 3: ASPS heterogeneous multi-view extension syntax elements
[Table 3 is provided as an image in the original publication: Figure PCTCN2023071083-appb-000005.]
Table 4 shows the general atlas frame parameter set RBSP syntax, which may be used by ISO/IEC 23090-5. The extension syntax element afps_heterogeneous_miv_extension_present_flag in the atlas frame parameter set indicates the splicing graph type; specifically, the value of this syntax element determines whether the splicing graph belongs to point cloud / multi-view / point cloud + multi-view.
Table 4: atlas frame parameter set RBSP syntax
[Table 4 is provided as an image in the original publication: Figure PCTCN2023071083-appb-000006.]
Table 5 shows the AFPS heterogeneous MIV extension syntax (atlas frame parameter set heterogeneous MIV extension syntax), which may be used by ISO/IEC 23090-5. afhm_additional_lt_afoc_lsb_len is used to determine the value of the variable MaxLtAtlasFrmOrderCntLsbForMiv used by the reference atlas frame list in the decoding process.
Table 5: AFPS heterogeneous MIV extension syntax elements
[Table 5 is provided as an image in the original publication: Figure PCTCN2023071083-appb-000007.]
The semantics of the ASPS syntax elements and of the AFPS syntax elements are explained below.
1. Semantics of the ASPS syntax elements:
asps_extension_6bits equal to 0 indicates that no asps_extension_data_flag is present in the ASPS RBSP syntax structure. When present, the value of asps_extension_6bits shall be 0 or 1 in this version of the text; values other than 0 and 1 are reserved for future use by ISO/IEC. Decoders shall allow values of asps_extension_6bits other than 0 or 1 and shall ignore all asps_extension_data_flag syntax elements in the ASPS. When not present, the value of asps_extension_6bits is inferred to be equal to 0.
asps_heterogeneous_miv_extension_present_flag equal to 1 indicates that the asps_heterogeneous_miv_extension() syntax structure is present in the syntax structure. asps_heterogeneous_miv_extension_present_flag equal to 0 indicates that this syntax structure is not present. When asps_heterogeneous_miv_extension_present_flag is not present, its value is inferred to be 0.
Semantics of the ASPS extension syntax elements: ashm_geometry_3d_bit_depth_minus1 plus 1 indicates the bit depth of the geometry coordinates of the reconstructed volumetric content. ashm_geometry_3d_bit_depth_minus1 shall be in the range of 0 to 31, inclusive.
ashm_geometry_2d_bit_depth_minus1 plus 1 indicates the bit depth of the geometry when projected onto the 2D picture. ashm_geometry_2d_bit_depth_minus1 shall be in the range of 0 to 31, inclusive.
ashm_log2_max_atlas_frame_order_cnt_lsb_minus4 plus 4 specifies the values of the variables Log2MaxAtlasFrmOrderCntLsbForMiv and MaxAtlasFrmOrderCntLsbForMiv used for atlas frame order counting in the decoding process, as follows:
Log2MaxAtlasFrmOrderCntLsbForMiv = ashm_log2_max_atlas_frame_order_cnt_lsb_minus4 + 4
MaxAtlasFrmOrderCntLsbForMiv = 2^Log2MaxAtlasFrmOrderCntLsbForMiv
The value of ashm_log2_max_atlas_frame_order_cnt_lsb_minus4 shall be in the range of 0 to 12, inclusive.
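As a worked instance of the two formulas above (the input value is chosen for illustration only): ashm_log2_max_atlas_frame_order_cnt_lsb_minus4 = 4 gives Log2MaxAtlasFrmOrderCntLsbForMiv = 4 + 4 = 8 and MaxAtlasFrmOrderCntLsbForMiv = 2^8 = 256, so the atlas frame order count LSB for MIV tiles wraps modulo 256.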
2. Semantics of the AFPS syntax elements:
afps_extension_7bits equal to 0 specifies that no afps_extension_data_flag syntax element is present in the AFPS RBSP syntax structure. When present, afps_extension_7bits shall be equal to 0 or 1 in bitstreams conforming to this version of this document; values of afps_extension_7bits other than 0 and 1 are reserved for future use by ISO/IEC. Decoders shall allow values of afps_extension_7bits other than 0 or 1 and shall ignore the afps_extension_data_flag syntax elements in the AFPS. When afps_extension_7bits is not present, its value is inferred to be equal to 0.
afps_heterogeneous_miv_extension_present_flag equal to 1 indicates that the afps_heterogeneous_miv_extension() syntax structure is present in the AFPS syntax structure. afps_heterogeneous_miv_extension_present_flag equal to 0 specifies that this syntax structure is not present. When afps_heterogeneous_miv_extension_present_flag is not present, its value is inferred to be equal to 0.
For bitstreams conforming to this version of this document, afps_heterogeneous_miv_extension_present_flag and asps_heterogeneous_miv_extension_present_flag shall be consistent, i.e., they shall be present at the same time and have the same value.
afps_heterogeneous_frame_tile_toolset_miv_present_flag[i] equal to 1 indicates that the i-th tile in the heterogeneous mixed splicing graph is an atlas tile belonging to MIV (i.e., a multi-view video tile). afps_heterogeneous_frame_tile_toolset_miv_present_flag[i] equal to 0 specifies that the i-th tile in the heterogeneous mixed splicing graph is an atlas tile belonging to VPCC (i.e., a point cloud tile). When afps_heterogeneous_frame_tile_toolset_miv_present_flag[i] is not present, its value is inferred to be equal to 0.
Semantics of the AFPS extension syntax elements:
afhm_additional_lt_afoc_lsb_len specifies the value of the variable MaxLtAtlasFrmOrderCntLsbForMiv used in the decoding process for reference atlas frame lists, as follows:
MaxLtAtlasFrmOrderCntLsbForMiv =
2^(Log2MaxAtlasFrmOrderCntLsbForMiv + afhm_additional_lt_afoc_lsb_len)
The value of afhm_additional_lt_afoc_lsb_len shall be in the range of 0 to 32 − Log2MaxAtlasFrmOrderCntLsbForMiv, inclusive.
When asps_long_term_ref_atlas_frames_flag is equal to 0, the value of afhm_additional_lt_afoc_lsb_len is inferred to be 0.
Semantics of the atlas tile header:
ath_atlas_frm_order_cnt_lsb specifies, for the current atlas tile, the atlas frame order count modulo MaxAtlasFrmOrderCntLsb. If afps_heterogeneous_frame_tile_toolset_miv_present_flag of the current atlas tile is equal to 0, the length of the ath_atlas_frm_order_cnt_lsb syntax element is Log2MaxAtlasFrmOrderCntLsb bits, and the value of ath_atlas_frm_order_cnt_lsb shall be in the range of 0 to MaxAtlasFrmOrderCntLsb − 1, inclusive. If afps_heterogeneous_frame_tile_toolset_miv_present_flag of the current atlas tile is equal to 1, the length of the ath_atlas_frm_order_cnt_lsb syntax element is Log2MaxAtlasFrmOrderCntLsbForMiv bits, and the value of ath_atlas_frm_order_cnt_lsb shall be in the range of 0 to MaxAtlasFrmOrderCntLsbForMiv − 1, inclusive.
ath_additional_afoc_lsb_val[j] specifies, for the current atlas tile, the value of FullAtlasFrmOrderCntLsbLt[RlsIdx][j]. If afps_heterogeneous_frame_tile_toolset_miv_present_flag of the current atlas tile is equal to 0, then
FullAtlasFrmOrderCntLsbLt[RlsIdx][j] =
ath_additional_afoc_lsb_val[j] * MaxAtlasFrmOrderCntLsb + afoc_lsb_lt[RlsIdx][j]
If afps_heterogeneous_frame_tile_toolset_miv_present_flag of the current atlas tile is equal to 1, then
FullAtlasFrmOrderCntLsbLt[RlsIdx][j] =
ath_additional_afoc_lsb_val[j] * MaxAtlasFrmOrderCntLsbForMiv + afoc_lsb_lt[RlsIdx][j]
ath_additional_afoc_lsb_val[j] is represented by afps_additional_lt_afoc_lsb_len bits. When afps_additional_lt_afoc_lsb_len is not present, the value of ath_additional_afoc_lsb_val[j] is inferred to be equal to 0.
ath_raw_3d_offset_axis_bit_count_minus1 plus 1 specifies the fixed bit width of the values of the three syntax elements rpdu_3d_offset_u[tileID][p], rpdu_3d_offset_v[tileID][p] and rpdu_3d_offset_d[tileID][p], where p denotes the sub-block with patch index p and tileID denotes that the sub-block is located in the tile whose tile ID is equal to tileID.
When present, and if afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 0, the length of the syntax element used to represent ath_raw_3d_offset_axis_bit_count_minus1 is equal to Floor(Log2(asps_geometry_3d_bit_depth_minus1 + 1)).
When not present, the value of the ath_raw_3d_offset_axis_bit_count_minus1 syntax element is inferred to be
Max(0, asps_geometry_3d_bit_depth_minus1 − asps_geometry_2d_bit_depth_minus1) − 1.
The variable RawShift is defined as follows:
if afps_raw_3d_offset_bit_count_explicit_mode_flag = 1,
RawShift = asps_geometry_3d_bit_depth_minus1 − ath_raw_3d_offset_axis_bit_count_minus1
otherwise
RawShift = asps_geometry_2d_bit_depth_minus1 + 1
When present, and if afps_heterogeneous_frame_tile_toolset_miv_present_flag is equal to 1, the length of the syntax element used to represent ath_raw_3d_offset_axis_bit_count_minus1 is equal to Floor(Log2(ashm_geometry_3d_bit_depth_minus1 + 1)).
When not present, the value of the ath_raw_3d_offset_axis_bit_count_minus1 syntax element is inferred to be
Max(0, ashm_geometry_3d_bit_depth_minus1 − ashm_geometry_2d_bit_depth_minus1) − 1.
The variable RawShift is defined as follows:
if afps_raw_3d_offset_bit_count_explicit_mode_flag = 1,
RawShift = ashm_geometry_3d_bit_depth_minus1 − ath_raw_3d_offset_axis_bit_count_minus1
otherwise
RawShift = ashm_geometry_2d_bit_depth_minus1 + 1
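A small sketch of the RawShift derivation above, selecting between the shared (asps_) and the MIV-extension (ashm_) parameters by tile type; the dict-based representation is a hypothetical convenience, not part of the specification:

```python
def raw_shift(params: dict, miv_tile: bool, explicit_mode: bool,
              ath_raw_3d_offset_axis_bit_count_minus1: int) -> int:
    # miv_tile selects the ashm_* (MIV tiles) vs asps_* (point cloud tiles) parameters.
    prefix = "ashm_" if miv_tile else "asps_"
    g3d = params[prefix + "geometry_3d_bit_depth_minus1"]
    g2d = params[prefix + "geometry_2d_bit_depth_minus1"]
    if explicit_mode:  # afps_raw_3d_offset_bit_count_explicit_mode_flag == 1
        return g3d - ath_raw_3d_offset_axis_bit_count_minus1
    return g2d + 1
```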
子图块数据单元(Patch data unit)语义
pdu_3d_offset_u[tileID][p]表示沿切线轴重建子图块的偏移,当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则pdu_3d_offset_u[tileID][p]的值应在0到2 asps_geometry_3d_bit_depth_minus1+1-1的范围内(含边界)。用于表示pdu_3d_offset_u[tileID][p]的位数的值为asps_geometry_3d_bit_depth_minus1+1。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则pdu_3d_offset_u[tileID][p]的值应在0到2 ashm_geometry_3d_bit_depth_minus1+1-1的范围内(含)。用于表示pdu_3d_offset_u[tileID][p]的位数的值为ashm_geometry_3d_bit_depth_minus1+1。
pdu_3d_offset_v[tileID][p]表示沿双切线轴重建子图块的偏移,当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则pdu_3d_offset_v[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1+1)-1的范围内(含边界)。用于表示pdu_3d_offset_v[tileID][p]的位数为asps_geometry_3d_bit_depth_minus1+1。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则pdu_3d_offset_v[tileID][p]的值应在0到2^(ashm_geometry_3d_bit_depth_minus1+1)-1的范围内(含)。用于表示pdu_3d_offset_v[tileID][p]的位数为ashm_geometry_3d_bit_depth_minus1+1。
pdu_3d_offset_d[tileID][p]表示沿法线轴重建子图块的偏移,当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。Pdu3dOffsetD[tileID][p]定义如下:
Pdu3dOffsetD[tileID][p]=pdu_3d_offset_d[tileID][p]<<ath_pos_min_d_quantizer
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则Pdu3dOffsetD[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1+1)-1的范围内(含)。用于表示pdu_3d_offset_d[tileID][p]的位数为(asps_geometry_3d_bit_depth_minus1–ath_pos_min_d_quantizer+1)。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则Pdu3dOffsetD[tileID][p]的值应在0到2^(ashm_geometry_3d_bit_depth_minus1+1)-1的范围内(含)。用于表示pdu_3d_offset_d[tileID][p]的位数为(ashm_geometry_3d_bit_depth_minus1–ath_pos_min_d_quantizer+1)。
pdu_3d_range_d[tileID][p](如果存在)表示转换为标称表示后,沿法线轴重建的位深度子图块几何体样本中预期存在的偏移标称最大值。当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。Pdu3dRangeD[tileID][p]定义如下:
(Pdu3dRangeD[tileID][p]的推导公式原文以图片给出,此处从略)
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,变量rangeDBitDepth取值如下:
rangeDBitDepth=Min(asps_geometry_2d_bit_depth_minus1,asps_geometry_3d_bit_depth_minus1)+1
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,变量rangeDBitDepth取值如下:
rangeDBitDepth=Min(ashm_geometry_2d_bit_depth_minus1,ashm_geometry_3d_bit_depth_minus1)+1
如果pdu_3d_range_d[tileID][p]不存在,Pdu3dRangeD[tileID][p]的值被推断为2^rangeDBitDepth–1。如果存在,Pdu3dRangeD[tileID][p]的值应在0到2^rangeDBitDepth–1的范围内(含)。
表示pdu_3d_range_d[tileID][p]的位数等于(rangeDBitDepth–ath_pos_delta_max_d_quantizer)。
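下面的Python草图汇总了上述各子图块偏移语法元素的定长位宽计算:点云条带(标志等于0)取asps_*位深,多视点条带(标志等于1)取ashm_*位深。函数封装为编辑性假设,仅作理解示意。

```python
def pdu_bit_counts(miv_tile: bool,
                   asps_3d_m1: int, ashm_3d_m1: int,
                   asps_2d_m1: int, ashm_2d_m1: int,
                   pos_min_d_quantizer: int,
                   pos_delta_max_d_quantizer: int) -> dict:
    g3d = ashm_3d_m1 if miv_tile else asps_3d_m1   # 按条带类型选择几何位深
    g2d = ashm_2d_m1 if miv_tile else asps_2d_m1
    range_d_bit_depth = min(g2d, g3d) + 1          # 即 rangeDBitDepth
    return {
        "pdu_3d_offset_u": g3d + 1,                # 取值 0 .. 2^(g3d+1)-1
        "pdu_3d_offset_v": g3d + 1,
        "pdu_3d_offset_d": g3d - pos_min_d_quantizer + 1,
        "pdu_3d_range_d": range_d_bit_depth - pos_delta_max_d_quantizer,
    }
```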
合并子图块数据单元(Merge Patch data unit)语义
mpdu_3d_offset_u[tileID][p]表示要沿切线轴应用于重建两个子图块之间的偏移差异,其中两个子图块分别是当前拼接图中条带索引为tileID、子图块索引为p的子图块和当前拼接图条带索引为tileID、子图块索引为RefPatchIdx的子图块。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则mpdu_3d_offset_u[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1+1)-1)的范围内(含)。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则mpdu_3d_offset_u[tileID][p]的值应在(-2^(ashm_geometry_3d_bit_depth_minus1+1)+1)到(2^(ashm_geometry_3d_bit_depth_minus1+1)–1)的范围内(含)。
如果mpdu_3d_offset_u[tileID][p]不存在,该值被推断为0。
mpdu_3d_offset_v[tileID][p]表示要沿双切线轴应用于重建两个子图块之间的偏移差异,其中两个子图块分别是当前拼接图中条带索引为tileID、子图块索引为p的子图块和当前拼接图条带索引为tileID、子图块索引为RefPatchIdx的子图块。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则mpdu_3d_offset_v[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1+1)-1)的范围内(含)。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则mpdu_3d_offset_v[tileID][p]的值应在(-2^(ashm_geometry_3d_bit_depth_minus1+1)+1)到(2^(ashm_geometry_3d_bit_depth_minus1+1)–1)的范围内(含)。
如果mpdu_3d_offset_v[tileID][p]不存在,该值被推断为0。
mpdu_3d_offset_d[tileID][p]表示要沿法线轴应用于重建两个子图块之间的偏移差异,其中两个子图块分别是当前拼接图中条带索引为tileID、子图块索引为p的子图块和当前拼接图条带索引为tileID、子图块索引为RefPatchIdx的子图块。Mpdu3dOffsetD[tileID][p]定义如下:
Mpdu3dOffsetD[tileID][p]=mpdu_3d_offset_d[tileID][p]<<ath_pos_min_d_quantizer
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则mpdu_3d_offset_d[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1+1)-1)的范围内(含)。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则mpdu_3d_offset_d[tileID][p]的值应在(-2^(ashm_geometry_3d_bit_depth_minus1+1)+1)到(2^(ashm_geometry_3d_bit_depth_minus1+1)–1)的范围内(含)。
如果mpdu_3d_offset_d[tileID][p]不存在,该值被推断为0。
帧间子图块数据单元(Inter patch data unit)语义
ipdu_3d_offset_v[tileID][p]表示要沿双切线轴应用于重建两个子图块之间的偏移差异,其中两个子图块分别是当前拼接图中条带索引为tileID、子图块索引为p的子图块和当前拼接图条带索引为tileID、子图块索引为RefPatchIdx的子图块。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则ipdu_3d_offset_v[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1+1)-1)的范围内(含)。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则ipdu_3d_offset_v[tileID][p]的值应在(-2^(ashm_geometry_3d_bit_depth_minus1+1)+1)到(2^(ashm_geometry_3d_bit_depth_minus1+1)–1)的范围内(含)。
如果ipdu_3d_offset_v[tileID][p]不存在,该值被推断为0。
ipdu_3d_offset_d[tileID][p]表示要沿法线轴应用于重建两个子图块之间的偏移差异,其中两个子图块分别是当前拼接图中条带索引为tileID、子图块索引为p的子图块和当前拼接图条带索引为tileID、子图块索引为RefPatchIdx的子图块。Ipdu3dOffsetD[tileID][p]定义如下:
Ipdu3dOffsetD[tileID][p]=ipdu_3d_offset_d[tileID][p]<<ath_pos_min_d_quantizer
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则ipdu_3d_offset_d[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1+1)-1)的范围内(含)。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则ipdu_3d_offset_d[tileID][p]的值应在(-2^(ashm_geometry_3d_bit_depth_minus1+1)+1)到(2^(ashm_geometry_3d_bit_depth_minus1+1)–1)的范围内(含)。
如果ipdu_3d_offset_d[tileID][p]不存在,该值被推断为0。
ISO/IEC 23090-5:2022/Amd 1子条款8.4中的规范适用,并附加以下内容。
码流一致性要求asps_geometry_3d_bit_depth_minus1和asps_geometry_2d_bit_depth_minus1分别等于gi_geometry_3d_coordinates_bit_depth_minus1和gi_geometry_2d_bit_depth_minus1。但在特殊情况下,如果asps_heterogeneous_miv_extension_present_flag等于1,则gi_geometry_3d_coordinates_bit_depth_minus1和gi_geometry_2d_bit_depth_minus1特指的是ISO/IEC 23090-5中的内容,ashm_geometry_3d_bit_depth_minus1和ashm_geometry_2d_bit_depth_minus1不必等于gi_geometry_3d_coordinates_bit_depth_minus1和gi_geometry_2d_bit_depth_minus1。
子图块数据单元多视点扩展语法语义
pdu_depth_occ_threshold[tileID][p]表示条带索引等于tileID的条带中索引等于p的子图块所使用的深度占用阈值:低于该阈值的样本被认为未被占用。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则pdu_depth_occ_threshold[tileID][p]的位数等于asps_geometry_2d_bit_depth_minus1+1。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则pdu_depth_occ_threshold[tileID][p]的位数等于ashm_geometry_2d_bit_depth_minus1+1。
如果不存在,pdu_depth_occ_threshold[tileID][p]被推断为dq_depth_occ_threshold_default[pdu_projection_id[tileID][p]]。注意pdu_projection_id[tileID][p]对应于索引等于p的子图块的视图ID,在索引为tileID的条带中。
3、标准相关解码设计
(1)解码过程中拼接图帧顺序计数
此过程的输出是AtlasFrmOrderCntVal,即当前拼接图条带的拼接图帧顺序计数。拼接图帧顺序计数用于识别拼接图帧的输出顺序,以及用于解码器一致性检查。每个编码的拼接图帧都与一个拼接图帧顺序计数变量相关联,表示为AtlasFrmOrderCntVal。
当当前拼接图帧不是NoOutputBeforeRecoveryFlag等于1的IRAP编码拼接图时,变量prevAtlasFrmOrderCntLsb和prevAtlasFrmOrderCntMsb派生如下:
让prevAtlasFrm是解码顺序中的前一个拼接图帧,其TemporalID等于0,并且不是RASL、RADL或SLNR编码的拼接图帧。
变量prevAtlasFrmOrderCntLsb设置为等于prevAtlasFrm的拼接图帧顺序计数LSB值ath_atlas_frm_order_cnt_lsb。
变量prevAtlasFrmOrderCntMsb设置为等于prevAtlasFrm的AtlasFrmOrderCntMsb。
当前拼接图帧的变量AtlasFrmOrderCntMsb推导如下:
如果当前拼接图是一个IRAP编码的拼接图,且NoOutputBeforeRecoveryFlag等于1,则AtlasFrmOrderCntMsb设置为0。
否则,AtlasFrmOrderCntMsb推导如下:
如果((ath_atlas_frm_order_cnt_lsb<prevAtlasFrmOrderCntLsb)&&((prevAtlasFrmOrderCntLsb−ath_atlas_frm_order_cnt_lsb)>=(MaxAtlasFrmOrderCntLsb/2))),则AtlasFrmOrderCntMsb=prevAtlasFrmOrderCntMsb+MaxAtlasFrmOrderCntLsb;
否则,如果((ath_atlas_frm_order_cnt_lsb>prevAtlasFrmOrderCntLsb)&&((ath_atlas_frm_order_cnt_lsb−prevAtlasFrmOrderCntLsb)>(MaxAtlasFrmOrderCntLsb/2))),则AtlasFrmOrderCntMsb=prevAtlasFrmOrderCntMsb−MaxAtlasFrmOrderCntLsb;
否则,AtlasFrmOrderCntMsb=prevAtlasFrmOrderCntMsb。
AtlasFrmOrderCntVal推导过程如下:
AtlasFrmOrderCntVal=AtlasFrmOrderCntMsb+ath_atlas_frm_order_cnt_lsb
AtlasFrmOrderCntVal的取值范围为-2^31到2^31–1(含)。在一个CAS中,具有相同nal_layer_id值的任何两个拼接图帧的AtlasFrmOrderCntVal不同。
AtlasFrmOrderCnt(aFrmX)函数定义如下:
AtlasFrmOrderCnt(aFrmX)=AtlasFrmOrderCntVal of the atlas frame aFrmX
DiffAtlasFrmOrderCnt(aFrmA,aFrmB)函数定义如下:
DiffAtlasFrmOrderCnt(aFrmA,aFrmB)=AtlasFrmOrderCnt(aFrmA)–AtlasFrmOrderCnt(aFrmB)
比特流不应包含使解码过程中使用的DiffAtlasFrmOrderCnt(aFrmA,aFrmB)值不在-2^15到2^15-1范围内(含)的数据。
注1:假设X为当前拼接图帧,Y和Z为同一CAS中的其他两个拼接图帧,当DiffAtlasFrmOrderCnt(X,Y)和DiffAtlasFrmOrderCnt(X,Z)均为正或均为负时,Y和Z被视为与X处于相同的输出顺序方向。
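以下Python草图按上文重建的推导复现AtlasFrmOrderCntVal的计算;多视点条带应传入MaxAtlasFrmOrderCntLsbForMiv作为max_lsb。该草图为编辑性示意,MSB回绕判断沿用上文公式。

```python
def atlas_frm_order_cnt_val(ath_lsb: int, prev_lsb: int, prev_msb: int,
                            max_lsb: int, irap_no_output: bool) -> int:
    if irap_no_output:                                  # IRAP 且 NoOutputBeforeRecoveryFlag == 1
        msb = 0
    elif ath_lsb < prev_lsb and prev_lsb - ath_lsb >= max_lsb // 2:
        msb = prev_msb + max_lsb                        # LSB 正向回绕,MSB 加一个周期
    elif ath_lsb > prev_lsb and ath_lsb - prev_lsb > max_lsb // 2:
        msb = prev_msb - max_lsb                        # LSB 反向回绕,MSB 减一个周期
    else:
        msb = prev_msb
    return msb + ath_lsb                                # 即 AtlasFrmOrderCntVal
```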
(2)参考拼接图帧列表处理过程
该过程在解码过程开始时调用,用于一个拼接图帧的每个拼接图条带。
参考拼接图帧通过参考索引进行处理。参考索引是参考拼接图帧列表(RAFL)的索引。解码I_TILE拼接图条带时,不使用RAFL解码拼接图条带数据。解码SKIP_TILE或P_TILE拼接图条带时,使用单个参考拼接图帧列表RefAtlasFrmList解码拼接图条带数据。
在每个拼接图条带的解码过程开始时,导出RAFL RefAtlasFrmList。RAFL用于子条款9.2.4.4中规定的参考拼接图帧标记或拼接图条带数据解码。
注1:对于拼接图帧的I_TILE条带,RefAtlasFrmList可用于比特流一致性检查,但其推导对于当前拼接图帧的解码或按照解码顺序在当前拼接图帧之后的拼接图的解码不需要。
参考拼接图帧列表RefAtlasFrmList构造如下:
(参考拼接图帧列表RefAtlasFrmList的构造伪代码原文以图片给出,此处从略)
RefAtlasFrmList中的前NumRefIdxActive个条目被称为RefAtlasFrmList的活动条目,RefAtlasFrmList中的其他条目被称为RefAtlasFrmList中的非活动条目。
如果当前条带是SKIP_TILE条带,则数组RefAtduTotalNumPatches设置为与RefAtlasFrmList中的第一个条目RefAtlasFrmList[0]对应的数组AtduTotalNumPatches。
比特流一致性要求应用以下约束:
–num_ref_entries[RlsIdx]不得小于NumRefIdxActive。
–RefAtlasFrmList中每个活动条目所引用的拼接图帧应存在于DAB中,并且其时间ID应小于或等于当前拼接图帧的时间ID。
–RefAtlasFrmList中每个条目引用的拼接图帧不应是当前拼接图帧。
–拼接图条带RefAtlasFrmList中的短期参考拼接图帧条目和长期参考拼接图帧条目不得引用同一拼接图帧。
–RefAtlasFrmList中不应存在这样的长期参考拼接图帧条目:当前拼接图条带的AtlasFrmOrderCntVal与该条目所指拼接图帧的AtlasFrmOrderCntVal之间的差值大于或等于2^24。
–让setOfRefAtlasFrms为RefAtlasFrmList中所有条目引用的唯一拼接图帧的集合,这些条目与当前拼接图帧具有相同的nal_layer_id。setOfRefAtlasFrms中的拼接图帧数应小于或等于asps_max_dec_atlas_frame_buffering_minus1,并且对于一个拼接图帧的所有拼接图条带,setOfRefAtlasFrms应相同。
–RefAtlasFrmList中每个活动条目所引用的拼接图帧应具有与当前拼接图帧完全相同的条带数量。
–当前拼接图帧中所有条带的RefAtlasFrmList应包含相同的参考拼接图帧,但对参考拼接图帧的排序没有任何限制。
–如果当前拼接图帧(nal_layer_id等于特定值layerID)是IRAP编码拼接图,则RefAtlasFrmList中的条目所引用的拼接图在输出顺序或解码顺序上不应位于任何之前的IRAP编码拼接图(按解码顺序,nal_layer_id等于layerID)之前。
–当当前拼接图帧不是与NoOutputBeforeRecoveryFlag等于1的CRA编码拼接图相关联的RASL编码拼接图时,RefAtlasFrmList中不应存在这样的活动条目:该条目所引用的拼接图帧是由解码过程为与当前拼接图相关联的CRA编码拼接图生成的不可用参考拼接图帧。
–当当前拼接图帧跟随一个IRAP编码拼接图,该拼接图在解码顺序和输出顺序上都具有相同的nal_layer_id值时,RefAtlasFrmList中的活动条目所引用的拼接图帧不得在输出顺序或解码顺序上位于IRAP编码拼接图之前。
–当当前拼接图帧在具有相同nal_layer_id值的IRAP编码拼接图以及与该IRAP编码拼接图在解码顺序和输出顺序上关联的所有前导拼接图帧(如有)之后时,在输出顺序或解码顺序上,RefAtlasFrmList中的条目不应引用IRAP编码拼接图之前的拼接图。
–当当前拼接图帧是RADL编码拼接图时,RefAtlasFrmList中不应有活动条目,该条目是一个拼接图帧,位于RADL编码拼接图的相关IRAP编码拼接图的解码顺序之前。
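上述约束多数依赖解码器内部状态,下面仅就其中可以用标量直接表达的两条给出示意性校验(Python),假设lt_entry_afocs为各长期参考条目所指拼接图帧的AtlasFrmOrderCntVal列表;函数命名为编辑性假设。

```python
def check_rafl_constraints(num_ref_entries: int, num_ref_idx_active: int,
                           cur_afoc: int, lt_entry_afocs: list) -> None:
    # num_ref_entries[RlsIdx] 不得小于 NumRefIdxActive
    assert num_ref_entries >= num_ref_idx_active
    # 长期参考条目与当前条带的 AtlasFrmOrderCntVal 差值必须小于 2^24
    for afoc in lt_entry_afocs:
        assert abs(cur_afoc - afoc) < (1 << 24)
```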
(3)子图块数据单元的通用解码过程
TilePatch3dOffsetU[tileID][p]表示沿切线轴重建子图块的偏移,当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则TilePatch3dOffsetU[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1+1)-1的范围内(含边界)。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则TilePatch3dOffsetU[tileID][p]的值应在0到2^(ashm_geometry_3d_bit_depth_minus1+1)-1的范围内(含)。
TilePatch3dOffsetV[tileID][p]表示沿双切线轴重建子图块的偏移,当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则TilePatch3dOffsetV[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1+1)-1的范围内(含边界)。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则TilePatch3dOffsetV[tileID][p]的值应在0到2^(ashm_geometry_3d_bit_depth_minus1+1)-1的范围内(含)。
TilePatch3dOffsetD[tileID][p]表示沿法线轴重建子图块的偏移,当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则TilePatch3dOffsetD[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1+1)-1的范围内(含)。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则TilePatch3dOffsetD[tileID][p]的值应在0到2^(ashm_geometry_3d_bit_depth_minus1+1)-1的范围内(含)。
TilePatch3dRangeD[tileID][p](如果存在)表示转换为标称表示后,沿法线轴重建的位深度子图块几何体样本中预期存在的偏移标称最大值。当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,变量rangeDBitDepth取值如下:
rangeDBitDepth=Min(asps_geometry_2d_bit_depth_minus1,asps_geometry_3d_bit_depth_minus1)+1
如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,变量rangeDBitDepth取值如下:
rangeDBitDepth=Min(ashm_geometry_2d_bit_depth_minus1,ashm_geometry_3d_bit_depth_minus1)+1
TilePatch3dRangeD[tileID][p]的取值范围为0到2^rangeDBitDepth–1(含)。
4、标准相关语法限制条件。
表6V-PCC工具集配置文件组件允许的最大语法元素值(Max allowed syntax element values for the V-PCC toolset profile components)。
表6
(表6的内容原文以图片给出,此处从略)
表7异构工具集配置文件组件的允许的最大语法元素值扩展(Max allowed syntax element values for the heterogeneous toolset profile components Extended)
表7
(表7的内容原文以图片给出,此处从略)
表8为MIV工具集配置文件组件允许的最大语法元素值(Allowable values of syntax element values for the MIV toolset profile components)
表8
(表8的内容原文以图片给出,此处从略)
表9-1-1异构工具集配置文件组件允许的语法元素值扩展(Allowable values of syntax element values for the heterogeneous toolset profile components Extended)
表9-1-1
(表9-1-1的内容原文以图片给出,此处从略)
表9-1-2异构工具集配置文件组件允许的语法元素值扩展(Allowable values of syntax element values for the heterogeneous toolset profile components Extended)
表9-1-2
(表9-1-2的内容原文以图片给出,此处从略)
新增扩展语法元素设计
比特流一致性,对于每个比特流一致性测试,应满足以下所有条件:
1.对于与缓冲期SEI消息相关联的每个编码拼接图访问单元n(n大于0),让变量deltaTime90k[n]指定如下:
deltaTime90k[n]=90000*(AuNominalRemovalTime[n]-AuFinalArrivalTime[n-1])
InitCabRemovalDelay[Htid][SchedSelIdx]的值约束如下:
–如果hrd_cbr_flag[!NalHrdModeFlag][Htid][SchedSelIdx]等于0,则以下条件为真:
InitCabRemovalDelay[Htid][SchedSelIdx]<=Ceil(deltaTime90k[n])
–否则hrd_cbr_flag[!NalHrdModeFlag][Htid][SchedSelIdx]等于1,则以下条件为真:
Floor(deltaTime90k[n])<=
InitCabRemovalDelay[Htid][SchedSelIdx]<=Ceil(deltaTime90k[n])
注1–每个拼接图帧从CAB中移除时的确切位数取决于选择哪个缓冲周期SEI消息来初始化HRD。编码器应考虑到这一点,以确保遵守所有规定的约束,因为HRD可以在任何一个缓冲期SEI消息处初始化。
2.CAB溢出被指定为CAB中的总位数大于CAB大小的情况。CAB不得溢出。
3.当hrd_low_delay_flag[Htid]等于0时,CAB永远不会下溢。CAB下溢规定如下:
–CAB下溢被指定为如下条件:对于至少一个n值,编码拼接图访问单元n的标称CAB移除时间AuNominalRemovalTime[n]小于其最终CAB到达时间AuFinalArrivalTime[n]。
4.拼接图帧从CAB的标称移除时间(从解码顺序中的第二个拼接图开始)应满足附件A中对AuNominalRemovalTime[n]和AuCabRemovalTime[n]的限制。
5.对于每个当前拼接图帧,在按照规定调用从DAB中删除拼接图的过程后,DAB中已解码拼接图帧的数量(包括标记为"用于参考"的拼接图帧,以及AtlasFrameOutputFlag等于1且AuCabRemovalTime[n]小于AuCabRemovalTime[currAtlasFrame]的所有拼接图帧n,其中currAtlasFrame是当前拼接图帧)应小于或等于asps_max_dec_atlas_frame_buffering_minus1。
6.当需要进行预测时,所有参考拼接图帧应存在于DAB中。每个AtlasFrameOutputFlag等于1的拼接图帧应在其DAB输出时间出现在DAB中,除非通过条款中规定的过程之一在其输出时间之前从DAB中删除。
7.对于不是NoOutputBeforeRecoveryFlag等于1的IRAP编码拼接图的每个当前拼接图帧,maxAtlasFrameOrderCnt-minAtlasFrameOrderCnt的值应小于MaxAtlasFrmOrderCntLsb/2。如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则maxAtlasFrameOrderCnt-minAtlasFrameOrderCnt的值应小于MaxAtlasFrmOrderCntLsbForMiv/2。
8.DabOutputInterval[n]的值,即AtlasFrameOutputFlag等于1的一个拼接图帧的输出时间与按输出顺序在其后的第一个AtlasFrameOutputFlag等于1的拼接图帧的输出时间之差,应满足比特流中为指定的配置文件、层和级别所规定的解码过程约束。
9.对于同一个CAS中的任意两个拼接图帧m和n,当DabOutputTime[m]大于DabOutputTime[n]时,拼接图帧m的AtlasFrmOrderCntVal应大于拼接图帧n的AtlasFrmOrderCntVal。
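下面用Python示意条件1的校验:按90 kHz时钟计算deltaTime90k[n],并区分hrd_cbr_flag的两种情形(函数封装为编辑性假设)。

```python
import math

def check_init_cab_removal_delay(nominal_removal_time_n: float,
                                 final_arrival_time_prev: float,
                                 init_cab_removal_delay: int,
                                 cbr_flag: int) -> bool:
    delta_time_90k = 90000 * (nominal_removal_time_n - final_arrival_time_prev)
    if cbr_flag:   # hrd_cbr_flag == 1:下界与上界同时约束
        return (math.floor(delta_time_90k) <= init_cab_removal_delay
                <= math.ceil(delta_time_90k))
    # hrd_cbr_flag == 0:仅上界约束
    return init_cab_removal_delay <= math.ceil(delta_time_90k)
```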
恢复点SEI消息语义
recovery_afoc_cnt按输出顺序指定解码拼接图帧的恢复点。如果在CAS中存在按解码顺序在当前拼接图帧(即与当前SEI消息相关联的拼接图帧)之后的拼接图帧aFrmA,并且其AtlasFrmOrderCntVal等于当前拼接图帧的AtlasFrmOrderCntVal加上recovery_afoc_cnt的值,则拼接图帧aFrmA被称为恢复点拼接图帧。否则,输出顺序中第一个AtlasFrmOrderCntVal大于当前拼接图帧的AtlasFrmOrderCntVal加上recovery_afoc_cnt的值的拼接图帧称为恢复点拼接图帧。恢复点拼接图帧在解码顺序上不应先于当前拼接图帧。从恢复点拼接图帧的输出顺序位置开始,按输出顺序输出的所有解码拼接图帧在内容上都是正确的或近似正确的。recovery_afoc_cnt的值应在-MaxAtlasFrmOrderCntLsb/2到MaxAtlasFrmOrderCntLsb/2-1的范围内(含)。如果当前拼接图帧条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则recovery_afoc_cnt的值应在-MaxAtlasFrmOrderCntLsbForMiv/2到MaxAtlasFrmOrderCntLsbForMiv/2–1的范围内(含)。
通用ASPS级字符串
ASPSCommonByteString(stringByte,posByte)函数定义如下:
(ASPSCommonByteString(stringByte,posByte)函数的定义原文以图片给出,此处从略)
VUI参数语义
vui_display_box_origin[d]指定相对于坐标系原点沿轴d的偏移。当vui_display_box_origin[d]的元素不存在时,应推断其值等于0。如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,用于表示vui_display_box_origin[d]的位数为asps_geometry_3d_bit_depth_minus1+1;如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,用于表示vui_display_box_origin[d]的位数为ashm_geometry_3d_bit_depth_minus1+1。d的值等于0、1和2分别对应于X、Y和Z轴。
vui_display_box_size[d]指定显示框沿轴d方向以样本计的大小。当vui_display_box_size[d]的元素不存在时,其值未知。如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则用于表示vui_display_box_size[d]的位数为asps_geometry_3d_bit_depth_minus1+1;如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则用于表示vui_display_box_size[d]的位数为ashm_geometry_3d_bit_depth_minus1+1。
以下变量来自显示框参数:
minOffset[d]=vui_display_box_origin[d]
maxOffset[d]=vui_display_box_origin[d]+vui_display_box_size[d]
d的值等于0、1和2分别对应于X、Y和Z轴。
vui_anchor_point[d]表示锚点沿d轴的位置。如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于0,则vui_anchor_point[d]的值应在0到2^(asps_geometry_3d_bit_depth_minus1+1)-1的范围内;如果vui_anchor_point[d]不存在,则应推断为等于0;用于表示vui_anchor_point[d]的位数为asps_geometry_3d_bit_depth_minus1+1。如果当前拼接图条带的afps_heterogeneous_frame_tile_toolset_miv_present_flag等于1,则vui_anchor_point[d]的值应在0到2^(ashm_geometry_3d_bit_depth_minus1+1)-1的范围内;如果vui_anchor_point[d]不存在,则应推断为等于0;用于表示vui_anchor_point[d]的位数为ashm_geometry_3d_bit_depth_minus1+1。d的值等于0、1和2分别对应于X、Y和Z轴。
多视点标准,可以供ISO/IEC 23090-12使用
深度扩展过程
此过程将拼接图的整数深度值扩展为场景坐标中的浮点深度值(例如米)。
整数深度值可以缩放到实现定义的位深度和范围0…maxSampleD。否则,maxSampleD设置为2^(asps_geometry_2d_bit_depth_minus1+1)–1。
在特殊情况下,如果asps_heterogeneous_miv_extension_present_flag等于1,则maxSampleD设置为2^(ashm_geometry_2d_bit_depth_minus1+1)–1。
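上述maxSampleD的取法可概括为下面的Python草图(编辑性示意,函数与参数命名为假设):

```python
def max_sample_d(asps_2d_bit_depth_minus1: int,
                 ashm_2d_bit_depth_minus1: int,
                 heterogeneous_miv_flag: int) -> int:
    # 异构扩展开启时取 ashm_* 位深,否则取 asps_* 位深
    m1 = ashm_2d_bit_depth_minus1 if heterogeneous_miv_flag else asps_2d_bit_depth_minus1
    return (1 << (m1 + 1)) - 1   # maxSampleD = 2^(位深) - 1
```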
重建MPI过程
此过程从ptc_restricted_geometry_flag等于1的比特流中解码重建体积帧,即重建MPI帧。
注–所述重建过程将重建整个MPI帧。实现可以在投影到视口之前形成视口,而无需缓冲整个纹理和透明度层集。
该过程的输入包括:
-视图参数列表,包含索引为viewIdx的(唯一)源视图的内部和外部参数;
-对于每个拼接图:
-变量atlasID,即拼接图ID;
-变量AspsFrameHeight[atlasID]和AspsFrameWidth[atlasID]分别表示拼接图帧的行数和列数;
-二维阵列AtlasBlockToPatchMap;
-变量PatchPackingBlockSize;
-texFrame的三维数组尺寸为3×AspsFrameHeight[atlasID]×AspsFrameWidth[atlasID];
-transpFrame的二维数组尺寸为AspsFrameHeight[atlasID]×AspsFrameWidth[atlasID];
注–0透明度级别对应于完全透明的样本,而最大透明度级别2^(ai_attribute_2d_bit_depth_minus1[atlasID][attrIdx]+1)–1(其中attrIdx是透明度属性的索引)对应于完全不透明的样本。编码法则在最小和最大透明度级别之间是线性的。
-目标视图的外部和内部参数;
变量maxDepthSampleValue,表示编码几何体样本的最大值,设置为2^(asps_geometry_3d_bit_depth_minus1+1)–1;在特殊情况下,如果asps_heterogeneous_miv_extension_present_flag等于1,则变量maxDepthSampleValue设置为2^(ashm_geometry_3d_bit_depth_minus1+1)–1。
-常量maxNbLayers,表示MPI的最大深度层数,设置为maxDepthSampleValue+1。
解码案例四
1、在解析时,从V3C码流解析得到VPS,在VPS中解析得到ptl_profile_toolset_idc。若判断ptl_profile_toolset_idc在128~133范围内,则表示当前码流中同时包含点云和多视点两类码流;
2、对每一张拼接图解析ASPS和AFPS等高层语法;
a)在ASPS中首先解析asps_vpcc_extension_present_flag、asps_miv_extension_present_flag、asps_heterogeneous_miv_extension_present_flag(第一子语法元素)和asps_extension_5bits。asps_vpcc_extension_present_flag为点云的标志位,asps_miv_extension_present_flag为多视点的标志位,asps_heterogeneous_miv_extension_present_flag为点云+多视点的标志位。在一些实施例中,asps_heterogeneous_miv_extension_present_flag的取值是对asps_vpcc_extension_present_flag的取值和asps_miv_extension_present_flag的取值进行与运算得到的,即这里可以替换为:在ASPS中首先解析asps_vpcc_extension_present_flag、asps_miv_extension_present_flag和asps_extension_6bits,对asps_vpcc_extension_present_flag和asps_miv_extension_present_flag进行与运算得到asps_heterogeneous_miv_extension_present_flag。
b)分析asps_heterogeneous_miv_extension_present_flag:
i.判断asps_heterogeneous_miv_extension_present_flag不存在时,表示当前该拼接图为同构内容(所有条带均为点云类型或多视点类型);
ii.判断asps_heterogeneous_miv_extension_present_flag存在且等于0时,表示当前该拼接图为同构内容(所有条带均为点云类型或多视点类型);
iii.判断asps_heterogeneous_miv_extension_present_flag存在且等于1时,表示当前该拼接图为异构内容,同时存在点云条带和多视点条带。因此将当前拼接图ASPS辅助高层信息拆分为两个子信息集合,即一个子集合用于多视点条带实现解码,另一子集合用于点云条带实现解码。其中点云条带所需辅助信息通过标准23090-5中第8部分解析可得;多视点条带所需辅助信息通过标准23090-5第8部分和新增语法asps_geometry_3d_bit_depth_minus1_for_miv和标准23090-12第8部分解析可得。
c)在AFPS中首先解析afps_miv_extension_present_flag、afps_heterogeneous_type_extension_present_flag(第二语法元素中的第三子语法元素)、afps_heterogeneous_miv_extension_present_flag(第二子语法元素)和afps_extension_5bits。在一些实施例中,afps_heterogeneous_miv_extension_present_flag的取值是对asps_vpcc_extension_present_flag的取值和asps_miv_extension_present_flag的取值进行与运算得到的,即这里可以替换为:在AFPS中首先解析afps_miv_extension_present_flag、afps_heterogeneous_type_extension_present_flag和afps_extension_6bits。asps_vpcc_extension_present_flag和asps_miv_extension_present_flag进行与运算得到afps_heterogeneous_miv_extension_present_flag。
d)分析afps_heterogeneous_miv_extension_present_flag:
i.判断afps_heterogeneous_miv_extension_present_flag不存在时,表示当前该拼接图为同构内容(所有条带均为点云类型或多视点类型);
ii.判断afps_heterogeneous_miv_extension_present_flag存在且等于0时,表示当前该拼接图为同构内容(所有条带均为点云类型或多视点类型);
iii.判断afps_heterogeneous_miv_extension_present_flag存在且等于1时,表示当前该拼接图为异构内容,同时存在点云条带和多视点条带。因此将当前拼接图AFPS辅助高层信息拆分为两个子信息集合,即一个子集合用于多视点条带实现解码,另一子集合用于点云条带实现解码。其中点云条带所需辅助信息通过标准23090-5中第8部分解析可得;多视点条带所需辅助信息通过标准23090-5第8部分和标准23090-12第8部分解析可得。
e)分析afps_heterogeneous_type_extension_present_flag:
i.判断afps_heterogeneous_type_extension_present_flag不存在,表示当前该拼接图所有条带均为同一类型;
ii.否则遍历所有tile,判断第i个条带的afps_heterogeneous_tile_type[i]=0,表示当前条带为点云条带;判断第i个条带的afps_heterogeneous_tile_type[i]=1,表示当前条带为多视点条带,afps_heterogeneous_tile_type[i]为第二语法元素中的第四子语法元素;也就是说,当afps_heterogeneous_type_extension_present_flag存在时(例如取值为1),根据afps_heterogeneous_tile_type[i]遍历所有tile,判断第i个条带的条带类型(示意见下方代码草图)。需要说明的是,本申请实施例中语法元素的命名主要是方便理解和行文,实际应用和标准文本中可以作出修改,但其语义内容应当一致或相近,例如,afps_heterogeneous_type_extension_present_flag和afps_heterogeneous_frame_tile_toolset_miv_present_flag均表示当前拼接图是否为异构混合拼接图,afps_heterogeneous_tile_type[i]和afps_heterogeneous_frame_tile_toolset_miv_present_flag[i]均表示第i个条带的条带类型。
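作为对上述a)至e)高层解析流程的示意,下面的Python草图以已解析出的标志位为输入,演示异构标志的与运算派生方式及逐条带类型判断(输入封装与函数命名均为编辑性假设):

```python
def classify_atlas(asps_vpcc_flag: int, asps_miv_flag: int,
                   afps_type_ext_flag: int, tile_types: list) -> dict:
    # 异构标志既可直接解析,也可由 vpcc 与 miv 两个标志按位与得到
    heterogeneous = asps_vpcc_flag & asps_miv_flag
    result = {"heterogeneous": bool(heterogeneous), "tiles": []}
    if afps_type_ext_flag:          # 条带类型不全相同时,逐条带判断
        for i, t in enumerate(tile_types):   # t 即 afps_heterogeneous_tile_type[i]
            result["tiles"].append((i, "VPCC" if t == 0 else "MIV"))
    return result
```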
码流需保证afps_heterogeneous_miv_extension_present_flag和asps_heterogeneous_miv_extension_present_flag的取值严格一致。
3、进而解析每一个条带中的每一个子图块信息patch_data_unit,在已知当前条带采用多视点辅助信息的前提下,确定当前子图块采用多视点视频解码标准实现;在已知当前条带采用点云辅助信息的前提下,确定当前子图块采用点云视频解码标准实现。
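步骤3的分派逻辑可示意如下(decode_vpcc_patch与decode_miv_patch为编辑性占位函数,分别代表23090-5与23090-12的子图块解码流程):

```python
def decode_vpcc_patch(pdu):
    pass  # 占位:点云子图块按 ISO/IEC 23090-5 解码

def decode_miv_patch(pdu):
    pass  # 占位:多视点子图块按 ISO/IEC 23090-12 解码

def decode_patch(tile_type: str, pdu) -> None:
    if tile_type == "VPCC":
        decode_vpcc_patch(pdu)   # 点云条带 → 使用点云辅助信息
    else:
        decode_miv_patch(pdu)    # 多视点条带 → 使用多视点辅助信息
```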
具体语法元素解析参见以下表1,表2-1,表3,表4-1,表VI,表6,表7-1,表8,表9-1-1-1,表9-1-2-1。
表1示出了可用的工具集配置文件组件(Available toolset profile components)的一个示例。表1提供了为V3C定义的工具集配置文件组件及其相应的标识语法元素值列表,例如ptl_profile_toolset_idc和ptc_one_v3c_frame_only_flag,该定义可以仅供本文档使用。语法元素ptl_profile_toolset_idc提供了工具集配置文件的主要定义,如ptc_one_v3c_frame_only_flag等附加语法元素可以指定已定义配置文件的附加特征或限制。ptc_one_v3c_frame_only_flag可以只用于支持单个V3C帧。需要说明的是,ptl_profile_toolset_idc中的2..63,67..127,134..255保留,暂时未定义,标准组织可能在未来的标准中再做规定。表1中定义的配置文件类型可以包括动态(Dynamic)或静态 (Static)。
表2-1示出了通用拼接图序列参数集的RBSP语法(General atlas sequence parameter set RBSP syntax),可以供ISO/IEC 23090-5使用。利用拼接图序列参数集中的扩展语法元素asps_vpcc_extension_present_flag表示拼接图归属于点云,利用asps_miv_extension_present_flag表示拼接图归属于多视点,利用asps_heterogeneous_miv_extension_present_flag表示拼接图类型,具体地根据该语法元素的取值确定拼接图应该归属于点云/多视点/点云+多视点。利用第一扩展语法元素asps_geometry_3d_bit_depth_minus1_for_miv确定重建几何内容的几何坐标的位深度。解码案例四只需要新增第一扩展语法元素便可以得到多视点条带所需辅助信息。
表2-1
(表2-1的内容原文以图片给出,此处从略)
表4-1为通用拼接图帧参数集的RBSP语法(General atlas frame parameter set RBSP syntax),可以供ISO/IEC 23090-5使用。利用afps_miv_extension_present_flag表示拼接图归属于多视点,利用afps_heterogeneous_miv_extension_present_flag表示拼接图类型,具体地根据该语法元素的取值确定拼接图应该归属于点云/多视点/点云+多视点。利用afps_heterogeneous_type_extension_present_flag判断当前该拼接图所有条带是否为同一类型,若不是,根据afps_heterogeneous_tile_type[i]遍历每个条带,确定条带类型。
表4-1
(表4-1的内容原文以图片给出,此处从略)
下面对ASPS语法元素的语义和AFPS语法元素的语义进行解释说明。
1、ASPS语法元素的语义:
asps_extension_present_flag等于1指定语法元素asps_vpcc_extension_present_flag、asps_miv_extension_present_flag、asps_heterogeneous_miv_extension_present_flag和asps_extension_5bits存在于atlas_sequence_parameter_set_rbsp()语法结构中。asps_extension_present_flag等于0指定上述语法元素不存在。
asps_heterogeneous_miv_extension_present_flag等于1指定asps_geometry_3d_bit_depth_minus1_for_miv语法元素存在于atlas_sequence_parameter_set_rbsp()语法结构中。asps_heterogeneous_miv_extension_present_flag等于0表示不存在此语法元素。如果不存在,则推断asps_heterogeneous_miv_extension_present_flag的值等于0。
asps_extension_5bits等于0指定asps RBSP语法结构中不存在语法元素asps_extension_data_flag。如果存在,asps_extension_5bits在符合本文件版本的比特流中应等于0。asps_extension_5bits不等于0的值保留供ISO/IEC将来使用。解码器应允许asps_extension_5bits的值不等于0,并应忽略ASPS NAL单元中的所有asps_extension_data_flag语法元素。当不存在时,asps_extension_5bits的值被推断为等于0。
asps_geometry_3d_bit_depth_minus1_for_miv表示重建体积内容的几何坐标的位深度。asps_geometry_3d_bit_depth_minus1_for_miv应在0到31(包括0和31)的范围内。
2、AFPS语法元素的语义:
afps_extension_present_flag等于1指定语法元素afps_miv_extension_present_flag、afps_heterogeneous_type_extension_present_flag、afps_heterogeneous_miv_extension_present_flag和afps_extension_5bits存在于atlas_frame_parameter_set_rbsp()语法结构中。afps_extension_present_flag等于0指定上述语法元素不存在。
afps_heterogeneous_type_extension_present_flag等于1指定引用此afps的条带包括异构类型。afps_heterogeneous_type_extension_present_flag等于0指定引用此afps的每个条带包括同一类型。如果不存在,则推断afps_heterogeneous_type_extension_present_flag的值等于0。
afps_heterogeneous_miv_extension_present_flag等于1指定XXXXX语法元素存在于atlas_frame_parameter_set_rbsp()语法结构中。afps_heterogeneous_miv_extension_present_flag等于0表示XXXXX语法元素不存在。如果不存在,则推断afps_heterogeneous_miv_extension_present_flag的值等于0。对于符合本文件此版本的比特流,afps_heterogeneous_miv_extension_present_flag和asps_heterogeneous_miv_extension_present_flag应同时出现。
afps_extension_5bits等于0指定afps RBSP语法结构中不存在语法元素afps_extension_data_flag。如果存在,afps_extension_5bits在符合本文件版本的比特流中应等于0。afps_extension_5bits不等于0的值保留供ISO/IEC将来使用。解码器应允许afps_extension_5bits的值不等于0,并应忽略AFPS NAL单元中的所有afps_extension_data_flag语法元素。当不存在时,afps_extension_5bits的值被推断为等于0。
afps_heterogeneous_tile_type[i]表示表VI中规定的tileID等于i的条带的条带类型。标记为保留的取值保留供ISO/IEC将来使用,不应出现在符合本文档本版本的比特流中。符合本文档的解码器应忽略此类保留的条带类型。
表VI条带类型
afps_heterogeneous_tile_type[i] 条带类型
0 VPCC
1 MIV
2…3 保留
拼接图条带数据单元头语义:
ath_raw_3d_offset_axis_bit_count_minus1加1表示rpdu_3d_offset_u[tileID][p]、rpdu_3d_offset_v[tileID][p]和rpdu_3d_offset_d[tileID][p]三个语法元素的值的固定位宽大小,其中p表示子图块索引为p,tileID表示子图块位于条带ID等于tileID的条带中。
当存在,并且如果当前条带的afps_heterogeneous_tile_type等于0,则用于表示ath_raw_3d_offset_axis_bit_count_minus1的语法元素的长度等于Floor(Log2(asps_geometry_3d_bit_depth_minus1+1))。
当不存在时,ath_raw_3d_offset_axis_bit_count_minus1语法元素的值推断为Max(0,asps_geometry_3d_bit_depth_minus1-asps_geometry_2d_bit_depth_minus1)-1。
变量RawShift定义如下:
如果afps_raw_3d_offset_bit_count_explicit_mode_flag=1,
RawShift=asps_geometry_3d_bit_depth_minus1-ath_raw_3d_offset_axis_bit_count_minus1
否则
RawShift=asps_geometry_2d_bit_depth_minus1+1
当存在,并且如果当前条带的afps_heterogeneous_tile_type等于1,则用于表示ath_raw_3d_offset_axis_bit_count_minus1的语法元素的长度等于Floor(Log2(asps_geometry_3d_bit_depth_minus1_for_miv+1))。
当不存在时,ath_raw_3d_offset_axis_bit_count_minus1语法元素的值推断为Max(0,asps_geometry_3d_bit_depth_minus1_for_miv-asps_geometry_2d_bit_depth_minus1)-1。
变量RawShift定义如下:
如果afps_raw_3d_offset_bit_count_explicit_mode_flag=1,
RawShift=asps_geometry_3d_bit_depth_minus1_for_miv-ath_raw_3d_offset_axis_bit_count_minus1
否则
RawShift=asps_geometry_2d_bit_depth_minus1+1
子图块数据单元(Patch data unit)语义
pdu_3d_offset_u[tileID][p]表示沿切线轴重建子图块的偏移,当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。
如果当前拼接图条带的afps_heterogeneous_tile_type等于0,则pdu_3d_offset_u[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1+1)-1的范围内(含边界)。用于表示pdu_3d_offset_u[tileID][p]的位数为asps_geometry_3d_bit_depth_minus1+1。
如果当前拼接图条带的afps_heterogeneous_tile_type等于1,则pdu_3d_offset_u[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)-1的范围内(含)。用于表示pdu_3d_offset_u[tileID][p]的位数为asps_geometry_3d_bit_depth_minus1_for_miv+1。
pdu_3d_offset_v[tileID][p]表示沿双切线轴重建子图块的偏移,当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。
如果当前拼接图条带的afps_heterogeneous_tile_type等于0,则pdu_3d_offset_v[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1+1)-1的范围内(含边界)。用于表示pdu_3d_offset_v[tileID][p]的位数为asps_geometry_3d_bit_depth_minus1+1。
如果当前拼接图条带的afps_heterogeneous_tile_type等于1,则pdu_3d_offset_v[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)-1的范围内(含)。用于表示pdu_3d_offset_v[tileID][p]的位数为asps_geometry_3d_bit_depth_minus1_for_miv+1。
pdu_3d_offset_d[tileID][p]表示沿法线轴重建子图块的偏移,当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。Pdu3dOffsetD[tileID][p]定义如下:
Pdu3dOffsetD[tileID][p]=pdu_3d_offset_d[tileID][p]<<ath_pos_min_d_quantizer
如果当前拼接图条带的afps_heterogeneous_tile_type等于0,则Pdu3dOffsetD[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1+1)-1的范围内(含)。用于表示pdu_3d_offset_d[tileID][p]的位数为(asps_geometry_3d_bit_depth_minus1–ath_pos_min_d_quantizer+1)。
如果当前拼接图条带的afps_heterogeneous_tile_type等于1,则Pdu3dOffsetD[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)-1的范围内(含)。用于表示pdu_3d_offset_d[tileID][p]的位数为(asps_geometry_3d_bit_depth_minus1_for_miv–ath_pos_min_d_quantizer+1)。
pdu_3d_range_d[tileID][p](如果存在)表示转换为标称表示后,沿法线轴重建的位深度子图块几何体样本中预期存在的偏移标称最大值。当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。Pdu3dRangeD[tileID][p]定义如下:
(Pdu3dRangeD[tileID][p]的推导公式原文以图片给出,此处从略)
如果当前拼接图条带的afps_heterogeneous_tile_type等于0,变量rangeDBitDepth取值如下:
rangeDBitDepth=Min(asps_geometry_2d_bit_depth_minus1,asps_geometry_3d_bit_depth_minus1)+1
如果当前拼接图条带的afps_heterogeneous_tile_type等于1,变量rangeDBitDepth取值如下:
rangeDBitDepth=Min(asps_geometry_2d_bit_depth_minus1,asps_geometry_3d_bit_depth_minus1_for_miv)+1
如果pdu_3d_range_d[tileID][p]不存在,Pdu3dRangeD[tileID][p]的值被推断为2^rangeDBitDepth–1。如果存在,Pdu3dRangeD[tileID][p]的值应在0到2^rangeDBitDepth–1的范围内(含)。
表示pdu_3d_range_d[tileID][p]的位数等于(rangeDBitDepth–ath_pos_delta_max_d_quantizer)。
合并子图块数据单元(Merge Patch data unit)语义
mpdu_3d_offset_u[tileID][p]表示要沿切线轴应用于重建两个子图块之间的偏移差异,其中两个子图块分别是当前拼接图中条带索引为tileID、子图块索引为p的子图块和当前拼接图条带索引为tileID、子图块索引为RefPatchIdx的子图块。
如果当前拼接图条带的afps_heterogeneous_tile_type等于0,则mpdu_3d_offset_u[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1+1)-1)的范围内(含)。
如果当前拼接图条带的afps_heterogeneous_tile_type等于1,则mpdu_3d_offset_u[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)–1)的范围内(含)。
如果mpdu_3d_offset_u[tileID][p]不存在,该值被推断为0。
mpdu_3d_offset_v[tileID][p]表示要沿双切线轴应用于重建两个子图块之间的偏移差异,其中两个子图块分别是当前拼接图中条带索引为tileID、子图块索引为p的子图块和当前拼接图条带索引为tileID、子图块索引为RefPatchIdx的子图块。
如果当前拼接图条带的afps_heterogeneous_tile_type等于0,则mpdu_3d_offset_v[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1+1)-1)的范围内(含)。
如果当前拼接图条带的afps_heterogeneous_tile_type等于1,则mpdu_3d_offset_v[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)–1)的范围内(含)。
如果mpdu_3d_offset_v[tileID][p]不存在,该值被推断为0。
mpdu_3d_offset_d[tileID][p]表示要沿法线轴应用于重建两个子图块之间的偏移差异,其中两个子图块分别是当前拼接图中条带索引为tileID、子图块索引为p的子图块和当前拼接图条带索引为tileID、子图块索引为RefPatchIdx的子图块。Mpdu3dOffsetD[tileID][p]定义如下:
Mpdu3dOffsetD[tileID][p]=mpdu_3d_offset_d[tileID][p]<<ath_pos_min_d_quantizer
如果当前拼接图条带的afps_heterogeneous_tile_type等于0,则mpdu_3d_offset_d[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1+1)-1)的范围内(含)。
如果当前拼接图条带的afps_heterogeneous_tile_type等于1,则mpdu_3d_offset_d[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)–1)的范围内(含)。
如果mpdu_3d_offset_d[tileID][p]不存在,该值被推断为0。
帧间子图块数据单元(Inter patch data unit)语义
ipdu_3d_offset_v[tileID][p]表示要沿双切线轴应用于重建两个子图块之间的偏移差异,其中两个子图块分别是当前拼接图中条带索引为tileID、子图块索引为p的子图块和当前拼接图条带索引为tileID、 子图块索引为RefPatchIdx的子图块。
如果当前拼接图条带的afps_heterogeneous_tile_type等于0,则ipdu_3d_offset_v[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1+1)-1)的范围内(含)。
如果当前拼接图条带的afps_heterogeneous_tile_type等于1,则ipdu_3d_offset_v[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)–1)的范围内(含)。
如果ipdu_3d_offset_v[tileID][p]不存在,该值被推断为0。
ipdu_3d_offset_d[tileID][p]表示要沿法线轴应用于重建两个子图块之间的偏移差异,其中两个子图块分别是当前拼接图中条带索引为tileID、子图块索引为p的子图块和当前拼接图条带索引为tileID、子图块索引为RefPatchIdx的子图块。Ipdu3dOffsetD[tileID][p]定义如下:
Ipdu3dOffsetD[tileID][p]=ipdu_3d_offset_d[tileID][p]<<ath_pos_min_d_quantizer
如果当前拼接图条带的afps_heterogeneous_tile_type等于0,则ipdu_3d_offset_d[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1+1)-1)的范围内(含)。
如果当前拼接图条带的afps_heterogeneous_tile_type等于1,则ipdu_3d_offset_d[tileID][p]的值应在(-2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)+1)到(2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)–1)的范围内(含)。
如果ipdu_3d_offset_d[tileID][p]不存在,该值被推断为0。
ISO/IEC 23090-5:2022/Amd 1子条款8.4中的规范适用,并附加以下内容。
码流一致性要求asps_geometry_3d_bit_depth_minus1和asps_geometry_2d_bit_depth_minus1分别等于gi_geometry_3d_coordinates_bit_depth_minus1和gi_geometry_2d_bit_depth_minus1。但在特殊情况下,如果asps_heterogeneous_miv_extension_present_flag等于1,则gi_geometry_3d_coordinates_bit_depth_minus1和gi_geometry_2d_bit_depth_minus1特指的是ISO/IEC 23090-5中的内容,asps_geometry_3d_bit_depth_minus1_for_miv不必等于gi_geometry_3d_coordinates_bit_depth_minus1。
(3)子图块数据单元的通用解码过程
TilePatch3dOffsetU[tileID][p]表示沿切线轴重建子图块的偏移,当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。
如果当前拼接图条带的afps_heterogeneous_tile_type等于0,则TilePatch3dOffsetU[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1+1)-1的范围内(含边界)。
如果当前拼接图条带的afps_heterogeneous_tile_type等于1,则TilePatch3dOffsetU[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)-1的范围内(含)。
TilePatch3dOffsetV[tileID][p]表示沿双切线轴重建子图块的偏移,当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。
如果当前拼接图条带的afps_heterogeneous_tile_type等于0,则TilePatch3dOffsetV[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1+1)-1的范围内(含边界)。
如果当前拼接图条带的afps_heterogeneous_tile_type等于1,则TilePatch3dOffsetV[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)-1的范围内(含)。
TilePatch3dOffsetD[tileID][p]表示沿法线轴重建子图块的偏移,当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。
如果当前拼接图条带的afps_heterogeneous_tile_type等于0,则TilePatch3dOffsetD[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1+1)-1的范围内(含)。
如果当前拼接图条带的afps_heterogeneous_tile_type等于1,则TilePatch3dOffsetD[tileID][p]的值应在0到2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)-1的范围内(含)。
TilePatch3dRangeD[tileID][p](如果存在)表示转换为标称表示后,沿法线轴重建的位深度子图块几何体样本中预期存在的偏移标称最大值。当前子图块属于条带索引为tileID的条带中子图块索引为p的子图块。
如果当前拼接图条带的afps_heterogeneous_tile_type等于0,变量rangeDBitDepth取值如下:
rangeDBitDepth=Min(asps_geometry_2d_bit_depth_minus1,asps_geometry_3d_bit_depth_minus1)+1
如果当前拼接图条带的afps_heterogeneous_tile_type等于1,变量rangeDBitDepth取值如下:
rangeDBitDepth=Min(asps_geometry_2d_bit_depth_minus1,asps_geometry_3d_bit_depth_minus1_for_miv)+1
TilePatch3dRangeD[tileID][p]的取值范围为0到2^rangeDBitDepth–1(含)。
4、标准相关语法限制条件。
参见表6V-PCC工具集配置文件组件允许的最大语法元素值(Max allowed syntax element values for  the V-PCC toolset profile components)。
表7-1异构工具集配置文件组件的允许的最大语法元素值扩展(Max allowed syntax element values for the heterogeneous toolset profile components Extended)
表7-1
(表7-1的内容原文以图片给出,此处从略)
参见表8为MIV工具集配置文件组件允许的最大语法元素值(Allowable values of syntax element values for the MIV toolset profile components)
表9-1-1-1异构工具集配置文件组件允许的语法元素值扩展(Allowable values of syntax element values for the heterogeneous toolset profile components Extended)
表9-1-1-1
(表9-1-1-1的内容原文以图片给出,此处从略)
表9-1-2-1异构工具集配置文件组件允许的语法元素值扩展(Allowable values of syntax element values for the heterogeneous toolset profile components Extended)
表9-1-2-1
(表9-1-2-1的内容原文以图片给出,此处从略)
以下限制适用于符合MIV扩展混合VPCC扩展工具集配置文件组件的比特流或V3C子比特流集合:
-对于每个拼接图,asps中除了asps_geometry_3d_bit_depth_minus1之外的语法元素对于MIV和VPCC都应该具有相同的值。
-对于每个拼接图,如果asps_vpcc_extension_present_flag等于1,则asps_heterogeneous_miv_extension_present_flag、afps_heterogeneous_miv_extension_present_flag和afps_heterogeneous_type_extension_present_flag存在,且值应等于1。
-对于每个拼接图,如果asps_vpcc_extension_present_flag等于0,则asps_heterogeneous_miv_extension_present_flag、afps_heterogeneous_miv_extension_present_flag和afps_heterogeneous_type_extension_present_flag不存在。
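对上述后两条限制的示意性校验如下(Python,None表示语法元素不存在;函数命名为编辑性假设):

```python
def check_heterogeneous_profile_flags(asps_vpcc_flag: int,
                                      asps_het_flag, afps_het_flag,
                                      afps_type_flag) -> bool:
    ext_flags = [asps_het_flag, afps_het_flag, afps_type_flag]
    if asps_vpcc_flag == 1:
        return all(f == 1 for f in ext_flags)   # 三个扩展标志须存在且为1
    return all(f is None for f in ext_flags)    # 否则三个扩展标志均不应存在
```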
通用ASPS级字符串
ASPSCommonByteString(stringByte,posByte)函数定义如下:
(ASPSCommonByteString(stringByte,posByte)函数的定义原文以图片给出,此处从略)
VUI参数语义
vui_display_box_origin[d]指定相对于坐标系原点沿轴d的偏移。当vui_display_box_origin[d]的元素不存在时,应推断其值等于0。如果当前拼接图条带的afps_heterogeneous_tile_type等于0,用于表示vui_display_box_origin[d]的位数为asps_geometry_3d_bit_depth_minus1+1;如果当前拼接图条带的afps_heterogeneous_tile_type等于1,用于表示vui_display_box_origin[d]的位数为asps_geometry_3d_bit_depth_minus1_for_miv+1。d的值等于0、1和2分别对应于X、Y和Z轴。
vui_display_box_size[d]指定显示框沿轴d方向以样本计的大小。当vui_display_box_size[d]的元素不存在时,其值未知。如果当前拼接图条带的afps_heterogeneous_tile_type等于0,则用于表示vui_display_box_size[d]的位数为asps_geometry_3d_bit_depth_minus1+1;如果当前拼接图条带的afps_heterogeneous_tile_type等于1,则用于表示vui_display_box_size[d]的位数为asps_geometry_3d_bit_depth_minus1_for_miv+1。
以下变量来自显示框参数:
minOffset[d]=vui_display_box_origin[d]
maxOffset[d]=vui_display_box_origin[d]+vui_display_box_size[d]
d的值等于0、1和2分别对应于X、Y和Z轴。
vui_anchor_point_present_flag等于1表示vui_anchor_point[d]语法元素存在于vui_parameters()语法结构中。vui_anchor_point_present_flag等于0表示不存在vui_anchor_point[d]语法元素。
vui_anchor_point[d]表示锚点沿d轴的位置。如果当前拼接图条带的afps_heterogeneous_tile_type等于0,则vui_anchor_point[d]的值应在0到2^(asps_geometry_3d_bit_depth_minus1+1)-1的范围内;如果vui_anchor_point[d]不存在,则应推断为等于0;用于表示vui_anchor_point[d]的位数为asps_geometry_3d_bit_depth_minus1+1。如果当前拼接图条带的afps_heterogeneous_tile_type等于1,则vui_anchor_point[d]的值应在0到2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)-1的范围内;如果vui_anchor_point[d]不存在,则应推断为等于0;用于表示vui_anchor_point[d]的位数为asps_geometry_3d_bit_depth_minus1_for_miv+1。d的值等于0、1和2分别对应于X、Y和Z轴。
多视点标准,可以供ISO/IEC 23090-12使用
深度扩展过程
此过程将拼接图的整数深度值扩展为场景坐标中的浮点深度值(例如米)。
整数深度值可以缩放到实现定义的位深度和范围0…maxSampleD。否则,maxSampleD设置为2^(asps_geometry_2d_bit_depth_minus1+1)–1。
在特殊情况下,如果asps_heterogeneous_miv_extension_present_flag等于1,则maxSampleD设置为2^(ashm_geometry_2d_bit_depth_minus1+1)–1。
重建MPI过程
此过程从ptc_restricted_geometry_flag等于1的比特流中解码重建体积帧,即重建MPI帧。
注–所述重建过程将重建整个MPI帧。实现可以在投影到视口之前形成视口,而无需缓冲整个纹理和透明度层集。
该过程的输入包括:
-视图参数列表,包含索引为viewIdx的(唯一)源视图的内部和外部参数;
-对于每个拼接图:
-变量atlasID,即拼接图ID;
-变量AspsFrameHeight[atlasID]和AspsFrameWidth[atlasID]分别表示拼接图帧的行数和列数;
-二维阵列AtlasBlockToPatchMap;
-变量PatchPackingBlockSize;
-texFrame的三维数组尺寸为3×AspsFrameHeight[atlasID]×AspsFrameWidth[atlasID];
-transpFrame的二维数组尺寸为AspsFrameHeight[atlasID]×AspsFrameWidth[atlasID];
注–0透明度级别对应于完全透明的样本,而最大透明度级别2^(ai_attribute_2d_bit_depth_minus1[atlasID][attrIdx]+1)–1(其中attrIdx是透明度属性的索引)对应于完全不透明的样本。编码法则在最小和最大透明度级别之间是线性的。
-目标视图的外部和内部参数;
变量maxDepthSampleValue,表示编码几何体样本的最大值,设置为2^(asps_geometry_3d_bit_depth_minus1+1)–1;在特殊情况下,如果asps_heterogeneous_miv_extension_present_flag等于1,则变量maxDepthSampleValue设置为2^(asps_geometry_3d_bit_depth_minus1_for_miv+1)–1。
-常量maxNbLayers,表示MPI的最大深度层数,设置为maxDepthSampleValue+1。
针对融合多视点和点云的码流,现有标准无法实现在一张拼接图中存在多个条带、且每个条带分别为多视点的子图块合集或点云的子图块合集的情况:现有标准只能实现一张拼接图内存在一种类型的条带。因此需要扩展相关标准,以分辨一张拼接图中是否同时存在多视点类型条带和点云类型条带。本申请实施例通过在拼接图序列级别参数(ASPS)和拼接图帧级别参数(AFPS)中新增两个语法元素asps_heterogeneous_miv_extension_present_flag和afps_heterogeneous_miv_extension_present_flag,来指示码流是否同时包含点云和多视点。但与解码案例三不同的是,解码案例四将异构新增的用于多视点的相关ASPS语法元素和AFPS语法元素放在ASPS和AFPS中解析,而解码案例三则是将这些相关语法元素打包放在新的参数集中解析。
此外,解码案例四用一个新增的语法元素afps_heterogeneous_type_extension_present_flag来表示是否需要逐条带判断条带类型。
本申请实施例用于实现码流中同时存在多视点拼接图、点云拼接图、异构混合拼接图的编解码方案,并且扩展了相关标准。具备以下优点:1)针对由不同的格式的数据组成的应用场景,可以通过这种方式,为不同格式的数据(如3D网格、3D点云、多视图图像等)提供实时沉浸式视频交互服务,促进VR/AR/MR产业的发展;2)将多视点视频图像与点云格式的数据混合编码,和分别编码再调用各自解码器独立解多路信号相比,要调用的解码器数量少,充分利用解码器的处理像素率,对硬件要求降低;3)保留来自不同格式的数据(点云等)的渲染优点,提高图像的合成质量;4)进一步提升异构数据的重建质量和编码性能。
本申请实施例还提供了一种编码装置,图11为本申请一实施例提供的编码装置的示意性框图,该编码装置110应用于编码器。如图11所示,编码装置110包括:
处理单元1101,配置为对至少两种表达格式的视觉媒体内容进行处理,得到至少两种同构区块;
拼接单元1102,配置为对所述至少两种同构区块进行拼接,得到拼接图和拼接图信息,其中,所述拼接图为异构混合拼接图时,所述拼接图中包括至少两种同构区块,不同种同构区块对应不同的视觉媒体内容表达格式和不同的同构区块信息;
编码单元1103,配置为对所述拼接图和拼接图信息进行编码,得到码流。
在一些实施例中,所述拼接图信息中包括第一语法元素,根据所述第一语法元素确定所述拼接图为异构混合拼接图或者同构拼接图。
在一些实施例中,所述第一语法元素包括第一子语法元素和第二子语法元素;所述根据所述第一语法元素确定所述拼接图为异构混合拼接图或者同构拼接图,包括:
如果所述第一子语法元素的取值和所述第二子语法元素的取值相等时,根据所述取值确定所述拼接图为异构混合拼接图或者同构拼接图。
在一些实施例中,所述根据所述取值确定所述拼接图为异构混合拼接图或者同构拼接图,包括:所述取值为第一预设值,则确定所述拼接图为异构混合拼接图;所述取值为第二预设值,则确定所述拼接图为同构拼接图。
在一些实施例中,所述根据所述取值确定所述拼接图为异构混合拼接图或者同构拼接图,包括:
所述取值为第三预设值,则确定所述拼接图为包括第一表达格式和第二表达格式的同构区块的异构混合拼接图,其中,所述第一表达格式和所述第二表达格式为不同表达格式;
所述取值为第四预设值,则确定所述拼接图为包括所述第一表达格式的同构区块的同构拼接图;
所述取值为第五预设值,则确定所述拼接图为包括所述第二表达格式的同构区块的同构拼接图。
在一些实施例中,所述第一子语法元素为拼接图序列参数集ASPS的语法元素,所述第二子语法元素为拼接图帧参数集AFPS的语法元素。
在一些实施例中,所述方法包括:在ASPS解析第一子语法元素;根据所述第一子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图;
在AFPS解析第二子语法元素;根据所述第二子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图。
在一些实施例中,所述拼接图信息中不包括所述第一语法元素,确定所述拼接图为同构拼接图。
在一些实施例中,所述拼接图为异构混合拼接图时,所述拼接图信息还包括第二语法元素;根据所述第二语法元素确定所述拼接图中同构区块的表达格式。
在一些实施例中,所述根据所述第二语法元素确定所述拼接图中同构区块的表达格式,包括:第i个区块的第二语法元素的取值为第六预设值时,则确定所述第i个区块的表达格式为第一表达格式;第i个区块的第二语法元素的取值为第七预设值时,则确定所述第i个区块的表达格式为第二表达格式。
在一些实施例中,第二语法元素包括:第三子语法元素和第四子语法元素;
所述方法包括:在ASPS解析第一子语法元素;根据所述第一子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图;在AFPS解析第二子语法元素和第三子语法元素;根据所述第二子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图;根据所述第三子语法元素的取值确定所述拼接图为异构混合拼接图时,根据所述第四子语法元素确定所述拼接图中同构区块的表达格式。这里,根据所述第三子语法元素的取值确定所述拼接图为异构混合拼接图时,解析每个同构区块的第四子语法元素,根据第四子语法元素的取值确定每个同构区块的表达格式。
在一些实施例中,所述拼接图为异构混合拼接图时,所述拼接图信息中包括至少两种同构区块信息,其中,不同表达格式的同构区块对应不同的同构区块信息。
在一些实施例中,所述同构区块信息包括ASPS的语法元素和AFPS的语法元素;不同的同构区块信息对应不同的所述ASPS的语法元素和所述AFPS的语法元素。
在一些实施例中,所述同构区块为多视点视频区块时对应第一同构区块信息,所述同构区块为点云区块时对应第二同构区块信息;所述第一同构区块信息和所述第二同构区块信息包括共享的所述ASPS参数集的语法元素和所述AFPS参数集的语法元素;所述第一同构区块信息还包括所述ASPS参数集的扩展语法元素和所述AFPS参数集的扩展语法元素。
在一些实施例中,所述同构区块为多视点视频区块时对应第一同构区块信息,所述同构区块为点云区块时对应第二同构区块信息;所述第一同构区块信息和所述第二同构区块信息包括共享的所述ASPS参数集的语法元素和所述AFPS参数集的语法元素;所述第一同构区块信息还包括所述ASPS参数集的第一扩展语法元素,用于表示重建几何内容的几何坐标的位深度。
在一些实施例中,所述码流的参数集子码流中包括第三语法元素,根据所述第三语法元素确定所述码流中包括至少一种表达格式的视觉媒体内容对应的码流。
在一些实施例中,所述根据所述第三语法元素确定所述码流中包括至少一种表达格式的视觉媒体内容对应的码流,包括:所述第三语法元素为第一数值,确定所述码流中同时包括第一表达格式的视觉媒体内容对应的码流和第二表达格式的视觉媒体内容对应的码流;所述第三语法元素为第二数值,确定所述码流中包括所述第一表达格式的视觉媒体内容对应的码流;所述第三语法元素为第三数值,确定所述码流中包括所述第二表达格式的视觉媒体内容对应的码流。
在一些实施例中,所述编码单元1103,配置为对所述拼接图进行编码,得到视频压缩子码流;对所述拼接图信息进行编码,得到拼接图信息子码流;将所述视频压缩子码流和所述拼接图信息子码流合成所述码流。
在一些实施例中,所述表达格式为多视点视频、点云或网格。
在一些实施例中,所述异构混合拼接图包括以下至少一种:单一属性异构混合拼接图和多属性异构混合拼接图;所述同构拼接图包括以下至少一种:单一属性同构拼接图和多属性同构拼接图。
本申请实施例还提供了一种解码装置,图12为本申请一实施例提供的解码装置的示意性框图,该解码装置120应用于解码器。如图12所示,解码装置120包括:
解码单元1201,配置为解码码流,得到拼接图和拼接图信息;
拆分单元1202,配置为所述拼接图为异构混合拼接图时,根据所述拼接图和所述拼接图信息,得到至少两种同构区块和同构区块信息;其中,不同种同构区块对应不同的视觉媒体内容表达格式和不同的同构区块信息;
所述拆分单元1202,配置为所述拼接图为同构拼接图时,根据所述拼接图和所述拼接图信息,得到一种同构区块和同构区块信息;
处理单元1203,配置为根据所述同构区块和所述同构区块信息,得到至少两种表达格式的视觉媒体内容。
在一些实施例中,所述拼接图信息中包括第一语法元素,根据所述第一语法元素确定所述拼接图为异构混合拼接图或者同构拼接图。
在一些实施例中,所述第一语法元素包括第一子语法元素和第二子语法元素;所述根据所述第一语法元素确定所述拼接图为异构混合拼接图或者同构拼接图,包括:如果所述第一子语法元素的取值和所述第二子语法元素的取值相等时,根据所述取值确定所述拼接图为异构混合拼接图或者同构拼接图。
在一些实施例中,所述根据所述取值确定所述拼接图为异构混合拼接图或者同构拼接图,包括:所述取值为第一预设值,则确定所述拼接图为异构混合拼接图;所述取值为第二预设值,则确定所述拼接图为同构拼接图。
在一些实施例中,所述根据所述取值确定所述拼接图为异构混合拼接图或者同构拼接图,包括:所述取值为第三预设值,则确定所述拼接图为包括第一表达格式和第二表达格式的同构区块的异构混合拼接图,其中,所述第一表达格式和所述第二表达格式为不同表达格式;所述取值为第四预设值,则确定 所述拼接图为包括所述第一表达格式的同构区块的同构拼接图;所述取值为第五预设值,则确定所述拼接图为包括所述第二表达格式的同构区块的同构拼接图。
在一些实施例中,所述第一子语法元素为拼接图序列参数集ASPS的语法元素,所述第二子语法元素为拼接图帧参数集AFPS的语法元素。
在一些实施例中,所述拼接图信息中不包括所述第一语法元素,确定所述拼接图为同构拼接图。
在一些实施例中,所述拼接图为异构混合拼接图时,所述拼接图信息还包括第二语法元素;根据所述第二语法元素确定所述拼接图中同构区块的表达格式。
在一些实施例中,所述根据所述第二语法元素确定所述拼接图中同构区块的表达格式,包括:第i个区块的第二语法元素的取值为第六预设值时,则确定所述第i个区块的表达格式为第一表达格式;第i个区块的第二语法元素的取值为第七预设值时,则确定所述第i个区块的表达格式为第二表达格式。
在一些实施例中,第二语法元素包括:第三子语法元素和第四子语法元素;
所述方法包括:在ASPS解析第一子语法元素;根据所述第一子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图;在AFPS解析第二子语法元素和第三子语法元素;根据所述第二子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图;根据所述第三子语法元素的取值确定所述拼接图为异构混合拼接图时,根据所述第四子语法元素确定所述拼接图中同构区块的表达格式。
在一些实施例中,所述拼接图为异构混合拼接图时,所述拼接图信息中包括至少两种同构区块信息,其中,不同表达格式的同构区块对应不同的同构区块信息。
在一些实施例中,所述拼接图为异构混合拼接图时,根据所述拼接图和所述拼接图信息,得到至少两种同构区块和同构区块信息,包括:所述拼接图为异构混合拼接图时,对所述拼接图进行拆分得到至少两种同构区块;根据所述至少两种同构区块的表达格式,从所述拼接图信息获取所述至少两种同构区块对应的同构区块信息。
在一些实施例中,所述同构区块信息包括ASPS的语法元素和AFPS的语法元素;不同的同构区块信息对应不同的所述ASPS的语法元素和所述AFPS的语法元素。
在一些实施例中,所述同构区块为多视点视频区块时对应第一同构区块信息,所述同构区块为点云区块时对应第二同构区块信息;所述第一同构区块信息和所述第二同构区块信息包括共享的所述ASPS参数集的语法元素和所述AFPS参数集的语法元素。在一些实施例中,所述第一同构区块信息还包括所述ASPS参数集的扩展语法元素和所述AFPS参数集的扩展语法元素。在一些实施例中,所述第一同构区块信息还包括所述ASPS参数集的第一扩展语法元素,用于表示重建几何内容的几何坐标的位深度。
在一些实施例中,所述同构区块为多视点视频区块时对应第一同构区块信息,所述同构区块为点云区块时对应第二同构区块信息;所述第一同构区块信息和所述第二同构区块信息包括共享的所述ASPS参数集的语法元素和所述AFPS参数集的语法元素;所述第一同构区块信息还包括所述ASPS参数集的第一扩展语法元素,用于表示重建几何内容的几何坐标的位深度。
在一些实施例中,所述码流的参数集子码流中包括第三语法元素,根据所述第三语法元素确定所述码流中包括至少一种表达格式的视觉媒体内容对应的码流。
在一些实施例中,所述根据所述第三语法元素确定所述码流中包括至少一种表达格式的视觉媒体内容对应的码流,包括:所述第三语法元素为第一数值,确定所述码流中同时包括第一表达格式的视觉媒体内容对应的码流和第二表达格式的视觉媒体内容对应的码流;所述第三语法元素为第二数值,确定所述码流中包括所述第一表达格式的视觉媒体内容对应的码流;所述第三语法元素为第三数值,确定所述码流中包括所述第二表达格式的视觉媒体内容对应的码流。
在一些实施例中,所述解码码流,得到拼接图和拼接图信息,包括:根据所述第二语法元素确定所述码流中包括至少两种表达格式的视觉媒体内容对应的码流,解码所述码流得到异构混合拼接图和拼接图信息。
在一些实施例中,所述解码单元1201,配置为解码所述视频压缩子码流,得到所述拼接图;解码所述拼接图信息子码流,得到所述拼接图信息。
在一些实施例中,所述表达格式为多视点视频、点云或网格。
在一些实施例中,所述异构混合拼接图包括以下至少一种:单一属性异构混合拼接图和多属性异构混合拼接图;所述同构拼接图包括以下至少一种:单一属性同构拼接图和多属性同构拼接图。
应理解,装置实施例与方法实施例可以相互对应,类似的描述可以参照方法实施例。为避免重复,此处不再赘述。
上文中结合附图从功能单元的角度描述了本申请实施例的装置和系统。应理解,该功能单元可以通过硬件形式实现,也可以通过软件形式的指令实现,还可以通过硬件和软件单元组合实现。具体地,本申请实施例中的方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路和/或软件形式的指令完 成,结合本申请实施例公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件单元组合执行完成。可选地,软件单元可以位于随机存储器,闪存、只读存储器、可编程只读存储器、电可擦写可编程存储器、寄存器等本领域的成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法实施例中的步骤。
在实际应用中,本申请实施例还提供了一种编码器,图13为本申请一实施例提供的编码器的示意性框图,如图13所示,编码器1310包括:
第二存储器1320和第二处理器1330;第二存储器1320存储有可在第二处理器1330上运行的计算机程序,第二处理器1330执行所述程序时实现编码器侧的编码方法。
在实际应用中,本申请实施例还提供了一种解码器,图14为本申请一实施例提供的解码器的示意性框图,如图14所示,解码器1410包括:
第一存储器1420和第一处理器1430;第一存储器1420存储有可在第一处理器1430上运行的计算机程序,第一处理器1430执行所述程序时实现解码器侧的解码方法。
在本申请的一些实施例中,该处理器可以包括但不限于:
通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等等。
在本申请的一些实施例中,该存储器包括但不限于:
易失性存储器和/或非易失性存储器。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synch link DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。
另外,在本实施例中的各功能模块可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
在本申请的再一实施例中,参见图15,其示出了本申请实施例提供的一种编解码系统的组成结构示意图。如图15所示,编解码系统150可以包括编码器1501和解码器1502。其中,编码器1501可以为集成有前述实施例所述编码装置的设备;解码器1502可以为集成有前述实施例所述解码装置的设备。
在本申请实施例中,该编解码系统150中,无论是编码器1501还是解码器1502,均可以利用相邻参考像素与待预测像素的颜色分量信息,实现待预测像素对应加权系数的计算;而且不同的参考像素可以具有不同的加权系数,将此加权系数应用于当前块中待预测像素的色度预测,不仅可以提高色度预测的准确性,节省码率,而且还能够提升编解码性能。
本申请实施例还提供一种芯片,用于实现上述编解码方法。具体地,该芯片包括:处理器,用于从存储器中调用并运行计算机程序,使得安装有该芯片的电子设备执行如上述编解码方法。
本申请实施例还提供一种计算机存储介质,其中存储有计算机程序,该计算机程序被第二处理器执行时,实现编码器的编码方法;或者,该计算机程序被第一处理器执行时,实现解码器的解码方法。或者说,本申请实施例还提供一种包含指令的计算机程序产品,该指令被计算机执行时使得计算机执行上述方法实施例的方法。
本申请还提供了一种码流,该码流是根据上述编码方法生成的,可选的,该码流中包括上述第一语法元素,或者包括第二语法元素和第三语法元素。
当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行该计算机程序指令时,全部或部分地产生按照本申请实施例该的流程或功能。该计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。该计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,该计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。该计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。该可用介质 可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如数字视频光盘(digital video disc,DVD))、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
本领域普通技术人员可以意识到,结合本申请中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,该单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。例如,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
应当理解,尽管在本申请可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开,不必用于描述特定的顺序或先后次序。例如,在不脱离本申请范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息,第二信息可以在第一信息之前、之后或同时出现。
以上内容,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以该权利要求的保护范围为准。
工业实用性
本申请提供一种编解码方法、装置、编码器、解码器及存储介质,针对包括一种或多种表达格式的视觉媒体内容的应用场景,将不同表达格式的同构区块拼接成一张异构混合拼接图,并在这一张异构混合拼接图中进行编解码,能够减少调用的编码器和解码器的个数,降低实现代价,提高易用性。而且,在异构混合拼接图中,不同表达格式区块的某些高层参数可以不相等,从而为异构数据提供更合适的高层参数,能有效提升编码效率,即减少码率或提高重构多视点视频或点云视频的质量。

Claims (43)

  1. 一种解码方法,其中,包括:
    解码码流,得到拼接图和拼接图信息;
    所述拼接图为异构混合拼接图时,根据所述拼接图和所述拼接图信息,得到至少两种同构区块和同构区块信息;其中,不同种同构区块对应不同的视觉媒体内容表达格式和不同的同构区块信息;
    所述拼接图为同构拼接图时,根据所述拼接图和所述拼接图信息,得到一种同构区块和同构区块信息;
    根据所述同构区块和所述同构区块信息,得到至少两种表达格式的视觉媒体内容。
  2. 根据权利要求1所述的方法,其中,所述拼接图信息中包括第一语法元素,根据所述第一语法元素确定所述拼接图为异构混合拼接图或者同构拼接图。
  3. 根据权利要求2所述的方法,其中,所述第一语法元素包括第一子语法元素和第二子语法元素;
    所述根据所述第一语法元素确定所述拼接图为异构混合拼接图或者同构拼接图,包括:
    如果所述第一子语法元素的取值和所述第二子语法元素的取值相等时,根据所述取值确定所述拼接图为异构混合拼接图或者同构拼接图。
  4. 根据权利要求3所述的方法,其中,
    所述根据所述取值确定所述拼接图为异构混合拼接图或者同构拼接图,包括:
    所述取值为第一预设值,则确定所述拼接图为异构混合拼接图;
    所述取值为第二预设值,则确定所述拼接图为同构拼接图。
  5. 根据权利要求3所述的方法,其中,
    所述根据所述取值确定所述拼接图为异构混合拼接图或者同构拼接图,包括:
    所述取值为第三预设值,则确定所述拼接图为包括第一表达格式和第二表达格式的同构区块的异构混合拼接图,其中,所述第一表达格式和所述第二表达格式为不同表达格式;
    所述取值为第四预设值,则确定所述拼接图为包括所述第一表达格式的同构区块的同构拼接图;
    所述取值为第五预设值,则确定所述拼接图为包括所述第二表达格式的同构区块的同构拼接图。
  6. 根据权利要求3所述的方法,其中,所述第一子语法元素为拼接图序列参数集ASPS的语法元素,所述第二子语法元素为拼接图帧参数集AFPS的语法元素。
  7. 根据权利要求6所述的方法,其中,所述方法包括:
    在ASPS解析第一子语法元素;
    根据所述第一子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图;
    在AFPS解析第二子语法元素;
    根据所述第二子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图。
  8. 根据权利要求7所述的方法,其中,所述第一子语法元素为ASPS中的新增语法元素,或者所述第一子语法元素为ASPS中至少两个语法元素经过逻辑运算得到的语法元素;
    所述第二子语法元素为AFPS中的新增语法元素,或者所述第二子语法元素为ASPS中至少两个语法元素经过逻辑运算得到的语法元素。
  9. 根据权利要求1-8任一项所述的方法,其中,所述拼接图为异构混合拼接图时,所述拼接图信息还包括第二语法元素;根据所述第二语法元素确定所述拼接图中同构区块的表达格式。
  10. 根据权利要求9所述的方法,其中,
    所述根据所述第二语法元素确定所述拼接图中同构区块的表达格式,包括:
    第i个区块的第二语法元素的取值为第六预设值时,则确定所述第i个区块的表达格式为第一表达格式;
    第i个区块的第二语法元素的取值为第七预设值时,则确定所述第i个区块的表达格式为第二表达格式。
  11. 根据权利要求9所述的方法,其中,第二语法元素包括:第三子语法元素和第四子语法元素;所述方法包括:
    在ASPS解析第一子语法元素;
    根据所述第一子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图;
    在AFPS解析第二子语法元素和第三子语法元素;
    根据所述第二子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图;
    根据所述第三子语法元素的取值确定所述拼接图为异构混合拼接图时,根据所述第四子语法元素确定所述拼接图中同构区块的表达格式。
  12. 根据权利要求1所述的方法,其中,所述拼接图为异构混合拼接图时,所述拼接图信息中包括 至少两种同构区块信息,其中,不同表达格式的同构区块对应不同的同构区块信息。
  13. 根据权利要求12所述的方法,其中,所述拼接图为异构混合拼接图时,根据所述拼接图和所述拼接图信息,得到至少两种同构区块和同构区块信息,包括:
    所述拼接图为异构混合拼接图时,对所述拼接图进行拆分得到至少两种同构区块;
    根据所述至少两种同构区块的表达格式,从所述拼接图信息获取所述至少两种同构区块对应的同构区块信息。
  14. 根据权利要求12或13所述的方法,其中,所述同构区块信息包括ASPS的语法元素和AFPS的语法元素;
    不同的同构区块信息对应不同的所述ASPS的语法元素和所述AFPS的语法元素。
  15. 根据权利要求12-14任一项所述的方法,其中,所述同构区块为多视点视频区块时对应第一同构区块信息,所述同构区块为点云区块时对应第二同构区块信息;
    所述第一同构区块信息和所述第二同构区块信息包括共享的所述ASPS参数集的语法元素和所述AFPS参数集的语法元素;
    所述第一同构区块信息还包括所述ASPS参数集的第一扩展语法元素,用于表示重建几何内容的几何坐标的位深度。
  16. 根据权利要求1-15任一项所述的方法,其中,所述码流的参数集子码流中包括第三语法元素,根据所述第三语法元素确定所述码流中包括至少一种表达格式的视觉媒体内容对应的码流。
  17. 根据权利要求16所述的方法,其中,所述根据所述第三语法元素确定所述码流中包括至少一种表达格式的视觉媒体内容对应的码流,包括:
    所述第三语法元素为第一数值,确定所述码流中同时包括第一表达格式的视觉媒体内容对应的码流和第二表达格式的视觉媒体内容对应的码流;
    所述第三语法元素为第二数值,确定所述码流中包括所述第一表达格式的视觉媒体内容对应的码流;
    所述第三语法元素为第三数值,确定所述码流中包括所述第二表达格式的视觉媒体内容对应的码流。
  18. 根据权利要求1-17任一项所述的方法,其中,所述码流包括视频压缩子码流和拼接图信息子码流,所述解码码流,得到拼接图和拼接图信息,包括:
    解码所述视频压缩子码流,得到所述拼接图;
    解码所述拼接图信息子码流,得到所述拼接图信息。
  19. 根据权利要求1-18任一项所述的方法,其中,所述表达格式为多视点视频、点云或网格。
  20. 一种编码方法,其中,包括:
    对至少两种表达格式的视觉媒体内容进行处理,得到至少两种同构区块;
    对所述至少两种同构区块进行拼接,得到拼接图和拼接图信息,其中,所述拼接图为异构混合拼接图时,所述拼接图中包括至少两种同构区块,不同种同构区块对应不同的视觉媒体内容表达格式和不同的同构区块信息;
    对所述拼接图和拼接图信息进行编码,得到码流。
  21. 根据权利要求20所述的方法,其中,所述拼接图信息中包括第一语法元素,根据所述第一语法元素确定所述拼接图为异构混合拼接图或者同构拼接图。
  22. 根据权利要求21所述的方法,其中,所述第一语法元素包括第一子语法元素和第二子语法元素;
    所述根据所述第一语法元素确定所述拼接图为异构混合拼接图或者同构拼接图,包括:
    如果所述第一子语法元素的取值和所述第二子语法元素的取值相等时,根据所述取值确定所述拼接图为异构混合拼接图或者同构拼接图。
  23. 根据权利要求22所述的方法,其中,
    所述根据所述取值确定所述拼接图为异构混合拼接图或者同构拼接图,包括:
    所述取值为第一预设值,则确定所述拼接图为异构混合拼接图;
    所述取值为第二预设值,则确定所述拼接图为同构拼接图。
  24. 根据权利要求22所述的方法,其中,
    所述根据所述取值确定所述拼接图为异构混合拼接图或者同构拼接图,包括:
    所述取值为第三预设值,则确定所述拼接图为包括第一表达格式和第二表达格式的同构区块的异构混合拼接图,其中,所述第一表达格式和所述第二表达格式为不同表达格式;
    所述取值为第四预设值,则确定所述拼接图为包括所述第一表达格式的同构区块的同构拼接图;
    所述取值为第五预设值,则确定所述拼接图为包括所述第二表达格式的同构区块的同构拼接图。
  25. 根据权利要求22所述的方法,其中,所述第一子语法元素为拼接图序列参数集ASPS的语法 元素,所述第二子语法元素为拼接图帧参数集AFPS的语法元素。
  26. 根据权利要求25所述的方法,其中,所述方法包括:
    在ASPS解析第一子语法元素;
    根据所述第一子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图;
    在AFPS解析第二子语法元素;
    根据所述第二子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图。
  27. 根据权利要求26所述的方法,其中,
    所述第一子语法元素为ASPS中的新增语法元素,或者所述第一子语法元素为ASPS中至少两个语法元素经过逻辑运算得到的语法元素;
    所述第二子语法元素为AFPS中的新增语法元素,或者所述第二子语法元素为ASPS中至少两个语法元素经过逻辑运算得到的语法元素。
  28. 根据权利要求20-27任一项所述的方法,其中,所述拼接图为异构混合拼接图时,所述拼接图信息还包括第二语法元素;根据所述第二语法元素确定所述拼接图中同构区块的表达格式。
  29. 根据权利要求28所述的方法,其中,
    所述根据所述第二语法元素确定所述拼接图中同构区块的表达格式,包括:
    第i个区块的第二语法元素的取值为第六预设值时,则确定所述第i个区块的表达格式为第一表达格式;
    第i个区块的第二语法元素的取值为第七预设值时,则确定所述第i个区块的表达格式为第二表达格式。
  30. 根据权利要求29所述的方法,其中,第二语法元素包括:第三子语法元素和第四子语法元素;
    在ASPS解析第一子语法元素;
    根据所述第一子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图;
    在AFPS解析第二子语法元素和第三子语法元素;
    根据所述第二子语法元素的取值确定所述拼接图为异构混合拼接图或者同构拼接图;
    根据所述第三子语法元素的取值确定所述拼接图为异构混合拼接图时,根据所述第四子语法元素确定所述拼接图中同构区块的表达格式。
  31. 根据权利要求20所述的方法,其中,所述拼接图为异构混合拼接图时,所述拼接图信息中包括 至少两种同构区块信息,其中,不同表达格式的同构区块对应不同的同构区块信息。
  32. 根据权利要求31所述的方法,其中,所述同构区块信息包括ASPS的语法元素和AFPS的语法元素;
    不同的同构区块信息对应不同的所述ASPS的语法元素和所述AFPS的语法元素。
  33. 根据权利要求31或32所述的方法,其中,
    所述同构区块为多视点视频区块时对应第一同构区块信息,所述同构区块为点云区块时对应第二同构区块信息;
    所述第一同构区块信息和所述第二同构区块信息包括共享的所述ASPS参数集的语法元素和所述AFPS参数集的语法元素;
    所述第一同构区块信息还包括所述ASPS参数集的第一扩展语法元素,用于表示重建几何内容的几何坐标的位深度。
  34. 根据权利要求20-33任一项所述的方法,其中,所述码流的参数集子码流中包括第三语法元素,根据所述第三语法元素确定所述码流中包括至少一种表达格式的视觉媒体内容对应的码流。
  35. 根据权利要求34所述的方法,其中,所述根据所述第三语法元素确定所述码流中包括至少一种表达格式的视觉媒体内容对应的码流,包括:
    所述第三语法元素为第一数值,确定所述码流中同时包括第一表达格式的视觉媒体内容对应的码流和第二表达格式的视觉媒体内容对应的码流;
    所述第三语法元素为第二数值,确定所述码流中包括所述第一表达格式的视觉媒体内容对应的码流;
    所述第三语法元素为第三数值,确定所述码流中包括所述第二表达格式的视觉媒体内容对应的码流。
  36. 根据权利要求20-35任一项所述的方法,其中,所述对所述拼接图和拼接图信息进行编码,得到码流,包括:
    对所述拼接图进行编码,得到视频压缩子码流;
    对所述拼接图信息进行编码,得到拼接图信息子码流;
    将所述视频压缩子码流和所述拼接图信息子码流合成所述码流。
  37. 根据权利要求20-36任一项所述的方法,其中,所述表达格式为多视点视频、点云或网格。
  38. 一种解码装置,其中,包括:
    解码单元,配置为解码码流,得到拼接图和拼接图信息;
    拆分单元,配置为所述拼接图为异构混合拼接图时,根据所述拼接图和所述拼接图信息,得到至少两种同构区块和同构区块信息;其中,不同种同构区块对应不同的视觉媒体内容表达格式和不同的同构区块信息;
    所述拆分单元,配置为所述拼接图为同构拼接图时,根据所述拼接图和所述拼接图信息,得到一种同构区块和同构区块信息;
    处理单元,配置为根据所述同构区块和所述同构区块信息,得到至少两种表达格式的视觉媒体内容。
  39. 一种编码装置,其中,包括:
    处理单元,配置为对至少两种表达格式的视觉媒体内容进行处理,得到至少两种同构区块;
    拼接单元,配置为对所述至少两种同构区块进行拼接,得到拼接图和拼接图信息,其中,所述拼接图为异构混合拼接图时,所述拼接图中包括至少两种同构区块,不同种同构区块对应不同的视觉媒体内容表达格式和不同的同构区块信息;
    编码单元,配置为对所述拼接图和拼接图信息进行编码,得到码流。
  40. 一种解码器,其中,所述解码器包括:
    第一存储器和第一处理器;
    所述第一存储器存储有可在第一处理器上运行的计算机程序,所述第一处理器执行所述程序时实现权利要求1至19任一项所述解码方法。
  41. 一种编码器,其中,所述编码器包括:
    第二存储器和第二处理器;
    所述第二存储器存储有可在第二处理器上运行的计算机程序,所述第二处理器执行所述程序时实现权利要求20至37任一项所述编码方法。
  42. 一种计算机可读存储介质,其中,所述计算机可读存储介质存储有计算机程序,所述计算机程序被第一处理器执行时,实现权利要求1至19任一项所述解码方法;或者,所述计算机程序被第二处理器执行时,实现权利要求20至37任一项所述编码方法。
  43. 一种码流,其中,所述码流是基于如上述权利要求20至37任一项所述的方法生成的。
PCT/CN2023/071083 2022-10-14 2023-01-06 一种编解码方法、装置、编码器、解码器及存储介质 WO2024077806A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2022/125525 2022-10-14
PCT/CN2022/125525 WO2024077637A1 (zh) 2022-10-14 2022-10-14 一种编解码方法、装置、编码器、解码器及存储介质

Publications (1)

Publication Number Publication Date
WO2024077806A1 true WO2024077806A1 (zh) 2024-04-18

Family

ID=90668606

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2022/125525 WO2024077637A1 (zh) 2022-10-14 2022-10-14 一种编解码方法、装置、编码器、解码器及存储介质
PCT/CN2023/071083 WO2024077806A1 (zh) 2022-10-14 2023-01-06 一种编解码方法、装置、编码器、解码器及存储介质

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125525 WO2024077637A1 (zh) 2022-10-14 2022-10-14 一种编解码方法、装置、编码器、解码器及存储介质

Country Status (1)

Country Link
WO (2) WO2024077637A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114009053A (zh) * 2019-06-20 2022-02-01 诺基亚技术有限公司 用于视频编码和解码的装置、方法和计算机程序
US20220159298A1 (en) * 2019-05-14 2022-05-19 Intel Corporation IMMERSIVE VIDEO CODING TECHNIQUES FOR THREE DEGREE OF FREEDOM PLUS/METADATA FOR IMMERSIVE VIDEO (3DoF+/MIV) AND VIDEO-POINT CLOUD CODING (V-PCC)
CN114868396A (zh) * 2019-12-11 2022-08-05 交互数字Vc控股公司 用于多视点3DoF+内容的编码和解码的方法和装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008084891A1 (en) * 2007-01-11 2008-07-17 Electronics And Telecommunications Research Institute Method and apparatus for encoding/decoding 3d mesh information including stitching information
JP6675475B2 (ja) * 2015-08-20 2020-04-01 コニンクリーケ・ケイピーエヌ・ナムローゼ・フェンノートシャップ メディア・ストリームに基づくタイルド・ビデオの形成
KR20210134391A (ko) * 2019-03-12 2021-11-09 후아웨이 테크놀러지 컴퍼니 리미티드 포인트 클라우드 코딩을 위한 패치 데이터 유닛 코딩 및 디코딩

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220159298A1 (en) * 2019-05-14 2022-05-19 Intel Corporation IMMERSIVE VIDEO CODING TECHNIQUES FOR THREE DEGREE OF FREEDOM PLUS/METADATA FOR IMMERSIVE VIDEO (3DoF+/MIV) AND VIDEO-POINT CLOUD CODING (V-PCC)
CN114009053A (zh) * 2019-06-20 2022-02-01 诺基亚技术有限公司 用于视频编码和解码的装置、方法和计算机程序
CN114868396A (zh) * 2019-12-11 2022-08-05 交互数字Vc控股公司 用于多视点3DoF+内容的编码和解码的方法和装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
D. MEHLEM (RWTH-AACHEN), C. ROHLFING (RWTH): "Versatile Video Coding for VPCC", 18. MPEG MEETING; 20200420 - 20200424; ALPBACH; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11), 24 April 2020 (2020-04-24), XP030287078 *

Also Published As

Publication number Publication date
WO2024077637A1 (zh) 2024-04-18

Similar Documents

Publication Publication Date Title
KR102334629B1 (ko) 포인트 클라우드 데이터 송신 장치, 포인트 클라우드 데이터 송신 방법, 포인트 클라우드 데이터 수신 장치 및 포인트 클라우드 데이터 수신 방법
KR102292195B1 (ko) 포인트 클라우드 데이터 송신 장치, 포인트 클라우드 데이터 송신 방법, 포인트 클라우드 데이터 수신 장치 및 포인트 클라우드 데이터 수신 방법
US11979605B2 (en) Attribute layers and signaling in point cloud coding
US20220159261A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
EP3804320A1 (en) High-level syntax designs for point cloud coding
US20220141487A1 (en) Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device, and point cloud data receiving method
TW201830965A (zh) 用於時間延展性支持之修改適應性迴路濾波器時間預測
CN114009051B (zh) 用于v-pcc的假设参考解码器
CN113826391A (zh) 视频编解码中最小编码块大小的范围
US20230344999A1 (en) Explicit Address Signaling In Video Coding
WO2023142127A1 (zh) 编解码方法、装置、设备、及存储介质
KR20210105980A (ko) 비디오 인코더, 비디오 디코더 및 상응하는 방법들
WO2022166462A1 (zh) 编码、解码方法和相关设备
CN113228519A (zh) 任意和环绕分块分组
CN113973210B (zh) 媒体文件封装方法、装置、设备及存储介质
WO2024077806A1 (zh) 一种编解码方法、装置、编码器、解码器及存储介质
WO2024011386A1 (zh) 一种编解码方法、装置、编码器、解码器及存储介质
WO2023201504A1 (zh) 编解码方法、装置、设备及存储介质
WO2024213011A1 (en) Visual volumetric video-based coding method, encoder and decoder
WO2024213012A1 (en) Visual volumetric video-based coding method, encoder and decoder
WO2024151494A2 (en) Visual volumetric video-based coding method, encoder and decoder
TW202046739A (zh) 用於適應性迴圈濾波器變數之適應變數集合

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23876021

Country of ref document: EP

Kind code of ref document: A1