WO2023201504A1 - Encoding method and apparatus, decoding method and apparatus, device, and storage medium - Google Patents


Info

Publication number
WO2023201504A1
Authority
WO
WIPO (PCT)
Prior art keywords
strips
code stream
reconstructed
information
flag
Prior art date
Application number
PCT/CN2022/087523
Other languages
English (en)
Chinese (zh)
Inventor
虞露
朱志伟
金峡钶
戴震宇
Original Assignee
浙江大学
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学 and Oppo广东移动通信有限公司
Priority to PCT/CN2022/087523
Publication of WO2023201504A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Definitions

  • The present application relates to the field of image processing technology, and in particular, to an encoding and decoding method and apparatus, a device, and a storage medium.
  • Visual media objects with different expression formats may appear in the same scene. For example, in the same three-dimensional scene, the scene background and some characters and objects are expressed as video, while another part of the characters is expressed as a three-dimensional point cloud or a three-dimensional mesh.
  • Current encoding and decoding technology encodes and decodes multi-view video, point clouds, and meshes separately. As a result, a large number of codecs need to be called during the encoding and decoding process, making encoding and decoding expensive.
  • Embodiments of the present application provide an encoding and decoding method, apparatus, device, and storage medium to reduce the number of codecs called in the encoding and decoding process and thereby reduce encoding and decoding costs.
  • In a first aspect, this application provides an encoding method, including:
  • processing at least two visual media contents to obtain at least two homogeneous strips, where the visual media contents correspond to at least two different expression formats and the homogeneous strips correspond to at least two different expression formats;
  • splicing the at least two homogeneous strips to generate a heterogeneous hybrid spliced image; and
  • encoding the heterogeneous hybrid spliced image to obtain a code stream.
  • In a second aspect, embodiments of the present application provide a decoding method, including:
  • processing at least two reconstructed homogeneous strips to obtain at least two reconstructed visual media contents, where the reconstructed visual media contents correspond to at least two different expression formats.
  • In a third aspect, the present application provides an encoding device for executing the method in the above first aspect or its respective implementations. Specifically, the encoding device includes functional units for executing the method in the above first aspect or its respective implementations.
  • In a fourth aspect, this application provides a decoding device for performing the method in the above second aspect or its respective implementations. Specifically, the decoding device includes functional units for executing the method in the above second aspect or its respective implementations.
  • A fifth aspect provides an encoder, including a processor and a memory.
  • the memory is used to store a computer program
  • the processor is used to call and run the computer program stored in the memory to execute the method in the above first aspect or its respective implementations.
  • a sixth aspect provides a decoder, including a processor and a memory.
  • the memory is used to store a computer program
  • the processor is used to call and run the computer program stored in the memory to execute the method in the above second aspect or its respective implementations.
  • the seventh aspect provides a coding and decoding system, including an encoder and a decoder.
  • the encoder is used to perform the method in the above-mentioned first aspect or its various implementations
  • the decoder is used to perform the method in the above-mentioned second aspect or its various implementations.
  • An eighth aspect provides a chip for implementing any one of the above-mentioned first to second aspects or the method in each implementation manner thereof.
  • The chip includes a processor configured to call and run a computer program from a memory, so that a device installed with the chip executes the method in any one of the above-mentioned first to second aspects or the implementations thereof.
  • a ninth aspect provides a computer-readable storage medium for storing a computer program that causes a computer to execute any one of the above-mentioned first to second aspects or the method in each implementation thereof.
  • a computer program product including computer program instructions, which enable a computer to execute any one of the above-mentioned first to second aspects or the methods in each implementation thereof.
  • An eleventh aspect provides a computer program that, when run on a computer, causes the computer to execute any one of the above-mentioned first to second aspects or the method in each implementation thereof.
  • a twelfth aspect provides a code stream, which is generated based on the method of the first aspect.
  • Through the above technical solution, visual media contents of different expression formats (i.e., heterogeneous content) are grouped in an orderly manner into the same heterogeneous hybrid spliced image. For example, multi-viewpoint video strips and point cloud strips are spliced into one heterogeneous hybrid spliced image for encoding and decoding, which minimizes the number of 2D video codecs, such as HEVC, VVC, AVC, and AVS codecs, that need to be called, reducing the cost of encoding and decoding and improving ease of use.
  • Figure 1 is a schematic block diagram of a video encoding and decoding system related to an embodiment of the present application
  • Figure 2A is a schematic block diagram of a video encoder involved in an embodiment of the present application.
  • Figure 2B is a schematic block diagram of a video decoder involved in an embodiment of the present application.
  • Figure 3A is a diagram of the organization and expression framework of multi-viewpoint video data
  • Figure 3B is a schematic diagram of splicing image generation of multi-viewpoint video data
  • Figure 3C is a diagram of the organization and expression framework of point cloud data
  • Figures 3D to 3F are schematic diagrams of different types of point cloud data
  • Figure 4 is a schematic diagram of multi-viewpoint video encoding
  • Figure 5 is a schematic diagram of decoding multi-viewpoint video
  • Figure 6 is a schematic flow chart of an encoding method provided by an embodiment of the present application.
  • Figure 7A is a schematic diagram of a point cloud strip provided by an embodiment of the present application.
  • Figure 7B is a schematic diagram of a multi-viewpoint video strip provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of the encoding process provided by an embodiment of the present application.
  • Figure 9A is a schematic diagram of a heterogeneous mixed texture mosaic provided by an embodiment of the present application.
  • Figure 9B is a schematic diagram of a heterogeneous mixed geometry splicing diagram provided by an embodiment of the present application.
  • Figure 9C is a schematic diagram of a point cloud mixed occupancy splicing diagram provided by an embodiment of the present application.
  • Figure 10 is a schematic diagram of the hybrid encoding process provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the V3C bitstream structure provided by an embodiment of the present application.
  • Figure 12 is another schematic flow chart of an encoding method provided by an embodiment of the present application.
  • Figure 13 is a schematic flow chart of a decoding method provided by an embodiment of the present application.
  • Figure 14 is a schematic diagram of the hybrid decoding process provided by an embodiment of the present application.
  • Figure 15 is another schematic flowchart of a decoding method provided by an embodiment of the present application.
  • Figure 16 is a schematic block diagram of an encoding device provided by an embodiment of the present application.
  • Figure 17 is a schematic block diagram of a decoding device provided by an embodiment of the present application.
  • Figure 18 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • This application can be applied to the fields of image encoding and decoding, video encoding and decoding, hardware video encoding and decoding, dedicated circuit video encoding and decoding, real-time video encoding and decoding, etc.
  • The solution of this application can be used in combination with audio and video coding standards such as the audio video coding standard (AVS), the H.264/advanced video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard, and the H.266/versatile video coding (VVC) standard.
  • Alternatively, the solution of this application can operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions.
  • the high-degree-of-freedom immersive coding system can be roughly divided into the following links according to the task line: data collection, data organization and expression, data encoding and compression, data decoding and reconstruction, data synthesis and rendering, and finally presenting the target data to the user.
  • the encoding involved in the embodiment of the present application is mainly video encoding and decoding. To facilitate understanding, the video encoding and decoding system involved in the embodiment of the present application is first introduced with reference to Figure 1 .
  • Figure 1 is a schematic block diagram of a video encoding and decoding system related to an embodiment of the present application. It should be noted that Figure 1 is only an example, and the video encoding and decoding system in the embodiment of the present application includes but is not limited to what is shown in Figure 1 .
  • the video encoding and decoding system 100 includes an encoding device 110 and a decoding device 120 .
  • the encoding device is used to encode the video data (which can be understood as compression) to generate a code stream, and transmit the code stream to the decoding device.
  • the decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
  • the encoding device 110 in the embodiment of the present application can be understood as a device with a video encoding function
  • The decoding device 120 can be understood as a device with a video decoding function. That is, the embodiments of the present application cover a wide range of devices for the encoding device 110 and the decoding device 120, including, for example, smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.
  • the encoding device 110 may transmit the encoded video data (eg, code stream) to the decoding device 120 via the channel 130 .
  • Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120 .
  • channel 130 includes one or more communication media that enables encoding device 110 to transmit encoded video data directly to decoding device 120 in real time.
  • encoding device 110 may modulate the encoded video data according to the communication standard and transmit the modulated video data to decoding device 120.
  • the communication media includes wireless communication media, such as radio frequency spectrum.
  • the communication media may also include wired communication media, such as one or more physical transmission lines.
  • channel 130 includes a storage medium that can store video data encoded by encoding device 110 .
  • Storage media include a variety of local access data storage media, such as optical disks, DVDs, flash memories, etc.
  • the decoding device 120 may obtain the encoded video data from the storage medium.
  • channel 130 may include a storage server that may store video data encoded by encoding device 110 .
  • the decoding device 120 may download the stored encoded video data from the storage server.
  • The storage server may store the encoded video data and may transmit the encoded video data to the decoding device 120. Examples of the storage server include a web server (e.g., for a website), a File Transfer Protocol (FTP) server, and the like.
  • the encoding device 110 includes a video encoder 112 and an output interface 113.
  • the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
  • The encoding device 110 may include a video source 111 in addition to the video encoder 112 and the output interface 113.
  • Video source 111 may include at least one of a video capture device (e.g., a video camera), a video archive, a video input interface for receiving video data from a video content provider, and a computer graphics system for generating video data.
  • the video encoder 112 encodes the video data from the video source 111 to generate a code stream.
  • Video data may include one or more images (pictures) or a sequence of pictures.
  • the code stream contains the encoding information of an image or image sequence in the form of a bit stream.
  • Encoded information may include encoded image data and associated data.
  • the associated data may include sequence parameter set (SPS), picture parameter set (PPS) and other syntax structures.
  • An SPS can contain parameters that apply to one or more sequences.
  • a PPS can contain parameters that apply to one or more images.
  • a syntax structure refers to a collection of zero or more syntax elements arranged in a specified order in a code stream.
  • the video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113 .
  • the encoded video data can also be stored on a storage medium or storage server for subsequent reading by the decoding device 120 .
  • decoding device 120 includes input interface 121 and video decoder 122.
  • the decoding device 120 may also include a display device 123.
  • the input interface 121 includes a receiver and/or a modem. Input interface 121 may receive encoded video data over channel 130.
  • the video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123 .
  • the display device 123 displays the decoded video data.
  • Display device 123 may be integrated with decoding device 120 or external to decoding device 120 .
  • Display device 123 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
  • Figure 1 is only an example, and the technical solution of the embodiment of the present application is not limited to Figure 1.
  • the technology of the present application can also be applied to unilateral video encoding or unilateral video decoding.
  • FIG. 2A is a schematic block diagram of a video encoder related to an embodiment of the present application. It should be understood that the video encoder 200 can be used to perform lossy compression of images (lossy compression), or can also be used to perform lossless compression (lossless compression) of images.
  • the lossless compression can be visually lossless compression (visually lossless compression) or mathematically lossless compression (mathematically lossless compression).
  • the video encoder 200 can be applied to image data in a luminance-chrominance (YCbCr, YUV) format.
  • The YUV ratio can be 4:2:0, 4:2:2, or 4:4:4, where Y represents luminance (Luma), Cb (U) represents blue chrominance, Cr (V) represents red chrominance, and U and V together represent chrominance (Chroma), which describes color and saturation.
  • 4:2:0 means that every 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr)
  • 4:2:2 means that every 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr);
  • 4:4:4 means that the chrominance components are fully sampled (YYYYCbCrCbCrCbCrCbCr).
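  • To make the sampling ratios above concrete, the following sketch (illustrative only; the helper name and return format are assumptions, not part of any codec API) computes how many luma and chroma samples a frame carries under each format:

```python
def yuv_plane_sizes(width, height, chroma_format):
    """Return (luma_samples, chroma_samples_per_plane) for one frame.

    Illustrative sketch of the 4:2:0 / 4:2:2 / 4:4:4 ratios described above.
    """
    luma = width * height
    if chroma_format == "4:2:0":      # chroma halved horizontally and vertically
        chroma = (width // 2) * (height // 2)
    elif chroma_format == "4:2:2":    # chroma halved horizontally only
        chroma = (width // 2) * height
    elif chroma_format == "4:4:4":    # chroma fully sampled
        chroma = width * height
    else:
        raise ValueError("unknown chroma format")
    return luma, chroma

# Example: a 1920x1080 frame in 4:2:0 has 2,073,600 luma samples and
# 518,400 samples in each of the Cb and Cr planes.
print(yuv_plane_sizes(1920, 1080, "4:2:0"))
```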
  • the video encoder 200 reads video data, and for each frame of image in the video data, divides one frame of image into several coding tree units (coding tree units, CTU).
  • A CTU may also be called a "tree block", a "largest coding unit" (LCU), or a "coding tree block" (CTB).
  • Each CTU can be associated with an equal-sized block of pixels within the image.
  • Each pixel can correspond to one luminance (luminance or luma) sample and two chrominance (chrominance or chroma) samples. Therefore, each CTU can be associated with one block of luma samples and two blocks of chroma samples.
  • a CTU size is, for example, 128 ⁇ 128, 64 ⁇ 64, 32 ⁇ 32, etc.
  • a CTU can be further divided into several coding units (Coding Units, CUs) for encoding.
  • CUs can be rectangular blocks or square blocks.
  • A CU can be further divided into prediction units (PUs) and transform units (TUs), thus enabling coding, prediction, and transformation to be separated and processed more flexibly.
  • the CTU is divided into CUs in a quad-tree manner, and the CU is divided into TUs and PUs in a quad-tree manner.
  • Video encoders and video decoders can support various PU sizes. Assuming that the size of a specific CU is 2N×2N, video encoders and video decoders can support PUs of size 2N×2N or N×N for intra prediction, and symmetric PUs of size 2N×2N, 2N×N, N×2N, N×N, or similar for inter prediction. Video encoders and video decoders can also support asymmetric PUs of size 2N×nU, 2N×nD, nL×2N, and nR×2N for inter prediction.
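  • As an illustration of the partition list just described, the following sketch enumerates the candidate PU sizes for a 2N×2N CU (HEVC-style; the function name and dictionary layout are assumptions for illustration):

```python
def pu_partitions(n, mode):
    """Candidate PU sizes (width, height) for a 2N x 2N CU.

    Follows the symmetric/asymmetric partition list described above;
    illustrative only, not a codec API.
    """
    size = 2 * n
    if mode == "intra":
        return {"2Nx2N": [(size, size)],
                "NxN":   [(n, n)] * 4}
    # inter prediction: symmetric plus asymmetric partitions
    return {"2Nx2N": [(size, size)],
            "2NxN":  [(size, n)] * 2,
            "Nx2N":  [(n, size)] * 2,
            "NxN":   [(n, n)] * 4,
            "2NxnU": [(size, n // 2), (size, size - n // 2)],
            "2NxnD": [(size, size - n // 2), (size, n // 2)],
            "nLx2N": [(n // 2, size), (size - n // 2, size)],
            "nRx2N": [(size - n // 2, size), (n // 2, size)]}

# Example: for a 32x32 CU (N = 16), 2NxnU yields a 32x8 top PU and a 32x24 bottom PU.
print(pu_partitions(16, "inter")["2NxnU"])
```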
  • The video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filtering unit 260, a decoded image cache 270, and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer, or different functional components.
  • the current block may be called the current coding unit (CU) or the current prediction unit (PU), etc.
  • the prediction block may also be called a predicted image block or an image prediction block
  • the reconstructed image block may also be called a reconstruction block or an image reconstructed image block.
  • prediction unit 210 includes inter prediction unit 211 and intra estimation unit 212. Since there is a strong correlation between adjacent pixels in a video frame, the intra-frame prediction method is used in video encoding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Since there is a strong similarity between adjacent frames in the video, the interframe prediction method is used in video coding and decoding technology to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
  • the inter-frame prediction unit 211 can be used for inter-frame prediction.
  • Inter-frame prediction can include motion estimation and motion compensation, and it can refer to image information of different frames.
  • Inter-frame prediction uses motion information to find a reference block in a reference frame, and a prediction block is generated based on the reference block to eliminate temporal redundancy. The frames used in inter-frame prediction can be P frames and/or B frames, where P frames are forward prediction frames and B frames are bidirectional prediction frames.
  • Inter-frame prediction uses motion information to find reference blocks from reference frames and generate prediction blocks based on the reference blocks.
  • the motion information includes the reference frame list where the reference frame is located, the reference frame index, and the motion vector.
  • the motion vector can be in whole pixels or sub-pixels.
  • A block of whole pixels or sub-pixels found in the reference frame according to the motion vector is called a reference block.
  • Some technologies use the reference block directly as the prediction block, while other technologies further process the reference block to generate the prediction block. Processing a reference block to generate a prediction block can also be understood as taking the reference block as the prediction block and then processing it to generate a new prediction block.
  • the intra-frame estimation unit 212 only refers to the information of the same frame image and predicts the pixel information in the current coded image block to eliminate spatial redundancy.
  • the frames used in intra prediction may be I frames.
  • Intra-frame prediction has multiple prediction modes. Taking the international digital video coding standard H series as an example, the H.264/AVC standard has 8 angular prediction modes and 1 non-angular prediction mode, and H.265/HEVC has been extended to 33 angular prediction modes and 2 non-angular prediction modes.
  • the intra-frame prediction modes used by HEVC include planar mode (Planar), DC and 33 angle modes, for a total of 35 prediction modes.
  • the intra-frame modes used by VVC include Planar, DC and 65 angle modes, for a total of 67 prediction modes.
  • Residual unit 220 may generate a residual block of the CU based on the pixel block of the CU and the prediction blocks of the PUs of the CU. For example, residual unit 220 may generate a residual block of the CU such that each sample in the residual block has a value equal to the difference between the corresponding sample in the pixel block of the CU and the corresponding sample in the prediction block of the PU of the CU.
  • Transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with the TU of the CU based on quantization parameter (QP) values associated with the CU. Video encoder 200 may adjust the degree of quantization applied to transform coefficients associated with the CU by adjusting the QP value associated with the CU.
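  • As an illustration of how the QP controls the degree of quantization, the sketch below applies a simple scalar quantizer to a coefficient block. The step-size mapping (Qstep = 2^((QP-4)/6), HEVC-style) is an assumption for illustration; real encoders additionally use scaling lists and rounding offsets.

```python
import numpy as np

def quantize(coeffs, qp):
    """Scalar-quantize a block of transform coefficients (simplified sketch)."""
    qstep = 2.0 ** ((qp - 4) / 6.0)   # assumed HEVC-style step size
    return np.round(coeffs / qstep).astype(np.int32), qstep

def dequantize(levels, qstep):
    """Inverse quantization, as performed by unit 240 and by the decoder."""
    return levels.astype(np.float64) * qstep

coeffs = np.array([[100.0, -35.0], [12.0, -3.0]])
levels, qstep = quantize(coeffs, qp=30)
# A larger QP gives a larger step size, coarser levels, and a larger reconstruction error.
print(levels)
print(dequantize(levels, qstep))
```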
  • Inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct the residual block from the quantized transform coefficients.
  • Reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by prediction unit 210 to produce a reconstructed image block associated with the TU. By reconstructing blocks of samples for each TU of a CU in this manner, video encoder 200 can reconstruct blocks of pixels of the CU.
  • The loop filtering unit 260 is used to process the inversely transformed and inversely quantized pixels to compensate for distortion and to provide a better reference for subsequently encoded pixels. For example, a deblocking filtering operation can be performed to reduce the blocking effect of the pixel blocks associated with the CU.
  • In some embodiments, the loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive offset/adaptive loop filtering (SAO/ALF) unit, where the deblocking filtering unit is used to remove blocking effects and the SAO/ALF unit is used to remove ringing effects.
  • Decoded image cache 270 may store reconstructed pixel blocks.
  • Inter prediction unit 211 may perform inter prediction on PUs of other images using reference images containing reconstructed pixel blocks.
  • intra estimation unit 212 may use the reconstructed pixel blocks in decoded image cache 270 to perform intra prediction on other PUs in the same image as the CU.
  • Entropy encoding unit 280 may receive the quantized transform coefficients from transform/quantization unit 230 . Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.
  • FIG. 2B is a schematic block diagram of a video decoder related to an embodiment of the present application.
  • the video decoder 300 includes an entropy decoding unit 310 , a prediction unit 320 , an inverse quantization/transformation unit 330 , a reconstruction unit 340 , a loop filtering unit 350 and a decoded image buffer 360 . It should be noted that the video decoder 300 may include more, less, or different functional components.
  • Video decoder 300 can receive the code stream.
  • Entropy decoding unit 310 may parse the codestream to extract syntax elements from the codestream. As part of parsing the code stream, the entropy decoding unit 310 may parse entropy-encoded syntax elements in the code stream.
  • the prediction unit 320, the inverse quantization/transformation unit 330, the reconstruction unit 340 and the loop filtering unit 350 may decode the video data according to the syntax elements extracted from the code stream, that is, generate decoded video data.
  • prediction unit 320 includes inter prediction unit 321 and intra estimation unit 322.
  • Intra estimation unit 322 may perform intra prediction to generate predicted blocks for the PU. Intra estimation unit 322 may use an intra prediction mode to generate predicted blocks for a PU based on pixel blocks of spatially neighboring PUs. Intra estimation unit 322 may also determine the intra prediction mode of the PU based on one or more syntax elements parsed from the codestream.
  • the inter prediction unit 321 may construct a first reference image list (List 0) and a second reference image list (List 1) according to syntax elements parsed from the code stream. Additionally, if the PU uses inter-prediction encoding, entropy decoding unit 310 may parse the motion information of the PU. Inter prediction unit 321 may determine one or more reference blocks for the PU based on the motion information of the PU. Inter prediction unit 321 may generate a predictive block for the PU based on one or more reference blocks of the PU.
  • Inverse quantization/transform unit 330 may inversely quantize (ie, dequantize) transform coefficients associated with a TU. Inverse quantization/transform unit 330 may use the QP value associated with the CU of the TU to determine the degree of quantization.
  • inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients to produce a residual block associated with the TU.
  • Reconstruction unit 340 uses the residual blocks associated with the TU of the CU and the prediction blocks of the PU of the CU to reconstruct the pixel blocks of the CU. For example, reconstruction unit 340 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the pixel block of the CU to obtain a reconstructed image block.
  • Loop filtering unit 350 may perform deblocking filtering operations to reduce blocking artifacts for blocks of pixels associated with the CU.
  • Video decoder 300 may store the reconstructed image of the CU in decoded image cache 360 .
  • the video decoder 300 may use the reconstructed image in the decoded image cache 360 as a reference image for subsequent prediction, or transmit the reconstructed image to a display device for presentation.
  • the basic process of video encoding and decoding is as follows: at the encoding end, an image frame is divided into blocks.
  • the prediction unit 210 uses intra prediction or inter prediction to generate a prediction block of the current block.
  • the residual unit 220 may calculate a residual block based on the prediction block and the original block of the current block, that is, the difference between the prediction block and the original block of the current block.
  • the residual block may also be called residual information.
  • the residual block undergoes transformation and quantization processes such as transformation/quantization unit 230 to remove information that is insensitive to human eyes to eliminate visual redundancy.
  • The residual block before transformation and quantization by the transform/quantization unit 230 may be called a time-domain residual block, and the residual block after transformation and quantization may be called a frequency residual block or a frequency-domain residual block.
  • The entropy encoding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230 and may perform entropy encoding on them to output a code stream. For example, the entropy encoding unit 280 may eliminate character redundancy according to the target context model and the probability information of the binary code stream.
  • the entropy decoding unit 310 can parse the code stream to obtain the prediction information, quantization coefficient matrix, etc. of the current block.
  • the prediction unit 320 uses intra prediction or inter prediction for the current block based on the prediction information to generate a prediction block of the current block.
  • the inverse quantization/transform unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block.
  • the reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstruction block.
  • the reconstructed blocks constitute a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or based on the blocks to obtain a decoded image.
  • the encoding end also needs similar operations as the decoding end to obtain the decoded image.
  • The decoded image may also be called a reconstructed image, and the reconstructed image may be used as a reference frame for inter-frame prediction of subsequent frames.
  • the block division information determined by the encoding end as well as mode information or parameter information such as prediction, transformation, quantization, entropy coding, loop filtering, etc., are carried in the code stream when necessary.
  • The decoding end determines the same block division information as the encoding end, as well as the mode information or parameter information for prediction, transformation, quantization, entropy coding, loop filtering, and so on, by parsing the code stream and analyzing the existing information, thereby ensuring that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.
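  • The per-block flow described above can be summarized as follows. This is a deliberately simplified sketch (the transform stage and real entropy coding are omitted, and the step-size mapping is assumed), not the specific method of this application:

```python
import numpy as np

def encode_block(original, reference, qp):
    """Simplified per-block encoder flow: predict, compute residual, quantize."""
    prediction = reference                      # stand-in for intra/inter prediction (unit 210)
    residual = original - prediction            # residual unit 220
    qstep = 2.0 ** ((qp - 4) / 6.0)             # assumed HEVC-style step size
    levels = np.round(residual / qstep)         # transform/quantization unit 230 (transform omitted)
    recon = prediction + levels * qstep         # units 240 + 250, also run at the encoder
    return levels, recon

def decode_block(levels, reference, qp):
    """Decoder mirrors the inverse path: dequantize and add the prediction."""
    qstep = 2.0 ** ((qp - 4) / 6.0)
    return reference + levels * qstep

orig = np.array([[52.0, 55.0], [61.0, 59.0]])
ref = np.array([[50.0, 50.0], [60.0, 60.0]])
levels, recon_enc = encode_block(orig, ref, qp=22)
recon_dec = decode_block(levels, ref, qp=22)
assert np.allclose(recon_enc, recon_dec)        # encoder and decoder reconstructions match
```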
  • the current encoding and decoding methods include at least the following two:
  • Method 1: For multi-viewpoint video, MPEG (Moving Picture Experts Group) immersive video (MIV) technology is used for encoding and decoding; for point clouds, video-based point cloud compression (VPCC) technology is used for encoding and decoding.
  • In order to reduce the transmission pixel rate while retaining as much scene information as possible, so that there is enough information for rendering the target view, the scheme adopted by MPEG-I is shown in Figure 3A.
  • A limited number of viewpoints are selected as basic viewpoints so as to cover the visible range of the scene as much as possible. The basic viewpoints are transmitted as complete images, and the redundant pixels between the remaining non-basic viewpoints and the basic viewpoints are removed, that is, only the effective information of non-repeated expressions is retained. The effective information is then extracted into sub-block images and reorganized together with the basic viewpoint images to form a larger rectangular image, which is called a spliced image.
  • Figure 3A and Figure 3B give a schematic process of generating a spliced image.
  • the spliced image is sent to the codec for compression and reconstruction, and the auxiliary data related to the sub-block image splicing information is also sent to the encoder to form a code stream.
  • the encoding method of VPCC is to project point clouds into two-dimensional images or videos, and convert three-dimensional information into two-dimensional information encoding.
  • Figure 3C is the coding block diagram of VPCC.
  • the code stream is roughly divided into four parts.
  • the geometric code stream is the code stream generated by geometric depth map encoding, which is used to represent the geometric information of the point cloud;
  • The attribute code stream is the code stream generated by texture map encoding, which is used to represent the attribute information of the point cloud;
  • the occupancy code stream is the code stream generated by the occupancy map encoding, which is used to indicate the effective area in the depth map and texture map;
  • These three types of videos all use video encoders for encoding and decoding.
  • the auxiliary information code stream is the code stream generated by encoding the auxiliary information of the sub-block image, which is the part related to the patch data unit in the V3C standard, indicating the position and size of each sub-block image.
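  • The four sub-streams above can be thought of as one container, as in the following illustrative grouping (a plain data structure for exposition, not the V3C byte-level syntax):

```python
from dataclasses import dataclass

@dataclass
class VpccBitstream:
    """Illustrative container for the four VPCC sub-streams described above."""
    geometry: bytes    # geometric depth-map video sub-stream
    attribute: bytes   # texture-map video sub-stream
    occupancy: bytes   # occupancy-map video sub-stream
    auxiliary: bytes   # patch (sub-block) data unit / placement information

# The first three sub-streams come from a 2D video encoder; the last one
# comes from encoding the sub-block auxiliary information.
```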
  • Method 2: Multi-viewpoint videos and point clouds are encoded and decoded using the frame packing technology in visual volumetric video-based coding (V3C).
  • the encoding end includes the following steps:
  • Step 1 When encoding the acquired multi-view video, perform some pre-processing to generate multi-view video sub-blocks (patch). Then, organize the multi-view video sub-blocks to generate a multi-view video splicing image.
  • Specifically, the multi-viewpoint videos are input into TMIV for packaging, and a multi-viewpoint video spliced image is output.
  • TMIV is the reference software for MIV.
  • Packaging in the embodiment of this application can be understood as splicing.
  • the multi-viewpoint video mosaic includes a multi-view video texture mosaic and a multi-view video geometry mosaic, that is, it only contains multi-view video sub-blocks.
  • Step 2 Input the multi-viewpoint video splicing image into the frame packer and output the multi-viewpoint video mixed splicing image.
  • the multi-viewpoint video hybrid splicing image includes a multi-viewpoint video texture blending splicing image, a multi-viewpoint video geometry blending splicing image, and a multi-viewpoint video texture and geometry blending splicing image.
  • the multi-viewpoint video splicing image is frame packed to generate a multi-viewpoint video hybrid splicing image.
  • Each multi-viewpoint video splicing image occupies a region of the multi-viewpoint video hybrid splicing image.
  • In addition, a flag pin_region_type_id_minus2 must be transmitted for each region in the code stream. This flag records whether the current region belongs to a multi-viewpoint video texture spliced image or a multi-viewpoint video geometric spliced image, and this information is needed at the decoding end.
  • Step 3 Use a video encoder to encode the multi-viewpoint video mixed splicing image to obtain a code stream.
  • the decoding end includes the following steps:
  • Step 1 During multi-viewpoint video decoding, input the obtained code stream into the video decoder for decoding to obtain a reconstructed multi-viewpoint video mixed splicing image.
  • Step 2 Input the reconstructed multi-viewpoint video mixed splicing image into the frame depacker and output the reconstructed multi-viewpoint video splicing image.
  • Specifically, the flag pin_region_type_id_minus2 is obtained from the code stream. If pin_region_type_id_minus2 indicates V3C_AVD, the current region is a multi-viewpoint video texture spliced image, and the current region is split off and output as a reconstructed multi-viewpoint video texture spliced image.
  • If pin_region_type_id_minus2 indicates V3C_GVD, the current region is a multi-viewpoint video geometric spliced image, and the current region is split off and output as a reconstructed multi-viewpoint video geometric spliced image (a sketch of this region-splitting step is given after these decoding steps).
  • Step 3 Decode the reconstructed multi-viewpoint video splicing image to obtain the reconstructed multi-viewpoint video.
  • the multi-viewpoint video texture splicing image and the multi-viewpoint video geometric splicing image are decoded to obtain the reconstructed multi-viewpoint video.
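  • The region-splitting step referenced above can be sketched as follows. The region metadata layout is an assumption for illustration; only the V3C_AVD/V3C_GVD distinction comes from the description above:

```python
V3C_GVD = "V3C_GVD"   # geometry video data region
V3C_AVD = "V3C_AVD"   # attribute (texture) video data region

def unpack_regions(mixed_image, regions):
    """Split a reconstructed mixed spliced image back into per-type spliced images.

    `mixed_image` is a 2D array (e.g. numpy) and `regions` is a hypothetical
    list of (x, y, width, height, region_type) tuples derived from the
    pin_region_type_id_minus2 flags.
    """
    texture_parts, geometry_parts = [], []
    for x, y, w, h, region_type in regions:
        block = mixed_image[y:y + h, x:x + w]
        if region_type == V3C_AVD:        # texture region
            texture_parts.append(block)
        elif region_type == V3C_GVD:      # geometry region
            geometry_parts.append(block)
    return texture_parts, geometry_parts
```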
  • the above uses multi-viewpoint video as an example to analyze and introduce frame packing technology.
  • The frame packing encoding and decoding method for point clouds is basically the same as that for the above-mentioned multi-viewpoint video, and reference may be made to it.
  • For example, the point cloud is input into TMC2 (a VPCC reference software) for packaging to obtain a point cloud spliced image; the point cloud spliced image is input into the frame packer for frame packing to obtain a point cloud hybrid spliced image; and the point cloud hybrid spliced image is encoded to obtain a point cloud code stream. Details are not repeated here.
  • The V3C unit header syntax is shown in Table 1.
  • The V3C unit header semantics are shown in Table 2.
  • the visual media content with multiple different expression formats will be encoded and decoded separately.
  • For example, the current packaging technology compresses the point cloud to form a point cloud compressed code stream (i.e., one V3C code stream), compresses the multi-viewpoint video information to obtain a multi-view video compressed code stream (i.e., another V3C code stream), and then the system layer multiplexes the compressed code streams to obtain a multiplexed code stream of the fused three-dimensional scene.
  • During decoding, the point cloud compressed code stream and the multi-viewpoint video compressed code stream are decoded separately. It can be seen that when encoding and decoding visual media content in multiple different expression formats, the existing technology uses many codecs and the encoding and decoding cost is high.
  • To solve this problem, the embodiment of the present application splices the strips corresponding to visual media contents of multiple different expression formats into one heterogeneous hybrid spliced image, so that visual media contents of different expression formats (i.e., heterogeneous content) are grouped in an orderly manner in the same heterogeneous hybrid spliced image. For example, multi-viewpoint video strips and point cloud strips are encoded and decoded in one heterogeneous hybrid spliced image, thus minimizing the number of 2D video encoders, such as HEVC, VVC, AVC, and AVS encoders, that need to be called, reducing encoding and decoding costs and improving ease of use.
  • Figure 6 is a schematic flowchart of the encoding method 600 provided by an embodiment of the present application. As shown in Figure 6, the method 600 of the embodiment of the present application includes:
  • S601. Process at least two visual media contents to obtain at least two isomorphic strips, where the visual media contents correspond to at least two different expression formats.
  • the at least two isomorphic strips correspond to at least two different expression formats.
  • the visual media content corresponds to at least two different expression formats, that is, the at least two visual media contents include visual media content of at least two different expression formats.
  • Exemplarily, the visual media contents in the embodiment of the present application belong to media contents of at least two different expression formats, such as multi-viewpoint video, point cloud, and mesh.
  • In some embodiments, the multi-viewpoint video may also be a single-viewpoint video, that is, the multi-viewpoint video may include multiple-viewpoint video and/or single-viewpoint video.
  • the at least two isomorphic strips correspond to at least two different expression formats, that is, the at least two isomorphic strips include at least two isomorphic strips with different expression formats.
  • Exemplarily, the homogeneous strips in the embodiment of the present application belong to strips of at least two different expression formats, such as multi-view video strips, point cloud strips, and mesh strips, where one homogeneous strip corresponds to one expression format.
  • Visual media objects with different expression formats may appear in the same scene. For example, in the same three-dimensional scene, the scene background and some characters and objects are expressed as video, and another part of the characters is expressed as a three-dimensional point cloud or a three-dimensional mesh.
  • the above-mentioned at least two visual media contents are media contents presented simultaneously in the same three-dimensional space. In some embodiments, the at least two visual media contents are media contents presented at different times in the same three-dimensional space. In some embodiments, the above-mentioned at least two visual media contents may also be media contents in different three-dimensional spaces. That is to say, in the embodiments of this application, there are no specific restrictions on the at least two visual media contents mentioned above.
  • the expression formats of at least two visual media contents in the embodiments of the present application are different.
  • the at least two visual media contents include point clouds and multi-viewpoint videos.
  • In some embodiments, part of the visual media contents in the at least two visual media contents have the same expression format and part have different expression formats. For example, the at least two visual media contents include two point clouds and one multi-viewpoint video.
  • In the embodiment of the present application, the at least two visual media contents are processed, for example packaged (also called spliced), to obtain the strip corresponding to each of the at least two visual media contents. For example, the sub-tiles (patches) corresponding to each visual media content may be spliced to obtain a strip.
  • The embodiment of the present application does not limit the way in which the at least two visual media contents are processed separately to obtain the strips.
  • Exemplarily, a strip may be a spliced image of a specific shape, for example, a spliced image of a rectangular area having a specific length and/or height.
  • Exemplarily, at least one sub-tile may be spliced in an orderly manner, for example from large to small according to the area of the sub-tiles, or from large to small according to the length and/or height of the sub-tiles, to obtain the strip corresponding to the visual media content.
  • a strip can be mapped exactly to an atlas tile.
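  • A minimal sketch of such ordered splicing is given below; the sub-tile representation and the left-to-right layout are assumptions for illustration (real packers also handle rotation and multiple rows):

```python
def pack_subtiles_into_strip(subtiles, strip_height):
    """Splice sub-tiles into one strip, largest area first.

    `subtiles` is a hypothetical list of (patch_id, width, height) tuples;
    the returned placements record where each sub-tile lands in the strip.
    """
    placements, x = [], 0
    for patch_id, w, h in sorted(subtiles, key=lambda t: t[1] * t[2], reverse=True):
        if h > strip_height:
            raise ValueError(f"sub-tile {patch_id} is taller than the strip")
        placements.append({"patchID": patch_id, "x": x, "y": 0, "w": w, "h": h})
        x += w
    return placements, x   # placements plus the resulting strip width

# Example: three sub-tiles packed into a strip of height 64.
print(pack_subtiles_into_strip([(1, 32, 64), (2, 48, 32), (3, 16, 16)], 64))
```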
  • each sub-tile in a strip may have a patch ID (patchID) to distinguish different sub-tiles in the same strip.
  • a homogeneous strip means that the expression format corresponding to each sub-tile in the strip is the same.
  • For example, each sub-tile in a homogeneous strip is a multi-view video sub-tile, or each sub-tile is a point cloud sub-tile. That is, a homogeneous strip is equivalent to a splice of sub-tiles of the same expression format, and the expression format corresponding to each sub-tile in the homogeneous strip is the expression format corresponding to that homogeneous strip.
  • Homogeneous strips may have tile IDs (tileIDs) to distinguish different strips of the same expression format. For example, the homogeneous strips of the same expression format may include point cloud strip 1 and point cloud strip 2.
  • multiple visual media contents include point clouds and multi-viewpoint videos.
  • The point cloud is processed to obtain a point cloud strip, as shown in Figure 7A, where point cloud strip 1 includes point cloud sub-tiles 1 to 3.
  • The multi-view video is processed to obtain a multi-view video strip, as shown in Figure 7B, where the multi-view video strip includes multi-view video sub-blocks 1 to 4.
  • In some embodiments, the above S601 can be specifically implemented as follows: after the acquired multi-viewpoint video undergoes projection and de-redundancy processing, non-repeating pixels are connected into video sub-blocks, and the video sub-blocks are spliced into multi-viewpoint video strips; the acquired point cloud is parallel projected, the connected points in the projection plane form point cloud sub-tiles, and the point cloud sub-tiles are spliced into point cloud strips.
  • Specifically, a limited number of viewpoints are selected as basic viewpoints so as to express the visible range of the scene as much as possible. The basic viewpoints are transmitted as complete images, and the redundant pixels between the remaining non-basic viewpoints and the basic viewpoints are removed, that is, only the effective information of non-repeated expressions is retained. The effective information is then extracted into sub-block images and reorganized together with the basic viewpoint images to form a larger strip-shaped image, which is called a multi-viewpoint video strip.
  • The three-dimensional point cloud is parallel projected to obtain a two-dimensional point cloud; the connected points in the two-dimensional point cloud form point cloud sub-blocks, and these point cloud sub-blocks are spliced to obtain the point cloud strip.
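  • The parallel projection step can be sketched as follows (an orthographic projection onto the XY plane; grouping occupied pixels into connected sub-blocks is omitted for brevity, and the function name is an assumption):

```python
import numpy as np

def parallel_project(points, plane_w, plane_h):
    """Orthographically project 3D points onto a 2D plane, keeping the nearest depth."""
    depth = np.full((plane_h, plane_w), np.inf)
    for x, y, z in points:
        u, v = int(round(x)), int(round(y))
        if 0 <= u < plane_w and 0 <= v < plane_h:
            depth[v, u] = min(depth[v, u], z)
    occupancy = np.isfinite(depth)           # pixels that carry a projected point
    return depth, occupancy

pts = np.array([[1.0, 2.0, 5.0], [1.2, 2.1, 3.0], [4.0, 4.0, 7.0]])
d, occ = parallel_project(pts, plane_w=8, plane_h=8)
print(occ.sum(), "occupied pixels")   # the first two points land on the same pixel; the nearer depth is kept
```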
  • S602. Splice at least two homogeneous strips to generate a heterogeneous hybrid splicing image.
  • In the embodiment of the present application, at least two visual media contents are first processed separately (that is, packaged) to obtain at least two homogeneous strips.
  • Then, at least two homogeneous strips with different expression formats are spliced into one heterogeneous hybrid spliced image, and the heterogeneous hybrid spliced image is encoded to obtain a code stream. That is to say, the embodiment of the present application performs encoding by splicing homogeneous strips of different expression formats into one heterogeneous hybrid spliced image.
  • In this way, the video encoder only needs to be called once for encoding, which reduces the number of 2D video encoders, such as HEVC, VVC, AVC, and AVS encoders, that need to be called, reduces the encoding cost, and improves ease of use.
  • In order to distinguish it from frame packing, in the embodiment of the present application the process of splicing at least two homogeneous strips into one heterogeneous hybrid spliced image is called tile packing.
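  • A minimal sketch of the tile-packing idea, with the strip representation and the simple vertical layout assumed for illustration, is given below; the recorded placements stand in for the hybrid splicing information that the decoder will need:

```python
import numpy as np

def tile_pack(strips):
    """Splice homogeneous strips into one heterogeneous hybrid spliced image.

    `strips` is a hypothetical list of (tile_id, expression_format, 2D array)
    tuples; strips are simply stacked vertically.
    """
    width = max(s[2].shape[1] for s in strips)
    height = sum(s[2].shape[0] for s in strips)
    canvas = np.zeros((height, width), dtype=strips[0][2].dtype)
    splicing_info, y = [], 0
    for tile_id, fmt, img in strips:
        h, w = img.shape
        canvas[y:y + h, :w] = img
        splicing_info.append({"tileID": tile_id, "format": fmt,
                              "x": 0, "y": y, "w": w, "h": h})
        y += h
    return canvas, splicing_info

# Example: one multi-view video strip and one point cloud strip in a single image.
mv = np.ones((64, 128), dtype=np.uint8)
pc = np.full((32, 96), 2, dtype=np.uint8)
image, info = tile_pack([(0, "multi_view_video", mv), (1, "point_cloud", pc)])
print(image.shape, info)
```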
  • the above-mentioned S603 includes: using a video encoder to encode the heterogeneous hybrid spliced image to obtain a video code stream.
  • In some embodiments, when at least two homogeneous strips are spliced into a heterogeneous hybrid spliced image, hybrid splicing information is generated. This hybrid splicing information is needed during decoding; therefore, it needs to be encoded.
  • the embodiments of the present application also include the step of encoding the hybrid splicing information, that is, the above S603 includes the following steps:
  • S603-A Call the video encoder to perform video encoding on the heterogeneous mixed splicing image to obtain the video compression sub-stream;
  • The video encoder used to perform video encoding on the heterogeneous hybrid spliced image to obtain the video compression sub-stream may be the video encoder shown in Figure 2A above. That is to say, in the embodiment of this application, the heterogeneous hybrid spliced image is treated as a frame image: it is first divided into blocks, intra-frame or inter-frame prediction is then used to obtain the predicted value of each coding block, the predicted value is subtracted from the original value to obtain the residual value, and after the residual value is transformed and quantized, the video compression sub-stream is obtained.
  • the hybrid splicing information of the heterogeneous hybrid splicing image is encoded to obtain a hybrid splicing information sub-stream.
  • the embodiments of the present application do not limit the method of encoding the hybrid splicing information.
  • conventional data compression encoding methods such as equal-length encoding or variable-length encoding may be used for compression.
  • the video compression sub-stream and the hybrid splicing information sub-stream are written in the same code stream to obtain the final code stream.
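  • As an illustration of writing both sub-streams into one code stream, the following sketch uses a simple length-prefixed container. This container layout is purely an assumption for exposition; the actual bitstream syntax (e.g. the V3C structure of Figure 11) is defined by the standard:

```python
import struct

def write_code_stream(video_substream: bytes, splicing_info_substream: bytes) -> bytes:
    """Multiplex the video and hybrid splicing information sub-streams into one code stream."""
    out = b""
    for unit_type, payload in ((0, video_substream), (1, splicing_info_substream)):
        out += struct.pack(">BI", unit_type, len(payload)) + payload
    return out

def read_code_stream(stream: bytes):
    """Decoder side: split the code stream back into its sub-streams."""
    units, pos = {}, 0
    while pos < len(stream):
        unit_type, length = struct.unpack_from(">BI", stream, pos)
        pos += 5                       # 1-byte type + 4-byte length
        units[unit_type] = stream[pos:pos + length]
        pos += length
    return units

stream = write_code_stream(b"video-substream", b"splicing-info")
print(read_code_stream(stream))
```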
  • It can be seen that the embodiments of the present application not only support heterogeneous source formats such as video, point cloud, and mesh in the same compressed code stream, but also allow homogeneous strips of different expression formats, such as multi-view video strips and point cloud (or mesh) strips, to exist simultaneously in one heterogeneous hybrid spliced image, which minimizes the number of video encoders that need to be called, reduces implementation costs, and improves ease of use.
  • the heterogeneous hybrid mosaic graph of the embodiments of the present application includes a single attribute heterogeneous hybrid mosaic graph.
  • a single-attribute heterogeneous hybrid mosaic map refers to a heterogeneous hybrid mosaic map in which all homogeneous strips include the same attribute information.
  • a single attribute heterogeneous hybrid mosaic image only includes homogeneous strips of attribute information, such as only multi-viewpoint video texture strips and point cloud texture strips.
  • a single-attribute heterogeneous hybrid mosaic image only includes homogeneous strips of geometric information, such as only multi-viewpoint video geometric strips and point cloud geometric strips.
  • the embodiments of the present application do not limit the expression format of the at least two isomorphic strips.
  • the at least two isomorphic strips include at least two of a multi-viewpoint video strip, a point cloud strip, and a grid strip.
  • At least two homogeneous strips are spliced to generate a heterogeneous hybrid splicing image, including:
  • S602-A. Splice at least one single-attribute homogeneous strip in a first expression format and one single-attribute homogeneous strip in a second expression format to obtain a heterogeneous hybrid spliced image.
  • The first expression format and the second expression format are each any one of multi-view video, point cloud, and mesh, and the first expression format and the second expression format are different.
  • the single attribute isomorphic strip of the multi-view video includes at least one of a multi-view video texture strip, a multi-view video geometry strip, and the like.
  • the single attribute isomorphic strip of the point cloud includes at least one of a point cloud texture strip, a point cloud geometry strip, a point cloud occupancy strip, and the like.
  • the single attribute isomorphic strip of the mesh includes at least one of a mesh texture strip and a mesh geometry strip.
  • At least two of the multi-viewpoint video geometry strips, point cloud geometry strips, and mesh geometry strips are spliced into one image to obtain a heterogeneous hybrid spliced image.
  • This heterogeneous mixed mosaic diagram is called a single attribute heterogeneous mixed mosaic diagram.
  • At least two of multi-viewpoint video texture strips, point cloud texture strips, and mesh texture strips are spliced into one image to obtain a heterogeneous hybrid spliced image.
  • This heterogeneous mixed mosaic diagram is called a single attribute heterogeneous mixed mosaic diagram.
  • In some embodiments, the heterogeneous hybrid spliced image of the embodiments of the present application includes a multi-attribute heterogeneous hybrid spliced image, that is, the attribute information of at least two of the homogeneous strips included in the heterogeneous hybrid spliced image is different.
  • For example, a multi-attribute heterogeneous hybrid spliced image includes both homogeneous strips of attribute information and homogeneous strips of geometric information.
  • In the embodiment of the present application, strips under any one attribute, or under any two attributes, of at least two of the point cloud, multi-viewpoint video, and mesh can be spliced into one image to obtain a heterogeneous hybrid spliced image; this application does not limit this.
  • multi-viewpoint video texture strips are spliced into one image with at least one of point cloud geometry strips and mesh geometry strips to obtain a heterogeneous hybrid spliced image.
  • This heterogeneous hybrid mosaic diagram is called a multi-attribute heterogeneous hybrid mosaic diagram.
  • the multi-viewpoint video geometry strips are spliced into one image with at least one of the point cloud texture strips and the mesh texture strips to obtain a heterogeneous hybrid spliced image.
  • This heterogeneous hybrid mosaic diagram is called a multi-attribute heterogeneous hybrid mosaic diagram.
  • the point cloud texture strip is spliced into one image with at least one of the multi-viewpoint video geometry strip and the mesh geometry strip to obtain a heterogeneous hybrid spliced image.
  • This heterogeneous hybrid mosaic diagram is called a multi-attribute heterogeneous hybrid mosaic diagram.
  • a point cloud geometric strip and at least one of a multi-viewpoint video texture strip and a grid texture strip are spliced into one image to obtain a heterogeneous hybrid spliced image.
  • This heterogeneous hybrid mosaic diagram is called a multi-attribute heterogeneous hybrid mosaic diagram.
  • the multi-viewpoint video strips include multi-viewpoint video texture strips and multi-viewpoint video geometry strips
  • The point cloud strips include point cloud texture strips, point cloud geometry strips, and point cloud occupancy strips.
  • Method 1 Splice multi-viewpoint video texture strips, multi-viewpoint video geometry strips, point cloud texture strips, point cloud geometry strips and point cloud occupancy strips into a heterogeneous hybrid splicing image.
  • Method 2: According to a preset hybrid splicing method, splice the multi-view video texture strips, multi-view video geometry strips, point cloud texture strips, point cloud geometry strips, and point cloud occupancy strips to obtain M heterogeneous hybrid spliced images, where M is a positive integer greater than or equal to 1.
  • Method 2 can include at least the following examples:
  • Example 1 Splice multi-viewpoint video texture strips and point cloud texture strips to obtain a heterogeneous mixed texture splicing image. Splice multi-viewpoint video geometry strips and point cloud geometry strips to obtain a heterogeneous mixed geometry splicing image. , separate the point cloud occupancy strips into a mixed mosaic image.
  • For example, the multi-view video is processed to obtain multi-view video strips, where the multi-view video strips include multi-view video texture strips and multi-view video geometry strips.
  • Process point cloud 1 to obtain point cloud texture strip 1, point cloud geometry strip 1, and point cloud 1 occupancy strip.
  • Process point cloud 2 to obtain point cloud texture strip 2, point cloud geometry strip 2, and point cloud 2 occupancy strip.
  • the point cloud 1 occupancy strip and the point cloud 2 occupancy strip can be merged into a point cloud occupancy mosaic.
  • the multi-view video texture strip, point cloud texture strip 1 and point cloud texture strip 2 are mixed and spliced to obtain a heterogeneous mixed texture splicing image, as shown in Figure 9A.
  • the multi-viewpoint video geometry strip, point cloud geometry strip 1 and point cloud geometry strip 2 are spliced to obtain a heterogeneous mixed geometry splicing image, for example, as shown in Figure 9B.
  • the point cloud occupancy splicing image is separately used as a hybrid splicing image, as shown in Figure 9C.
  • Example 2: Splice the multi-view video texture strips and the point cloud texture strips to obtain a heterogeneous mixed texture spliced image, and splice the multi-view video geometry strips, the point cloud geometry strips and the point cloud occupancy strips to obtain a heterogeneous mixed geometry-and-occupancy spliced image.
  • Example 3: Splice the multi-view video texture strips, the point cloud texture strips and the point cloud occupancy strips to obtain one heterogeneous hybrid spliced image, and splice the multi-view video geometry strips and the point cloud geometry strips to obtain another heterogeneous hybrid spliced image.
  • Examples 1 to 3 are only part of the hybrid splicing methods.
  • the hybrid splicing methods in the embodiments of the present application include but are not limited to the above-mentioned Examples 1 to 3.
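  • For ease of understanding, the following Python sketch illustrates one possible way to organize the grouping of Example 1 above, in which texture strips, geometry strips and occupancy strips are packed into M = 3 spliced images. The Strip container, the naive top-to-bottom packing routine and the fixed canvas size are illustrative assumptions of this description and are not defined by the V3C, MIV or V-PCC standards.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Strip:
    pixels: np.ndarray   # samples of the strip
    source: str          # "miv" (multi-view video) or "vpcc" (point cloud)
    attribute: str       # "texture", "geometry" or "occupancy"

def pack_into_image(strips, canvas_h=1024, canvas_w=1024):
    """Naively pack strips top-to-bottom into one spliced image and record,
    per strip, where it was placed (a stand-in for hybrid splicing information)."""
    canvas = np.zeros((canvas_h, canvas_w), dtype=np.uint16)
    splicing_info = []
    y = 0
    for s in strips:
        h, w = s.pixels.shape
        canvas[y:y + h, :w] = s.pixels
        splicing_info.append({"y": y, "x": 0, "h": h, "w": w, "source": s.source})
        y += h
    return canvas, splicing_info

def splice_example_1(miv_strips, pcc_strips):
    """Group strips as in Example 1: one mixed texture image, one mixed geometry
    image, and a separate point cloud occupancy image (M = 3)."""
    all_strips = miv_strips + pcc_strips
    texture = [s for s in all_strips if s.attribute == "texture"]
    geometry = [s for s in all_strips if s.attribute == "geometry"]
    occupancy = [s for s in pcc_strips if s.attribute == "occupancy"]
    return [pack_into_image(texture), pack_into_image(geometry), pack_into_image(occupancy)]
```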
  • a video encoder is used to encode M heterogeneous hybrid spliced images respectively to obtain a video compression sub-stream.
  • each of the M heterogeneous hybrid spliced images can be used as a frame image for video encoding to obtain a video compression sub-stream.
  • For example, use a video encoder to encode the heterogeneous mixed texture spliced image shown in Figure 9A, the heterogeneous mixed geometry spliced image shown in Figure 9B, and the point cloud mixed occupancy spliced image shown in Figure 9C to obtain the video compression sub-stream.
  • hybrid splicing information corresponding to each heterogeneous hybrid splicing image in the M heterogeneous hybrid splicing images is generated.
  • the hybrid splicing information sub-streams of the M heterogeneous hybrid splicing images can be obtained.
  • Optionally, the hybrid splicing information corresponding to each of the M heterogeneous hybrid spliced images is combined to form complete hybrid splicing information, and the complete hybrid splicing information is then encoded to obtain the hybrid splicing information sub-stream.
  • the multi-view video is processed, for example, through TMIV packaging technology, to obtain multi-view video texture strips and multi-view video geometry strips.
  • The point cloud is processed, for example, through TMC2 packaging technology, to obtain point cloud texture strips, point cloud geometry strips, and point cloud occupancy strips.
  • Then use the preset hybrid splicing method to splice the multi-viewpoint video texture strips, multi-viewpoint video geometry strips, point cloud texture strips, point cloud geometry strips and point cloud occupancy strips to obtain M heterogeneous hybrid spliced images.
  • For example, the multi-view video texture strips and the point cloud texture strips are spliced to obtain a heterogeneous mixed texture spliced image, and the multi-view video geometry strips and the point cloud geometry strips are spliced to obtain a heterogeneous mixed geometry spliced image.
  • Then use the video encoder to encode the heterogeneous mixed texture spliced image, the heterogeneous mixed geometry spliced image and the point cloud mixed occupancy spliced image to obtain the video compression sub-stream, and encode the hybrid splicing information to obtain the hybrid splicing information sub-stream.
  • the video compression sub-stream and the hybrid splicing information sub-stream are written into the same compressed code stream.
  • Since the original V3C standard only supports stitching isomorphic texture, geometry and occupancy spliced images into a hybrid spliced image, that is, it only supports packing multi-viewpoint video spliced images into a multi-viewpoint hybrid spliced image, or packing point cloud spliced images into a point cloud hybrid spliced image, it does not support packing multi-viewpoint video strips and point cloud strips into the same heterogeneous hybrid spliced image. Therefore, some syntax elements need to be added to or modified in the V3C standard.
  • the parameter set of the code stream may include first information, and the first information may be used to indicate whether the code stream simultaneously contains code streams corresponding to at least two visual media contents in different expression formats.
  • the encoding end can also write the parameter set of the code stream into the code stream.
  • the encoding end writes the video compression sub-stream, the hybrid splicing information sub-stream and the parameter set into the same compressed code stream.
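  • As a rough illustration of how the parameter set and the two sub-streams can be multiplexed into one compressed code stream, consider the following Python sketch. The 4-byte length-prefixed framing is a simplification assumed for this description only and does not reproduce the actual V3C unit or sample-stream syntax.

```python
import struct

def write_compressed_stream(parameter_set: bytes,
                            splicing_info_substream: bytes,
                            video_substream: bytes) -> bytes:
    """Concatenate the parameter set and the two sub-streams into one code stream,
    each prefixed with a 4-byte big-endian length (illustrative framing only)."""
    stream = bytearray()
    for unit in (parameter_set, splicing_info_substream, video_substream):
        stream += struct.pack(">I", len(unit)) + unit
    return bytes(stream)

def read_compressed_stream(stream: bytes):
    """Split the illustrative code stream back into its units at the decoding end."""
    units, pos = [], 0
    while pos < len(stream):
        (length,) = struct.unpack_from(">I", stream, pos)
        pos += 4
        units.append(stream[pos:pos + length])
        pos += length
    return units
```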
  • the parameter set of the code stream may be V3C_VPS, and ptl_profile_toolset_idc in V3C_VPS may be used to represent the first information.
  • Setting the first information to different values can indicate whether the code stream simultaneously contains code streams corresponding to at least two visual media contents in different expression formats, or indicate which two (or several) code streams corresponding to visual media contents in different expression formats are contained in the code stream at the same time. That is to say, some preset values of the first information can indicate that the code stream contains code streams of at least two visual media contents with different expression formats, or indicate which visual media contents with different expression formats the code stream contains.
  • the first preset value is used to indicate that the code stream includes both the code stream of the multi-viewpoint video and the code stream of the point cloud.
  • For example, the semantics of this value of ptl_profile_toolset_idc may be: pack the MIV patches and the V-PCC patches as different tiles in a heterogeneous atlas.
  • the second preset value is used to indicate that the code stream only contains a code stream of multi-viewpoint video.
  • the third preset value is used to indicate that the code stream only contains a point cloud video code stream.
  • the profile, tier, and level syntax (Profile, tier, and level syntax) containing the first information may be as shown in Table 3.
  • For example, the V3C_VPS in the existing V3C standard can be reused, and ptl_profile_toolset_idc is pre-configured with preset values such as 128/129/130/131/132/133 to indicate that the current code stream contains both a VPCC code stream (such as VPCC Basic or VPCC Extended) and an MIV code stream (such as MIV Main, MIV Extended or MIV Geometry Absent).
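  • The following sketch shows how a decoder might map ptl_profile_toolset_idc to the expression formats carried in the code stream. The mixed-profile values 128 to 133 follow the examples of this description, and the single-format values follow the published V3C/MIV toolset profiles; all of them are listed here only for illustration and may differ in an actual standard revision.

```python
# Illustrative mapping of ptl_profile_toolset_idc to carried expression formats.
MIXED_TOOLSET_IDC = {128, 129, 130, 131, 132, 133}   # mixed MIV + V-PCC (this description)
MIV_ONLY_IDC = {64, 65, 66}                           # e.g. MIV Main / Extended / Geometry Absent
VPCC_ONLY_IDC = {0, 1}                                # e.g. V-PCC Basic / Extended

def streams_in_bitstream(ptl_profile_toolset_idc: int):
    """Return the set of expression formats indicated by the first information."""
    if ptl_profile_toolset_idc in MIXED_TOOLSET_IDC:
        return {"multi_view_video", "point_cloud"}
    if ptl_profile_toolset_idc in MIV_ONLY_IDC:
        return {"multi_view_video"}
    if ptl_profile_toolset_idc in VPCC_ONLY_IDC:
        return {"point_cloud"}
    raise ValueError("reserved ptl_profile_toolset_idc value")
```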
  • Table 3 Profile, tier, and level syntax
  • Table 4 shows an example of available toolset profile components.
  • Table 4 provides a list of toolset profile components defined for V3C and their corresponding identification syntax element values, such as ptl_profile_toolset_idc and ptc_one_v3c_frame_only_flag. This definition may be used for this document only.
  • the syntax element ptl_profile_toolset_idc provides the main definition of the toolset profile.
  • Additional syntax elements such as ptc_one_v3c_frame_only_flag can specify additional characteristics or restrictions of the defined profile.
  • ptc_one_v3c_frame_only_flag can be used to support only a single V3C frame.
  • Other values of ptl_profile_toolset_idc are reserved for future use by ISO/IEC and should not appear in V3C bitstreams conforming to this version of this document.
  • the configuration file types defined in Table 4 can include dynamic (Dynamic) or static (Static).
  • the applicable standard protocols can be V-PCC Extended and MIV Geometry Absent, that is, the code stream contains two types of code streams: V-PCC Extended and MIV Geometry Absent.
  • In this way, the embodiment of the present application adds values of the first information to the parameter set to indicate that the code stream simultaneously contains code streams corresponding to at least two visual media contents in different expression formats, which can help improve the decoding accuracy of the decoder, and at the same time enables the V3C standard to support visual media content in different expression formats, such as multi-view video, point cloud and mesh, in the same compressed code stream.
  • In some embodiments, the hybrid splicing information of the heterogeneous hybrid spliced image may include a first flag and a second flag, where the first flag may be used to indicate whether the code stream contains multi-view video syntax elements, and the second flag may be used to indicate whether the code stream contains point cloud syntax elements.
  • asps_miv_extension_present_flag can be used to represent the first flag
  • asps_vpcc_extension_present_flag can be used to represent the second flag
  • the first flag and the second flag are located in an atlas sequence parameter set, such as Network Abstraction Layer-Atlas Sequence Parameter Set (NAL-ASPS).
  • the NAL-ASPS of the V3C_AD code stream may contain asps_miv_extension_present_flag and asps_vpcc_extension_present_flag.
  • the first flag is set to a specific value to indicate whether the code stream contains multi-view video syntax elements
  • the second flag is set to a specific value to indicate whether the code stream contains point cloud syntax elements.
  • For example, a specific value of asps_miv_extension_present_flag can indicate whether the atlas code stream corresponding to the atlas data contains multi-view video syntax elements, and a specific value of asps_vpcc_extension_present_flag can indicate whether the atlas code stream corresponding to the atlas data contains point cloud syntax elements.
  • the first flag is set to a first value, and the first value can be used to indicate that the code stream contains multi-view video syntax elements, that is, multi-view video coding and decoding standards can be used for video encoding and decoding;
  • the second flag is set to a second value, and the second value can be used to indicate that the code stream contains point cloud syntax elements, that is, the point cloud encoding and decoding standard can be used for video encoding and decoding.
  • the first value is 1.
  • the second value is 1.
  • For example, when the first flag is 1, it can indicate that the code stream contains multi-view video syntax elements; otherwise (for example, the first flag is 0), the code stream does not contain multi-view video syntax elements. When the second flag is 1, it can indicate that the code stream contains point cloud syntax elements; otherwise (for example, the second flag is 0), the code stream does not contain point cloud syntax elements.
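  • A decoder-side check of the first flag and the second flag could look like the following sketch; the Asps container is a simplified stand-in for a parsed atlas sequence parameter set and keeps only the two flags of interest.

```python
from dataclasses import dataclass

@dataclass
class Asps:
    asps_miv_extension_present_flag: int   # first flag
    asps_vpcc_extension_present_flag: int  # second flag

def syntax_elements_present(asps: Asps):
    """Interpret the first and second flags as described above."""
    present = set()
    if asps.asps_miv_extension_present_flag == 1:   # first value
        present.add("multi_view_video_syntax")
    if asps.asps_vpcc_extension_present_flag == 1:  # second value
        present.add("point_cloud_syntax")
    return present
```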
  • the number of heterogeneous hybrid mosaic images may be one or more.
  • In some embodiments, the V3C code stream can include multiple atlas code streams corresponding to multiple pieces of atlas data.
  • the NAL_ASPS corresponding to each atlas code stream may include the above-mentioned first flag and/or second flag.
  • Example 1: After determining that ptl_profile_toolset_idc is 128/129/130/131/132/133, the NAL_ASPS corresponding to the first atlas code stream among the multiple atlas code streams in the V3C_AD code stream may include asps_vpcc_extension_present_flag set to 1, and the NAL_ASPS corresponding to the second atlas code stream may include asps_miv_extension_present_flag set to 1.
  • The value of asps_miv_extension_present_flag in the NAL_ASPS corresponding to Atlas 1 is not limited, for example, it can be 0 or 1; the value of asps_vpcc_extension_present_flag in the NAL_ASPS corresponding to Atlas 2 is not limited, for example, it can be 0 or 1.
  • That is to say, Example 1 does not limit the multi-view video syntax elements and the point cloud syntax elements to be located in the NAL_ASPS corresponding to one atlas code stream. In the entire V3C code stream, part of the atlas data among the multiple pieces of atlas data may include multi-view video syntax elements but not point cloud syntax elements, and part of the atlas data may include point cloud syntax elements but not multi-view video syntax elements.
  • Table 5 shows an example of the allowable values of syntax element values for the MIV toolset profile components, which can be used by ISO/IEC 23090-12.
  • the last eight columns in Table 5 are profiles of the eight new heterogeneous hybrid splicing diagrams that may be supported compared to the existing standard protocol, as well as the possible values of each syntax element.
  • the value of ptl_profile_toolset_idc of the eight new profiles is 128/129/130/131/132/133.
  • the values of asps_vpcc_extension_present_flag and aaps_vpcc_extension_present_flag in the eight new profile tables can be yes or no (i.e. 1 or 0 ), indicating that multi-view video code streams and point cloud code streams are allowed to exist simultaneously in the code stream.
  • the eight new profiles are expanded from the original four profiles.
  • the key is to modify the values of asps_vpcc_extension_present_flag and aaps_vpcc_extension_present_flag to allow them to be yes or no.
  • the profile named MIV Main Mixed V-PCC Basic and the profile named MIV Main Mixed V-PCC Extended are extended based on the profile named MIV Main.
  • In the two new profiles, except for ptl_profile_toolset_idc, asps_vpcc_extension_present_flag and aaps_vpcc_extension_present_flag, the remaining syntax elements take the same values as those in the profile named MIV Main.
  • the remaining six new profiles can also be given by referring to the above example.
  • The embodiment of this application only takes Table 5 as an example to illustrate the available values of the syntax elements of the MIV toolset profile components. It should be understood that each syntax element may also take values other than those shown in Table 5.
  • For example, the standards organization may modify the standard, such as changing vps_miv_extension_present_flag and vps_packing_information_present_flag (or other syntax elements) to values other than those shown in Table 5, but such modifications still fall within the protection scope of the embodiments of the present application.
  • Table 6 shows an example of the maximum allowed syntax element values for the V-PCC toolset profile components (taking a V3C code stream that mixes a VPCC code stream and an MIV Main code stream as an example), which can be used by ISO/IEC 23090-5 Annex H.
  • Table 6 takes VPCC mixed with MIV Main as an example, where V-PCC Basic Mixed MIV Main and V-PCC Basic Still Mixed MIV Main correspond to a ptl_profile_toolset_idc value of 131, and V-PCC Extended Mixed MIV Main and V-PCC Extended Still Mixed MIV Main correspond to a ptl_profile_toolset_idc value of 128.
  • the values of the four syntax elements vps_miv_extension_present_flag, asps_miv_extension_present_flag, casps_miv_extension_present_flag, and caf_miv_extension_present_flag in the last four columns of Table 6 can be yes or no (i.e. 1 or 0).
  • the embodiment of the present application adds a first flag and a second flag to the hybrid splicing information to indicate whether the code stream contains multi-view video syntax elements and point cloud syntax elements. It can help improve the decoding accuracy of the decoder, and at the same time enable the V3C standard to support visual media content in different expression formats such as multi-view videos and point clouds in the same compressed code stream.
  • the above-mentioned hybrid splicing information also includes a third flag, which is used to indicate the expression format type of the i-th strip in the above-mentioned heterogeneous hybrid splicing diagram, where i is a positive integer.
  • Atdu_type_flag can be used to represent the third flag.
  • For example, a flag named atdu_type_flag[tileID] can be added to each atlas_tile_data_unit(tileID) to indicate whether the strip with that tileID is an MIV strip or a VPCC strip (it is recommended to add a flag named atdu_type_flag[tileID] in each atlas_tile_data_unit(tileID) to indicate whether the tile with that ID is an MIV tile or a VPCC tile).
  • the expression format type corresponding to the i-th strip in the heterogeneous hybrid splicing diagram can be indicated by setting different values to the third flag.
  • For example, if the i-th strip is a multi-view video strip, the value of the third flag is set to the third value; if the i-th strip is a point cloud strip, the value of the third flag is set to the fourth value.
  • the third value is 0.
  • the fourth value is 1.
  • the third flag may be located in the atlas_tile_data_unit.
  • For example, the hybrid splicing information can be written into the ACL NAL unit type code stream, in which atlas_tile_data_unit contains atdu_type_flag. If the current strip is a point cloud strip, atdu_type_flag is 1; if the current strip is a multi-viewpoint video strip, atdu_type_flag is 0.
  • the general atlas tile data unit syntax (General atlas tile data unit syntax) after adding atdu_type_flag is shown in Table 7.
  • When asps_miv_extension_present_flag and asps_vpcc_extension_present_flag are both yes (that is, 1), atdu_type_flag can be used to further indicate the expression format type of the current strip. For example, if atdu_type_flag is yes (that is, 1), the current strip is a point cloud strip; if atdu_type_flag is no (that is, 0), the current strip is a multi-viewpoint video strip.
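  • The parsing logic implied by the above can be sketched as follows; reader.read_flag() stands for a hypothetical 1-bit reader, and the rule that atdu_type_flag is only signalled when both extension flags are present mirrors the description of Table 7.

```python
def parse_atdu_type(reader, asps_miv_present: int, asps_vpcc_present: int) -> str:
    """Return "vpcc" or "miv" for the current atlas tile data unit."""
    if asps_miv_present and asps_vpcc_present:
        atdu_type_flag = reader.read_flag()   # 1 bit, present only in the mixed case
        return "vpcc" if atdu_type_flag == 1 else "miv"
    if asps_vpcc_present:
        return "vpcc"    # only point cloud syntax elements are present
    return "miv"         # only multi-view video syntax elements are present
```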
  • In this way, the embodiment of the present application adds a third flag to the hybrid splicing information to indicate the expression format type corresponding to the i-th strip in the heterogeneous hybrid spliced image, which can help improve the decoding accuracy of the decoder and also enables the V3C standard to support visual media content in different expression formats, such as multi-view video and point cloud, in the same compressed code stream.
  • the hybrid splicing information also includes sub-block information of the i-th slice, where the sub-block information is used to indicate the codec information of the sub-block in the i-th slice.
  • the sub-patch information can be located in the sub-patch data unit (patch_data_unit).
  • If the i-th strip is the multi-viewpoint video strip, the sub-block information is used to indicate that the current sub-block is video coded using the multi-viewpoint video coding standard; if the i-th strip is the point cloud strip, the sub-block information is used to indicate that the current sub-block is video coded using the point cloud video coding standard.
  • For example, when atdu_type_flag is 0, the sub-block information of the current sub-block may indicate that the multi-viewpoint video coding standard is used for video encoding; when atdu_type_flag is 1, the sub-block information of the current sub-block may indicate that the point cloud video coding standard is used for video encoding.
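  • As a sketch of how the sub-block (patch) information can drive the choice of coding standard, consider the following routine; the dictionary-based patch representation and the two handler functions are placeholders assumed for this description.

```python
def decode_patch(patch_data_unit: dict, atdu_type_flag: int) -> dict:
    """Dispatch a sub-block to the decoding path indicated by the strip type."""
    if atdu_type_flag == 0:
        return decode_miv_patch(patch_data_unit)   # multi-view video coding standard
    return decode_vpcc_patch(patch_data_unit)      # point cloud video coding standard

def decode_miv_patch(pdu: dict) -> dict:
    # Placeholder for the multi-view video patch decoding process.
    return {"codec": "multi_view_video", **pdu}

def decode_vpcc_patch(pdu: dict) -> dict:
    # Placeholder for the point cloud patch decoding process.
    return {"codec": "point_cloud", **pdu}
```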
  • the sub-patch data unit syntax (Patch data unit syntax) can be shown in Table 8:
  • FIG 11 shows a schematic diagram of the V3C bitstream structure provided by the embodiment of the present application.
  • The V3C parameter set (V3C_parameter_set()) of V3C_VPS can include ptl_profile_toolset_idc. If ptl_profile_toolset_idc is 128/129/130, it means that the current code stream contains both a VPCC Extended code stream and an MIV Main code stream.
  • the Atlas sequence parameter set () (Atlas_sequence_parameter_set_rbsp()) in the NAL_ASPS in the atlas sub-bitstream () (Atlas_sub_bitstream()) of V3C_AD may include asps_vpcc_extension_present_flag and asps_miv_extension_present_flag.
  • For example, ptl_profile_toolset_idc is 128/129/130, asps_vpcc_extension_present_flag is yes (that is, 1), and asps_miv_extension_present_flag is yes (that is, 1).
  • the ACL NAL unit type (ACL_NAL_unit_type) in Atlas_sub_bitstream() of V3C_AD includes mixed splicing information.
  • atdu_type_flag may be included in the atlas tile data unit (atlas_tile_data_unit()). If atdu_type_flag is yes (that is, 1), it means that the current strip belongs to a point cloud strip; if atdu_type_flag is no (that is, 0), it means that the current strip belongs to a multi-viewpoint video strip.
  • the sub-patch information data () includes a sub-patch data unit (patch_data_unit).
  • If atdu_type_flag is no and asps_miv_extension_present_flag is yes, it means that the current sub-block is decoded using the multi-view video coding standard; if atdu_type_flag is yes, it means that the current sub-block is decoded using the point cloud video coding standard.
  • the following takes visual media content including multi-viewpoint video and point cloud as an example, and introduces the encoding method of the embodiment of the present application in combination with the above-mentioned new or modified syntax elements.
  • the encoding method 700 in this embodiment of the present application may include:
  • S701: For the multi-viewpoint video, use inter-viewpoint projection to remove duplicates and redundancy, connect the non-overlapping pixels into sub-blocks, and splice the sub-blocks into multi-viewpoint video strips; for the point cloud, use parallel projection to connect pixels in the projection plane into sub-blocks, and splice the sub-blocks into point cloud strips.
  • S702: Splice the multi-viewpoint video strips and the point cloud strips to generate a heterogeneous hybrid spliced image.
  • asps_vpcc_extension_present_flag can be set to 1 and asps_miv_extension_present_flag can be set to 1 in the hybrid splicing information.
  • If the current strip added to the heterogeneous hybrid spliced image is a multi-viewpoint video strip, atdu_type_flag is set to 0 in the hybrid splicing information.
  • If the current strip added to the heterogeneous hybrid spliced image is a point cloud strip, atdu_type_flag is set to 1 in the hybrid splicing information.
  • S703: Perform video coding on the heterogeneous hybrid spliced image to obtain a video compression sub-stream.
  • V3C_VPS can be written into the compressed code stream.
  • the ptl_profile_toolset_idc in the V3C_VPS parameter set is set to 128/129/130, indicating that the current code stream contains both the code stream corresponding to the point cloud and the code stream corresponding to the multi-viewpoint video.
  • the specific value of ptl_profile_toolset_idc can be related to the standard protocol applicable to multi-view video or point cloud.
  • That is, the encoding end adds atdu_type_flag to the hybrid splicing information to indicate the expression format type of the i-th strip in the heterogeneous hybrid spliced image, adds values of ptl_profile_toolset_idc to indicate that the code stream contains both the code stream corresponding to the point cloud and the code stream corresponding to the multi-viewpoint video, and adds the values of asps_vpcc_extension_present_flag and asps_miv_extension_present_flag to support code streams containing both types of code streams.
  • In this way, when decoding, the decoder can determine from ptl_profile_toolset_idc that the current code stream includes the two types of code streams corresponding to the point cloud and the multi-view video, determine the expression format type of the current strip from atdu_type_flag, and further determine, from atdu_type_flag and asps_miv_extension_present_flag, the encoding and decoding standard adopted by the current sub-block, which enables the decoding end to accurately implement decoding.
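  • Putting steps S701 to S703 and the associated signalling together, an encoder-side driver might be organized as in the sketch below. Every helper is a trivial stand-in for the corresponding processing step described above, and the value 128 for ptl_profile_toolset_idc is just one of the preset values discussed in this description.

```python
def project_and_strip(content, source):
    # Stand-in for S701: projection, redundancy removal and strip generation.
    return [{"source": source, "pixels": content}]

def splice_strips(strips):
    # Stand-in for S702: splicing strips into a heterogeneous hybrid spliced image.
    return {"mosaic": strips, "strips": strips}

def video_encode(mosaic):
    # Stand-in for S703: 2D video encoding of the spliced image.
    return b"video-substream"

def encode_heterogeneous(miv_video, point_cloud):
    strips = project_and_strip(miv_video, "miv") + project_and_strip(point_cloud, "vpcc")
    spliced = splice_strips(strips)
    hybrid_info = {
        "asps_miv_extension_present_flag": 1,
        "asps_vpcc_extension_present_flag": 1,
        "atdu_type_flag": [0 if s["source"] == "miv" else 1 for s in spliced["strips"]],
    }
    vps = {"ptl_profile_toolset_idc": 128}   # one of the assumed mixed-profile values
    return vps, hybrid_info, video_encode(spliced["mosaic"])
```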
  • In summary, the encoding method of the embodiment of the present application obtains at least two homogeneous strips by processing at least two visual media contents respectively, at least two of which have different expression formats; splices the at least two homogeneous strips to generate a heterogeneous hybrid spliced image; and encodes the heterogeneous hybrid spliced image to obtain a code stream. That is, this application splices the strips corresponding to visual media contents of multiple different expression formats into one heterogeneous hybrid spliced image, so that the visual media contents of different expression formats (i.e., heterogeneous contents) are grouped in an orderly manner into the same heterogeneous hybrid spliced image. For example, multi-viewpoint video strips and point cloud strips are spliced into a heterogeneous hybrid spliced image for encoding and decoding, which minimizes the number of 2D video encoders such as HEVC, VVC, AVC and AVS that need to be called, reduces the coding cost and improves ease of use.
  • the encoding method of the present application is introduced above by taking the encoding end as an example.
  • the video decoding method provided by the embodiment of the present application is described below by taking the decoding end as an example.
  • Figure 13 is a schematic flow chart of a decoding method 800 provided by an embodiment of the present application. As shown in Figure 13, the decoding method in this embodiment of the present application includes: S801: Decode the code stream to obtain a reconstructed heterogeneous hybrid spliced image;
  • S802: Split the reconstructed heterogeneous hybrid spliced image to obtain at least two reconstructed homogeneous strips, where the reconstructed homogeneous strips correspond to at least two different expression formats;
  • S803: Obtain at least two reconstructed visual media contents based on the at least two reconstructed homogeneous strips, where the reconstructed visual media contents correspond to at least two different expression formats.
  • Specifically, the decoder decodes the code stream to obtain a reconstructed heterogeneous hybrid spliced image, and then splits the reconstructed heterogeneous hybrid spliced image to obtain at least two reconstructed homogeneous strips, where the reconstructed homogeneous strips correspond to at least two different expression formats. Finally, the decoder processes the at least two split reconstructed homogeneous strips, for example by reconstruction, to obtain at least two reconstructed visual media contents.
  • At least two homogeneous strips of different expression formats are spliced into a heterogeneous mixed splicing picture, so that the visual media content (i.e., heterogeneous content) of different expression formats is orderly grouped into the same heterogeneous picture.
  • the number of 2D video decoders such as HEVC, VVC, AVC, and AVS that need to be called during decoding can be minimized, which reduces the decoding cost and improves the ease of use.
  • the above-mentioned code stream includes a video compression sub-stream, in which case the above-mentioned S801 includes:
  • the video decoder is called to decode the video compression sub-stream to obtain the reconstructed heterogeneous hybrid splicing image.
  • the code stream in the embodiment of the present application may also include other content.
  • the decoding end obtains the code stream and analyzes the code stream to obtain the video compression sub-stream. Further, the decoding end can decode the video compression sub-stream to obtain a reconstructed heterogeneous hybrid splicing image.
  • the video compressed sub-stream can be input into the video decoder shown in Figure 2B for decoding to obtain a reconstructed heterogeneous hybrid splicing image.
  • the encoding end can encode the hybrid splicing information to obtain a hybrid splicing information sub-stream, and write the hybrid splicing information sub-stream into the code stream. That is, in addition to the video compression sub-stream, the code stream in the embodiment of the present application may also include the hybrid splicing information sub-stream. At this time, the above decoding method may also include: decoding the hybrid splicing information sub-stream to obtain the hybrid splicing information.
  • step 802 can be implemented as follows: splitting the reconstructed heterogeneous hybrid splicing image according to the hybrid splicing information to obtain at least two reconstructed homogeneous strips.
  • the decoding end parses the code stream to obtain the video compression sub-stream and the hybrid splicing information sub-stream. Then, the decoding end decodes the video compression sub-stream to obtain the reconstructed heterogeneous hybrid splicing image. The mixed splicing information sub-stream is decoded to obtain the mixed splicing information. Finally, use the hybrid splicing information to split the reconstructed heterogeneous hybrid splicing image to obtain at least two reconstructed homogeneous strips.
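  • A sketch of the split performed in S802 follows, assuming the hybrid splicing information records one rectangle and one source type per strip; these field names are illustrative and not taken from any standard.

```python
import numpy as np

def split_mosaic(mosaic: np.ndarray, splicing_info: list):
    """Cut each reconstructed strip back out of the reconstructed spliced image."""
    miv_strips, vpcc_strips = [], []
    for entry in splicing_info:
        y, x, h, w = entry["y"], entry["x"], entry["h"], entry["w"]
        strip = mosaic[y:y + h, x:x + w]
        (miv_strips if entry["source"] == "miv" else vpcc_strips).append(strip)
    return miv_strips, vpcc_strips
```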
  • the at least two homogeneous strips belong to at least two different types of strips among multi-view video strips, point cloud strips and mesh strips.
  • the heterogeneous hybrid mosaic graph includes at least one of a single attribute heterogeneous hybrid mosaic graph and a multi-attribute heterogeneous hybrid mosaic graph.
  • The first expression format and the second expression format are each any one of multi-view video, point cloud and mesh, and the first expression format and the second expression format are different.
  • the above S802 may include the following example:
  • Example 1: If the reconstructed heterogeneous hybrid spliced image is a reconstructed heterogeneous mixed texture spliced image, split the reconstructed heterogeneous mixed texture spliced image to obtain multi-viewpoint video texture reconstruction strips and point cloud texture reconstruction strips;
  • Example 2: If the reconstructed heterogeneous hybrid spliced image is a reconstructed heterogeneous mixed geometry spliced image, split the reconstructed heterogeneous mixed geometry spliced image to obtain multi-viewpoint video geometry reconstruction strips and point cloud geometry reconstruction strips.
  • The reconstructed spliced images may also include a reconstructed point cloud mixed occupancy spliced image; in that case, the reconstructed point cloud mixed occupancy spliced image is split to obtain point cloud occupancy reconstruction strips.
  • Then, a reconstructed multi-view video can be obtained based on the multi-view video texture reconstruction strips and the multi-view video geometry reconstruction strips, and a reconstructed point cloud can be obtained based on the point cloud texture reconstruction strips, the point cloud geometry reconstruction strips and the point cloud occupancy reconstruction strips.
  • Specifically, the decoding end inputs the code stream into the video decoder, and the video decoder decodes the video compression sub-stream to obtain the reconstructed heterogeneous mixed texture spliced image, the reconstructed heterogeneous mixed geometry spliced image and the reconstructed point cloud mixed occupancy spliced image, and decodes the hybrid splicing information sub-stream to obtain the hybrid splicing information.
  • Then the reconstructed heterogeneous mixed texture spliced image is split according to the hybrid splicing information. For example, strip unpacking technology is used to split the reconstructed heterogeneous mixed texture spliced image to obtain multi-viewpoint video texture reconstruction strips and point cloud texture reconstruction strips.
  • For example, TMIV unpacking technology is used to process the multi-view video texture reconstruction strips and the multi-view video geometry reconstruction strips to obtain a reconstructed multi-view video, and a reconstructed point cloud is obtained based on the point cloud texture reconstruction strips, the point cloud geometry reconstruction strips and the point cloud occupancy reconstruction strips.
  • For example, TMC2 unpacking technology can process the point cloud texture reconstruction strips, the point cloud geometry reconstruction strips and the point cloud occupancy reconstruction strips to obtain a reconstructed point cloud.
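  • The final reconstruction step can be summarized by the sketch below; tmiv_unpack and tmc2_unpack are placeholders for the TMIV and TMC2 unpacking processes mentioned above and do not reproduce their actual interfaces.

```python
def reconstruct_media(miv_tex, miv_geo, pcc_tex, pcc_geo, pcc_occ):
    """Rebuild the two reconstructed visual media contents from their strips."""
    def tmiv_unpack(texture_strips, geometry_strips):
        # Placeholder: a real implementation would invoke the TMIV renderer.
        return {"format": "multi_view_video",
                "texture": texture_strips, "geometry": geometry_strips}

    def tmc2_unpack(texture_strips, geometry_strips, occupancy_strips):
        # Placeholder: a real implementation would invoke the TMC2 reconstruction.
        return {"format": "point_cloud", "texture": texture_strips,
                "geometry": geometry_strips, "occupancy": occupancy_strips}

    return tmiv_unpack(miv_tex, miv_geo), tmc2_unpack(pcc_tex, pcc_geo, pcc_occ)
```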
  • As described above, the encoding end of the embodiment of the present application includes the first information in the parameter set of the code stream, and the first information is used to indicate that the code stream simultaneously contains code streams corresponding to at least two visual media contents in different expression formats.
  • the decoder decodes the code stream to obtain a parameter set of the code stream, which includes first information.
  • The first information is used to indicate whether the code stream simultaneously contains code streams corresponding to at least two visual media contents in different expression formats.
  • the code stream includes both the code stream of the multi-view video and the code stream of the point cloud.
  • the code stream includes a code stream of multi-viewpoint video.
  • the code stream includes a code stream of a point cloud.
  • the hybrid splicing information in this embodiment of the present application includes a first flag and a second flag.
  • the decoding end can further parse the hybrid splicing information and obtain the first flag and the second flag.
  • the first flag is a first value, it is determined that the code stream contains multi-view video syntax elements; if the second flag is a second value, it is determined that the code stream contains point cloud syntax elements.
  • the first value is 1 and the second value is 1.
  • the first flag and the second flag are located in the atlas sequence parameter set.
  • the hybrid splicing information in the embodiment of the present application also includes a third flag, which is used to indicate the expression format type of the i-th slice in the reconstructed heterogeneous hybrid splicing diagram, where i is a positive integer.
  • The above-mentioned S802, that is, splitting the reconstructed heterogeneous hybrid spliced image to obtain at least two reconstructed homogeneous strips, includes the following steps S802-A and S802-B: S802-A: According to the third flag, determine the visual media expression format type corresponding to the i-th strip in the reconstructed heterogeneous hybrid spliced image;
  • S802-B extracts the i-th strip as a reconstructed isomorphic strip of the visual media expression format type corresponding to the i-th strip according to the third flag corresponding to the i-th strip.
  • the above S802-B includes the following steps:
  • For example, if the value of the third flag is the fourth value, the visual media expression format type corresponding to the i-th strip is point cloud; if the value is the third value, the visual media expression format type corresponding to the i-th strip is multi-viewpoint video.
  • the third value is 0 and the fourth value is 1.
  • the third flag is located in the stripe data unit.
  • the hybrid splicing information in the embodiment of the present application also includes the sub-block information of the i-th slice.
  • The sub-block information is used to indicate the codec information of the sub-blocks in the i-th strip.
  • In some embodiments, S803 includes the following steps S803A and S803B: S803A: If the i-th strip is a multi-view video strip, perform multi-view video decoding on the current sub-block of the i-th strip to obtain a reconstructed multi-view video; S803B: If the i-th strip is a point cloud strip, perform point cloud decoding on the current sub-block of the i-th strip to obtain a reconstructed point cloud.
  • the sub-tile information is located in the sub-tile data unit.
  • The decoding method 900 in this embodiment of the present application may include:
  • S901 Extract the hybrid splicing information sub-stream and the video compression sub-stream respectively from the compressed code stream.
  • V3C_VPS can be obtained by parsing from the compressed code stream
  • For example, ptl_profile_toolset_idc can be obtained by parsing the V3C_VPS parameter set. If ptl_profile_toolset_idc is determined to be 128/129/130/131/132/133, it means that the current code stream contains both the code stream corresponding to the point cloud and the code stream corresponding to the multi-view video.
  • the specific value of ptl_profile_toolset_idc can be related to the standard protocol applicable to multi-view video or point cloud.
  • S902 Obtain the hybrid splicing information after decoding the hybrid splicing information sub-stream.
  • NAL_ASPS can be further parsed from the V3C_AD code stream to obtain asps_vpcc_extension_present_flag and asps_miv_extension_present_flag.
  • S903 Input the video compression sub-stream to the video decoder, and then output the reconstructed heterogeneous hybrid splicing image after decoding.
  • The reconstructed heterogeneous hybrid spliced image is then split, and the reconstructed multi-viewpoint video strips and the reconstructed point cloud strips are output.
  • Specifically, after ptl_profile_toolset_idc is determined to be 128/129/130/131/132/133, the atlas_tile_data_unit is parsed to obtain atdu_type_flag. If atdu_type_flag is yes (that is, 1), the current strip is a point cloud strip; if atdu_type_flag is no (that is, 0), the current strip is a multi-viewpoint video strip.
  • The reconstructed multi-viewpoint video strips are processed by multi-viewpoint video decoding to generate a reconstructed multi-viewpoint video, and the reconstructed point cloud strips are processed by point cloud decoding to generate a reconstructed point cloud.
  • In this way, the decoder determines the expression format type of the i-th strip in the heterogeneous hybrid spliced image by parsing atdu_type_flag in the hybrid splicing information together with the values of asps_vpcc_extension_present_flag and asps_miv_extension_present_flag, and determines, by parsing the value of ptl_profile_toolset_idc, that the code stream simultaneously contains the two types of code streams corresponding to the point cloud and the multi-view video. The decoder can therefore determine from ptl_profile_toolset_idc that the current code stream includes the two types of code streams, determine the expression format type of the current strip from atdu_type_flag, and further determine the video codec standard adopted by the current sub-block from atdu_type_flag and asps_miv_extension_present_flag, thereby achieving accurate decoding.
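  • The decision chain of method 900 can be condensed into the following sketch; the dictionaries stand in for the parsed V3C syntax structures, and the set of mixed-profile values again follows the examples of this description.

```python
def route_strips(vps: dict, asps: dict, tiles: list):
    """Decide, per reconstructed strip, which decoding path it follows."""
    mixed = vps["ptl_profile_toolset_idc"] in {128, 129, 130, 131, 132, 133}
    routed = []
    for tile in tiles:
        if mixed and asps["asps_miv_extension_present_flag"] and asps["asps_vpcc_extension_present_flag"]:
            codec = "point_cloud" if tile["atdu_type_flag"] == 1 else "multi_view_video"
        elif asps["asps_vpcc_extension_present_flag"]:
            codec = "point_cloud"
        else:
            codec = "multi_view_video"
        routed.append((tile["tile_id"], codec))
    return routed
```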
  • In summary, the decoding end obtains a reconstructed heterogeneous hybrid spliced image by decoding the code stream, splits the reconstructed heterogeneous hybrid spliced image to obtain at least two reconstructed homogeneous strips, and processes the at least two reconstructed homogeneous strips separately to obtain at least two reconstructed visual media contents, where the visual media contents correspond to at least two different expression formats and the homogeneous strips correspond to at least two different expression formats.
  • At least two homogeneous strips of different expression formats are spliced into a heterogeneous mixed splicing image, so that the visual media contents (i.e., heterogeneous content) of different expression formats are orderly grouped into the same heterogeneous image.
  • In addition, the rendering advantages of data in different expression formats (such as point clouds) are retained during decoding, and while improving the synthesis quality of the image, this minimizes the number of 2D video decoders such as HEVC, VVC, AVC and AVS that need to be called, which reduces the decoding cost and improves ease of use.
  • It should be understood that the sizes of the sequence numbers of the above-mentioned processes do not imply their order of execution; the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the term "and/or" is only an association relationship describing associated objects, indicating that three relationships can exist. Specifically, A and/or B can represent three situations: A exists alone, A and B exist simultaneously, and B exists alone.
  • the character "/" in this application generally indicates that the related objects are an "or" relationship.
  • FIG. 16 is a schematic block diagram of an encoding device 10 provided by an embodiment of the present application.
  • the encoding device 10 is applied to the above-mentioned video encoding end.
  • the encoding device 10 includes:
  • the first splicing unit 11 is used to process at least two visual media contents respectively to obtain at least two isomorphic strips, where the visual media contents correspond to at least two different expression formats and the isomorphic strips correspond to at least two different expression formats;
  • the second splicing unit 12 is used to splice the at least two homogeneous strips to generate a heterogeneous hybrid splicing image
  • the encoding unit 13 is used to encode the heterogeneous hybrid splicing image to obtain a code stream.
  • the encoding unit 13 is specifically used for:
  • perform video coding on the heterogeneous hybrid spliced image to obtain a video compression sub-stream, encode the hybrid splicing information to obtain a hybrid splicing information sub-stream, and write the video compression sub-stream and the hybrid splicing information sub-stream into the code stream.
  • the at least two homogeneous strips include at least two different types of strips among a multi-view video strip, a point cloud strip, and a mesh strip.
  • the heterogeneous hybrid mosaic graph includes at least one of a single attribute heterogeneous hybrid mosaic graph and a multi-attribute heterogeneous hybrid mosaic graph.
  • the encoding unit 13 is also used for:
  • the encoding unit 13 is also used to:
  • the first information is set to a first preset value, wherein the first preset value is used to indicate that the code stream includes both a code stream of a multi-viewpoint video and a code stream of a point cloud.
  • the encoding unit 13 is also used to:
  • the first information is set to a second preset value, wherein the second preset value is used to indicate that the code stream includes a code stream of multi-viewpoint video; or
  • the first information is set to a third preset value, where the third preset value is used to indicate that the code stream includes a code stream of a point cloud.
  • the hybrid splicing information of the heterogeneous hybrid splicing image includes a first flag and a second flag, and the encoding unit 13 is also used to:
  • the first flag is set to a first value, where the first value is used to indicate that the code stream contains a multi-view video syntax element; and the second flag is set to a second value, where the second value is used to indicate that the code stream contains a point cloud syntax element.
  • the first flag and the second flag are located in an atlas sequence parameter set.
  • the hybrid splicing information further includes a third flag, the third flag is used to indicate the expression format type of the i-th strip in the heterogeneous hybrid splicing graph, where i is a positive integer.
  • the encoding unit 13 is also used to:
  • the i-th slice is the multi-viewpoint video slice, then set the value of the third flag to a third value;
  • if the i-th strip is the point cloud strip, then the value of the third flag is set to a fourth value.
  • the third flag is located in the stripe data unit.
  • the hybrid splicing information also includes sub-block information of the i-th slice, and the sub-block information is used to indicate the codec of the sub-block in the i-th slice. information.
  • if the i-th strip is the multi-view video strip, the sub-block information is used to indicate that the current sub-block is coded using the multi-view video coding standard; if the i-th strip is the point cloud strip, the sub-block information is used to indicate that the current sub-block is coded using the point cloud video coding standard.
  • the sub-tile information is located in a sub-tile data unit.
  • the at least two visual media contents are media contents presented simultaneously in the same three-dimensional space.
  • the device embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they will not be repeated here.
  • the device 10 shown in FIG. 16 can execute the encoding method of the encoding end of the embodiment of the present application, and the aforementioned and other operations and/or functions of each unit in the device 10 are respectively to implement the encoding method of the encoding end and other methods. The corresponding process, for the sake of brevity, will not be repeated here.
  • Figure 17 is a schematic block diagram of a decoding device provided by an embodiment of the present application.
  • the decoding device is applied to the above-mentioned decoding end.
  • the decoding device 20 may include:
  • the decoding unit 21 is used to decode the code stream and obtain the reconstructed heterogeneous hybrid splicing image
  • Splitting unit 22 is used to split the reconstructed heterogeneous mixed spliced image to obtain at least two reconstructed homogeneous strips; the reconstructed homogeneous strips correspond to at least two different expression formats.
  • the processing unit 23 is configured to obtain at least two reconstructed visual media contents based on the at least two reconstructed isomorphic strips, and the reconstructed visual media contents correspond to at least two different expression formats.
  • the code stream includes a video compression sub-stream; the decoding unit 21 is specifically used to:
  • a video decoder is called to decode the video compression sub-stream to obtain the reconstructed heterogeneous hybrid splicing image.
  • the code stream also includes a hybrid splicing information sub-stream, and the decoding unit 21 is also used to: decode the hybrid splicing information sub-stream to obtain the hybrid splicing information.
  • the splitting unit 22 is specifically used for:
  • split the reconstructed heterogeneous hybrid spliced image according to the hybrid splicing information to obtain the at least two reconstructed homogeneous strips.
  • the at least two homogeneous strips include at least two different types of strips among a multi-view video strip, a point cloud strip, and a mesh strip.
  • the heterogeneous hybrid mosaic graph includes at least one of a single attribute heterogeneous hybrid mosaic graph and a multi-attribute heterogeneous hybrid mosaic graph.
  • the decoding unit 21 is also used to:
  • the decoding unit 21 is also used to:
  • the code stream includes both a code stream of multi-viewpoint video and a code stream of point cloud.
  • the decoding unit 21 is also used to:
  • the code stream contains a code stream of multi-viewpoint video
  • the code stream contains a code stream of point cloud.
  • the hybrid splicing information for reconstructing the heterogeneous hybrid splicing image includes a first flag and a second flag, and the decoding unit 21 is also used to:
  • the first flag is a first value, it is determined that the code stream contains a multi-view video syntax element
  • the second flag is a second value, it is determined that the code stream contains a point cloud syntax element.
  • the first flag and the second flag are located in an atlas sequence parameter set.
  • the hybrid splicing information also includes a third flag, and the third flag is used to indicate the expression format type of the i-th strip in the reconstructed heterogeneous hybrid spliced image, where i is a positive integer.
  • the splitting unit 22 is specifically used for:
  • the i-th strip is extracted as a reconstructed isomorphic strip of the visual media expression format type corresponding to the i-th strip.
  • the splitting unit 22 is specifically configured to:
  • the value of the third flag is a third numerical value, it is determined that the expression format of the media content corresponding to the i-th strip is a multi-viewpoint video;
  • the value of the third flag is a fourth value, it is determined that the expression format of the media content corresponding to the i-th strip is a point cloud.
  • the third flag is located in the stripe data unit.
  • the hybrid splicing information also includes sub-block information of the i-th slice, and the sub-block information is used to indicate the codec of the sub-block in the i-th slice. information.
  • the processing unit 23 is specifically used to:
  • the i-th slice is the multi-view video strip, perform multi-view video decoding on the current sub-block of the i-th slice to obtain a reconstructed multi-view video;
  • if the i-th strip is the point cloud strip, point cloud decoding is performed on the current sub-block of the i-th strip to obtain a reconstructed point cloud.
  • the sub-tile information is located in a sub-tile data unit.
  • the at least two visual media contents are media contents presented simultaneously in the same three-dimensional space.
  • the device embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they will not be repeated here.
  • the device 20 shown in FIG. 17 may correspond to the corresponding subject in performing the prediction method of the decoding end in the embodiment of the present application, and the foregoing and other operations and/or functions of each unit in the device 20 are respectively to implement the decoding of the decoding end. For the sake of brevity, the corresponding processes in each method will not be described again here.
  • the software unit may be located in a mature storage medium in this field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, register, etc.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps in the above method embodiment in combination with its hardware.
  • Figure 18 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 30 may be the video encoder or video decoder described in the embodiment of the present application.
  • the electronic device 30 may include:
  • Memory 33 and processor 32 the memory 33 is used to store the computer program 34 and transmit the program code 34 to the processor 32.
  • the processor 32 can call and run the computer program 34 from the memory 33 to implement the method in the embodiment of the present application.
  • the processor 32 may be configured to perform the steps in the above method 200 according to instructions in the computer program 34 .
  • the processor 32 may include but is not limited to:
  • a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), etc.
  • the memory 33 includes but is not limited to:
  • Non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically removable memory. Erase programmable read-only memory (Electrically EPROM, EEPROM) or flash memory. Volatile memory may be Random Access Memory (RAM), which is used as an external cache.
  • By way of example but not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM) and Direct Rambus RAM (DR RAM).
  • the computer program 34 can be divided into one or more units, and the one or more units are stored in the memory 33 and executed by the processor 32 to complete the tasks provided by this application.
  • the one or more units may be a series of computer program instruction segments capable of completing specific functions. The instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30 .
  • the electronic device 30 may also include:
  • Transceiver 33: the transceiver 33 can be connected to the processor 32 or the memory 33.
  • the processor 32 can control the transceiver 33 to communicate with other devices. Specifically, it can send information or data to other devices, or receive information or data sent by other devices.
  • Transceiver 33 may include a transmitter and a receiver.
  • the transceiver 33 may further include an antenna, and the number of antennas may be one or more.
  • The components of the electronic device 30 are connected through a bus system, where, in addition to the data bus, the bus system also includes a power bus, a control bus and a status signal bus.
  • This application also provides a computer storage medium on which a computer program is stored.
  • When the computer program is executed by a computer, the computer can perform the methods of the above method embodiments.
  • embodiments of the present application also provide a computer program product containing instructions, which when executed by a computer causes the computer to perform the method of the above method embodiments.
  • This application also provides a code stream, which is generated according to the above encoding method.
  • the code stream includes the above first flag, or includes a first flag and a second flag.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the available media may be magnetic media (such as floppy disks, hard disks, magnetic tapes), optical media (such as digital video discs (DVD)), or semiconductor media (such as solid state disks (SSD)), etc.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separate.
  • a component shown as a unit may or may not be a physical unit, that is, it may be located in one place, or it may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in various embodiments of the present application can be integrated into a processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.


Abstract

The present application relates to an encoding method and apparatus, a decoding method and apparatus, a device and a storage medium. According to the application, homogeneous strips corresponding to visual media content of a plurality of different expression formats are spliced into a heterogeneous hybrid spliced image, so that visual media content (i.e., heterogeneous content) of different expression formats is grouped in an orderly manner into the same heterogeneous hybrid spliced image; for example, multi-view video strips and point cloud strips are spliced into a heterogeneous hybrid spliced image for encoding and decoding, which minimizes the number of two-dimensional video codecs such as HEVC, VVC, AVC and AVS that need to be called, reduces encoding and decoding costs, and improves ease of use.
PCT/CN2022/087523 2022-04-18 2022-04-18 Procédé et appareil de codage, procédé et appareil de décodage, dispositif et support de stockage WO2023201504A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/087523 WO2023201504A1 (fr) 2022-04-18 2022-04-18 Procédé et appareil de codage, procédé et appareil de décodage, dispositif et support de stockage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/087523 WO2023201504A1 (fr) 2022-04-18 2022-04-18 Encoding method and apparatus, decoding method and apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023201504A1 (fr)

Family

ID=88418946

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/087523 WO2023201504A1 (fr) 2022-04-18 2022-04-18 Encoding method and apparatus, decoding method and apparatus, device, and storage medium

Country Status (1)

Country Link
WO (1) WO2023201504A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2373046A2 * 2010-03-30 2011-10-05 Vestel Elektronik Sanayi ve Ticaret A.S. Super-resolution based on n-view and n-depth multi-view video coding
US20150341614A1 (en) * 2013-01-07 2015-11-26 National Institute Of Information And Communications Technology Stereoscopic video encoding device, stereoscopic video decoding device, stereoscopic video encoding method, stereoscopic video decoding method, stereoscopic video encoding program, and stereoscopic video decoding program
CN110012279A (zh) * 2018-01-05 2019-07-12 Shanghai Jiao Tong University View-angle-based compression and transmission method and system based on 3D point cloud data
CN112601082A (zh) * 2020-11-30 2021-04-02 Nanjing University of Posts and Telecommunications Video-based fast dynamic point cloud encoding method and system
CN114071116A (zh) * 2020-07-31 2022-02-18 Alibaba Group Holding Limited Video processing method and apparatus, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
US11151742B2 (en) Point cloud data transmission apparatus, point cloud data transmission method, point cloud data reception apparatus, and point cloud data reception method
US11170556B2 (en) Apparatus for transmitting point cloud data, a method for transmitting point cloud data, an apparatus for receiving point cloud data and a method for receiving point cloud data
US11979605B2 (en) Attribute layers and signaling in point cloud coding
US20220159261A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US11968393B2 (en) Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device, and point cloud data receiving method
KR20180051594A (ko) Improved color remapping information supplemental enhancement information message processing
CN115443652B (zh) Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device, and point cloud data receiving method
KR102659806B1 (ko) Scaling parameters for V-PCC
US10574959B2 (en) Color remapping for non-4:4:4 format video content
TWI713354B (zh) Signaling of color remapping information SEI messages for display adaptation
WO2023142127A1 (fr) Encoding and decoding methods and apparatuses, device, and storage medium
CN114503587A (zh) Point cloud data transmission apparatus, point cloud data transmission method, point cloud data reception apparatus, and point cloud data reception method
JP2024012544A (ja) Encoder, decoder, and corresponding methods for chroma intra mode derivation
WO2022166462A1 (fr) Encoding/decoding method and related device
WO2023201504A1 (fr) Encoding method and apparatus, decoding method and apparatus, device, and storage medium
WO2024011386A1 (fr) Encoding method and apparatus, decoding method and apparatus, encoder, decoder, and storage medium
CN115804096A (zh) Point cloud data transmission apparatus, point cloud data transmission method, point cloud data reception apparatus, and point cloud data reception method
JP2023509513A (ja) Signaling of camera parameters in point cloud coding
WO2024077806A1 (fr) Encoding method and apparatus, decoding method and apparatus, encoder, decoder, and storage medium
US20230419557A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
WO2024077616A1 (fr) Encoding and decoding method and apparatus, device, and storage medium
US20240064337A1 (en) Transform-based image coding method, and device therefor
US20230164356A1 (en) Method and apparatus for coding image on basis of transform
CN115428442A (zh) Point cloud data transmission apparatus, point cloud data transmission method, point cloud data reception apparatus, and point cloud data reception method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22937742

Country of ref document: EP

Kind code of ref document: A1