WO2024011386A1 - Coding method and apparatus, decoding method and apparatus, and coder, decoder and storage medium - Google Patents

Info

Publication number
WO2024011386A1
WO2024011386A1 PCT/CN2022/105006 CN2022105006W WO2024011386A1 WO 2024011386 A1 WO2024011386 A1 WO 2024011386A1 CN 2022105006 W CN2022105006 W CN 2022105006W WO 2024011386 A1 WO2024011386 A1 WO 2024011386A1
Authority
WO
WIPO (PCT)
Prior art keywords
syntax element
expression format
splicing
isomorphic
spliced
Prior art date
Application number
PCT/CN2022/105006
Other languages
French (fr)
Chinese (zh)
Inventor
虞露
金峡钶
朱志伟
戴震宇
Original Assignee
浙江大学
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学, Oppo广东移动通信有限公司 filed Critical 浙江大学
Priority to PCT/CN2022/105006 priority Critical patent/WO2024011386A1/en
Publication of WO2024011386A1 publication Critical patent/WO2024011386A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the present application relates to the field of image processing technology, and in particular to a coding and decoding method, device, encoder, decoder and storage medium.
  • Visual expressions with different expression formats may appear in the same scene.
  • For example, in the same three-dimensional scene, the scene background and some characters and objects are expressed as video, while other characters are expressed as three-dimensional point clouds or three-dimensional meshes.
  • Current encoding and decoding technology encodes and decodes multi-view video, point clouds and meshes separately.
  • As a result, a large number of codecs need to be called during encoding and decoding, which makes encoding and decoding expensive.
  • Embodiments of the present application provide a coding and decoding method, device, encoder, decoder, and storage medium.
  • this application provides a decoding method applied to a decoder, including:
  • when it is determined according to the first syntax element that the spliced graph is a heterogeneous hybrid spliced graph, the spliced graph is split according to the spliced graph information of the spliced graph to obtain at least two types of isomorphic blocks, wherein the at least two types of isomorphic blocks correspond to different visual media content expression formats;
  • when it is determined according to the first syntax element that the spliced graph is an isomorphic spliced graph, the spliced graph is split according to the spliced graph information of the spliced graph to obtain one type of isomorphic block, wherein each isomorphic block of that type corresponds to the same visual media content expression format;
  • the homogeneous blocks are decoded and reconstructed to obtain visual media content in at least one expression format.
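  • The decoder-side branching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the syntax of the application: `SplicedGraphInfo`, `hetero_flag` (standing in for the first syntax element), the `regions` layout and the format names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical region record: (expression format, x, y, width, height).
Region = Tuple[str, int, int, int, int]


@dataclass
class SplicedGraphInfo:
    hetero_flag: int       # assumed stand-in for the "first syntax element"
    regions: List[Region]  # assumed layout description of the spliced graph


def split_spliced_graph(graph: List[List[int]],
                        info: SplicedGraphInfo) -> Dict[str, List[List[List[int]]]]:
    """Cut the spliced graph into blocks, grouped by expression format."""
    blocks: Dict[str, List[List[List[int]]]] = {}
    for fmt, x, y, w, h in info.regions:
        block = [row[x:x + w] for row in graph[y:y + h]]
        blocks.setdefault(fmt, []).append(block)
    # The flag tells the decoder up front how many formats to expect.
    if info.hetero_flag and len(blocks) < 2:
        raise ValueError("heterogeneous hybrid graph needs at least two formats")
    if not info.hetero_flag and len(blocks) != 1:
        raise ValueError("isomorphic graph must hold exactly one format")
    return blocks
```

  • Each group returned by `split_spliced_graph` would then be handed to the reconstruction path of its own expression format.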
  • this application provides an encoding method, applied to the encoder, including:
  • the at least one isomorphic block is spliced to obtain at least one spliced graph and spliced graph information, wherein the spliced graph information includes a first syntax element, and whether the spliced graph is a heterogeneous hybrid spliced graph or an isomorphic spliced graph is determined according to the first syntax element; the heterogeneous hybrid spliced graph includes at least two types of isomorphic blocks, and the isomorphic spliced graph includes one type of isomorphic block;
  • the at least one spliced image and the spliced image information are encoded to obtain a code stream.
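  • The encoder-side splicing step can be sketched in the same spirit. This is a toy packing scheme under stated assumptions: blocks are placed side by side, shorter blocks are padded with zeros, and the dictionary keys `hetero_flag` and `regions` are hypothetical names, not syntax elements from the application.

```python
def splice_blocks(blocks):
    """Pack (format, 2D block) pairs left to right into one spliced graph.

    Returns the spliced graph and an info dict holding a type flag and the
    position of every block as (format, x, y, width, height).
    """
    height = max(len(block) for _, block in blocks)
    graph = [[] for _ in range(height)]
    regions = []
    x = 0
    for fmt, block in blocks:
        width = len(block[0])
        for y in range(height):
            # Pad with zeros below blocks that are shorter than the canvas.
            row = block[y] if y < len(block) else [0] * width
            graph[y].extend(row)
        regions.append((fmt, x, 0, width, len(block)))
        x += width
    formats = {fmt for fmt, _ in blocks}
    # 1: heterogeneous hybrid spliced graph, 0: isomorphic spliced graph.
    info = {"hetero_flag": 1 if len(formats) >= 2 else 0, "regions": regions}
    return graph, info
```

  • Writing the flag once per spliced graph lets a decoder choose its splitting path without inspecting every block first.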
  • this application provides a decoding device, applied to a decoder, which includes:
  • a decoding unit configured to decode the code stream to obtain a splicing image and splicing image information, wherein the splicing image information includes a first syntax element, and the splicing image is determined to be a heterogeneous hybrid splicing image or a homogeneous splicing image according to the first syntax element;
  • a first splitting unit configured to split the spliced image according to the spliced image information of the spliced image to obtain at least two kinds of isomorphic blocks, wherein the at least two kinds of isomorphic blocks correspond to different visual media content expression formats;
  • a second splitting unit configured to, when it is determined according to the first syntax element that the spliced diagram is an isomorphic spliced diagram, split the spliced diagram according to the spliced diagram information of the spliced diagram to obtain one kind of isomorphic block;
  • a processing unit configured to decode and reconstruct the homogeneous blocks to obtain visual media content in at least one expression format.
  • this application provides an encoding device, applied to an encoder, which includes:
  • a processing unit configured to process visual media content in at least one expression format to obtain at least one isomorphic block, wherein different types of isomorphic blocks correspond to different visual media content expression formats;
  • a splicing unit configured to splice the at least one isomorphic block to obtain at least one splicing graph and splicing graph information, wherein the splicing graph information includes a first syntax element, and whether the splicing graph is a heterogeneous hybrid splicing graph or an isomorphic splicing graph is determined according to the first syntax element; the heterogeneous hybrid splicing graph includes at least two types of isomorphic blocks, and the isomorphic splicing graph includes one type of isomorphic block;
  • An encoding unit used to encode the at least one spliced image and the spliced image information to obtain a code stream.
  • a decoder including a first memory and a first processor; the first memory stores a computer program executable on the first processor to execute the method in the above first aspect or its respective implementations.
  • an encoder including a second memory and a second processor; the second memory stores a computer program that can be run on the second processor to execute the method in the above second aspect or its respective implementations.
  • the seventh aspect provides a coding and decoding system, including an encoder and a decoder.
  • the encoder is configured to perform the method in the above second aspect or its implementations, and the decoder is used to perform the method in the above first aspect or its implementations.
  • An eighth aspect provides a chip for implementing any one of the above-mentioned first to second aspects or the method in each implementation manner thereof.
  • the chip includes: a processor, configured to call and run a computer program from a memory, so that the device installed with the chip executes the method in any one of the above-mentioned first to second aspects or implementations thereof.
  • a ninth aspect provides a computer-readable storage medium for storing a computer program that causes a computer to execute any one of the above-mentioned first to second aspects or the method in each implementation thereof.
  • a computer program product including computer program instructions, which enable a computer to execute any one of the above-mentioned first to second aspects or the methods in each implementation thereof.
  • An eleventh aspect provides a computer program that, when run on a computer, causes the computer to execute any one of the above-mentioned first to second aspects or the method in each implementation thereof.
  • a twelfth aspect provides a code stream, which is generated based on the encoding method of the second aspect.
  • homogeneous blocks in different expression formats are spliced into a heterogeneous mixed splicing picture, and homogeneous blocks in the same expression format are spliced into a homogeneous splicing picture.
  • the splicing picture information includes the first syntax element used to indicate the type of the splicing picture, which improves the decoding efficiency of the splicing picture at the decoding end.
  • Figure 1 is a schematic block diagram of a video encoding and decoding system related to an embodiment of the present application
  • Figure 2A is a schematic block diagram of a video encoder involved in an embodiment of the present application.
  • Figure 2B is a schematic block diagram of a video decoder involved in an embodiment of the present application.
  • Figure 3A is a diagram of the organization and expression framework of multi-viewpoint video data
  • Figure 3B is a schematic diagram of splicing image generation of multi-viewpoint video data
  • Figure 3C is a diagram of the organization and expression framework of point cloud data
  • Figures 3D to 3F are schematic diagrams of different types of point cloud data
  • Figure 4 is a schematic diagram of multi-viewpoint video encoding
  • Figure 5 is a schematic diagram of decoding multi-viewpoint video
  • Figure 6 is a schematic flow chart of an encoding method provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of a heterogeneous hybrid splicing diagram provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of an isomorphic splicing diagram provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the V3C bitstream structure provided by the embodiment of the present application.
  • Figure 10 is a schematic flow chart of a decoding method provided by an embodiment of the present application.
  • Figure 11 is a schematic block diagram of an encoding device provided by an embodiment of the present application.
  • Figure 12 is a schematic block diagram of a decoding device provided by an embodiment of the present application.
  • Figure 13 is a schematic block diagram of an encoder provided by an embodiment of the present application.
  • Figure 14 is a schematic block diagram of a decoder provided by an embodiment of the present application.
  • Figure 15 is a schematic structural diagram of a coding and decoding system provided by an embodiment of the present application.
  • This application can be applied to the fields of image encoding and decoding, video encoding and decoding, hardware video encoding and decoding, dedicated circuit video encoding and decoding, real-time video encoding and decoding, etc.
  • the solution of this application can be combined with video coding standards such as the audio video coding standard (AVS), the H.264/advanced video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard and the H.266/versatile video coding (VVC) standard.
  • the solution of this application can also operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions.
  • the high-degree-of-freedom immersive coding system can be roughly divided into the following links according to the task line: data collection, data organization and expression, data encoding and compression, data decoding and reconstruction, data synthesis and rendering, and finally presenting the target data to the user.
  • the encoding involved in the embodiment of the present application is mainly video encoding and decoding. To facilitate understanding, the video encoding and decoding system involved in the embodiment of the present application is first introduced with reference to Figure 1 .
  • Figure 1 is a schematic block diagram of a video encoding and decoding system related to an embodiment of the present application. It should be noted that Figure 1 is only an example, and the video encoding and decoding system in the embodiment of the present application includes but is not limited to what is shown in Figure 1 .
  • the video encoding and decoding system 100 includes an encoding device 110 and a decoding device 120 .
  • the encoding device is used to encode the video data (which can be understood as compression) to generate a code stream, and transmit the code stream to the decoding device.
  • the decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
  • the encoding device 110 in the embodiment of the present application can be understood as a device with a video encoding function
  • the decoding device 120 can be understood as a device with a video decoding function. That is, the encoding device 110 and the decoding device 120 in the embodiment of the present application cover a wide range of devices, for example smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.
  • the encoding device 110 may transmit the encoded video data (e.g., a code stream) to the decoding device 120 via the channel 130 .
  • Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120 .
  • channel 130 includes one or more communication media that enables encoding device 110 to transmit encoded video data directly to decoding device 120 in real time.
  • encoding device 110 may modulate the encoded video data according to the communication standard and transmit the modulated video data to decoding device 120.
  • the communication media includes wireless communication media, such as radio frequency spectrum.
  • the communication media may also include wired communication media, such as one or more physical transmission lines.
  • channel 130 includes a storage medium that can store video data encoded by encoding device 110 .
  • Storage media include a variety of local access data storage media, such as optical disks, DVDs, flash memories, etc.
  • the decoding device 120 may obtain the encoded video data from the storage medium.
  • channel 130 may include a storage server that may store video data encoded by encoding device 110 .
  • the decoding device 120 may download the stored encoded video data from the storage server.
  • the storage server may store the encoded video data and may transmit the encoded video data to the decoding device 120; examples include a web server (e.g., for a website) and a File Transfer Protocol (FTP) server.
  • the encoding device 110 includes a video encoder 112 and an output interface 113.
  • the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
  • the encoding device 110 may include a video source 111 in addition to the video encoder 112 and the output interface 113 .
  • Video source 111 may include at least one of a video capture device (e.g., a video camera), a video archive, a video input interface for receiving video data from a video content provider, and a computer graphics system used to generate video data.
  • the video encoder 112 encodes the video data from the video source 111 to generate a code stream.
  • Video data may include one or more images (pictures) or sequence of pictures (sequence of pictures).
  • the code stream contains the encoding information of an image or image sequence in the form of a bit stream.
  • Encoded information may include encoded image data and associated data.
  • the associated data may include sequence parameter set (SPS), picture parameter set (PPS) and other syntax structures.
  • An SPS can contain parameters that apply to one or more sequences.
  • a PPS can contain parameters that apply to one or more images.
  • a syntax structure refers to a collection of zero or more syntax elements arranged in a specified order in a code stream.
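  • To make the notion of a syntax structure concrete, the sketch below reads syntax elements from a byte string in a fixed order. The structure `TOY_PARAMETER_SET`, its element names and their bit widths are invented for illustration and do not correspond to a real SPS or PPS; real standards also use variable-length codes, which this sketch omits.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# Hypothetical syntax structure: (element name, bit width) in code stream order.
TOY_PARAMETER_SET = [("set_id", 4), ("width_in_blocks", 6), ("height_in_blocks", 6)]


def parse_syntax_structure(reader: BitReader, layout):
    """Parse each syntax element in the order given by the layout."""
    return {name: reader.read_bits(width) for name, width in layout}
```

  • Because the order is fixed by the layout, encoder and decoder need no per-element markers to agree on which bits belong to which element.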
  • the video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113 .
  • the encoded video data can also be stored on a storage medium or storage server for subsequent reading by the decoding device 120 .
  • decoding device 120 includes input interface 121 and video decoder 122. In some embodiments, in addition to the input interface 121 and the video decoder 122, the decoding device 120 may also include a display device 123.
  • the input interface 121 includes a receiver and/or a modem. Input interface 121 may receive encoded video data over channel 130.
  • the video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123.
  • the display device 123 displays the decoded video data.
  • Display device 123 may be integrated with decoding device 120 or external to decoding device 120 .
  • Display device 123 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
  • Figure 1 is only an example, and the technical solution of the embodiment of the present application is not limited to Figure 1.
  • the technology of the present application can also be applied to unilateral video encoding or unilateral video decoding.
  • FIG. 2A is a schematic block diagram of a video encoder related to an embodiment of the present application. It should be understood that the video encoder 200 can be used to perform lossy compression of images, or can also be used to perform lossless compression of images.
  • the lossless compression can be visually lossless compression or mathematically lossless compression.
  • the video encoder 200 can be applied to image data in a luminance-chrominance (YCbCr, YUV) format.
  • for example, the YUV sampling ratio can be 4:2:0, 4:2:2 or 4:4:4, where Y represents luma, Cb (U) represents blue chroma and Cr (V) represents red chroma; U and V together represent chroma, which describes color and saturation.
  • 4:2:0 means that every 4 pixels have 4 luma components and 2 chroma components (YYYYCbCr);
  • 4:2:2 means that every 4 pixels have 4 luma components and 4 chroma components (YYYYCbCrCbCr);
  • 4:4:4 means full pixel display, in which every pixel has its own chroma components (YYYYCbCrCbCrCbCrCbCr).
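  • The sample counts for the three formats can be checked with a short calculation. A minimal sketch: the function name `sample_counts` and the table `SUBSAMPLING` are illustrative, and odd picture dimensions are rounded up as an assumption.

```python
# (horizontal, vertical) chroma subsampling factors for each format.
SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def sample_counts(width: int, height: int, fmt: str):
    """Return (luma samples, chroma samples) for a width x height picture."""
    sx, sy = SUBSAMPLING[fmt]
    luma = width * height
    chroma_w = (width + sx - 1) // sx   # round up for odd widths (assumption)
    chroma_h = (height + sy - 1) // sy  # round up for odd heights (assumption)
    chroma = 2 * chroma_w * chroma_h    # one Cb plane plus one Cr plane
    return luma, chroma
```

  • For a 2×2 group of pixels this gives 4 luma samples in every format and 2, 4 and 8 chroma samples for 4:2:0, 4:2:2 and 4:4:4 respectively, matching the patterns listed above.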
  • the video encoder 200 reads video data, and for each frame of image in the video data, divides the frame into several coding tree units (CTUs).
  • A CTU may also be called a "tree block", a "largest coding unit" (LCU for short) or a "coding tree block" (CTB for short).
  • Each CTU can be associated with an equal-sized block of pixels within the image.
  • Each pixel can correspond to one luminance (luminance or luma) sample and two chrominance (chrominance or chroma) samples. Therefore, each CTU can be associated with one block of luma samples and two blocks of chroma samples.
  • a CTU size is, for example, 128×128, 64×64, 32×32, etc.
  • a CTU can be further divided into several coding units (Coding Units, CUs) for encoding.
  • CUs can be rectangular blocks or square blocks.
  • A CU can be further divided into prediction units (PUs) and transform units (TUs), so that coding, prediction and transformation are separated and processing becomes more flexible.
  • the CTU is divided into CUs in a quad-tree manner, and the CU is divided into TUs and PUs in a quad-tree manner.
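  • The quad-tree division of a CTU into CUs can be sketched recursively. The callback `should_split` is a hypothetical stand-in for the encoder's split decision, which is not specified here, and `min_size` is an assumed lower bound on the CU size.

```python
def quadtree_split(x, y, size, min_size, should_split):
    """Return the leaf CUs of a CTU as (x, y, size) tuples.

    A block is divided into four equal quadrants whenever should_split says
    so and the block is still larger than min_size.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_split(x + dx, y + dy, half, min_size, should_split)
        return leaves
    return [(x, y, size)]
```

  • For example, splitting a 64×64 CTU once and then splitting only its top-left 32×32 quadrant yields four 16×16 CUs and three 32×32 CUs, which together still cover the whole CTU.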
  • Video encoders and video decoders can support various PU sizes. Assuming that the size of a specific CU is 2N×2N, the video encoder and video decoder can support a PU size of 2N×2N or N×N for intra prediction, and support 2N×2N, 2N×N, N×2N, N×N or similar sized symmetric PUs for inter prediction. The video encoder and video decoder can also support 2N×nU, 2N×nD, nL×2N and nR×2N asymmetric PUs for inter prediction.
  • the video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filtering unit 260, a decoded image cache 270 and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer, or different functional components.
  • the current block may be called the current coding unit (CU) or the current prediction unit (PU), etc.
  • the prediction block may also be called a predicted image block or an image prediction block
  • the reconstructed image block may also be called a reconstruction block or an image reconstructed image block.
  • prediction unit 210 includes inter prediction unit 211 and intra estimation unit 212. Since there is a strong correlation between adjacent pixels in a video frame, the intra-frame prediction method is used in video encoding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Since there is a strong similarity between adjacent frames in the video, the interframe prediction method is used in video coding and decoding technology to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
  • the inter-frame prediction unit 211 can be used for inter-frame prediction.
  • Inter-frame prediction can include motion estimation and motion compensation, and can refer to image information of different frames. Inter-frame prediction uses motion information to find a reference block in a reference frame, and a prediction block is generated based on the reference block to eliminate temporal redundancy. The frames used in inter-frame prediction can be P frames and/or B frames, where P frames are forward-predicted frames and B frames are bidirectionally predicted frames.
  • the motion information includes the reference frame list where the reference frame is located, the reference frame index, and the motion vector.
  • the motion vector can have whole-pixel or sub-pixel precision.
  • the whole-pixel or sub-pixel block found in the reference frame according to the motion vector is called a reference block.
  • Some technologies will directly use the reference block as a prediction block, and some technologies will process the reference block to generate a prediction block. Reprocessing to generate a prediction block based on a reference block can also be understood as using the reference block as a prediction block and then processing to generate a new prediction block based on the prediction block.
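  • Fetching a reference block with a whole-pixel motion vector can be sketched as follows. This covers only the simplest case, in which the reference block is used directly as the prediction block; clamping coordinates at the frame border is an assumption of this sketch, and sub-pixel motion vectors, which would need interpolation, are not handled.

```python
def fetch_reference_block(ref_frame, x, y, width, height, mv):
    """Copy the block that the motion vector points to in the reference frame.

    (x, y) is the top-left corner of the current block and mv = (mvx, mvy)
    is a whole-pixel displacement. Coordinates outside the frame are clamped
    to the nearest border sample.
    """
    mvx, mvy = mv
    rows, cols = len(ref_frame), len(ref_frame[0])
    block = []
    for j in range(height):
        ry = min(max(y + mvy + j, 0), rows - 1)
        row = []
        for i in range(width):
            rx = min(max(x + mvx + i, 0), cols - 1)
            row.append(ref_frame[ry][rx])
        block.append(row)
    return block
```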
  • the intra-frame estimation unit 212 only refers to the information of the same frame image and predicts the pixel information in the current coded image block to eliminate spatial redundancy.
  • the frames used in intra prediction may be I frames.
  • Intra-frame prediction has multiple prediction modes. Taking the international digital video coding standard H series as an example, the H.264/AVC standard has 8 angle prediction modes and 1 non-angle prediction mode, and H.265/HEVC extends this to 33 angle prediction modes and 2 non-angle prediction modes.
  • the intra-frame prediction modes used by HEVC include planar mode (Planar), DC and 33 angle modes, for a total of 35 prediction modes.
  • the intra-frame modes used by VVC include Planar, DC and 65 angle modes, for a total of 67 prediction modes.
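  • Of the modes listed above, DC is the simplest to illustrate: the block is filled with the average of its neighbouring reconstructed samples. The sketch below is a simplified version that assumes the top and left neighbours are available and ignores the standard-specific rounding and edge filtering of real codecs.

```python
def dc_predict(top, left, size):
    """Fill a size x size prediction block with the rounded mean of the neighbours.

    top and left hold the reconstructed samples directly above and to the
    left of the current block.
    """
    neighbours = list(top) + list(left)
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * size for _ in range(size)]
```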
  • Residual unit 220 may generate a residual block of the CU based on the pixel block of the CU and the prediction block of the PU of the CU. For example, residual unit 220 may generate a residual block of a CU such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in the prediction block of the PU of the CU.
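  • The sample-wise difference can be written in a few lines. The function name `residual_block` and the nested-list block representation are illustrative choices, not part of the application.

```python
def residual_block(original, prediction):
    """Return original minus prediction, sample by sample."""
    return [[o - p for o, p in zip(orig_row, pred_row)]
            for orig_row, pred_row in zip(original, prediction)]
```

  • With an original block [[10, 12], [14, 16]] and a flat prediction of 13, the residual is [[-3, -1], [1, 3]], which has much smaller magnitudes than the original samples.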
  • Transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with the TU of the CU based on quantization parameter (QP) values associated with the CU. Video encoder 200 may adjust the degree of quantization applied to transform coefficients associated with the CU by adjusting the QP value associated with the CU.
  • Inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct the residual block from the quantized transform coefficients.
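  • The link between the QP value and the degree of quantization can be illustrated with a floating-point approximation. The sketch assumes the step-size relation commonly associated with H.264/HEVC, in which the step doubles every 6 QP values; real codecs use integer scaling tables, and the function names here are illustrative.

```python
def q_step(qp: int) -> float:
    """Approximate quantization step size: doubles for every increase of 6 in QP."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Map transform coefficients to integer levels (this is the lossy step)."""
    step = q_step(qp)
    return [int(round(c / step)) for c in coeffs]


def dequantize(levels, qp):
    """Scale integer levels back to approximate coefficient values."""
    step = q_step(qp)
    return [level * step for level in levels]
```

  • A larger QP gives a larger step, so more coefficients collapse to the same level and the reconstruction after `dequantize` departs further from the input.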
  • Reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by prediction unit 210 to produce a reconstructed image block associated with the TU. By reconstructing blocks of samples for each TU of a CU in this manner, video encoder 200 can reconstruct blocks of pixels of the CU.
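  • The addition performed by the reconstruction unit can be sketched as follows. Clipping the result to the valid sample range for a given bit depth is an assumption added here for illustration; the function name and the `bit_depth` parameter are hypothetical.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add the reconstructed residual to the prediction, clipped to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]
```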
  • the loop filtering unit 260 is used to process the inversely transformed and inversely quantized pixels to compensate for distortion and provide a better reference for subsequently encoded pixels. For example, a deblocking filtering operation can be performed to reduce the blocking effect of the pixel blocks associated with the CU.
  • the loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive offset/adaptive loop filtering (SAO/ALF) unit, where the deblocking filtering unit is used to remove blocking effects, and the SAO/ALF unit is used to remove ringing effects.
  • Decoded image cache 270 may store reconstructed pixel blocks.
  • Inter prediction unit 211 may perform inter prediction on PUs of other images using reference images containing reconstructed pixel blocks.
  • intra estimation unit 212 may use the reconstructed pixel blocks in decoded image cache 270 to perform intra prediction on other PUs in the same image as the CU.
  • Entropy encoding unit 280 may receive the quantized transform coefficients from transform/quantization unit 230 . Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.
  • FIG. 2B is a schematic block diagram of a video decoder related to an embodiment of the present application.
  • the video decoder 300 includes an entropy decoding unit 310 , a prediction unit 320 , an inverse quantization/transformation unit 330 , a reconstruction unit 340 , a loop filtering unit 350 and a decoded image cache 360 . It should be noted that the video decoder 300 may include more, less, or different functional components.
  • Video decoder 300 can receive the code stream.
  • Entropy decoding unit 310 may parse the codestream to extract syntax elements from the codestream. As part of parsing the code stream, the entropy decoding unit 310 may parse entropy-encoded syntax elements in the code stream.
  • the prediction unit 320, the inverse quantization/transformation unit 330, the reconstruction unit 340 and the loop filtering unit 350 may decode the video data according to the syntax elements extracted from the code stream, that is, generate decoded video data.
  • prediction unit 320 includes inter prediction unit 321 and intra estimation unit 322.
  • Intra estimation unit 322 may perform intra prediction to generate predicted blocks for the PU. Intra estimation unit 322 may use an intra prediction mode to generate predicted blocks for a PU based on pixel blocks of spatially neighboring PUs. Intra estimation unit 322 may also determine the intra prediction mode of the PU based on one or more syntax elements parsed from the codestream.
  • the inter prediction unit 321 may construct a first reference image list (List 0) and a second reference image list (List 1) according to syntax elements parsed from the code stream. Additionally, if the PU uses inter-prediction encoding, entropy decoding unit 310 may parse the motion information of the PU. Inter prediction unit 321 may determine one or more reference blocks for the PU based on the motion information of the PU. Inter prediction unit 321 may generate a predictive block for the PU based on one or more reference blocks of the PU.
  • Inverse quantization/transform unit 330 may inversely quantize (ie, dequantize) transform coefficients associated with a TU. Inverse quantization/transform unit 330 may use the QP value associated with the CU of the TU to determine the degree of quantization.
  • inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients to produce a residual block associated with the TU.
  • Reconstruction unit 340 uses the residual blocks associated with the TU of the CU and the prediction blocks of the PU of the CU to reconstruct the pixel blocks of the CU. For example, reconstruction unit 340 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the pixel block of the CU to obtain a reconstructed image block.
  • Loop filtering unit 350 may perform deblocking filtering operations to reduce blocking artifacts for blocks of pixels associated with the CU.
  • Video decoder 300 may store the reconstructed image of the CU in decoded image cache 360 .
  • the video decoder 300 may use the reconstructed image in the decoded image cache 360 as a reference image for subsequent prediction, or transmit the reconstructed image to a display device for presentation.
  • the basic process of video encoding and decoding is as follows: at the encoding end, an image frame is divided into blocks.
  • the prediction unit 210 uses intra prediction or inter prediction to generate a prediction block of the current block.
  • the residual unit 220 may calculate a residual block based on the prediction block and the original block of the current block, that is, the difference between the prediction block and the original block of the current block.
  • the residual block may also be called residual information.
  • the residual block is transformed and quantized by the transformation/quantization unit 230 to remove information to which the human eye is insensitive, thereby eliminating visual redundancy.
  • the residual block before transformation and quantization by the transformation/quantization unit 230 may be called a time domain residual block, and the time domain residual block after transformation and quantization by the transformation/quantization unit 230 may be called a frequency residual block or frequency domain residual block.
  • the entropy encoding unit 280 receives the quantized transform coefficients output from the transformation/quantization unit 230, and may perform entropy encoding on the quantized transform coefficients to output a code stream. For example, the entropy encoding unit 280 may eliminate character redundancy according to the target context model and probability information of the binary code stream.
  • the entropy decoding unit 310 can parse the code stream to obtain the prediction information, quantization coefficient matrix, etc. of the current block.
  • the prediction unit 320 uses intra prediction or inter prediction for the current block based on the prediction information to generate a prediction block of the current block.
  • the inverse quantization/transform unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block.
  • the reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstruction block.
  • the reconstructed blocks constitute a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or based on the blocks to obtain a decoded image.
  • the encoding end also needs similar operations as the decoding end to obtain the decoded image.
  • the decoded image may also be called a reconstructed image, and the reconstructed image may be used as a reference frame for subsequent inter-frame prediction.
  • the block division information determined by the encoding end as well as mode information or parameter information such as prediction, transformation, quantization, entropy coding, loop filtering, etc., are carried in the code stream when necessary.
  • the decoding end determines the same block division information and the same prediction, transformation, quantization, entropy coding, loop filtering and other mode information or parameter information as the encoding end by parsing the code stream and analyzing the existing information, thereby ensuring that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.
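  • The block-level loop described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the codec's actual processing: the transform step is omitted, quantization is a plain uniform scalar quantizer, the residual is taken as the original minus the prediction, and the names `encode_block`, `decode_block` and `qstep` are hypothetical.

```python
def encode_block(original, prediction, qstep):
    """Return quantized residual levels for one block (transform omitted)."""
    residual = [o - p for o, p in zip(original, prediction)]
    # Uniform scalar quantization: this is the lossy step.
    return [round(r / qstep) for r in residual]


def decode_block(levels, prediction, qstep, bit_depth=8):
    """Dequantize the levels and add them to the prediction."""
    max_val = (1 << bit_depth) - 1
    residual = [lvl * qstep for lvl in levels]
    # Reconstruction = prediction + residual, clipped to the sample range.
    return [min(max(p + r, 0), max_val) for p, r in zip(prediction, residual)]


# Hypothetical sample values for a four-sample block.
original = [100, 104, 98, 250]
prediction = [101, 100, 100, 240]
levels = encode_block(original, prediction, qstep=4)
reconstructed = decode_block(levels, prediction, qstep=4)
```

  • Because `decode_block` depends only on the levels and the prediction, an encoder that runs the same function obtains the same reconstructed block as the decoder, which is why both ends can use it as a reference for later prediction.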
  • the current encoding and decoding methods include at least the following two:
  • Method 1 For multi-viewpoint videos, MPEG (Moving Picture Experts Group) immersive video (MIV) technology is used for encoding and decoding, and for point clouds, video-based point cloud compression (Video-based Point Cloud Compression, VPCC for short) technology is used for encoding and decoding.
  • In order to reduce the transmission pixel rate while retaining scene information as much as possible to ensure that there is enough information for rendering the target view, the scheme adopted by MPEG-I is shown in Figure 3A.
  • a limited number of viewpoints are selected as base viewpoints that express the visible range of the scene as much as possible.
  • the base viewpoints are transmitted as complete images, and the redundant pixels between the remaining non-base viewpoints and the base viewpoints are removed, that is, only the effective information of non-repeated expressions is retained.
  • the effective information is then extracted into sub-block images, which are reorganized together with the base viewpoint images to form a larger rectangular image, which is called a spliced image.
  • Figure 3A and Figure 3B give a schematic process of generating a spliced image.
  • the spliced image is sent to the codec for compression and reconstruction, and the auxiliary data related to the sub-block image splicing information is also sent to the encoder to form a code stream.
  • the encoding method of VPCC is to project point clouds into two-dimensional images or videos, and convert three-dimensional information into two-dimensional information encoding.
  • Figure 3C is the coding block diagram of VPCC.
  • the code stream is roughly divided into four parts.
  • the geometric code stream is the code stream generated by geometric depth map encoding, which is used to represent the geometric information of the point cloud;
  • the attribute code stream is the code stream generated by texture map encoding, which is used to represent the attribute information of the point cloud;
  • the occupancy code stream is the code stream generated by occupancy map encoding, which is used to indicate the effective area in the depth map and texture map;
  • these three types of video are all encoded and decoded with a video codec.
  • the auxiliary information code stream is the code stream generated by encoding the auxiliary information of the sub-block image, which is the part related to the patch data unit in the V3C standard, indicating the position and size of each sub-block image.
  • Method 2 Multi-viewpoint videos and point clouds are encoded and decoded using the frame packing technology in Visual Volumetric Video-based Coding (V3C).
  • the encoding end includes the following steps:
  • Step 1 When encoding the acquired multi-view video, perform some pre-processing to generate multi-view video sub-blocks (patch). Then, organize the multi-view video sub-blocks to generate a multi-view video splicing image.
  • multi-viewpoint videos are input into TMIV for packaging, and a multi-viewpoint video splicing image is output.
  • TMIV is a reference software for MIV.
  • Packaging in the embodiment of this application can be understood as splicing.
  • the multi-viewpoint video mosaic includes a multi-view video texture mosaic and a multi-view video geometry mosaic, that is, it only contains multi-view video sub-blocks.
  • Step 2 Input the multi-viewpoint video splicing image into the frame packer and output the multi-viewpoint video mixed splicing image.
  • the multi-viewpoint video hybrid splicing image includes a multi-viewpoint video texture blending splicing image, a multi-viewpoint video geometry blending splicing image, and a multi-viewpoint video texture and geometry blending splicing image.
  • the multi-viewpoint video splicing image is frame packed to generate a multi-viewpoint video hybrid splicing image.
  • Each multi-viewpoint video splicing image occupies a region of the multi-viewpoint video hybrid splicing image.
  • a flag pin_region_type_id_minus2 must be transmitted in the code stream for each region. This flag indicates whether the current region belongs to a multi-viewpoint video texture splicing map or a multi-viewpoint video geometric splicing map, and this information is needed at the decoding end.
  • Step 3 Use a video encoder to encode the multi-viewpoint video mixed splicing image to obtain a code stream.
  • the decoding end includes the following steps:
  • Step 1 During multi-viewpoint video decoding, input the obtained code stream into the video decoder for decoding to obtain a reconstructed multi-viewpoint video mixed splicing image.
  • Step 2 Input the reconstructed multi-viewpoint video mixed splicing image into the frame depacker and output the reconstructed multi-viewpoint video splicing image.
  • the flag pin_region_type_id_minus2 is obtained from the code stream. If it is determined that the pin_region_type_id_minus2 is V3C_AVD, it means that the current region is a multi-viewpoint video texture mosaic, and then the current region is split and output as a reconstructed multi-viewpoint video texture mosaic.
  • If it is determined that pin_region_type_id_minus2 is V3C_GVD, it means that the current region is a multi-viewpoint video geometric mosaic, and the current region is split and output as a reconstructed multi-viewpoint video geometric mosaic.
  • Step 3 Decode the reconstructed multi-viewpoint video splicing image to obtain the reconstructed multi-viewpoint video.
  • the multi-viewpoint video texture splicing image and the multi-viewpoint video geometric splicing image are decoded to obtain the reconstructed multi-viewpoint video.
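  • The frame packing and depacking steps above can be sketched as follows. This is a simplified illustration, assuming the two splicing images have equal width and are simply stacked vertically; the `Region` class and the functions `frame_pack` and `frame_unpack` are hypothetical, and the string labels stand in for the signaled pin_region_type_id_minus2 values rather than reproducing the numeric codes of the V3C specification.

```python
from dataclasses import dataclass

# Region type labels; the string values are illustrative placeholders, not
# the numeric codes defined by the V3C specification.
V3C_AVD = "V3C_AVD"  # texture (attribute) splicing image
V3C_GVD = "V3C_GVD"  # geometry splicing image


@dataclass
class Region:
    region_type: str  # stands in for the signaled pin_region_type_id_minus2
    x: int
    y: int
    width: int
    height: int


def frame_pack(texture, geometry):
    """Stack a texture image above a geometry image in one packed frame."""
    width = len(texture[0])
    assert len(geometry[0]) == width, "sketch assumes equal image widths"
    packed = [row[:] for row in texture] + [row[:] for row in geometry]
    regions = [
        Region(V3C_AVD, 0, 0, width, len(texture)),
        Region(V3C_GVD, 0, len(texture), width, len(geometry)),
    ]
    return packed, regions


def frame_unpack(packed, regions):
    """Split the packed frame back into images keyed by region type."""
    images = {}
    for r in regions:
        rows = packed[r.y:r.y + r.height]
        images[r.region_type] = [row[r.x:r.x + r.width] for row in rows]
    return images


# Tiny hypothetical sample images (lists of pixel rows).
texture = [[1, 2], [3, 4]]
geometry = [[9, 8]]
packed, regions = frame_pack(texture, geometry)
images = frame_unpack(packed, regions)
```

  • The depacker needs nothing beyond the per-region type and rectangle to recover each splicing image, which is why the region type must travel in the code stream.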
  • the above uses multi-viewpoint video as an example to analyze and introduce frame packing technology.
  • the frame packing encoding and decoding method for point clouds is basically the same as that for multi-viewpoint video described above, which can be used as a reference.
  • for example, TMC (a VPCC reference software) is used to package the point cloud to obtain a point cloud splicing image.
  • the point cloud splicing image is input into the frame packer for frame packing to obtain a point cloud hybrid splicing image.
  • the point cloud hybrid splicing image is encoded to obtain a point cloud code stream. Details are not repeated here.
  • V3C unit header syntax is shown in Table 1:
  • V3C unit header semantics are shown in Table 2:
  • visual media content with multiple different expression formats is currently encoded and decoded separately.
  • for example, the current packaging technology compresses the point cloud to form a point cloud compressed code stream (i.e. a V3C code stream), compresses the multi-viewpoint video information to obtain a multi-viewpoint video compressed code stream (i.e. another V3C code stream), and then the system layer multiplexes the compressed code streams to obtain a fused three-dimensional scene multiplexed code stream.
  • the point cloud compression code stream and the multi-viewpoint video compression code stream are decoded separately. It can be seen from this that when encoding and decoding visual media content in multiple different expression formats, the existing technology uses many codecs and the encoding and decoding cost is high.
  • the embodiments of the present application splice homogeneous blocks with different expression formats into a heterogeneous hybrid splicing image, and splice homogeneous blocks with the same expression format into a homogeneous splicing image.
  • the resulting heterogeneous hybrid splicing images and/or homogeneous splicing images are encoded and written into the code stream.
  • homogeneous splicing images (such as at least one of multi-viewpoint splicing images, point cloud splicing images and grid splicing images) and heterogeneous hybrid splicing images can coexist in the code stream, which expands the application scenarios of the encoding and decoding methods.
  • the splicing picture information includes a first syntax element used to indicate the type of the splicing picture, which can improve the decoding efficiency of the splicing picture at the decoding end.
  • Figure 6 is a schematic flow chart of the encoding method provided by the embodiment of the present application. As shown in Figure 6, the encoding method includes:
  • Step 601 Process the visual media content of at least one expression format to obtain at least one isomorphic block, where different types of isomorphic blocks correspond to different visual media content expression formats;
  • visual media objects with different expression formats may appear in the same scene.
  • for example, in the same three-dimensional scene, the scene background and some characters and objects are expressed in video, and another part of the characters are expressed as a three-dimensional point cloud or a three-dimensional grid.
  • the visual media content includes visual media content in at least one expression format such as multi-view video, point cloud, and grid.
  • in the embodiments of this application, single-viewpoint video is treated as a form of multi-viewpoint video, that is, the multi-viewpoint video may include multiple viewpoint videos and/or a single-viewpoint video.
  • one isomorphic block corresponds to one expression format.
  • the expression format corresponding to at least one isomorphic block includes at least one of the following: multi-view video, point cloud, and grid.
  • At least two isomorphic blocks correspond to at least two different expression formats.
  • the at least two isomorphic blocks in the embodiment of the present application include isomorphic blocks of at least two different expression formats such as multi-view video, point cloud, grid, etc.
  • each isomorphic block may include at least one block with the same expression format.
  • a homogeneous block in point cloud format includes one or more point cloud blocks, a homogeneous block in multi-viewpoint video format includes one or more multi-viewpoint video blocks, and a homogeneous block in grid format includes one or more grid blocks.
  • step 601 may be: processing visual media content in an expression format to obtain a homogeneous block.
  • step 601 may include: processing visual media content in at least two expression formats to obtain at least two isomorphic blocks, where different visual media content corresponds to different expression formats.
  • the visual media content in the first expression format is processed to obtain isomorphic blocks in the first expression format, and the visual media content in the second expression format is processed to obtain isomorphic blocks in the second expression format.
  • the first expression format is one of multi-view video, point cloud, and grid; the second expression format is one of multi-view video, point cloud, and grid; and the first expression format and the second expression format are different expression formats.
  • the above-mentioned visual media content includes visual media content in at least one expression format such as multi-viewpoint video, point cloud, grid, etc.
  • when visual media content of one expression format is included, the visual media content is processed to obtain isomorphic blocks of one expression format.
  • when visual media content of multiple expression formats is included, the visual media content is processed to obtain isomorphic blocks of multiple expression formats.
  • blocks can also be called tiles or strips, that is, point cloud blocks can also be called point cloud strips, multi-viewpoint video blocks can also be called multi-viewpoint video strips, and grid blocks can also be called grid strips.
  • the block may be a mosaic of a specific shape, for example, a mosaic of a rectangular area with a specific length and/or height.
  • at least one sub-tile can be spliced in an orderly manner, such as from large to small according to the area of the sub-tiles, or from large to small according to the length and/or height of the sub-tiles, to obtain the block corresponding to the visual media content.
  • a tile can be mapped exactly to an atlas tile.
  • each sub-tile in a block may have a patch ID (patchID) to distinguish different sub-tiles in the same block.
  • the same block may include sub-patch 1 (patch1), sub-patch 2 (patch2), and sub-patch 3 (patch3).
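  • The ordered splicing of sub-tiles into a block can be sketched as follows. This is only one possible arrangement, assuming a simple row-by-row ("shelf") placement inside a block of fixed width with sub-tiles sorted by area from large to small; the function `pack_patches`, its parameters, and the rule that patch IDs follow the placement order are hypothetical and are not taken from the reference software.

```python
def pack_patches(sizes, block_width):
    """Place (width, height) sub-tiles into rows, largest area first.

    Returns a list of dicts holding a patch_id and the top-left position.
    """
    order = sorted(range(len(sizes)),
                   key=lambda i: sizes[i][0] * sizes[i][1], reverse=True)
    placed = []
    x = y = row_height = 0
    for patch_id, i in enumerate(order, start=1):
        w, h = sizes[i]
        if w > block_width:
            raise ValueError("sub-tile wider than the block")
        if x + w > block_width:  # start a new row below the current one
            x, y, row_height = 0, y + row_height, 0
        placed.append({"patch_id": patch_id, "x": x, "y": y, "w": w, "h": h})
        x += w
        row_height = max(row_height, h)
    return placed


# Three hypothetical sub-tiles packed into a block that is 6 samples wide.
layout = pack_patches([(2, 2), (4, 3), (3, 1)], block_width=6)
```

  • Sorting by area first tends to keep rows of similar height together, so less of the block is left empty than with an arbitrary order.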
  • each sub-tile in an isomorphic block has the expression format corresponding to that isomorphic block; for example, every sub-tile is a multi-view video sub-tile, or every sub-tile is a point cloud sub-tile.
  • homogeneous tiles may have tile identifiers (tileIDs) to distinguish different tiles of the same expression format.
  • the point cloud block may include point cloud block 1 or point cloud block 2.
  • for example, the multiple visual media contents include a point cloud and a multi-viewpoint video.
  • the point cloud is processed to obtain point cloud block 1, which includes point cloud sub-blocks 1 to 3; the multi-viewpoint video is processed to obtain a multi-viewpoint video block, which includes multi-viewpoint video sub-blocks 1 to 4.
  • when visual media content of one expression format needs to be processed, a homogeneous block of that expression format is obtained.
  • when at least two visual media contents need to be processed, isomorphic blocks of at least two expression formats are obtained.
  • embodiments of the present application process the at least two visual media contents, such as packaging (also called splicing) processing, to obtain blocks corresponding to each visual media content in the at least two visual media contents.
  • the block can be obtained by splicing sub-tiles (patches) corresponding to at least two visual media contents. It should be noted that the embodiment of the present application processes at least two visual media contents separately, and the method of obtaining blocks is not limited.
  • for example, the visual media content includes visual media content in two expression formats: multi-view video and point cloud.
  • processing the visual media content in at least one expression format to obtain at least one isomorphic block includes: after projecting the acquired multi-viewpoint video and removing redundancy, connecting non-repeated pixels into video sub-blocks, and splicing the video sub-blocks into multi-viewpoint video blocks; and performing parallel projection on the acquired point cloud, forming the connected points in the projection plane into point cloud sub-blocks, and splicing the point cloud sub-blocks into point cloud blocks.
  • for the multi-viewpoint video, a limited number of viewpoints are selected as base viewpoints that express the visible range of the scene as much as possible.
  • the base viewpoints are transmitted as complete images, and the redundant pixels between the remaining non-base viewpoints and the base viewpoints are removed, that is, only the effective information of non-repeated expressions is retained; the effective information is then extracted into sub-block images, which are reorganized together with the base viewpoint images to form a larger strip-shaped image.
  • this strip-shaped image is called a multi-viewpoint video block.
  • the above-mentioned visual media content is media content presented simultaneously in the same three-dimensional space. In some embodiments, the visual media content is media content presented at different times in the same three-dimensional space. In some embodiments, the above-mentioned visual media content may also be media content in different three-dimensional spaces. That is to say, in the embodiments of this application, there are no specific restrictions on the at least two visual media contents mentioned above.
  • Step 602 Splice the at least one isomorphic block to obtain at least one splicing diagram and splicing diagram information, wherein the splicing diagram information includes a first syntax element, and whether the splicing diagram is a heterogeneous hybrid splicing diagram or a homogeneous splicing diagram is determined according to the first syntax element.
  • the heterogeneous hybrid splicing diagram includes at least two types of isomorphic blocks, and the homogeneous splicing diagram includes one type of isomorphic block;
  • the splicing of the at least one isomorphic block to obtain at least one splicing diagram and splicing diagram information includes: heterogeneously splicing isomorphic blocks of at least two expression formats to generate a heterogeneous hybrid splicing diagram and splicing diagram information; and isomorphically splicing isomorphic blocks with the same expression format to generate a homogeneous splicing diagram and splicing diagram information.
  • the at least one isomorphic block includes an isomorphic block in a first expression format and an isomorphic block in a second expression format.
  • the method specifically includes: isomorphically splicing the isomorphic blocks of the first expression format to obtain a first isomorphic splicing diagram and splicing diagram information, and isomorphically splicing the isomorphic blocks of the second expression format to obtain a second isomorphic splicing diagram and splicing diagram information;
  • or, heterogeneously splicing the isomorphic blocks of the first expression format and the isomorphic blocks of the second expression format to obtain a heterogeneous mixed splicing diagram and splicing diagram information;
  • or, isomorphically splicing isomorphic blocks of the first expression format to obtain a first isomorphic splicing diagram and splicing diagram information, and heterogeneously splicing isomorphic blocks of the first expression format with isomorphic blocks of the second expression format to obtain a heterogeneous mixed splicing diagram and splicing diagram information.
  • the homogeneous splicing diagram may include one isomorphic block or multiple isomorphic blocks of the same expression format, and the heterogeneous mixed splicing diagram includes at least two isomorphic blocks of at least two expression formats.
  • the first expression format is one of multi-view video, point cloud, and grid; the second expression format is one of multi-view video, point cloud, and grid; and the first expression format and the second expression format are different expression formats.
  • as shown in Figure 7, multi-viewpoint video block 1, multi-viewpoint video block 2 and point cloud block 1 are spliced to obtain a heterogeneous hybrid splicing image.
  • the first expression format is multi-viewpoint video
  • the second expression format is point cloud.
  • the splicing of the at least one homogeneous block to obtain at least one spliced image and spliced image information includes: splicing a part of the multi-viewpoint video blocks and a part of the point cloud blocks into a heterogeneous hybrid spliced image; splicing another part of the multi-viewpoint video blocks into a multi-viewpoint spliced image; and splicing another part of the point cloud blocks into a point cloud spliced image.
  • the mosaic image information includes a first syntax element, and the first syntax element is used to indicate that the mosaic image is a heterogeneous hybrid mosaic image or a homogeneous mosaic image.
  • determining according to the first syntax element whether the splicing diagram is a heterogeneous hybrid splicing diagram or a homogeneous splicing diagram includes: if the first syntax element is a first preset value, determining that the splicing diagram is a heterogeneous hybrid splicing diagram including isomorphic blocks of a first expression format and a second expression format, wherein the first expression format and the second expression format are different expression formats;
  • if the first syntax element is a second preset value, determining that the splicing diagram is a homogeneous splicing diagram including the isomorphic blocks of the first expression format; if the first syntax element is a third preset value, determining that the splicing diagram is a homogeneous splicing diagram including the isomorphic blocks of the second expression format.
  • that is to say, different values of the first syntax element are used to indicate different splicing diagram types.
  • the first syntax element can also be set to other values to indicate that the splicing diagram is a homogeneous splicing diagram that includes isomorphic blocks of other expression formats, or to indicate that the splicing diagram is a heterogeneous hybrid splicing diagram that includes isomorphic blocks of at least two other expression formats.
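  • The single-element signaling described above can be sketched as a lookup from the value of the first syntax element to a splicing diagram type. The text does not fix the preset values, so the numbers 0, 1 and 2 below are assumptions chosen only for illustration, and the names `MosaicType` and `mosaic_type_from_first_syntax_element` are hypothetical.

```python
from enum import Enum


class MosaicType(Enum):
    HETEROGENEOUS_MIXED = "heterogeneous hybrid (first and second format)"
    HOMOGENEOUS_FIRST = "homogeneous (first expression format)"
    HOMOGENEOUS_SECOND = "homogeneous (second expression format)"


# Assumed preset values; the source text leaves the actual values open.
FIRST_PRESET, SECOND_PRESET, THIRD_PRESET = 0, 1, 2

_PRESET_TO_TYPE = {
    FIRST_PRESET: MosaicType.HETEROGENEOUS_MIXED,
    SECOND_PRESET: MosaicType.HOMOGENEOUS_FIRST,
    THIRD_PRESET: MosaicType.HOMOGENEOUS_SECOND,
}


def mosaic_type_from_first_syntax_element(value):
    """Map a parsed first-syntax-element value to a splicing diagram type."""
    try:
        return _PRESET_TO_TYPE[value]
    except KeyError:
        # Other values could signal further formats; they are not modeled here.
        raise ValueError(f"unsupported first syntax element value: {value}") from None


parsed = mosaic_type_from_first_syntax_element(SECOND_PRESET)
```

  • Supporting a further expression format in this scheme only requires a new entry in the table, which matches the remark that other values can indicate other formats.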
  • the first syntax element includes at least two sub-syntax elements.
  • the first syntax element includes a first sub-syntax element and a second sub-syntax element, and whether the splicing diagram is a heterogeneous hybrid splicing diagram or a homogeneous splicing diagram is determined according to the first sub-syntax element and the second sub-syntax element.
  • determining according to the first syntax element whether the splicing diagram is a heterogeneous hybrid splicing diagram or a homogeneous splicing diagram includes: if the first sub-syntax element is a fourth preset value, determining that the splicing diagram includes isomorphic blocks of the first expression format; if the second sub-syntax element is a fifth preset value, determining that the splicing diagram includes isomorphic blocks of the second expression format.
  • if the splicing diagram includes only isomorphic blocks of the first expression format, the splicing diagram is determined to be a homogeneous splicing diagram of the first expression format.
  • if the splicing diagram includes isomorphic blocks of the first expression format and isomorphic blocks of the second expression format, the splicing diagram is determined to be a heterogeneous hybrid splicing diagram including isomorphic blocks of the first expression format and the second expression format.
  • the method further includes: if the first sub-syntax element is a sixth preset value, determining that the splicing diagram does not include isomorphic blocks of the first expression format; if the second sub-syntax element is a seventh preset value, determining that the splicing diagram does not include isomorphic blocks of the second expression format.
  • if the first sub-syntax element is the fourth preset value and the second sub-syntax element is the fifth preset value, the splicing diagram is a heterogeneous hybrid splicing diagram including isomorphic blocks of the first expression format and the second expression format;
  • if the first sub-syntax element is the fourth preset value and the second sub-syntax element is the seventh preset value, the splicing diagram is a homogeneous splicing diagram including the isomorphic blocks of the first expression format;
  • if the first sub-syntax element is the sixth preset value and the second sub-syntax element is the fifth preset value, the splicing diagram is a homogeneous splicing diagram including the isomorphic blocks of the second expression format.
  • that is, the expression format of the isomorphic blocks in the splicing diagram can be determined based on the values of the two sub-syntax elements.
  • multiple syntax elements can also be used to indicate the expression formats of the isomorphic blocks in the splicing diagram. For example, when three expression formats are included, three syntax elements are set, and when four expression formats are included, four syntax elements are set. Multiple values can also be set through one syntax element to represent multiple expression formats.
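  • The two-flag variant can be sketched in the same way. The sketch assumes that the fourth and fifth preset values are 1 ("blocks of this format are present") and the sixth and seventh preset values are 0 ("absent"); the text does not state these values, and the function name `classify_mosaic` is hypothetical.

```python
def classify_mosaic(first_sub_element, second_sub_element):
    """Derive the splicing diagram type from two presence flags.

    Assumption: 1 means blocks of that expression format are present,
    0 means they are absent.
    """
    has_first = first_sub_element == 1
    has_second = second_sub_element == 1
    if has_first and has_second:
        return "heterogeneous hybrid"
    if has_first:
        return "homogeneous (first expression format)"
    if has_second:
        return "homogeneous (second expression format)"
    # The text does not describe a diagram containing neither format.
    raise ValueError("no expression format signaled")


result = classify_mosaic(1, 0)
```

  • With one flag per expression format, adding a third or fourth format means adding a third or fourth flag, whereas the single-element scheme needs a new value for every combination it wants to signal.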
  • the first syntax element is located in a parameter set of the code stream.
  • the parameter set of the code stream may be V3C_VPS
  • the first syntax element may be ptl_profile_toolset_idc in V3C_VPS.
  • the mosaic graph sequence parameter set corresponding to the mosaic graph includes the first syntax element.
  • the splicing graph sequence parameter set corresponding to the splicing graph includes the first sub-syntax element and the second sub-syntax element.
  • the first sub-syntax element is asps_vpcc_extension_present_flag in the splicing diagram sequence parameter set
  • the second sub-syntax element is asps_miv_extension_present_flag.
  • the first syntax element can be located in the parameter set of the code stream, and the decoding end can parse the splicing pattern type of each splicing pattern earlier.
  • the first syntax element may also be located in the mosaic sequence parameter set corresponding to each mosaic image, and the decoding end obtains and then determines the mosaic image type when parsing each mosaic image.
  • the heterogeneous hybrid mosaic graph of the embodiment of the present application includes at least one of the following: a single attribute heterogeneous hybrid mosaic graph and a multi-attribute heterogeneous hybrid mosaic graph.
  • the single-attribute heterogeneous hybrid splicing diagram refers to the heterogeneous hybrid splicing diagram in which the attribute information of all homogeneous blocks included is the same.
  • a single attribute heterogeneous hybrid mosaic image only includes homogeneous blocks of attribute information, such as only multi-view video texture blocks and point cloud texture blocks.
  • a single-attribute heterogeneous hybrid mosaic image only includes homogeneous blocks of geometric information, such as only multi-view video geometry blocks and point cloud geometry blocks.
  • a multi-attribute heterogeneous hybrid mosaic map refers to a heterogeneous hybrid mosaic map that includes at least two homogeneous blocks with different attribute information.
  • for example, a multi-attribute heterogeneous hybrid mosaic map includes both homogeneous blocks of attribute information and homogeneous blocks of geometric information.
  • blocks under any one attribute or any two attributes of at least two of the point cloud, multi-viewpoint video and grid can be spliced into one image to obtain a heterogeneous hybrid spliced image. This application does not limit this.
  • the single-attribute homogeneous blocks in the first expression format and the single-attribute homogeneous blocks in the second expression format are spliced to obtain a heterogeneous hybrid spliced image.
  • the first expression format and the second expression format are each any one of multi-view video, point cloud, and grid, the first expression format and the second expression format are different, and the attribute information of the first expression format and the second expression format is the same.
  • the single attribute isomorphic block of the multi-view video includes at least one of a multi-view video texture block, a multi-view video geometry block, and the like.
  • the single attribute isomorphic block of the point cloud includes at least one of a point cloud texture block, a point cloud geometry block, a point cloud occupancy block, and the like.
  • the single attribute isomorphic block of the grid includes at least one of a grid texture block and a grid geometry block.
  • At least two of the multi-viewpoint video geometry blocks, point cloud geometry blocks, and grid geometry blocks are spliced into one image to obtain a heterogeneous hybrid spliced image.
  • This heterogeneous mixed mosaic diagram is called a single attribute heterogeneous mixed mosaic diagram.
  • at least two of the multi-viewpoint video texture blocks, point cloud texture blocks, and grid texture blocks are spliced into one image to obtain a heterogeneous hybrid spliced image.
  • This heterogeneous mixed mosaic diagram is called a single attribute heterogeneous mixed mosaic diagram.
  • the multi-attribute isomorphic blocks in the first expression format and the multi-attribute isomorphic blocks in the second expression format are spliced to obtain a heterogeneous hybrid spliced image.
  • the first expression format and the second expression format are any one of multi-view video, point cloud, and grid, and the first expression format and the second expression format are different.
  • the attribute information of the first expression format and the second expression format is not exactly the same.
  • the multi-viewpoint video texture block is spliced into one picture with at least one of the point cloud geometry block and the mesh geometry block to obtain a heterogeneous hybrid spliced picture.
  • a multi-viewpoint video geometry block is spliced into one picture with at least one of a point cloud texture block and a mesh texture block to obtain a heterogeneous hybrid spliced picture.
  • the point cloud texture block and at least one of the multi-viewpoint video geometry block and the mesh geometry block are spliced into one image to obtain a heterogeneous hybrid spliced image.
  • the point cloud geometry block is spliced into one picture with at least one of the multi-viewpoint video texture block and the mesh texture block to obtain a heterogeneous hybrid spliced picture.
  • point cloud geometry blocks, multi-viewpoint video texture blocks, and multi-viewpoint video geometry blocks are spliced into one image to obtain a heterogeneous hybrid spliced image.
  • point cloud geometry blocks, point cloud texture blocks, multi-viewpoint video texture blocks, and multi-viewpoint video geometry blocks are spliced into one image to obtain a heterogeneous hybrid spliced image.
  • the obtained heterogeneous hybrid mosaic graph is called a multi-attribute heterogeneous hybrid mosaic graph.
  • the following takes the first expression format as multi-viewpoint video and the second expression format as point cloud as an example to introduce the splicing method in detail.
  • the multi-view video block includes a multi-view video texture block and a multi-view video geometry block
  • the point cloud block includes a point cloud texture block, a point cloud geometry block and a point cloud occupancy block.
  • Method 1: Splice the multi-viewpoint video texture block, multi-viewpoint video geometry block, point cloud texture block, point cloud geometry block and point cloud occupancy block into one heterogeneous hybrid spliced image.
  • Method 2: According to the preset heterogeneous splicing method, splice the multi-view video texture block, multi-view video geometry block, point cloud texture block, point cloud geometry block and point cloud occupancy block to obtain M heterogeneous hybrid spliced images, where M is a positive integer greater than or equal to 1.
  • the second method can include at least the following examples:
  • Example 1: Splice the multi-view video texture blocks and the point cloud texture blocks to obtain a heterogeneous mixed texture splicing map, splice the multi-view video geometry blocks and the point cloud geometry blocks to obtain a heterogeneous mixed geometry splicing map, and use the point cloud occupancy blocks separately as a mixed splicing map.
  • Example 2: Splice the multi-view video texture blocks and the point cloud texture blocks to obtain a heterogeneous mixed texture splicing map, and splice the multi-view video geometry blocks, the point cloud geometry blocks and the point cloud occupancy blocks to obtain a heterogeneous mixed geometry and occupancy splicing map.
  • Example 3: Splice the multi-view video texture blocks, the point cloud texture blocks and the point cloud occupancy blocks to obtain one sub-heterogeneous hybrid splicing image, and splice the multi-view video geometry blocks and the point cloud geometry blocks to obtain another sub-heterogeneous hybrid splicing image.
  • Further, after obtaining the M heterogeneous mixed spliced images, video coding can be performed on each of the M heterogeneous mixed spliced images to obtain video compression sub-streams.
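  • The three examples of Method 2 can be sketched as a grouping rule that assigns each block to one of M spliced images by its attribute. This is a minimal illustration only: the labels "mv" and "pc", the scheme names and the `splice` helper are hypothetical, and no actual pixel packing is performed.

```python
from collections import defaultdict

# Each block is described by (expression_format, attribute).
# "mv" = multi-viewpoint video, "pc" = point cloud (labels chosen for this sketch).
BLOCKS = [
    ("mv", "texture"), ("mv", "geometry"),
    ("pc", "texture"), ("pc", "geometry"), ("pc", "occupancy"),
]

# Hypothetical preset schemes: attribute -> name of the spliced image it goes into.
SCHEMES = {
    "example_1": {"texture": "texture", "geometry": "geometry",
                  "occupancy": "occupancy"},
    "example_2": {"texture": "texture", "geometry": "geometry+occupancy",
                  "occupancy": "geometry+occupancy"},
    "example_3": {"texture": "texture+occupancy", "geometry": "geometry",
                  "occupancy": "texture+occupancy"},
}


def splice(blocks, scheme):
    """Group blocks into M spliced images according to a preset scheme."""
    images = defaultdict(list)
    for fmt, attr in blocks:
        images[SCHEMES[scheme][attr]].append((fmt, attr))
    return dict(images)


for name in SCHEMES:
    result = splice(BLOCKS, name)
    print(name, "-> M =", len(result), sorted(result))
```

  • In this sketch, Example 1 yields M = 3 and Examples 2 and 3 yield M = 2; each resulting image would then be passed to the video encoder separately.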
  • the isomorphic splicing graph of the embodiment of the present application includes at least one of the following: a single attribute isomorphic splicing graph and a multi-attribute isomorphic splicing graph.
  • the first attribute isomorphic blocks of the first expression format are spliced to obtain an isomorphic splicing graph.
  • the first attribute isomorphic block and the second attribute isomorphic block of the first expression format are spliced to obtain an isomorphic splicing diagram.
  • a single-attribute isomorphic splicing diagram refers to an isomorphic splicing diagram in which all isomorphic blocks included have the same expression format and the same attribute information.
  • a single-attribute isomorphic mosaic image only includes isomorphic blocks of one kind of attribute information of the first expression format.
  • a single-attribute isomorphic mosaic image only includes multi-view video texture blocks, or only point cloud texture blocks.
  • a single-attribute isomorphic mosaic image only includes isomorphic blocks of geometric information, such as only multi-view video geometric blocks, or only point cloud geometric blocks.
  • a multi-attribute isomorphic spliced graph refers to an isomorphic spliced graph that includes at least two isomorphic blocks with the same expression format but different attribute information.
  • a multi-attribute isomorphic spliced graph includes both isomorphic blocks of attribute information and isomorphic blocks of geometric information.
  • a multi-attribute isomorphic mosaic image includes multi-viewpoint video texture blocks and multi-viewpoint video geometry blocks.
  • a multi-attribute isomorphic mosaic image includes a point cloud geometry block and a point cloud texture block. As shown in Figure 8, a multi-attribute isomorphic mosaic image includes point cloud texture block 1, point cloud geometry block 1 and point cloud geometry block 2.
  • the spliced image information may also include syntax elements, according to which the spliced image is determined to be a single-attribute heterogeneous hybrid spliced image, a multi-attribute heterogeneous hybrid spliced image, a single-attribute isomorphic spliced image, or a multi-attribute isomorphic spliced image.
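  • The four spliced-image types described above follow from two questions: do the blocks share one expression format, and do they share one attribute? A minimal sketch, assuming each block is represented by a hypothetical `(expression_format, attribute)` pair:

```python
def classify_spliced_image(blocks):
    """Classify a spliced image from its blocks.

    blocks: list of (expression_format, attribute) pairs, one per isomorphic block.
    """
    if not blocks:
        raise ValueError("a spliced image needs at least one block")
    formats = {fmt for fmt, _ in blocks}
    attributes = {attr for _, attr in blocks}
    # More than one expression format -> heterogeneous hybrid, otherwise isomorphic.
    structure = "heterogeneous hybrid" if len(formats) > 1 else "isomorphic"
    # Exactly one attribute -> single-attribute, otherwise multi-attribute.
    arity = "single-attribute" if len(attributes) == 1 else "multi-attribute"
    return f"{arity} {structure}"


# The layout described for Figure 8: one point cloud texture block, two geometry blocks.
figure_8 = [("point_cloud", "texture"), ("point_cloud", "geometry"),
            ("point_cloud", "geometry")]
print(classify_spliced_image(figure_8))
```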
  • Step 603: Encode the at least one spliced image and the spliced image information to obtain a code stream.
  • the code stream includes a video compression sub-stream and a splicing image information sub-stream.
  • the encoding of the at least one spliced image and the spliced image information to obtain a code stream includes: encoding the at least one spliced image to obtain a video compression sub-stream; encoding the spliced image information of the at least one spliced image to obtain a splicing image information sub-stream; and synthesizing the video compression sub-stream and the splicing image information sub-stream into the code stream.
  • Hybrid splicing can reduce the number of 2D video encoders such as HEVC, VVC, AVC, and AVS that need to be called, reduce implementation costs, and improve ease of use.
  • when it is determined based on the first syntax element that the spliced image is a heterogeneous hybrid spliced image, the spliced image information also includes a second syntax element, and the expression format of the i-th block in the spliced image is determined based on the second syntax element.
  • the encoding end writes the second syntax element into the code stream, which can help improve the decoding accuracy of the decoding end, and at the same time enables the V3C standard to support visual media content in different expression formats, such as multi-view videos and point clouds, in the same compressed code stream.
  • determining the expression format of the i-th block in the mosaic diagram based on the second syntax element includes: if the second syntax element is an eighth preset value, determining that the expression format of the i-th block is the first expression format; if the second syntax element is a ninth preset value, determining that the expression format of the i-th block is the second expression format.
  • the expression format type corresponding to the i-th block in the splicing diagram can be indicated by setting different values to the second syntax element.
  • if the i-th block is a point cloud block, the second syntax element is set to the eighth preset value; if the i-th block is a multi-viewpoint video block, the second syntax element is set to the ninth preset value.
  • the embodiments of this application do not limit the specific values of the eighth preset value and the ninth preset value.
  • the eighth preset value is 0.
  • the ninth preset value is 1.
  • encoding the at least one spliced image and the spliced image information to obtain a code stream includes: if the expression format of the i-th block is the first expression format, determining that the sub-tiles in the i-th block are encoded using the encoding standard corresponding to the first expression format, to obtain a code stream corresponding to the visual media content of the first expression format; if the expression format of the i-th block is the second expression format, determining that the sub-tiles in the i-th block are encoded using the encoding standard corresponding to the second expression format, to obtain a code stream corresponding to the visual media content of the second expression format.
  • the second syntax element is located in the mosaic block data unit header of the i-th block of the mosaic map. In some embodiments, the second syntax element may also be located in a sub-patch data unit (patch_data_unit). For example, on the premise that the second syntax element (ath_toolset_type) is known to be 1, it is determined that the current sub-tile is encoded using the multi-view video coding standard. On the premise that the second syntax element (ath_toolset_type) is known to be 0, it is determined that the current sub-tile is encoded using the point cloud encoding standard.
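  • The role of the second syntax element on the encoding side can be sketched as follows, using the example values given above (0 for a point cloud block, 1 for a multi-viewpoint video block). The dictionary layout of the header and the helper names are assumptions made for illustration; they are not the V3C bitstream syntax.

```python
POINT_CLOUD = "point_cloud"
MULTI_VIEW_VIDEO = "multi_view_video"

# Example values from the text: eighth preset value = 0, ninth preset value = 1.
ATH_TOOLSET_TYPE = {POINT_CLOUD: 0, MULTI_VIEW_VIDEO: 1}
FORMAT_OF = {value: fmt for fmt, value in ATH_TOOLSET_TYPE.items()}


def make_tile_headers(block_formats):
    """Encoder side: record ath_toolset_type for each block of a heterogeneous image."""
    return [
        {"block_index": i, "ath_toolset_type": ATH_TOOLSET_TYPE[fmt]}
        for i, fmt in enumerate(block_formats)
    ]


def expression_format(header):
    """Map ath_toolset_type back to the expression format of the block."""
    value = header["ath_toolset_type"]
    if value not in FORMAT_OF:
        raise ValueError(f"ath_toolset_type must be 0 or 1, got {value}")
    return FORMAT_OF[value]


headers = make_tile_headers([POINT_CLOUD, MULTI_VIEW_VIDEO, POINT_CLOUD])
for h in headers:
    print(h["block_index"], h["ath_toolset_type"], expression_format(h))
```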
  • encoding the at least one spliced image and the spliced image information to obtain a code stream includes: calling a video encoder to encode the at least one spliced image to obtain a video compression sub-stream.
  • At least two visual media contents are first processed separately (that is, packaged) to obtain multiple isomorphic blocks.
  • at least two homogeneous blocks with different expression formats are spliced into a heterogeneous mixed spliced graph, and at least one homogeneous block with exactly the same expression format is spliced into a homogeneous spliced graph.
  • the isomorphic splicing image is encoded to obtain the video compression sub-stream.
  • the video encoder can be called only once for encoding, thereby reducing the number of 2D video encoders such as HEVC, VVC, AVC, and AVS that need to be called, reducing encoding costs and improving ease of use.
  • the video encoder used to perform video encoding on the heterogeneous hybrid splicing image and the homogeneous splicing image to obtain the video compression sub-stream can be the video encoder shown in Figure 2A above. That is to say, in the embodiment of the present application, the heterogeneous hybrid splicing image or the homogeneous splicing image is used as a frame image. Block division is first performed, and then intra-frame or inter-frame prediction is used to obtain the predicted value of the coding block. The predicted value of the coding block is subtracted from the original value to obtain the residual value. After the residual value is transformed and quantized, the video compression sub-stream is obtained.
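  • The residual step described above can be illustrated with a toy example. This sketch computes the residual between an original block and its prediction and applies uniform quantization; the transform and entropy coding stages of a real encoder such as HEVC or VVC are deliberately omitted, and the function names and sample values are hypothetical.

```python
def quantize(value, qstep):
    """Uniform quantization with rounding to the nearest level (integer arithmetic)."""
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + qstep // 2) // qstep)


def encode_block(original, predicted, qstep):
    """Return (residual, quantized levels) for one coding block."""
    residual = [[o - p for o, p in zip(row_o, row_p)]
                for row_o, row_p in zip(original, predicted)]
    levels = [[quantize(r, qstep) for r in row] for row in residual]
    return residual, levels


def reconstruct_block(predicted, levels, qstep):
    """Decoder side: add the dequantized residual back to the prediction."""
    return [[p + lv * qstep for p, lv in zip(row_p, row_l)]
            for row_p, row_l in zip(predicted, levels)]


original = [[52, 55], [61, 59]]
predicted = [[50, 50], [60, 60]]
residual, levels = encode_block(original, predicted, qstep=4)
reconstructed = reconstruct_block(predicted, levels, qstep=4)
print(residual, levels, reconstructed)
```

  • With a quantization step of 4, the reconstruction error of each sample in this example is at most 2, which is the usual trade-off between bit rate and fidelity.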
  • the mosaic image information corresponding to each mosaic image is generated.
  • the spliced image information is encoded to obtain the spliced image information sub-stream.
  • the splicing diagram information includes a first syntax element used to indicate the type of the splicing diagram, and a second syntax element used to indicate the expression format of each isomorphic block in the splicing diagram.
  • the embodiments of the present application do not limit the method of encoding the spliced image information. For example, conventional data compression encoding methods such as equal-length encoding or variable-length encoding may be used for compression.
  • the video compression sub-stream and the splicing image information sub-stream are written in the same code stream to obtain the final code stream.
  • the embodiments of the present application not only support heterogeneous source formats such as video, point cloud, grid, etc., but also support homogeneous source formats in the same compressed code stream.
  • the method further includes: encoding the parameter set of the code stream to obtain a code stream parameter set sub-stream.
  • the encoding end synthesizes the video compression sub-stream, the splicing image information sub-stream and the parameter set sub-stream into a code stream.
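  • How three sub-streams might be synthesized into one code stream can be sketched with a simple tag-and-length container. The container layout, the tag values and the function names below are assumptions made for this illustration only; they are not the V3C unit or sample stream format.

```python
import struct

# Hypothetical tags for the three sub-streams named in the text.
TAG_PARAMETER_SET = 0
TAG_SPLICED_IMAGE_INFO = 1
TAG_VIDEO = 2

HEADER = ">BI"  # 1-byte tag followed by a 4-byte big-endian payload length
HEADER_SIZE = struct.calcsize(HEADER)


def mux(substreams):
    """Concatenate (tag, payload) pairs into a single byte string."""
    out = bytearray()
    for tag, payload in substreams:
        out += struct.pack(HEADER, tag, len(payload))
        out += payload
    return bytes(out)


def demux(stream):
    """Recover the (tag, payload) pairs from a muxed byte string."""
    pos, result = 0, []
    while pos < len(stream):
        tag, size = struct.unpack_from(HEADER, stream, pos)
        pos += HEADER_SIZE
        result.append((tag, stream[pos:pos + size]))
        pos += size
    return result


parts = [
    (TAG_PARAMETER_SET, b"\x80"),
    (TAG_SPLICED_IMAGE_INFO, b"\x03\x00\x01"),
    (TAG_VIDEO, b"video-bytes"),
]
code_stream = mux(parts)
print(len(code_stream), demux(code_stream) == parts)
```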
  • the parameter set sub-stream of the code stream includes a third syntax element, and it is determined according to the third syntax element that the code stream includes code streams corresponding to visual media content of at least one expression format. That is to say, the encoding end sends the third syntax element to indicate whether the code stream contains visual media content in at least two expression formats at the same time.
  • the encoding end processes the visual media content in one expression format to obtain one kind of isomorphic block, and splices that kind of isomorphic block to obtain an isomorphic splicing graph.
  • when the third syntax element indicates that the code stream includes code streams corresponding to visual media content in at least two expression formats, it can be understood that the encoding end obtains at least two kinds of isomorphic blocks from the visual media content in the at least two expression formats, and splices the at least two kinds of isomorphic blocks to obtain a homogeneous spliced image and/or a heterogeneous hybrid spliced image.
  • the method includes: isomorphically splicing the isomorphic blocks of the first expression format to obtain a first isomorphic splicing diagram, and isomorphically splicing the isomorphic blocks of the second expression format to obtain a second isomorphic splicing diagram; or, heterogeneously splicing the isomorphic blocks of the first expression format and the isomorphic blocks of the second expression format to obtain a heterogeneous mixed splicing diagram; or, isomorphically splicing the isomorphic blocks of the second expression format to obtain a second isomorphic splicing diagram, and heterogeneously splicing the isomorphic blocks of the first expression format and the isomorphic blocks of the second expression format to obtain a heterogeneous mixed splicing diagram.
  • setting the third syntax element to different values indicates which expression formats of visual media content have corresponding code streams in the code stream. That is to say, certain preset values of the third syntax element can indicate that the code stream includes code streams corresponding to visual media content in one or more expression formats.
  • determining, according to the third syntax element, that the code stream includes code streams corresponding to visual media content of at least one expression format includes: if the third syntax element is a first value, determining that the code stream simultaneously includes a code stream corresponding to the visual media content in the first expression format and a code stream corresponding to the visual media content in the second expression format; if the third syntax element is a second value, determining that the code stream includes the code stream corresponding to the visual media content in the first expression format; if the third syntax element is a third value, determining that the code stream includes the code stream corresponding to the visual media content in the second expression format.
  • the parameter set of the code stream may be V3C_VPS
  • the third syntax element may be ptl_profile_toolset_idc in V3C_VPS.
  • when the third syntax element is set to a first value, the first value is used to indicate that the code stream contains both multi-view video code streams and point cloud code streams.
  • the third syntax element when the third syntax element is set to the second value, the second value is used to indicate that the code stream only contains the point cloud code stream.
  • the third syntax element is set to a third value, and the third value is used to indicate that the code stream only contains a multi-view video code stream.
  • V3C_VPS in the existing V3C standard can be reused, and ptl_profile_toolset_idc is preconfigured with values such as 0/1, 64/65/66, 128/129/130/132/133/134 to indicate the code stream type included in the current code stream.
  • the embodiment of the present application adds values of the third syntax element in the parameter set to indicate which expression formats of visual media content have corresponding code streams in the code stream, which can help improve the decoding accuracy of the decoder and also enables the V3C standard to support visual media content containing one or more expression formats, such as multi-view videos, point clouds and grids, in the same compressed code stream.
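  • A decoder-side lookup for the third syntax element might look like the following sketch. The text states that the values 128/129/130/132/133/134 indicate a code stream containing both point cloud and multi-view video code streams; treating 0/1 as point cloud only and 64/65/66 as multi-view video only is an assumption of this sketch, based on the existing V-PCC and MIV toolset profiles. The function name is hypothetical.

```python
POINT_CLOUD = "point_cloud"
MULTI_VIEW_VIDEO = "multi_view_video"

# ptl_profile_toolset_idc value -> expression formats present in the code stream.
TOOLSET_IDC_FORMATS = {}
for idc in (0, 1):  # assumption: existing V-PCC toolset profiles
    TOOLSET_IDC_FORMATS[idc] = frozenset({POINT_CLOUD})
for idc in (64, 65, 66):  # assumption: existing MIV toolset profiles
    TOOLSET_IDC_FORMATS[idc] = frozenset({MULTI_VIEW_VIDEO})
for idc in (128, 129, 130, 132, 133, 134):  # per the text: both formats
    TOOLSET_IDC_FORMATS[idc] = frozenset({POINT_CLOUD, MULTI_VIEW_VIDEO})


def formats_in_code_stream(ptl_profile_toolset_idc):
    """Return the set of expression formats signalled by the third syntax element."""
    try:
        return TOOLSET_IDC_FORMATS[ptl_profile_toolset_idc]
    except KeyError:
        raise ValueError(
            f"unknown ptl_profile_toolset_idc: {ptl_profile_toolset_idc}"
        ) from None


for idc in (1, 64, 128):
    print(idc, sorted(formats_in_code_stream(idc)))
```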
  • Table 3 shows an example of available toolset profile components (Available toolset profile components).
  • Table 3 provides a list of toolset profile components defined for V3C and their corresponding identification syntax element values, such as ptl_profile_toolset_idc and ptc_one_v3c_frame_only_flag. This definition may be used for this document only.
  • the syntax element ptl_profile_toolset_idc provides the main definition of the toolset profile.
  • Additional syntax elements such as ptc_one_v3c_frame_only_flag can specify additional characteristics or restrictions of the defined profile.
  • ptc_one_v3c_frame_only_flag can be used to support only a single V3C frame.
  • the parameter set of the code stream further includes a first syntax element, wherein the first syntax element is used to indicate the type of each mosaic picture, specifically whether the mosaic picture is the heterogeneous hybrid mosaic picture or the isomorphic mosaic picture; the first syntax element is written into the parameter set of the code stream.
  • the first syntax element may be vps_toolset_type in V3C_VPS.
  • vps_toolset_type is used to determine whether each spliced image and its corresponding V3C unit belongs to a point cloud spliced image, a multi-viewpoint spliced image, or a point cloud + multi-viewpoint heterogeneous hybrid spliced image. At the same time, in order to be compatible with previous standards, the following new syntax and semantics are implemented, as well as constraints on the old semantics.
  • if the first syntax element is a first preset value, it is determined that the mosaic graph is a heterogeneous hybrid mosaic graph including isomorphic blocks of the first expression format and the second expression format, wherein the first expression format and the second expression format are different expression formats;
  • if the first syntax element is a second preset value, it is determined that the mosaic graph is an isomorphic mosaic graph including isomorphic blocks of the first expression format;
  • if the first syntax element is a third preset value, it is determined that the mosaic graph is an isomorphic mosaic graph including isomorphic blocks of the second expression format.
  • when the first syntax element is the first preset value, the first preset value is used to indicate that the spliced image is a heterogeneous hybrid spliced image including point cloud blocks and multi-viewpoint video blocks; when the first syntax element is the second preset value, the second preset value is used to indicate that the spliced image is an isomorphic spliced image including multi-viewpoint video blocks (which may be called a multi-viewpoint video mosaic); when the first syntax element is the third preset value, the third preset value is used to indicate that the spliced image is an isomorphic spliced image including point cloud blocks (which may be called a point cloud mosaic).
  • Table 4 shows the syntax of the general V3C parameter set (General V3C parameter set syntax).
  • the V3C parameter set has a new syntax element vps_toolset_type.
  • vps_toolset_type[j] can be used to represent the type of splicing diagram with index j.
  • the decoding end can obtain vps_toolset_type from the V3C parameter set. Based on vps_toolset_type, it can quickly determine whether each spliced image and its corresponding V3C unit belongs to point cloud, multi-viewpoint, or point cloud + multi-viewpoint, and thereby determine which coding method requirements the spliced image should meet.
  • the mosaic graph sequence parameter set corresponding to the mosaic graph includes the first syntax element.
  • the splicing graph sequence parameter set corresponding to the splicing graph includes the first sub-syntax element and the second sub-syntax element.
  • the first sub-syntax element and the second sub-syntax element are used to indicate a splicing diagram type, wherein the splicing diagram is the heterogeneous hybrid splicing diagram or the isomorphic splicing diagram.
  • if the first sub-syntax element is a fourth preset value and the second sub-syntax element is a fifth preset value, it is determined that the mosaic diagram is a heterogeneous hybrid mosaic diagram including the isomorphic blocks of the first expression format and the isomorphic blocks of the second expression format; if the first sub-syntax element is the fourth preset value and the second sub-syntax element is a seventh preset value, it is determined that the mosaic diagram is an isomorphic mosaic diagram including the isomorphic blocks of the first expression format; if the first sub-syntax element is a sixth preset value and the second sub-syntax element is the fifth preset value, it is determined that the mosaic diagram is an isomorphic mosaic diagram including the isomorphic blocks of the second expression format.
  • the first sub-grammar element is asps_vpcc_extension_present_flag in the splicing diagram sequence parameter set
  • the second sub-grammar element is asps_miv_extension_present_flag.
  • the NAL-ASPS of the V3C_AD code stream may contain asps_miv_extension_present_flag and asps_vpcc_extension_present_flag.
  • the first sub-syntax element and the second sub-syntax element are set to specific values to indicate that the spliced image is the heterogeneous hybrid spliced image or the isomorphic spliced image.
  • Table 5 shows the syntax of the general atlas sequence parameter set RBSP syntax.
  • the splicing map sequence parameter set can be understood as splicing map information.
  • the encoding end uses the syntax elements asps_vpcc_extension_present_flag and asps_miv_extension_present_flag in the splicing map sequence parameter set to represent the type of the splicing image.
  • the decoding end can obtain these two syntax elements from the parameter set of the splicing image by parsing the code stream. Based on the values of these two syntax elements, it determines whether the splicing image belongs to point cloud, multi-view, or point cloud + multi-view, and thereby determines which encoding method requirements the spliced image should meet.
  • each isomorphic block is a multi-view mosaic.
  • the mosaic map block data unit header of the i-th block includes a second syntax element.
  • when the mosaic map is a heterogeneous hybrid mosaic map, the mosaic map information also includes a second syntax element, according to which the expression format of the i-th block in the spliced image is determined.
  • the embodiment of the present application sets a second syntax element to indicate the expression format of the i-th block in the heterogeneous hybrid splicing image, which can help improve the decoding accuracy of the decoder.
  • the V3C standard can support visual media content in different expression formats such as multi-view videos and point clouds in the same compressed code stream.
  • the second syntax element may be ath_toolset_type in the mosaic map tile data unit header (atlas_tile_header).
  • if the second syntax element is the eighth preset value, it is determined that the expression format of the i-th block is the first expression format; if the second syntax element is the ninth preset value, it is determined that the expression format of the i-th block is the second expression format.
  • Table 6 shows the Atlas tile header syntax.
  • the encoding end adds a new syntax element ath_toolset_type to the Atlas tile header syntax to indicate the block type.
  • the decoding end can decode the code stream and obtain ath_toolset_type from the mosaic map tile data unit header syntax, to determine whether the current block belongs to multi-view video decoding or point cloud decoding.
  • the second syntax element may also be located in the sub-patch data unit (patch_data_unit).
  • on the premise that the second syntax element (ath_toolset_type) is known to be 1, it is determined that the current sub-tile is encoded using the multi-viewpoint video encoding method.
  • on the premise that the second syntax element (ath_toolset_type) is known to be 0, it is determined that the current sub-tile is encoded using the point cloud encoding method.
  • the sub-patch data unit syntax can be shown in Table 7:
  • a vps_toolset_type[j] value of 1 indicates that the values of the syntax elements of the toolset profile component of the atlas with index j should comply with ISO/IEC 23090-12 Table A-1-1 (i.e. the values specified in Table 8);
  • a vps_toolset_type[j] value of 2 indicates that the values of the syntax elements of the toolset profile component of the atlas with index j should comply with the values specified in ISO/IEC 23090-5 Table H-3, except for the values of vps_extension_present_flag, vps_packing_information_present_flag, vps_miv_extension_present_flag, vuh_unit_type and vps_atlas_count_minus1, which should comply with the values specified in ISO/IEC 23090-12 Table A-1-1;
  • a vps_toolset_type[j] value of 3 indicates that the values of the syntax elements of the toolset profile component of the atlas with index j should comply with the extended ISO/IEC 23090-12 Table A-1-2 (i.e. Table 9-1 and Table 9-2). Table A-1-1 and Table A-1-2 respectively represent the syntax restrictions on toolset profile components for multi-viewpoint data and the syntax restrictions on toolset profile components for heterogeneous data under the integrated code stream.
  • a vps_toolset_type[j] value of 0 or any value from 4 to 7 indicates that the value is reserved for future use by ISO/IEC and should not appear in bitstreams conforming to this version of this document. Decoders conforming to this version of this document should ignore such reserved unit types.
  • Allowed values for the syntax element values of the MIV toolset profile.
  • ath_toolset_type indicates that the values of the syntax elements of the toolset profile component of the current tile should conform to the values specified in Table A-1 of the ISO/IEC 23090-12 extension.
  • the value of ath_toolset_type should be in the range of 0 to 1, inclusive.
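  • The semantics listed above can be summarized in a small lookup, including the reserved values. This is a sketch under the stated semantics; the returned labels and the function names are hypothetical, and handling of values above 7 (which the text does not describe) is an assumption.

```python
RESERVED = "reserved"


def interpret_vps_toolset_type(value):
    """Interpret vps_toolset_type[j] for the atlas with index j."""
    if value == 1:
        return "multi_view_isomorphic"       # ISO/IEC 23090-12 Table A-1-1
    if value == 2:
        return "point_cloud_isomorphic"      # ISO/IEC 23090-5 Table H-3 (with exceptions)
    if value == 3:
        return "heterogeneous_hybrid"        # extended ISO/IEC 23090-12 Table A-1-2
    if value == 0 or 4 <= value <= 7:
        return RESERVED                      # reserved for future use; decoders ignore it
    # Assumption: anything outside 0..7 is treated as an error in this sketch.
    raise ValueError(f"vps_toolset_type out of range: {value}")


def check_ath_toolset_type(value):
    """ath_toolset_type is only allowed to be 0 or 1."""
    if value not in (0, 1):
        raise ValueError(f"ath_toolset_type must be in 0..1, got {value}")
    return value


for v in range(8):
    print(v, interpret_vps_toolset_type(v))
```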
  • FIG. 9 is a schematic diagram of the V3C bitstream structure provided by the embodiment of the present application.
  • the V3C parameter set (V3C_parameter_set()) of V3C_VPS can include ptl_profile_toolset_idc. If ptl_profile_toolset_idc is 128/129/130/132/133/134, it means that the current code stream contains both a point cloud code stream (such as VPCC Basic or VPCC Extended) and a multi-view video code stream (such as MIV Main, MIV Extended or MIV Geometry Absent).
  • V3C parameter set () (V3C_parameter_set()) of V3C_VPS can include the first syntax element (vps_toolset_type).
  • a vps_toolset_type value of 1 means that only multi-viewpoint video blocks exist in the current spliced image; a value of 2 means that only point cloud blocks exist in the current spliced image; a value of 3 means that both multi-viewpoint video blocks and point cloud blocks exist in the current spliced image.
  • the splicing map sequence parameter set () (Atlas_sequence_parameter_set_rbsp()) in the NAL_ASPS in the atlas sub-bitstream () (Atlas_sub_bitstream()) of V3C_AD may include asps_vpcc_extension_present_flag and asps_miv_extension_present_flag.
  • when ptl_profile_toolset_idc is 128/129/130/132/133/134, let asps_vpcc_extension_present_flag be X and asps_miv_extension_present_flag be Y.
  • when X is 0 and Y is 1, it means that the spliced image only contains multi-viewpoint video blocks; when X is 1 and Y is 0, it means that the spliced image only contains point cloud blocks; when X is 1 and Y is 1, it means that the spliced image contains both multi-viewpoint video blocks and point cloud blocks.
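  • The two ASPS flags can be combined into a spliced-image type as in the sketch below. The cases X = 0, Y = 1 and X = 1, Y = 0 follow the text; treating X = 1, Y = 1 as the heterogeneous case is an assumption inferred from the remaining combination, and X = 0, Y = 0 is left unspecified. The function name and labels are hypothetical.

```python
def atlas_type_from_asps(asps_vpcc_extension_present_flag,
                         asps_miv_extension_present_flag):
    """Derive the spliced-image type from the two ASPS extension flags (X, Y)."""
    x = asps_vpcc_extension_present_flag
    y = asps_miv_extension_present_flag
    if x not in (0, 1) or y not in (0, 1):
        raise ValueError("both flags must be 0 or 1")
    if (x, y) == (0, 1):
        return "multi_view_only"
    if (x, y) == (1, 0):
        return "point_cloud_only"
    if (x, y) == (1, 1):
        # Assumption: the only remaining combination signals a heterogeneous image.
        return "point_cloud_and_multi_view"
    return "unspecified"  # (0, 0) is not described in the text


for flags in [(0, 1), (1, 0), (1, 1)]:
    print(flags, atlas_type_from_asps(*flags))
```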
  • the ACL NAL unit type (ACL_NAL_unit_type) in Atlas_sub_bitstream() of V3C_AD includes splicing image information.
  • the mosaic map tile data unit (atlas_tile_data_unit()) may include ath_toolset_type. If ath_toolset_type is no (that is, 0), it means that the current block belongs to a point cloud block. If atdu_type_flag is yes (that is, 1), it means that the current block belongs to a multi-viewpoint video block.
  • the sub-patch information data () includes a sub-patch data unit (patch_data_unit). If ath_toolset_type is no (that is, 0), it means that the current sub-tile is implemented using the point cloud video encoding method. When ath_toolset_type is yes (that is, 1), it means that the current sub-tile is implemented using a multi-viewpoint video coding method.
  • By obtaining the first syntax element of each spliced image, whether the spliced image includes both point cloud blocks and multi-viewpoint video blocks is determined based on the value of the first syntax element, and it is determined that both point cloud blocks and multi-viewpoint video blocks exist in the spliced image.
  • the encoding method of the present application is introduced above by taking the encoding end as an example.
  • the video decoding method provided by the embodiment of the present application is described below by taking the decoding end as an example.
  • Figure 10 is a schematic flow chart of a decoding method provided by an embodiment of the present application. As shown in Figure 10, the decoding method in this embodiment of the present application includes:
  • Step 1001: Decode the code stream to obtain a spliced image and spliced image information, wherein the spliced image information includes a first syntax element, and the spliced image is determined to be a heterogeneous hybrid spliced image or an isomorphic spliced image according to the first syntax element;
  • determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image includes: if the first syntax element is a first preset value, determining that the spliced image is a heterogeneous hybrid spliced image including isomorphic blocks of a first expression format and isomorphic blocks of a second expression format, wherein the first expression format and the second expression format are different expression formats; if the first syntax element is a second preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the first expression format; if the first syntax element is a third preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the second expression format.
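The three-way decision on the first syntax element can be sketched as a small lookup. This is only an illustration under stated assumptions: the text names a first, second and third preset value without fixing them, so the numeric values 0, 1 and 2 and the names `SplicedImageType` and `classify_spliced_image` are hypothetical.

```python
from enum import Enum


class SplicedImageType(Enum):
    HETEROGENEOUS_HYBRID = "heterogeneous hybrid"
    ISOMORPHIC_FIRST_FORMAT = "isomorphic, first expression format"
    ISOMORPHIC_SECOND_FORMAT = "isomorphic, second expression format"


# Hypothetical numeric values; the source does not specify the preset values.
FIRST_PRESET_VALUE = 0
SECOND_PRESET_VALUE = 1
THIRD_PRESET_VALUE = 2

_TYPE_BY_VALUE = {
    FIRST_PRESET_VALUE: SplicedImageType.HETEROGENEOUS_HYBRID,
    SECOND_PRESET_VALUE: SplicedImageType.ISOMORPHIC_FIRST_FORMAT,
    THIRD_PRESET_VALUE: SplicedImageType.ISOMORPHIC_SECOND_FORMAT,
}


def classify_spliced_image(first_syntax_element: int) -> SplicedImageType:
    """Map a decoded first syntax element to the type of the spliced image."""
    if first_syntax_element not in _TYPE_BY_VALUE:
        raise ValueError(f"unsupported first syntax element: {first_syntax_element}")
    return _TYPE_BY_VALUE[first_syntax_element]
```

A single multi-valued element keeps the type decision to one table lookup per spliced image; values outside the table are rejected rather than guessed.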
  • the first syntax element includes a first sub-syntax element and a second sub-syntax element, and it is determined according to the first sub-syntax element and the second sub-syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image.
  • determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image further includes: if the first sub-syntax element is a sixth preset value, determining that the spliced image does not include isomorphic blocks of the first expression format; if the second sub-syntax element is a seventh preset value, determining that the spliced image does not include isomorphic blocks of the second expression format.
  • determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image includes: if the first sub-syntax element is a fourth preset value and the second sub-syntax element is a fifth preset value, determining that the spliced image is a heterogeneous hybrid spliced image including isomorphic blocks of the first expression format and isomorphic blocks of the second expression format; if the first sub-syntax element is the fourth preset value and the second sub-syntax element is the seventh preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the first expression format; if the first sub-syntax element is the sixth preset value and the second sub-syntax element is the fifth preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the second expression format.
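The variant with two sub-syntax elements amounts to two presence flags, one per expression format. The following sketch assumes the fourth and fifth preset values are 1 ("present") and the sixth and seventh are 0 ("absent"); these values and the function name `classify_from_sub_elements` are hypothetical, since the text does not fix them.

```python
# Hypothetical values; the source only names the fourth to seventh preset values.
FOURTH_PRESET_VALUE = 1   # first sub-syntax element: first-format blocks present
SIXTH_PRESET_VALUE = 0    # first sub-syntax element: first-format blocks absent
FIFTH_PRESET_VALUE = 1    # second sub-syntax element: second-format blocks present
SEVENTH_PRESET_VALUE = 0  # second sub-syntax element: second-format blocks absent


def classify_from_sub_elements(first_sub: int, second_sub: int) -> str:
    """Derive the spliced image type from the two sub-syntax elements."""
    if first_sub not in (FOURTH_PRESET_VALUE, SIXTH_PRESET_VALUE):
        raise ValueError(f"unsupported first sub-syntax element: {first_sub}")
    if second_sub not in (FIFTH_PRESET_VALUE, SEVENTH_PRESET_VALUE):
        raise ValueError(f"unsupported second sub-syntax element: {second_sub}")

    has_first = first_sub == FOURTH_PRESET_VALUE
    has_second = second_sub == FIFTH_PRESET_VALUE

    if has_first and has_second:
        return "heterogeneous hybrid"
    if has_first:
        return "isomorphic (first expression format)"
    if has_second:
        return "isomorphic (second expression format)"
    # The text does not describe a spliced image containing neither format.
    raise ValueError("spliced image contains no isomorphic blocks")
```

The combination in which both formats are absent is not covered by the text, so the sketch treats it as an error instead of inventing a meaning for it.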
  • the first syntax element is located in a parameter set sub-codestream of the codestream.
  • the mosaic map sequence parameter set corresponding to the mosaic map includes the first syntax element.
  • the at least one expression format includes: at least one of multi-view video, point cloud, and mesh.
  • the first expression format is one of multi-view video, point cloud and mesh, the second expression format is one of multi-view video, point cloud and mesh, and the first expression format and the second expression format are different.
  • the code stream further includes a parameter set sub-code stream of the code stream, the parameter set sub-code stream of the code stream includes a third syntax element, and it is determined according to the third syntax element that the code stream includes a code stream corresponding to visual media content in at least one expression format.
  • the method further includes: decoding the parameter set sub-code stream of the code stream to obtain the parameter set of the code stream, and obtaining the third syntax element from the parameter set of the code stream.
  • the method further includes: if the third syntax element is a first value, determining that the code stream includes both a code stream corresponding to the visual media content in the first expression format and a code stream corresponding to the visual media content in the second expression format; if the third syntax element is a second value, determining that the code stream includes the code stream corresponding to the visual media content in the first expression format; if the third syntax element is a third value, determining that the code stream includes the code stream corresponding to the visual media content in the second expression format.
  • decoding the code stream to obtain at least one spliced image includes: determining according to the third syntax element that the code stream includes code streams corresponding to visual media content in at least two expression formats, and decoding the code stream to obtain a heterogeneous hybrid spliced image.
  • decoding the code stream to obtain at least one spliced image includes: determining according to the third syntax element that the code stream includes code streams corresponding to visual media content in at least two expression formats, and decoding the code stream to obtain isomorphic spliced images of the at least two expression formats. That is to say, when the code stream includes code streams corresponding to visual media content in at least two expression formats, each expression format corresponds to an isomorphic spliced image.
  • decoding the code stream to obtain at least one spliced image includes: determining according to the third syntax element that the code stream includes code streams corresponding to visual media content in at least two expression formats, and decoding the code stream to obtain a heterogeneous hybrid spliced image and isomorphic spliced images of the at least two expression formats. That is to say, when the code stream includes code streams corresponding to visual media content in at least two expression formats, the isomorphic blocks of some expression formats form a heterogeneous hybrid spliced image, and the isomorphic blocks of the other expression formats form isomorphic spliced images.
  • the heterogeneous hybrid spliced image includes at least one of the following: a single-attribute heterogeneous hybrid spliced image and a multi-attribute heterogeneous hybrid spliced image; the isomorphic spliced image includes at least one of the following: a single-attribute isomorphic spliced image and a multi-attribute isomorphic spliced image.
  • the code stream includes a video compression sub-stream and a spliced image information sub-stream, and decoding the code stream to obtain at least one spliced image and spliced image information includes: decoding the video compression sub-stream to obtain the at least one spliced image; decoding the spliced image information sub-stream to obtain the spliced image information of the at least one spliced image.
  • determining according to the third syntax element that the code stream includes code streams corresponding to visual media content in at least two expression formats, and decoding the video compression sub-stream to obtain a heterogeneous hybrid spliced image and an isomorphic spliced image; or, determining according to the third syntax element that the code stream includes code streams corresponding to visual media content in at least two expression formats, and decoding the video compression sub-stream to obtain isomorphic spliced images of the at least two expression formats.
  • Step 1002: When it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image, split the spliced image according to the spliced image information of the spliced image to obtain at least two types of isomorphic blocks, wherein the at least two types of isomorphic blocks correspond to different visual media content expression formats;
  • Step 1003: When it is determined according to the first syntax element that the spliced image is an isomorphic spliced image, split the spliced image according to the spliced image information of the spliced image to obtain one type of isomorphic block, wherein the one type of isomorphic block corresponds to the same visual media content expression format;
  • Step 1004: Decode and reconstruct the isomorphic blocks to obtain visual media content in at least one expression format.
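Steps 1002 and 1003 can be illustrated by grouping the blocks of one spliced image by expression format and checking the result against the type signalled by the first syntax element. This is a simplified sketch, not the actual splitting procedure: the `Block` class, the format strings and `split_spliced_image` are hypothetical, and the list of blocks stands in for a spliced image together with its spliced image information.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Block:
    expression_format: str  # hypothetical label, e.g. "multi_view_video" or "point_cloud"
    payload: bytes          # placeholder for the block's sample data


def split_spliced_image(blocks: List[Block], is_heterogeneous: bool) -> Dict[str, List[Block]]:
    """Group blocks by expression format and validate against the signalled type."""
    groups: Dict[str, List[Block]] = defaultdict(list)
    for block in blocks:
        groups[block.expression_format].append(block)

    if is_heterogeneous and len(groups) < 2:
        raise ValueError("a heterogeneous hybrid spliced image needs at least two block types")
    if not is_heterogeneous and len(groups) != 1:
        raise ValueError("an isomorphic spliced image must contain exactly one block type")
    return dict(groups)
```

Each resulting group would then be handed to the decoding and reconstruction of Step 1004 for its expression format.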
  • the method further includes: when it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image, the spliced image information further includes a second syntax element, and the expression format of the i-th block in the spliced image is determined according to the second syntax element.
  • determining the expression format of the i-th block in the spliced image according to the second syntax element includes: if the second syntax element is an eighth preset value, determining that the expression format of the i-th block is the first expression format; if the second syntax element is a ninth preset value, determining that the expression format of the i-th block is the second expression format.
  • the second syntax element is located in the mosaic block data unit header of the i-th block of the mosaic map.
  • decoding and reconstructing the isomorphic blocks to obtain visual media content in at least one expression format includes: if the expression format of the i-th block is the first expression format, determining that the sub-tiles in the i-th block are decoded and reconstructed using the decoding method corresponding to the first expression format to obtain the visual media content of the first expression format; if the expression format of the i-th block is the second expression format, determining that the sub-tiles in the i-th block are decoded and reconstructed using the decoding method corresponding to the second expression format to obtain the visual media content of the second expression format.
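The per-block selection of a decoding method from the second syntax element can be sketched as a dispatch table. The values 0 and 1 for the eighth and ninth preset values, the keys "first" and "second", and the function `decode_block` are assumptions made for illustration; the decoders themselves are passed in as plain callables because the text does not define their interfaces.

```python
from typing import Callable, Dict, Tuple

# Hypothetical values; the source only names the eighth and ninth preset values.
EIGHTH_PRESET_VALUE = 0  # i-th block uses the first expression format
NINTH_PRESET_VALUE = 1   # i-th block uses the second expression format


def decode_block(
    second_syntax_element: int,
    sub_tiles: bytes,
    decoders: Dict[str, Callable[[bytes], object]],
) -> Tuple[str, object]:
    """Pick the decoder matching the block's expression format and run it."""
    if second_syntax_element == EIGHTH_PRESET_VALUE:
        expression_format = "first"
    elif second_syntax_element == NINTH_PRESET_VALUE:
        expression_format = "second"
    else:
        raise ValueError(f"unsupported second syntax element: {second_syntax_element}")
    return expression_format, decoders[expression_format](sub_tiles)
```

For example, `decoders` could map "first" to a multi-view video reconstruction routine and "second" to a point cloud reconstruction routine, so that blocks of both formats from one heterogeneous hybrid spliced image are routed correctly.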
  • the code stream is decoded to obtain a multi-viewpoint video spliced image, a point cloud spliced image, and a heterogeneous hybrid spliced image.
  • according to the spliced image information corresponding to the heterogeneous hybrid spliced image, the heterogeneous hybrid spliced image is split, and the reconstructed multi-viewpoint video blocks and point cloud blocks are output;
  • according to the spliced image information corresponding to the multi-viewpoint video spliced image, the multi-viewpoint video spliced image is split, and the reconstructed multi-viewpoint video blocks are output; according to the spliced image information corresponding to the point cloud spliced image, the point cloud spliced image is split, and the reconstructed point cloud blocks are output; all acquired multi-viewpoint video blocks are decoded by multi-view video decoding to generate a reconstructed multi-view video; all acquired point cloud blocks are decoded by point cloud decoding to generate a reconstructed point cloud.
  • isomorphic blocks of different expression formats are spliced into a heterogeneous hybrid spliced image, and isomorphic blocks of the same expression format are spliced into an isomorphic spliced image.
  • the spliced image information includes the first syntax element, which indicates the type of the spliced image and thereby improves the efficiency with which the decoding end decodes the spliced image.
  • FIG. 11 is a schematic block diagram of an encoding device provided by an embodiment of the present application.
  • the encoding device 110 is applied to an encoder. As shown in Figure 11, the encoding device 110 includes:
  • the processing unit 1101 is configured to process visual media content in at least one expression format to obtain at least one isomorphic block, where different types of isomorphic blocks correspond to different visual media content expression formats;
  • the splicing unit 1102 is used to splice the at least one isomorphic block to obtain at least one spliced image and spliced image information, wherein the spliced image information includes a first syntax element, it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image, the heterogeneous hybrid spliced image includes at least two types of isomorphic blocks, and the isomorphic spliced image includes one type of isomorphic block;
  • the encoding unit 1103 is used to encode the at least one spliced image and the spliced image information to obtain a code stream.
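On the encoding side, the splicing unit has to choose the value of the first syntax element from the block types it placed in a spliced image. A minimal sketch of that choice follows; the numeric values 0, 1 and 2, the default format labels and the function `derive_first_syntax_element` are hypothetical and only mirror the three-valued signalling described for the first syntax element.

```python
from typing import Iterable

# Hypothetical numeric values for the first, second and third preset values.
FIRST_PRESET_VALUE = 0   # heterogeneous hybrid spliced image
SECOND_PRESET_VALUE = 1  # isomorphic spliced image, first expression format
THIRD_PRESET_VALUE = 2   # isomorphic spliced image, second expression format


def derive_first_syntax_element(
    block_formats: Iterable[str],
    first_format: str = "multi_view_video",
    second_format: str = "point_cloud",
) -> int:
    """Choose the first syntax element from the formats of the blocks in one spliced image."""
    present = set(block_formats)
    if present == {first_format, second_format}:
        return FIRST_PRESET_VALUE
    if present == {first_format}:
        return SECOND_PRESET_VALUE
    if present == {second_format}:
        return THIRD_PRESET_VALUE
    raise ValueError(f"unsupported combination of block formats: {sorted(present)}")
```

The value is written into the spliced image information, so the decoder can tell the type of each spliced image without inspecting its blocks.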
  • determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image includes: if the first syntax element is a first preset value, determining that the spliced image is a heterogeneous hybrid spliced image including isomorphic blocks of a first expression format and isomorphic blocks of a second expression format, wherein the first expression format and the second expression format are different expression formats; if the first syntax element is a second preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the first expression format; if the first syntax element is a third preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the second expression format.
  • the first syntax element includes a first sub-syntax element and a second sub-syntax element, and it is determined according to the first sub-syntax element and the second sub-syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image.
  • determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image includes: if the first sub-syntax element is a fourth preset value, determining that the spliced image includes isomorphic blocks of the first expression format; and/or, if the second sub-syntax element is a fifth preset value, determining that the spliced image includes isomorphic blocks of the second expression format; wherein the first expression format and the second expression format are different expression formats.
  • determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image further includes: if the first sub-syntax element is a sixth preset value, determining that the spliced image does not include isomorphic blocks of the first expression format; if the second sub-syntax element is a seventh preset value, determining that the spliced image does not include isomorphic blocks of the second expression format.
  • the first syntax element is located in a parameter set sub-codestream of the codestream.
  • the mosaic graph sequence parameter set corresponding to the mosaic graph includes the first syntax element.
  • when it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image, the spliced image information also includes a second syntax element, and the expression format of the i-th block in the spliced image is determined according to the second syntax element.
  • determining the expression format of the i-th block in the spliced image according to the second syntax element includes: if the second syntax element is an eighth preset value, determining that the expression format of the i-th block is the first expression format; if the second syntax element is a ninth preset value, determining that the expression format of the i-th block is the second expression format.
  • the second syntax element is located in the mosaic block data unit header of the i-th block of the mosaic map.
  • the encoding unit 1103 is configured to: if the expression format of the i-th block is the first expression format, determine that the sub-tiles in the i-th block are encoded using the encoding method corresponding to the first expression format to obtain the code stream corresponding to the visual media content of the first expression format; if the expression format of the i-th block is the second expression format, determine that the sub-tiles in the i-th block are encoded using the encoding method corresponding to the second expression format to obtain the code stream corresponding to the visual media content of the second expression format.
  • the parameter set sub-code stream of the code stream includes a third syntax element, and it is determined according to the third syntax element that the code stream includes a code stream corresponding to visual media content in at least one expression format.
  • determining according to the third syntax element that the code stream includes a code stream corresponding to visual media content in at least one expression format includes: if the third syntax element is a first value, determining that the code stream includes both a code stream corresponding to the visual media content in the first expression format and a code stream corresponding to the visual media content in the second expression format; if the third syntax element is a second value, determining that the code stream includes the code stream corresponding to the visual media content in the first expression format; if the third syntax element is a third value, determining that the code stream includes the code stream corresponding to the visual media content in the second expression format.
  • the third syntax element is used to indicate that the code stream includes a code stream corresponding to visual media content in at least two expression formats.
  • the encoding unit 1103 is used to encode the at least one spliced image to obtain a video compression sub-stream; encode the spliced image information of the at least one spliced image to obtain a spliced image information sub-stream; and synthesize the video compression sub-stream and the spliced image information sub-stream into the code stream.
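How the two sub-streams are combined is not specified in the text. Purely as an illustration, the sketch below uses an assumed container in which each sub-stream is preceded by a 4-byte big-endian length; the layout and the names `mux` and `demux` are hypothetical and do not represent the actual code stream syntax.

```python
import struct
from typing import Tuple


def mux(video_sub_stream: bytes, info_sub_stream: bytes) -> bytes:
    """Concatenate both sub-streams, each prefixed with its 4-byte big-endian length."""
    return b"".join(
        struct.pack(">I", len(part)) + part
        for part in (video_sub_stream, info_sub_stream)
    )


def demux(code_stream: bytes) -> Tuple[bytes, bytes]:
    """Recover the video compression sub-stream and the spliced image information sub-stream."""
    parts = []
    offset = 0
    for _ in range(2):
        (length,) = struct.unpack_from(">I", code_stream, offset)
        offset += 4
        part = code_stream[offset:offset + length]
        if len(part) != length:
            raise ValueError("truncated code stream")
        parts.append(part)
        offset += length
    return parts[0], parts[1]
```

With this layout the decoding end can separate the sub-streams before decoding either of them, which matches the order of operations described for the decoding unit.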
  • the at least one expression format includes: at least one of multi-view video, point cloud, and mesh.
  • the heterogeneous hybrid spliced image includes at least one of the following: a single-attribute heterogeneous hybrid spliced image and a multi-attribute heterogeneous hybrid spliced image; the isomorphic spliced image includes at least one of the following: a single-attribute isomorphic spliced image and a multi-attribute isomorphic spliced image.
  • FIG. 12 is a schematic block diagram of a decoding device provided by an embodiment of the present application.
  • the decoding device 120 is applied to a decoder. As shown in Figure 12, the decoding device 120 includes:
  • the decoding unit 1201 is used to decode the code stream to obtain a spliced image and spliced image information, wherein the spliced image information includes a first syntax element, and it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image;
  • the splitting unit 1202 is configured to, when it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image, split the spliced image according to the spliced image information of the spliced image to obtain at least two types of isomorphic blocks, wherein the at least two types of isomorphic blocks correspond to different visual media content expression formats;
  • the splitting unit 1202 is further configured to, when it is determined according to the first syntax element that the spliced image is an isomorphic spliced image, split the spliced image according to the spliced image information of the spliced image to obtain one type of isomorphic block, wherein the one type of isomorphic block corresponds to the same visual media content expression format;
  • the processing unit 1203 is configured to decode and reconstruct the homogeneous blocks to obtain visual media content in at least one expression format.
  • determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image includes: if the first syntax element is a first preset value, determining that the spliced image is a heterogeneous hybrid spliced image including isomorphic blocks of a first expression format and isomorphic blocks of a second expression format, wherein the first expression format and the second expression format are different expression formats; if the first syntax element is a second preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the first expression format; if the first syntax element is a third preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the second expression format.
  • the first syntax element includes a first sub-syntax element and a second sub-syntax element, and it is determined according to the first sub-syntax element and the second sub-syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image.
  • determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image includes: if the first sub-syntax element is a fourth preset value, determining that the spliced image includes isomorphic blocks of the first expression format; and/or, if the second sub-syntax element is a fifth preset value, determining that the spliced image includes isomorphic blocks of the second expression format; wherein the first expression format and the second expression format are different expression formats.
  • determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image further includes: if the first sub-syntax element is a sixth preset value, determining that the spliced image does not include isomorphic blocks of the first expression format; if the second sub-syntax element is a seventh preset value, determining that the spliced image does not include isomorphic blocks of the second expression format.
  • the first syntax element is located in a parameter set sub-codestream of the codestream.
  • the mosaic graph sequence parameter set corresponding to the mosaic graph includes the first syntax element.
  • when it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image, the spliced image information also includes a second syntax element, and the expression format of the i-th block in the spliced image is determined according to the second syntax element.
  • determining the expression format of the i-th block in the spliced image according to the second syntax element includes: if the second syntax element is an eighth preset value, determining that the expression format of the i-th block is the first expression format; if the second syntax element is a ninth preset value, determining that the expression format of the i-th block is the second expression format.
  • the second syntax element is located in the mosaic block data unit header of the i-th block of the mosaic map.
  • the processing unit 1203 is configured to: if the expression format of the i-th block is the first expression format, determine that the sub-tiles in the i-th block are decoded and reconstructed using the decoding method corresponding to the first expression format to obtain the visual media content of the first expression format; if the expression format of the i-th block is the second expression format, determine that the sub-tiles in the i-th block are decoded and reconstructed using the decoding method corresponding to the second expression format to obtain the visual media content of the second expression format.
  • the parameter set sub-code stream of the code stream includes a third syntax element, and it is determined according to the third syntax element that the code stream includes a code stream corresponding to visual media content in at least one expression format.
  • determining according to the third syntax element that the code stream includes a code stream corresponding to visual media content in at least one expression format includes: if the third syntax element is a first value, determining that the code stream includes both a code stream corresponding to the visual media content in the first expression format and a code stream corresponding to the visual media content in the second expression format; if the third syntax element is a second value, determining that the code stream includes the code stream corresponding to the visual media content in the first expression format; if the third syntax element is a third value, determining that the code stream includes the code stream corresponding to the visual media content in the second expression format.
  • the decoding unit 1201 is configured to determine, according to the third syntax element, that the code stream includes code streams corresponding to visual media content in at least two expression formats, and decode the code stream to obtain a heterogeneous hybrid spliced image.
  • the code stream includes a video compression sub-stream and a spliced image information sub-stream, and the decoding unit 1201 is used to decode the video compression sub-stream to obtain the at least one spliced image, and to decode the spliced image information sub-stream to obtain the spliced image information of the at least one spliced image.
  • the at least one expression format includes: at least one of multi-view video, point cloud, and mesh.
  • the heterogeneous hybrid spliced image includes at least one of the following: a single-attribute heterogeneous hybrid spliced image and a multi-attribute heterogeneous hybrid spliced image; the isomorphic spliced image includes at least one of the following: a single-attribute isomorphic spliced image and a multi-attribute isomorphic spliced image.
  • the software unit may be located in a mature storage medium in this field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, register, etc.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps in the above method embodiment in combination with its hardware.
  • Figure 13 is a schematic block diagram of the encoder provided by an embodiment of the present application. As shown in Figure 13, the encoder 1310 includes:
  • Figure 14 is a schematic block diagram of a decoder provided by an embodiment of the present application. As shown in Figure 14, the decoder 1410 includes:
  • the processor may include, but is not limited to:
  • Digital Signal Processor (DSP)
  • Application Specific Integrated Circuit (ASIC)
  • Field Programmable Gate Array (FPGA)
  • the memory includes but is not limited to:
  • Non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or flash memory. Volatile memory may be random access memory (Random Access Memory, RAM), which is used as an external cache. RAM may take forms such as:
  • static random access memory (Static RAM, SRAM)
  • dynamic random access memory (Dynamic RAM, DRAM)
  • synchronous dynamic random access memory (Synchronous DRAM, SDRAM)
  • double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM)
  • enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM)
  • synchronous link dynamic random access memory (Synch Link DRAM, SLDRAM)
  • Direct Rambus RAM (DR RAM)
  • each functional module in this embodiment can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software function modules.
  • FIG. 15 shows a schematic structural diagram of a coding and decoding system provided by an embodiment of the present application.
  • the encoding and decoding system 150 may include an encoder 1501 and a decoder 1502.
  • the encoder 1501 may be a device integrated with the encoding device described in the previous embodiment;
  • the decoder 1502 may be a device integrated with the decoding device described in the previous embodiment.
  • both the encoder 1501 and the decoder 1502 can use the color component information of the adjacent reference pixels and of the pixels to be predicted to calculate the weighting coefficients corresponding to the pixels to be predicted; moreover, different reference pixels can have different weighting coefficients. Applying these weighting coefficients to the chroma prediction of the pixels to be predicted in the current block can not only improve the accuracy of chroma prediction and save code rate, but also improve the encoding and decoding performance.
  • An embodiment of the present application also provides a chip for implementing the above encoding and decoding method.
  • the chip includes: a processor, configured to call and run a computer program from a memory, so that the electronic device installed with the chip executes the above encoding and decoding method.
  • Embodiments of the present application also provide a computer storage medium in which a computer program is stored.
  • when the computer program is executed by the first processor, the encoding method of the encoder is implemented; or, when the computer program is executed by the second processor, the decoding method of the decoder is implemented.
  • In other words, embodiments of the present application also provide a computer program product containing instructions, which, when executed by a computer, cause the computer to perform the method of the above method embodiments.
  • This application also provides a code stream, which is generated according to the above encoding method.
  • the code stream includes the above first syntax element, or includes a second syntax element and a third syntax element.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or a wireless manner (such as infrared, wireless or microwave).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the available media may be magnetic media (such as floppy disks, hard disks, magnetic tapes), optical media (such as digital video discs (DVD)), or semiconductor media (such as solid state disks (SSD)), etc.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separate.
  • a component shown as a unit may or may not be a physical unit, that is, it may be located in one place, or it may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in various embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.

Abstract

The present application provides a coding method and apparatus, a decoding method and apparatus, and a coder, a decoder and a storage medium. For an application scenario that comprises visual media content in one or more expression formats, isomorphic blocks in different expression formats are stitched into a heterogeneous hybrid stitched image, isomorphic blocks in the same expression format are stitched into an isomorphic stitched image, and the obtained stitched images and stitched image information are written into a code stream. An isomorphic stitched image (e.g. at least one of a multi-view stitched image, a point cloud stitched image and a grid stitched image) and a heterogeneous hybrid stitched image are simultaneously present in a code stream, such that the coding method and the decoding method are applicable to the application scenarios of visual media content in a plurality of expression formats, thereby expanding the application scope. Moreover, the code stream includes a first syntactic element, such that the efficiency of a decoding end decoding a stitched image can be improved. Since isomorphic blocks in different expression formats are stitched in a heterogeneous hybrid stitched image for coding and decoding, the number of decoders invoked can be reduced, thereby reducing the implementation cost and improving the usability.

Description

Coding and decoding method and apparatus, encoder, decoder and storage medium
Technical field
The present application relates to the field of image processing technology, and in particular to a coding and decoding method and apparatus, an encoder, a decoder and a storage medium.
Background
In three-dimensional application scenarios such as virtual reality (VR), augmented reality (AR) and mixed reality (MR), visual media objects in different expression formats may appear in the same scene. For example, within one three-dimensional scene, the background and some of the characters and objects may be expressed as video, while other characters are expressed as three-dimensional point clouds or three-dimensional meshes.
Compressing each format with its own coding scheme (multi-view video coding, point cloud coding and mesh coding) preserves the effective information of the original expression formats better than projecting everything into multi-view video and coding that. It improves the quality of the viewport rendered at viewing time and the overall rate-quality efficiency.
However, current coding and decoding technology encodes and decodes multi-view video, point clouds and meshes separately. A large number of codecs must be invoked in the process, which makes coding and decoding costly.
Summary of the invention
Embodiments of the present application provide a coding and decoding method and apparatus, an encoder, a decoder and a storage medium.
In a first aspect, the present application provides a decoding method, applied to a decoder, including:
decoding a code stream to obtain a stitched image and stitched image information, where the stitched image information includes a first syntax element, and whether the stitched image is a heterogeneous hybrid stitched image or an isomorphic stitched image is determined according to the first syntax element;
when the stitched image is determined, according to the first syntax element, to be a heterogeneous hybrid stitched image, splitting the stitched image according to its stitched image information to obtain at least two types of isomorphic blocks, where the at least two types of isomorphic blocks correspond to different visual media content expression formats;
when the stitched image is determined, according to the first syntax element, to be an isomorphic stitched image, splitting the stitched image according to its stitched image information to obtain one type of isomorphic block, where the one type of isomorphic block corresponds to a single visual media content expression format;
decoding and reconstructing the isomorphic blocks to obtain visual media content in at least one expression format.
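The splitting step of the decoding method can be illustrated with a minimal sketch. It is not the application's implementation: the numeric values of the first syntax element, the tuple layout of a block and the function name `split_stitched_image` are all hypothetical, chosen only to show how one flag selects between the two splitting branches.

```python
from typing import Dict, List, Tuple

# Hypothetical values of the first syntax element (not taken from the application).
ISOMORPHIC = 0      # stitched image carries a single expression format
HETEROGENEOUS = 1   # stitched image mixes at least two expression formats

# Hypothetical block record: (expression format, x, y, width, height).
Block = Tuple[str, int, int, int, int]


def split_stitched_image(first_syntax_element: int,
                         blocks: List[Block]) -> Dict[str, List[Block]]:
    """Group the blocks of one stitched image by expression format."""
    groups: Dict[str, List[Block]] = {}
    for block in blocks:
        groups.setdefault(block[0], []).append(block)

    # The flag tells the decoder how many formats to expect before it inspects blocks.
    if first_syntax_element == HETEROGENEOUS and len(groups) < 2:
        raise ValueError("heterogeneous image needs at least two formats")
    if first_syntax_element == ISOMORPHIC and len(groups) != 1:
        raise ValueError("isomorphic image must hold exactly one format")
    return groups


mixed = [("multiview", 0, 0, 64, 64), ("point_cloud", 64, 0, 32, 32)]
groups = split_stitched_image(HETEROGENEOUS, mixed)
print(sorted(groups))  # each group would then go to its own reconstruction path
```

In this sketch every group returned by the function would be handed to the reconstruction routine of its own expression format, which corresponds to the final step of the method.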
In a second aspect, the present application provides an encoding method, applied to an encoder, including:
processing visual media content in at least one expression format to obtain at least one type of isomorphic block, where different types of isomorphic blocks correspond to different visual media content expression formats;
stitching the at least one type of isomorphic block to obtain at least one stitched image and stitched image information, where the stitched image information includes a first syntax element, whether a stitched image is a heterogeneous hybrid stitched image or an isomorphic stitched image is determined according to the first syntax element, a heterogeneous hybrid stitched image includes at least two types of isomorphic blocks, and an isomorphic stitched image includes one type of isomorphic block;
encoding the at least one stitched image and the stitched image information to obtain a code stream.
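On the encoder side, the stitching step can be sketched in the same spirit. The row-by-row placement below is a deliberately naive packing chosen for illustration, and the flag values and the function name `stitch` are hypothetical. The application does not prescribe a packing strategy in this passage.

```python
from typing import List, Tuple

ISOMORPHIC = 0      # hypothetical flag value: one expression format
HETEROGENEOUS = 1   # hypothetical flag value: two or more expression formats


def stitch(blocks: List[Tuple[str, int, int]],
           image_width: int) -> Tuple[int, List[Tuple[str, int, int, int, int]]]:
    """Place (format, width, height) blocks left to right, row by row.

    Returns the first syntax element and a list of (format, x, y, w, h).
    """
    x = y = row_height = 0
    placements = []
    for fmt, w, h in blocks:
        if w > image_width:
            raise ValueError("block wider than the stitched image")
        if x + w > image_width:          # no room left: start a new row
            x, y, row_height = 0, y + row_height, 0
        placements.append((fmt, x, y, w, h))
        x += w
        row_height = max(row_height, h)

    formats = {fmt for fmt, _, _ in blocks}
    flag = HETEROGENEOUS if len(formats) >= 2 else ISOMORPHIC
    return flag, placements


flag, placed = stitch(
    [("multiview", 64, 32), ("point_cloud", 64, 64), ("multiview", 32, 16)],
    image_width=128,
)
print(flag, placed)
```

The flag and the placement list together play the role of the stitched image information that is written into the code stream alongside the stitched image itself.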
In a third aspect, the present application provides a decoding apparatus, applied to a decoder, including:
a decoding unit, configured to decode a code stream to obtain a stitched image and stitched image information, where the stitched image information includes a first syntax element, and whether the stitched image is a heterogeneous hybrid stitched image or an isomorphic stitched image is determined according to the first syntax element;
a first splitting unit, configured to, when the stitched image is determined according to the first syntax element to be a heterogeneous hybrid stitched image, split the stitched image according to its stitched image information to obtain at least two types of isomorphic blocks, where the at least two types of isomorphic blocks correspond to different visual media content expression formats;
a second splitting unit, configured to, when the stitched image is determined according to the first syntax element to be an isomorphic stitched image, split the stitched image according to its stitched image information to obtain one type of isomorphic block, where the one type of isomorphic block corresponds to a single visual media content expression format;
a processing unit, configured to decode and reconstruct the isomorphic blocks to obtain visual media content in at least one expression format.
In a fourth aspect, the present application provides an encoding apparatus, applied to an encoder, including:
a processing unit, configured to process visual media content in at least one expression format to obtain at least one type of isomorphic block, where different types of isomorphic blocks correspond to different visual media content expression formats;
a stitching unit, configured to stitch the at least one type of isomorphic block to obtain at least one stitched image and stitched image information, where the stitched image information includes a first syntax element, whether a stitched image is a heterogeneous hybrid stitched image or an isomorphic stitched image is determined according to the first syntax element, a heterogeneous hybrid stitched image includes at least two types of isomorphic blocks, and an isomorphic stitched image includes one type of isomorphic block;
an encoding unit, configured to encode the at least one stitched image and the stitched image information to obtain a code stream.
In a fifth aspect, a decoder is provided, including a first memory and a first processor. The first memory stores a computer program that can run on the first processor to perform the method of the first aspect or any of its implementations.
In a sixth aspect, an encoder is provided, including a second memory and a second processor. The second memory stores a computer program that can run on the second processor to perform the method of the second aspect or any of its implementations.
In a seventh aspect, a coding and decoding system is provided, including an encoder and a decoder. The encoder is configured to perform the method of the second aspect or any of its implementations, and the decoder is configured to perform the method of the first aspect or any of its implementations.
In an eighth aspect, a chip is provided for implementing the method of either of the first and second aspects or any of their implementations. Specifically, the chip includes a processor configured to call and run a computer program from a memory, so that a device in which the chip is installed performs the method of either of the first and second aspects or any of their implementations.
In a ninth aspect, a computer-readable storage medium is provided for storing a computer program that causes a computer to perform the method of either of the first and second aspects or any of their implementations.
In a tenth aspect, a computer program product is provided, including computer program instructions that cause a computer to perform the method of either of the first and second aspects or any of their implementations.
In an eleventh aspect, a computer program is provided which, when run on a computer, causes the computer to perform the method of either of the first and second aspects or any of their implementations.
In a twelfth aspect, a code stream is provided, the code stream being generated by the encoding method of the second aspect.
Based on the above technical solutions, for application scenarios that include visual media content in one or more expression formats, isomorphic blocks in different expression formats are stitched into a heterogeneous hybrid stitched image, isomorphic blocks in the same expression format are stitched into an isomorphic stitched image, and the resulting stitched images and stitched image information are written into the code stream. Isomorphic stitched images (for example, at least one of a multi-view stitched image, a point cloud stitched image and a mesh stitched image) and heterogeneous hybrid stitched images can be present in the code stream at the same time, so the coding and decoding methods apply to scenarios with visual media content in multiple expression formats, which broadens their scope of application. In addition, the stitched image information includes the first syntax element, which indicates the type of the stitched image and improves the efficiency with which the decoding end decodes stitched images. Furthermore, because isomorphic blocks in different expression formats are stitched into one heterogeneous hybrid stitched image for coding and decoding, fewer two-dimensional video codecs such as HEVC, VVC, AVC and AVS need to be invoked, which reduces implementation cost and improves ease of use.
Description of the drawings
Figure 1 is a schematic block diagram of a video coding and decoding system according to an embodiment of the present application;
Figure 2A is a schematic block diagram of a video encoder according to an embodiment of the present application;
Figure 2B is a schematic block diagram of a video decoder according to an embodiment of the present application;
Figure 3A is a framework diagram of the organization and expression of multi-view video data;
Figure 3B is a schematic diagram of stitched image generation for multi-view video data;
Figure 3C is a framework diagram of the organization and expression of point cloud data;
Figures 3D to 3F are schematic diagrams of different types of point cloud data;
Figure 4 is a schematic diagram of multi-view video encoding;
Figure 5 is a schematic diagram of multi-view video decoding;
Figure 6 is a schematic flowchart of an encoding method provided by an embodiment of the present application;
Figure 7 is a schematic diagram of a heterogeneous hybrid stitched image provided by an embodiment of the present application;
Figure 8 is a schematic diagram of an isomorphic stitched image provided by an embodiment of the present application;
Figure 9 is a schematic diagram of a V3C bitstream structure provided by an embodiment of the present application;
Figure 10 is a schematic flowchart of a decoding method provided by an embodiment of the present application;
Figure 11 is a schematic block diagram of an encoding apparatus provided by an embodiment of the present application;
Figure 12 is a schematic block diagram of a decoding apparatus provided by an embodiment of the present application;
Figure 13 is a schematic block diagram of an encoder provided by an embodiment of the present application;
Figure 14 is a schematic block diagram of a decoder provided by an embodiment of the present application;
Figure 15 is a schematic structural diagram of a coding and decoding system provided by an embodiment of the present application.
Detailed description
The present application can be applied to image coding and decoding, video coding and decoding, hardware video coding and decoding, application-specific circuit video coding and decoding, real-time video coding and decoding, and similar fields. For example, the solutions of the present application may be combined with the Audio Video coding Standard (AVS), the H.264/Advanced Video Coding (AVC) standard, the H.265/High Efficiency Video Coding (HEVC) standard, or the H.266/Versatile Video Coding (VVC) standard. Alternatively, the solutions may operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. It should be understood that the techniques of the present application are not limited to any particular coding standard or technique.
A high-degree-of-freedom immersive coding system can be roughly divided, along its task pipeline, into the following stages: data acquisition, data organization and expression, data encoding and compression, data decoding and reconstruction, and data synthesis and rendering, after which the target data is presented to the user.
The coding involved in the embodiments of the present application is mainly video coding and decoding. For ease of understanding, the video coding and decoding system involved in the embodiments is first introduced with reference to Figure 1.
Figure 1 is a schematic block diagram of a video coding and decoding system according to an embodiment of the present application. It should be noted that Figure 1 is only an example, and the video coding and decoding system of the embodiments includes but is not limited to what is shown in Figure 1. As shown in Figure 1, the video coding and decoding system 100 includes an encoding device 110 and a decoding device 120. The encoding device encodes (that is, compresses) video data to generate a code stream and transmits the code stream to the decoding device. The decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
The encoding device 110 of the embodiments can be understood as a device with a video encoding function, and the decoding device 120 as a device with a video decoding function. That is, the encoding device 110 and the decoding device 120 cover a wide range of apparatuses, including, for example, smartphones, desktop computers, mobile computing devices, notebook (for example, laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles and in-vehicle computers.
In some embodiments, the encoding device 110 may transmit the encoded video data (for example, the code stream) to the decoding device 120 via a channel 130. The channel 130 may include one or more media and/or apparatuses capable of transmitting the encoded video data from the encoding device 110 to the decoding device 120.
In one example, the channel 130 includes one or more communication media that enable the encoding device 110 to transmit the encoded video data directly to the decoding device 120 in real time. In this example, the encoding device 110 may modulate the encoded video data according to a communication standard and transmit the modulated video data to the decoding device 120. The communication media include wireless communication media, such as the radio-frequency spectrum. Optionally, they may also include wired communication media, such as one or more physical transmission lines.
In another example, the channel 130 includes a storage medium that can store the video data encoded by the encoding device 110. Storage media include a variety of locally accessible data storage media, such as optical discs, DVDs and flash memory. In this example, the decoding device 120 may obtain the encoded video data from the storage medium.
In another example, the channel 130 may include a storage server that can store the video data encoded by the encoding device 110. In this example, the decoding device 120 may download the stored encoded video data from the storage server. Optionally, the storage server may store the encoded video data and transmit it to the decoding device 120. Examples include a web server (for example, for a website) and a File Transfer Protocol (FTP) server.
In some embodiments, the encoding device 110 includes a video encoder 112 and an output interface 113. The output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
In some embodiments, in addition to the video encoder 112 and the output interface 113, the encoding device 110 may also include a video source 111.
The video source 111 may include at least one of a video capture apparatus (for example, a video camera), a video archive, a video input interface and a computer graphics system, where the video input interface is used to receive video data from a video content provider and the computer graphics system is used to generate video data.
The video encoder 112 encodes the video data from the video source 111 to generate a code stream. The video data may include one or more pictures or a sequence of pictures. The code stream contains the coded information of the picture or picture sequence in the form of a bitstream. The coded information may include coded picture data and associated data. The associated data may include a sequence parameter set (SPS), a picture parameter set (PPS) and other syntax structures. An SPS may contain parameters that apply to one or more sequences. A PPS may contain parameters that apply to one or more pictures. A syntax structure is a set of zero or more syntax elements arranged in a specified order in the code stream.
The video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113. The encoded video data may also be stored on a storage medium or storage server for later retrieval by the decoding device 120.
In some embodiments, the decoding device 120 includes an input interface 121 and a video decoder 122. In some embodiments, in addition to the input interface 121 and the video decoder 122, the decoding device 120 may also include a display apparatus 123.
The input interface 121 includes a receiver and/or a modem. The input interface 121 may receive the encoded video data through the channel 130.
The video decoder 122 decodes the encoded video data to obtain decoded video data and transmits the decoded video data to the display apparatus 123.
The display apparatus 123 displays the decoded video data. The display apparatus 123 may be integrated with the decoding device 120 or external to it. The display apparatus 123 may be any of a variety of display apparatuses, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display or another type of display apparatus.
In addition, Figure 1 is only an example, and the technical solutions of the embodiments are not limited to Figure 1. For example, the techniques of the present application can also be applied to video encoding alone or video decoding alone.
The video coding framework involved in the embodiments of the present application is introduced below.
Figure 2A is a schematic block diagram of a video encoder according to an embodiment of the present application. It should be understood that the video encoder 200 can be used for lossy compression of pictures as well as lossless compression of pictures. The lossless compression may be visually lossless compression or mathematically lossless compression.
The video encoder 200 can be applied to picture data in luma-chroma (YCbCr, YUV) format. For example, the YUV sampling ratio may be 4:2:0, 4:2:2 or 4:4:4, where Y denotes luma, Cb (U) denotes blue chroma and Cr (V) denotes red chroma. U and V together are the chroma components, which describe hue and saturation. In terms of color format, 4:2:0 means that every 4 pixels have 4 luma components and 2 chroma components (YYYYCbCr), 4:2:2 means that every 4 pixels have 4 luma components and 4 chroma components (YYYYCbCrCbCr), and 4:4:4 means full-resolution chroma (YYYYCbCrCbCrCbCrCbCr).
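The sample counts implied by these ratios can be checked with a short sketch. The function name `plane_sizes` and the assumption that the picture dimensions are even are illustrative choices, not part of the application.

```python
def plane_sizes(width: int, height: int, chroma_format: str):
    """Return (Y, Cb, Cr) sample counts for one picture.

    Assumes even width and height so the subsampled planes divide exactly.
    """
    # (horizontal divisor, vertical divisor) applied to each chroma plane
    divisors = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}
    sub_x, sub_y = divisors[chroma_format]
    luma = width * height
    chroma = (width // sub_x) * (height // sub_y)
    return luma, chroma, chroma


for fmt in ("4:2:0", "4:2:2", "4:4:4"):
    y, cb, cr = plane_sizes(1920, 1080, fmt)
    # chroma samples per 4 luma samples: 2, 4 and 8 respectively
    print(fmt, y, cb, cr, 4 * (cb + cr) // y)
```

For a 1920×1080 picture, each chroma plane in 4:2:0 is a quarter of the luma plane, so the whole picture carries 1.5 times as many samples as the luma plane alone.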
For example, the video encoder 200 reads video data and, for each frame of the video data, divides the frame into several coding tree units (CTUs). In some examples, a CTU may be called a "tree block", a "largest coding unit" (LCU) or a "coding tree block" (CTB). Each CTU may be associated with a pixel block of equal size within the picture. Each pixel may correspond to one luma (luminance) sample and two chroma (chrominance) samples, so each CTU may be associated with one block of luma samples and two blocks of chroma samples. The size of a CTU is, for example, 128×128, 64×64 or 32×32. A CTU may be further divided into several coding units (CUs) for coding. A CU may be a rectangular block or a square block. A CU may be further divided into prediction units (PUs) and transform units (TUs), which decouples coding, prediction and transform and makes processing more flexible. In one example, a CTU is divided into CUs in a quadtree manner, and a CU is divided into TUs and PUs in a quadtree manner.
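A minimal sketch of this partitioning follows, assuming a picture whose edges need not be multiples of the CTU size and a caller-supplied split decision. The names `ctu_grid`, `quadtree_leaves` and `should_split` are hypothetical. A real encoder would base the split decision on rate-distortion cost, which is not modeled here.

```python
def ctu_grid(width: int, height: int, ctu_size: int):
    """Number of CTU columns and rows needed to cover the picture."""
    cols = -(-width // ctu_size)    # integer ceiling division
    rows = -(-height // ctu_size)
    return cols, rows


def quadtree_leaves(x, y, size, min_size, should_split):
    """Recursively split a square block into four quadrants.

    Returns the leaf blocks (the CUs) as (x, y, size) tuples.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_leaves(x + dx, y + dy, half,
                                          min_size, should_split)
        return leaves
    return [(x, y, size)]


print(ctu_grid(1920, 1080, 64))
# Split only the 64x64 CTU itself, leaving four 32x32 CUs.
print(quadtree_leaves(0, 0, 64, 8, lambda x, y, size: size == 64))
```

Because 1080 is not a multiple of 64, the last CTU row in this example extends past the bottom picture edge. How a codec handles such partial CTUs is outside the scope of this sketch.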
The video encoder and video decoder may support various PU sizes. Assuming that the size of a particular CU is 2N×2N, the video encoder and video decoder may support PU sizes of 2N×2N or N×N for intra prediction, and symmetric PUs of 2N×2N, 2N×N, N×2N, N×N or similar sizes for inter prediction. The video encoder and video decoder may also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N and nR×2N for inter prediction.
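The PU shapes named above can be enumerated for a concrete N. The sketch below assumes the HEVC convention for the asymmetric modes, in which the CU is split 1:3 along one axis (for example, 2N×nU yields a 2N×N/2 part above a 2N×3N/2 part). That convention is not spelled out in this passage, and the function name `pu_partitions` is hypothetical.

```python
def pu_partitions(n: int):
    """Return the (width, height) of each PU for every partition mode of a 2Nx2N CU."""
    s = 2 * n       # CU side length
    q = s // 4      # quarter of the CU side, i.e. N/2 (assumed 1:3 asymmetric split)
    return {
        "2Nx2N": [(s, s)],
        "2NxN":  [(s, n), (s, n)],
        "Nx2N":  [(n, s), (n, s)],
        "NxN":   [(n, n)] * 4,
        "2NxnU": [(s, q), (s, s - q)],   # small part on top
        "2NxnD": [(s, s - q), (s, q)],   # small part at the bottom
        "nLx2N": [(q, s), (s - q, s)],   # small part on the left
        "nRx2N": [(s - q, s), (q, s)],   # small part on the right
    }


for mode, parts in pu_partitions(16).items():
    print(mode, parts)
```

Whatever the mode, the PUs tile the CU exactly, so their areas always sum to (2N)².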
In some embodiments, as shown in Figure 2A, the video encoder 200 may include a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filtering unit 260, a decoded picture buffer 270 and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer or different functional components.
Optionally, in the present application, the current block may be called the current coding unit (CU), the current prediction unit (PU), and so on. A prediction block may also be called a predicted picture block or a picture prediction block, and a reconstructed picture block may also be called a reconstruction block or a picture reconstruction block.
In some embodiments, the prediction unit 210 includes an inter prediction unit 211 and an intra estimation unit 212. Because adjacent pixels within a video frame are strongly correlated, video coding and decoding technology uses intra prediction to eliminate the spatial redundancy between adjacent pixels. Because adjacent frames of a video are strongly similar, video coding and decoding technology uses inter prediction to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
The inter prediction unit 211 may be used for inter prediction, which may include motion estimation and motion compensation and may refer to picture information of different frames. Inter prediction uses motion information to find a reference block in a reference frame and generates a prediction block from the reference block, thereby eliminating temporal redundancy. The frames used for inter prediction may be P frames and/or B frames, where a P frame is a forward-predicted frame and a B frame is a bidirectionally predicted frame. The motion information includes the reference frame list in which the reference frame is located, the reference frame index and the motion vector. A motion vector may have integer-pixel or fractional-pixel precision. If it has fractional-pixel precision, interpolation filtering must be applied in the reference frame to produce the required fractional-pixel block. The integer-pixel or fractional-pixel block found in the reference frame according to the motion vector is called the reference block. Some techniques use the reference block directly as the prediction block, while others process the reference block further to generate the prediction block. The latter can also be understood as taking the reference block as a prediction block and then processing it to generate a new prediction block.
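The role of interpolation for fractional motion vectors can be shown with a small sketch. It assumes motion vectors in quarter-pixel units and uses bilinear interpolation with edge clamping purely for brevity. Standard codecs use longer separable interpolation filters, so the function `reference_block` below illustrates the idea rather than any codec's actual filter.

```python
def sample(ref, px, py):
    """Fetch a sample, clamping coordinates to the frame (simple edge padding)."""
    h, w = len(ref), len(ref[0])
    return ref[min(max(py, 0), h - 1)][min(max(px, 0), w - 1)]


def reference_block(ref, x, y, w, h, mv_x4, mv_y4):
    """Fetch the w x h reference block for the block at (x, y).

    mv_x4 and mv_y4 are motion vector components in quarter-pixel units (assumed).
    """
    ix, fx = divmod(mv_x4, 4)   # integer part and fraction (0..3); floors for negatives
    iy, fy = divmod(mv_y4, 4)
    block = []
    for r in range(h):
        row = []
        for c in range(w):
            px, py = x + c + ix, y + r + iy
            top = sample(ref, px, py) * (4 - fx) + sample(ref, px + 1, py) * fx
            bottom = sample(ref, px, py + 1) * (4 - fx) + sample(ref, px + 1, py + 1) * fx
            # Weights sum to 16; add 8 to round to nearest.
            row.append((top * (4 - fy) + bottom * fy + 8) // 16)
        block.append(row)
    return block


ref = [[16 * yy + 4 * xx for xx in range(4)] for yy in range(4)]
print(reference_block(ref, 0, 0, 2, 2, 4, 0))  # integer MV: one pixel to the right
print(reference_block(ref, 0, 0, 2, 2, 2, 0))  # fractional MV: half a pixel to the right
```

With an integer motion vector the fractional weights vanish and the function simply copies samples, which is why interpolation is only needed in the fractional case.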
The intra estimation unit 212 refers only to information within the same frame and predicts the pixel information in the current coded image block, thereby removing spatial redundancy. The frames used for intra prediction may be I frames.
Intra prediction has multiple prediction modes. Taking the international H-series digital video coding standards as an example, the H.264/AVC standard has 8 angular prediction modes and 1 non-angular prediction mode, and H.265/HEVC extends this to 33 angular prediction modes and 2 non-angular prediction modes. The intra prediction modes used by HEVC are the planar mode (Planar), DC, and 33 angular modes, for a total of 35 prediction modes. The intra modes used by VVC are Planar, DC, and 65 angular modes, for a total of 67 prediction modes.
It should be noted that as the number of angular modes increases, intra prediction becomes more accurate and better meets the needs of high-definition and ultra-high-definition digital video.
The residual unit 220 may generate a residual block of a CU based on the pixel block of the CU and the prediction blocks of the PUs of the CU. For example, the residual unit 220 may generate the residual block of the CU such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in a prediction block of a PU of the CU.
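The sample-wise difference described above can be sketched as follows. This is a minimal illustration on plain 2-D lists; the function name is hypothetical.

```python
def residual_block(original, prediction):
    """Residual = original sample minus the co-located predicted sample."""
    return [[o - p for o, p in zip(orig_row, pred_row)]
            for orig_row, pred_row in zip(original, prediction)]
```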
The transform/quantization unit 230 may quantize transform coefficients. The transform/quantization unit 230 may quantize the transform coefficients associated with a TU of a CU based on a quantization parameter (QP) value associated with the CU. The video encoder 200 may adjust the degree of quantization applied to the transform coefficients associated with the CU by adjusting the QP value associated with the CU.
The inverse transform/quantization unit 240 may apply inverse quantization and an inverse transform, respectively, to the quantized transform coefficients to reconstruct a residual block from the quantized transform coefficients.
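How the QP controls the degree of quantization can be sketched as below. The step-size rule (the step doubles every 6 QP values) is an assumption borrowed from H.264/HEVC-style designs rather than something this application specifies, and the function names are hypothetical.

```python
def qstep(qp):
    # Assumption: quantization step doubles every 6 QP (H.264/HEVC style).
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Map transform coefficients to integer levels."""
    step = qstep(qp)
    return [[int(round(c / step)) for c in row] for row in coeffs]


def dequantize(levels, qp):
    """Scale integer levels back to approximate coefficients."""
    step = qstep(qp)
    return [[level * step for level in row] for row in levels]
```

A larger QP gives a larger step, so more coefficient detail is discarded and the dequantized values differ more from the originals.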
The reconstruction unit 250 may add the samples of the reconstructed residual block to the corresponding samples of one or more prediction blocks generated by the prediction unit 210 to produce a reconstructed image block associated with a TU. By reconstructing the sample block of each TU of a CU in this way, the video encoder 200 can reconstruct the pixel block of the CU.
The loop filtering unit 260 processes the pixels obtained after inverse transform and inverse quantization to compensate for distortion and to provide a better reference for subsequently coded pixels. For example, it may perform a deblocking filtering operation to reduce blocking artifacts in the pixel blocks associated with the CU.
In some embodiments, the loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive offset/adaptive loop filtering (SAO/ALF) unit, where the deblocking filtering unit removes blocking artifacts and the SAO/ALF unit removes ringing artifacts.
The decoded picture buffer 270 may store the reconstructed pixel blocks. The inter prediction unit 211 may use a reference image containing the reconstructed pixel blocks to perform inter prediction on PUs of other images. In addition, the intra estimation unit 212 may use the reconstructed pixel blocks in the decoded picture buffer 270 to perform intra prediction on other PUs in the same image as the CU.
The entropy encoding unit 280 may receive the quantized transform coefficients from the transform/quantization unit 230. The entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to produce entropy-encoded data.
FIG. 2B is a schematic block diagram of a video decoder according to an embodiment of the present application.
As shown in FIG. 2B, the video decoder 300 includes an entropy decoding unit 310, a prediction unit 320, an inverse quantization/transform unit 330, a reconstruction unit 340, a loop filtering unit 350, and a decoded picture buffer 360. It should be noted that the video decoder 300 may include more, fewer, or different functional components.
The video decoder 300 may receive a bitstream. The entropy decoding unit 310 may parse the bitstream to extract syntax elements from it. As part of parsing the bitstream, the entropy decoding unit 310 may parse the entropy-encoded syntax elements in the bitstream. The prediction unit 320, the inverse quantization/transform unit 330, the reconstruction unit 340, and the loop filtering unit 350 may decode the video data according to the syntax elements extracted from the bitstream, that is, produce decoded video data.
In some embodiments, the prediction unit 320 includes an inter prediction unit 321 and an intra estimation unit 322.
The intra estimation unit 322 may perform intra prediction to produce a prediction block of a PU. The intra estimation unit 322 may use an intra prediction mode to produce the prediction block of the PU based on the pixel blocks of spatially neighboring PUs. The intra estimation unit 322 may also determine the intra prediction mode of the PU according to one or more syntax elements parsed from the bitstream.
The inter prediction unit 321 may construct a first reference image list (list 0) and a second reference image list (list 1) according to syntax elements parsed from the bitstream. In addition, if a PU is coded using inter prediction, the entropy decoding unit 310 may parse the motion information of the PU. The inter prediction unit 321 may determine one or more reference blocks of the PU according to the motion information of the PU, and may produce the prediction block of the PU according to the one or more reference blocks of the PU.
The inverse quantization/transform unit 330 may inverse quantize (that is, dequantize) the transform coefficients associated with a TU. The inverse quantization/transform unit 330 may use the QP value associated with the CU of the TU to determine the degree of quantization.
After inverse quantizing the transform coefficients, the inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse-quantized transform coefficients to produce the residual block associated with the TU.
The reconstruction unit 340 uses the residual blocks associated with the TUs of a CU and the prediction blocks of the PUs of the CU to reconstruct the pixel block of the CU. For example, the reconstruction unit 340 may add the samples of a residual block to the corresponding samples of a prediction block to reconstruct the pixel block of the CU, obtaining a reconstructed image block.
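The sample-wise addition performed by the reconstruction unit can be sketched as follows. Clipping the result to the valid sample range is an assumption added here for illustration (the text only describes the addition), and the function name and `bit_depth` parameter are hypothetical.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add residual samples to predicted samples.

    Clipping to [0, 2**bit_depth - 1] is an illustrative assumption.
    """
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]
```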
The loop filtering unit 350 may perform a deblocking filtering operation to reduce blocking artifacts in the pixel blocks associated with the CU.
The video decoder 300 may store the reconstructed image of the CU in the decoded picture buffer 360. The video decoder 300 may use the reconstructed image in the decoded picture buffer 360 as a reference image for subsequent prediction, or transmit the reconstructed image to a display device for presentation.
The basic flow of video encoding and decoding is as follows. At the encoding end, a frame is divided into blocks. For the current block, the prediction unit 210 uses intra prediction or inter prediction to produce a prediction block of the current block. The residual unit 220 may calculate a residual block from the prediction block and the original block of the current block, that is, the difference between the prediction block and the original block of the current block. The residual block may also be called residual information. The residual block is transformed and quantized by the transform/quantization unit 230, which removes information to which the human eye is insensitive and thereby eliminates visual redundancy. Optionally, the residual block before transformation and quantization by the transform/quantization unit 230 may be called a time-domain residual block, and the residual block after transformation and quantization by the transform/quantization unit 230 may be called a frequency residual block or frequency-domain residual block. The entropy encoding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, and may entropy encode the quantized transform coefficients to output a bitstream. For example, the entropy encoding unit 280 may eliminate character redundancy according to a target context model and the probability information of the binary bitstream.
At the decoding end, the entropy decoding unit 310 may parse the bitstream to obtain the prediction information, the quantized coefficient matrix, and so on, of the current block. Based on the prediction information, the prediction unit 320 applies intra prediction or inter prediction to the current block to produce a prediction block of the current block. The inverse quantization/transform unit 330 inverse quantizes and inverse transforms the quantized coefficient matrix obtained from the bitstream to obtain a residual block. The reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstructed block. The reconstructed blocks form a reconstructed image, and the loop filtering unit 350 applies loop filtering to the reconstructed image, on an image basis or a block basis, to obtain a decoded image. The encoding end also needs to perform operations similar to those of the decoding end to obtain the decoded image. The decoded image may also be called a reconstructed image, and the reconstructed image may serve as a reference frame for inter prediction of subsequent frames.
It should be noted that the block partitioning information determined by the encoding end, as well as the mode information or parameter information for prediction, transform, quantization, entropy coding, loop filtering, and so on, is carried in the bitstream when necessary. By parsing the bitstream and analyzing the information already available, the decoding end determines the same block partitioning information and the same mode information or parameter information for prediction, transform, quantization, entropy coding, loop filtering, and so on, as the encoding end, thereby ensuring that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.
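The requirement that the encoding end and the decoding end obtain the same decoded image can be illustrated with a toy round trip. This sketch deliberately omits the transform, entropy coding, and loop filtering stages and uses a fixed integer quantization step; all names are hypothetical.

```python
def decode_block(levels, prediction, step):
    """Decoder side: dequantize the levels and add them to the prediction."""
    return [[p + level * step for p, level in zip(pred_row, lvl_row)]
            for pred_row, lvl_row in zip(prediction, levels)]


def encode_block(original, prediction, step):
    """Encoder side: quantize the residual, then run the decoder's own
    reconstruction so both ends hold the same reference samples."""
    levels = [[round((o - p) / step) for o, p in zip(orig_row, pred_row)]
              for orig_row, pred_row in zip(original, prediction)]
    return levels, decode_block(levels, prediction, step)
```

Because the encoder reconstructs from the quantized levels rather than from the original samples, its reference matches what the decoder will produce, even though quantization has made the result lossy.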
The above is the basic flow of a video codec under the block-based hybrid coding framework. As technology develops, some modules or steps of this framework or flow may be optimized. The present application is applicable to the basic flow of a video codec under the block-based hybrid coding framework, but is not limited to this framework and flow.
In some application scenarios, multiple kinds of heterogeneous content appear simultaneously in the same three-dimensional scene, for example multi-view video and point clouds. For this case, current encoding and decoding approaches include at least the following two:
Approach 1: multi-view video is encoded and decoded using MPEG (Moving Picture Experts Group) Immersive Video (MIV) technology, and point clouds are encoded and decoded using Video-based Point Cloud Compression (VPCC) technology.
MIV technology and VPCC technology are introduced below.
MIV technology: To reduce the transmitted pixel rate while retaining as much scene information as possible, so that enough information is available to render the target view, MPEG-I adopts the scheme shown in FIG. 3A. A limited number of viewpoints are selected as base viewpoints that cover the visible range of the scene as fully as possible, and the base viewpoints are transmitted as complete images. Redundant pixels between the remaining non-base viewpoints and the base viewpoints are removed, so that only non-redundant valid information is retained. The valid information is then extracted as sub-block images and reorganized together with the base viewpoint images into a larger rectangular image, called a mosaic. FIG. 3A and FIG. 3B illustrate the generation of the mosaic. The mosaic is fed to the codec for compression and reconstruction, and the auxiliary data describing how the sub-block images are spliced is also fed to the encoder to form the bitstream.
The VPCC encoding method projects a point cloud into two-dimensional images or video, converting three-dimensional information into two-dimensional information for encoding. FIG. 3C is a block diagram of VPCC encoding. The bitstream is roughly divided into four parts. The geometry bitstream is produced by encoding the geometry depth map and represents the geometry information of the point cloud. The attribute bitstream is produced by encoding the texture map and represents the attribute information of the point cloud. The occupancy bitstream is produced by encoding the occupancy map and indicates the valid regions in the depth map and the texture map. These three types of video are all encoded and decoded with a video codec, as shown in FIG. 3D to FIG. 3F. The auxiliary information bitstream is produced by encoding the auxiliary information of the sub-block images, that is, the part related to the patch data unit in the V3C standard, and indicates information such as the position and size of each sub-block image.
Approach 2: both multi-view video and point clouds are encoded and decoded using the frame packing technology in Visual Volumetric Video-based Coding (V3C).
The frame packing technology is introduced below.
Taking multi-view video as an example, and as illustrated in FIG. 4, the encoding end performs the following steps:
Step 1: When the acquired multi-view video is encoded, some preprocessing is performed to generate multi-view video patches, and the multi-view video patches are then organized to generate a multi-view video mosaic.
For example, as shown in FIG. 4, the multi-view video is input into TMIV for packing, and a multi-view video mosaic is output. TMIV is a reference software for MIV. Packing in the embodiments of the present application can be understood as splicing.
The multi-view video mosaic includes a multi-view video texture mosaic and a multi-view video geometry mosaic; that is, it contains only multi-view video patches.
Step 2: The multi-view video mosaic is input into a frame packer, and a multi-view video hybrid mosaic is output.
The multi-view video hybrid mosaic includes a multi-view video texture hybrid mosaic, a multi-view video geometry hybrid mosaic, and a multi-view video texture-and-geometry hybrid mosaic.
Specifically, as shown in FIG. 4, frame packing is applied to the multi-view video mosaics to generate a multi-view video hybrid mosaic, in which each multi-view video mosaic occupies one region of the hybrid mosaic. Accordingly, a flag pin_region_type_id_minus2 is transmitted in the bitstream for each region. This flag records whether the current region belongs to a multi-view video texture mosaic or a multi-view video geometry mosaic, and the decoding end needs this information.
Step 3: A video encoder encodes the multi-view video hybrid mosaic to obtain a bitstream.
As illustrated in FIG. 5, the decoding end performs the following steps:
Step 1: When multi-view video is decoded, the acquired bitstream is input into a video decoder for decoding to obtain a reconstructed multi-view video hybrid mosaic.
Step 2: The reconstructed multi-view video hybrid mosaic is input into a frame unpacker, and a reconstructed multi-view video mosaic is output.
Specifically, the flag pin_region_type_id_minus2 is first obtained from the bitstream. If pin_region_type_id_minus2 is determined to be V3C_AVD, the current region is a multi-view video texture mosaic, and the current region is split off and output as a reconstructed multi-view video texture mosaic.
If pin_region_type_id_minus2 is determined to be V3C_GVD, the current region is a multi-view video geometry mosaic, and the current region is split off and output as a reconstructed multi-view video geometry mosaic.
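The region dispatch performed by the frame unpacker can be sketched as below. Two things are assumptions rather than statements from this application: the numeric unit-type values (V3C_GVD = 3, V3C_AVD = 4) and the reading that the region type equals pin_region_type_id_minus2 plus 2. The region position fields (`x`, `y`, `width`, `height`) are hypothetical names chosen for the illustration.

```python
# Assumed V3C unit-type codes; check them against the V3C specification.
V3C_GVD = 3  # geometry video data
V3C_AVD = 4  # attribute (texture) video data


def unpack_regions(packed_frame, regions):
    """Split a packed frame into texture and geometry mosaics.

    packed_frame is a 2-D list of samples. Each region is a dict with the
    signalled pin_region_type_id_minus2 plus hypothetical position fields.
    """
    outputs = {"texture": [], "geometry": []}
    for region in regions:
        unit_type = region["pin_region_type_id_minus2"] + 2
        rows = packed_frame[region["y"]:region["y"] + region["height"]]
        crop = [row[region["x"]:region["x"] + region["width"]] for row in rows]
        if unit_type == V3C_AVD:
            outputs["texture"].append(crop)
        elif unit_type == V3C_GVD:
            outputs["geometry"].append(crop)
        else:
            raise ValueError("unsupported region type: %d" % unit_type)
    return outputs
```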
Step 3: The reconstructed multi-view video mosaic is decoded to obtain the reconstructed multi-view video.
Specifically, the multi-view video texture mosaic and the multi-view video geometry mosaic are decoded to obtain the reconstructed multi-view video.
The frame packing technology has been described above using multi-view video as an example. Frame packing encoding and decoding for point clouds is essentially the same as for multi-view video and can be understood by reference to the above. For example, TMC (a reference software for VPCC) is used to pack the point cloud to obtain a point cloud mosaic, the point cloud mosaic is input into the frame packer for frame packing to obtain a point cloud hybrid mosaic, and the point cloud hybrid mosaic is encoded to obtain a point cloud bitstream. Details are not repeated here.
The syntax related to frame packing in the standard is introduced below.
The V3C unit header syntax is shown in Table 1:
Table 1
Figure PCTCN2022105006-appb-000001
The V3C unit header semantics are shown in Table 2:
Table 2: V3C unit types
Figure PCTCN2022105006-appb-000002
Figure PCTCN2022105006-appb-000003
At present, if visual media content in multiple different expression formats appears simultaneously in the same three-dimensional scene, the content in each expression format is encoded and decoded separately. For example, when a point cloud and multi-view video appear in the same three-dimensional scene, the current packing technology compresses the point cloud to form a point cloud compressed bitstream (one kind of V3C bitstream) and compresses the multi-view video information to obtain a multi-view video compressed bitstream (another kind of V3C bitstream). The system layer then multiplexes the compressed bitstreams to obtain a fused, multiplexed bitstream of the three-dimensional scene. During decoding, the point cloud compressed bitstream and the multi-view video compressed bitstream are decoded separately. It follows that, when encoding and decoding visual media content in multiple different expression formats, the existing technology uses many codecs and incurs a high encoding and decoding cost.
To solve the above technical problem, the embodiments of the present application splice homogeneous blocks of different expression formats into one heterogeneous hybrid mosaic, splice homogeneous blocks of the same expression format into one homogeneous mosaic, and encode the resulting heterogeneous hybrid mosaic and/or homogeneous mosaic into the bitstream. Homogeneous mosaics (for example, at least one of a multi-view mosaic, a point cloud mosaic, and a mesh mosaic) and heterogeneous hybrid mosaics can coexist in the bitstream, which broadens the application scenarios of the encoding and decoding method. Moreover, the mosaic information includes a first syntax element that indicates the mosaic type, which can improve the efficiency with which the decoding end decodes the mosaic.
The video encoding method provided by the embodiments of the present application is described below with reference to FIG. 6, taking the encoding end as an example.
FIG. 6 is a schematic flowchart of the encoding method provided by an embodiment of the present application. As shown in FIG. 6, the encoding method includes:
Step 601: Process visual media content in at least one expression format to obtain at least one kind of homogeneous block, where different kinds of homogeneous blocks correspond to different expression formats of visual media content.
In three-dimensional application scenarios, such as virtual reality (VR), augmented reality (AR), and mixed reality (MR), visual media objects in different expression formats may appear in the same scene. For example, in the same three-dimensional scene, the scene background and some of the people and objects may be expressed as video, while other people are expressed as three-dimensional point clouds or three-dimensional meshes.
In some embodiments, the visual media content includes visual media content in at least one expression format such as multi-view video, point cloud, and mesh. Single-view video is a special case of multi-view video; that is, the multi-view video may include video from multiple viewpoints and/or video from a single viewpoint.
One kind of homogeneous block corresponds to one expression format. For example, the expression format corresponding to the at least one kind of homogeneous block includes at least one of the following: multi-view video, point cloud, and mesh. At least two kinds of homogeneous blocks correspond to at least two different expression formats. For example, the at least two kinds of homogeneous blocks in the embodiments of the present application include homogeneous blocks in at least two different expression formats among multi-view video, point cloud, mesh, and so on.
It should be noted that, in the embodiments of the present application, each kind of homogeneous block may include at least one homogeneous block having the same expression format. For example, the homogeneous blocks in point cloud format include one or more point cloud blocks, the homogeneous blocks in multi-view video format include one or more multi-view video blocks, and the homogeneous blocks in mesh format include one or more mesh blocks.
In some embodiments, step 601 may be: processing visual media content in one expression format to obtain one kind of homogeneous block. In some embodiments, step 601 may be: processing visual media content in at least two expression formats to obtain at least two kinds of homogeneous blocks, where different visual media content corresponds to different expression formats. Specifically, visual media content in a first expression format is processed to obtain homogeneous blocks in the first expression format, and visual media content in a second expression format is processed to obtain homogeneous blocks in the second expression format. The first expression format is one of multi-view video, point cloud, and mesh, the second expression format is one of multi-view video, point cloud, and mesh, and the first and second expression formats are different.
In other words, the visual media content includes visual media content in at least one expression format such as multi-view video, point cloud, and mesh. When it contains visual media content in one expression format, processing the visual media content yields homogeneous blocks in one expression format. When it contains visual media content in multiple expression formats, processing the visual media content yields homogeneous blocks in multiple expression formats.
It should be noted that a block may also be called a tile; that is, a point cloud block may also be called a point cloud tile, a multi-view video block may also be called a multi-view video tile, and a mesh block may also be called a mesh tile. A block may be a mosaic with a specific shape, for example a mosaic of a rectangular region with a specific length and/or height. For example, at least one patch may be spliced in order, such as in descending order of patch area, or in descending order of patch length and/or height, to obtain the block corresponding to the visual media content. Optionally, one block may map exactly to one atlas tile.
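The ordered splicing described above (patches placed in descending order of area) can be sketched with a simple row-by-row ("shelf") layout. The shelf layout, the function name, and the tuple layout are assumptions made for this illustration; the application does not prescribe a particular placement strategy.

```python
def pack_patches(patches, tile_width):
    """Place patches into a tile, largest area first.

    patches is a list of (patch_id, width, height) tuples. Returns a dict
    mapping patch_id to its top-left (x, y) and the total tile height used.
    """
    ordered = sorted(patches, key=lambda p: p[1] * p[2], reverse=True)
    placements = {}
    x = y = shelf_height = 0
    for patch_id, width, height in ordered:
        if width > tile_width:
            raise ValueError("patch %r is wider than the tile" % (patch_id,))
        if x + width > tile_width:
            # Current row is full: start a new row below it.
            y += shelf_height
            x = shelf_height = 0
        placements[patch_id] = (x, y)
        x += width
        shelf_height = max(shelf_height, height)
    return placements, y + shelf_height
```

Sorting by length or height instead only requires changing the `key` function.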
在一些实施例中,区块中的各子图块可以具有块标识(patchID),以对同一区块中的不同子图块进行区别。例如,同一区块中可以包括子图块1(patch1)、子图块2(patch2)和子图块3(patch3)。In some embodiments, each sub-tile in a block may have a patch ID (patchID) to distinguish different sub-tiles in the same block. For example, the same block may include sub-patch 1 (patch1), sub-patch 2 (patch2), and sub-patch 3 (patch3).
进一步的,同构区块中每个子图块对应的表达格式均相同,例如,同构区块中的各子图块均为多视点视频子图块,或者均为点云子图块等同一表达格式的子图块。同构区块中每个子图块对应的表达格式即该同构区块对应的表达格式。Furthermore, the expression format corresponding to each sub-block in the isomorphic block is the same. For example, each sub-block in the isomorphic block is a multi-view video sub-block, or is a point cloud sub-block, etc. A subtile for the expression format. The expression format corresponding to each sub-block in the isomorphic block is the expression format corresponding to the isomorphic block.
In some embodiments, an isomorphic block may have a block identifier (tileID) to distinguish different blocks of the same expression format. For example, the point cloud blocks may include point cloud block 1 or point cloud block 2. For example, the multiple visual media contents include a point cloud and a multi-view video. The point cloud is processed to obtain point cloud blocks, where point cloud block 1 includes point cloud patches 1 to 3; the multi-view video is processed to obtain a multi-view video block, which includes multi-view video patches 1 to 4.
When visual media content of one expression format needs to be processed, isomorphic blocks of one expression format are obtained. When at least two visual media contents need to be processed, isomorphic blocks of at least two expression formats are obtained. To improve compression efficiency, the embodiments of the present application process the at least two visual media contents, for example by packing (also called splicing), to obtain the block corresponding to each of the at least two visual media contents. For example, the patches corresponding to the at least two visual media contents may be spliced to obtain blocks. It should be noted that the embodiments of the present application do not limit how the at least two visual media contents are separately processed to obtain blocks.
In a possible implementation, the visual media content includes visual media content in two expression formats, multi-view video and point cloud. Processing the visual media content of at least one expression format to obtain at least one kind of isomorphic block includes: after projecting the acquired multi-view video and removing redundancy, connecting the non-duplicated pixels into video patches and splicing the video patches into multi-view video blocks; and performing parallel projection on the acquired point cloud, grouping the connected points in the projection plane into point cloud patches, and splicing the point cloud patches into point cloud blocks.
Specifically, for multi-view video, taking MPEG-I as an example, a limited number of viewpoints are selected as basic viewpoints so as to cover the visible range of the scene as far as possible. The basic viewpoints are transmitted as complete images, and the pixels of the remaining non-basic viewpoints that are redundant with the basic viewpoints are removed, so that only non-duplicated effective information is retained. The effective information is then extracted as sub-block images and reorganized together with the basic viewpoint images into a larger strip-shaped image, which is called a multi-view video block.
In some embodiments, the visual media contents are media contents presented simultaneously in the same three-dimensional space. In some embodiments, the visual media contents are media contents presented at different times in the same three-dimensional space. In some embodiments, the visual media contents may also be media contents of different three-dimensional spaces. That is, the embodiments of the present application place no specific limitation on the at least two visual media contents.
Step 602: splice the at least one kind of isomorphic block to obtain at least one spliced image and spliced image information, where the spliced image information includes a first syntax element, whether the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image is determined according to the first syntax element, a heterogeneous hybrid spliced image includes at least two kinds of isomorphic blocks, and an isomorphic spliced image includes one kind of isomorphic block.
In some embodiments, splicing the at least one kind of isomorphic block to obtain at least one spliced image and spliced image information includes: heterogeneously splicing isomorphic blocks of at least two expression formats to generate a heterogeneous hybrid spliced image and spliced image information; and isomorphically splicing isomorphic blocks of the same expression format to generate an isomorphic spliced image and spliced image information.
Exemplarily, the at least one kind of isomorphic block includes isomorphic blocks of a first expression format and isomorphic blocks of a second expression format. The method specifically includes: isomorphically splicing the isomorphic blocks of the first expression format to obtain a first isomorphic spliced image and spliced image information, and isomorphically splicing the isomorphic blocks of the second expression format to obtain a second isomorphic spliced image and spliced image information; or heterogeneously splicing the isomorphic blocks of the first expression format and the isomorphic blocks of the second expression format to obtain a heterogeneous hybrid spliced image and spliced image information; or isomorphically splicing the isomorphic blocks of the first expression format to obtain a first isomorphic spliced image and spliced image information, and heterogeneously splicing the isomorphic blocks of the first expression format and the isomorphic blocks of the second expression format to obtain a heterogeneous hybrid spliced image and spliced image information; or isomorphically splicing the isomorphic blocks of the second expression format to obtain a second isomorphic spliced image and spliced image information, and heterogeneously splicing the isomorphic blocks of the first expression format and the isomorphic blocks of the second expression format to obtain a heterogeneous hybrid spliced image and spliced image information.
That is, an isomorphic spliced image may include one isomorphic block or multiple isomorphic blocks of the same expression format, and a heterogeneous hybrid spliced image includes at least two isomorphic blocks of at least two expression formats. In the embodiments of the present application, the first expression format is one of multi-view video, point cloud, and mesh, the second expression format is one of multi-view video, point cloud, and mesh, and the first expression format and the second expression format are different expression formats. As shown in Figure 7, multi-view video block 1, multi-view video block 2, and point cloud block 1 are spliced to obtain a heterogeneous hybrid spliced image.
Exemplarily, the first expression format is multi-view video and the second expression format is point cloud. Splicing the at least one kind of isomorphic block to obtain at least one spliced image and spliced image information includes: splicing a part of the multi-view video blocks and a part of the point cloud blocks into a heterogeneous hybrid spliced image; splicing another part of the multi-view video blocks into a multi-view spliced image; and splicing another part of the point cloud blocks into a point cloud spliced image.
The spliced image information includes a first syntax element, and the first syntax element is used to indicate whether the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image.
In some embodiments, determining according to the first syntax element whether the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image includes: if the first syntax element is a first preset value, determining that the spliced image is a heterogeneous hybrid spliced image including isomorphic blocks of a first expression format and a second expression format, where the first expression format and the second expression format are different expression formats; if the first syntax element is a second preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the first expression format; and if the first syntax element is a third preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the second expression format. That is, different values of the first syntax element indicate the spliced image type. Further, the first syntax element may also be set to other values, to indicate that the spliced image is an isomorphic spliced image including isomorphic blocks of another expression format, or to indicate that the spliced image is a heterogeneous spliced image including isomorphic blocks of at least two other expression formats.
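The value-based signalling above can be sketched as a simple lookup. The concrete numbers assigned to the first, second, and third preset values below are hypothetical, since the text does not fix them, and the names `SplicedImageKind` and `classify_spliced_image` are illustrative only.

```python
from enum import Enum


class SplicedImageKind(Enum):
    HETEROGENEOUS_HYBRID = "heterogeneous hybrid"
    ISOMORPHIC = "isomorphic"


# Hypothetical preset values; the application does not specify the numbers.
FIRST_PRESET = 0   # hybrid of the first and second expression formats
SECOND_PRESET = 1  # isomorphic, first expression format only
THIRD_PRESET = 2   # isomorphic, second expression format only


def classify_spliced_image(first_syntax_element, first_format, second_format):
    """Map the first syntax element to (kind, set of expression formats)."""
    if first_format == second_format:
        raise ValueError("the two expression formats must differ")
    if first_syntax_element == FIRST_PRESET:
        return (SplicedImageKind.HETEROGENEOUS_HYBRID,
                frozenset({first_format, second_format}))
    if first_syntax_element == SECOND_PRESET:
        return SplicedImageKind.ISOMORPHIC, frozenset({first_format})
    if first_syntax_element == THIRD_PRESET:
        return SplicedImageKind.ISOMORPHIC, frozenset({second_format})
    # Other values may signal further formats; they are not modelled here.
    raise ValueError(f"unsupported value: {first_syntax_element}")
```

Additional values for further expression formats, as the text allows, would extend the lookup in the same way.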
In some embodiments, the first syntax element includes at least two sub-syntax elements. Exemplarily, the first syntax element includes a first sub-syntax element and a second sub-syntax element, and whether the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image is determined according to the first sub-syntax element and the second sub-syntax element. Determining according to the first syntax element whether the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image includes: if the first sub-syntax element is a fourth preset value, determining that the spliced image includes isomorphic blocks of the first expression format; and if the second sub-syntax element is a fifth preset value, determining that the spliced image includes isomorphic blocks of the second expression format.
This can be understood as follows. When the first sub-syntax element is the fourth preset value, it is determined that the spliced image includes isomorphic blocks of the first expression format, that is, the spliced image is an isomorphic spliced image including isomorphic blocks of the first expression format. When the second sub-syntax element is the fifth preset value, it is determined that the spliced image includes isomorphic blocks of the second expression format, that is, the spliced image is an isomorphic spliced image including isomorphic blocks of the second expression format. When the first sub-syntax element is the fourth preset value and the second sub-syntax element is the fifth preset value, it is determined that the spliced image includes isomorphic blocks of the first expression format and isomorphic blocks of the second expression format, that is, the spliced image is a heterogeneous hybrid spliced image including isomorphic blocks of the first expression format and the second expression format.
In some embodiments, the method further includes: if the first sub-syntax element is a sixth preset value, determining that the spliced image does not include isomorphic blocks of the first expression format; and if the second sub-syntax element is a seventh preset value, determining that the spliced image does not include isomorphic blocks of the second expression format.
Specifically, if the first sub-syntax element is the fourth preset value and the second sub-syntax element is the fifth preset value, it is determined that the spliced image is a heterogeneous hybrid spliced image including isomorphic blocks of the first expression format and isomorphic blocks of the second expression format; if the first sub-syntax element is the fourth preset value and the second sub-syntax element is the seventh preset value, it is determined that the spliced image is an isomorphic spliced image including isomorphic blocks of the first expression format; and if the first sub-syntax element is the sixth preset value and the second sub-syntax element is the fifth preset value, it is determined that the spliced image is an isomorphic spliced image including isomorphic blocks of the second expression format.
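The three combinations above amount to a small decision table over two flags. The sketch below assumes, purely for illustration, that the fourth and fifth preset values are 1 ("present") and the sixth and seventh preset values are 0 ("absent"); the application does not state these numbers, and `classify_from_sub_elements` is a hypothetical name.

```python
# Hypothetical numeric values for the preset values named in the text.
FOURTH_PRESET = 1   # first sub-syntax element: first format present
SIXTH_PRESET = 0    # first sub-syntax element: first format absent
FIFTH_PRESET = 1    # second sub-syntax element: second format present
SEVENTH_PRESET = 0  # second sub-syntax element: second format absent


def classify_from_sub_elements(first_sub, second_sub):
    """Derive the spliced image type from the two sub-syntax elements."""
    if first_sub not in (FOURTH_PRESET, SIXTH_PRESET):
        raise ValueError(f"unexpected first sub-syntax element: {first_sub}")
    if second_sub not in (FIFTH_PRESET, SEVENTH_PRESET):
        raise ValueError(f"unexpected second sub-syntax element: {second_sub}")

    has_first = first_sub == FOURTH_PRESET
    has_second = second_sub == FIFTH_PRESET

    if has_first and has_second:
        return "heterogeneous hybrid spliced image"
    if has_first:
        return "isomorphic spliced image (first expression format)"
    if has_second:
        return "isomorphic spliced image (second expression format)"
    # Both formats absent: this combination is not described in the text.
    raise ValueError("spliced image contains neither expression format")
```

The case in which both sub-syntax elements signal absence is rejected here because the text does not describe it.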
That is, the expression formats of the isomorphic blocks in a spliced image may also be determined from the values of two sub-syntax elements. Further, when a spliced image includes more expression formats, multiple syntax elements may be used to indicate the expression formats of the isomorphic blocks in the spliced image. For example, three syntax elements are set when three expression formats are included, and four syntax elements are set when four expression formats are included; alternatively, a single syntax element with multiple possible values may be used to represent multiple expression formats.
In some embodiments, the first syntax element is located in a parameter set of the bitstream. Exemplarily, the parameter set of the bitstream may be V3C_VPS, and the first syntax element may be ptl_profile_toolset_idc in V3C_VPS.
In some embodiments, the spliced image sequence parameter set corresponding to the spliced image includes the first syntax element. Exemplarily, the spliced image sequence parameter set corresponding to the spliced image includes the first sub-syntax element and the second sub-syntax element. Exemplarily, the first sub-syntax element is asps_vpcc_extension_present_flag in the spliced image sequence parameter set, and the second sub-syntax element is asps_miv_extension_present_flag.
That is, the first syntax element may be located in the parameter set of the bitstream, so that the decoder can parse the type of every spliced image earlier. The first syntax element may also be located in the spliced image sequence parameter set corresponding to each spliced image, in which case the decoder obtains it and determines the spliced image type when parsing each spliced image.
In some embodiments, the heterogeneous hybrid spliced image of the embodiments of the present application includes at least one of the following: a single-attribute heterogeneous hybrid spliced image and a multi-attribute heterogeneous hybrid spliced image.
A single-attribute heterogeneous hybrid spliced image is a heterogeneous hybrid spliced image in which all of the included isomorphic blocks have the same attribute information. For example, a single-attribute heterogeneous hybrid spliced image includes only isomorphic blocks of attribute information, such as only multi-view video texture blocks and point cloud texture blocks. As another example, a single-attribute heterogeneous hybrid spliced image includes only isomorphic blocks of geometry information, such as only multi-view video geometry blocks and point cloud geometry blocks.
A multi-attribute heterogeneous hybrid spliced image is a heterogeneous hybrid spliced image in which at least two of the included isomorphic blocks have different attribute information; for example, a multi-attribute heterogeneous hybrid spliced image includes both isomorphic blocks of attribute information and isomorphic blocks of geometry information. As an example, blocks under any one attribute or any two attributes of at least two of point cloud, multi-view video, and mesh may be spliced into one image to obtain a heterogeneous hybrid spliced image. The present application does not limit this.
In some embodiments, single-attribute isomorphic blocks of the first expression format and single-attribute blocks of the second expression format are spliced to obtain a heterogeneous hybrid spliced image. The first expression format and the second expression format are each any one of multi-view video, point cloud, and mesh, the first expression format and the second expression format are different, and the attribute information of the first expression format and of the second expression format is the same.
The single-attribute isomorphic blocks of multi-view video include at least one of a multi-view video texture block, a multi-view video geometry block, and the like.
The single-attribute isomorphic blocks of a point cloud include at least one of a point cloud texture block, a point cloud geometry block, a point cloud occupancy block, and the like.
The single-attribute isomorphic blocks of a mesh include at least one of a mesh texture block and a mesh geometry block.
For example, at least two of a multi-view video geometry block, a point cloud geometry block, and a mesh geometry block are spliced into one image to obtain a heterogeneous hybrid spliced image, which is called a single-attribute heterogeneous hybrid spliced image. As another example, at least two of a multi-view video texture block, a point cloud texture block, and a mesh texture block are spliced into one image to obtain a heterogeneous hybrid spliced image, which is likewise called a single-attribute heterogeneous hybrid spliced image.
In some embodiments, multi-attribute isomorphic blocks of the first expression format and multi-attribute isomorphic blocks of the second expression format are spliced to obtain a heterogeneous hybrid spliced image. The first expression format and the second expression format are each any one of multi-view video, point cloud, and mesh, the first expression format and the second expression format are different, and the attribute information of the first expression format and of the second expression format is not completely the same.
For example, a multi-view video texture block is spliced into one image with at least one of a point cloud geometry block and a mesh geometry block to obtain a heterogeneous hybrid spliced image. As another example, a multi-view video geometry block is spliced into one image with at least one of a point cloud texture block and a mesh texture block to obtain a heterogeneous hybrid spliced image. As another example, a point cloud texture block is spliced into one image with at least one of a multi-view video geometry block and a mesh geometry block to obtain a heterogeneous hybrid spliced image. As another example, a point cloud geometry block is spliced into one image with at least one of a multi-view video texture block and a mesh texture block to obtain a heterogeneous hybrid spliced image. As another example, a point cloud geometry block, a multi-view video texture block, and a multi-view video texture block are spliced into one image to obtain a heterogeneous hybrid spliced image. As another example, a point cloud geometry block, a point cloud texture block, a multi-view video texture block, and a multi-view video texture block are spliced into one image to obtain a heterogeneous hybrid spliced image. The heterogeneous hybrid spliced images obtained here are called multi-attribute heterogeneous hybrid spliced images.
The splicing method is described in detail below, taking multi-view video as the first expression format and point cloud as the second expression format.
Assume that the multi-view video blocks include multi-view video texture blocks and multi-view video geometry blocks, and that the point cloud blocks include point cloud texture blocks, point cloud geometry blocks, and point cloud occupancy blocks. The heterogeneous splicing described above may then include, but is not limited to, the following two methods:
Method 1: splice the multi-view video texture blocks, multi-view video geometry blocks, point cloud texture blocks, point cloud geometry blocks, and point cloud occupancy blocks all into one heterogeneous hybrid spliced image.
Method 2: splice the multi-view video texture blocks, multi-view video geometry blocks, point cloud texture blocks, point cloud geometry blocks, and point cloud occupancy blocks according to a preset heterogeneous splicing scheme to obtain M heterogeneous hybrid spliced images, where M is a positive integer greater than or equal to 1.
Method 2 may include at least the following examples. Example 1: splice the multi-view video texture blocks and the point cloud texture blocks to obtain a heterogeneous hybrid texture spliced image, splice the multi-view video geometry blocks and the point cloud geometry blocks to obtain a heterogeneous hybrid geometry spliced image, and use the point cloud occupancy blocks on their own as one hybrid spliced image. Example 2: splice the multi-view video texture blocks and the point cloud texture blocks to obtain a heterogeneous hybrid texture spliced image, and splice the multi-view video geometry blocks, the point cloud geometry blocks, and the point cloud occupancy blocks to obtain a heterogeneous hybrid geometry and occupancy spliced image. Example 3: splice the multi-view video texture blocks, the point cloud texture blocks, and the point cloud occupancy blocks to obtain one heterogeneous hybrid sub-spliced image, and splice the multi-view video geometry blocks and the point cloud geometry blocks to obtain another heterogeneous hybrid sub-spliced image. Further, after the M heterogeneous hybrid spliced images are obtained, each of the M heterogeneous hybrid spliced images may be video-encoded to obtain video compression sub-bitstreams.
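Example 1 of Method 2 can be sketched as a grouping of blocks by attribute, so that each group becomes one spliced image. The tuple representation `(expression_format, attribute)` and the function name `group_blocks_by_attribute` are assumptions made for illustration; the actual placement of blocks inside each image is not modelled.

```python
from collections import defaultdict


def group_blocks_by_attribute(blocks):
    """Group (expression_format, attribute) blocks by attribute.

    Each resulting group stands for one spliced image, as in Example 1:
    texture blocks together, geometry blocks together, occupancy alone.
    """
    groups = defaultdict(list)
    for expression_format, attribute in blocks:
        groups[attribute].append((expression_format, attribute))
    return dict(groups)


blocks = [
    ("multi-view video", "texture"),
    ("multi-view video", "geometry"),
    ("point cloud", "texture"),
    ("point cloud", "geometry"),
    ("point cloud", "occupancy"),
]
spliced_images = group_blocks_by_attribute(blocks)
M = len(spliced_images)  # number of spliced images produced by this grouping
```

Examples 2 and 3 differ only in which attributes share an image, so they could be expressed by mapping several attributes to the same group key.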
In some embodiments, the isomorphic spliced image of the embodiments of the present application includes at least one of the following: a single-attribute isomorphic spliced image and a multi-attribute isomorphic spliced image. In some embodiments, first-attribute isomorphic blocks of the first expression format are spliced to obtain an isomorphic spliced image. Alternatively, first-attribute isomorphic blocks and second-attribute isomorphic blocks of the first expression format are spliced to obtain an isomorphic spliced image.
A single-attribute isomorphic spliced image is an isomorphic spliced image in which all of the included isomorphic blocks have the same expression format and the same attribute information. For example, a single-attribute isomorphic spliced image includes only isomorphic blocks of attribute information of the first expression format; for instance, a single-attribute isomorphic spliced image includes only multi-view video texture blocks, or only point cloud texture blocks. As another example, a single-attribute isomorphic spliced image includes only isomorphic blocks of geometry information, such as only multi-view video geometry blocks, or only point cloud geometry blocks.
A multi-attribute isomorphic spliced image is an isomorphic spliced image in which at least two of the included isomorphic blocks have the same expression format but different attribute information; for example, a multi-attribute isomorphic spliced image includes both isomorphic blocks of attribute information and isomorphic blocks of geometry information. As an example, a multi-attribute isomorphic spliced image includes multi-view video texture blocks and multi-view video geometry blocks. As another example, a multi-attribute isomorphic spliced image includes point cloud geometry blocks and point cloud texture blocks; as shown in Figure 8, a multi-attribute isomorphic spliced image includes point cloud texture block 1, point cloud geometry block 1, and point cloud geometry block 2.
In some embodiments, the spliced image information may further include a syntax element according to which the spliced image is determined to be a single-attribute heterogeneous hybrid spliced image, a multi-attribute heterogeneous hybrid spliced image, a single-attribute isomorphic spliced image, or a multi-attribute isomorphic spliced image.
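The four categories named above follow directly from two counts: the number of distinct expression formats and the number of distinct attributes among the blocks of a spliced image. The sketch below derives the category from the block list itself. This is an assumption for illustration, since in the application the category is signalled by a syntax element, and `describe_spliced_image` is a hypothetical helper.

```python
def describe_spliced_image(blocks):
    """Classify a spliced image from its (expression_format, attribute) blocks."""
    if not blocks:
        raise ValueError("a spliced image must contain at least one block")
    formats = {expression_format for expression_format, _ in blocks}
    attributes = {attribute for _, attribute in blocks}
    # One expression format means isomorphic; two or more mean hybrid.
    layout = "heterogeneous hybrid" if len(formats) >= 2 else "isomorphic"
    # One attribute means single-attribute; two or more mean multi-attribute.
    attr_kind = "single-attribute" if len(attributes) == 1 else "multi-attribute"
    return f"{attr_kind} {layout} spliced image"
```

Applied to the Figure 8 example (point cloud texture block 1, point cloud geometry block 1, and point cloud geometry block 2), the helper reports a multi-attribute isomorphic spliced image.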
Step 603: encode the at least one spliced image and the spliced image information to obtain a bitstream.
In some embodiments, the bitstream includes a video compression sub-bitstream and a spliced image information sub-bitstream. Encoding the at least one spliced image and the spliced image information to obtain the bitstream includes: encoding the at least one spliced image to obtain the video compression sub-bitstream; encoding the spliced image information of the at least one spliced image to obtain the spliced image information sub-bitstream; and combining the video compression sub-bitstream and the spliced image information sub-bitstream into the bitstream. In this way, heterogeneous source formats such as video, point cloud, and mesh are supported in the same compressed bitstream, and multi-view video spliced images, point cloud spliced images, mesh spliced images, and heterogeneous hybrid spliced images can coexist in the compressed bitstream. This reduces the number of two-dimensional video encoders (such as HEVC, VVC, AVC, and AVS) that need to be invoked, lowers implementation cost, and improves ease of use.
In some embodiments, when the spliced image is determined according to the first syntax element to be a heterogeneous hybrid spliced image, the spliced image information further includes a second syntax element, and the expression format of the i-th block in the spliced image is determined according to the second syntax element. Writing the second syntax element into the bitstream at the encoder helps improve decoding accuracy at the decoder, and also enables the V3C standard to support visual media content in different expression formats, such as multi-view video and point cloud, in the same compressed bitstream.
In some embodiments, determining the expression format of the i-th block in the spliced image according to the second syntax element includes: if the second syntax element is an eighth preset value, determining that the expression format of the i-th block is a first expression format; and if the second syntax element is a ninth preset value, determining that the expression format of the i-th block is a second expression format.
Specifically, the expression format type of the i-th block in the spliced image can be indicated by setting the second syntax element to different values. Taking the case where the first expression format is point cloud and the second expression format is multi-view video as an example: if the i-th block is a point cloud block, the second syntax element is set to the eighth preset value; if the i-th block is a multi-view video block, the second syntax element is set to the ninth preset value. The embodiments of the present application do not limit the specific values of the eighth and ninth preset values. Optionally, the eighth preset value is 0. Optionally, the ninth preset value is 1.
In some embodiments, encoding the at least one spliced image and the spliced image information to obtain the bitstream includes: if the expression format of the i-th block is the first expression format, determining that the patches in the i-th block are encoded using the coding standard corresponding to the first expression format, to obtain the bitstream corresponding to the visual media content in the first expression format; and if the expression format of the i-th block is the second expression format, determining that the patches in the i-th block are encoded using the coding standard corresponding to the second expression format, to obtain the bitstream corresponding to the visual media content in the second expression format.
In some embodiments, the second syntax element is located in the spliced image block data unit header of the i-th block of the spliced image. In some embodiments, the second syntax element may also be located in the patch data unit (patch_data_unit). For example, when the second syntax element (ath_toolset_type) is known to be 1, it is determined that the current patch is encoded using the multi-view video coding standard; when the second syntax element (ath_toolset_type) is known to be 0, it is determined that the current patch is encoded using the point cloud coding standard.
In some embodiments, encoding the at least one spliced image and the spliced image information to obtain the bitstream includes: invoking a video encoder to encode the at least one spliced image to obtain the video compression sub-bitstream.
In the embodiments of the present application, to reduce the number of encoders and the encoding cost, at least two pieces of visual media content are first processed (that is, packed) separately during encoding to obtain multiple homogeneous blocks. Next, at least two homogeneous blocks whose expression formats are not all the same are spliced into a heterogeneous hybrid spliced image, at least one homogeneous block whose expression formats are all the same is spliced into a homogeneous spliced image, and the heterogeneous hybrid spliced image and the homogeneous spliced image are encoded to obtain the video compression sub-bitstream. This makes the encoding and decoding method applicable to scenarios involving visual media content in multiple expression formats and broadens its scope of application. Moreover, because homogeneous blocks of different expression formats are spliced into a single heterogeneous hybrid spliced image for encoding, the video encoder may need to be invoked only once, which reduces the number of two-dimensional video encoders (such as HEVC, VVC, AVC, and AVS) that need to be invoked, lowers the encoding cost, and improves ease of use.
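The distinction between the two kinds of spliced image can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the applicant's implementation: the function names `classify_spliced_image` and `encode_spliced_images`, the format labels, and the stand-in encoder callable are all hypothetical.

```python
def classify_spliced_image(block_formats):
    """Classify a spliced image by the expression formats of its blocks.

    block_formats: iterable of hypothetical format labels such as
    "point_cloud" or "multi_view", one per homogeneous block.
    """
    formats = set(block_formats)
    if not formats:
        raise ValueError("a spliced image needs at least one block")
    # More than one distinct format means the image mixes block types.
    return "heterogeneous_hybrid" if len(formats) > 1 else "homogeneous"


def encode_spliced_images(spliced_images, video_encoder):
    """Pass every spliced image through the same 2D video encoder callable."""
    return [video_encoder(image) for image in spliced_images]


if __name__ == "__main__":
    mixed = ["point_cloud", "multi_view"]
    same = ["multi_view", "multi_view"]
    print(classify_spliced_image(mixed))
    print(classify_spliced_image(same))
    # A stand-in "encoder" that only reports how many blocks it received.
    print(encode_spliced_images([mixed, same], video_encoder=len))
```

The point of the sketch is that one encoder callable serves both kinds of spliced image, which mirrors the stated goal of reducing the number of two-dimensional video encoders.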
In the embodiments of the present application, the video encoder used to encode the heterogeneous hybrid spliced image and the homogeneous spliced image into the video compression sub-bitstream can be the video encoder shown in Figure 2A above. That is, the heterogeneous hybrid spliced image or the homogeneous spliced image is treated as one frame of image: block partitioning is performed first; intra or inter prediction is then used to obtain the predicted value of each coding block; the predicted value is subtracted from the original value of the coding block to obtain the residual; and the residual is transformed and quantized to obtain the video compression sub-bitstream.
In the embodiments of the present application, the spliced image information corresponding to each spliced image is generated at the same time as the at least one spliced image. The spliced image information is encoded to obtain the spliced image information sub-bitstream. The spliced image information includes a first syntax element indicating the type of the spliced image and a second syntax element indicating the expression format of each homogeneous block in the spliced image. The embodiments of the present application do not limit how the spliced image information is encoded; for example, conventional data compression coding such as fixed-length coding or variable-length coding may be used.
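To make the two coding families named above concrete, the following sketch writes an integer as a fixed-length bit string and as an order-0 Exp-Golomb code, a variable-length code widely used in video coding syntax. It is an illustration only: the text does not say which descriptor or bit width is used for any particular syntax element, so the helper names and the widths chosen below are assumptions.

```python
def fixed_length(value, n_bits):
    """Return `value` as an unsigned fixed-length bit string of n_bits bits."""
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    if not 0 <= value < (1 << n_bits):
        raise ValueError("value does not fit in n_bits")
    return format(value, "0{}b".format(n_bits))


def exp_golomb(value):
    """Return the order-0 Exp-Golomb code of a non-negative integer.

    The code is the binary form of value + 1, preceded by one leading
    zero for every bit after the first.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]
    return "0" * (len(binary) - 1) + binary


if __name__ == "__main__":
    # A type value of 3 written with an assumed width of 3 bits.
    print(fixed_length(3, 3))
    # The same value written as a variable-length code.
    print(exp_golomb(3))
```

Fixed-length coding costs the same number of bits for every value, while Exp-Golomb spends fewer bits on small values and more on large ones.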
Finally, the video compression sub-bitstream and the spliced image information sub-bitstream are written into the same bitstream to obtain the final bitstream. That is, the embodiments of the present application support both heterogeneous source formats, such as video, point cloud, and mesh, and homogeneous source formats in the same compressed bitstream.
In some embodiments, the method further includes: encoding the parameter set of the bitstream to obtain a parameter set sub-bitstream. Specifically, the encoding end combines the video compression sub-bitstream, the spliced image information sub-bitstream, and the parameter set sub-bitstream into the bitstream. The parameter set sub-bitstream includes a third syntax element, according to which it is determined that the bitstream includes bitstreams corresponding to visual media content in at least one expression format. In other words, the encoding end sends the third syntax element to indicate whether the bitstream contains visual media content in at least two expression formats at the same time. For example, when the third syntax element indicates that the bitstream includes a bitstream corresponding to visual media content in one expression format, the encoding end processes the visual media content in that expression format to obtain one kind of homogeneous block and splices those blocks to obtain a homogeneous spliced image. When the third syntax element indicates that the bitstream includes bitstreams corresponding to visual media content in at least two expression formats, the encoding end processes the visual media content in the at least two expression formats to obtain at least two kinds of homogeneous blocks and splices them to obtain homogeneous spliced images and/or heterogeneous hybrid spliced images.
For example, when the third syntax element indicates that the bitstream includes bitstreams corresponding to visual media content in at least two expression formats, the method includes: homogeneously splicing the homogeneous blocks of the first expression format to obtain a first homogeneous spliced image, and homogeneously splicing the homogeneous blocks of the second expression format to obtain a second homogeneous spliced image; or heterogeneously splicing the homogeneous blocks of the first expression format and the homogeneous blocks of the second expression format to obtain a heterogeneous hybrid spliced image; or homogeneously splicing the homogeneous blocks of the first expression format to obtain a first homogeneous spliced image, and heterogeneously splicing the homogeneous blocks of the first expression format and the homogeneous blocks of the second expression format to obtain a heterogeneous hybrid spliced image; or homogeneously splicing the homogeneous blocks of the second expression format to obtain a second homogeneous spliced image, and heterogeneously splicing the homogeneous blocks of the first expression format and the homogeneous blocks of the second expression format to obtain a heterogeneous hybrid spliced image.
In some embodiments, the third syntax element is set to different values to indicate that the bitstream includes bitstreams corresponding to visual media content in at least one expression format. That is, certain preset values of the third syntax element indicate that the bitstream includes bitstreams corresponding to visual media content in one or more expression formats.
For example, determining according to the third syntax element that the bitstream includes bitstreams corresponding to visual media content in at least one expression format includes: if the third syntax element is a first value, determining that the bitstream includes both the bitstream corresponding to the visual media content in the first expression format and the bitstream corresponding to the visual media content in the second expression format; if the third syntax element is a second value, determining that the bitstream includes the bitstream corresponding to the visual media content in the first expression format; and if the third syntax element is a third value, determining that the bitstream includes the bitstream corresponding to the visual media content in the second expression format.
For example, the parameter set of the bitstream may be V3C_VPS, and the third syntax element may be ptl_profile_toolset_idc in V3C_VPS.
For example, take the first expression format as multi-view video and the second expression format as point cloud. When the third syntax element is set to the first value, the first value indicates that the bitstream contains both a multi-view video bitstream and a point cloud bitstream. As a specific example, ptl_profile_toolset_idc = X with X equal to 128, 129, 130, 132, 133, or 134 indicates that the current bitstream contains both point cloud and multi-view bitstreams. As another example, when the third syntax element is set to the second value, the second value indicates that the bitstream contains only a point cloud bitstream. As a specific example, ptl_profile_toolset_idc = X with X equal to 0 or 1 indicates that the current bitstream contains only a point cloud bitstream. As a further example, when the third syntax element is set to the third value, the third value indicates that the bitstream contains only a multi-view video bitstream. As a specific example, ptl_profile_toolset_idc = X with X equal to 64, 65, or 66 indicates that the current bitstream contains only a multi-view video bitstream. It should be understood that the above first, second, and third values are only examples, and the embodiments of the present application are not limited thereto.
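The example values above can be collected into a small lookup. This is a sketch under stated assumptions: only the numeric values come from the text; the function name `bitstream_content` and the returned labels are hypothetical, and the 0..255 range is inferred from the reserved ranges listed for ptl_profile_toolset_idc.

```python
# Example ptl_profile_toolset_idc values taken from the text.
BOTH_FORMATS = {128, 129, 130, 132, 133, 134}
POINT_CLOUD_ONLY = {0, 1}
MULTI_VIEW_ONLY = {64, 65, 66}


def bitstream_content(ptl_profile_toolset_idc):
    """Return a label for what the bitstream contains (hypothetical labels)."""
    value = ptl_profile_toolset_idc
    if not 0 <= value <= 255:
        raise ValueError("ptl_profile_toolset_idc is assumed to lie in 0..255")
    if value in BOTH_FORMATS:
        return "point_cloud_and_multi_view"
    if value in POINT_CLOUD_ONLY:
        return "point_cloud_only"
    if value in MULTI_VIEW_ONLY:
        return "multi_view_only"
    # Every other value in range is described as reserved.
    return "reserved"


if __name__ == "__main__":
    for idc in (0, 64, 128, 131):
        print(idc, bitstream_content(idc))
```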
In this example, V3C_VPS in the existing V3C standard can be reused, with values such as 0/1, 64/65/66, and 128/129/130/132/133/134 preconfigured for ptl_profile_toolset_idc to indicate the types of bitstream contained in the current bitstream. When encoding visual media content, the embodiments of the present application add values of the third syntax element to the parameter set to indicate which expression formats of visual media content have corresponding bitstreams in the bitstream. This helps improve decoding accuracy at the decoding end, and also enables the V3C standard to support visual media content in one or more expression formats, such as multi-view video, point cloud, and mesh, within the same compressed bitstream.
Table 3 shows an example of the available toolset profile components. Table 3 lists the toolset profile components defined for V3C and their corresponding identifying syntax element values, such as ptl_profile_toolset_idc and ptc_one_v3c_frame_only_flag; this definition may be used for this document only. The syntax element ptl_profile_toolset_idc provides the main definition of the toolset profile, while additional syntax elements such as ptc_one_v3c_frame_only_flag can specify additional characteristics or restrictions of the defined profile. ptc_one_v3c_frame_only_flag can be used to support only a single V3C frame. Note that the values 2..63, 67..127, 131, and 135..255 of ptl_profile_toolset_idc are reserved and currently undefined; standards organizations may specify them in future standards. The profile types defined in Table 3 can include dynamic or static.
Table 3: Available toolset profile components
Figure PCTCN2022105006-appb-000004
Figure PCTCN2022105006-appb-000005
In some embodiments, the parameter set of the bitstream further includes the first syntax element, which indicates the type of each spliced image, specifically whether the spliced image is the heterogeneous hybrid spliced image or the homogeneous spliced image; the first syntax element is written into the parameter set of the bitstream. For example, a first syntax element (vps_toolset_type) is added to V3C_VPS, and vps_toolset_type is used to distinguish whether each spliced image and its corresponding V3C unit belong to a point cloud spliced image, a multi-view spliced image, or a point cloud + multi-view heterogeneous hybrid spliced image. In addition, for compatibility with previous standards, the following new syntax and semantics, together with constraints on the old semantics, are introduced.
For example, if the first syntax element is a first preset value, the spliced image is determined to be a heterogeneous hybrid spliced image including homogeneous blocks of the first expression format and of the second expression format, where the first expression format and the second expression format are different expression formats; if the first syntax element is a second preset value, the spliced image is determined to be a homogeneous spliced image including homogeneous blocks of the first expression format; and if the first syntax element is a third preset value, the spliced image is determined to be a homogeneous spliced image including homogeneous blocks of the second expression format.
For example, take the first expression format as multi-view video and the second expression format as point cloud. When the first syntax element is the first preset value, it indicates that the spliced image is a heterogeneous hybrid spliced image including point cloud blocks and multi-view video blocks. When the first syntax element is the second preset value, it indicates that the spliced image is a homogeneous spliced image including multi-view video blocks (which may be called a multi-view video spliced image). When the first syntax element is the third preset value, it indicates that the spliced image is a homogeneous spliced image including point cloud blocks (which may be called a point cloud spliced image).
For example, after the third syntax element ptl_profile_toolset_idc = 128/129/130/132/133/134 is obtained, the first syntax element (vps_toolset_type) needs to be parsed from the VPS for each spliced image, and vps_toolset_type = X is evaluated. X = 1 indicates that the spliced image contains only multi-view video blocks and shall meet the requirements of the multi-view coding method; X = 2 indicates that the spliced image contains only point cloud blocks and shall meet the requirements of the point cloud coding method; X = 3 indicates that the spliced image contains both multi-view video blocks and point cloud blocks and shall meet the requirements of both the multi-view and point cloud coding methods. It should be understood that the above values of the first syntax element are only examples, and the embodiments of the present application are not limited thereto.
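A decoder-side check of vps_toolset_type[j] might look like the sketch below. The values 1, 2, and 3 are the example values given above, and 0 and 4 to 7 are treated as reserved, as described later in this section; the function name `spliced_image_type` and the returned labels are hypothetical.

```python
# Example vps_toolset_type values from the text (illustrative, not normative).
VPS_TOOLSET_TYPES = {
    1: "multi_view",            # only multi-view video blocks
    2: "point_cloud",           # only point cloud blocks
    3: "heterogeneous_hybrid",  # both kinds of block
}
RESERVED_VALUES = {0, 4, 5, 6, 7}


def spliced_image_type(vps_toolset_type):
    """Map vps_toolset_type[j] to a spliced image type label.

    Returns None for reserved values, which a decoder is expected to ignore.
    """
    if vps_toolset_type in VPS_TOOLSET_TYPES:
        return VPS_TOOLSET_TYPES[vps_toolset_type]
    if vps_toolset_type in RESERVED_VALUES:
        return None
    raise ValueError("vps_toolset_type outside the described range 0..7")


if __name__ == "__main__":
    # One entry per spliced image index j.
    for j, value in enumerate([1, 2, 3, 0]):
        print(j, spliced_image_type(value))
```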
Table 4 shows the general V3C parameter set syntax. A new syntax element, vps_toolset_type, is added to the V3C parameter set; specifically, vps_toolset_type[j] indicates the type of the spliced image with index j. With vps_toolset_type added to the V3C parameter set, the decoding end can obtain vps_toolset_type from the V3C parameter set when decoding the bitstream and quickly distinguish whether each spliced image and its corresponding V3C unit belong to point cloud, multi-view, or point cloud + multi-view, thereby determining which coding method requirements the spliced image shall meet.
Table 4: General V3C parameter set syntax
Figure PCTCN2022105006-appb-000006
Figure PCTCN2022105006-appb-000007
Figure PCTCN2022105006-appb-000008
In some embodiments, the spliced image sequence parameter set corresponding to the spliced image includes the first syntax element. For example, the spliced image sequence parameter set corresponding to the spliced image includes a first sub-syntax element and a second sub-syntax element. The first sub-syntax element and the second sub-syntax element together indicate the spliced image type, that is, whether the spliced image is the heterogeneous hybrid spliced image or the homogeneous spliced image.
For example, if the first sub-syntax element is a fourth preset value and the second sub-syntax element is a fifth preset value, the spliced image is determined to be a heterogeneous hybrid spliced image including homogeneous blocks of the first expression format and homogeneous blocks of the second expression format; if the first sub-syntax element is the fourth preset value and the second sub-syntax element is a seventh preset value, the spliced image is determined to be a homogeneous spliced image including homogeneous blocks of the first expression format; and if the first sub-syntax element is a sixth preset value and the second sub-syntax element is the fifth preset value, the spliced image is determined to be a homogeneous spliced image including homogeneous blocks of the second expression format.
Optionally, the first sub-syntax element is asps_vpcc_extension_present_flag in the spliced image sequence parameter set, and the second sub-syntax element is asps_miv_extension_present_flag.
For example, the NAL-ASPS of the V3C_AD bitstream may contain asps_miv_extension_present_flag and asps_vpcc_extension_present_flag.
In the embodiments of the present application, the first sub-syntax element and the second sub-syntax element are set to specific values to indicate whether the spliced image is the heterogeneous hybrid spliced image or the homogeneous spliced image.
For example, after the third syntax element ptl_profile_toolset_idc = 128/129/130/132/133/134 is obtained, asps_vpcc_extension_present_flag = X and asps_miv_extension_present_flag = Y are obtained from the spliced image information of the spliced image. X = 0 and Y = 1 indicate that the spliced image contains only multi-view video blocks and shall meet the requirements of the multi-view coding method; X = 1 and Y = 0 indicate that the spliced image contains only point cloud blocks and shall meet the requirements of the point cloud coding method; X = 1 and Y = 1 indicate that the spliced image contains both multi-view video blocks and point cloud blocks and shall meet the requirements of both the multi-view and point cloud coding methods. It should be understood that the above preset values are only examples, and the embodiments of the present application are not limited thereto.
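The flag combinations above amount to a small truth table, sketched below. Only the three combinations named in the text are mapped; the text does not say what X = 0 with Y = 0 means, so the sketch returns None for it. The function name `spliced_image_type_from_asps` and the labels are hypothetical.

```python
def spliced_image_type_from_asps(asps_vpcc_extension_present_flag,
                                 asps_miv_extension_present_flag):
    """Derive the spliced image type from the two ASPS extension flags."""
    x = asps_vpcc_extension_present_flag
    y = asps_miv_extension_present_flag
    if x not in (0, 1) or y not in (0, 1):
        raise ValueError("both flags must be 0 or 1")
    table = {
        (0, 1): "multi_view",            # only multi-view video blocks
        (1, 0): "point_cloud",           # only point cloud blocks
        (1, 1): "heterogeneous_hybrid",  # both kinds of block
    }
    # (0, 0) is not described in the text, so it is left unmapped here.
    return table.get((x, y))


if __name__ == "__main__":
    for flags in [(0, 1), (1, 0), (1, 1), (0, 0)]:
        print(flags, spliced_image_type_from_asps(*flags))
```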
Table 5 shows the general atlas sequence parameter set RBSP syntax. The spliced image sequence parameter set can be understood as spliced image information. The encoding end uses the syntax elements asps_vpcc_extension_present_flag and asps_miv_extension_present_flag in the spliced image sequence parameter set to indicate the spliced image type. By parsing the bitstream, the decoding end can obtain these two syntax elements from the parameter set of the spliced image and, based on their values, determine whether the spliced image belongs to point cloud, multi-view, or point cloud + multi-view, thereby determining which coding method requirements the spliced image shall meet.
Table 5: General atlas sequence parameter set RBSP syntax
Figure PCTCN2022105006-appb-000009
Figure PCTCN2022105006-appb-000010
After the above syntax is implemented, multi-view and point cloud spliced images can coexist under one VPS. It is further necessary to support the case where a single spliced image contains multiple homogeneous blocks, each of which is either a collection of multi-view patches or a collection of point cloud patches. Because the existing technology allows only one kind of homogeneous block within a spliced image, the embodiments of the present application add a second syntax element, according to which it is determined whether the expression format of a homogeneous block in a spliced image is multi-view video, point cloud, mesh, or the like.
In some embodiments, the spliced image block data unit header of the i-th block includes the second syntax element. When the spliced image is determined to be a heterogeneous hybrid spliced image according to the first syntax element, the spliced image information further includes the second syntax element, and the expression format of the i-th block in the spliced image is determined according to the second syntax element.
When encoding a heterogeneous hybrid spliced image, the embodiments of the present application set the second syntax element to indicate the expression format of the i-th block in the heterogeneous hybrid spliced image. This helps improve decoding accuracy at the decoding end, and also enables the V3C standard to support visual media content in different expression formats, such as multi-view video and point cloud, within the same compressed bitstream. For example, the second syntax element may be ath_toolset_type in the spliced image block data unit header (atlas_tile_header).
For example, if the second syntax element is the eighth preset value, the expression format of the i-th block is determined to be the first expression format; if the second syntax element is the ninth preset value, the expression format of the i-th block is determined to be the second expression format.
For example, when the spliced image is a heterogeneous hybrid spliced image, atlas_tile_header is parsed from the bitstream of the ACL NAL unit type in the AD unit, and ath_toolset_type is obtained from it. For ath_toolset_type = X, X = 0 indicates that the current block is a point cloud block, and X = 1 indicates that the current block is a multi-view video block.
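Per-block dispatch on ath_toolset_type can be sketched as follows. The values 0 and 1 are the example values from the text; the function `decode_block`, the decoder callables, and the payload strings are hypothetical stand-ins for real point cloud and multi-view decoding paths.

```python
POINT_CLOUD = 0  # example ath_toolset_type value for a point cloud block
MULTI_VIEW = 1   # example ath_toolset_type value for a multi-view video block


def decode_block(ath_toolset_type, block_payload, decoders):
    """Route one block of a heterogeneous hybrid spliced image to a decoder.

    decoders: dict mapping an ath_toolset_type value to a callable that
    takes the block payload (hypothetical interface).
    """
    if ath_toolset_type not in decoders:
        raise ValueError(
            "unsupported ath_toolset_type: {}".format(ath_toolset_type))
    return decoders[ath_toolset_type](block_payload)


if __name__ == "__main__":
    # Stand-in decoders that only tag the payload with the path taken.
    decoders = {
        POINT_CLOUD: lambda payload: ("point_cloud", payload),
        MULTI_VIEW: lambda payload: ("multi_view", payload),
    }
    blocks = [(0, "block-a"), (1, "block-b")]
    for toolset_type, payload in blocks:
        print(decode_block(toolset_type, payload, decoders))
```

Keeping the mapping in a dictionary means a further expression format, such as mesh, would only need one more entry if a value were assigned to it.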
Table 6 shows the atlas tile header syntax. The encoding end adds a new syntax element, ath_toolset_type, to the spliced image block data unit header syntax to indicate the block type. When decoding the bitstream, the decoding end can obtain ath_toolset_type from the spliced image block data unit header and thereby determine whether the current block should be decoded as multi-view video or as point cloud.
Table 6: Atlas tile header syntax
Figure PCTCN2022105006-appb-000011
Figure PCTCN2022105006-appb-000012
Optionally, the second syntax element may also be located in the patch data unit (patch_data_unit). For example, when the second syntax element (ath_toolset_type) is known to be 1, it is determined that the current patch is encoded using the multi-view video coding method; when the second syntax element (ath_toolset_type) is known to be 0, it is determined that the current patch is encoded using the point cloud coding method. The patch data unit syntax can be as shown in Table 7:
表7子图块数据单元语法(Patch data unit syntax)Table 7 Sub-patch data unit syntax (Patch data unit syntax)
Figure PCTCN2022105006-appb-000013
Figure PCTCN2022105006-appb-000014
In some embodiments, a vps_toolset_type[j] value of 1 indicates that the values of the syntax elements of the toolset profile component of the atlas (spliced image) with index j shall comply with the values specified in ISO/IEC 23090-12 Table A-1-1 (i.e., Table 8);
a vps_toolset_type[j] value of 2 indicates that the values of the syntax elements of the toolset profile component of the atlas with index j shall comply with the values specified in ISO/IEC 23090-5 Table H-3, except for vps_extension_present_flag, vps_packing_information_present_flag, vps_miv_extension_present_flag, vuh_unit_type and vps_atlas_count_minus1, whose values shall comply with the values specified in ISO/IEC 23090-12 Table A-1-1;
a vps_toolset_type[j] value of 3 indicates that the values of the syntax elements of the toolset profile component of the atlas with index j shall comply with the values specified in the extended ISO/IEC 23090-12 Table A-1-2 (i.e., Table 9-1 and Table 9-2). Here, Table A-1-1 and Table A-1-2 give, for the integrated code stream, the syntax constraints of the toolset profile component for multi-view video and for heterogeneous data, respectively.
A vps_toolset_type[j] value of 0, or any value from 4 to 7, is reserved for future use by ISO/IEC and shall not appear in bitstreams conforming to this version of this document. Decoders conforming to this version of this document shall ignore such reserved unit types.
ath_toolset_type indicates that the values of the syntax elements of the toolset profile component of the current tile shall comply with the values specified in the extended Table A-1 of ISO/IEC 23090-12. The value of ath_toolset_type shall be in the range of 0 to 1, inclusive.
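The semantics of vps_toolset_type[j] and ath_toolset_type described above can be summarized as a small lookup. The following is only a minimal sketch: the names `ConstraintTable`, `constraint_table` and `check_ath_toolset_type` are hypothetical helpers introduced for illustration, and the overall value range 0 to 7 is inferred from the reserved values listed in the text.

```python
from enum import Enum
from typing import Optional


class ConstraintTable(Enum):
    """Which table constrains the toolset profile component (labels are illustrative)."""
    MIV = "ISO/IEC 23090-12 Table A-1-1"
    POINT_CLOUD = "ISO/IEC 23090-5 Table H-3, with the listed exceptions from Table A-1-1"
    HETEROGENEOUS = "extended ISO/IEC 23090-12 Table A-1-2"


_TABLE_BY_TOOLSET_TYPE = {
    1: ConstraintTable.MIV,
    2: ConstraintTable.POINT_CLOUD,
    3: ConstraintTable.HETEROGENEOUS,
}


def constraint_table(vps_toolset_type: int) -> Optional[ConstraintTable]:
    """Return the governing table, or None for the reserved values 0 and 4..7."""
    if not 0 <= vps_toolset_type <= 7:
        raise ValueError(f"vps_toolset_type out of range: {vps_toolset_type}")
    return _TABLE_BY_TOOLSET_TYPE.get(vps_toolset_type)


def check_ath_toolset_type(ath_toolset_type: int) -> int:
    """ath_toolset_type is restricted to the range 0..1, inclusive."""
    if ath_toolset_type not in (0, 1):
        raise ValueError(f"ath_toolset_type out of range: {ath_toolset_type}")
    return ath_toolset_type
```

Returning None for reserved values mirrors the statement that a conforming decoder ignores them instead of treating them as errors.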
Table 8 Allowable values of syntax element values for the MIV toolset profile
Figure PCTCN2022105006-appb-000015
Figure PCTCN2022105006-appb-000016
Figure PCTCN2022105006-appb-000017
Table 9-1 Allowable values of syntax element values for the MIV toolset profile (extended)
Figure PCTCN2022105006-appb-000018
Figure PCTCN2022105006-appb-000019
Figure PCTCN2022105006-appb-000020
Table 9-2 Allowable values of syntax element values for the MIV toolset profile (extended)
Figure PCTCN2022105006-appb-000021
Figure PCTCN2022105006-appb-000022
Figure PCTCN2022105006-appb-000023
Figure 9 is a schematic diagram of the V3C bitstream structure provided by an embodiment of the present application. The V3C parameter set (V3C_parameter_set()) of V3C_VPS may include ptl_profile_toolset_idc. A ptl_profile_toolset_idc value of 128, 129, 130, 132, 133 or 134 indicates that the current code stream contains both a point cloud code stream (for example, VPCC Basic or VPCC Extended) and a multi-view video code stream (for example, MIV Main, MIV Extended or MIV Geometry Absent).
The V3C parameter set (V3C_parameter_set()) of V3C_VPS may include the first syntax element (vps_toolset_type). When ptl_profile_toolset_idc is 128, 129, 130, 132, 133 or 134, a vps_toolset_type value of 1 indicates that the current spliced image contains only multi-view video blocks, a value of 2 indicates that it contains only point cloud blocks, and a value of 3 indicates that it contains both multi-view video blocks and point cloud blocks.
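As an illustration of how ptl_profile_toolset_idc gates the interpretation of vps_toolset_type, consider the sketch below. It is not a normative parser: `AtlasContent` and `atlas_content` are hypothetical names, and the function assumes both syntax elements have already been read from the V3C parameter set.

```python
from typing import NamedTuple, Optional

# ptl_profile_toolset_idc values that, per the text, signal a code stream
# carrying both point cloud and multi-view video sub-streams.
MIXED_STREAM_TOOLSET_IDCS = frozenset({128, 129, 130, 132, 133, 134})


class AtlasContent(NamedTuple):
    has_multi_view_blocks: bool
    has_point_cloud_blocks: bool


_CONTENT_BY_TOOLSET_TYPE = {
    1: AtlasContent(True, False),   # multi-view video blocks only
    2: AtlasContent(False, True),   # point cloud blocks only
    3: AtlasContent(True, True),    # both kinds of block
}


def atlas_content(ptl_profile_toolset_idc: int,
                  vps_toolset_type: int) -> Optional[AtlasContent]:
    """Return what the spliced image contains, or None when the semantics
    described in the text do not apply (other idc, or a reserved type)."""
    if ptl_profile_toolset_idc not in MIXED_STREAM_TOOLSET_IDCS:
        return None
    return _CONTENT_BY_TOOLSET_TYPE.get(vps_toolset_type)
```

For instance, `atlas_content(128, 3)` reports that both kinds of block are present, which is the case in which per-block signaling is needed.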
Alternatively, the atlas sequence parameter set (Atlas_sequence_parameter_set_rbsp()) in NAL_ASPS within the atlas sub-bitstream (Atlas_sub_bitstream()) of V3C_AD may include asps_vpcc_extension_present_flag and asps_miv_extension_present_flag. When ptl_profile_toolset_idc is 128, 129, 130, 132, 133 or 134, let asps_vpcc_extension_present_flag = X and asps_miv_extension_present_flag = Y. X equal to 0 and Y equal to 1 indicates that the spliced image contains only multi-view video blocks; X equal to 1 and Y equal to 0 indicates that the spliced image contains only point cloud blocks; X equal to 1 and Y equal to 1 indicates that the spliced image contains both multi-view video blocks and point cloud blocks.
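The three flag combinations listed above can be written as a short decision function. This is a sketch under stated assumptions: `classify_by_asps_flags` is a hypothetical helper, the returned labels are illustrative, and the combination X = 0, Y = 0 is rejected only because the text does not assign it a meaning.

```python
def classify_by_asps_flags(asps_vpcc_extension_present_flag: int,
                           asps_miv_extension_present_flag: int) -> str:
    """Classify a spliced image from the two ASPS extension flags (X, Y)."""
    x = bool(asps_vpcc_extension_present_flag)  # X: point cloud extension present
    y = bool(asps_miv_extension_present_flag)   # Y: multi-view extension present
    if x and y:
        return "heterogeneous mixed"   # X = 1, Y = 1
    if y:
        return "multi-view video only" # X = 0, Y = 1
    if x:
        return "point cloud only"      # X = 1, Y = 0
    # X = 0, Y = 0 is not described in the text, so this sketch rejects it.
    raise ValueError("flag combination X=0, Y=0 is not described")
```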
The ACL NAL unit type (ACL_NAL_unit_type) in Atlas_sub_bitstream() of V3C_AD includes the spliced image information. For example, the atlas tile data unit (atlas_tile_data_unit()) may include ath_toolset_type. If ath_toolset_type is false (i.e., 0), the current block is a point cloud block; if atdu_type_flag is true (i.e., 1), the current block is a multi-view video block.
Further, the patch information data (patch_information_data) includes the patch data unit (patch_data_unit). When ath_toolset_type is false (i.e., 0), the current patch is coded using the point cloud video encoding method. When ath_toolset_type is true (i.e., 1), the current patch is coded using the multi-view video encoding method.
The first syntax element of each spliced image is obtained, and its value is used to determine whether the spliced image contains both point cloud blocks and multi-view video blocks. When it does, the ath_toolset_type of each block in the spliced image must be obtained to determine the block type.
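The two-level lookup described here, in which ath_toolset_type is consulted only for a mixed spliced image, might look as follows. The sketch assumes the vps_toolset_type values given earlier in the text (1, 2, 3); `block_types` and the `read_ath_toolset_type` callback are hypothetical stand-ins for a real atlas tile header parser.

```python
from typing import Callable, List, Optional

MULTI_VIEW = "multi-view video"
POINT_CLOUD = "point cloud"


def block_types(vps_toolset_type: int,
                block_count: int,
                read_ath_toolset_type: Optional[Callable[[int], int]]) -> List[str]:
    """Return the expression format of every block in one spliced image.

    The per-block reader is called only when the first syntax element says
    the image is mixed; homogeneous images need no per-block signaling.
    """
    if vps_toolset_type == 1:
        return [MULTI_VIEW] * block_count
    if vps_toolset_type == 2:
        return [POINT_CLOUD] * block_count
    if vps_toolset_type == 3:
        if read_ath_toolset_type is None:
            raise ValueError("a mixed spliced image needs per-block ath_toolset_type")
        formats = []
        for i in range(block_count):
            value = read_ath_toolset_type(i)
            if value not in (0, 1):
                raise ValueError(f"ath_toolset_type out of range for block {i}")
            formats.append(MULTI_VIEW if value == 1 else POINT_CLOUD)
        return formats
    raise ValueError(f"reserved or invalid vps_toolset_type: {vps_toolset_type}")
```

Passing the reader as a callback keeps the sketch independent of any particular bitstream parser; for a homogeneous image it may simply be None.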
The encoding method of the present application has been described above from the perspective of the encoding end. The decoding method provided by the embodiments of the present application is described below from the perspective of the decoding end.
Figure 10 is a schematic flowchart of a decoding method provided by an embodiment of the present application. As shown in Figure 10, the decoding method of this embodiment includes:
Step 1001: Decode the code stream to obtain a spliced image and spliced image information, where the spliced image information includes a first syntax element, and the spliced image is determined to be a heterogeneous mixed spliced image or a homogeneous spliced image according to the first syntax element;
In some embodiments, determining according to the first syntax element whether the spliced image is a heterogeneous mixed spliced image or a homogeneous spliced image includes: if the first syntax element is a first preset value, determining that the spliced image is a heterogeneous mixed spliced image that includes homogeneous blocks of a first expression format and of a second expression format, where the first expression format and the second expression format are different; if the first syntax element is a second preset value, determining that the spliced image is a homogeneous spliced image that includes homogeneous blocks of the first expression format; if the first syntax element is a third preset value, determining that the spliced image is a homogeneous spliced image that includes homogeneous blocks of the second expression format.
In some embodiments, the first syntax element includes a first sub-syntax element and a second sub-syntax element, and the spliced image is determined to be a heterogeneous mixed spliced image or a homogeneous spliced image according to the first sub-syntax element and the second sub-syntax element. Accordingly, determining according to the first syntax element whether the spliced image is a heterogeneous mixed spliced image or a homogeneous spliced image includes: if the first sub-syntax element is a fourth preset value, determining that the spliced image includes homogeneous blocks of the first expression format; and/or, if the second sub-syntax element is a fifth preset value, determining that the spliced image includes homogeneous blocks of the second expression format; where the first expression format and the second expression format are different.
In some embodiments, determining according to the first syntax element whether the spliced image is a heterogeneous mixed spliced image or a homogeneous spliced image further includes: if the first sub-syntax element is a sixth preset value, determining that the spliced image does not include homogeneous blocks of the first expression format; if the second sub-syntax element is a seventh preset value, determining that the spliced image does not include homogeneous blocks of the second expression format.
Specifically, determining according to the first syntax element whether the spliced image is a heterogeneous mixed spliced image or a homogeneous spliced image includes: if the first sub-syntax element is the fourth preset value and the second sub-syntax element is the fifth preset value, determining that the spliced image is a heterogeneous mixed spliced image that includes homogeneous blocks of the first expression format and homogeneous blocks of the second expression format; if the first sub-syntax element is the fourth preset value and the second sub-syntax element is the seventh preset value, determining that the spliced image is a homogeneous spliced image that includes homogeneous blocks of the first expression format; if the first sub-syntax element is the sixth preset value and the second sub-syntax element is the fifth preset value, determining that the spliced image is a homogeneous spliced image that includes homogeneous blocks of the second expression format.
In some embodiments, the first syntax element is located in the parameter set sub-stream of the code stream.
In other embodiments, the spliced image sequence parameter set corresponding to the spliced image includes the first syntax element.
In some embodiments, the at least one expression format includes at least one of multi-view video, point cloud and mesh. Specifically, the first expression format is one of multi-view video, point cloud and mesh, the second expression format is one of multi-view video, point cloud and mesh, and the first expression format differs from the second expression format.
In some embodiments, the code stream further includes a parameter set sub-stream, which includes a third syntax element; according to the third syntax element, it is determined that the code stream includes a code stream corresponding to visual media content in at least one expression format. The method further includes: decoding the parameter set sub-stream of the code stream to obtain the parameter set of the code stream, and obtaining the third syntax element from that parameter set.
In some embodiments, the method further includes: if the third syntax element is a first value, determining that the code stream includes both a code stream corresponding to visual media content in the first expression format and a code stream corresponding to visual media content in the second expression format; if the third syntax element is a second value, determining that the code stream includes a code stream corresponding to visual media content in the first expression format; if the third syntax element is a third value, determining that the code stream includes a code stream corresponding to visual media content in the second expression format.
In some embodiments, decoding the code stream to obtain at least one spliced image includes: determining, according to the third syntax element, that the code stream includes code streams corresponding to visual media content in at least two expression formats, and decoding the code stream to obtain a heterogeneous mixed spliced image.
In some embodiments, decoding the code stream to obtain at least one spliced image includes: determining, according to the third syntax element, that the code stream includes code streams corresponding to visual media content in at least two expression formats, and decoding the code stream to obtain homogeneous spliced images of at least two expression formats. That is, when the code stream includes code streams corresponding to visual media content in at least two expression formats, each expression format corresponds to one kind of homogeneous spliced image.
In some embodiments, decoding the code stream to obtain at least one spliced image includes: determining, according to the third syntax element, that the code stream includes code streams corresponding to visual media content in at least two expression formats, and decoding the code stream to obtain a heterogeneous mixed spliced image and a homogeneous spliced image of at least two expression formats. That is, when the code stream includes code streams corresponding to visual media content in at least two expression formats, the homogeneous blocks of some expression formats form a heterogeneous mixed spliced image, and the homogeneous blocks of the other expression formats form homogeneous spliced images.
In some embodiments, the heterogeneous mixed spliced image includes at least one of the following: a single-attribute heterogeneous mixed spliced image and a multi-attribute heterogeneous mixed spliced image; the homogeneous spliced image includes at least one of the following: a single-attribute homogeneous spliced image and a multi-attribute homogeneous spliced image.
In some embodiments, the code stream includes a video compression sub-stream and a spliced image information sub-stream, and decoding the code stream to obtain at least one spliced image and spliced image information includes: decoding the video compression sub-stream to obtain the at least one spliced image; and decoding the spliced image information sub-stream to obtain the spliced image information of the at least one spliced image. For example, it is determined according to the third syntax element that the code stream includes code streams corresponding to visual media content in at least two expression formats, and the video compression sub-stream is decoded to obtain a heterogeneous mixed spliced image. Alternatively, it is determined according to the third syntax element that the code stream includes code streams corresponding to visual media content in at least two expression formats, and the video compression sub-stream is decoded to obtain a heterogeneous mixed spliced image and a homogeneous spliced image. Alternatively, it is determined according to the third syntax element that the code stream includes code streams corresponding to visual media content in at least two expression formats, and the video compression sub-stream is decoded to obtain homogeneous spliced images of at least two expression formats.
Step 1002: When the spliced image is determined to be a heterogeneous mixed spliced image according to the first syntax element, split the spliced image according to its spliced image information to obtain at least two kinds of homogeneous blocks, where the at least two kinds of homogeneous blocks correspond to different visual media content expression formats;
Step 1003: When the spliced image is determined to be a homogeneous spliced image according to the first syntax element, split the spliced image according to its spliced image information to obtain one kind of homogeneous block, where the blocks of that kind correspond to the same visual media content expression format;
Step 1004: Decode and reconstruct the homogeneous blocks to obtain visual media content in at least one expression format.
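The splitting in steps 1002 and 1003 can be sketched as grouping the blocks of one spliced image by expression format and checking the result against the type signaled by the first syntax element. The data classes `Block` and `SplicedImage` and the function `split` are hypothetical; a real decoder would recover block positions from the spliced image information and crop the decoded picture accordingly.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Block:
    expression_format: str  # e.g. "multi-view video", "point cloud", "mesh"
    payload: bytes          # stand-in for the block's samples


@dataclass
class SplicedImage:
    is_heterogeneous: bool  # derived from the first syntax element (step 1001)
    blocks: List[Block]     # layout recovered from the spliced image information


def split(image: SplicedImage) -> Dict[str, List[Block]]:
    """Steps 1002/1003: group the blocks of one spliced image by format."""
    groups: Dict[str, List[Block]] = defaultdict(list)
    for block in image.blocks:
        groups[block.expression_format].append(block)
    if image.is_heterogeneous and len(groups) < 2:
        raise ValueError("a heterogeneous mixed image needs at least two formats")
    if not image.is_heterogeneous and len(groups) != 1:
        raise ValueError("a homogeneous image must contain exactly one format")
    return dict(groups)
```

Each resulting group would then be handed to the decoder for its expression format, which corresponds to step 1004.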
In some embodiments, the method further includes: when the spliced image is determined to be a heterogeneous mixed spliced image according to the first syntax element, the spliced image information further includes a second syntax element, and the expression format of the i-th block in the spliced image is determined according to the second syntax element.
For example, determining the expression format of the i-th block in the spliced image according to the second syntax element includes: if the second syntax element is an eighth preset value, determining that the expression format of the i-th block is the first expression format; if the second syntax element is a ninth preset value, determining that the expression format of the i-th block is the second expression format.
In some embodiments, the second syntax element is located in the spliced image block data unit header of the i-th block of the spliced image.
In some embodiments, decoding and reconstructing the homogeneous blocks to obtain visual media content in at least one expression format includes: if the expression format of the i-th block is the first expression format, determining that the patches in the i-th block are decoded and reconstructed using the decoding method corresponding to the first expression format, to obtain visual media content in the first expression format; if the expression format of the i-th block is the second expression format, determining that the patches in the i-th block are decoded and reconstructed using the decoding method corresponding to the second expression format, to obtain visual media content in the second expression format.
For example, the code stream is decoded to obtain a multi-view video spliced image, a point cloud spliced image and a heterogeneous mixed spliced image. The heterogeneous mixed spliced image is split according to its spliced image information, and the reconstructed multi-view video blocks and point cloud blocks are output. The multi-view video spliced image is split according to its spliced image information, and the reconstructed multi-view video blocks are output. The point cloud spliced image is split according to its spliced image information, and the reconstructed point cloud blocks are output. All of the multi-view video blocks obtained are passed through multi-view video decoding to generate the reconstructed multi-view video, and all of the point cloud blocks obtained are passed through point cloud decoding to generate the reconstructed point cloud.
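The pooling step in this example, where blocks of the same format from several spliced images feed one decoder, can be illustrated as follows. Everything here is a hypothetical sketch: blocks are reduced to (format, identifier) pairs, and the decoders are placeholder callables, not real multi-view video or point cloud decoders.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A block is represented only by (expression format, block identifier).
Block = Tuple[str, str]


def reconstruct_all(spliced_images: Iterable[List[Block]],
                    decoders: Dict[str, Callable[[List[str]], str]]) -> Dict[str, str]:
    """Pool blocks of the same format across all spliced images, then run
    each pool through the decoder registered for that format."""
    pooled: Dict[str, List[str]] = {}
    for image in spliced_images:
        for expression_format, block_id in image:
            pooled.setdefault(expression_format, []).append(block_id)
    return {fmt: decoders[fmt](ids) for fmt, ids in pooled.items()}


# Usage: one multi-view image, one point cloud image, one mixed image.
images = [
    [("mv", "a0")],
    [("pc", "b0")],
    [("mv", "c0"), ("pc", "c1")],
]
placeholder_decoders = {
    "mv": lambda ids: "MIV:" + ",".join(ids),
    "pc": lambda ids: "PC:" + ",".join(ids),
}
result = reconstruct_all(images, placeholder_decoders)
```

With these inputs, the multi-view pool receives blocks a0 and c0 and the point cloud pool receives b0 and c1, so each format is decoded once regardless of how many spliced images carried its blocks.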
With the above technical solution, for application scenarios that involve visual media content in one or more expression formats, homogeneous blocks of different expression formats are spliced into a heterogeneous mixed spliced image, homogeneous blocks of the same expression format are spliced into a homogeneous spliced image, and the resulting spliced images and spliced image information are written into the code stream. The code stream can contain both homogeneous spliced images (for example, at least one of a multi-view video spliced image, a point cloud spliced image and a mesh spliced image) and heterogeneous mixed spliced images, so the encoding and decoding method suits application scenarios with visual media content in multiple expression formats, which widens its scope of application. In addition, the spliced image information includes the first syntax element, which indicates the type of the spliced image and thus improves the efficiency with which the decoding end decodes the spliced image. Further, because homogeneous blocks of different expression formats are spliced into one heterogeneous mixed spliced image for encoding and decoding, the number of two-dimensional video codecs (such as HEVC, VVC, AVC and AVS) that need to be invoked can be reduced, which lowers the implementation cost and improves ease of use.
An embodiment of the present application further provides an encoding device. Figure 11 is a schematic block diagram of the encoding device provided by an embodiment of the present application; the encoding device 110 is applied to an encoder. As shown in Figure 11, the encoding device 110 includes:
a processing unit 1101, configured to process visual media content in at least one expression format to obtain at least one kind of homogeneous block, where different kinds of homogeneous blocks correspond to different visual media content expression formats;
a splicing unit 1102, configured to splice the at least one kind of homogeneous block to obtain at least one spliced image and spliced image information, where the spliced image information includes a first syntax element, the spliced image is determined to be a heterogeneous mixed spliced image or a homogeneous spliced image according to the first syntax element, the heterogeneous mixed spliced image includes at least two kinds of homogeneous blocks, and the homogeneous spliced image includes one kind of homogeneous block;
an encoding unit 1103, configured to encode the at least one spliced image and the spliced image information to obtain a code stream.
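On the encoder side, the signaling that the splicing unit 1102 attaches to one spliced image could be derived from the formats of the blocks placed in it. The sketch below assumes the concrete values given earlier in the description for the two-format case (vps_toolset_type 1, 2 or 3, and ath_toolset_type 0 or 1); `splice_info` is a hypothetical helper and the dictionary layout is illustrative only.

```python
from typing import Dict, List

MULTI_VIEW = "multi-view video"
POINT_CLOUD = "point cloud"


def splice_info(block_formats: List[str]) -> Dict[str, object]:
    """Derive the signaling for one spliced image from its blocks' formats."""
    kinds = set(block_formats)
    if not kinds or not kinds <= {MULTI_VIEW, POINT_CLOUD}:
        raise ValueError("expected a non-empty list of known formats")
    if kinds == {MULTI_VIEW}:
        return {"vps_toolset_type": 1}   # homogeneous, multi-view video
    if kinds == {POINT_CLOUD}:
        return {"vps_toolset_type": 2}   # homogeneous, point cloud
    # Mixed image: every block additionally carries its own type.
    return {
        "vps_toolset_type": 3,
        "ath_toolset_type": [1 if f == MULTI_VIEW else 0 for f in block_formats],
    }
```

Only the mixed case carries per-block types, which matches the decoder-side rule that the second syntax element is read only for heterogeneous mixed spliced images.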
In some embodiments, determining according to the first syntax element whether the spliced image is a heterogeneous mixed spliced image or a homogeneous spliced image includes: if the first syntax element is a first preset value, determining that the spliced image is a heterogeneous mixed spliced image that includes homogeneous blocks of a first expression format and of a second expression format, where the first expression format and the second expression format are different; if the first syntax element is a second preset value, determining that the spliced image is a homogeneous spliced image that includes homogeneous blocks of the first expression format; if the first syntax element is a third preset value, determining that the spliced image is a homogeneous spliced image that includes homogeneous blocks of the second expression format.
In some embodiments, the first syntax element includes a first sub-syntax element and a second sub-syntax element, and the spliced image is determined to be a heterogeneous mixed spliced image or a homogeneous spliced image according to the first sub-syntax element and the second sub-syntax element;
determining according to the first syntax element whether the spliced image is a heterogeneous mixed spliced image or a homogeneous spliced image includes: if the first sub-syntax element is a fourth preset value, determining that the spliced image includes homogeneous blocks of the first expression format; and/or, if the second sub-syntax element is a fifth preset value, determining that the spliced image includes homogeneous blocks of the second expression format; where the first expression format and the second expression format are different.
In some embodiments, determining according to the first syntax element whether the spliced image is a heterogeneous mixed spliced image or a homogeneous spliced image further includes: if the first sub-syntax element is a sixth preset value, determining that the spliced image does not include homogeneous blocks of the first expression format; if the second sub-syntax element is a seventh preset value, determining that the spliced image does not include homogeneous blocks of the second expression format.
In some embodiments, the first syntax element is located in the parameter set sub-stream of the code stream.
In some embodiments, the spliced image sequence parameter set corresponding to the spliced image includes the first syntax element.
In some embodiments, when the spliced image is determined to be a heterogeneous mixed spliced image according to the first syntax element, the spliced image information further includes a second syntax element, and the expression format of the i-th block in the spliced image is determined according to the second syntax element.
In some embodiments, determining the expression format of the i-th block in the spliced image according to the second syntax element includes: if the second syntax element is an eighth preset value, determining that the expression format of the i-th block is the first expression format; if the second syntax element is a ninth preset value, determining that the expression format of the i-th block is the second expression format.
In some embodiments, the second syntax element is located in the spliced image block data unit header of the i-th block of the spliced image.
In some embodiments, the encoding unit 1103 is configured to: if the expression format of the i-th block is the first expression format, determine that the patches in the i-th block are encoded using the encoding method corresponding to the first expression format, to obtain a code stream corresponding to visual media content in the first expression format; if the expression format of the i-th block is the second expression format, determine that the patches in the i-th block are encoded using the encoding method corresponding to the second expression format, to obtain a code stream corresponding to visual media content in the second expression format.
In some embodiments, the parameter set sub-stream of the code stream includes a third syntax element, and according to the third syntax element it is determined that the code stream includes a code stream corresponding to visual media content in at least one expression format.
In some embodiments, determining according to the third syntax element that the code stream includes a code stream corresponding to visual media content in at least one expression format includes: if the third syntax element is a first value, determining that the code stream includes both a code stream corresponding to visual media content in the first expression format and a code stream corresponding to visual media content in the second expression format; if the third syntax element is a second value, determining that the code stream includes a code stream corresponding to visual media content in the first expression format; if the third syntax element is a third value, determining that the code stream includes a code stream corresponding to visual media content in the second expression format.
In some embodiments, when the at least one spliced image includes a heterogeneous mixed spliced image, the third syntax element is used to indicate that the code stream includes code streams corresponding to visual media content in at least two expression formats.
In some embodiments, the encoding unit 1103 is configured to: encode the at least one spliced image to obtain a video compression sub-stream; encode the spliced image information of the at least one spliced image to obtain a spliced image information sub-stream; and combine the video compression sub-stream and the spliced image information sub-stream into the code stream.
在一些实施例中,所述至少一种表达格式包括:多视点视频、点云和网格中的至少一种。In some embodiments, the at least one expression format includes: at least one of multi-view video, point cloud, and mesh.
在一些实施例中,所述异构混合拼接图以下至少一种:单一属性异构混合拼接图和多属性异构混合拼接图;所述同构拼接图包括以下至少一种:单一属性同构拼接图和多属性同构拼接图。In some embodiments, the heterogeneous hybrid mosaic diagram includes at least one of the following: a single attribute heterogeneous hybrid mosaic diagram and a multi-attribute heterogeneous hybrid mosaic diagram; the isomorphic mosaic diagram includes at least one of the following: single attribute isomorphism Mosaic graphs and multi-attribute isomorphic mosaic graphs.
An embodiment of the present application further provides a decoding apparatus. Figure 12 is a schematic block diagram of a decoding apparatus provided by an embodiment of the present application; the decoding apparatus 120 is applied to a decoder. As shown in Figure 12, the decoding apparatus 120 includes:
a decoding unit 1201, configured to decode a bitstream to obtain a spliced image and spliced image information, wherein the spliced image information includes a first syntax element, and the spliced image is determined to be a heterogeneous hybrid spliced image or a homogeneous spliced image according to the first syntax element;
a splitting unit 1202, configured to, when the spliced image is determined to be a heterogeneous hybrid spliced image according to the first syntax element, split the spliced image according to the spliced image information of the spliced image to obtain at least two types of homogeneous blocks, wherein the at least two types of homogeneous blocks correspond to different visual media content expression formats;
the splitting unit 1202 being further configured to, when the spliced image is determined to be a homogeneous spliced image according to the first syntax element, split the spliced image according to the spliced image information of the spliced image to obtain one type of homogeneous block, wherein the one type of homogeneous block corresponds to the same visual media content expression format;
a processing unit 1203, configured to decode and reconstruct the homogeneous blocks to obtain visual media content in at least one expression format.
In some embodiments, determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or a homogeneous spliced image includes: when the first syntax element is a first preset value, determining that the spliced image is a heterogeneous hybrid spliced image including homogeneous blocks of a first expression format and of a second expression format, wherein the first expression format and the second expression format are different expression formats; when the first syntax element is a second preset value, determining that the spliced image is a homogeneous spliced image including homogeneous blocks of the first expression format; when the first syntax element is a third preset value, determining that the spliced image is a homogeneous spliced image including homogeneous blocks of the second expression format.
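By way of non-limiting illustration, the three-way determination from the first syntax element may be sketched as a simple lookup. The numeric values 0, 1 and 2 assigned to the first, second and third preset values, and every name in the sketch, are assumptions made for this example only; the embodiment above does not fix them.

```python
from enum import Enum


class ImageKind(Enum):
    HETEROGENEOUS_HYBRID = "heterogeneous hybrid spliced image"
    HOMOGENEOUS_FIRST = "homogeneous spliced image, first expression format"
    HOMOGENEOUS_SECOND = "homogeneous spliced image, second expression format"


# Hypothetical numeric values standing in for the first, second and third
# preset values; the text does not specify them.
FIRST_PRESET, SECOND_PRESET, THIRD_PRESET = 0, 1, 2

_KIND_BY_VALUE = {
    FIRST_PRESET: ImageKind.HETEROGENEOUS_HYBRID,
    SECOND_PRESET: ImageKind.HOMOGENEOUS_FIRST,
    THIRD_PRESET: ImageKind.HOMOGENEOUS_SECOND,
}


def classify_spliced_image(first_syntax_element: int) -> ImageKind:
    """Map the value of the first syntax element to the kind of spliced image."""
    if first_syntax_element not in _KIND_BY_VALUE:
        raise ValueError(f"unsupported first syntax element: {first_syntax_element}")
    return _KIND_BY_VALUE[first_syntax_element]
```

A decoder following this sketch would call `classify_spliced_image` once per spliced image before deciding how to split it.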
In some embodiments, the first syntax element includes a first sub-syntax element and a second sub-syntax element, and the spliced image is determined to be a heterogeneous hybrid spliced image or a homogeneous spliced image according to the first sub-syntax element and the second sub-syntax element;
determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or a homogeneous spliced image includes: when the first sub-syntax element is a fourth preset value, determining that the spliced image includes homogeneous blocks of a first expression format; and/or, when the second sub-syntax element is a fifth preset value, determining that the spliced image includes homogeneous blocks of a second expression format; wherein the first expression format and the second expression format are different expression formats.
In some embodiments, determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or a homogeneous spliced image further includes: when the first sub-syntax element is a sixth preset value, determining that the spliced image does not include homogeneous blocks of the first expression format; when the second sub-syntax element is a seventh preset value, determining that the spliced image does not include homogeneous blocks of the second expression format.
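As a hedged sketch of the two-flag variant, the presence flags may be combined as follows. Here 1 stands in for the fourth and fifth preset values (format present) and 0 for the sixth and seventh preset values (format absent); these values and the function name are assumptions for illustration. Treating "both present" as a heterogeneous hybrid spliced image is the reading suggested by the embodiments above.

```python
# Hypothetical values: PRESENT plays the role of the fourth/fifth preset
# values, ABSENT the role of the sixth/seventh preset values.
PRESENT, ABSENT = 1, 0


def classify_from_flags(first_sub: int, second_sub: int) -> str:
    """Combine the two sub-syntax elements into a spliced image type."""
    for flag in (first_sub, second_sub):
        if flag not in (PRESENT, ABSENT):
            raise ValueError(f"unsupported sub-syntax element value: {flag}")
    has_first = first_sub == PRESENT
    has_second = second_sub == PRESENT
    if has_first and has_second:
        return "heterogeneous hybrid"
    if has_first:
        return "homogeneous (first expression format)"
    if has_second:
        return "homogeneous (second expression format)"
    # Neither format is signalled; the embodiments do not describe this case.
    raise ValueError("spliced image contains neither expression format")
```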
In some embodiments, the first syntax element is located in the parameter set sub-bitstream of the bitstream.
In some embodiments, the spliced image sequence parameter set corresponding to the spliced image includes the first syntax element.
In some embodiments, when the spliced image is determined to be a heterogeneous hybrid spliced image according to the first syntax element, the spliced image information further includes a second syntax element, and the expression format of the i-th block in the spliced image is determined according to the second syntax element.
In some embodiments, determining the expression format of the i-th block in the spliced image according to the second syntax element includes: when the second syntax element is an eighth preset value, determining that the expression format of the i-th block is the first expression format; when the second syntax element is a ninth preset value, determining that the expression format of the i-th block is the second expression format.
In some embodiments, the second syntax element is located in the spliced image block data unit header of the i-th block of the spliced image.
In some embodiments, the processing unit 1203 is configured to: if the expression format of the i-th block is the first expression format, determine that the sub-blocks in the i-th block are decoded and reconstructed using the decoding method corresponding to the first expression format, to obtain the visual media content in the first expression format; and if the expression format of the i-th block is the second expression format, determine that the sub-blocks in the i-th block are decoded and reconstructed using the decoding method corresponding to the second expression format, to obtain the visual media content in the second expression format.
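The per-block selection of a decoding method may be sketched as a dispatch table keyed by the second syntax element. The values 0 and 1 for the eighth and ninth preset values, the two placeholder decoder functions, and the use of `bytes` for sub-blocks are all assumptions for this example; real reconstruction of each expression format is outside its scope.

```python
from typing import Callable, Dict, List

# Hypothetical numeric values for the eighth and ninth preset values.
EIGHTH_PRESET, NINTH_PRESET = 0, 1


def decode_first_format(sub_blocks: List[bytes]) -> str:
    # Placeholder for the decoding method of the first expression format.
    return f"first-format content from {len(sub_blocks)} sub-block(s)"


def decode_second_format(sub_blocks: List[bytes]) -> str:
    # Placeholder for the decoding method of the second expression format.
    return f"second-format content from {len(sub_blocks)} sub-block(s)"


DECODERS: Dict[int, Callable[[List[bytes]], str]] = {
    EIGHTH_PRESET: decode_first_format,
    NINTH_PRESET: decode_second_format,
}


def reconstruct_block(second_syntax_element: int, sub_blocks: List[bytes]) -> str:
    """Decode the sub-blocks of the i-th block with the method its format requires."""
    decoder = DECODERS.get(second_syntax_element)
    if decoder is None:
        raise ValueError(f"unsupported second syntax element: {second_syntax_element}")
    return decoder(sub_blocks)
```

Keeping the mapping in a table rather than in branches means a further expression format could be added by registering one more entry.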
In some embodiments, the parameter set sub-bitstream of the bitstream includes a third syntax element, and it is determined according to the third syntax element that the bitstream includes a bitstream corresponding to visual media content in at least one expression format.
In some embodiments, determining according to the third syntax element that the bitstream includes a bitstream corresponding to visual media content in at least one expression format includes: when the third syntax element is a first value, determining that the bitstream includes both a bitstream corresponding to visual media content in the first expression format and a bitstream corresponding to visual media content in the second expression format; when the third syntax element is a second value, determining that the bitstream includes the bitstream corresponding to visual media content in the first expression format; when the third syntax element is a third value, determining that the bitstream includes the bitstream corresponding to visual media content in the second expression format.
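For illustration only, the bitstream-level signalling carried by the third syntax element may be sketched as a mapping from its value to the set of expression formats present. The values 0, 1 and 2 for the first, second and third values, and the labels "first" and "second", are assumptions for this example.

```python
from typing import FrozenSet

# Hypothetical numeric values for the first, second and third values of the
# third syntax element.
FIRST_VALUE, SECOND_VALUE, THIRD_VALUE = 0, 1, 2

_FORMATS_BY_VALUE = {
    FIRST_VALUE: frozenset({"first", "second"}),
    SECOND_VALUE: frozenset({"first"}),
    THIRD_VALUE: frozenset({"second"}),
}


def formats_in_bitstream(third_syntax_element: int) -> FrozenSet[str]:
    """Return which expression formats the bitstream carries."""
    if third_syntax_element not in _FORMATS_BY_VALUE:
        raise ValueError(f"unsupported third syntax element: {third_syntax_element}")
    return _FORMATS_BY_VALUE[third_syntax_element]


def may_contain_heterogeneous_image(third_syntax_element: int) -> bool:
    """A heterogeneous hybrid spliced image needs at least two formats."""
    return len(formats_in_bitstream(third_syntax_element)) >= 2
```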
In some embodiments, the decoding unit 1201 is configured to determine, according to the third syntax element, that the bitstream includes bitstreams corresponding to visual media content in at least two expression formats, and to decode the bitstream to obtain a heterogeneous hybrid spliced image.
In some embodiments, the bitstream includes a compressed video sub-bitstream and a spliced image information sub-bitstream, and the decoding unit 1201 is configured to: decode the compressed video sub-bitstream to obtain the at least one spliced image; and decode the spliced image information sub-bitstream to obtain the spliced image information of the at least one spliced image.
In some embodiments, the at least one expression format includes at least one of: multi-view video, point cloud, and mesh.
In some embodiments, the heterogeneous hybrid spliced image includes at least one of: a single-attribute heterogeneous hybrid spliced image and a multi-attribute heterogeneous hybrid spliced image; the homogeneous spliced image includes at least one of: a single-attribute homogeneous spliced image and a multi-attribute homogeneous spliced image.
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and for similar descriptions reference may be made to the method embodiments. To avoid repetition, details are not repeated here.
The apparatuses and systems of the embodiments of the present application are described above from the perspective of functional units with reference to the accompanying drawings. It should be understood that a functional unit may be implemented in hardware, by instructions in software form, or by a combination of hardware and software units. Specifically, each step of the method embodiments of the present application may be completed by an integrated logic circuit of hardware in a processor and/or by instructions in software form. The steps of the methods disclosed in the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software units in a decoding processor. Optionally, the software unit may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in a memory; the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.
In practical applications, an embodiment of the present application further provides an encoder. Figure 13 is a schematic block diagram of an encoder provided by an embodiment of the present application. As shown in Figure 13, the encoder 1310 includes:
a second memory 1320 and a second processor 1330, wherein the second memory 1320 stores a computer program executable on the second processor 1330, and the second processor 1330 implements the encoder-side encoding method when executing the program.
In practical applications, an embodiment of the present application further provides a decoder. Figure 14 is a schematic block diagram of a decoder provided by an embodiment of the present application. As shown in Figure 14, the decoder 1410 includes:
a first memory 1420 and a first processor 1430, wherein the first memory 1420 stores a computer program executable on the first processor 1430, and the first processor 1430 implements the decoder-side decoding method when executing the program.
In some embodiments of the present application, the processor may include, but is not limited to:
a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
In some embodiments of the present application, the memory includes, but is not limited to:
volatile memory and/or non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
In addition, the functional modules in this embodiment may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional module.
In yet another embodiment of the present application, refer to Figure 15, which shows a schematic structural diagram of a codec system provided by an embodiment of the present application. As shown in Figure 15, the codec system 150 may include an encoder 1501 and a decoder 1502. The encoder 1501 may be a device integrating the encoding apparatus described in the foregoing embodiments; the decoder 1502 may be a device integrating the decoding apparatus described in the foregoing embodiments.
In this embodiment of the present application, in the codec system 150, both the encoder 1501 and the decoder 1502 can use the color component information of adjacent reference pixels and of the pixel to be predicted to calculate the weighting coefficients corresponding to the pixel to be predicted. Different reference pixels may have different weighting coefficients, and applying these weighting coefficients to the chroma prediction of the pixels to be predicted in the current block can improve the accuracy of chroma prediction, save bit rate, and improve encoding and decoding performance.
An embodiment of the present application further provides a chip for implementing the above encoding and decoding methods. Specifically, the chip includes a processor configured to call and run a computer program from a memory, so that an electronic device in which the chip is installed performs the above encoding and decoding methods.
An embodiment of the present application further provides a computer storage medium storing a computer program. When the computer program is executed by the second processor, the encoding method of the encoder is implemented; or, when the computer program is executed by the first processor, the decoding method of the decoder is implemented. In other words, an embodiment of the present application further provides a computer program product containing instructions which, when executed by a computer, cause the computer to perform the methods of the above method embodiments.
The present application further provides a bitstream generated according to the above encoding method. Optionally, the bitstream includes the first syntax element described above, or includes the second syntax element and the third syntax element.
When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed in this application can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
The above is only a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (38)

  1. A decoding method, comprising:
    decoding a bitstream to obtain a spliced image and spliced image information, wherein the spliced image information includes a first syntax element, and the spliced image is determined to be a heterogeneous hybrid spliced image or a homogeneous spliced image according to the first syntax element;
    when the spliced image is determined to be a heterogeneous hybrid spliced image according to the first syntax element, splitting the spliced image according to the spliced image information of the spliced image to obtain at least two types of homogeneous blocks, wherein the at least two types of homogeneous blocks correspond to different visual media content expression formats;
    when the spliced image is determined to be a homogeneous spliced image according to the first syntax element, splitting the spliced image according to the spliced image information of the spliced image to obtain one type of homogeneous block, wherein the one type of homogeneous block corresponds to the same visual media content expression format;
    decoding and reconstructing the homogeneous blocks to obtain visual media content in at least one expression format.
  2. The method according to claim 1, wherein determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or a homogeneous spliced image comprises:
    when the first syntax element is a first preset value, determining that the spliced image is a heterogeneous hybrid spliced image including homogeneous blocks of a first expression format and of a second expression format, wherein the first expression format and the second expression format are different expression formats;
    when the first syntax element is a second preset value, determining that the spliced image is a homogeneous spliced image including homogeneous blocks of the first expression format;
    when the first syntax element is a third preset value, determining that the spliced image is a homogeneous spliced image including homogeneous blocks of the second expression format.
  3. The method according to claim 1, wherein the first syntax element comprises a first sub-syntax element and a second sub-syntax element, and the spliced image is determined to be a heterogeneous hybrid spliced image or a homogeneous spliced image according to the first sub-syntax element and the second sub-syntax element;
    determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or a homogeneous spliced image comprises:
    when the first sub-syntax element is a fourth preset value, determining that the spliced image includes homogeneous blocks of a first expression format;
    when the second sub-syntax element is a fifth preset value, determining that the spliced image includes homogeneous blocks of a second expression format;
    wherein the first expression format and the second expression format are different expression formats.
  4. The method according to claim 3, wherein determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or a homogeneous spliced image further comprises:
    when the first sub-syntax element is a sixth preset value, determining that the spliced image does not include homogeneous blocks of the first expression format;
    when the second sub-syntax element is a seventh preset value, determining that the spliced image does not include homogeneous blocks of the second expression format.
  5. 根据权利要求1-4任一项所述的方法,其中,所述第一语法元素位于所述码流的参数集子码流。The method according to any one of claims 1 to 4, wherein the first syntax element is located in a parameter set sub-code stream of the code stream.
  6. 根据权利要求1-4任一项所述的方法,其中,所述拼接图对应的拼接图序列参数集包括所述第一语法元素。The method according to any one of claims 1 to 4, wherein the mosaic graph sequence parameter set corresponding to the mosaic graph includes the first syntax element.
  7. 根据权利要求1-6任一项所述的方法,其中,所述方法还包括:The method according to any one of claims 1-6, wherein the method further includes:
    根据所述第一语法元素确定所述拼接图为异构混合拼接图时,所述拼接图信息还包括第二语法元素,根据所述第二语法元素确定所述拼接图中第i个区块的表达格式。When the mosaic image is determined to be a heterogeneous hybrid mosaic image based on the first syntax element, the mosaic image information also includes a second syntax element, and the i-th block in the mosaic image is determined based on the second syntax element. expression format.
  8. The method according to claim 7, wherein the determining the expression format of the i-th block in the spliced image according to the second syntax element comprises:
    if the second syntax element is an eighth preset value, determining that the expression format of the i-th block is the first expression format;
    if the second syntax element is a ninth preset value, determining that the expression format of the i-th block is the second expression format.
  9. The method according to claim 7, wherein the second syntax element is located in a spliced image block data unit header of the i-th block of the spliced image.
  10. The method according to any one of claims 7 to 9, wherein the decoding and reconstructing the homogeneous blocks to obtain visual media content in at least one expression format comprises:
    if the expression format of the i-th block is the first expression format, determining that sub-blocks in the i-th block are decoded and reconstructed using a decoding method corresponding to the first expression format, to obtain visual media content in the first expression format;
    if the expression format of the i-th block is the second expression format, determining that sub-blocks in the i-th block are decoded and reconstructed using a decoding method corresponding to the second expression format, to obtain visual media content in the second expression format.
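
The per-block dispatch of claims 8 and 10 might be sketched as below. This is a minimal illustration under stated assumptions: the values 0 and 1 stand in for the eighth and ninth preset values, and the two decoder functions are hypothetical placeholders that only describe their input rather than reconstruct any media.

```python
from typing import Callable, Dict, List, Tuple

# Assumed values: the claims only speak of "preset values".
FIRST_FORMAT = 0   # stands in for the eighth preset value
SECOND_FORMAT = 1  # stands in for the ninth preset value


def decode_first_format(payload: bytes) -> str:
    # Hypothetical placeholder for the decoding method of the first format.
    return f"first-format content ({len(payload)} bytes)"


def decode_second_format(payload: bytes) -> str:
    # Hypothetical placeholder for the decoding method of the second format.
    return f"second-format content ({len(payload)} bytes)"


DECODERS: Dict[int, Callable[[bytes], str]] = {
    FIRST_FORMAT: decode_first_format,
    SECOND_FORMAT: decode_second_format,
}


def reconstruct_blocks(blocks: List[Tuple[int, bytes]]) -> List[str]:
    """Decode each block with the method selected by its second syntax element.

    Each entry of `blocks` is (second_syntax_element, block_payload).
    """
    results = []
    for index, (second_syntax_element, payload) in enumerate(blocks):
        decoder = DECODERS.get(second_syntax_element)
        if decoder is None:
            raise ValueError(
                f"block {index}: unknown second syntax element {second_syntax_element}"
            )
        results.append(decoder(payload))
    return results
```

Keeping the mapping in a table rather than an if/else chain means a further expression format would only need one more entry, although the claims here cover two formats.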
  11. The method according to any one of claims 1 to 10, wherein a parameter set sub-bitstream of the bitstream includes a third syntax element, and it is determined according to the third syntax element that the bitstream includes a bitstream corresponding to visual media content in at least one expression format.
  12. The method according to claim 11, wherein the determining, according to the third syntax element, that the bitstream includes a bitstream corresponding to visual media content in at least one expression format comprises:
    if the third syntax element is a first value, determining that the bitstream includes both a bitstream corresponding to visual media content in the first expression format and a bitstream corresponding to visual media content in the second expression format;
    if the third syntax element is a second value, determining that the bitstream includes a bitstream corresponding to visual media content in the first expression format;
    if the third syntax element is a third value, determining that the bitstream includes a bitstream corresponding to visual media content in the second expression format.
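
The mapping in claim 12 can be illustrated with a small lookup table. In this sketch the concrete numbers 0, 1, and 2 for the first, second, and third values, the format labels, and both function names are assumptions, since the claims leave them open.

```python
from typing import FrozenSet

# Assumed numbering of the "first", "second" and "third" values of claim 12.
FORMATS_BY_THIRD_SYNTAX_ELEMENT = {
    0: frozenset({"first", "second"}),  # both expression formats present
    1: frozenset({"first"}),            # only the first expression format
    2: frozenset({"second"}),           # only the second expression format
}


def formats_in_bitstream(third_syntax_element: int) -> FrozenSet[str]:
    """Return the expression formats signaled by the third syntax element."""
    try:
        return FORMATS_BY_THIRD_SYNTAX_ELEMENT[third_syntax_element]
    except KeyError:
        raise ValueError(
            f"unknown third syntax element value: {third_syntax_element}"
        ) from None


def signals_multiple_formats(third_syntax_element: int) -> bool:
    """True when at least two expression formats are signaled."""
    return len(formats_in_bitstream(third_syntax_element)) >= 2
```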
  13. The method according to any one of claims 11 to 12, wherein the decoding the bitstream to obtain at least one spliced image comprises:
    determining, according to the third syntax element, that the bitstream includes bitstreams corresponding to visual media content in at least two expression formats, and decoding the bitstream to obtain a heterogeneous hybrid spliced image.
  14. The method according to any one of claims 1 to 13, wherein the bitstream includes a video compression sub-bitstream and a spliced image information sub-bitstream, and the decoding the bitstream to obtain at least one spliced image and spliced image information comprises:
    decoding the video compression sub-bitstream to obtain the at least one spliced image;
    decoding the spliced image information sub-bitstream to obtain the spliced image information of the at least one spliced image.
  15. The method according to any one of claims 1 to 14, wherein the at least one expression format includes at least one of: multi-view video, point cloud, and mesh.
  16. The method according to any one of claims 1 to 15, wherein the heterogeneous hybrid spliced image includes at least one of the following: a single-attribute heterogeneous hybrid spliced image and a multi-attribute heterogeneous hybrid spliced image;
    the homogeneous spliced image includes at least one of the following: a single-attribute homogeneous spliced image and a multi-attribute homogeneous spliced image.
  17. An encoding method, comprising:
    processing visual media content in at least one expression format to obtain at least one type of homogeneous block, wherein different types of homogeneous blocks correspond to different visual media content expression formats;
    splicing the at least one type of homogeneous block to obtain at least one spliced image and spliced image information, wherein the spliced image information includes a first syntax element, the spliced image is determined according to the first syntax element to be a heterogeneous hybrid spliced image or a homogeneous spliced image, the heterogeneous hybrid spliced image includes at least two types of homogeneous blocks, and the homogeneous spliced image includes one type of homogeneous block;
    encoding the at least one spliced image and the spliced image information to obtain a bitstream.
  18. The method according to claim 17, wherein the determining, according to the first syntax element, that the spliced image is a heterogeneous hybrid spliced image or a homogeneous spliced image comprises:
    if the first syntax element is a first preset value, determining that the spliced image is a heterogeneous hybrid spliced image including homogeneous blocks of a first expression format and of a second expression format, wherein the first expression format and the second expression format are different expression formats;
    if the first syntax element is a second preset value, determining that the spliced image is a homogeneous spliced image including homogeneous blocks of the first expression format;
    if the first syntax element is a third preset value, determining that the spliced image is a homogeneous spliced image including homogeneous blocks of the second expression format.
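
On the encoder side, the three-valued first syntax element of claim 18 could be chosen from the set of expression formats actually spliced into the image. The sketch below is only illustrative: the numbers 0, 1, and 2 for the first to third preset values, the format labels, and the function name are assumptions rather than anything the claims fix.

```python
from typing import Iterable

# Assumed values for the first, second and third preset values of claim 18.
FIRST_PRESET = 0   # heterogeneous hybrid: both expression formats
SECOND_PRESET = 1  # homogeneous: first expression format only
THIRD_PRESET = 2   # homogeneous: second expression format only


def first_syntax_element_for(block_formats: Iterable[str]) -> int:
    """Pick the first syntax element from the formats of the spliced blocks.

    `block_formats` holds one label per homogeneous block, either
    "first" or "second" (hypothetical labels for the two formats).
    """
    formats = set(block_formats)
    if formats == {"first", "second"}:
        return FIRST_PRESET
    if formats == {"first"}:
        return SECOND_PRESET
    if formats == {"second"}:
        return THIRD_PRESET
    raise ValueError(f"unsupported combination of formats: {sorted(formats)}")
```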
  19. The method according to claim 17, wherein the first syntax element includes a first sub-syntax element and a second sub-syntax element, and the spliced image is determined according to the first sub-syntax element and the second sub-syntax element to be a heterogeneous hybrid spliced image or a homogeneous spliced image;
    the determining, according to the first syntax element, that the spliced image is a heterogeneous hybrid spliced image or a homogeneous spliced image comprises:
    if the first sub-syntax element is a fourth preset value, determining that the spliced image includes homogeneous blocks of a first expression format;
    if the second sub-syntax element is a fifth preset value, determining that the spliced image includes homogeneous blocks of a second expression format;
    wherein the first expression format and the second expression format are different expression formats.
  20. The method according to claim 19, wherein the determining, according to the first syntax element, that the spliced image is a heterogeneous hybrid spliced image or a homogeneous spliced image further comprises:
    if the first sub-syntax element is a sixth preset value, determining that the spliced image does not include homogeneous blocks of the first expression format;
    if the second sub-syntax element is a seventh preset value, determining that the spliced image does not include homogeneous blocks of the second expression format.
  21. The method according to any one of claims 17 to 20, wherein the first syntax element is located in a parameter set sub-bitstream of the bitstream.
  22. The method according to any one of claims 17 to 20, wherein a spliced image sequence parameter set corresponding to the spliced image includes the first syntax element.
  23. The method according to any one of claims 17 to 22, wherein the method further comprises:
    when the spliced image is determined, according to the first syntax element, to be a heterogeneous hybrid spliced image, the spliced image information further includes a second syntax element, and an expression format of an i-th block in the spliced image is determined according to the second syntax element.
  24. The method according to claim 23, wherein the determining the expression format of the i-th block in the spliced image according to the second syntax element comprises:
    if the second syntax element is an eighth preset value, determining that the expression format of the i-th block is the first expression format;
    if the second syntax element is a ninth preset value, determining that the expression format of the i-th block is the second expression format.
  25. The method according to claim 24, wherein the second syntax element is located in a spliced image block data unit header of the i-th block of the spliced image.
  26. The method according to any one of claims 23 to 25, wherein the encoding the at least one spliced image and the spliced image information to obtain a bitstream comprises:
    if the expression format of the i-th block is the first expression format, determining that sub-blocks in the i-th block are encoded using an encoding method corresponding to the first expression format, to obtain a bitstream corresponding to visual media content in the first expression format;
    if the expression format of the i-th block is the second expression format, determining that sub-blocks in the i-th block are encoded using an encoding method corresponding to the second expression format, to obtain a bitstream corresponding to visual media content in the second expression format.
  27. The method according to any one of claims 17 to 26, wherein a parameter set sub-bitstream of the bitstream includes a third syntax element, and it is determined according to the third syntax element that the bitstream includes a bitstream corresponding to visual media content in at least one expression format.
  28. The method according to claim 27, wherein the determining, according to the third syntax element, that the bitstream includes a bitstream corresponding to visual media content in at least one expression format comprises:
    if the third syntax element is a first value, determining that the bitstream includes both a bitstream corresponding to visual media content in the first expression format and a bitstream corresponding to visual media content in the second expression format;
    if the third syntax element is a second value, determining that the bitstream includes a bitstream corresponding to visual media content in the first expression format;
    if the third syntax element is a third value, determining that the bitstream includes a bitstream corresponding to visual media content in the second expression format.
  29. The method according to any one of claims 27 to 28, wherein, when the at least one spliced image includes a heterogeneous hybrid spliced image, it is determined according to the third syntax element that the bitstream includes bitstreams corresponding to visual media content in at least two expression formats.
  30. The method according to any one of claims 17 to 29, wherein the encoding the at least one spliced image and the spliced image information to obtain a bitstream comprises:
    encoding the at least one spliced image to obtain a video compression sub-bitstream;
    encoding the spliced image information of the at least one spliced image to obtain a spliced image information sub-bitstream;
    combining the video compression sub-bitstream and the spliced image information sub-bitstream into the bitstream.
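
The combining step of claim 30, and the matching separation implied by claim 14, can be pictured with a toy container. The layout used here (each sub-bitstream preceded by a 4-byte big-endian length) is an assumption invented for this sketch and is not the bitstream format of the application. Both function names are hypothetical.

```python
import struct
from typing import Tuple


def combine_sub_bitstreams(video_sub: bytes, info_sub: bytes) -> bytes:
    """Concatenate the two sub-bitstreams, each with a 4-byte length prefix."""
    return (
        struct.pack(">I", len(video_sub)) + video_sub
        + struct.pack(">I", len(info_sub)) + info_sub
    )


def split_sub_bitstreams(bitstream: bytes) -> Tuple[bytes, bytes]:
    """Recover (video_sub, info_sub) from the toy container above."""
    parts = []
    offset = 0
    for _ in range(2):
        if offset + 4 > len(bitstream):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">I", bitstream, offset)
        offset += 4
        part = bitstream[offset:offset + length]
        if len(part) != length:
            raise ValueError("truncated sub-bitstream")
        parts.append(part)
        offset += length
    if offset != len(bitstream):
        raise ValueError("unexpected trailing bytes")
    return parts[0], parts[1]
```

A decoder following claim 14 would pass the first part to the video decoder to obtain the spliced images and the second part to the parser that recovers the spliced image information.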
  31. The method according to any one of claims 17 to 30, wherein the at least one expression format includes at least one of: multi-view video, point cloud, and mesh.
  32. The method according to any one of claims 17 to 31, wherein the heterogeneous hybrid spliced image includes at least one of the following: a single-attribute heterogeneous hybrid spliced image and a multi-attribute heterogeneous hybrid spliced image;
    the homogeneous spliced image includes at least one of the following: a single-attribute homogeneous spliced image and a multi-attribute homogeneous spliced image.
  33. A decoding apparatus, comprising:
    a decoding unit, configured to decode a bitstream to obtain a spliced image and spliced image information, wherein the spliced image information includes a first syntax element, and the spliced image is determined according to the first syntax element to be a heterogeneous hybrid spliced image or a homogeneous spliced image;
    a splitting unit, configured to, when the spliced image is determined according to the first syntax element to be a heterogeneous hybrid spliced image, split the spliced image according to the spliced image information of the spliced image to obtain at least two types of homogeneous blocks, wherein the at least two types of homogeneous blocks correspond to different visual media content expression formats;
    the splitting unit being further configured to, when the spliced image is determined according to the first syntax element to be a homogeneous spliced image, split the spliced image according to the spliced image information of the spliced image to obtain one type of homogeneous block, wherein the one type of homogeneous block corresponds to the same visual media content expression format;
    a processing unit, configured to decode and reconstruct the homogeneous blocks to obtain visual media content in at least one expression format.
  34. An encoding apparatus, comprising:
    a processing unit, configured to process visual media content in at least one expression format to obtain at least one type of homogeneous block, wherein different types of homogeneous blocks correspond to different visual media content expression formats;
    a splicing unit, configured to splice the at least one type of homogeneous block to obtain at least one spliced image and spliced image information, wherein the spliced image information includes a first syntax element, the spliced image is determined according to the first syntax element to be a heterogeneous hybrid spliced image or a homogeneous spliced image, the heterogeneous hybrid spliced image includes at least two types of homogeneous blocks, and the homogeneous spliced image includes one type of homogeneous block;
    an encoding unit, configured to encode the at least one spliced image and the spliced image information to obtain a bitstream.
  35. A decoder, wherein the decoder comprises:
    a first memory and a first processor;
    the first memory stores a computer program executable on the first processor, and the first processor, when executing the program, implements the decoding method according to any one of claims 1 to 16.
  36. An encoder, wherein the encoder comprises:
    a second memory and a second processor;
    the second memory stores a computer program executable on the second processor, and the second processor, when executing the program, implements the encoding method according to any one of claims 17 to 32.
  37. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program; when executed by a first processor, the computer program implements the decoding method according to any one of claims 1 to 16; or, when executed by a second processor, the computer program implements the encoding method according to any one of claims 17 to 32.
  38. A bitstream, wherein the bitstream is generated based on the method according to any one of claims 17 to 32.
PCT/CN2022/105006 2022-07-11 2022-07-11 Coding method and apparatus, decoding method and apparatus, and coder, decoder and storage medium WO2024011386A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/105006 WO2024011386A1 (en) 2022-07-11 2022-07-11 Coding method and apparatus, decoding method and apparatus, and coder, decoder and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/105006 WO2024011386A1 (en) 2022-07-11 2022-07-11 Coding method and apparatus, decoding method and apparatus, and coder, decoder and storage medium

Publications (1)

Publication Number Publication Date
WO2024011386A1 (en) 2024-01-18

Family

ID=89535246

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/105006 WO2024011386A1 (en) 2022-07-11 2022-07-11 Coding method and apparatus, decoding method and apparatus, and coder, decoder and storage medium

Country Status (1)

Country Link
WO (1) WO2024011386A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112188180A (en) * 2019-07-05 2021-01-05 浙江大学 Method and device for processing sub-block images
US20210209347A1 (en) * 2020-01-02 2021-07-08 Sony Corporation Texture map generation using multi-viewpoint color images
CN114071116A (en) * 2020-07-31 2022-02-18 阿里巴巴集团控股有限公司 Video processing method and device, electronic equipment and storage medium
CN114189697A (en) * 2021-12-03 2022-03-15 腾讯科技(深圳)有限公司 Video data processing method and device and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22950517

Country of ref document: EP

Kind code of ref document: A1