WO2024011386A1 - Coding method and apparatus, decoding method and apparatus, and coder, decoder and storage medium

Info

Publication number: WO2024011386A1
Application number: PCT/CN2022/105006
Authority: WO
Inventors: 虞露; 金峡钶; 朱志伟; 戴震宇
Original assignee: 浙江大学; Oppo广东移动通信有限公司
Priority date: 2022-07-11
Filing date: 2022-07-11
Publication date: 2024-01-18
Also published as: TW202408245A

Abstract

The present application provides a coding method and apparatus, a decoding method and apparatus, and a coder, a decoder and a storage medium. For an application scenario that comprises visual media content in one or more expression formats, isomorphic blocks in different expression formats are stitched into a heterogeneous hybrid stitched image, isomorphic blocks in the same expression format are stitched into an isomorphic stitched image, and the obtained stitched images and stitched image information are written into a code stream. An isomorphic stitched image (e.g. at least one of a multi-view stitched image, a point cloud stitched image and a grid stitched image) and a heterogeneous hybrid stitched image are simultaneously present in a code stream, such that the coding method and the decoding method are applicable to the application scenarios of visual media content in a plurality of expression formats, thereby expanding the application scope. Moreover, the code stream includes a first syntactic element, such that the efficiency of a decoding end decoding a stitched image can be improved. Since isomorphic blocks in different expression formats are stitched in a heterogeneous hybrid stitched image for coding and decoding, the number of decoders invoked can be reduced, thereby reducing the implementation cost and improving the usability.

Description

A coding and decoding method, device, encoder, decoder and storage medium

Technical field

The present application relates to the field of image processing technology, and in particular to a coding and decoding method, device, encoder, decoder and storage medium.

Background technique

In three-dimensional application scenarios such as virtual reality (VR), augmented reality (AR), and mixed reality (MR), visual media objects in different expression formats may appear in the same scene. For example, in the same three-dimensional scene, the scene background and some characters and objects are expressed as video, while other characters are expressed as three-dimensional point clouds or three-dimensional meshes (grids).

For compression, encoding each format separately with multi-viewpoint video encoding, point cloud encoding, and mesh encoding preserves the effective information of the original expression formats better than projecting everything into multi-viewpoint video for encoding. This improves the quality of the viewport rendered during viewing and the overall rate-quality efficiency.

However, current encoding and decoding technology encodes and decodes multi-viewpoint video, point clouds, and meshes separately. A large number of codecs must be invoked during encoding and decoding, which makes encoding and decoding costly.

Contents of the invention

Embodiments of the present application provide a coding and decoding method, device, encoder, decoder, and storage medium.

In the first aspect, this application provides a decoding method applied to a decoder, including:

Decode the code stream to obtain a spliced image and spliced image information, wherein the spliced image information includes a first syntax element, and determine according to the first syntax element whether the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image;

When it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image, split the spliced image according to its spliced image information to obtain at least two types of isomorphic blocks, wherein the at least two types of isomorphic blocks correspond to different visual media content expression formats;

When it is determined according to the first syntax element that the spliced image is an isomorphic spliced image, split the spliced image according to its spliced image information to obtain one type of isomorphic block, wherein the one type of isomorphic block corresponds to a single visual media content expression format;

Decode and reconstruct the isomorphic blocks to obtain visual media content in at least one expression format.
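The decoding flow of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the normative syntax: the field name `first_syntax_element`, its value convention (1 for a heterogeneous hybrid spliced image, 0 for an isomorphic, i.e. homogeneous, spliced image), and the `Region`, `SplicedImageInfo`, and `split_spliced_image` names are all hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from typing import Dict, List

ISOMORPHIC = 0     # assumed value: the spliced image holds one expression format
HETEROGENEOUS = 1  # assumed value: the spliced image mixes expression formats


@dataclass
class Region:
    """Hypothetical description of one block inside a spliced image."""
    fmt: str  # expression format, e.g. "multiview", "point_cloud", "mesh"
    x: int
    y: int
    w: int
    h: int


@dataclass
class SplicedImageInfo:
    """Hypothetical container for the spliced image information."""
    first_syntax_element: int
    regions: List[Region]


def split_spliced_image(image, info):
    """Split a decoded spliced image (a list of pixel rows) into blocks grouped by format."""
    blocks: Dict[str, list] = {}
    for r in info.regions:
        block = [row[r.x:r.x + r.w] for row in image[r.y:r.y + r.h]]
        blocks.setdefault(r.fmt, []).append(block)
    # Check that the signalled type agrees with the formats actually present.
    if info.first_syntax_element == HETEROGENEOUS and len(blocks) < 2:
        raise ValueError("a heterogeneous hybrid spliced image needs at least two formats")
    if info.first_syntax_element == ISOMORPHIC and len(blocks) != 1:
        raise ValueError("an isomorphic spliced image must contain exactly one format")
    return blocks
```

In this sketch the first syntax element lets the decoder decide, before touching any block, whether the blocks of one spliced image must be routed to one reconstruction path or to several.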

In the second aspect, this application provides an encoding method, applied to the encoder, including:

Process the visual media content of at least one expression format to obtain at least one isomorphic block, where different types of isomorphic blocks correspond to different visual media content expression formats;

Splice the at least one isomorphic block to obtain at least one spliced image and spliced image information, wherein the spliced image information includes a first syntax element, it is determined according to the first syntax element whether the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image, the heterogeneous hybrid spliced image includes at least two types of isomorphic blocks, and the isomorphic spliced image includes one type of isomorphic block;

Encode the at least one spliced image and the spliced image information to obtain a code stream.
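The splicing step of the second aspect can be illustrated with a short sketch. It assumes a deliberately simple layout (blocks stacked vertically and zero-padded to a common width) and a hypothetical dictionary layout for the spliced image information; the function name `splice_blocks`, the key `first_syntax_element`, and the 1/0 value convention are assumptions, not the syntax defined by the application.

```python
def splice_blocks(blocks):
    """Stack (format, rows) blocks vertically; return the spliced image and its info.

    Each block is a tuple (fmt, rows), where rows is a list of equal-width pixel rows.
    """
    if not blocks:
        raise ValueError("at least one block is required")
    width = max(len(rows[0]) for _, rows in blocks)
    image, regions, y = [], [], 0
    for fmt, rows in blocks:
        for row in rows:
            image.append(list(row) + [0] * (width - len(row)))  # zero-pad to common width
        regions.append({"fmt": fmt, "x": 0, "y": y, "w": len(rows[0]), "h": len(rows)})
        y += len(rows)
    formats = {fmt for fmt, _ in blocks}
    info = {
        # assumed convention: 1 = heterogeneous hybrid, 0 = isomorphic (homogeneous)
        "first_syntax_element": 1 if len(formats) >= 2 else 0,
        "regions": regions,
    }
    return image, info
```

The returned image would then be handed to a single 2D video encoder, and the information dictionary would be serialized alongside it in the code stream.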

In a third aspect, this application provides a decoding device, applied to a decoder, which includes:

A decoding unit, configured to decode the code stream to obtain a spliced image and spliced image information, wherein the spliced image information includes a first syntax element, and to determine according to the first syntax element whether the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image;

A first splitting unit, configured to, when it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image, split the spliced image according to its spliced image information to obtain at least two types of isomorphic blocks, wherein the at least two types of isomorphic blocks correspond to different visual media content expression formats;

A second splitting unit, configured to, when it is determined according to the first syntax element that the spliced image is an isomorphic spliced image, split the spliced image according to its spliced image information to obtain one type of isomorphic block, wherein the one type of isomorphic block corresponds to a single visual media content expression format;

A processing unit, configured to decode and reconstruct the isomorphic blocks to obtain visual media content in at least one expression format.

In a fourth aspect, this application provides an encoding device, applied to an encoder, which includes:

A processing unit, configured to process visual media content in at least one expression format to obtain at least one isomorphic block, wherein different types of isomorphic blocks correspond to different visual media content expression formats;

A splicing unit, configured to splice the at least one isomorphic block to obtain at least one spliced image and spliced image information, wherein the spliced image information includes a first syntax element, it is determined according to the first syntax element whether the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image, the heterogeneous hybrid spliced image includes at least two types of isomorphic blocks, and the isomorphic spliced image includes one type of isomorphic block;

An encoding unit, configured to encode the at least one spliced image and the spliced image information to obtain a code stream.

In a fifth aspect, a decoder is provided, including a first memory and a first processor; the first memory stores a computer program executable on the first processor to perform the method in the above first aspect or its implementations.

In a sixth aspect, an encoder is provided, including a second memory and a second processor; the second memory stores a computer program executable on the second processor to perform the method in the above second aspect or its implementations.

The seventh aspect provides a coding and decoding system, including an encoder and a decoder. The encoder is configured to perform the method in the above second aspect or its implementations, and the decoder is used to perform the method in the above first aspect or its implementations.

An eighth aspect provides a chip for implementing the method in any one of the above first to second aspects or their implementations. Specifically, the chip includes a processor configured to call and run a computer program from a memory, so that a device in which the chip is installed executes the method in any one of the above first to second aspects or their implementations.

A ninth aspect provides a computer-readable storage medium for storing a computer program that causes a computer to execute any one of the above-mentioned first to second aspects or the method in each implementation thereof.

In a tenth aspect, a computer program product is provided, including computer program instructions, which enable a computer to execute any one of the above-mentioned first to second aspects or the methods in each implementation thereof.

An eleventh aspect provides a computer program that, when run on a computer, causes the computer to execute any one of the above-mentioned first to second aspects or the method in each implementation thereof.

A twelfth aspect provides a code stream, which is generated based on the encoding method of the second aspect.

Based on the above technical solution, for application scenarios that include visual media content in one or more expression formats, isomorphic blocks in different expression formats are spliced into a heterogeneous hybrid spliced image, isomorphic blocks in the same expression format are spliced into an isomorphic spliced image, and the resulting spliced images and spliced image information are written into the code stream. Isomorphic spliced images (such as at least one of multi-viewpoint spliced images, point cloud spliced images, and mesh spliced images) and heterogeneous hybrid spliced images are both present in the code stream, so the encoding and decoding method is suitable for application scenarios with visual media content in multiple expression formats, which expands its application scope. Moreover, the spliced image information includes the first syntax element, which indicates the type of the spliced image and improves the efficiency with which the decoding end decodes the spliced image. Furthermore, since isomorphic blocks of different expression formats are spliced into one heterogeneous hybrid spliced image for encoding and decoding, the number of 2D video codecs such as HEVC, VVC, AVC, and AVS that need to be invoked can be reduced, which reduces implementation cost and improves usability.

Description of drawings

Figure 1 is a schematic block diagram of a video encoding and decoding system related to an embodiment of the present application;

Figure 2A is a schematic block diagram of a video encoder involved in an embodiment of the present application;

Figure 2B is a schematic block diagram of a video decoder involved in an embodiment of the present application;

Figure 3A is a diagram of the organization and expression framework of multi-viewpoint video data;

Figure 3B is a schematic diagram of splicing image generation of multi-viewpoint video data;

Figure 3C is a diagram of the organization and expression framework of point cloud data;

Figures 3D to 3F are schematic diagrams of different types of point cloud data;

Figure 4 is a schematic diagram of multi-viewpoint video encoding;

Figure 5 is a schematic diagram of decoding multi-viewpoint video;

Figure 6 is a schematic flow chart of an encoding method provided by an embodiment of the present application;

Figure 7 is a schematic diagram of a heterogeneous hybrid splicing diagram provided by an embodiment of the present application;

Figure 8 is a schematic diagram of an isomorphic splicing diagram provided by an embodiment of the present application;

Figure 9 is a schematic diagram of the V3C bitstream structure provided by the embodiment of the present application;

Figure 10 is a schematic flow chart of a decoding method provided by an embodiment of the present application;

Figure 11 is a schematic block diagram of an encoding device provided by an embodiment of the present application;

Figure 12 is a schematic block diagram of a decoding device provided by an embodiment of the present application;

Figure 13 is a schematic block diagram of an encoder provided by an embodiment of the present application;

Figure 14 is a schematic block diagram of a decoder provided by an embodiment of the present application;

Figure 15 is a schematic structural diagram of a coding and decoding system provided by an embodiment of the present application.

Detailed description

This application can be applied to the fields of image encoding and decoding, video encoding and decoding, hardware video encoding and decoding, dedicated-circuit video encoding and decoding, real-time video encoding and decoding, and so on. For example, the solution of this application can be combined with the audio video coding standard (AVS), the H.264/advanced video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard, and the H.266/versatile video coding (VVC) standard. Alternatively, the solution of this application can operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions. It should be understood that the technology of this application is not limited to any specific codec standard or technology.

The high-degree-of-freedom immersive coding system can be roughly divided into the following links according to the task line: data collection, data organization and expression, data encoding and compression, data decoding and reconstruction, data synthesis and rendering, and finally presenting the target data to the user.

The encoding involved in the embodiments of the present application is mainly video encoding and decoding. To facilitate understanding, the video encoding and decoding system involved in the embodiments of the present application is first introduced with reference to Figure 1.

Figure 1 is a schematic block diagram of a video encoding and decoding system related to an embodiment of the present application. It should be noted that Figure 1 is only an example, and the video encoding and decoding system in the embodiments of the present application includes but is not limited to what is shown in Figure 1. As shown in Figure 1, the video encoding and decoding system 100 includes an encoding device 110 and a decoding device 120. The encoding device encodes the video data (which can be understood as compression) to generate a code stream and transmits the code stream to the decoding device. The decoding device decodes the code stream generated by the encoding device to obtain decoded video data.

The encoding device 110 in the embodiments of the present application can be understood as a device with a video encoding function, and the decoding device 120 can be understood as a device with a video decoding function. That is, the encoding device 110 and the decoding device 120 cover a wide range of devices, for example smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.

In some embodiments, the encoding device 110 may transmit the encoded video data (e.g., a code stream) to the decoding device 120 via the channel 130. Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120.

In one example, channel 130 includes one or more communication media that enables encoding device 110 to transmit encoded video data directly to decoding device 120 in real time. In this example, encoding device 110 may modulate the encoded video data according to the communication standard and transmit the modulated video data to decoding device 120. The communication media includes wireless communication media, such as radio frequency spectrum. Optionally, the communication media may also include wired communication media, such as one or more physical transmission lines.

In another example, channel 130 includes a storage medium that can store video data encoded by encoding device 110. Storage media include a variety of locally accessible data storage media, such as optical disks, DVDs, and flash memories. In this example, the decoding device 120 may obtain the encoded video data from the storage medium.

In another example, channel 130 may include a storage server that may store video data encoded by encoding device 110. In this example, the decoding device 120 may download the stored encoded video data from the storage server. Optionally, the storage server, such as a web server (e.g., for a website) or a File Transfer Protocol (FTP) server, may store the encoded video data and transmit it to the decoding device 120.

In some embodiments, the encoding device 110 includes a video encoder 112 and an output interface 113. The output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.

In some embodiments, the encoding device 110 may include a video source 111 in addition to the video encoder 112 and the output interface 113.

Video source 111 may include at least one of a video capture device (e.g., a video camera), a video archive, a video input interface for receiving video data from a video content provider, and a computer graphics system for generating video data.

The video encoder 112 encodes the video data from the video source 111 to generate a code stream. Video data may include one or more pictures or a sequence of pictures. The code stream contains the encoding information of a picture or picture sequence in the form of a bit stream. Encoded information may include encoded image data and associated data. The associated data may include a sequence parameter set (SPS), a picture parameter set (PPS), and other syntax structures. An SPS can contain parameters that apply to one or more sequences. A PPS can contain parameters that apply to one or more pictures. A syntax structure refers to a collection of zero or more syntax elements arranged in a specified order in a code stream.

The video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113. The encoded video data can also be stored on a storage medium or storage server for subsequent reading by the decoding device 120.

In some embodiments, decoding device 120 includes input interface 121 and video decoder 122. In some embodiments, in addition to the input interface 121 and the video decoder 122, the decoding device 120 may also include a display device 123.

The input interface 121 includes a receiver and/or a modem. Input interface 121 may receive encoded video data over channel 130.

The video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123.

The display device 123 displays the decoded video data. Display device 123 may be integrated with decoding device 120 or external to decoding device 120. Display device 123 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.

In addition, Figure 1 is only an example, and the technical solution of the embodiment of the present application is not limited to Figure 1. For example, the technology of the present application can also be applied to unilateral video encoding or unilateral video decoding.

The video coding framework involved in the embodiments of this application is introduced below.

FIG. 2A is a schematic block diagram of a video encoder related to an embodiment of the present application. It should be understood that the video encoder 200 can be used to perform lossy compression of images, or lossless compression of images. The lossless compression can be visually lossless compression or mathematically lossless compression.

The video encoder 200 can be applied to image data in a luminance-chrominance (YCbCr, YUV) format. For example, the YUV sampling ratio can be 4:2:0, 4:2:2, or 4:4:4, where Y represents luminance (luma), Cb (U) represents blue chroma, and Cr (V) represents red chroma; U and V together represent chrominance (chroma), which describes color and saturation. In terms of color format, 4:2:0 means that every 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr), 4:2:2 means that every 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr), and 4:4:4 means full-resolution chroma, with 4 luminance components and 8 chrominance components per 4 pixels (YYYYCbCrCbCrCbCrCbCr).
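The per-four-pixel sample counts described above translate directly into raw frame sizes. The sketch below is illustrative only; the function names are hypothetical, and it assumes planar storage, whole bytes per sample, and frame dimensions for which the chroma planes divide evenly.

```python
def samples_per_four_pixels(chroma_format):
    """Return (luma, cb, cr) sample counts for a group of four pixels."""
    table = {
        "4:2:0": (4, 1, 1),  # YYYY Cb Cr
        "4:2:2": (4, 2, 2),  # YYYY Cb Cr Cb Cr
        "4:4:4": (4, 4, 4),  # YYYY Cb Cr Cb Cr Cb Cr Cb Cr
    }
    return table[chroma_format]


def frame_bytes(width, height, chroma_format, bit_depth=8):
    """Uncompressed frame size in bytes (assumes planar storage and even dimensions)."""
    luma, cb, cr = samples_per_four_pixels(chroma_format)
    samples = width * height * (luma + cb + cr) // 4
    return samples * ((bit_depth + 7) // 8)
```

For a 1920×1080 frame at 8 bits, this gives 1.5 bytes per pixel for 4:2:0, 2 bytes per pixel for 4:2:2, and 3 bytes per pixel for 4:4:4.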

For example, the video encoder 200 reads video data and, for each frame of image in the video data, divides the frame into several coding tree units (CTUs). In some examples, a CTU may be called a "tree block", a "largest coding unit" (LCU), or a "coding tree block" (CTB). Each CTU can be associated with an equal-sized block of pixels within the image. Each pixel can correspond to one luminance (luma) sample and two chrominance (chroma) samples. Therefore, each CTU can be associated with one block of luma samples and two blocks of chroma samples. A CTU size is, for example, 128×128, 64×64, or 32×32. A CTU can be further divided into several coding units (CUs) for encoding, and a CU can be a rectangular block or a square block. A CU can be further divided into prediction units (PUs) and transform units (TUs), which separates coding, prediction, and transformation and makes processing more flexible. In one example, the CTU is divided into CUs in a quad-tree manner, and the CU is divided into TUs and PUs in a quad-tree manner.
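The CTU grid and the quad-tree division into CUs can be sketched as follows. The function names and the `should_split` callback are hypothetical; a real encoder would typically decide each split by comparing rate-distortion costs, which this sketch does not model.

```python
import math


def ctu_grid(width, height, ctu_size=64):
    """Number of CTU columns and rows needed to cover a frame (edge CTUs may be partial)."""
    return math.ceil(width / ctu_size), math.ceil(height / ctu_size)


def quadtree_split(x, y, size, min_size, should_split):
    """Recursively split a square block into four quadrants; return leaf CUs as (x, y, size)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_split(x + dx, y + dy, half, min_size, should_split)
        return leaves
    return [(x, y, size)]
```

For example, `ctu_grid(1920, 1080, 64)` covers a 1080p frame with 30 columns and 17 rows of CTUs, the last row being only partly inside the picture.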

Video encoders and video decoders can support various PU sizes. Assuming that the size of a specific CU is 2N×2N, the video encoder and video decoder can support a PU size of 2N×2N or N×N for intra prediction, and support 2N×2N, 2N×N, N×2N, N×N or similar sized symmetric PU for inter prediction. The video encoder and video decoder can also support 2N×nU, 2N×nD, nL×2N and nR×2N asymmetric PUs for inter prediction.
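The PU shapes listed above can be enumerated for a concrete CU size. The sketch below assumes the HEVC convention for the asymmetric modes, in which one PU covers a quarter of the CU and the other covers the remaining three quarters; the text does not spell out these proportions, and the function name is hypothetical.

```python
def pu_partitions(cu_size, mode):
    """Return PU rectangles (x, y, w, h) for a 2N×2N CU whose side length is cu_size."""
    s, h, q = cu_size, cu_size // 2, cu_size // 4
    table = {
        "2Nx2N": [(0, 0, s, s)],
        "2NxN": [(0, 0, s, h), (0, h, s, h)],
        "Nx2N": [(0, 0, h, s), (h, 0, h, s)],
        "NxN": [(0, 0, h, h), (h, 0, h, h), (0, h, h, h), (h, h, h, h)],
        # asymmetric modes (assumed quarter / three-quarter split)
        "2NxnU": [(0, 0, s, q), (0, q, s, s - q)],
        "2NxnD": [(0, 0, s, s - q), (0, s - q, s, q)],
        "nLx2N": [(0, 0, q, s), (q, 0, s - q, s)],
        "nRx2N": [(0, 0, s - q, s), (s - q, 0, q, s)],
    }
    return table[mode]
```

Whatever the mode, the PUs tile the CU exactly, so their areas always sum to the CU area.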

In some embodiments, as shown in FIG. 2A, the video encoder 200 may include a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filtering unit 260, a decoded image cache 270, and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer, or different functional components.

Optionally, in this application, the current block may be called the current coding unit (CU) or the current prediction unit (PU), etc. The prediction block may also be called a predicted image block or an image prediction block, and the reconstructed image block may also be called a reconstruction block or an image reconstruction block.

In some embodiments, prediction unit 210 includes inter prediction unit 211 and intra estimation unit 212. Since there is a strong correlation between adjacent pixels in a video frame, the intra-frame prediction method is used in video encoding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Since there is a strong similarity between adjacent frames in the video, the interframe prediction method is used in video coding and decoding technology to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.

The inter prediction unit 211 can be used for inter prediction, which may include motion estimation and motion compensation and can refer to image information from different frames. Inter prediction uses motion information to find a reference block in a reference frame and generates a prediction block based on the reference block, thereby eliminating temporal redundancy. The frames used in inter prediction can be P frames and/or B frames, where P frames are forward-predicted frames and B frames are bidirectionally predicted frames. The motion information includes the reference frame list in which the reference frame is located, the reference frame index, and the motion vector. The motion vector can have integer-pixel or sub-pixel precision. If the motion vector has sub-pixel precision, interpolation filtering must be applied in the reference frame to produce the required sub-pixel block. Here, the integer-pixel or sub-pixel block found in the reference frame according to the motion vector is called a reference block. Some technologies use the reference block directly as the prediction block, and others process the reference block further to generate the prediction block. Processing the reference block to generate a prediction block can also be understood as taking the reference block as a prediction block and then processing it to generate a new prediction block.
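Fetching a reference block with an integer-pixel or sub-pixel motion vector can be sketched as below. This is a simplified illustration: motion vectors are assumed to be in half-pixel units, the interpolation is a bilinear average rather than the longer filters used by real codecs, and the function name and border clamping are assumptions of this sketch.

```python
def reference_block(ref_frame, x, y, w, h, mv_x, mv_y):
    """Fetch the w×h reference block for the block at (x, y); the MV is in half-pixel units.

    Odd motion-vector components select half-pixel positions, produced here by
    bilinear averaging of the neighbouring integer pixels. Coordinates outside the
    frame are clamped to the nearest border pixel.
    """
    rows, cols = len(ref_frame), len(ref_frame[0])

    def pixel(px, py):
        return ref_frame[min(max(py, 0), rows - 1)][min(max(px, 0), cols - 1)]

    block = []
    for j in range(h):
        row = []
        for i in range(w):
            fx, fy = 2 * (x + i) + mv_x, 2 * (y + j) + mv_y  # position in half-pixel units
            x0, y0 = fx // 2, fy // 2
            x1, y1 = x0 + (fx % 2), y0 + (fy % 2)
            total = pixel(x0, y0) + pixel(x1, y0) + pixel(x0, y1) + pixel(x1, y1)
            row.append((total + 2) // 4)  # rounded average of the four neighbours
        block.append(row)
    return block
```

With an even (integer-pixel) motion vector the four neighbours coincide and the block is copied unchanged; with an odd component the result is interpolated between adjacent pixels.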

The intra-frame estimation unit 212 only refers to the information of the same frame image and predicts the pixel information in the current coded image block to eliminate spatial redundancy. The frames used in intra prediction may be I frames.

Intra prediction has multiple prediction modes. Taking the international digital video coding standard H series as an example, the H.264/AVC standard has 8 angular prediction modes and 1 non-angular prediction mode, and H.265/HEVC extends this to 33 angular prediction modes and 2 non-angular prediction modes. The intra prediction modes used by HEVC are the planar mode (Planar), DC, and 33 angular modes, for a total of 35 prediction modes. The intra modes used by VVC are Planar, DC, and 65 angular modes, for a total of 67 prediction modes.
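To make the difference between non-angular and angular modes concrete, the sketch below implements a basic DC mode and the purely vertical angular mode. It is a simplification under stated assumptions: the function names are hypothetical, all neighbouring samples are assumed to be available, and the boundary smoothing that the standards apply is omitted.

```python
def dc_prediction(top, left, size):
    """DC intra prediction: fill a size×size block with the rounded mean of the neighbours.

    top and left are the reconstructed samples directly above and to the left of the block.
    """
    neighbours = list(top[:size]) + list(left[:size])
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * size for _ in range(size)]


def vertical_prediction(top, size):
    """Purely vertical angular mode: copy the row above into every row of the block."""
    return [list(top[:size]) for _ in range(size)]
```

Both functions use only samples from the same frame, which is why intra prediction removes spatial rather than temporal redundancy.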

It should be noted that with the increase of angle modes, intra-frame prediction will be more accurate and more in line with the development needs of high-definition and ultra-high-definition digital videos.

Residual unit 220 may generate a residual block of the CU based on the pixel block of the CU and the prediction block of the PU of the CU. For example, residual unit 220 may generate the residual block such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in the prediction block of the PU of the CU.
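The sample-wise difference described above amounts to a few lines of code. The function name is hypothetical, and blocks are represented as lists of rows for illustration.

```python
def residual_block(original, prediction):
    """Sample-wise difference between the original pixel block and its prediction block."""
    return [
        [o - p for o, p in zip(orow, prow)]
        for orow, prow in zip(original, prediction)
    ]
```

A good prediction leaves a residual that is mostly zeros or small values, which is what makes the later transform, quantization, and entropy coding stages effective.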

Transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with the TU of the CU based on quantization parameter (QP) values associated with the CU. Video encoder 200 may adjust the degree of quantization applied to transform coefficients associated with the CU by adjusting the QP value associated with the CU.

Inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct the residual block from the quantized transform coefficients.
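The relationship between the QP value, quantization, and inverse quantization can be sketched as follows. The step-size formula is the common H.264/HEVC-style approximation in which the step doubles every 6 QP values; it is an assumption of this sketch and is not stated in the text, and real codecs use integer scaling tables instead of floating-point arithmetic.

```python
def q_step(qp):
    """Approximate quantization step size: doubles every 6 QP (assumed convention)."""
    return 2 ** ((qp - 4) / 6)


def quantize(coeffs, qp):
    """Map transform coefficients to integer levels, rounding half away from zero."""
    step = q_step(qp)
    return [int(abs(c) / step + 0.5) * (1 if c >= 0 else -1) for c in coeffs]


def dequantize(levels, qp):
    """Scale integer levels back to approximate transform coefficients."""
    step = q_step(qp)
    return [lv * step for lv in levels]
```

Raising the QP enlarges the step, so more coefficients collapse to zero and the reconstructed residual deviates further from the original; this is how adjusting the QP adjusts the degree of quantization.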

Reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by prediction unit 210 to produce a reconstructed image block associated with the TU. By reconstructing blocks of samples for each TU of a CU in this manner, video encoder 200 can reconstruct blocks of pixels of the CU.

The loop filtering unit 260 processes the inversely transformed and inversely quantized pixels to compensate for distortion and provide a better reference for subsequently encoded pixels. For example, a deblocking filtering operation can be performed to reduce the blocking artifacts of pixel blocks associated with the CU.

In some embodiments, the loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive offset/adaptive loop filtering (SAO/ALF) unit, where the deblocking filtering unit is used to remove blocking artifacts and the SAO/ALF unit is used to remove ringing artifacts.

Decoded image cache 270 may store reconstructed pixel blocks. Inter prediction unit 211 may perform inter prediction on PUs of other images using reference images containing reconstructed pixel blocks. Additionally, intra estimation unit 212 may use the reconstructed pixel blocks in decoded image cache 270 to perform intra prediction on other PUs in the same image as the CU.

Entropy encoding unit 280 may receive the quantized transform coefficients from transform/quantization unit 230. Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.

FIG. 2B is a schematic block diagram of a video decoder related to an embodiment of the present application.

As shown in FIG. 2B, the video decoder 300 includes an entropy decoding unit 310, a prediction unit 320, an inverse quantization/transformation unit 330, a reconstruction unit 340, a loop filtering unit 350, and a decoded image cache 360. It should be noted that the video decoder 300 may include more, fewer, or different functional components.

Video decoder 300 can receive the code stream. Entropy decoding unit 310 may parse the codestream to extract syntax elements from the codestream. As part of parsing the code stream, the entropy decoding unit 310 may parse entropy-encoded syntax elements in the code stream. The prediction unit 320, the inverse quantization/transformation unit 330, the reconstruction unit 340 and the loop filtering unit 350 may decode the video data according to the syntax elements extracted from the code stream, that is, generate decoded video data.

In some embodiments, prediction unit 320 includes inter prediction unit 321 and intra estimation unit 322.

Intra estimation unit 322 may perform intra prediction to generate predicted blocks for the PU. Intra estimation unit 322 may use an intra prediction mode to generate predicted blocks for a PU based on pixel blocks of spatially neighboring PUs. Intra estimation unit 322 may also determine the intra prediction mode of the PU based on one or more syntax elements parsed from the codestream.

The inter prediction unit 321 may construct a first reference image list (List 0) and a second reference image list (List 1) according to syntax elements parsed from the code stream. Additionally, if the PU uses inter-prediction encoding, entropy decoding unit 310 may parse the motion information of the PU. Inter prediction unit 321 may determine one or more reference blocks for the PU based on the motion information of the PU. Inter prediction unit 321 may generate a predictive block for the PU based on one or more reference blocks of the PU.

Inverse quantization/transform unit 330 may inversely quantize (i.e., dequantize) transform coefficients associated with a TU. Inverse quantization/transform unit 330 may use the QP value associated with the CU of the TU to determine the degree of quantization.

After inversely quantizing the transform coefficients, inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients to produce a residual block associated with the TU.

Reconstruction unit 340 uses the residual blocks associated with the TU of the CU and the prediction blocks of the PU of the CU to reconstruct the pixel blocks of the CU. For example, reconstruction unit 340 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the pixel block of the CU to obtain a reconstructed image block.
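The sample-wise addition performed by the reconstruction unit can be sketched as follows. This is a minimal illustration only: the function name `reconstruct_block`, the use of nested lists for blocks, and the clipping to an assumed 8-bit sample range are assumptions made for the example and are not taken from the present application.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add each residual sample to the co-located prediction sample.

    The result is clipped to [0, 2**bit_depth - 1]; the clipping step and
    the default bit depth are assumptions made for this sketch.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


# Hypothetical 2x2 prediction block and residual block.
prediction = [[100, 102], [250, 3]]
residual = [[-4, 5], [10, -7]]
reconstructed = reconstruct_block(prediction, residual)
print(reconstructed)
```

In this example the two samples that would fall outside the 8-bit range (250 + 10 and 3 - 7) are clipped to 255 and 0 respectively.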

Loop filtering unit 350 may perform deblocking filtering operations to reduce blocking artifacts for blocks of pixels associated with the CU.

Video decoder 300 may store the reconstructed image of the CU in decoded image cache 360. The video decoder 300 may use the reconstructed image in the decoded image cache 360 as a reference image for subsequent prediction, or transmit the reconstructed image to a display device for presentation.

The basic process of video encoding and decoding is as follows: at the encoding end, an image frame is divided into blocks. For the current block, the prediction unit 210 uses intra prediction or inter prediction to generate a prediction block of the current block. The residual unit 220 may calculate a residual block based on the prediction block and the original block of the current block, that is, the difference between the original block and the prediction block of the current block. The residual block may also be called residual information. The residual block is transformed and quantized by the transformation/quantization unit 230 to remove information to which the human eye is insensitive, thereby eliminating visual redundancy. Optionally, the residual block before transformation and quantization by the transformation/quantization unit 230 may be called a time domain residual block, and the time domain residual block after transformation and quantization by the transformation/quantization unit 230 may be called a frequency residual block or frequency domain residual block. The entropy encoding unit 280 receives the quantized transform coefficients output from the transformation/quantization unit 230, and may perform entropy encoding on the quantized transform coefficients to output a code stream. For example, the entropy encoding unit 280 may eliminate character redundancy according to the target context model and probability information of the binary code stream.

At the decoding end, the entropy decoding unit 310 can parse the code stream to obtain the prediction information, quantization coefficient matrix, etc. of the current block. The prediction unit 320 uses intra prediction or inter prediction for the current block based on the prediction information to generate a prediction block of the current block. The inverse quantization/transform unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block. The reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstruction block. The reconstructed blocks constitute a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or based on the blocks to obtain a decoded image. The encoding end also needs similar operations as the decoding end to obtain the decoded image. The decoded image may also be called a reconstructed image, and the reconstructed image may be used as a reference frame for inter-frame prediction for subsequent frames.
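The reason the encoding end repeats the decoder-side reconstruction can be illustrated with a simplified round trip. The sketch below is an assumption-laden toy: it omits the transform and entropy coding entirely, uses a plain scalar quantizer with a hypothetical step size, and the names `encode_block` and `decode_block` are invented for the example.

```python
def encode_block(original, prediction, step):
    """Compute the residual (original - prediction) and quantize it.

    Transform and entropy coding are deliberately omitted in this sketch.
    """
    residual = [
        [o - p for o, p in zip(orig_row, pred_row)]
        for orig_row, pred_row in zip(original, prediction)
    ]
    return [[round(r / step) for r in row] for row in residual]


def decode_block(levels, prediction, step):
    """Dequantize the levels and add them back to the prediction block."""
    residual = [[level * step for level in row] for row in levels]
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


# Hypothetical 2x2 blocks and quantization step.
original = [[53, 59], [61, 47]]
prediction = [[50, 50], [50, 50]]
step = 4

levels = encode_block(original, prediction, step)

# The encoding end runs the same reconstruction as the decoding end, so
# both ends hold an identical (lossy) reference for later prediction.
encoder_side = decode_block(levels, prediction, step)
decoder_side = decode_block(levels, prediction, step)
print(levels, decoder_side)
```

Because quantization is lossy, the reconstructed block differs from the original block, but the encoder-side and decoder-side reconstructions are identical, which is what keeps subsequent inter prediction in sync.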

It should be noted that the block division information determined by the encoding end, as well as mode information or parameter information such as prediction, transformation, quantization, entropy coding, loop filtering, etc., are carried in the code stream when necessary. By parsing the code stream and analyzing existing information, the decoding end determines the same block division information and the same prediction, transformation, quantization, entropy coding, loop filtering and other mode information or parameter information as the encoding end, thereby ensuring that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.

The above is the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of the framework or process may be optimized. This application is applicable to the basic process of the video codec under the block-based hybrid coding framework, but is not limited to this framework and process.

In some application scenarios, multiple heterogeneous contents appear simultaneously in the same three-dimensional scene, such as multi-viewpoint videos and point clouds. For this situation, the current encoding and decoding methods include at least the following two:

Method 1: For multi-viewpoint videos, MPEG (Moving Picture Experts Group) immersive video (MIV) technology is used for encoding and decoding, and for point clouds, video-based point cloud compression (Video based Point Cloud Compression, VPCC for short) technology is used for encoding and decoding.

MIV technology and VPCC technology are introduced below.

MIV technology: In order to reduce the transmission pixel rate while retaining as much scene information as possible, so that there is enough information for rendering the target view, the scheme adopted by MPEG-I is shown in Figure 3A. A limited number of viewpoints are selected as base viewpoints that express the visible range of the scene as fully as possible. The base viewpoints are transmitted as complete images, and the redundant pixels between the remaining non-base viewpoints and the base viewpoints are removed, that is, only the effective information that is not expressed repeatedly is retained. The effective information is then extracted into sub-block images, which are reorganized together with the base viewpoint images to form a larger rectangular image, called a spliced image. Figure 3A and Figure 3B give a schematic process of generating a spliced image. The spliced image is sent to the codec for compression and reconstruction, and the auxiliary data related to the sub-block image splicing information is also sent to the encoder to form a code stream.

The encoding method of VPCC is to project point clouds into two-dimensional images or videos, so that the encoding of three-dimensional information is converted into the encoding of two-dimensional information. Figure 3C is the coding block diagram of VPCC. The code stream is roughly divided into four parts. The geometric code stream is the code stream generated by geometric depth map encoding, which is used to represent the geometric information of the point cloud; the attribute code stream is the code stream generated by texture map encoding, which is used to represent the attribute information of the point cloud; the occupancy code stream is the code stream generated by occupancy map encoding, which is used to indicate the effective area in the depth map and texture map. These three types of videos are all encoded and decoded using video codecs, as shown in Figure 3D to Figure 3F. The auxiliary information code stream is the code stream generated by encoding the auxiliary information of the sub-block image, which is the part related to the patch data unit in the V3C standard, indicating the position and size of each sub-block image.

Method 2: Multi-viewpoint videos and point clouds are encoded and decoded using the frame packing technology in Visual Volumetric Video-based Coding (V3C).

The following is an introduction to frame packing technology.

Taking multi-viewpoint video as an example, as shown in Figure 4, the encoding end includes the following steps:

Step 1: When encoding the acquired multi-view video, perform some pre-processing to generate multi-view video sub-blocks (patch). Then, organize the multi-view video sub-blocks to generate a multi-view video splicing image.

For example, as shown in Figure 4, multi-viewpoint videos are input into TMIV for packaging, and a multi-viewpoint video splicing image is output. TMIV is a reference software for MIV. Packaging in the embodiment of this application can be understood as splicing.

Among them, the multi-viewpoint video mosaic includes a multi-view video texture mosaic and a multi-view video geometry mosaic, that is, it only contains multi-view video sub-blocks.

Step 2: Input the multi-viewpoint video splicing image into the frame packer and output the multi-viewpoint video mixed splicing image.

Among them, the multi-viewpoint video hybrid splicing image includes a multi-viewpoint video texture blending splicing image, a multi-viewpoint video geometry blending splicing image, and a multi-viewpoint video texture and geometry blending splicing image.

Specifically, as shown in Figure 4, the multi-viewpoint video splicing image is frame packed to generate a multi-viewpoint video hybrid splicing image. Each multi-viewpoint video splicing image occupies a region of the multi-viewpoint video hybrid splicing image. Correspondingly, a flag pin_region_type_id_minus2 must be transmitted for each region in the code stream. This flag records whether the current region belongs to a multi-viewpoint video texture splicing map or a multi-viewpoint video geometric splicing map. This information needs to be used at the decoding end.

Step 3: Use a video encoder to encode the multi-viewpoint video mixed splicing image to obtain a code stream.

For example, as shown in Figure 5, the decoding end includes the following steps:

Step 1: During multi-viewpoint video decoding, input the obtained code stream into the video decoder for decoding to obtain a reconstructed multi-viewpoint video mixed splicing image.

Step 2: Input the reconstructed multi-viewpoint video mixed splicing image into the frame depacker and output the reconstructed multi-viewpoint video splicing image.

Specifically, first, the flag pin_region_type_id_minus2 is obtained from the code stream. If it is determined that the pin_region_type_id_minus2 is V3C_AVD, it means that the current region is a multi-viewpoint video texture mosaic, and then the current region is split and output as a reconstructed multi-viewpoint video texture mosaic.

If it is determined that pin_region_type_id_minus2 is V3C_GVD, it means that the current region is a multi-viewpoint video geometric mosaic, and the current region is split and output as a reconstructed multi-viewpoint video geometric mosaic.
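The region-by-region dispatch performed by the frame depacker can be sketched as follows. This is only an illustrative sketch: the numeric codes for V3C_GVD and V3C_AVD are assumed to follow the V3C unit type table, the "+ 2" offset is inferred from the "_minus2" suffix of the flag name, and the region fields `x`, `y`, `width` and `height`, as well as the function name `unpack_regions`, are hypothetical stand-ins for the actual packing information syntax.

```python
# Assumed unit type codes (taken to match the V3C unit type table).
V3C_GVD = 3  # geometry video data
V3C_AVD = 4  # attribute (texture) video data


def unpack_regions(packed_frame, regions):
    """Split a reconstructed hybrid splicing image into its regions.

    packed_frame: 2-D list of samples.
    regions: list of dicts with hypothetical keys x, y, width, height and
             the flag pin_region_type_id_minus2.
    """
    outputs = {"texture": [], "geometry": []}
    for region in regions:
        # Assumption: the signalled value is the region type minus 2.
        region_type = region["pin_region_type_id_minus2"] + 2
        x, y = region["x"], region["y"]
        w, h = region["width"], region["height"]
        crop = [row[x:x + w] for row in packed_frame[y:y + h]]
        if region_type == V3C_AVD:
            outputs["texture"].append(crop)
        elif region_type == V3C_GVD:
            outputs["geometry"].append(crop)
        else:
            raise ValueError(f"unsupported region type {region_type}")
    return outputs


# Hypothetical 2x4 hybrid image: left half texture, right half geometry.
frame = [[1, 2, 3, 4], [5, 6, 7, 8]]
regions = [
    {"x": 0, "y": 0, "width": 2, "height": 2, "pin_region_type_id_minus2": 2},
    {"x": 2, "y": 0, "width": 2, "height": 2, "pin_region_type_id_minus2": 1},
]
result = unpack_regions(frame, regions)
print(result)
```

Each cropped region is then handed to the decoding step that corresponds to its type.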

Step 3: Decode the reconstructed multi-viewpoint video splicing image to obtain the reconstructed multi-viewpoint video.

Specifically, the multi-viewpoint video texture splicing image and the multi-viewpoint video geometric splicing image are decoded to obtain the reconstructed multi-viewpoint video.

The above uses multi-viewpoint video as an example to introduce frame packing technology. The frame packing encoding and decoding method for point clouds is basically the same as that for multi-viewpoint video described above, to which reference may be made. For example, TMC (a VPCC reference software) is used to package the point cloud to obtain a point cloud splicing image. The point cloud splicing image is input into the frame packer for frame packing to obtain a point cloud hybrid splicing image. The point cloud hybrid splicing image is encoded to obtain a point cloud code stream. Details are not repeated here.

The following is an introduction to the syntax related to frame packing in the standard.

The V3C unit header syntax is shown in Table 1:

Table 1

V3C unit header semantics, as shown in Table 2:

Table 2: V3C unit types

Currently, if multiple visual media contents with different expression formats appear simultaneously in the same three-dimensional scene, the visual media contents with different expression formats are encoded and decoded separately. For example, for the situation where point cloud and multi-viewpoint video appear simultaneously in the same three-dimensional scene, the current packaging technology is to compress the point cloud to form a point cloud compressed code stream (i.e. a V3C code stream), and to compress the multi-viewpoint video information to obtain a multi-view video compressed code stream (i.e. another V3C code stream), and then the system layer multiplexes the compressed code streams to obtain a fused three-dimensional scene multiplexed code stream. During decoding, the point cloud compressed code stream and the multi-viewpoint video compressed code stream are decoded separately. It can be seen from this that when encoding and decoding visual media content in multiple different expression formats, the existing technology uses many codecs and the encoding and decoding cost is high.

In order to solve the above technical problems, the embodiments of the present application splice homogeneous blocks with different expression formats into a heterogeneous mixed splicing diagram, and splice homogeneous blocks with the same expression format into a homogeneous splicing diagram. The resulting heterogeneous hybrid splicing images and/or homogeneous splicing images are encoded and written into the code stream. Homogeneous splicing images (such as at least one of multi-viewpoint splicing images, point cloud splicing images and grid splicing images) and heterogeneous hybrid splicing images can coexist in the code stream, which expands the application scenarios of the encoding and decoding methods. Moreover, the splicing picture information includes a first syntax element used to indicate the type of the splicing picture, which can improve the decoding efficiency of the splicing picture at the decoding end.

The video encoding method provided by the embodiment of the present application will be introduced below with reference to Figure 6, taking the encoding end as an example.

Figure 6 is a schematic flow chart of the encoding method provided by the embodiment of the present application. As shown in Figure 6, the encoding method includes:

Step 601: Process the visual media content of at least one expression format to obtain at least one isomorphic block, where different types of isomorphic blocks correspond to different visual media content expression formats;

In three-dimensional application scenarios, such as virtual reality (VR), augmented reality (AR), mixed reality (MR) and other application scenarios, visual media objects with different expression formats may appear in the same scene. For example, in the same three-dimensional scene, the scene background and some characters and objects are expressed in video, and another part of the characters are expressed in three-dimensional point cloud or three-dimensional grid.

In some embodiments, the visual media content includes visual media content in at least one expression format such as multi-view video, point cloud, and grid. A special example of multi-viewpoint video is single-viewpoint video, that is, the multi-viewpoint video may include multiple viewpoint videos and/or single-viewpoint video.

Among them, one isomorphic block corresponds to one expression format. Exemplarily, the expression format corresponding to at least one isomorphic block includes at least one of the following: multi-view video, point cloud, and grid. At least two isomorphic blocks correspond to at least two different expression formats. For example, the at least two isomorphic blocks in the embodiment of the present application include isomorphic blocks of at least two different expression formats such as multi-view video, point cloud, grid, etc.

It should be noted that in the embodiment of the present application, each type of isomorphic block may include at least one isomorphic block with the same expression format. Exemplarily, a homogeneous block in point cloud format includes one or more point cloud blocks, a homogeneous block in multi-viewpoint video format includes one or more multi-viewpoint video blocks, and a homogeneous block in grid format includes one or more grid blocks.

In some embodiments, step 601 may be: processing visual media content in one expression format to obtain a homogeneous block. In some embodiments, step 601 may include: processing visual media content in at least two expression formats to obtain at least two isomorphic blocks, where different visual media content corresponds to different expression formats. Specifically, the visual media content in the first expression format is processed to obtain isomorphic blocks in the first expression format; the visual media content in the second expression format is processed to obtain isomorphic blocks in the second expression format. Wherein, the first expression format is one of multi-view video, point cloud, and grid, the second expression format is one of multi-view video, point cloud, and grid, and the first expression format and the second expression format are different expression formats.

That is to say, the above-mentioned visual media content includes visual media content in at least one expression format such as multi-viewpoint video, point cloud, grid, etc. When visual media content of an expression format is included, the visual media content is processed to obtain isomorphic blocks of an expression format. When visual media content of multiple expression formats is included, the visual media content is processed to obtain isomorphic blocks of multiple expression formats.

It should be noted that blocks can also be called tiles or strips, that is, point cloud blocks can also be called point cloud strips, multi-viewpoint video blocks can also be called multi-viewpoint video strips, and grid blocks can also be called grid strips. The block may be a mosaic of a specific shape, for example, a mosaic of a rectangular area with a specific length and/or height. For example, at least one sub-tile can be spliced in an orderly manner, such as from large to small according to the area of the sub-tiles, or from large to small according to the length and/or height of the sub-tiles, to obtain the block corresponding to the visual media content. Optionally, a tile can be mapped exactly to an atlas tile.
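One way to splice sub-tiles in order of decreasing area can be sketched as a simple row-by-row ("shelf") placement. The placement strategy itself, the function name `pack_patches`, and the tuple layout `(patch_id, width, height)` are assumptions made for this illustration; the application does not limit how sub-tiles are arranged within a block.

```python
def pack_patches(patches, block_width):
    """Place sub-tiles into a rectangular block, largest area first.

    patches: list of (patch_id, width, height) tuples.
    Returns a dict {patch_id: (x, y)} and the resulting block height.
    """
    ordered = sorted(patches, key=lambda p: p[1] * p[2], reverse=True)
    positions = {}
    x = y = shelf_height = 0
    for patch_id, w, h in ordered:
        if w > block_width:
            raise ValueError(f"patch {patch_id} is wider than the block")
        if x + w > block_width:
            # Current row is full: start a new row below it.
            y += shelf_height
            x = shelf_height = 0
        positions[patch_id] = (x, y)
        x += w
        shelf_height = max(shelf_height, h)
    return positions, y + shelf_height


# Hypothetical sub-tiles: (patch_id, width, height).
patches = [(1, 4, 2), (2, 8, 4), (3, 4, 4)]
positions, block_height = pack_patches(patches, block_width=12)
print(positions, block_height)
```

Here the sub-tiles are placed in the order 2, 3, 1 (areas 32, 16 and 8), giving a block that is 12 samples wide and 6 samples high.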

In some embodiments, each sub-tile in a block may have a patch ID (patchID) to distinguish different sub-tiles in the same block. For example, the same block may include sub-patch 1 (patch1), sub-patch 2 (patch2), and sub-patch 3 (patch3).

Furthermore, the expression format corresponding to each sub-block in the isomorphic block is the same. For example, every sub-block in the isomorphic block is a multi-view video sub-block, or every sub-block is a point cloud sub-block, or every sub-block is a sub-block of another single expression format. The expression format corresponding to each sub-block in the isomorphic block is the expression format corresponding to the isomorphic block.

In some embodiments, homogeneous tiles may have tile identifiers (tileIDs) to distinguish different tiles of the same expression format. For example, the point cloud block may include point cloud block 1 or point cloud block 2. For example, multiple visual media contents include point clouds and multi-viewpoint videos. The point clouds are processed to obtain point cloud blocks, where point cloud block 1 includes point cloud sub-blocks 1 to 3; the multi-viewpoint video is processed to obtain a multi-viewpoint video block, which includes multi-viewpoint video sub-blocks 1 to 4.
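The relationship between expression formats, tile identifiers and patch identifiers described above can be modelled with a small data structure. The class and field names below (`ExpressionFormat`, `IsomorphicBlock`, `tile_id`, `patch_ids`) are hypothetical and chosen only for this sketch; they do not correspond to syntax elements of the code stream.

```python
from dataclasses import dataclass, field
from enum import Enum


class ExpressionFormat(Enum):
    MULTI_VIEW_VIDEO = "multi-view video"
    POINT_CLOUD = "point cloud"
    GRID = "grid"


@dataclass
class IsomorphicBlock:
    """A block whose sub-tiles all share one expression format."""
    tile_id: int
    expression_format: ExpressionFormat
    patch_ids: list = field(default_factory=list)

    def add_patch(self, patch_id):
        # patchIDs distinguish sub-tiles within the same block only.
        if patch_id in self.patch_ids:
            raise ValueError(f"duplicate patchID {patch_id} in tile {self.tile_id}")
        self.patch_ids.append(patch_id)


# Point cloud block 1 with point cloud sub-blocks 1 to 3.
point_cloud_block_1 = IsomorphicBlock(1, ExpressionFormat.POINT_CLOUD)
for pid in (1, 2, 3):
    point_cloud_block_1.add_patch(pid)

# A multi-viewpoint video block with sub-blocks 1 to 4.
multi_view_block_1 = IsomorphicBlock(1, ExpressionFormat.MULTI_VIEW_VIDEO)
for pid in (1, 2, 3, 4):
    multi_view_block_1.add_patch(pid)

print(point_cloud_block_1)
print(multi_view_block_1)
```

Note that the two blocks may share the same tileID value, since in this sketch a tileID only distinguishes blocks of the same expression format.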

When visual media content of an expression format needs to be processed, a homogeneous block of the expression format is obtained. When at least two visual media contents need to be processed, at least two isomorphic blocks of expression formats are obtained. In order to improve compression efficiency, embodiments of the present application process the at least two visual media contents, such as packaging (also called splicing) processing, to obtain blocks corresponding to each visual media content in the at least two visual media contents. For example, the block can be obtained by splicing sub-tiles (patches) corresponding to at least two visual media contents. It should be noted that the embodiment of the present application processes at least two visual media contents separately, and the method of obtaining blocks is not limited.

In a possible implementation, the visual media content includes visual media content in two expression formats: multi-view video and point cloud. Processing the visual media content in at least one expression format to obtain at least one isomorphic block includes: after performing projection and de-redundancy processing on the acquired multi-viewpoint video, connecting non-repeating pixel points into video sub-blocks, and splicing the video sub-blocks into multi-viewpoint video blocks; and performing parallel projection on the acquired point cloud, forming point cloud sub-blocks from the connected points in the projection plane, and splicing the point cloud sub-blocks into point cloud blocks.

Specifically, for multi-viewpoint videos, taking MPEG-I as an example, a limited number of viewpoints are selected as base viewpoints that express the visible range of the scene as fully as possible. The base viewpoints are transmitted as complete images, and the redundant pixels between the remaining non-base viewpoints and the base viewpoints are removed, that is, only the effective information that is not expressed repeatedly is retained. The effective information is then extracted into sub-block images, which are reorganized together with the base viewpoint images to form a larger strip-shaped image. This strip-shaped image is called a multi-viewpoint video block.

In some embodiments, the above-mentioned visual media content is media content presented simultaneously in the same three-dimensional space. In some embodiments, the visual media content is media content presented at different times in the same three-dimensional space. In some embodiments, the above-mentioned visual media content may also be media content in different three-dimensional spaces. That is to say, in the embodiments of this application, there are no specific restrictions on the at least two visual media contents mentioned above.

Step 602: Splice the at least one isomorphic block to obtain at least one splicing diagram and splicing diagram information, wherein the splicing diagram information includes a first syntax element, and the splicing diagram is determined, according to the first syntax element, to be a heterogeneous hybrid splicing diagram or a homogeneous splicing diagram. The heterogeneous hybrid splicing diagram includes at least two types of isomorphic blocks, and the isomorphic splicing diagram includes one type of isomorphic block;

In some embodiments, the splicing of the at least one homogeneous block to obtain at least one splicing diagram and splicing diagram information includes: heterogeneously splicing homogeneous blocks of at least two expression formats to generate heterogeneous mixed splicing diagrams and splicing diagram information; and isomorphically splicing homogeneous blocks with the same expression format to generate isomorphic splicing diagrams and splicing diagram information.

Exemplarily, at least one isomorphic block includes an isomorphic block in a first expression format and an isomorphic block in a second expression format. The method specifically includes: performing isomorphic splicing on the isomorphic blocks of the first expression format to obtain a first isomorphic splicing diagram and splicing diagram information, and performing isomorphic splicing on the isomorphic blocks of the second expression format to obtain a second isomorphic splicing diagram and splicing diagram information; or, performing heterogeneous splicing on the isomorphic blocks of the first expression format and the isomorphic blocks of the second expression format to obtain a heterogeneous mixed splicing diagram and splicing diagram information; or, performing isomorphic splicing on the isomorphic blocks of the first expression format to obtain the first isomorphic splicing diagram and splicing diagram information, and performing heterogeneous splicing on the isomorphic blocks of the first expression format and the isomorphic blocks of the second expression format to obtain a heterogeneous mixed splicing diagram and splicing diagram information; or, performing isomorphic splicing on the isomorphic blocks of the second expression format to obtain the second isomorphic splicing diagram and splicing diagram information, and performing heterogeneous splicing on the isomorphic blocks of the first expression format and the isomorphic blocks of the second expression format to obtain a heterogeneous mixed splicing diagram and splicing diagram information.

That is to say, the homogeneous splicing diagram may include one isomorphic block or multiple isomorphic blocks of the same expression format, and the heterogeneous mixed splicing diagram includes at least two isomorphic blocks of at least two expression formats. In the embodiment of the present application, the first expression format is one of multi-view video, point cloud, and grid, the second expression format is one of multi-view video, point cloud, and grid, and the first expression format and the second expression format are different expression formats. As shown in Figure 7, multi-viewpoint video block 1, multi-viewpoint video block 2 and point cloud block 1 are spliced to obtain a heterogeneous hybrid stitching image.

For example, the first expression format is multi-viewpoint video, and the second expression format is point cloud. The splicing of the at least one homogeneous block to obtain at least one spliced image and spliced image information includes: splicing a part of the multi-viewpoint video blocks and a part of the point cloud blocks into a heterogeneous hybrid spliced image; splicing another part of the multi-viewpoint video blocks into a multi-viewpoint spliced image; and splicing another part of the point cloud blocks into a point cloud spliced image.
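The distinction between the two splicing diagram types follows directly from the set of expression formats of the blocks a splicing diagram contains. A minimal sketch of this rule is given below; the function name `classify_stitched_image` and the string labels are hypothetical and used only for illustration.

```python
def classify_stitched_image(block_formats):
    """Classify a splicing diagram from the formats of its blocks.

    block_formats: one expression-format label per isomorphic block.
    One distinct format gives a homogeneous splicing diagram; two or
    more distinct formats give a heterogeneous hybrid splicing diagram.
    """
    distinct = set(block_formats)
    if not distinct:
        raise ValueError("a splicing diagram needs at least one block")
    return "homogeneous" if len(distinct) == 1 else "heterogeneous hybrid"


# The three splicing diagrams of the example above.
hybrid = ["multi-view video", "point cloud"]
multi_view_only = ["multi-view video", "multi-view video"]
point_cloud_only = ["point cloud"]

for name, formats in [
    ("hybrid", hybrid),
    ("multi-view", multi_view_only),
    ("point cloud", point_cloud_only),
]:
    print(name, "->", classify_stitched_image(formats))
```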

Wherein, the mosaic image information includes a first syntax element, and the first syntax element is used to indicate that the mosaic image is a heterogeneous hybrid mosaic image or a homogeneous mosaic image.

In some embodiments, determining according to the first syntax element that the splicing diagram is a heterogeneous hybrid splicing diagram or a homogeneous splicing diagram includes: if the first syntax element is a first preset value, determining that the splicing diagram is a heterogeneous mixed splicing diagram including homogeneous blocks of a first expression format and a second expression format, wherein the first expression format and the second expression format are different expression formats; if the first syntax element is a second preset value, determining that the splicing diagram is an isomorphic splicing diagram including the isomorphic blocks of the first expression format; and if the first syntax element is a third preset value, determining that the splicing diagram is an isomorphic splicing diagram including the isomorphic blocks of the second expression format. That is to say, different values of the first syntax element are used to indicate different splicing diagram types. Further, the first syntax element can also be set to other values to indicate that the splicing diagram is an isomorphic splicing diagram that includes isomorphic blocks of other expression formats, or a heterogeneous hybrid splicing diagram that includes homogeneous blocks of at least two other expression formats.
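The single-element signalling described above amounts to a lookup from the value of the first syntax element to a splicing diagram type. The sketch below assumes, purely for illustration, that the first, second and third preset values are 0, 1 and 2; the application does not fix these values, and the function name `interpret_first_syntax_element` is hypothetical.

```python
# Hypothetical preset values; the actual values are not specified here.
FIRST_PRESET = 0   # heterogeneous hybrid: first + second expression format
SECOND_PRESET = 1  # homogeneous: first expression format only
THIRD_PRESET = 2   # homogeneous: second expression format only


def interpret_first_syntax_element(value, first_format, second_format):
    """Return (splicing diagram type, tuple of contained formats)."""
    if value == FIRST_PRESET:
        return "heterogeneous hybrid", (first_format, second_format)
    if value == SECOND_PRESET:
        return "homogeneous", (first_format,)
    if value == THIRD_PRESET:
        return "homogeneous", (second_format,)
    # Other values could signal further formats; not modelled in this sketch.
    raise ValueError(f"unrecognised first syntax element value {value}")


print(interpret_first_syntax_element(0, "multi-view video", "point cloud"))
print(interpret_first_syntax_element(2, "multi-view video", "point cloud"))
```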

In some embodiments, the first syntax element includes at least two sub-syntax elements. Exemplarily, the first syntax element includes a first sub-syntax element and a second sub-syntax element, and the splicing diagram is determined to be a heterogeneous hybrid splicing diagram or an isomorphic splicing diagram according to the first sub-syntax element and the second sub-syntax element. Determining that the splicing diagram is a heterogeneous hybrid splicing diagram or a homogeneous splicing diagram according to the first syntax element includes: if the first sub-syntax element is a fourth preset value, determining that the splicing diagram includes isomorphic blocks of the first expression format; and if the second sub-syntax element is a fifth preset value, determining that the splicing diagram includes isomorphic blocks of the second expression format.

It can be understood that when the first sub-syntax element is the fourth preset value, it is determined that the splicing diagram includes isomorphic blocks of the first expression format, that is, the splicing diagram is determined to be an isomorphic splicing diagram including the isomorphic blocks of the first expression format; when the second sub-syntax element is the fifth preset value, it is determined that the splicing diagram includes isomorphic blocks of the second expression format, that is, the splicing diagram is determined to be an isomorphic splicing diagram including the isomorphic blocks of the second expression format; and when the first sub-syntax element is the fourth preset value and the second sub-syntax element is the fifth preset value, it is determined that the splicing diagram includes isomorphic blocks of the first expression format and isomorphic blocks of the second expression format, that is, the splicing diagram is determined to be a heterogeneous hybrid splicing diagram including homogeneous blocks of the first expression format and the second expression format.

In some embodiments, the method further includes: if the first sub-syntax element is a sixth preset value, determining that the splicing diagram does not include isomorphic blocks of the first expression format; and if the second sub-syntax element is a seventh preset value, determining that the splicing diagram does not include isomorphic blocks of the second expression format.

Specifically, if the first sub-syntax element is the fourth preset value and the second sub-syntax element is the fifth preset value, it is determined that the splicing diagram is a heterogeneous mixed splicing diagram including isomorphic blocks of the first expression format and isomorphic blocks of the second expression format; if the first sub-syntax element is the fourth preset value and the second sub-syntax element is the seventh preset value, it is determined that the splicing diagram is an isomorphic splicing diagram including the isomorphic blocks of the first expression format; and if the first sub-syntax element is the sixth preset value and the second sub-syntax element is the fifth preset value, it is determined that the splicing diagram is an isomorphic splicing diagram including the isomorphic blocks of the second expression format.
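The three combinations listed above can be summarised as a decision over two presence indications. The following sketch assumes, for illustration only, that the fourth and fifth preset values are both 1 ("present") and that the sixth and seventh preset values are both 0 ("absent"); the function name is hypothetical, and the case in which both sub-syntax elements indicate absence is not described in the text and is therefore rejected here.

```python
PRESENT = 1  # assumed fourth / fifth preset value
ABSENT = 0   # assumed sixth / seventh preset value


def interpret_sub_syntax_elements(first_sub, second_sub):
    """Map the two sub-syntax elements to a splicing diagram type."""
    for value in (first_sub, second_sub):
        if value not in (PRESENT, ABSENT):
            raise ValueError(f"unexpected sub-syntax element value {value}")
    has_first = first_sub == PRESENT
    has_second = second_sub == PRESENT
    if has_first and has_second:
        return "heterogeneous hybrid (first and second expression formats)"
    if has_first:
        return "homogeneous (first expression format)"
    if has_second:
        return "homogeneous (second expression format)"
    # Neither format present: not covered by the description above.
    raise ValueError("no expression format signalled")


for pair in [(1, 1), (1, 0), (0, 1)]:
    print(pair, "->", interpret_sub_syntax_elements(*pair))
```

Using one presence indication per expression format extends naturally to more formats, at the cost of one additional sub-syntax element per format.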

In other words, the expression format of the isomorphic block in the splicing diagram can also be determined based on the values of the two sub-syntax elements. Furthermore, when the splicing diagram includes more expression formats, multiple syntax elements can also be used to indicate the expression formats of the isomorphic blocks in the splicing diagram. For example, when three expression formats are included, three syntax elements are set, and when four expression formats are included, four syntax elements are set. Multiple values can also be set through one syntax element to represent multiple expression formats.

In some embodiments, the first syntax element is located in a parameter set of the code stream. For example, the parameter set of the code stream may be V3C_VPS, and the first syntax element may be ptl_profile_toolset_idc in V3C_VPS.

In some embodiments, the mosaic graph sequence parameter set corresponding to the mosaic graph includes the first syntax element. Exemplarily, the splicing graph sequence parameter set corresponding to the splicing graph includes the first sub-syntax element and the second sub-syntax element. For example, the first sub-syntax element is asps_vpcc_extension_present_flag in the splicing diagram sequence parameter set, and the second sub-syntax element is asps_miv_extension_present_flag.

In other words, the first syntax element can be located in the parameter set of the code stream, so that the decoding end can parse the type of each spliced image earlier. The first syntax element may also be located in the spliced image sequence parameter set corresponding to each spliced image, in which case the decoding end obtains the first syntax element and determines the spliced image type when parsing each spliced image.

In some embodiments, the heterogeneous hybrid mosaic graph of the embodiment of the present application includes at least one of the following: a single attribute heterogeneous hybrid mosaic graph and a multi-attribute heterogeneous hybrid mosaic graph.

Among them, the single-attribute heterogeneous hybrid splicing diagram refers to the heterogeneous hybrid splicing diagram in which the attribute information of all homogeneous blocks included is the same. For example, a single attribute heterogeneous hybrid mosaic image only includes homogeneous blocks of attribute information, such as only multi-view video texture blocks and point cloud texture blocks. For another example, a single-attribute heterogeneous hybrid mosaic image only includes homogeneous blocks of geometric information, such as only multi-view video geometry blocks and point cloud geometry blocks.

A multi-attribute heterogeneous hybrid spliced image refers to a heterogeneous hybrid spliced image that includes at least two isomorphic blocks with different attribute information. For example, a multi-attribute heterogeneous hybrid spliced image includes both isomorphic blocks of attribute information and isomorphic blocks of geometric information. As an example, blocks under any one attribute or any two attributes of at least two of the point cloud, multi-viewpoint video and grid can be spliced into one image to obtain a heterogeneous hybrid spliced image. This application does not limit this.

In some embodiments, the single-attribute isomorphic blocks in the first expression format and the single-attribute isomorphic blocks in the second expression format are spliced to obtain a heterogeneous hybrid spliced image. The first expression format and the second expression format are each any one of multi-view video, point cloud, and grid, the first expression format and the second expression format are different, and the attribute information of the first expression format and the second expression format is the same.

The single attribute isomorphic block of the multi-view video includes at least one of a multi-view video texture block, a multi-view video geometry block, and the like.

The single attribute isomorphic block of the point cloud includes at least one of a point cloud texture block, a point cloud geometry block, a point cloud occupancy block, and the like.

The single attribute isomorphic block of the grid includes at least one of a grid texture block and a grid geometry block.

For example, at least two of the multi-viewpoint video geometry blocks, point cloud geometry blocks, and grid geometry blocks are spliced into one image to obtain a heterogeneous hybrid spliced image. This heterogeneous mixed mosaic diagram is called a single attribute heterogeneous mixed mosaic diagram. For another example, at least two of the multi-viewpoint video texture blocks, point cloud texture blocks, and grid texture blocks are spliced into one image to obtain a heterogeneous hybrid spliced image. This heterogeneous mixed mosaic diagram is called a single attribute heterogeneous mixed mosaic diagram.

In some embodiments, the multi-attribute isomorphic blocks in the first expression format and the multi-attribute isomorphic blocks in the second expression format are spliced to obtain a heterogeneous hybrid spliced image. The first expression format and the second expression format are each any one of multi-view video, point cloud, and grid, the first expression format and the second expression format are different, and the attribute information of the first expression format and the second expression format is not exactly the same.

For example, the multi-viewpoint video texture block is spliced into one picture with at least one of the point cloud geometry block and the mesh geometry block to obtain a heterogeneous hybrid spliced picture. For another example, a multi-viewpoint video geometry block is spliced into one picture with at least one of a point cloud texture block and a mesh texture block to obtain a heterogeneous hybrid spliced picture. For another example, the point cloud texture block is spliced into one picture with at least one of the multi-viewpoint video geometry block and the mesh geometry block to obtain a heterogeneous hybrid spliced picture. For another example, the point cloud geometry block is spliced into one picture with at least one of the multi-viewpoint video texture block and the mesh texture block to obtain a heterogeneous hybrid spliced picture. For another example, point cloud geometry blocks, multi-viewpoint video texture blocks, and multi-viewpoint video geometry blocks are spliced into one picture to obtain a heterogeneous hybrid spliced picture. For another example, point cloud geometry blocks, point cloud texture blocks, multi-viewpoint video texture blocks, and multi-viewpoint video geometry blocks are spliced into one picture to obtain a heterogeneous hybrid spliced picture. Here, the obtained heterogeneous hybrid spliced picture is called a multi-attribute heterogeneous hybrid spliced picture.

The following takes the first expression format as multi-viewpoint video and the second expression format as point cloud as an example to introduce the splicing method in detail.

It is assumed that the multi-view video block includes a multi-view video texture block and a multi-view video geometry block, and the point cloud block includes a point cloud texture block, a point cloud geometry block and a point cloud occupancy block. Then, the above-mentioned heterogeneous splicing methods can include but are not limited to the following two:

Method 1: Splice the multi-viewpoint video texture block, multi-viewpoint video geometry block, point cloud texture block, point cloud geometry block and point cloud occupancy block into a heterogeneous hybrid splicing image.

Method 2: According to a preset heterogeneous splicing method, splice the multi-view video texture block, multi-view video geometry block, point cloud texture block, point cloud geometry block and point cloud occupancy block to obtain M heterogeneous hybrid spliced images, where M is a positive integer greater than or equal to 1.

Method 2 can include at least the following examples.

Example 1: Splice the multi-view video texture blocks and the point cloud texture blocks to obtain a heterogeneous hybrid texture spliced image, splice the multi-view video geometry blocks and the point cloud geometry blocks to obtain a heterogeneous hybrid geometry spliced image, and use the point cloud occupancy blocks separately as a further spliced image.

Example 2: Splice the multi-view video texture blocks and the point cloud texture blocks to obtain a heterogeneous hybrid texture spliced image, and splice the multi-view video geometry blocks, the point cloud geometry blocks and the point cloud occupancy blocks to obtain a heterogeneous hybrid geometry and occupancy spliced image.

Example 3: Splice the multi-view video texture blocks, the point cloud texture blocks and the point cloud occupancy blocks to obtain one sub-heterogeneous hybrid spliced image, and splice the multi-view video geometry blocks and the point cloud geometry blocks to obtain another sub-heterogeneous hybrid spliced image.

Further, after the M heterogeneous hybrid spliced images are obtained, video coding can be performed on each of the M heterogeneous hybrid spliced images to obtain video compression sub-streams.
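
As a minimal illustrative sketch of Example 1 of Method 2, the grouping of blocks by attribute can be expressed as follows. The block tuples, the block names, and the function `splice_by_attribute` are assumptions made for illustration only; the sketch only decides which blocks share a spliced image and does not perform the actual spatial packing of blocks into a picture.

```python
from collections import defaultdict

def splice_by_attribute(blocks):
    """Group (format, attribute, name) blocks so each spliced image holds one attribute."""
    images = defaultdict(list)
    for fmt, attribute, name in blocks:
        images[attribute].append((fmt, name))
    return dict(images)

# Hypothetical input: the five block kinds assumed in the text.
blocks = [
    ("multi_view_video", "texture", "mv_tex_0"),
    ("multi_view_video", "geometry", "mv_geo_0"),
    ("point_cloud", "texture", "pc_tex_0"),
    ("point_cloud", "geometry", "pc_geo_0"),
    ("point_cloud", "occupancy", "pc_occ_0"),
]

images = splice_by_attribute(blocks)
M = len(images)  # texture image, geometry image, occupancy image
```

Under these assumptions the texture and geometry images each mix two expression formats, while the occupancy image contains point cloud blocks only.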

In some embodiments, the isomorphic splicing graph of the embodiment of the present application includes at least one of the following: a single attribute isomorphic splicing graph and a multi-attribute isomorphic splicing graph. In some embodiments, the first attribute isomorphic blocks of the first expression format are spliced to obtain an isomorphic splicing graph. Alternatively, the first attribute isomorphic block and the second attribute isomorphic block of the first expression format are spliced to obtain an isomorphic splicing diagram.

Among them, a single-attribute isomorphic spliced image refers to an isomorphic spliced image in which all isomorphic blocks included have the same expression format and the same attribute information. For example, a single-attribute isomorphic spliced image only includes isomorphic blocks of attribute information in the first expression format, such as only multi-view video texture blocks, or only point cloud texture blocks. For another example, a single-attribute isomorphic spliced image only includes isomorphic blocks of geometric information, such as only multi-view video geometry blocks, or only point cloud geometry blocks.

A multi-attribute isomorphic spliced image refers to an isomorphic spliced image that includes at least two isomorphic blocks with the same expression format but different attribute information. For example, a multi-attribute isomorphic spliced image includes both isomorphic blocks of attribute information and isomorphic blocks of geometric information. As an example, a multi-attribute isomorphic spliced image includes multi-viewpoint video texture blocks and multi-viewpoint video geometry blocks. For another example, a multi-attribute isomorphic spliced image includes a point cloud geometry block and a point cloud texture block. As shown in Figure 8, a multi-attribute isomorphic spliced image includes point cloud texture block 1, point cloud geometry block 1 and point cloud geometry block 2.

In some embodiments, the spliced image information may also include syntax elements, according to which the spliced image is determined to be a single-attribute heterogeneous hybrid spliced image, a multi-attribute heterogeneous hybrid spliced image, a single-attribute isomorphic spliced image, or a multi-attribute isomorphic spliced image.
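
The four spliced image categories described above can be illustrated with a short sketch. It assumes each block is described by a hypothetical (expression format, attribute) pair; the function name and the string labels are illustrative and are not syntax elements of the code stream.

```python
def classify_spliced_image(blocks):
    """Classify a spliced image from the (format, attribute) pairs of its blocks."""
    if not blocks:
        raise ValueError("a spliced image must contain at least one block")
    formats = {fmt for fmt, _ in blocks}
    attributes = {attr for _, attr in blocks}
    # More than one expression format makes the image heterogeneous hybrid.
    structure = "heterogeneous hybrid" if len(formats) > 1 else "isomorphic"
    # More than one attribute makes the image multi-attribute.
    kind = "multi-attribute" if len(attributes) > 1 else "single-attribute"
    return f"{kind} {structure}"

# The Figure 8 case: one point cloud texture block and two point cloud geometry blocks.
figure_8 = [("point_cloud", "texture"),
            ("point_cloud", "geometry"),
            ("point_cloud", "geometry")]
figure_8_type = classify_spliced_image(figure_8)
```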

Step 603: Encode the at least one spliced image and the spliced image information to obtain a code stream.

In some embodiments, the code stream includes a video compression sub-stream and a spliced image information sub-stream. Encoding the at least one spliced image and the spliced image information to obtain a code stream includes: encoding the at least one spliced image to obtain the video compression sub-stream; encoding the spliced image information of the at least one spliced image to obtain the spliced image information sub-stream; and synthesizing the video compression sub-stream and the spliced image information sub-stream into the code stream. In this way, heterogeneous source formats such as video, point cloud, and grid can be supported in the same compressed code stream, and multi-viewpoint video spliced images, point cloud spliced images, grid spliced images, and heterogeneous hybrid spliced images can exist simultaneously in the compressed code stream. This reduces the number of 2D video encoders such as HEVC, VVC, AVC, and AVS that need to be called, reduces implementation costs, and improves ease of use.

In some embodiments, when it is determined based on the first syntax element that the spliced image is a heterogeneous hybrid spliced image, the spliced image information also includes a second syntax element, and the expression format of the i-th block in the spliced image is determined based on the second syntax element. The encoding end writes the second syntax element into the code stream, which can help improve the decoding accuracy of the decoding end and at the same time enables the V3C standard to support visual media content in different expression formats, such as multi-view videos and point clouds, in the same compressed code stream.

In some embodiments, determining the expression format of the i-th block in the spliced image based on the second syntax element includes: if the second syntax element is an eighth preset value, determining that the expression format of the i-th block is the first expression format; if the second syntax element is a ninth preset value, determining that the expression format of the i-th block is the second expression format.

Specifically, the expression format type corresponding to the i-th block in the spliced image can be indicated by setting the second syntax element to different values. Taking the first expression format as point cloud and the second expression format as multi-viewpoint video as an example, if the i-th block is a point cloud block, the second syntax element is set to the eighth preset value; if the i-th block is a multi-viewpoint video block, the second syntax element is set to the ninth preset value. The embodiments of this application do not limit the specific values of the eighth preset value and the ninth preset value. Optionally, the eighth preset value is 0. Optionally, the ninth preset value is 1.

In some embodiments, encoding the at least one spliced image and the spliced image information to obtain a code stream includes: if the expression format of the i-th block is the first expression format, determining that the sub-tiles in the i-th block are encoded using the encoding standard corresponding to the first expression format, to obtain a code stream corresponding to the visual media content of the first expression format; if the expression format of the i-th block is the second expression format, determining that the sub-tiles in the i-th block are encoded using the encoding standard corresponding to the second expression format, to obtain a code stream corresponding to the visual media content of the second expression format.

In some embodiments, the second syntax element is located in the mosaic block data unit header of the i-th block of the mosaic map. In some embodiments, the second syntax element may also be located in a sub-patch data unit (patch_data_unit). For example, on the premise that the second syntax element (ath_toolset_type) is known to be 1, it is determined that the current sub-tile is encoded using the multi-view video coding standard. On the premise that the second syntax element (ath_toolset_type) is known to be 0, it is determined that the current sub-tile is encoded using the point cloud encoding standard.
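
The per-block selection described above can be sketched as a simple lookup. The values 0 and 1 for ath_toolset_type are the example values given in the text; the function name and the returned labels are assumptions for illustration.

```python
# Example values from the text: 0 selects point cloud coding, 1 selects multi-view video coding.
ATH_TOOLSET_POINT_CLOUD = 0
ATH_TOOLSET_MULTI_VIEW = 1

def coding_standard_for_block(ath_toolset_type):
    """Return the coding standard used for the sub-tiles of the current block."""
    if ath_toolset_type == ATH_TOOLSET_POINT_CLOUD:
        return "point cloud coding standard"
    if ath_toolset_type == ATH_TOOLSET_MULTI_VIEW:
        return "multi-view video coding standard"
    # Other values are not described in the text, so they are rejected here.
    raise ValueError(f"unsupported ath_toolset_type: {ath_toolset_type}")
```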

In some embodiments, encoding the at least one spliced image and the spliced image information to obtain a code stream includes: calling a video encoder to encode the at least one spliced image to obtain a video compression sub-stream.

In the embodiment of the present application, in order to reduce the number of encoders and the encoding cost, at least two kinds of visual media content are first processed separately (that is, packed) during encoding to obtain multiple isomorphic blocks. Next, at least two isomorphic blocks with different expression formats are spliced into a heterogeneous hybrid spliced image, and at least one isomorphic block with exactly the same expression format is spliced into an isomorphic spliced image. The heterogeneous hybrid spliced image and the isomorphic spliced image are encoded to obtain the video compression sub-stream. This encoding and decoding method is suitable for application scenarios of visual media content in multiple expression formats, expanding the scope of application. Moreover, because isomorphic blocks of different expression formats are spliced into a heterogeneous hybrid spliced image for encoding, the video encoder can be called only once during encoding, thereby reducing the number of 2D video encoders such as HEVC, VVC, AVC, and AVS that need to be called, reducing encoding costs and improving ease of use.

In the embodiment of the present application, the video encoder used to perform video encoding on the heterogeneous hybrid spliced image and the isomorphic spliced image to obtain the video compression sub-stream can be the video encoder shown in Figure 2A above. That is to say, in the embodiment of the present application, the heterogeneous hybrid spliced image or the isomorphic spliced image is treated as one frame image. Block division is performed first, and then intra-frame or inter-frame prediction is used to obtain the predicted value of the coding block. The predicted value of the coding block is subtracted from the original value to obtain the residual value. After the residual value is transformed and quantized, the video compression sub-stream is obtained.
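
The residual step above can be illustrated with a toy sketch that computes the residual of one coding block and applies uniform scalar quantization. It is not the encoder of Figure 2A: the transform and the entropy coding are omitted, and the sample values, the quantization step, and the function names are assumptions chosen for illustration.

```python
def residual(original, predicted):
    """Residual = original sample minus predicted sample, per position."""
    return [[o - p for o, p in zip(o_row, p_row)]
            for o_row, p_row in zip(original, predicted)]

def quantize(block, step):
    """Uniform scalar quantization; int() truncates toward zero."""
    return [[int(v / step) for v in row] for row in block]

# Hypothetical 2x2 coding block and its prediction.
original = [[58, 46], [63, 51]]
predicted = [[50, 50], [60, 60]]

res = residual(original, predicted)   # [[8, -4], [3, -9]]
levels = quantize(res, 4)             # [[2, -1], [0, -2]]
```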

In the embodiment of the present application, while generating at least one mosaic image, the mosaic image information corresponding to each mosaic image is generated. The spliced image information is encoded to obtain the spliced image information sub-stream. Wherein, the splicing diagram information includes a first syntax element used to indicate the type of the splicing diagram, and a second syntax element used to indicate the expression format of each isomorphic block in the splicing diagram. The embodiments of the present application do not limit the method of encoding the spliced image information. For example, conventional data compression encoding methods such as equal-length encoding or variable-length encoding may be used for compression.

Finally, the video compression sub-stream and the splicing image information sub-stream are written in the same code stream to obtain the final code stream. In other words, the embodiments of the present application not only support heterogeneous source formats such as video, point cloud, grid, etc., but also support homogeneous source formats in the same compressed code stream.
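
How two sub-streams can be written into one code stream and separated again can be sketched as follows. The container layout here (a 1-byte sub-stream identifier followed by a 4-byte big-endian length and the payload) is purely hypothetical and is not the V3C unit syntax; it only illustrates equal-length header fields being used to multiplex sub-streams.

```python
import struct

# Hypothetical identifiers for the two sub-streams named in the text.
SUBSTREAM_IDS = {"spliced_image_info": 0, "video": 1}
HEADER = ">BI"                      # 1-byte id + 4-byte big-endian payload length
HEADER_SIZE = struct.calcsize(HEADER)

def mux(substreams):
    """Concatenate (name, payload) pairs into a single byte string."""
    out = bytearray()
    for name, payload in substreams:
        out += struct.pack(HEADER, SUBSTREAM_IDS[name], len(payload))
        out += payload
    return bytes(out)

def demux(stream):
    """Recover the (name, payload) pairs written by mux()."""
    names = {v: k for k, v in SUBSTREAM_IDS.items()}
    pos, result = 0, []
    while pos < len(stream):
        type_id, length = struct.unpack_from(HEADER, stream, pos)
        pos += HEADER_SIZE
        result.append((names[type_id], stream[pos:pos + length]))
        pos += length
    return result

code_stream = mux([("spliced_image_info", b"\x01\x02"), ("video", b"\xaa\xbb\xcc")])
```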

In some embodiments, the method further includes: encoding the parameter set of the code stream to obtain a parameter set sub-stream. Specifically, the encoding end synthesizes the video compression sub-stream, the spliced image information sub-stream and the parameter set sub-stream into a code stream. The parameter set sub-stream of the code stream includes a third syntax element, and it is determined according to the third syntax element that the code stream includes a code stream corresponding to visual media content of at least one expression format. That is to say, the encoding end sends the third syntax element to indicate whether the code stream simultaneously contains visual media content in at least two expression formats. For example, when the third syntax element indicates that the code stream includes a code stream corresponding to visual media content in one expression format, it can be understood that the encoding end processes the visual media content in that expression format to obtain one kind of isomorphic block, and splices that kind of isomorphic block to obtain an isomorphic spliced image. When the third syntax element indicates that the code stream includes code streams corresponding to visual media content in at least two expression formats, it can be understood that the encoding end obtains at least two kinds of isomorphic blocks from the visual media content in at least two expression formats, and splices the at least two kinds of isomorphic blocks to obtain an isomorphic spliced image and/or a heterogeneous hybrid spliced image.

Exemplarily, when the third syntax element indicates that the code stream includes code streams corresponding to visual media content in at least two expression formats, the method includes: performing isomorphic splicing on the isomorphic blocks of the first expression format to obtain a first isomorphic spliced image, and performing isomorphic splicing on the isomorphic blocks of the second expression format to obtain a second isomorphic spliced image; or, performing heterogeneous splicing on the isomorphic blocks of the first expression format and the isomorphic blocks of the second expression format to obtain a heterogeneous hybrid spliced image; or, performing isomorphic splicing on the isomorphic blocks of the first expression format to obtain a first isomorphic spliced image, and performing heterogeneous splicing on the isomorphic blocks of the first expression format and the isomorphic blocks of the second expression format to obtain a heterogeneous hybrid spliced image; or, performing isomorphic splicing on the isomorphic blocks of the second expression format to obtain a second isomorphic spliced image, and performing heterogeneous splicing on the isomorphic blocks of the first expression format and the isomorphic blocks of the second expression format to obtain a heterogeneous hybrid spliced image.

In some embodiments, the third syntax element is set to different values to indicate that the code stream includes a code stream corresponding to the visual media content of at least one expression format. That is to say, certain preset values of the third syntax element can indicate that the code stream includes code streams corresponding to visual media content in one or more expression formats.

Exemplarily, determining according to the third syntax element that the code stream includes a code stream corresponding to visual media content of at least one expression format includes: when the third syntax element is a first value, determining that the code stream simultaneously includes a code stream corresponding to the visual media content in the first expression format and a code stream corresponding to the visual media content in the second expression format; when the third syntax element is a second value, determining that the code stream includes the code stream corresponding to the visual media content in the first expression format; when the third syntax element is a third value, determining that the code stream includes the code stream corresponding to the visual media content in the second expression format.

For example, the parameter set of the code stream may be V3C_VPS, and the third syntax element may be ptl_profile_toolset_idc in V3C_VPS.

For example, taking the first expression format as multi-view video and the second expression format as point cloud, when the third syntax element is set to a first value, the first value is used to indicate that the code stream contains both multi-view video code streams and point cloud code streams. As a specific example, when ptl_profile_toolset_idc=X and X is 128/129/130/132/133/134, it means that the current code stream contains both point cloud and multi-viewpoint code streams. For another example, when the third syntax element is set to the second value, the second value is used to indicate that the code stream only contains the point cloud code stream. As a specific example, when ptl_profile_toolset_idc=X and X is 0/1, it means that the current code stream only contains point cloud code streams. For another example, when the third syntax element is set to a third value, the third value is used to indicate that the code stream only contains a multi-view video code stream. As a specific example, when ptl_profile_toolset_idc=X and X is 64/65/66, it means that the current code stream only contains multi-view video code streams. It should be understood that the above values of the first value, the second value, and the third value are only examples, and the embodiments of the present application are not limited thereto.
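
The example values of ptl_profile_toolset_idc given above can be summarized in a short lookup sketch. The value sets are the example values from the text; the function name, the returned labels, and the choice to reject every other value are assumptions for illustration.

```python
# Example value sets taken from the text.
POINT_CLOUD_ONLY = {0, 1}
MULTI_VIEW_ONLY = {64, 65, 66}
POINT_CLOUD_AND_MULTI_VIEW = {128, 129, 130, 132, 133, 134}

def formats_in_code_stream(ptl_profile_toolset_idc):
    """Return the expression formats signalled by ptl_profile_toolset_idc."""
    if ptl_profile_toolset_idc in POINT_CLOUD_ONLY:
        return ("point_cloud",)
    if ptl_profile_toolset_idc in MULTI_VIEW_ONLY:
        return ("multi_view_video",)
    if ptl_profile_toolset_idc in POINT_CLOUD_AND_MULTI_VIEW:
        return ("point_cloud", "multi_view_video")
    # Values outside the example sets are treated as unsupported in this sketch.
    raise ValueError(f"unsupported ptl_profile_toolset_idc: {ptl_profile_toolset_idc}")
```

A decoder following this sketch would only need to inspect per-image type signalling when the function returns both formats.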

In this example, V3C_VPS in the existing V3C standard can be reused, and ptl_profile_toolset_idc is preconfigured with values such as 0/1, 64/65/66, and 128/129/130/132/133/134 to indicate the code stream types included in the current code stream. When encoding visual media content, the embodiment of the present application adds values of the third syntax element in the parameter set to indicate which expression formats of visual media content the code stream contains. This can help improve the decoding accuracy of the decoder and also enables the V3C standard to support visual media content in one or more expression formats, such as multi-view videos, point clouds, and grids, in the same compressed code stream.

Table 3 shows an example of available toolset profile components (Available toolset profile components). Table 3 provides a list of toolset profile components defined for V3C and their corresponding identification syntax element values, such as ptl_profile_toolset_idc and ptc_one_v3c_frame_only_flag. This definition may be used for this document only. The syntax element ptl_profile_toolset_idc provides the main definition of the toolset profile. Additional syntax elements such as ptc_one_v3c_frame_only_flag can specify additional characteristics or restrictions of the defined profile. ptc_one_v3c_frame_only_flag can be used to support only a single V3C frame. It should be noted that 2..63, 67..127, 131, 135..255 in ptl_profile_toolset_idc are reserved and are temporarily undefined. The standards organization may further stipulate them in future standards. The configuration file types defined in Table 3 can include dynamic (Dynamic) or static (Static).

Table 3 Available toolset profile components (Available toolset profile components)

In some embodiments, the parameter set of the code stream further includes a first syntax element, where the first syntax element is used to indicate the type of each spliced image, specifically whether the spliced image is the heterogeneous hybrid spliced image or the isomorphic spliced image; the first syntax element is written into the parameter set of the code stream. For example, the first syntax element (vps_toolset_type) is added to V3C_VPS, and vps_toolset_type is used to determine whether each spliced image and its corresponding V3C unit belongs to a point cloud spliced image, a multi-viewpoint spliced image, or a point cloud + multi-viewpoint heterogeneous hybrid spliced image. At the same time, in order to be compatible with previous standards, the following new syntax and semantics are introduced, together with constraints on the old semantics.

Exemplarily, when the first syntax element is a first preset value, it is determined that the spliced image is a heterogeneous hybrid spliced image including isomorphic blocks of the first expression format and the second expression format, where the first expression format and the second expression format are different expression formats; when the first syntax element is a second preset value, it is determined that the spliced image is an isomorphic spliced image including isomorphic blocks of the first expression format; when the first syntax element is a third preset value, it is determined that the spliced image is an isomorphic spliced image including isomorphic blocks of the second expression format.

For example, taking the first expression format as multi-view video and the second expression format as point cloud: when the first syntax element is the first preset value, the first preset value indicates that the spliced image is a heterogeneous hybrid spliced image including point cloud blocks and multi-viewpoint video blocks; when the first syntax element is the second preset value, the second preset value indicates that the spliced image is an isomorphic spliced image including multi-viewpoint video blocks (which may be called a multi-viewpoint video spliced image); when the first syntax element is the third preset value, the third preset value indicates that the spliced image is an isomorphic spliced image including point cloud blocks (which may be called a point cloud spliced image).

For example, after obtaining the third syntax element ptl_profile_toolset_idc=128/129/130/132/133/134, it is necessary to parse the first syntax element (vps_toolset_type) of each spliced image in the VPS and determine its value, vps_toolset_type=X. When X is 1, the spliced image only contains multi-view video blocks and should meet the requirements of the multi-view coding method; when X is 2, the spliced image only contains point cloud blocks and should meet the requirements of the point cloud encoding method; for a further value of X, the spliced image contains both multi-view video blocks and point cloud blocks and should meet the requirements of both the multi-view and point cloud encoding methods. It should be understood that the above values of the first syntax element are only examples, and the embodiments of the present application are not limited thereto.

Table 4 shows the general V3C parameter set syntax. The V3C parameter set has a new syntax element vps_toolset_type. Specifically, vps_toolset_type[j] can be used to represent the type of the spliced image with index j. Because the syntax element vps_toolset_type is added to the V3C parameter set, the decoding end can obtain vps_toolset_type from the V3C parameter set and quickly determine whether each spliced image and its corresponding V3C unit belongs to point cloud, multi-viewpoint, or point cloud + multi-viewpoint, and thereby which coding method the spliced image should meet.

Table 4 General V3C parameter set syntax (General V3C parameter set syntax)

In some embodiments, the mosaic graph sequence parameter set corresponding to the mosaic graph includes the first syntax element. Exemplarily, the splicing graph sequence parameter set corresponding to the splicing graph includes the first sub-syntax element and the second sub-syntax element. The first sub-syntax element and the second sub-syntax element are used to indicate a splicing diagram type, wherein the splicing diagram is the heterogeneous hybrid splicing diagram or the isomorphic splicing diagram.

Exemplarily, when the first sub-syntax element is a fourth preset value and the second sub-syntax element is a fifth preset value, it is determined that the spliced image is a heterogeneous hybrid spliced image including isomorphic blocks of the first expression format and isomorphic blocks of the second expression format; when the first sub-syntax element is the fourth preset value and the second sub-syntax element is a seventh preset value, it is determined that the spliced image is an isomorphic spliced image including isomorphic blocks of the first expression format; when the first sub-syntax element is a sixth preset value and the second sub-syntax element is the fifth preset value, it is determined that the spliced image is an isomorphic spliced image including isomorphic blocks of the second expression format.

Optionally, the first sub-grammar element is asps_vpcc_extension_present_flag in the splicing diagram sequence parameter set, and the second sub-grammar element is asps_miv_extension_present_flag.

For example, the NAL-ASPS of the V3C_AD code stream may contain asps_miv_extension_present_flag and asps_vpcc_extension_present_flag.

In the embodiment of the present application, the first sub-syntax element and the second sub-syntax element are set to specific values to indicate that the spliced image is the heterogeneous hybrid spliced image or the isomorphic spliced image.

For example, after the third syntax element ptl_profile_toolset_idc = 128/129/130/132/133/134 is obtained, asps_vpcc_extension_present_flag = X and asps_miv_extension_present_flag = Y are obtained from the spliced image information of the spliced image. When X is 0 and Y is 1, the spliced image contains only multi-view video blocks and should meet the requirements of the multi-view encoding method; when X is 1 and Y is 0, the spliced image contains only point cloud blocks and should meet the requirements of the point cloud encoding method; when X is 1 and Y is 1, the spliced image contains both multi-view video blocks and point cloud blocks and should meet the requirements of the multi-view and point cloud encoding methods at the same time. It should be understood that the above values of the seventh preset value and the eighth preset value are only examples, and the embodiments of the present application are not limited thereto.
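The flag combinations in the example above can be sketched as a small lookup. This is only an illustrative Python sketch: the names `AtlasKind` and `classify_atlas_from_asps` are hypothetical, and because the text does not describe the combination X = 0, Y = 0, the sketch rejects it as an assumption rather than assigning it a meaning.

```python
from enum import Enum


class AtlasKind(Enum):
    """Hypothetical labels for the three cases described in the text."""
    MULTI_VIEW_ONLY = "multi-view video blocks only"
    POINT_CLOUD_ONLY = "point cloud blocks only"
    HETEROGENEOUS_HYBRID = "multi-view video and point cloud blocks"


def classify_atlas_from_asps(vpcc_flag: int, miv_flag: int) -> AtlasKind:
    """Map (asps_vpcc_extension_present_flag, asps_miv_extension_present_flag)
    to the type of the spliced image."""
    if vpcc_flag not in (0, 1) or miv_flag not in (0, 1):
        raise ValueError("both flags must be 0 or 1")
    if vpcc_flag == 1 and miv_flag == 1:
        return AtlasKind.HETEROGENEOUS_HYBRID
    if vpcc_flag == 1:
        return AtlasKind.POINT_CLOUD_ONLY
    if miv_flag == 1:
        return AtlasKind.MULTI_VIEW_ONLY
    # Assumption: X = 0, Y = 0 is not covered by the text, so it is rejected here.
    raise ValueError("combination (0, 0) is not described")
```

For instance, `classify_atlas_from_asps(1, 1)` yields the heterogeneous hybrid case, which is the only case in which per-block type information is needed later.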

Table 5 shows the general atlas sequence parameter set RBSP syntax. The atlas sequence parameter set can be understood as spliced image information. The encoding end uses the syntax elements asps_vpcc_extension_present_flag and asps_miv_extension_present_flag in the atlas sequence parameter set to represent the type of the spliced image. By parsing the code stream, the decoding end can obtain these two syntax elements from the parameter set of the spliced image. Based on their values, it determines whether the spliced image belongs to the point cloud, multi-view, or point cloud + multi-view type, and hence which encoding method requirements the spliced image should meet.

Table 5 General atlas sequence parameter set RBSP syntax

With the above syntax, spliced images containing multi-view video and point clouds can coexist under one VPS. It is further necessary to support the case in which one spliced image contains multiple isomorphic blocks, each of which is a collection of multi-view sub-tile images or a collection of point cloud sub-tile images, because the existing technology allows only one kind of isomorphic block in a spliced image. Therefore, the embodiment of the present application adds a second syntax element, according to which it is determined whether the expression format of an isomorphic block in a spliced image is multi-view video, point cloud, mesh, etc.

In some embodiments, the atlas tile data unit header of the i-th block includes a second syntax element. When it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image, the spliced image information also includes the second syntax element, according to which the expression format of the i-th block in the spliced image is determined.

When encoding the heterogeneous hybrid splicing image, the embodiment of the present application sets a second syntax element to indicate the expression format of the i-th block in the heterogeneous hybrid splicing image, which can help improve the decoding accuracy of the decoder. At the same time, the V3C standard can support visual media content in different expression formats such as multi-view videos and point clouds in the same compressed code stream. For example, the second syntax element may be ath_toolset_type in the mosaic map tile data unit header (atlas_tile_header).

Exemplarily, if the second syntax element is an eighth preset value, it is determined that the expression format of the i-th block is the first expression format; if the second syntax element is a ninth preset value, it is determined that the expression format of the i-th block is the second expression format.

For example, when the spliced image is a heterogeneous hybrid spliced image, the atlas_tile_header is parsed from the ACL NAL unit type code stream in the AD unit, and ath_toolset_type = X is obtained. If X is 0, the current block is a point cloud block; if X is 1, the current block is a multi-view video block.
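The per-block decision in this example can be sketched as follows. This is a minimal Python illustration, not a parser: the function name `block_format_from_ath_toolset_type` and the format labels are hypothetical, and each tile header is modeled as a plain dictionary that is assumed to have been parsed already.

```python
# Values taken from the example above: 0 = point cloud block, 1 = multi-view video block.
ATH_TOOLSET_TYPE_TO_FORMAT = {
    0: "point_cloud",
    1: "multi_view_video",
}


def block_format_from_ath_toolset_type(ath_toolset_type: int) -> str:
    """Return the expression format of one block in a heterogeneous hybrid spliced image."""
    try:
        return ATH_TOOLSET_TYPE_TO_FORMAT[ath_toolset_type]
    except KeyError:
        raise ValueError(f"unexpected ath_toolset_type: {ath_toolset_type}") from None


def block_formats(tile_headers):
    """Apply the mapping to a list of already-parsed tile headers (hypothetical dicts)."""
    return [block_format_from_ath_toolset_type(h["ath_toolset_type"]) for h in tile_headers]
```

A hybrid spliced image whose tile headers carry the values 0, 1, 1 would therefore be read as one point cloud block followed by two multi-view video blocks.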

Table 6 shows the atlas tile header syntax. The encoding end adds a new syntax element, ath_toolset_type, to the atlas tile header syntax to indicate the block type. By decoding the code stream, the decoding end can obtain ath_toolset_type from the atlas tile data unit header syntax and determine whether the current block belongs to multi-view video decoding or point cloud decoding.

Table 6 Atlas tile header syntax

Optionally, the second syntax element may also be located in the sub-patch data unit (patch_data_unit). For example, on the premise that the second syntax element (ath_toolset_type) is known to be 1, it is determined that the current sub-tile is encoded using the multi-viewpoint video encoding method. On the premise that the second syntax element (ath_toolset_type) is known to be 0, it is determined that the current sub-tile is encoded using the point cloud encoding method. The sub-patch data unit syntax (Patch data unit syntax) can be shown in Table 7:

Table 7 Sub-patch data unit syntax (Patch data unit syntax)

In some embodiments, a vps_toolset_type[j] value of 1 indicates that the values of the syntax elements of the toolset profile component of the atlas with index j should comply with ISO/IEC 23090-12 Table A-1-1 (i.e. the values specified in Table 8);

A vps_toolset_type[j] value of 2 indicates that the values of the syntax elements of the toolset profile component of the atlas with index j should comply with the values specified in ISO/IEC 23090-5 Table H-3, except for vps_extension_present_flag, vps_packing_information_present_flag, vps_miv_extension_present_flag, vuh_unit_type and vps_atlas_count_minus1, whose values should comply with the values specified in ISO/IEC 23090-12 Table A-1-1;

A vps_toolset_type[j] value of 3 indicates that the values of the syntax elements of the toolset profile component of the atlas with index j should comply with the extended ISO/IEC 23090-12 Table A-1-2 (i.e. Table 9-1 and Table 9-2). Table A-1-1 and Table A-1-2 respectively specify the syntax restrictions on the toolset profile component for multi-view video and the syntax restrictions on the toolset profile component for heterogeneous data in a unified code stream.

A vps_toolset_type[j] value of 0 or any value from 4 to 7 is reserved for future use by ISO/IEC and should not appear in bitstreams conforming to this version of this document. Decoders conforming to this version of this document should ignore such reserved values.
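The value semantics of vps_toolset_type[j] listed above can be summarized in a short lookup. The sketch below is illustrative only: the function name `constraint_source` is hypothetical, returning `None` is just one way to model "ignore reserved values", and treating values outside 0 to 7 as errors is an assumption based on the value range mentioned in the text.

```python
from typing import Optional

# Reserved values according to the text: 0 and 4..7.
RESERVED_VALUES = frozenset({0, 4, 5, 6, 7})

# Which table constrains the toolset profile component for each defined value.
CONSTRAINT_SOURCE = {
    1: "ISO/IEC 23090-12 Table A-1-1",
    2: "ISO/IEC 23090-5 Table H-3 (listed exceptions follow ISO/IEC 23090-12 Table A-1-1)",
    3: "extended ISO/IEC 23090-12 Table A-1-2",
}


def constraint_source(vps_toolset_type: int) -> Optional[str]:
    """Return the constraint table for an atlas, or None for a reserved value."""
    if vps_toolset_type in RESERVED_VALUES:
        return None  # a conforming decoder ignores reserved values
    if vps_toolset_type not in CONSTRAINT_SOURCE:
        # Assumption: only values 0..7 can occur.
        raise ValueError(f"vps_toolset_type out of range: {vps_toolset_type}")
    return CONSTRAINT_SOURCE[vps_toolset_type]
```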

ath_toolset_type indicates that the value of the syntax element of the tool set level component of the current tile should conform to the value specified in Table A-1 of the ISO/IEC 23090-12 extension. The value range of ath_toolset_type should be between 0 and 1.

Table 8 Allowable values of syntax element values for the MIV toolset profile

Table 9-1 Allowable values of syntax element values for the MIV toolset profile(Extended)

Table 9-2 Allowable values of syntax element values for the MIV toolset profile(Extended)

Figure 9 is a schematic diagram of the V3C bitstream structure provided by the embodiment of the present application. The V3C parameter set (V3C_parameter_set()) of V3C_VPS can include ptl_profile_toolset_idc. If ptl_profile_toolset_idc is 128/129/130/132/133/134, the current code stream contains both a point cloud code stream (such as VPCC Basic or VPCC Extended) and a multi-view video code stream (such as MIV Main, MIV Extended or MIV Geometry Absent).

The V3C parameter set (V3C_parameter_set()) of V3C_VPS can include the first syntax element (vps_toolset_type). When ptl_profile_toolset_idc is 128/129/130/132/133/134, a vps_toolset_type value of 1 means that only multi-view video blocks exist in the current spliced image, a value of 2 means that only point cloud blocks exist in the current spliced image, and a value of 3 means that both multi-view video blocks and point cloud blocks exist in the current spliced image.

Alternatively, the atlas sequence parameter set (Atlas_sequence_parameter_set_rbsp()) in the NAL_ASPS of the atlas sub-bitstream (Atlas_sub_bitstream()) of V3C_AD may include asps_vpcc_extension_present_flag and asps_miv_extension_present_flag. When ptl_profile_toolset_idc is 128/129/130/132/133/134, asps_vpcc_extension_present_flag = X and asps_miv_extension_present_flag = Y. When X is 0 and Y is 1, the spliced image contains only multi-view video blocks; when X is 1 and Y is 0, the spliced image contains only point cloud blocks; when X is 1 and Y is 1, the spliced image contains both multi-view video blocks and point cloud blocks.

The ACL NAL unit type (ACL_NAL_unit_type) in Atlas_sub_bitstream() of V3C_AD includes spliced image information. For example, the atlas tile data unit (atlas_tile_data_unit()) may include ath_toolset_type. If ath_toolset_type is 0, the current block is a point cloud block. If ath_toolset_type is 1, the current block is a multi-view video block.

Further, the sub-patch information data (patch_information_data) includes a sub-patch data unit (patch_data_unit). If ath_toolset_type is 0, the current sub-tile is encoded using the point cloud encoding method. If ath_toolset_type is 1, the current sub-tile is encoded using the multi-view video encoding method.

The first syntax element of each spliced image is obtained, and whether the spliced image includes both point cloud blocks and multi-view video blocks is determined from its value. When both point cloud blocks and multi-view video blocks exist in the spliced image, the ath_toolset_type of each block in the spliced image needs to be obtained to determine the block type.
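The two-level decision described here (first the type of the spliced image, then the per-block type only when the image is hybrid) can be sketched as below. This is a hedged Python illustration: `resolve_tile_formats`, its parameters and the format labels are hypothetical names, the sketch assumes vps_toolset_type is used as the first syntax element, and it assumes all syntax elements have already been parsed into integers.

```python
from typing import List, Optional, Sequence

# ptl_profile_toolset_idc values that, per the text, signal mixed point cloud / multi-view content.
MIXED_PROFILE_IDCS = frozenset({128, 129, 130, 132, 133, 134})
TILE_FORMAT = {0: "point_cloud", 1: "multi_view_video"}


def resolve_tile_formats(
    ptl_profile_toolset_idc: int,
    vps_toolset_type: int,
    tile_count: int,
    ath_toolset_types: Optional[Sequence[int]] = None,
) -> List[str]:
    """Return the expression format of every block in one spliced image."""
    if ptl_profile_toolset_idc not in MIXED_PROFILE_IDCS:
        raise ValueError("this sketch only covers the mixed-content profiles")
    if vps_toolset_type == 1:  # only multi-view video blocks
        return ["multi_view_video"] * tile_count
    if vps_toolset_type == 2:  # only point cloud blocks
        return ["point_cloud"] * tile_count
    if vps_toolset_type == 3:  # hybrid: the per-block type must be read
        if ath_toolset_types is None or len(ath_toolset_types) != tile_count:
            raise ValueError("a hybrid spliced image needs one ath_toolset_type per block")
        return [TILE_FORMAT[t] for t in ath_toolset_types]
    raise ValueError(f"unsupported vps_toolset_type: {vps_toolset_type}")
```

The per-block values are consulted only in the hybrid branch, which mirrors the point of the paragraph above: isomorphic spliced images need no per-block lookup.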

The encoding method of the present application is introduced above by taking the encoding end as an example. The video decoding method provided by the embodiment of the present application is described below by taking the decoding end as an example.

Figure 10 is a schematic flow chart of a decoding method provided by an embodiment of the present application. As shown in Figure 10, the decoding method in this embodiment of the present application includes:

Step 1001: Decode the code stream to obtain a spliced image and spliced image information, wherein the spliced image information includes a first syntax element, and the spliced image is determined to be a heterogeneous hybrid spliced image or a homogeneous spliced image according to the first syntax element;

In some embodiments, determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image includes: if the first syntax element is a first preset value, determining that the spliced image is a heterogeneous hybrid spliced image including isomorphic blocks of a first expression format and a second expression format, wherein the first expression format and the second expression format are different expression formats; if the first syntax element is a second preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the first expression format; if the first syntax element is a third preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the second expression format.

In some embodiments, the first syntax element includes a first sub-syntax element and a second sub-syntax element, and whether the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image is determined according to the first sub-syntax element and the second sub-syntax element. Accordingly, determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image includes: if the first sub-syntax element is a fourth preset value, determining that the spliced image includes isomorphic blocks of the first expression format; and/or, if the second sub-syntax element is a fifth preset value, determining that the spliced image includes isomorphic blocks of the second expression format; wherein the first expression format and the second expression format are different expression formats.

In some embodiments, determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image further includes: if the first sub-syntax element is a sixth preset value, determining that the spliced image does not include isomorphic blocks of the first expression format; if the second sub-syntax element is a seventh preset value, determining that the spliced image does not include isomorphic blocks of the second expression format.

Specifically, determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image includes: if the first sub-syntax element is the fourth preset value and the second sub-syntax element is the fifth preset value, determining that the spliced image is a heterogeneous hybrid spliced image including isomorphic blocks of the first expression format and isomorphic blocks of the second expression format; if the first sub-syntax element is the fourth preset value and the second sub-syntax element is the seventh preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the first expression format; if the first sub-syntax element is the sixth preset value and the second sub-syntax element is the fifth preset value, determining that the spliced image is an isomorphic spliced image including isomorphic blocks of the second expression format.

In some embodiments, the first syntax element is located in a parameter set sub-codestream of the codestream.

In some other embodiments, the mosaic map sequence parameter set corresponding to the mosaic map includes the first syntax element.

In some embodiments, the at least one expression format includes: at least one of multi-view video, point cloud, and mesh. Specifically, the first expression format is one of multi-view video, point cloud and grid, the second expression format is one of multi-view video, point cloud and grid, the first expression format and the second expression format are different.

In some embodiments, the code stream further includes a parameter set sub-code stream, the parameter set sub-code stream includes a third syntax element, and it is determined according to the third syntax element that the code stream includes a code stream corresponding to visual media content in at least one expression format. The method further includes: decoding the parameter set sub-code stream of the code stream to obtain the parameter set of the code stream, and obtaining the third syntax element from the parameter set of the code stream.

In some embodiments, the method further includes: if the third syntax element is a first value, determining that the code stream includes both a code stream corresponding to the visual media content in the first expression format and a code stream corresponding to the visual media content in the second expression format; if the third syntax element is a second value, determining that the code stream includes the code stream corresponding to the visual media content in the first expression format; if the third syntax element is a third value, determining that the code stream includes the code stream corresponding to the visual media content in the second expression format.

In some embodiments, decoding the code stream to obtain at least one spliced image includes: determining according to the third syntax element that the code stream includes code streams corresponding to visual media content in at least two expression formats, and decoding the code stream to obtain a heterogeneous hybrid spliced image.

In some embodiments, decoding the code stream to obtain at least one spliced image includes: determining according to the third syntax element that the code stream includes code streams corresponding to visual media content in at least two expression formats, and decoding the code stream to obtain isomorphic spliced images of at least two expression formats. That is to say, when the code stream includes code streams corresponding to visual media content in at least two expression formats, each expression format corresponds to an isomorphic spliced image.

In some embodiments, decoding the code stream to obtain at least one spliced image includes: determining according to the third syntax element that the code stream includes code streams corresponding to visual media content in at least two expression formats, and decoding the code stream to obtain a heterogeneous hybrid spliced image and isomorphic spliced images of at least two expression formats. That is to say, when the code stream includes code streams corresponding to visual media content in at least two expression formats, the isomorphic blocks of some expression formats form a heterogeneous hybrid spliced image, and the isomorphic blocks of the other expression formats form isomorphic spliced images.

In some embodiments, the heterogeneous hybrid spliced image includes at least one of the following: a single-attribute heterogeneous hybrid spliced image and a multi-attribute heterogeneous hybrid spliced image; the isomorphic spliced image includes at least one of the following: a single-attribute isomorphic spliced image and a multi-attribute isomorphic spliced image.

In some embodiments, the code stream includes a video compression sub-stream and a spliced image information sub-stream, and decoding the code stream to obtain at least one spliced image and spliced image information includes: decoding the video compression sub-stream to obtain the at least one spliced image; and decoding the spliced image information sub-stream to obtain the spliced image information of the at least one spliced image. Exemplarily, when it is determined according to the third syntax element that the code stream includes code streams corresponding to visual media content in at least two expression formats, the video compression sub-stream is decoded to obtain a heterogeneous hybrid spliced image; or to obtain a heterogeneous hybrid spliced image and an isomorphic spliced image; or to obtain isomorphic spliced images of at least two expression formats.

Step 1002: When it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image, split the spliced image according to the spliced image information of the spliced image to obtain at least two types of isomorphic blocks, wherein the at least two types of isomorphic blocks correspond to different visual media content expression formats;

Step 1003: When it is determined according to the first syntax element that the spliced image is an isomorphic spliced image, split the spliced image according to the spliced image information of the spliced image to obtain one type of isomorphic block, wherein the one type of isomorphic block corresponds to the same visual media content expression format;

Step 1004: Decode and reconstruct the isomorphic blocks to obtain visual media content in at least one expression format.
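The splitting in Steps 1002 and 1003 can be sketched as a grouping of blocks by expression format, with a consistency check against the type signaled by the first syntax element. This is only a minimal Python sketch under stated assumptions: `Block`, `split_spliced_image` and the `payload` field are hypothetical, and the actual splitting in the method works on image regions described by the spliced image information rather than on pre-tagged objects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Block:
    """Hypothetical stand-in for one isomorphic block cut out of a spliced image."""
    expression_format: str  # e.g. "point_cloud" or "multi_view_video"
    payload: bytes


def split_spliced_image(is_heterogeneous: bool, blocks: List[Block]) -> Dict[str, List[Block]]:
    """Group blocks by expression format and check the result against the signaled type."""
    groups: Dict[str, List[Block]] = {}
    for block in blocks:
        groups.setdefault(block.expression_format, []).append(block)
    if is_heterogeneous and len(groups) < 2:
        # Step 1002: a hybrid spliced image yields at least two types of blocks.
        raise ValueError("heterogeneous hybrid spliced image must contain at least two formats")
    if not is_heterogeneous and len(groups) != 1:
        # Step 1003: an isomorphic spliced image yields exactly one type of block.
        raise ValueError("isomorphic spliced image must contain exactly one format")
    return groups
```

Each resulting group can then be handed to the decoding and reconstruction of Step 1004 for its expression format.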

In some embodiments, the method further includes: when it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image, the spliced image information further includes a second syntax element, and the expression format of the i-th block in the spliced image is determined according to the second syntax element.

Exemplarily, determining the expression format of the i-th block in the mosaic diagram based on the second syntax element includes: the second syntax element is an eighth preset value, and determining the i-th block The expression format is the first expression format; the second syntax element is the ninth preset value, and the expression format of the i-th block is determined to be the second expression format.

In some embodiments, the second syntax element is located in the mosaic block data unit header of the i-th block of the mosaic map.

In some embodiments, decoding and reconstructing the isomorphic blocks to obtain visual media content in at least one expression format includes: if the expression format of the i-th block is the first expression format, determining that the sub-tiles in the i-th block are decoded and reconstructed using the decoding method corresponding to the first expression format to obtain the visual media content of the first expression format; if the expression format of the i-th block is the second expression format, determining that the sub-tiles in the i-th block are decoded and reconstructed using the decoding method corresponding to the second expression format to obtain the visual media content of the second expression format.

For example, the code stream is decoded to obtain a multi-view video spliced image, a point cloud spliced image, and a heterogeneous hybrid spliced image. The heterogeneous hybrid spliced image is split according to its spliced image information, and the reconstructed multi-view video blocks and point cloud blocks are output; the multi-view video spliced image is split according to its spliced image information, and the reconstructed multi-view video blocks are output; the point cloud spliced image is split according to its spliced image information, and the reconstructed point cloud blocks are output. All acquired multi-view video blocks are passed through multi-view video decoding to generate a reconstructed multi-view video, and all acquired point cloud blocks are passed through point cloud decoding to generate a reconstructed point cloud.
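The pooling in this example, where blocks of the same format from several spliced images are collected before one reconstruction per format, can be sketched as follows. The Python below is a hedged illustration: `reconstruct_all` and the toy decoder callables are hypothetical placeholders, and blocks are modeled as simple strings rather than decoded image regions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# One spliced image is modeled as a list of (expression_format, block) pairs.
SplicedImage = List[Tuple[str, str]]


def reconstruct_all(
    spliced_images: List[SplicedImage],
    decoders: Dict[str, Callable[[List[str]], str]],
) -> Dict[str, str]:
    """Pool blocks of the same format across all spliced images, then decode each pool once."""
    pooled: Dict[str, List[str]] = defaultdict(list)
    for image in spliced_images:
        for expression_format, block in image:
            pooled[expression_format].append(block)
    return {fmt: decoders[fmt](blocks) for fmt, blocks in pooled.items()}


# Toy stand-ins for the multi-view video and point cloud reconstruction steps.
toy_decoders = {
    "multi_view_video": lambda blocks: "video[" + ",".join(blocks) + "]",
    "point_cloud": lambda blocks: "cloud[" + ",".join(blocks) + "]",
}
```

With a multi-view spliced image, a point cloud spliced image and a hybrid spliced image as input, each format is reconstructed once from the union of its blocks, regardless of which spliced image a block came from.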

With the above technical solution, for application scenarios that include visual media content in one or more expression formats, isomorphic blocks of different expression formats are spliced into a heterogeneous hybrid spliced image, isomorphic blocks of the same expression format are spliced into an isomorphic spliced image, and the resulting spliced images and spliced image information are written into the code stream. Isomorphic spliced images (such as at least one of a multi-view spliced image, a point cloud spliced image and a mesh spliced image) and heterogeneous hybrid spliced images are both present in the code stream, which makes the encoding and decoding methods applicable to scenarios with visual media content in multiple expression formats and expands their application scope. Moreover, the spliced image information includes the first syntax element indicating the type of the spliced image, which improves the efficiency with which the decoding end decodes the spliced image. Furthermore, since isomorphic blocks of different expression formats are spliced into one heterogeneous hybrid spliced image for encoding and decoding, the number of 2D video codecs (such as HEVC, VVC, AVC and AVS) that need to be invoked can be reduced, which reduces implementation cost and improves usability.

An embodiment of the present application also provides an encoding device. Figure 11 is a schematic block diagram of an encoding device provided by an embodiment of the present application. The encoding device 110 is applied to an encoder. As shown in Figure 11, the encoding device 110 includes:

The processing unit 1101 is configured to process visual media content in at least one expression format to obtain at least one isomorphic block, where different types of isomorphic blocks correspond to different visual media content expression formats;

The splicing unit 1102 is configured to splice the at least one isomorphic block to obtain at least one spliced image and spliced image information, wherein the spliced image information includes a first syntax element, it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image, the heterogeneous hybrid spliced image includes at least two types of isomorphic blocks, and the isomorphic spliced image includes one type of isomorphic block;

The encoding unit 1103 is used to encode the at least one splicing picture and the splicing picture information to obtain a code stream.

In some embodiments, the first syntax element includes a first sub-syntax element and a second sub-syntax element, and whether the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image is determined according to the first sub-syntax element and the second sub-syntax element;

Determining according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image includes: if the first sub-syntax element is a fourth preset value, determining that the spliced image includes isomorphic blocks of the first expression format; and/or, if the second sub-syntax element is a fifth preset value, determining that the spliced image includes isomorphic blocks of the second expression format; wherein the first expression format and the second expression format are different expression formats.

In some embodiments, the mosaic graph sequence parameter set corresponding to the mosaic graph includes the first syntax element.

In some embodiments, when it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image, the spliced image information also includes a second syntax element, and the expression format of the i-th block in the spliced image is determined according to the second syntax element.

In some embodiments, determining the expression format of the i-th block in the mosaic diagram based on the second syntax element includes: the second syntax element is an eighth preset value, and determining the i-th block The expression format of the block is the first expression format; the second syntax element is the ninth preset value, which determines that the expression format of the i-th block is the second expression format.

In some embodiments, the encoding unit 1103 is configured to: if the expression format of the i-th block is the first expression format, determine that the sub-tiles in the i-th block are encoded using the encoding method corresponding to the first expression format to obtain the code stream corresponding to the visual media content of the first expression format; if the expression format of the i-th block is the second expression format, determine that the sub-tiles in the i-th block are encoded using the encoding method corresponding to the second expression format to obtain the code stream corresponding to the visual media content of the second expression format.

In some embodiments, the parameter set sub-code stream of the code stream includes a third syntax element, and it is determined according to the third syntax element that the code stream includes a code stream corresponding to visual media content in at least one expression format.

In some embodiments, determining according to the third syntax element that the code stream includes a code stream corresponding to visual media content in at least one expression format includes: if the third syntax element is a first value, determining that the code stream includes both a code stream corresponding to the visual media content in the first expression format and a code stream corresponding to the visual media content in the second expression format; if the third syntax element is a second value, determining that the code stream includes the code stream corresponding to the visual media content in the first expression format; if the third syntax element is a third value, determining that the code stream includes the code stream corresponding to the visual media content in the second expression format.

In some embodiments, when the at least one mosaic includes a heterogeneous hybrid mosaic, the third syntax element is used to indicate that the code stream includes a code stream corresponding to visual media content in at least two expression formats.

In some embodiments, the encoding unit 1103 is configured to: encode the at least one spliced image to obtain a video compression sub-stream; encode the spliced image information of the at least one spliced image to obtain a spliced image information sub-stream; and combine the video compression sub-stream and the spliced image information sub-stream into the code stream.

In some embodiments, the at least one expression format includes: at least one of multi-view video, point cloud, and mesh.

An embodiment of the present application also provides a decoding device. Figure 12 is a schematic block diagram of a decoding device provided by an embodiment of the present application. The decoding device 120 is applied to a decoder. As shown in Figure 12, the decoding device 120 includes:

The decoding unit 1201 is configured to decode the code stream to obtain a spliced image and spliced image information, wherein the spliced image information includes a first syntax element, and it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or an isomorphic spliced image;

The splitting unit 1202 is configured to, when it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image, split the spliced image according to the spliced image information of the spliced image to obtain at least two types of isomorphic blocks, wherein the at least two types of isomorphic blocks correspond to different visual media content expression formats;

The splitting unit 1202 is further configured to, when it is determined according to the first syntax element that the spliced image is an isomorphic spliced image, split the spliced image according to the spliced image information of the spliced image to obtain one type of isomorphic block, wherein the one type of isomorphic block corresponds to the same visual media content expression format;

The processing unit 1203 is configured to decode and reconstruct the homogeneous blocks to obtain visual media content in at least one expression format.
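The cooperation of the decoding unit 1201 and the splitting unit 1202 can be sketched as follows. This is only an illustrative sketch, not the syntax defined by the application: the `MosaicType`, `Block` and `Mosaic` names, the string format labels, and the numeric values 0, 1 and 2 are all hypothetical placeholders, since the preset values of the first syntax element are not fixed here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class MosaicType(Enum):
    # Placeholder numbers standing in for the unspecified "preset values"
    # of the first syntax element (assumption, not taken from the application).
    HETEROGENEOUS_HYBRID = 0
    HOMOGENEOUS_FIRST_FORMAT = 1
    HOMOGENEOUS_SECOND_FORMAT = 2


@dataclass
class Block:
    index: int
    expression_format: str  # hypothetical label, e.g. "multi_view" or "point_cloud"


@dataclass
class Mosaic:
    first_syntax_element: int
    blocks: List[Block]


def split_mosaic(mosaic: Mosaic) -> Dict[str, List[Block]]:
    """Split a mosaic into isomorphic blocks grouped by expression format."""
    # Raises ValueError if the syntax element holds an unknown value.
    mosaic_type = MosaicType(mosaic.first_syntax_element)

    groups: Dict[str, List[Block]] = {}
    for block in mosaic.blocks:
        groups.setdefault(block.expression_format, []).append(block)

    # A heterogeneous hybrid mosaic carries at least two block types;
    # a homogeneous mosaic carries exactly one.
    if mosaic_type is MosaicType.HETEROGENEOUS_HYBRID:
        if len(groups) < 2:
            raise ValueError("heterogeneous hybrid mosaic needs at least two formats")
    elif len(groups) != 1:
        raise ValueError("homogeneous mosaic must contain exactly one format")
    return groups
```

Under these assumptions, a mosaic signalled as heterogeneous hybrid yields one group of blocks per expression format, and each group can then be passed to the processing unit 1203 for format-specific reconstruction.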

In some embodiments, the processing unit 1203 is configured to: if the expression format of the i-th block is a first expression format, determine that the sub-blocks in the i-th block are decoded and reconstructed using the decoding method corresponding to the first expression format, to obtain the visual media content of the first expression format; and if the expression format of the i-th block is a second expression format, determine that the sub-blocks in the i-th block are decoded and reconstructed using the decoding method corresponding to the second expression format, to obtain the visual media content of the second expression format.
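The per-block selection of a decoding method can be illustrated with a small dispatch table. This is a minimal sketch under stated assumptions: the codes `FIRST_FORMAT` and `SECOND_FORMAT` are hypothetical placeholders for the values that signal the expression format of the i-th block, and the two `reconstruct_*` functions are stand-ins that only count sub-blocks instead of performing real reconstruction.

```python
from typing import Callable, Dict, List, Tuple

# Placeholder codes for the expression format of a block (assumption;
# the actual preset values are not given in this passage).
FIRST_FORMAT = 0
SECOND_FORMAT = 1


def reconstruct_first_format(sub_blocks: List[bytes]) -> dict:
    # Stand-in for the decoding method of the first expression format.
    return {"format": "first", "sub_blocks": len(sub_blocks)}


def reconstruct_second_format(sub_blocks: List[bytes]) -> dict:
    # Stand-in for the decoding method of the second expression format.
    return {"format": "second", "sub_blocks": len(sub_blocks)}


RECONSTRUCTORS: Dict[int, Callable[[List[bytes]], dict]] = {
    FIRST_FORMAT: reconstruct_first_format,
    SECOND_FORMAT: reconstruct_second_format,
}


def reconstruct_blocks(blocks: List[Tuple[int, List[bytes]]]) -> List[dict]:
    """Route the sub-blocks of each block to the method matching its format code."""
    results = []
    for i, (format_code, sub_blocks) in enumerate(blocks):
        reconstruct = RECONSTRUCTORS.get(format_code)
        if reconstruct is None:
            raise ValueError(f"block {i}: unknown expression format code {format_code}")
        results.append(reconstruct(sub_blocks))
    return results
```

A table keyed by format code keeps the dispatch in one place, so supporting a further expression format (for example mesh) would only require registering one more entry.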

In some embodiments, the decoding unit 1201 is configured to determine, according to the third syntax element, that the code stream includes a code stream corresponding to visual media content in at least two expression formats, and decode the code stream to obtain a heterogeneous hybrid spliced image.

In some embodiments, the code stream includes a video compression sub-stream and a spliced image information sub-stream, and the decoding unit 1201 is used to decode the video compression sub-stream to obtain the at least one spliced image, and to decode the spliced image information sub-stream to obtain the spliced image information of the at least one spliced image.
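The relationship between the code stream and its two sub-streams can be sketched as a simple container. The framing below (a 4-byte big-endian length before each sub-stream, video first) is purely an assumption made for illustration and is not the code stream format of the application; `mux` and `demux` are hypothetical helper names.

```python
import struct
from typing import Tuple


def mux(video_sub_stream: bytes, info_sub_stream: bytes) -> bytes:
    """Combine the two sub-streams into one code stream (hypothetical framing)."""
    code_stream = b""
    for payload in (video_sub_stream, info_sub_stream):
        # Each sub-stream is preceded by its length as a 4-byte big-endian integer.
        code_stream += struct.pack(">I", len(payload)) + payload
    return code_stream


def demux(code_stream: bytes) -> Tuple[bytes, bytes]:
    """Recover the video compression and information sub-streams."""
    parts = []
    offset = 0
    for _ in range(2):
        (length,) = struct.unpack_from(">I", code_stream, offset)
        offset += 4
        payload = code_stream[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated sub-stream")
        parts.append(payload)
        offset += length
    return parts[0], parts[1]
```

On the decoder side, the first recovered payload would be handed to the video decoder to obtain the spliced images, and the second would be parsed to obtain the spliced image information.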

It should be understood that the device embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they will not be repeated here.

The device and system of the embodiments of the present application are described above from the perspective of functional units in conjunction with the accompanying drawings. It should be understood that a functional unit can be implemented in the form of hardware, in the form of software instructions, or as a combination of hardware and software units. Specifically, each step of the method embodiments of the present application can be completed by integrated logic circuits of hardware in the processor and/or by instructions in the form of software. The steps of the methods disclosed in conjunction with the embodiments of the present application can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software units in the decoding processor. Optionally, the software unit may be located in a storage medium that is mature in this field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.

In practical applications, the embodiment of the present application also provides an encoder. Figure 13 is a schematic block diagram of the encoder provided by an embodiment of the present application. As shown in Figure 13, the encoder 1310 includes:

The second memory 1320 and the second processor 1330; the second memory 1320 stores a computer program that can be run on the second processor 1330, and the second processor 1330, when executing the program, performs the encoding method on the encoder side.

In practical applications, this embodiment of the present application also provides a decoder. Figure 14 is a schematic block diagram of a decoder provided by an embodiment of the present application. As shown in Figure 14, the decoder 1410 includes:

The first memory 1420 and the first processor 1430; the first memory 1420 stores a computer program that can be run on the first processor 1430, and the first processor 1430, when executing the program, performs the decoding method on the decoder side.

In some embodiments of the present application, the processor may include, but is not limited to:

A general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic devices, discrete hardware components, etc.

In some embodiments of the present application, the memory includes but is not limited to:

Volatile memory and/or non-volatile memory. The non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or flash memory. The volatile memory may be random access memory (Random Access Memory, RAM), which is used as an external cache. By way of illustration, but not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM) and direct memory bus random access memory (Direct Rambus RAM, DR RAM).

In addition, each functional module in this embodiment can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit. The above integrated units can be implemented in the form of hardware or software function modules.

In yet another embodiment of the present application, see FIG. 15 , which shows a schematic structural diagram of a coding and decoding system provided by an embodiment of the present application. As shown in Figure 15, the encoding and decoding system 150 may include an encoder 1501 and a decoder 1502. The encoder 1501 may be a device integrated with the encoding device described in the previous embodiment; the decoder 1502 may be a device integrated with the decoding device described in the previous embodiment.

In this embodiment of the present application, in the coding and decoding system 150, both the encoder 1501 and the decoder 1502 can use the color component information of adjacent reference pixels and the pixels to be predicted to implement the calculation of the weighting coefficient corresponding to the pixel to be predicted. Moreover, different reference pixels can have different weighting coefficients. Applying this weighting coefficient to the chroma prediction of the pixels to be predicted in the current block can not only improve the accuracy of chroma prediction and save code rate, but also improve the encoding and decoding performance.

An embodiment of the present application also provides a chip for implementing the above encoding and decoding method. Specifically, the chip includes: a processor, configured to call and run a computer program from a memory, so that the electronic device installed with the chip executes the above encoding and decoding method.

Embodiments of the present application also provide a computer storage medium in which a computer program is stored. When the computer program is executed by the second processor, the encoding method of the encoder is implemented; or, when the computer program is executed by the first processor, the decoding method of the decoder is implemented. In other words, embodiments of the present application also provide a computer program product containing instructions which, when executed by a computer, cause the computer to perform the methods of the above method embodiments.

This application also provides a code stream, which is generated according to the above encoding method. Optionally, the code stream includes the above first syntax element, or includes a second syntax element and a third syntax element.

When implemented using software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (such as infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media. The available media may be magnetic media (such as floppy disks, hard disks, magnetic tapes), optical media (such as digital video discs (DVD)), or semiconductor media (such as solid state disks (SSD)), etc.

Those of ordinary skill in the art can appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed in this application can be implemented with electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only illustrative. The division of the units is only a logical function division, and in actual implementation there may be other division methods: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

A unit described as a separate component may or may not be physically separate. A component shown as a unit may or may not be a physical unit, that is, it may be located in one place, or it may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, each functional unit in various embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.

The above contents are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or replacements that a person familiar with the technical field can easily think of within the technical scope disclosed in the present application should be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

A decoding method, including:

Decode the code stream to obtain a spliced image and spliced image information, wherein the spliced image information includes a first syntax element, and it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image or a homogeneous spliced image;

When it is determined according to the first syntax element that the spliced graph is a heterogeneous hybrid spliced graph, the spliced graph is split according to the spliced graph information of the spliced graph to obtain at least two types of isomorphic blocks, wherein the at least two types of isomorphic blocks correspond to different visual media content expression formats;

When it is determined according to the first syntax element that the spliced graph is a homogeneous spliced graph, the spliced graph is split according to the spliced graph information of the spliced graph to obtain one type of isomorphic block, wherein the one type of isomorphic block corresponds to the same visual media content expression format;

The homogeneous blocks are decoded and reconstructed to obtain visual media content in at least one expression format.
The method according to claim 1, wherein determining according to the first syntax element that the spliced graph is a heterogeneous hybrid spliced graph or a homogeneous spliced graph includes:

If the first syntax element is a first preset value, it is determined that the mosaic diagram is a heterogeneous hybrid mosaic diagram including homogeneous blocks of the first expression format and the second expression format, wherein the first expression format and the second expression format are different expression formats;

If the first syntax element is a second preset value, it is determined that the mosaic graph is an isomorphic mosaic graph including isomorphic blocks of the first expression format;

If the first syntax element is a third preset value, it is determined that the mosaic graph is an isomorphic mosaic graph including isomorphic blocks of the second expression format.
The method according to claim 1, wherein the first syntax element includes a first sub-syntax element and a second sub-syntax element, and whether the spliced graph is a heterogeneous hybrid spliced graph or a homogeneous spliced graph is determined according to the first sub-syntax element and the second sub-syntax element;

Determining according to the first syntax element that the spliced graph is a heterogeneous hybrid spliced graph or a homogeneous spliced graph includes:

If the first sub-syntax element is a fourth preset value, it is determined that the splicing diagram includes isomorphic blocks of the first expression format;

If the second sub-syntax element is a fifth preset value, it is determined that the splicing diagram includes isomorphic blocks of the second expression format;

Wherein, the first expression format and the second expression format are different expression formats.
The method according to claim 3, wherein determining according to the first syntax element that the spliced graph is a heterogeneous hybrid spliced graph or a homogeneous spliced graph further includes:

If the first sub-syntax element is a sixth preset value, it is determined that the spliced image does not include isomorphic blocks of the first expression format;

If the second sub-syntax element is a seventh preset value, it is determined that the spliced image does not include isomorphic blocks of the second expression format.
The method according to any one of claims 1 to 4, wherein the first syntax element is located in a parameter set sub-code stream of the code stream.
The method according to any one of claims 1 to 4, wherein the mosaic graph sequence parameter set corresponding to the mosaic graph includes the first syntax element.
The method according to any one of claims 1-6, wherein the method further includes:

When the mosaic image is determined to be a heterogeneous hybrid mosaic image based on the first syntax element, the mosaic image information also includes a second syntax element, and the expression format of the i-th block in the mosaic image is determined based on the second syntax element.
The method according to claim 7, wherein determining the expression format of the i-th block in the mosaic diagram according to the second syntax element includes:

If the second syntax element is the eighth preset value, then the expression format of the i-th block is determined to be the first expression format;

If the second syntax element is a ninth preset value, it is determined that the expression format of the i-th block is the second expression format.
The method of claim 7, wherein the second syntax element is located in a mosaic block data unit header of the i-th block of the mosaic.
The method according to any one of claims 7-9, wherein the decoding and reconstruction of the isomorphic blocks to obtain visual media content in at least one expression format includes:

If the expression format of the i-th block is the first expression format, it is determined that the sub-blocks in the i-th block are decoded and reconstructed using the decoding method corresponding to the first expression format, to obtain the visual media content of the first expression format;

If the expression format of the i-th block is the second expression format, it is determined that the sub-blocks in the i-th block are decoded and reconstructed using the decoding method corresponding to the second expression format, to obtain the visual media content of the second expression format.
The method according to any one of claims 1 to 10, wherein the parameter set sub-code stream of the code stream includes a third syntax element, and it is determined according to the third syntax element that the code stream includes a code stream corresponding to visual media content in at least one expression format.
The method according to claim 11, wherein determining, according to the third syntax element, that the code stream includes a code stream corresponding to visual media content in at least one expression format includes:

If the third syntax element is a first value, it is determined that the code stream includes both a code stream corresponding to the visual media content in the first expression format and a code stream corresponding to the visual media content in the second expression format;

If the third syntax element is a second value, it is determined that the code stream includes a code stream corresponding to the visual media content of the first expression format;

If the third syntax element is a third value, it is determined that the code stream includes a code stream corresponding to the visual media content of the second expression format.
The method according to any one of claims 11-12, wherein decoding the code stream to obtain at least one spliced image includes:

It is determined according to the third syntax element that the code stream includes a code stream corresponding to visual media content in at least two expression formats, and the code stream is decoded to obtain a heterogeneous hybrid splicing image.
The method according to any one of claims 1 to 13, wherein the code stream includes a video compression sub-stream and a splicing image information sub-stream, and decoding the code stream to obtain at least one splicing image and splicing image information includes:

Decode the video compression sub-stream to obtain the at least one splicing image;

Decode the splicing picture information sub-stream to obtain the splicing picture information of the at least one splicing picture.
The method according to any one of claims 1 to 14, wherein the at least one expression format includes: at least one of multi-view video, point cloud and mesh.
The method according to any one of claims 1 to 15, wherein the heterogeneous hybrid mosaic graph is at least one of the following: a single attribute heterogeneous hybrid mosaic graph and a multi-attribute heterogeneous hybrid mosaic graph;

The isomorphic splicing diagram includes at least one of the following: a single attribute isomorphic splicing diagram and a multi-attribute isomorphic splicing diagram.
A coding method that includes:

Process the visual media content of at least one expression format to obtain at least one isomorphic block, where different types of isomorphic blocks correspond to different visual media content expression formats;

The at least one isomorphic block is spliced to obtain at least one spliced graph and spliced graph information, wherein the spliced graph information includes a first syntax element, and it is determined according to the first syntax element that the spliced graph is a heterogeneous hybrid spliced graph or a homogeneous spliced graph, the heterogeneous hybrid spliced graph includes at least two types of isomorphic blocks, and the homogeneous spliced graph includes one type of isomorphic block;

The at least one spliced image and the spliced image information are encoded to obtain a code stream.
The method according to claim 17, wherein the determining according to the first syntax element that the spliced graph is a heterogeneous hybrid spliced graph or a homogeneous spliced graph includes:

If the first syntax element is a first preset value, it is determined that the mosaic diagram is a heterogeneous hybrid mosaic diagram including homogeneous blocks of the first expression format and the second expression format, wherein the first expression format and the second expression format are different expression formats;

If the first syntax element is a second preset value, it is determined that the mosaic graph is an isomorphic mosaic graph including isomorphic blocks of the first expression format;

If the first syntax element is a third preset value, it is determined that the mosaic graph is an isomorphic mosaic graph including isomorphic blocks of the second expression format.
The method of claim 17, wherein the first syntax element includes a first sub-syntax element and a second sub-syntax element, and whether the spliced graph is a heterogeneous hybrid spliced graph or a homogeneous spliced graph is determined according to the first sub-syntax element and the second sub-syntax element;

Determining according to the first syntax element that the spliced graph is a heterogeneous hybrid spliced graph or a homogeneous spliced graph includes:

If the first sub-syntax element is a fourth preset value, it is determined that the splicing diagram includes isomorphic blocks of the first expression format;

If the second sub-syntax element is a fifth preset value, it is determined that the splicing diagram includes isomorphic blocks of the second expression format;

Wherein, the first expression format and the second expression format are different expression formats.
The method according to claim 19, wherein determining according to the first syntax element that the spliced graph is a heterogeneous hybrid spliced graph or a homogeneous spliced graph further includes:

If the first sub-syntax element is a sixth preset value, it is determined that the spliced image does not include isomorphic blocks of the first expression format;

If the second sub-syntax element is a seventh preset value, it is determined that the spliced image does not include isomorphic blocks of the second expression format.
The method according to any one of claims 17 to 20, wherein the first syntax element is located in a parameter set sub-code stream of the code stream.
The method according to any one of claims 17 to 20, wherein the mosaic graph sequence parameter set corresponding to the mosaic graph includes the first syntax element.
The method according to any one of claims 17-22, wherein the method further includes:

When the mosaic image is determined to be a heterogeneous hybrid mosaic image based on the first syntax element, the mosaic image information also includes a second syntax element, and the expression format of the i-th block in the mosaic image is determined based on the second syntax element.
The method according to claim 23, wherein determining the expression format of the i-th block in the mosaic diagram according to the second syntax element includes:

If the second syntax element is the eighth preset value, then the expression format of the i-th block is determined to be the first expression format;

If the second syntax element is a ninth preset value, it is determined that the expression format of the i-th block is the second expression format.
The method of claim 24, wherein the second syntax element is located in a mosaic block data unit header of the i-th block of the mosaic.
The method according to any one of claims 23 to 25, wherein said encoding the at least one splicing image and the splicing image information to obtain a code stream includes:

If the expression format of the i-th block is the first expression format, it is determined that the sub-blocks in the i-th block are encoded using the encoding method corresponding to the first expression format, to obtain the code stream corresponding to the visual media content of the first expression format;

If the expression format of the i-th block is the second expression format, it is determined that the sub-blocks in the i-th block are encoded using the encoding method corresponding to the second expression format, to obtain the code stream corresponding to the visual media content of the second expression format.
The method according to any one of claims 17 to 26, wherein the parameter set sub-code stream of the code stream includes a third syntax element, and it is determined according to the third syntax element that the code stream includes a code stream corresponding to visual media content in at least one expression format.
The method of claim 27, wherein determining, according to the third syntax element, that the code stream includes a code stream corresponding to visual media content in at least one expression format includes:

If the third syntax element is a first value, it is determined that the code stream includes both a code stream corresponding to the visual media content in the first expression format and a code stream corresponding to the visual media content in the second expression format;

If the third syntax element is a second value, it is determined that the code stream includes a code stream corresponding to the visual media content of the first expression format;

If the third syntax element is a third value, it is determined that the code stream includes a code stream corresponding to the visual media content of the second expression format.
The method according to any one of claims 27-28, wherein when the at least one splicing diagram includes a heterogeneous hybrid splicing diagram, it is determined according to the third syntax element that the code stream includes a code stream corresponding to visual media content in at least two expression formats.
The method according to any one of claims 17 to 29, wherein said encoding said at least one splicing image and the splicing image information to obtain a code stream includes:

Encode the at least one spliced image to obtain a video compression sub-stream;

Encode the splicing image information of the at least one splicing image to obtain a splicing image information sub-stream;

The video compression sub-stream and the splicing image information sub-stream are combined into the code stream.
The method according to any one of claims 17 to 30, wherein the at least one expression format includes: at least one of multi-view video, point cloud and mesh.
The method according to any one of claims 17 to 31, wherein the heterogeneous hybrid mosaic graph is at least one of the following: a single attribute heterogeneous hybrid mosaic graph and a multi-attribute heterogeneous hybrid mosaic graph;

The isomorphic splicing diagram includes at least one of the following: a single attribute isomorphic splicing diagram and a multi-attribute isomorphic splicing diagram.
A decoding device, including:

A decoding unit, configured to decode the code stream to obtain a spliced image and spliced image information, wherein the spliced image information includes a first syntax element, and the spliced image is determined to be a heterogeneous hybrid spliced image or a homogeneous spliced image according to the first syntax element;

A splitting unit, configured to, when it is determined according to the first syntax element that the spliced image is a heterogeneous hybrid spliced image, split the spliced image according to the spliced image information of the spliced image to obtain at least two types of isomorphic blocks, wherein the at least two types of isomorphic blocks correspond to different visual media content expression formats;

The splitting unit is further configured to, when it is determined according to the first syntax element that the spliced image is a homogeneous spliced image, split the spliced image according to the spliced image information of the spliced image to obtain one type of isomorphic block, wherein the one type of isomorphic block corresponds to the same visual media content expression format;

A processing unit configured to decode and reconstruct the homogeneous blocks to obtain visual media content in at least one expression format.
An encoding device, which includes:

A processing unit, configured to process visual media content in at least one expression format to obtain at least one isomorphic block, wherein different types of isomorphic blocks correspond to different visual media content expression formats;

A splicing unit, configured to splice the at least one isomorphic block to obtain at least one mosaic diagram and mosaic diagram information, wherein the mosaic diagram information includes a first syntax element, and it is determined according to the first syntax element that the mosaic diagram is a heterogeneous hybrid mosaic diagram or a homogeneous mosaic diagram, the heterogeneous hybrid mosaic diagram includes at least two types of isomorphic blocks, and the homogeneous mosaic diagram includes one type of isomorphic block;

An encoding unit, used to encode the at least one spliced image and the spliced image information to obtain a code stream.
A decoder, wherein the decoder includes:

a first memory and a first processor;

The first memory stores a computer program that can be run on a first processor, and when the first processor executes the program, the decoding method of any one of claims 1 to 16 is implemented.
An encoder, wherein the encoder includes:

a second memory and a second processor;

The second memory stores a computer program that can be run on a second processor, and when the second processor executes the program, the encoding method of any one of claims 17 to 32 is implemented.
A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by the first processor, the decoding method of any one of claims 1 to 16 is implemented; or, when the computer program is executed by the second processor, the encoding method according to any one of claims 17 to 32 is implemented.
A code stream, wherein the code stream is generated based on the method described in any one of claims 17 to 32.