CN114913249A - Encoding method, decoding method and related devices - Google Patents


Info

Publication number
CN114913249A
CN114913249A
Authority
CN
China
Prior art keywords
image, fidelity, map, quantization parameter, value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110170984.8A
Other languages
Chinese (zh)
Inventor
杨海涛
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110170984.8A
Priority to PCT/CN2021/141403 (WO2022166462A1)
Publication of CN114913249A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/002 Image coding using neural networks

Abstract

The application provides an encoding method, a decoding method and related devices, relating to the field of video or image compression based on artificial intelligence, and in particular to the field of video compression based on neural networks. The method comprises the following steps: encoding an original image to obtain a first code stream, and encoding a fidelity map to obtain a second code stream; a decoding end decodes the first code stream to obtain a reconstructed image of the original image, and decodes the second code stream to obtain a reconstructed image of the fidelity map, where the reconstructed image of the fidelity map is used to represent distortion between at least a partial region of the original image and at least a partial region of the reconstructed image. According to the embodiments of the application, distortion strength information of the coded image can thus be obtained at the decoding end.

Description

Encoding method, decoding method and related devices
Technical Field
The embodiments of the application relate to the field of artificial intelligence (AI)-based video or image compression, and in particular to encoding and decoding methods and related devices.
Background
Video coding (video encoding and decoding) is widely used in digital video applications such as broadcast digital television, video transmission over the internet and mobile networks, real-time conversational applications such as video chat and video conferencing, DVD and Blu-ray discs, video content capture and editing systems, and camcorder security applications.
Even a short video must be described by a large amount of data, which can cause difficulties when the data is to be sent or otherwise transmitted over a network with limited bandwidth capacity. Video data is therefore typically compressed before being transmitted in modern telecommunication networks. As memory resources may be limited, the size of the video may also become an issue when storing it on a storage device. Video compression devices typically use software and/or hardware on the source side to encode the video data prior to transmission or storage, thereby reducing the amount of data required to represent digital video images. The compressed data is then received on the destination side by a video decompression device. With limited network resources and ever-increasing demand for higher video quality, there is a need for improved compression and decompression techniques that increase the compression rate with little impact on image quality.
In recent years, applying deep learning to image and video coding has become a trend. However, in existing neural-network-based video or image coding schemes, the distortion strength information of the coded image cannot be obtained at the decoding end; for example, neither the distortion strength of each region of a coded image nor the overall distortion strength of the coded image is available at the decoding end.
Disclosure of Invention
The application provides an encoding method, a decoding method and related devices, which allow the distortion strength information of an encoded image to be obtained at a decoding end.
The above and other objects are achieved by the subject matter of the independent claims. Other implementations are apparent from the dependent claims, the detailed description and the accompanying drawings.
Specific embodiments are outlined in the appended independent claims, and other embodiments are outlined in the dependent claims.
According to a first aspect, the application relates to an encoding method. The method is performed by an encoding device and comprises the following steps: encoding an original image to obtain a first code stream; and encoding a fidelity map to obtain a second code stream, where the fidelity map is used to represent distortion between at least a partial region of the original image and at least a partial region of a reconstructed image, and the reconstructed image is obtained by decoding the first code stream. In the embodiments of the application, an original image is encoded to obtain a first code stream, and a fidelity map is encoded to obtain a second code stream, the fidelity map representing the distortion (including the difference) between at least a partial region of the original image and at least a partial region of the reconstructed image. The decoding end decodes the first code stream to obtain a reconstructed image of the original image, and decodes the second code stream to obtain a reconstructed image of the fidelity map (also referred to as a reconstructed fidelity map for short). If the encoding is lossless, the reconstructed image of the fidelity map is identical to the fidelity map; if the encoding is lossy, the reconstructed image of the fidelity map includes the coding distortion produced by encoding the fidelity map. Since the fidelity map represents distortion between at least a partial region of the original image and at least a partial region of the reconstructed image, the embodiments of the application can obtain the distortion strength information of the coded image at the decoding end.
In one possible design, the method further includes: dividing the original image into a plurality of first image blocks and dividing the reconstructed image into a plurality of second image blocks, where the division strategy used for the original image is the same as that used for the reconstructed image, and the plurality of first image blocks correspond one-to-one to the plurality of second image blocks; or dividing a preset area of the original image into a plurality of first image blocks and dividing the preset area of the reconstructed image into a plurality of second image blocks, where the division strategy used for the preset area of the original image is the same as that used for the preset area of the reconstructed image, and the plurality of first image blocks correspond one-to-one to the plurality of second image blocks; and calculating a fidelity value of any second image block according to that second image block and the first image block corresponding to it, where the fidelity map includes the fidelity value of that second image block, and the fidelity value is used to represent the distortion between that second image block and its corresponding first image block.
The position of any second image block in the reconstructed image is the same as the position of its corresponding first image block in the original image; the position of the preset area of the original image within the original image is the same as the position of the preset area of the reconstructed image within the reconstructed image, and the position of any second image block within the preset area of the reconstructed image is the same as the position of its corresponding first image block within the preset area of the original image. In the embodiments of the application, the original image and the reconstructed image have the same size, and the preset area has the same size and position in both images. The original image is divided into a plurality of first image blocks and the reconstructed image into a plurality of second image blocks according to the same division strategy; or the preset area of the original image is divided into a plurality of first image blocks and the preset area of the reconstructed image into a plurality of second image blocks according to the same division strategy. The resulting first image blocks and second image blocks are in one-to-one correspondence; all first image blocks have the same size, all second image blocks have the same size, and the first and second image blocks are the same size. The first and second image blocks can therefore serve as the basic units of the fidelity calculation: a fidelity value of any second image block can be calculated from that second image block and its corresponding first image block, and the fidelity values of the plurality of second image blocks are the fidelity values of the individual areas of the reconstructed image, so a fidelity map can be obtained from them. When the first image blocks are obtained by dividing the original image and the second image blocks by dividing the reconstructed image, the fidelity map represents the fidelity of the whole reconstructed image; when they are obtained by dividing the respective preset areas, the fidelity map represents the fidelity of the preset area of the reconstructed image. This facilitates obtaining a fidelity map that characterizes the distortion strength information of the encoded image.
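The block division and per-block fidelity computation described above can be sketched as follows. The patent leaves the concrete distortion measure open, so the use of per-block mean squared error (MSE) and a single square block size are illustrative assumptions:

```python
def fidelity_map(original, reconstructed, block):
    """Divide the original and reconstructed images (2-D lists of
    equal size) into block x block tiles with the same partition
    strategy, and compute one fidelity value per tile.
    MSE is used here as one possible distortion metric."""
    h, w = len(original), len(original[0])
    assert h % block == 0 and w % block == 0
    fmap = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            # distortion between the co-located first (original)
            # and second (reconstructed) image blocks
            err = 0
            for y in range(by, by + block):
                for x in range(bx, bx + block):
                    d = original[y][x] - reconstructed[y][x]
                    err += d * d
            row.append(err / (block * block))
        fmap.append(row)
    return fmap
```

Each entry of the returned 2-D array is a first element whose position matches the position of its second image block in the reconstructed image.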
In one possible design, the fidelity map includes a plurality of first elements, the plurality of second image blocks correspond one-to-one to the plurality of first elements, the value of any first element is the fidelity value of its corresponding second image block, and the position of that first element in the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of that second image block in the preset area of the reconstructed image. A first element may also be referred to as a pixel of the fidelity map. When the fidelity map is calculated, the fidelity of each basic unit obtained by dividing the image is calculated, so the fidelity map has one first element per basic unit; any first element has two attributes, namely its fidelity value and its position in the fidelity map.
In the embodiments of the application, the fidelity map is a two-dimensional array. The reconstructed image is divided into a plurality of second image blocks, and the fidelity map is obtained from the fidelity values of those second image blocks; that is, a plurality of first elements is determined from the plurality of second image blocks, the two correspond one-to-one, and the value of any first element is the fidelity value of its corresponding second image block. The position of any first element in the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image; specifically, the position of any first element in the fidelity map is the same as the position of its corresponding second image block in the reconstructed image or in the preset area of the reconstructed image. Each element of the fidelity map thus represents the fidelity of the region at the corresponding position of the reconstructed image or of its preset area, which makes the fidelity map well suited to representing the distortion strength information of the encoded image.
In one possible design, the second image block includes three color components, and the fidelity map is a three-dimensional array with three dimensions: color component, width and height. The two-dimensional array under any color component A in the fidelity map comprises a plurality of first elements, the value of any first element being the fidelity value of color component A of its corresponding second image block. The position of any first element in the two-dimensional array under color component A is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of that second image block in the preset area of the reconstructed image. When the fidelity map is a three-dimensional array of color component, width and height, the height of the two-dimensional array under any color component A gives its number of rows of first elements and the width gives its number of columns, so the number of first elements equals the product of width and height; color component A is any one of the three color components.
In the embodiments of the application, the original image and the reconstructed image include three color components. When the fidelity map is calculated, a two-dimensional fidelity map is calculated under each color component, and the two-dimensional arrays under the three color components form a three-dimensional fidelity map. A first element in the two-dimensional array under any color component A represents the fidelity, under that color component, of the region at the corresponding position of the reconstructed image or of its preset area, so the three-dimensional fidelity map can represent the distortion strength information of all three color components of the encoded image.
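A minimal sketch of building such a three-dimensional fidelity map, one two-dimensional array per color component, is given below. Representing the color planes as nested Python lists and using per-block MSE are assumptions for illustration only:

```python
def block_mse(a, b, by, bx, n):
    """MSE between the co-located n x n blocks of planes a and b
    whose top-left corner is at (by, bx)."""
    err = 0
    for y in range(by, by + n):
        for x in range(bx, bx + n):
            d = a[y][x] - b[y][x]
            err += d * d
    return err / (n * n)

def fidelity_map_3d(original, reconstructed, block):
    """original/reconstructed: [component][y][x], i.e. three color
    planes. Result: [component][block_row][block_col], one 2-D
    fidelity map per color component, stacked into a 3-D array."""
    h, w = len(original[0]), len(original[0][0])
    return [[[block_mse(o, r, by, bx, block)
              for bx in range(0, w, block)]
             for by in range(0, h, block)]
            for o, r in zip(original, reconstructed)]
```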
In one possible design, encoding the fidelity map to obtain the second code stream includes: entropy encoding any first element to obtain the second code stream, the entropy encoding of that first element being independent of the entropy encoding of the other first elements; or determining a probability distribution of the value of any first element, or a predicted value of that first element, according to the value of at least one already encoded first element, and entropy encoding that first element according to the probability distribution or the predicted value to obtain the second code stream; the second code stream comprises the code streams of the plurality of first elements. Specifically, in the entropy encoding of any first element of the fidelity map, if no first element has been encoded yet, that first element is entropy encoded directly to obtain its code stream; if encoded first elements exist, the probability distribution of the value of the current first element, or its predicted value, is determined from the value of at least one encoded first element, and the current first element is entropy encoded accordingly to obtain its code stream. The second code stream comprises the code streams of the plurality of first elements.
In the embodiments of the application, the fidelity map is encoded to obtain the second code stream; that is, each first element of the fidelity map is encoded to obtain its code stream, and the second code stream includes the code streams of all first elements of the fidelity map. In the entropy encoding process, the probability distribution of the value of the currently encoded first element, or its predicted value, may be determined from the values of already encoded first elements, for example from the first elements adjacent to it on the left, above, above-left, and so on; the current first element is then entropy encoded according to that probability distribution or predicted value, which helps to improve entropy encoding efficiency.
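The neighbour-based prediction step can be illustrated as follows. This sketch only derives a predicted value for each first element from its already encoded left and above neighbours and returns the residuals that an entropy coder would then compress; the actual entropy coder (e.g. an arithmetic coder driven by the probability distribution) is omitted, and the simple average predictor is an assumption, not prescribed by the patent:

```python
def predict_and_residual(fmap):
    """Raster-scan over the fidelity map; predict each first element
    from the already encoded left and above neighbours and emit the
    prediction residual. The first element of the scan has no
    encoded neighbours and is emitted as-is (prediction 0)."""
    h, w = len(fmap), len(fmap[0])
    residuals = []
    for y in range(h):
        for x in range(w):
            neighbours = []
            if x > 0:
                neighbours.append(fmap[y][x - 1])   # left
            if y > 0:
                neighbours.append(fmap[y - 1][x])   # above
            pred = sum(neighbours) / len(neighbours) if neighbours else 0
            residuals.append(fmap[y][x] - pred)
    return residuals
```

Well-predicted residuals cluster around zero, which is what lets the entropy coder spend fewer bits on them.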
In one possible design, encoding the fidelity map to obtain the second code stream includes: quantizing any first element to obtain a quantized first element; and encoding the quantized first element to obtain the second code stream, where the second code stream comprises the code streams of the plurality of first elements. The quantization step sizes used for the individual first elements may be the same or different. In the embodiments of the application, in the process of encoding any first element of the fidelity map, that first element may first be quantized and the quantized element then encoded to obtain its code stream. Quantizing a first element means quantizing its value, i.e. scaling a fidelity value in the fidelity map; the purpose of quantization is to narrow the dynamic range of the fidelity values so as to reduce the coding overhead of the fidelity map.
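Quantization of the fidelity values, i.e. scaling them by a step size to narrow their dynamic range before encoding, might look like the sketch below; the uniform step size and the rounding rule are illustrative assumptions (the patent allows per-element step sizes):

```python
def quantize(fmap, step):
    """Scale every fidelity value in the map by a quantization step
    and round to an integer level; rounding makes this lossy, which
    is the trade-off for a smaller dynamic range and lower coding
    overhead."""
    return [[round(v / step) for v in row] for row in fmap]
```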
According to a second aspect, the application relates to a decoding method. The method is performed by a decoding device and comprises the following steps: decoding a first code stream to obtain a reconstructed image of an original image; and decoding a second code stream to obtain a reconstructed image of a fidelity map, where the second code stream is obtained by encoding the fidelity map, and the reconstructed image of the fidelity map is used to represent distortion between at least a partial region of the original image and at least a partial region of the reconstructed image. In the embodiments of the application, the original image is encoded to obtain the first code stream and the fidelity map is encoded to obtain the second code stream, the fidelity map representing the distortion (including the difference) between at least a partial region of the original image and at least a partial region of the reconstructed image. The decoding end decodes the first code stream to obtain the reconstructed image of the original image, and decodes the second code stream to obtain the reconstructed image of the fidelity map. If the encoding is lossless, the reconstructed image of the fidelity map is identical to the fidelity map; if the encoding is lossy, it includes the coding distortion produced by encoding the fidelity map. The reconstructed image of the fidelity map may be used to represent distortion between at least a partial region of the original image and at least a partial region of the reconstructed image; the embodiments of the application can therefore obtain the distortion strength information of the coded image at the decoding end.
In one possible design, the fidelity map includes a fidelity value of any second image block in the plurality of second image blocks, and the fidelity value of any second image block is used to represent distortion between any second image block and an original image block corresponding to any second image block. The plurality of second image blocks are obtained by dividing the reconstructed image, the plurality of second image blocks correspond to a plurality of original image blocks one by one, the original image blocks are image blocks in the original image, and for example, the original image blocks are the first image blocks; the plurality of original image blocks are obtained by dividing the original image, the plurality of second image blocks are obtained by dividing the reconstructed image, and the dividing strategy for dividing the original image is the same as the dividing strategy for dividing the reconstructed image; or the plurality of original image blocks are obtained by dividing the preset area of the original image, the plurality of second image blocks are obtained by dividing the preset area of the reconstructed image, and the dividing strategy for dividing the preset area of the original image is the same as the dividing strategy for dividing the preset area of the reconstructed image.
In one possible design, the fidelity map includes a plurality of first elements, the plurality of second image blocks correspond to the plurality of first elements one-to-one, a value of any first element in the plurality of first elements is a fidelity value of a second image block corresponding to the any first element, a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in the reconstructed image, or a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in a preset area of the reconstructed image.
In one possible design, the second image block includes three color components, and the fidelity map is a three-dimensional array with three dimensions: color component, width and height. The two-dimensional array under any color component A in the fidelity map comprises a plurality of first elements, the value of any first element being the fidelity value of color component A of its corresponding second image block. The position of any first element in the two-dimensional array under color component A is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of that second image block in the preset area of the reconstructed image.
In one possible design, decoding the second code stream to obtain a reconstructed image of the fidelity map includes: decoding the second code stream to obtain a reconstruction fidelity value of any first element; and obtaining the reconstructed image of the fidelity map according to the reconstruction fidelity values of the first elements. The reconstruction fidelity value of a first element is the reconstruction of the value of that first element. The position of the reconstruction fidelity value of any first element in the reconstructed image of the fidelity map is determined according to the position of the corresponding second image block in the reconstructed image. Alternatively, the second code stream carries the position of the first element in the fidelity map, and the position of its reconstruction fidelity value in the reconstructed image of the fidelity map is determined from that position.
In the embodiments of the application, the second code stream includes the code stream of every first element of the fidelity map, so decoding the second code stream yields the reconstruction fidelity value of each first element. It should be understood that if the encoding is lossless, the reconstruction fidelity value of a first element equals the value of that first element; if the encoding is lossy, the reconstruction fidelity value includes the coding distortion produced by encoding the first element, i.e. it is the sum of the value of the first element and the coding distortion. A reconstructed image of the fidelity map can thus be obtained from the reconstruction fidelity values of the first elements, and this reconstructed image can be used to represent the distortion between at least a partial region of the original image and at least a partial region of the reconstructed image; the embodiments of the application can therefore obtain the distortion strength information of the coded image at the decoding end.
In a possible design, the second code stream is obtained by encoding quantized first elements, and decoding the second code stream to obtain a reconstructed image of the fidelity map comprises: decoding the second code stream to obtain the reconstruction fidelity value of a quantized first element; inversely quantizing that reconstruction fidelity value to obtain the reconstruction fidelity value of the first element; and obtaining the reconstructed image of the fidelity map from the reconstruction fidelity values of the first elements. In the embodiments of the application, to reduce coding overhead, the encoding end may quantize a first element before encoding it; the code stream of a first element obtained by the decoding end is then a code stream of the quantized first element. In that case, decoding the second code stream yields the reconstruction fidelity value of the quantized first element, which must be inversely quantized to obtain the reconstruction fidelity value of the first element itself; the reconstructed image of the fidelity map is then obtained from these values. This obtains the distortion strength information of the coded image at the decoding end while reducing coding overhead.
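The decoder-side inverse quantization described above can be sketched as follows, mirroring the encoder-side scaling; the uniform step size is an illustrative assumption:

```python
def dequantize(qmap, step):
    """Inverse quantization of the decoded (quantized) reconstruction
    fidelity values: multiply each quantized level by the step size.
    The result approximates the original fidelity map up to the
    quantization error introduced at the encoder."""
    return [[q * step for q in row] for row in qmap]
```

Note that a value such as 7.9 quantized with step 4 comes back as 8, illustrating the lossy nature of the round trip.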
In one possible design, the method further includes: processing the reconstructed image or the preset area of the reconstructed image according to the reconstructed image of the fidelity map so as to improve the image quality of the reconstructed image or the preset area of the reconstructed image; or determining whether to apply the reconstructed image according to a reconstructed image of the fidelity map. In the embodiment of the application, the decoding end can process the reconstructed image or the preset area of the reconstructed image according to the reconstructed image of the fidelity map so as to improve the image quality of the reconstructed image or the preset area of the reconstructed image; or determining whether to apply a reconstructed image according to the reconstructed image of the fidelity map; thereby facilitating the application of the reconstructed image.
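One possible, hypothetical use of the reconstructed fidelity map at the decoding end is to select the regions whose distortion exceeds a threshold as candidates for a quality-enhancement filter; the threshold-based rule below is an assumption for illustration and is not prescribed by the patent:

```python
def regions_to_enhance(fmap_rec, threshold):
    """Scan the reconstructed fidelity map and return the (row, col)
    block positions whose distortion value exceeds the threshold,
    e.g. as candidates for decoder-side quality enhancement."""
    return [(y, x)
            for y, row in enumerate(fmap_rec)
            for x, v in enumerate(row)
            if v > threshold]
```

The same map could instead drive a binary apply/discard decision for the whole reconstructed image, e.g. by comparing its mean value against a threshold.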
According to a third aspect, the application relates to a decoding method. The method is performed by a decoding device and comprises the following steps: decoding a first code stream to obtain a reconstructed image of an original image and target quantization parameter information, where the target quantization parameter information comprises quantization parameter values of all or some of a plurality of second image blocks of the reconstructed image; and constructing a quantization parameter map of the reconstructed image according to the target quantization parameter information, where the quantization parameter map is used to represent distortion between at least a partial region of the original image and at least a partial region of the reconstructed image. It should be understood that, at the decoding end, the main purpose of the quantization parameter is to perform inverse quantization; at the same time, the quantization parameter itself reflects the signal distortion (fidelity), so a quantization parameter map constructed from the quantization parameters can be used to represent the distortion between at least a partial region of the original image and at least a partial region of the reconstructed image. In the prior art, when the decoding end decodes the first code stream obtained by encoding the original image, the quantization parameter values of the individual regions (second image blocks) of the reconstructed image cannot be obtained.
In the embodiment of the application, a decoding end performs decoding operation on a first code stream obtained by encoding an original image to obtain quantization parameter values of each region (second image block) of a reconstructed image; the quantization parameter map of the reconstructed image can be constructed according to the quantization parameter values of all or part of the second image blocks in the plurality of second image blocks of the reconstructed image, and the quantization parameter map of the reconstructed image can be used for representing distortion between at least a partial area of the original image and at least a partial area of the reconstructed image.
In one possible design, the second image block is a coding unit. In the embodiment of the application, a coding end divides an original image into a plurality of coding units, and codes the plurality of coding units obtained by dividing the original image to obtain a first code stream; the decoding end decodes the first code stream to obtain a reconstructed image of the original image and target quantization parameter information, wherein the target quantization parameter information comprises quantization parameter values of all or part of the plurality of coding units; a quantization parameter map of the reconstructed image can be constructed according to the target quantization parameter information; the quantization parameter map of the reconstructed image is a fidelity map in a form, and when the target quantization parameter information includes quantization parameter values of all the coding units in the plurality of coding units, the quantization parameter map of the reconstructed image is a fidelity map of the entire reconstructed image; when the target quantization parameter information includes quantization parameter values of a part of the encoding units among the plurality of encoding units, the quantization parameter map of the reconstructed image is a fidelity map of a preset region of the reconstructed image; therefore, the quantization parameter map of the reconstructed image can be used for representing the fidelity of the reconstructed image or representing the fidelity of a preset area of the reconstructed image; therefore, the embodiment of the application can obtain the distortion intensity information of the coded image at the decoding end.
In a possible design, the quantization parameter map of the reconstructed image includes a plurality of second elements, the plurality of second image blocks are in one-to-one correspondence with the plurality of second elements, a value of any second element in the plurality of second elements is a quantization parameter value of a second image block corresponding to the any second element, a position of the any second element in the quantization parameter map of the reconstructed image is determined according to a position of the second image block corresponding to the any second element in the reconstructed image, or a position of the any second element in the quantization parameter map of the reconstructed image is determined according to a position of the second image block corresponding to the any second element in a preset area of the reconstructed image. The second element may also be referred to as a pixel point of the quantization parameter map.
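For illustration only (this sketch is not part of the claimed method), the layout described above, one second element per second image block positioned according to the block's position in the reconstructed image, can be expressed as follows; the uniform block grid, the dictionary input and the use of numpy are assumptions:

```python
import numpy as np

def build_qp_map(qp_values, image_h, image_w, block_size):
    """Build a 2-D quantization parameter map with one element per block.

    qp_values maps (block_row, block_col) -> quantization parameter value.
    The position of each element in the map mirrors the position of the
    corresponding second image block in the reconstructed image.
    """
    rows = (image_h + block_size - 1) // block_size
    cols = (image_w + block_size - 1) // block_size
    qp_map = np.zeros((rows, cols), dtype=np.int32)
    for (r, c), qp in qp_values.items():
        qp_map[r, c] = qp
    return qp_map

# Example: a 128x128 image with 64x64 blocks gives a 2x2 map.
qp = build_qp_map({(0, 0): 22, (0, 1): 27, (1, 0): 32, (1, 1): 37},
                  128, 128, 64)
```

Each element of `qp` is then the "pixel point" of the quantization parameter map for the block at the same grid position.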
In one possible design, the second image block includes three color components, and the quantization parameter map of the reconstructed image is a three-dimensional array with three dimensions, namely color component, width and height. The two-dimensional array under any color component A in the quantization parameter map of the reconstructed image includes a plurality of second elements, and a value of any second element in the plurality of second elements is the quantization parameter value of the color component A of the second image block corresponding to the any second element. The position of the any second element in the two-dimensional array under the color component A is determined according to the position of the second image block corresponding to the any second element in the reconstructed image, or according to the position of the second image block corresponding to the any second element in a preset area of the reconstructed image.
In one possible design, the constructing the quantization parameter map of the reconstructed image according to the target quantization parameter information includes: when the target quantization parameter information includes quantization parameter values of a part of the plurality of coding units, obtaining a quantization parameter value of a target coding unit according to the quantization parameter values of the part of the coding units and/or a reference quantization parameter map, wherein the reference quantization parameter map is a quantization parameter map of a reference image of the reconstructed image, and the target coding unit is a coding unit of the plurality of coding units except the part of the coding units; and obtaining a quantization parameter map of the reconstructed image according to the quantization parameter values of the partial coding units and the quantization parameter values of the target coding unit. Specifically, when the target quantization parameter information includes quantization parameter values of all coding units in the plurality of coding units, a quantization parameter map of the reconstructed image is obtained according to the quantization parameter values of all coding units; when the target quantization parameter information includes quantization parameter values of a part of the plurality of coding units, obtaining a quantization parameter map of the reconstructed image according to the quantization parameter values of the part of the coding units, or obtaining the quantization parameter map of the reconstructed image according to the quantization parameter values of the part of the coding units and a reference quantization parameter map, where the reference quantization parameter map is a quantization parameter map of a reference image of the reconstructed image. 
In the embodiment of the present application, when the target quantization parameter information includes quantization parameter values of all coding units in the multiple coding units, a quantization parameter map of the reconstructed image may be obtained according to the quantization parameter values of all coding units, and the obtained quantization parameter map of the reconstructed image in this case may be used to characterize the fidelity of the entire reconstructed image; when the target quantization parameter information includes quantization parameter values of some coding units in the multiple coding units, a quantization parameter map of the reconstructed image may be obtained according to the quantization parameter values of some coding units, and the obtained quantization parameter map of the reconstructed image in this case may be used to represent fidelity of a preset region of the reconstructed image; when the target quantization parameter information includes quantization parameter values of a part of the plurality of coding units, a quantization parameter map of the reconstructed image can be obtained according to the quantization parameter values of the part of the coding units and the reference quantization parameter map, and since the reference quantization parameter map is a quantization parameter map of a reference image of the reconstructed image, a quantization parameter value of any one of the plurality of coding units except the part of the coding units can be obtained according to the reference quantization parameter map, so that a quantization parameter of any one of the plurality of coding units can be obtained, and the obtained quantization parameter map of the reconstructed image in this case can be used for representing the fidelity of the whole reconstructed image or representing the fidelity of a preset region of the reconstructed image; therefore, in any case where the target quantization parameter information obtained by 
decoding includes quantization parameter values of all or some of the plurality of coding units, the embodiment of the present application can obtain a quantization parameter map of the reconstructed image that represents the fidelity of the reconstructed image or represents the fidelity of a preset region of the reconstructed image.
In one possible design, the obtaining a quantization parameter value of a target coding unit according to the quantization parameter values of the partial coding units includes: determining the quantization parameter value of the target coding unit according to a quantization parameter value of at least one coding unit of the partial coding units. In this embodiment of the application, when the quantization parameter value of a certain coding unit cannot be obtained from the first code stream, the quantization parameters of the spatial neighborhood of the coding unit may be used for filling. Specifically, when the target quantization parameter information includes the quantization parameter values of a part of the coding units, the quantization parameter value of any coding unit other than the part of the coding units can be determined according to the quantization parameter value of at least one of the part of the coding units, so that a quantization parameter value is available for every coding unit; moreover, a quantization parameter map of the reconstructed image can be obtained according to the quantization parameter values of part or all of the plurality of coding units; when the quantization parameter map of the reconstructed image is obtained according to the quantization parameter values of all the coding units in the plurality of coding units, the obtained quantization parameter map can be used for representing the fidelity of the whole reconstructed image; when the quantization parameter map of the reconstructed image is obtained according to the quantization parameter values of some of the plurality of coding units, it can be used for representing the fidelity of a preset area of the reconstructed image.
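As a hedged illustration of spatial-neighborhood filling (the patent does not fix which neighbouring coding units are used or how they are combined; averaging the already-filled left and above neighbours, with a fallback default, is just one choice, and numpy is an assumption):

```python
import numpy as np

def fill_missing_qps(qp_map, known_mask, default_qp=32):
    """Fill blocks whose QP was not signalled from spatially neighbouring
    blocks whose QP is already available, scanning in raster order."""
    filled = qp_map.astype(float).copy()
    rows, cols = qp_map.shape
    for r in range(rows):
        for c in range(cols):
            if known_mask[r, c]:
                continue  # QP decoded from the first code stream
            neighbours = []
            if c > 0:
                neighbours.append(filled[r, c - 1])  # left neighbour
            if r > 0:
                neighbours.append(filled[r - 1, c])  # above neighbour
            filled[r, c] = np.mean(neighbours) if neighbours else default_qp
    return filled

# Only the top-left coding unit's QP was signalled.
qp = np.array([[22, 0], [0, 0]])
known = np.array([[True, False], [False, False]])
filled = fill_missing_qps(qp, known)
```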
In one possible design, the reference quantization parameter map includes a plurality of reference elements, a value of any one of the plurality of reference elements being a quantization parameter value of a coding unit in the reference image; the obtaining a quantization parameter value of a target coding unit according to a reference quantization parameter map includes: taking the value of a target element as the quantization parameter value of any target coding unit in the target coding units, wherein the target element is a reference element in the reference quantization parameter map, and the position of the target element in the reference quantization parameter map is determined according to the position of the any target coding unit in the reconstructed image, or according to the position of the any target coding unit in the reconstructed image and the motion vector of the any target coding unit. The reference element is another name for the second element. In this embodiment of the application, when the quantization parameter value of a certain coding unit cannot be obtained from the first code stream, the quantization parameter of the temporal neighborhood of the coding unit may be used for filling. Specifically, for any coding unit whose quantization parameter value cannot be obtained from the first code stream, the value of a target element in the reference quantization parameter map may be used as the quantization parameter value of the coding unit; the position of the target element in the reference quantization parameter map is determined according to the position of the coding unit in the reconstructed image, or according to the position of the coding unit in the reconstructed image and the motion vector of the coding unit.
Thus, the quantization parameter value of any one coding unit in a plurality of coding units can be obtained; moreover, a quantization parameter map of the reconstructed image can be obtained according to quantization parameter values of part or all of the plurality of coding units; when the quantization parameter map of the reconstructed image is obtained according to the quantization parameter values of all the coding units in the plurality of coding units, the obtained quantization parameter map of the reconstructed image can also be used for representing the fidelity of the whole reconstructed image; when the quantization parameter map of the reconstructed image is obtained according to the quantization parameter values of some of the plurality of coding units, the obtained quantization parameter map of the reconstructed image may also be used to characterize the fidelity of the preset region of the reconstructed image.
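A minimal sketch of such temporal filling under stated assumptions (block-aligned positions, a motion vector given in whole block units, and clamping at the map border; none of these details is specified by the patent):

```python
import numpy as np

def temporal_qp_fill(ref_qp_map, block_pos, motion_vector=(0, 0)):
    """Take the QP of a missing coding unit from the reference QP map.

    block_pos is the (row, col) of the coding unit in the reconstructed
    image's block grid; motion_vector is an optional (dr, dc) offset in
    block units. The target element is located either by position alone
    or by position plus motion vector, as in the design above.
    """
    r = block_pos[0] + motion_vector[0]
    c = block_pos[1] + motion_vector[1]
    rows, cols = ref_qp_map.shape
    r = min(max(r, 0), rows - 1)  # clamp to the map bounds (assumption)
    c = min(max(c, 0), cols - 1)
    return ref_qp_map[r, c]

ref = np.array([[22, 27],
                [32, 37]])
```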
In one possible design, the method further includes: storing the reconstructed image and the quantization parameter map of the reconstructed image in association, so that the reconstructed image can be used as a reference image and the quantization parameter map of the reconstructed image can be used as a reference quantization parameter map. In the embodiment of the present application, the quantization parameter map of the reconstructed image may be stored for use as a reference quantization parameter map when constructing the quantization parameter map of a subsequently decoded image, thereby facilitating construction of the quantization parameter map of the subsequently decoded image.
In one possible design, the method further includes: processing the reconstructed image or the preset area of the reconstructed image according to the quantization parameter map of the reconstructed image so as to improve the image quality of the reconstructed image or of the preset area of the reconstructed image; or determining whether to apply the reconstructed image according to the quantization parameter map of the reconstructed image. In the embodiment of the application, the decoding end can process the reconstructed image or the preset area of the reconstructed image according to the quantization parameter map of the reconstructed image so as to improve its image quality, or can determine whether to apply the reconstructed image according to the quantization parameter map of the reconstructed image, thereby facilitating the application of the reconstructed image.
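One hypothetical decision rule for "determining whether to apply the reconstructed image" (the max-over-blocks criterion and the threshold value are assumptions for illustration, not taken from the patent):

```python
import numpy as np

def should_apply(qp_map, qp_threshold=40):
    """Only pass the reconstructed image to a downstream application if no
    region was quantized more coarsely than qp_threshold, i.e. the
    distortion indicated by the QP map stays bounded everywhere."""
    return bool(np.max(qp_map) <= qp_threshold)
```

For example, a machine-vision task could skip frames whose QP map reveals heavily quantized regions.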
According to a fourth aspect, the present application relates to an encoding apparatus; for beneficial effects, refer to the description of the first aspect, which is not repeated here. The encoding device includes: a video encoder for encoding an original image to obtain a first code stream; and a fidelity map encoder for encoding a fidelity map to obtain a second code stream, wherein the fidelity map is used for representing distortion between at least a partial area of the original image and at least a partial area of a reconstructed image, and the reconstructed image is obtained after decoding the first code stream.
In one possible design, the encoding apparatus further includes a fidelity map calculator to: dividing the original image into a plurality of first image blocks and dividing the reconstructed image into a plurality of second image blocks, wherein the dividing strategy for dividing the original image is the same as the dividing strategy for dividing the reconstructed image, and the plurality of first image blocks and the plurality of second image blocks are in one-to-one correspondence; or dividing the preset area of the original image into a plurality of first image blocks and dividing the preset area of the reconstructed image into a plurality of second image blocks, wherein the dividing strategy for dividing the preset area of the original image is the same as the dividing strategy for dividing the preset area of the reconstructed image, and the plurality of first image blocks and the plurality of second image blocks are in one-to-one correspondence; and calculating a fidelity value of any second image block according to any second image block in the plurality of second image blocks and the first image block corresponding to the any second image block, wherein the fidelity map comprises the fidelity value of the any second image block, and the fidelity value of the any second image block is used for representing distortion between the any second image block and the first image block corresponding to the any second image block.
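As an illustrative sketch of the fidelity map calculator (the patent does not fix the fidelity metric; per-block mean squared error is used here purely as an example, and a single-component image with a uniform block size that divides the image dimensions is assumed):

```python
import numpy as np

def fidelity_map(original, reconstructed, block_size):
    """Per-block fidelity map: one fidelity value per second image block,
    here the MSE between the second image block (reconstructed) and its
    corresponding first image block (original)."""
    # Same division strategy for both images, hence identical shapes.
    assert original.shape == reconstructed.shape
    h, w = original.shape
    rows, cols = h // block_size, w // block_size
    fmap = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            ys, xs = r * block_size, c * block_size
            o = original[ys:ys + block_size, xs:xs + block_size].astype(float)
            q = reconstructed[ys:ys + block_size, xs:xs + block_size].astype(float)
            fmap[r, c] = np.mean((o - q) ** 2)
    return fmap

orig_img = np.zeros((4, 4))
recon_img = np.zeros((4, 4))
recon_img[0, 0] = 2          # distortion only in the top-left 2x2 block
fmap = fidelity_map(orig_img, recon_img, 2)
```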
In one possible design, the fidelity map includes a plurality of first elements, the plurality of second image blocks correspond to the plurality of first elements one-to-one, a value of any first element in the plurality of first elements is a fidelity value of a second image block corresponding to the any first element, a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in the reconstructed image, or a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in a preset area of the reconstructed image.
In one possible design, the second image block includes three color components, and the fidelity map is a three-dimensional array with three dimensions, namely color component, width and height. The two-dimensional array under any color component A in the fidelity map comprises a plurality of first elements, and the value of any first element in the plurality of first elements is the fidelity value of the color component A of the second image block corresponding to the any first element. The position of the any first element in the two-dimensional array under the color component A in the fidelity map is determined according to the position of the second image block corresponding to the any first element in the reconstructed image, or according to the position of the second image block corresponding to the any first element in a preset area of the reconstructed image.
In one possible design, the fidelity map encoder is specifically configured to: entropy coding any one first element to obtain the second code stream, wherein the entropy coding of any one first element is independent of the entropy coding of other first elements; or determining a probability distribution of a value of any first element or a predicted value of any first element according to a value of at least one first element in the encoded first elements, and performing entropy encoding on any first element according to the probability distribution of the value of any first element or the predicted value of any first element to obtain the second code stream; wherein the second code stream comprises a code stream of the plurality of first elements.
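A toy sketch of the second option above, predicting each first element from an already-encoded one and passing only the residual to the entropy coder (the raster-order previous-element predictor and the zero initial predictor are assumptions, and the actual entropy coding stage is omitted):

```python
def predict_and_residual(elements):
    """Predict each first element from the previously encoded element and
    return the residuals that would be entropy encoded."""
    residuals = []
    prev = 0  # assumed initial predictor
    for v in elements:
        residuals.append(v - prev)
        prev = v
    return residuals

def reconstruct_from_residuals(residuals):
    """Decoder-side inverse: accumulate residuals back into values."""
    values, prev = [], 0
    for r in residuals:
        prev += r
        values.append(prev)
    return values

vals = [22, 27, 27, 32]
res = predict_and_residual(vals)
```

Because neighbouring blocks tend to have similar fidelity values, the residuals concentrate near zero, which is what makes predictive entropy coding attractive here.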
In one possible design, the fidelity map encoder is specifically configured to: quantizing any first element to obtain a quantized first element; encoding the quantized first element to obtain the second code stream; wherein the second code stream comprises a code stream of the plurality of first elements.
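A minimal sketch of this quantize-then-encode path and its decoder-side inverse (uniform scalar quantization with an assumed step size; the patent does not specify the quantizer):

```python
def quantize_fidelity(value, step=2.0):
    """Uniform scalar quantization of a first element before encoding;
    the step size is an assumption for illustration."""
    return round(value / step)

def dequantize_fidelity(level, step=2.0):
    """Inverse quantization at the decoder; the reconstruction fidelity
    value differs from the original by at most step/2."""
    return level * step

level = quantize_fidelity(7.9)   # quantized first element
recon = dequantize_fidelity(level)
```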
According to a fifth aspect, the present application relates to a decoding device; for beneficial effects, refer to the description of the second aspect, which is not repeated here. The decoding apparatus includes: a video decoder for decoding the first code stream to obtain a reconstructed image of the original image; and a fidelity map decoder for decoding a second code stream to obtain a reconstructed image of a fidelity map, wherein the second code stream is obtained by encoding the fidelity map, and the reconstructed image of the fidelity map is used for representing distortion between at least a partial area of the original image and at least a partial area of the reconstructed image.
In one possible design, the fidelity map includes a fidelity value of any second image block in the plurality of second image blocks, and the fidelity value of any second image block is used to represent distortion between any second image block and an original image block corresponding to any second image block.
In one possible design, the fidelity map includes a plurality of first elements, the plurality of second image blocks correspond to the plurality of first elements one-to-one, a value of any first element in the plurality of first elements is a fidelity value of a second image block corresponding to the any first element, a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in the reconstructed image, or a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in a preset area of the reconstructed image.
In one possible design, the second image block includes three color components, and the fidelity map is a three-dimensional array with three dimensions, namely color component, width and height. The two-dimensional array under any color component A in the fidelity map comprises a plurality of first elements, and the value of any first element in the plurality of first elements is the fidelity value of the color component A of the second image block corresponding to the any first element. The position of the any first element in the two-dimensional array under the color component A in the fidelity map is determined according to the position of the second image block corresponding to the any first element in the reconstructed image, or according to the position of the second image block corresponding to the any first element in a preset area of the reconstructed image.
In one possible design, the fidelity map decoder is specifically configured to: decode the second code stream to obtain a reconstruction fidelity value of any first element; and obtain a reconstructed image of the fidelity map according to the reconstruction fidelity value of the any first element.
In a possible design, the second code stream is obtained by encoding the quantized first element; the fidelity map decoder is specifically configured to: decode the second code stream to obtain a reconstruction fidelity value of the quantized first element; perform inverse quantization on the reconstruction fidelity value of the quantized first element to obtain a reconstruction fidelity value of any first element; and obtain a reconstructed image of the fidelity map according to the reconstruction fidelity value of the any first element.
According to a sixth aspect, the present application relates to a decoding device, and beneficial effects may be referred to the description of the third aspect, which is not described herein again. The decoding apparatus includes: the video decoder is used for decoding the first code stream to obtain a reconstructed image of an original image and target quantization parameter information, wherein the target quantization parameter information comprises quantization parameter values of all or part of second image blocks in a plurality of second image blocks of the reconstructed image; and a quantization parameter map builder for building a quantization parameter map of the reconstructed image according to the target quantization parameter information, wherein the quantization parameter map of the reconstructed image is used for representing distortion between at least a partial region of the original image and at least a partial region of the reconstructed image.
In one possible design, the second image block is a coding unit.
In one possible design, the quantization parameter map of the reconstructed image includes a plurality of second elements, the plurality of second image blocks are in one-to-one correspondence with the plurality of second elements, a value of any one of the plurality of second elements is a quantization parameter value of a second image block corresponding to the any one of the plurality of second elements, a position of the any one of the second elements in the quantization parameter map of the reconstructed image is determined according to a position of a second image block corresponding to the any one of the second elements in the reconstructed image, or a position of the any one of the second elements in the quantization parameter map of the reconstructed image is determined according to a position of a second image block corresponding to the any one of the second elements in a preset area of the reconstructed image.
In one possible design, the second image block includes three color components, and the quantization parameter map of the reconstructed image is a three-dimensional array with three dimensions, namely color component, width and height. The two-dimensional array under any color component A in the quantization parameter map of the reconstructed image includes a plurality of second elements, and a value of any second element in the plurality of second elements is the quantization parameter value of the color component A of the second image block corresponding to the any second element. The position of the any second element in the two-dimensional array under the color component A is determined according to the position of the second image block corresponding to the any second element in the reconstructed image, or according to the position of the second image block corresponding to the any second element in a preset area of the reconstructed image.
In one possible design, the quantization parameter map builder is specifically configured to: when the target quantization parameter information comprises quantization parameter values of a part of the plurality of coding units, obtain a quantization parameter value of a target coding unit according to the quantization parameter values of the part of the coding units and/or a reference quantization parameter map, wherein the reference quantization parameter map is a quantization parameter map of a reference image of the reconstructed image, and the target coding unit is a coding unit of the plurality of coding units other than the part of the coding units; and obtain a quantization parameter map of the reconstructed image according to the quantization parameter values of the part of the coding units and the quantization parameter value of the target coding unit.
In one possible design, the reference quantization parameter map includes a plurality of reference elements, and a value of any one of the plurality of reference elements is a quantization parameter value of a coding unit in the reference image; the quantization parameter map builder is specifically configured to: and taking the value of a target element as the quantization parameter value of any target coding unit in the target coding units, wherein the target element is a reference element in the reference quantization parameter map, and the position of the target element in the reference quantization parameter map is determined according to the position of the any target coding unit in the reconstructed image, or the position of the target element in the reference quantization parameter map is determined according to the position of the any target coding unit in the reconstructed image and the motion vector of the any target coding unit.
According to a seventh aspect, the present application relates to an encoding apparatus; for beneficial effects, refer to the description of the first aspect, which is not repeated here. The encoding apparatus has a function of realizing the behavior in the method example of the first aspect described above. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions. In one possible design, the encoding apparatus includes a processing unit configured to: encode an original image to obtain a first code stream; and encode a fidelity map to obtain a second code stream, wherein the fidelity map is used for representing distortion between at least a partial area of the original image and at least a partial area of a reconstructed image, and the reconstructed image is obtained by decoding the first code stream.
In one possible design, the processing unit is further configured to: dividing the original image into a plurality of first image blocks and dividing the reconstructed image into a plurality of second image blocks, wherein the dividing strategy for dividing the original image is the same as the dividing strategy for dividing the reconstructed image, and the plurality of first image blocks and the plurality of second image blocks are in one-to-one correspondence; or dividing the preset area of the original image into a plurality of first image blocks and dividing the preset area of the reconstructed image into a plurality of second image blocks, wherein the dividing strategy for dividing the preset area of the original image is the same as the dividing strategy for dividing the preset area of the reconstructed image, and the plurality of first image blocks and the plurality of second image blocks are in one-to-one correspondence; and calculating a fidelity value of any second image block according to any second image block in the plurality of second image blocks and the first image block corresponding to the any second image block, wherein the fidelity map comprises the fidelity value of the any second image block, and the fidelity value of the any second image block is used for representing distortion between the any second image block and the first image block corresponding to the any second image block.
In one possible design, the fidelity map includes a plurality of first elements, the plurality of second image blocks correspond to the plurality of first elements one-to-one, a value of any first element in the plurality of first elements is a fidelity value of a second image block corresponding to the any first element, a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in the reconstructed image, or a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in a preset area of the reconstructed image.
In one possible design, the second image block includes three color components, and the fidelity map is a three-dimensional array with three dimensions, namely color component, width and height. The two-dimensional array under any color component A in the fidelity map comprises a plurality of first elements, and the value of any first element in the plurality of first elements is the fidelity value of the color component A of the second image block corresponding to the any first element. The position of the any first element in the two-dimensional array under the color component A in the fidelity map is determined according to the position of the second image block corresponding to the any first element in the reconstructed image, or according to the position of the second image block corresponding to the any first element in a preset area of the reconstructed image.
In one possible design, the processing unit is specifically configured to: entropy encoding said any first element to obtain said second stream of codes, said entropy encoding of said any first element being independent of entropy encoding of other first elements; or determining a probability distribution of a value of any first element or a predicted value of any first element according to a value of at least one first element in the encoded first elements, and performing entropy encoding on any first element according to the probability distribution of the value of any first element or the predicted value of any first element to obtain the second code stream; wherein the second code stream comprises a code stream of the plurality of first elements.
In one possible design, the processing unit is specifically configured to: quantizing any first element to obtain a quantized first element; encoding the quantized first element to obtain the second code stream; wherein the second code stream comprises a code stream of the plurality of first elements.
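The quantise-then-encode variant above can be sketched as follows: each first element (a fidelity value) is mapped to an integer level before being written to the second code stream. The uniform step size is an illustrative assumption; the application does not fix a particular quantiser.

```python
def quantize_elements(elements, step):
    """Uniform scalar quantisation of fidelity values to integer levels."""
    return [round(v / step) for v in elements]
```

The integer levels are then encoded (for example, entropy-encoded as in the preceding design) to form the second code stream.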
According to an eighth aspect, the present application relates to a decoding apparatus, and the beneficial effects can be seen from the description of the second aspect, which is not described herein again. The decoding apparatus has a function of realizing the behavior in the method example of the second aspect described above. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions. In one possible design, the decoding apparatus includes a processing unit configured to: decode the first code stream to obtain a reconstructed image of the original image; and decode a second code stream to obtain a reconstructed image of a fidelity map, wherein the second code stream is obtained by encoding the fidelity map, and the reconstructed image of the fidelity map is used to represent distortion between at least a partial area of the original image and at least a partial area of the reconstructed image.
In one possible design, the fidelity map includes a fidelity value of any second image block in the plurality of second image blocks, and the fidelity value of any second image block is used to represent distortion between any second image block and an original image block corresponding to any second image block.
In one possible design, the fidelity map includes a plurality of first elements, the plurality of second image blocks correspond to the plurality of first elements one-to-one, a value of any first element in the plurality of first elements is a fidelity value of a second image block corresponding to the any first element, a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in the reconstructed image, or a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in a preset area of the reconstructed image.
In one possible design, the second image block includes three color components, and the fidelity map is a three-dimensional array with three dimensions: color component, width, and height. The two-dimensional array under any color component A in the fidelity map comprises a plurality of first elements, and the value of any first element in the plurality of first elements is the fidelity value of the color component A of the second image block corresponding to the any first element. The position of the any first element in the two-dimensional array under the color component A in the fidelity map is determined according to the position of the second image block corresponding to the any first element in the reconstructed image, or according to the position of the second image block corresponding to the any first element in a preset area of the reconstructed image.
In one possible design, the processing unit is specifically configured to: decoding the second code stream to obtain a reconstruction fidelity value of any first element; and obtaining a reconstructed image of the fidelity map according to the reconstructed fidelity value of any first element.
In a possible design, the second code stream is obtained by encoding the quantized first element; the processing unit is specifically configured to: decoding the second code stream to obtain a reconstructed fidelity value of the quantized first element; performing inverse quantization on the quantized reconstruction fidelity value of the first element to obtain a reconstruction fidelity value of any one first element; and obtaining a reconstructed image of the fidelity map according to the reconstructed fidelity value of any first element.
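The decoder-side steps of this design can be sketched as follows: the decoded integer levels are inverse-quantised back into fidelity values, which are then arranged into the reconstructed image of the fidelity map. The step size and the row-major reshape are illustrative assumptions that would mirror whatever the encoder used.

```python
def dequantize_elements(levels, step):
    """Inverse quantisation of decoded levels back to reconstruction fidelity values."""
    return [l * step for l in levels]

def reshape_to_map(values, width):
    """Row-major reshape of the flat element list into a 2-D fidelity map."""
    return [values[i:i + width] for i in range(0, len(values), width)]
```

Because of quantisation, the reconstruction fidelity values are in general approximations of the values the encoder computed, which is why the text distinguishes the "reconstructed image of the fidelity map" from the fidelity map itself.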
According to a ninth aspect, the present application relates to a decoding apparatus, and the beneficial effects can be seen from the description of the third aspect, which is not described herein again. The decoding apparatus has a function of realizing the behavior in the method example of the third aspect described above. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions. In one possible design, the decoding apparatus includes a processing unit configured to: decode the first code stream to obtain a reconstructed image of an original image and target quantization parameter information, wherein the target quantization parameter information comprises quantization parameter values of all or part of second image blocks in a plurality of second image blocks of the reconstructed image; and construct a quantization parameter map of the reconstructed image according to the target quantization parameter information, wherein the quantization parameter map of the reconstructed image is used to represent distortion between at least a partial area of the original image and at least a partial area of the reconstructed image.
In one possible design, the second image block is a coding unit.
In one possible design, the quantization parameter map of the reconstructed image includes a plurality of second elements, the plurality of second image blocks are in one-to-one correspondence with the plurality of second elements, a value of any one of the plurality of second elements is a quantization parameter value of a second image block corresponding to the any one of the plurality of second elements, a position of the any one of the second elements in the quantization parameter map of the reconstructed image is determined according to a position of a second image block corresponding to the any one of the second elements in the reconstructed image, or a position of the any one of the second elements in the quantization parameter map of the reconstructed image is determined according to a position of a second image block corresponding to the any one of the second elements in a preset area of the reconstructed image.
In one possible design, the second image block includes three color components, and the quantization parameter map of the reconstructed image is a three-dimensional array with three dimensions: color component, width, and height. The two-dimensional array under any color component A in the quantization parameter map of the reconstructed image includes a plurality of second elements, and the value of any second element in the plurality of second elements is the quantization parameter value of the color component A of the second image block corresponding to the any second element. The position of the any second element in the two-dimensional array under the color component A in the quantization parameter map of the reconstructed image is determined according to the position of the second image block corresponding to the any second element in the reconstructed image, or according to the position of the second image block corresponding to the any second element in a preset region of the reconstructed image.
In one possible design, the processing unit is specifically configured to: when the target quantization parameter information includes quantization parameter values of a part of the plurality of coding units, obtaining a quantization parameter value of a target coding unit according to the quantization parameter values of the part of the coding units and/or a reference quantization parameter map, wherein the reference quantization parameter map is a quantization parameter map of a reference image of the reconstructed image, and the target coding unit is a coding unit of the plurality of coding units except the part of the coding units; and obtaining a quantization parameter map of the reconstructed image according to the quantization parameter values of the partial coding units and the quantization parameter values of the target coding unit.
In one possible design, the reference quantization parameter map includes a plurality of reference elements, and a value of any one of the plurality of reference elements is a quantization parameter value of a coding unit in the reference image; the processing unit is specifically configured to: and taking the value of a target element as a quantization parameter value of any target coding unit in the target coding units, wherein the target element is a reference element in the reference quantization parameter map, and the position of the target element in the reference quantization parameter map is determined according to the position of the any target coding unit in the reconstructed image, or the position of the target element in the reference quantization parameter map is determined according to the position of the any target coding unit in the reconstructed image and the motion vector of the any target coding unit.
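The two designs above can be sketched together: quantization parameter values that were not signalled for some coding units are filled in from the reference image's quantization parameter map, using the co-located reference element or, when a motion vector is available, the motion-shifted one. The grid granularity, the clamping at the map border, and all names are illustrative assumptions.

```python
def build_qp_map(signalled, ref_qp_map, motion_vectors=None):
    """signalled: 2-D grid of QP values, with None for units whose QP was not signalled.

    ref_qp_map: QP map of the reference image, same grid layout.
    motion_vectors: optional {(y, x): (dy, dx)} per unsignalled unit (in grid units).
    """
    h, w = len(signalled), len(signalled[0])
    qp_map = [row[:] for row in signalled]
    for y in range(h):
        for x in range(w):
            if qp_map[y][x] is None:
                dy, dx = (motion_vectors or {}).get((y, x), (0, 0))
                ry = min(max(y + dy, 0), h - 1)   # clamp to the reference grid
                rx = min(max(x + dx, 0), w - 1)
                qp_map[y][x] = ref_qp_map[ry][rx]  # copy the (shifted) reference element
    return qp_map
```

The completed map then serves as a block-granular indication of distortion strength over the reconstructed image, as stated in the ninth aspect.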
The method of the first aspect of the present application may be performed by an apparatus of the seventh aspect of the present application. Further features and implementations of the method of the first aspect of the present application are directly dependent on the functionality and implementations of the apparatus of the seventh aspect of the present application.
The method of the second aspect of the present application may be performed by an apparatus of the eighth aspect of the present application. Further features and implementations of the method according to the second aspect of the present application are directly dependent on the functionality and implementation of the apparatus according to the eighth aspect of the present application.
The method of the third aspect of the present application may be performed by an apparatus of the ninth aspect of the present application. Further features and implementations of the method according to the third aspect of the present application are directly dependent on the functionality and implementation of the apparatus according to the ninth aspect of the present application.
According to a tenth aspect, the application relates to an apparatus for encoding a video stream, comprising a processor and a memory. The memory stores instructions that cause the processor to perform the method of the first aspect.
According to an eleventh aspect, the application relates to an apparatus for decoding a video stream, comprising a processor and a memory. The memory stores instructions that cause the processor to perform the method of the second aspect.
According to a twelfth aspect, the application relates to an apparatus for decoding a video stream, comprising a processor and a memory. The memory stores instructions that cause the processor to perform the method of the third aspect.
According to a thirteenth aspect, the present application provides a computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors to encode video data. The instructions cause the one or more processors to perform the method of the first, second or third aspect or any one of the possible embodiments of the first, second or third aspect.
According to a fourteenth aspect, the present application relates to a computer program product comprising program code which, when run, performs the method of the first, second or third aspect or any one of the possible embodiments of the first, second or third aspect.
According to a fifteenth aspect, the present application relates to an encoder (20) comprising processing circuitry for performing the method of the first aspect or any one of the possible embodiments of the first aspect.
According to a sixteenth aspect, the application relates to a decoder (30) comprising processing circuitry for performing the method of the second or third aspect or any one of the possible embodiments of the second or third aspect.
According to a seventeenth aspect, the present application relates to an encoder comprising: one or more processors; a non-transitory computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, causes the encoder to perform the method of the first aspect or any one of the possible embodiments of the first aspect.
According to an eighteenth aspect, the application relates to a decoder comprising: one or more processors; a non-transitory computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, causes the decoder to perform the method of the second or third aspect or any one of the possible embodiments of the second or third aspect.
According to a nineteenth aspect, the present application relates to a non-transitory computer readable storage medium comprising program code for performing the method of the first, second or third aspect or any one of the possible embodiments of the first, second or third aspect when executed by a computer device.
According to a twentieth aspect, the present application relates to a non-transitory storage medium comprising a bitstream encoded according to the method of the first aspect or any one of the possible embodiments of the first aspect.
According to a twenty-first aspect, the present application relates to an electronic device comprising the encoding device of the fourth aspect and/or the decoding device of the fifth or sixth aspect.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
The drawings used in the embodiments of the present application are described below.
FIG. 1A is a block diagram of an example of a video coding system for implementing embodiments of the present application, wherein the system utilizes a neural network to encode or decode video images;
FIG. 1B is a block diagram of another example of a video coding system for implementing embodiments of the present application, wherein the video encoder and/or video decoder uses a neural network to encode or decode video images;
FIG. 2 is a block diagram of an example of a video encoder for implementing embodiments of the present application, wherein the video encoder 20 uses a neural network to encode video images;
FIG. 3 is a block diagram of an example of a video decoder for implementing embodiments of the present application, wherein the video decoder 30 uses a neural network to decode video images;
FIG. 4 is a schematic block diagram of a video coding device for implementing an embodiment of the present application;
FIG. 5 is a schematic block diagram of a video coding device for implementing an embodiment of the present application;
FIG. 6 is a schematic diagram of an image codec based on a deep neural network for implementing an embodiment of the present application;
FIG. 7 is a schematic block diagram of an encoding method provided in an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating division of an original image or a preset region of the original image according to an embodiment of the present application;
FIG. 9 is a schematic diagram illustrating division of a reconstructed image or a preset region of the reconstructed image according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a fidelity map provided by an embodiment of the present application;
FIG. 11 is a schematic block diagram of a decoding method provided in an embodiment of the present application;
FIG. 12 is a schematic diagram of a reconstructed image of a fidelity map provided in an embodiment of the present application;
FIG. 13 is a schematic block diagram of another decoding method provided in an embodiment of the present application;
FIG. 14 is a schematic diagram of a quantization parameter map provided by an embodiment of the present application;
FIG. 15 is a schematic diagram of another quantization parameter map provided in an embodiment of the present application;
FIG. 16 is a schematic block diagram of an encoding apparatus provided in an embodiment of the present application;
FIG. 17 is a schematic block diagram of a decoding apparatus provided in an embodiment of the present application;
FIG. 18 is a schematic block diagram of a decoding apparatus provided in an embodiment of the present application;
FIG. 19 is a schematic block diagram of an encoding apparatus according to an embodiment of the present application;
FIG. 20 is a schematic block diagram of a decoding apparatus according to an embodiment of the present application;
FIG. 21 is a schematic block diagram of a decoding apparatus according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide an artificial-intelligence-based video image compression technology, in particular a neural network (NN)-based video compression technology, and more specifically neural-network-based inter prediction, intra prediction, and filtering technologies, so as to improve the traditional hybrid video coding and decoding system.
Video coding generally refers to processing a sequence of images that form a video or video sequence. In the field of video coding, the terms "image", "frame", and "picture" may be used as synonyms. Video coding (or coding in general) includes both video encoding and video decoding. Video encoding is performed on the source side, typically involving processing (e.g., compressing) the original video image to reduce the amount of data required to represent the video image (and thus store and/or transmit it more efficiently). Video decoding is performed on the destination side, typically involving inverse processing relative to the encoder to reconstruct the video image. References in the embodiments to "coding" of video images (or images in general) should be understood as "encoding" or "decoding" of video images or video sequences. The encoding part and the decoding part are also collectively referred to as a codec (CODEC, encoding and decoding).
In the case of lossless video coding, the original video image can be reconstructed, i.e., the reconstructed video image has the same quality as the original video image (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy video coding, further compression is performed by quantization or the like to reduce the amount of data required to represent the video image, whereas the decoder side cannot reconstruct the video image completely, i.e. the quality of the reconstructed video image is lower or worse than the quality of the original video image.
Several video coding standards belong to the class of "lossy hybrid video codecs" (i.e., they combine spatial and temporal prediction in the pixel domain with 2D transform coding for applying quantization in the transform domain). Each image of a video sequence is typically partitioned into a set of non-overlapping blocks and is typically coded at the block level. In other words, the encoder typically processes, i.e., encodes, the video at the block (video block) level, e.g., generating a prediction block by spatial (intra) prediction and temporal (inter) prediction; subtracting the prediction block from the current block (the block currently being processed/to be processed) to obtain a residual block; and transforming and quantizing the residual block in the transform domain to reduce the amount of data to be transmitted (compressed), while the decoder side applies the inverse processing relative to the encoder to the encoded or compressed block to reconstruct the current block for representation. In addition, the encoder needs to repeat the processing steps of the decoder, so that the encoder and the decoder generate the same predictions (e.g., intra prediction and inter prediction) and/or reconstructed pixels for processing, i.e., encoding, subsequent blocks.
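The block-level hybrid loop described above can be reduced to a minimal sketch. Here the transform stage is omitted for brevity, and a plain DC (mean) predictor stands in for real spatial/temporal prediction; all of this is an illustrative assumption, not the coding scheme of any particular standard.

```python
def encode_block(block, qstep):
    """Toy hybrid-coding step: predict, form the residual, quantise it."""
    pred = round(sum(block) / len(block))          # toy "intra" DC prediction
    residual = [x - pred for x in block]           # current block minus prediction
    levels = [round(r / qstep) for r in residual]  # quantise residual (lossy step)
    return pred, levels

def decode_block(pred, levels, qstep):
    """Inverse processing applied on the decoder side."""
    return [pred + l * qstep for l in levels]      # dequantise and add prediction
```

Because of the quantisation step, the decoded block approximates rather than equals the input, which is exactly the distortion that the fidelity map and quantization parameter map of this application are designed to convey to the decoder.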
In the following embodiments of the decoding system 10, the encoder 20 and decoder 30 are described with respect to fig. 1A-3.
Fig. 1A is a schematic block diagram of an exemplary coding system 10, such as a video coding system 10 (or simply coding system 10) that may utilize the techniques of the present application. Video encoder 20 (or simply encoder 20) and video decoder 30 (or simply decoder 30) in video coding system 10 represent, among other things, devices that may be used to perform techniques in accordance with various examples described in this application.
As shown in FIG. 1A, coding system 10 includes a source device 12, source device 12 being configured to provide encoded image data 21, such as an encoded image, to a destination device 14 for decoding encoded image data 21. The encoded image data is also referred to as a bit stream, a compressed code stream, or a code stream, so the encoded image data 21 may also be referred to as a bit stream 21, a compressed code stream 21, or a code stream 21.
Source device 12 includes an encoder 20 and may additionally, or alternatively, include an image source 16, a pre-processor (or pre-processing unit) 18 such as an image pre-processor, a communication interface (or communication unit) 22.
Image source 16 may include or may be any type of image capture device for capturing a real-world image or the like, and/or any type of image generation device, such as a computer graphics processor for generating computer-animated images, or any type of device for acquiring and/or providing real-world images or computer-generated images (e.g., screen content, virtual reality (VR) images, and/or any combination thereof, such as augmented reality (AR) images).
In order to distinguish the processing performed by the preprocessor (or preprocessing unit) 18, the image (or image data) 17 may also be referred to as an original image (or original image data) 17.
Preprocessor 18 is configured to receive (raw) image data 17 and preprocess image data 17 to obtain a preprocessed image (or preprocessed image data) 19. For example, the pre-processing performed by pre-processor 18 may include cropping, color format conversion (e.g., from RGB to YCbCr), color correction, or denoising. It will be appreciated that the pre-processing unit 18 may be an optional component.
A video encoder (or encoder) 20 is operative to receive pre-processed image data 19 and provide encoded image data 21 (described further below with respect to fig. 2 and/or the like).
The communication interface 22 in the source device 12 may be used to: receives encoded image data 21 and transmits encoded image data 21 (or any other processed version) over communication channel 13 to another device, such as destination device 14, or any other device for storage or direct reconstruction.
The destination device 14 includes a decoder 30 and may additionally, or alternatively, include a communication interface (or communication unit) 28, a post-processor (or post-processing unit) 32, and a display device 34.
Communication interface 28 in destination device 14 is configured to receive encoded image data 21 (or any other processed version) either directly from source device 12 or from any other source device, such as a storage device, for example, an encoded image data storage device, and to provide encoded image data 21 to decoder 30.
The communication interface 22 and the communication interface 28 may be used to transmit or receive encoded image data (or encoded data) 21 over a direct communication link, such as a direct wired or wireless connection, etc., between the source device 12 and the destination device 14, or over any type of network, such as a wired network, a wireless network, or any combination thereof, any type of private and public networks, or any type of combination thereof.
For example, communication interface 22 may be used to encapsulate encoded image data 21 into a suitable format such as a message and/or process the encoded image data using any type of transport encoding or processing for transmission over a communication link or network.
Communication interface 28 corresponds to communication interface 22, and may be used, for example, to receive transmitted data and process the transmitted data using any type of corresponding transmission decoding or processing and/or decapsulation to obtain encoded image data 21.
Both communication interface 22 and communication interface 28 may be configured as one-way communication interfaces, as indicated by the arrow pointing from source device 12 to destination device 14 for communication channel 13 in fig. 1A, or as two-way communication interfaces, and may be used to send and receive messages and the like, to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or the data transmission, such as an encoded image data transmission.
Video decoder 30 is operative to receive encoded image data 21 and provide decoded image data (or decoded image, reconstructed image) 31 (described further below with respect to fig. 3, etc.).
The post-processor 32 is configured to perform post-processing on decoded image data 31 (also referred to as reconstructed image data) such as a decoded image, and obtain post-processed image data 33 such as a post-processed image. Post-processing performed by post-processing unit 32 may include, for example, color format conversion (e.g., from YCbCr to RGB), color correction, cropping, or resampling, or any other processing for preparing decoded image data 31 for display by display device 34 or the like.
The display device 34 is used to receive the post-processed image data 33 to display an image to a user or viewer or the like. The display device 34 may be or include any type of display for representing the reconstructed image, such as an integrated or external display screen or display. For example, the display screen may include a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS), a Digital Light Processor (DLP), or any other type of display screen.
The coding system 10 further comprises a training engine 25, the training engine 25 being configured to train the encoder 20 (in particular the mode selection unit 260 in the encoder 20) or the decoder 30 (in particular the mode application unit 360 in the decoder 30) to process the input image or image area or image block to generate a prediction value for the input image or image area or image block.
The training data for the encoder 20 or the decoder 30 in the embodiment of the present application may be stored in a database (not shown), and the training engine 25 trains to obtain the target model (for example, a neural network for inter-frame prediction or intra-frame prediction of images, or loop filtering, etc.) based on the training data. It should be noted that, in the embodiment of the present application, a source of the training data is not limited, and for example, the training data may be obtained from a cloud or other places to perform model training.
The target model trained by the training engine 25 may be applied to the coding system 10 or 40, for example, the source device 12 (e.g., encoder 20) or the destination device 14 (e.g., decoder 30) shown in fig. 1A. The training engine 25 may train in the cloud to obtain a target model, and then the coding system 10 downloads and uses the target model from the cloud; alternatively, the training engine 25 may train a target model in the cloud and use the target model, with the coding system 10 obtaining the processing result directly from the cloud. For example, the training engine 25 trains to obtain a target model with a filtering function, the coding system 10 downloads the target model from the cloud, and then the loop filter 220 in the encoder 20 or the loop filter 320 in the decoder 30 may filter the input reconstructed image or image block according to the target model to obtain a filtered image or image block. For another example, the training engine 25 trains to obtain a target model with a filtering function, the coding system 10 does not need to download the target model from the cloud; instead, the encoder 20 or the decoder 30 transmits the reconstructed image or image block to the cloud, and the cloud performs filtering processing on the reconstructed image or image block through the target model to obtain a filtered image or image block and transmits it back to the encoder 20 or the decoder 30.
Although fig. 1A shows the source device 12 and the destination device 14 as separate devices, device embodiments may also include both the source device 12 and the destination device 14 or both the source device 12 and the destination device 14 functionality, i.e., both the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality. In these embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software or by separate hardware and/or software or any combination thereof.
It will be apparent to the skilled person from the description that the presence and (exact) division of different units or functions in the source device 12 and/or the destination device 14 shown in fig. 1A may differ depending on the actual device and application.
Encoder 20 (e.g., video encoder 20) or decoder 30 (e.g., video decoder 30), or both, may be implemented by processing circuitry as shown in fig. 1B, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, dedicated video coding processors, or any combination thereof. Encoder 20 may be implemented by processing circuitry 46 so as to include the various modules discussed with reference to encoder 20 of fig. 2 and/or any other encoder system or subsystem described herein. Decoder 30 may be implemented by processing circuitry 46 so as to include the various modules discussed with reference to decoder 30 of fig. 3 and/or any other decoder system or subsystem described herein. The processing circuit 46 may be used to perform the various operations discussed below. As shown in fig. 5, if parts of the techniques are implemented in software, a device may store the software instructions in a suitable non-transitory computer-readable storage medium and execute the instructions in hardware using one or more processors to perform the techniques of this application. Either of video encoder 20 and video decoder 30 may be integrated in a single device as part of a combined codec (CODEC), as shown in fig. 1B.
Source device 12 and destination device 14 may comprise any of a variety of devices, including any type of handheld or fixed device, such as a notebook or laptop computer, a cell phone, a smart phone, a tablet or tablet computer, a camera, a desktop computer, a set-top box, a television, a display device, a digital media player, a video game console, a video streaming device (e.g., a content service server or a content distribution server), a broadcast receiving device, a broadcast transmitting device, etc., and may use no operating system or any type of operating system. In some cases, source device 12 and destination device 14 may be equipped with components for wireless communication. Thus, source device 12 and destination device 14 may be wireless communication devices.
In some cases, the video coding system 10 shown in fig. 1A is merely exemplary, and the techniques provided herein may be applicable to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device. In other examples, the data is retrieved from local storage, sent over a network, and so on. A video encoding device may encode and store data in memory, and/or a video decoding device may retrieve and decode data from memory. In some examples, the encoding and decoding are performed by devices that do not communicate with each other, but merely encode data to memory and/or retrieve and decode data from memory.
Fig. 1B is an illustrative diagram of an example of a video coding system 40 including video encoder 20 of fig. 2 and/or video decoder 30 of fig. 3, according to an example embodiment. Video coding system 40 may include an imaging device 41, video encoder 20, video decoder 30 (and/or a video codec implemented by processing circuitry 46), an antenna 42, one or more processors 43, one or more memory storage devices 44, and/or a display device 45.
As shown in fig. 1B, the imaging device 41, the antenna 42, the processing circuit 46, the video encoder 20, the video decoder 30, the processor 43, the memory storage 44, and/or the display device 45 are capable of communicating with each other. In different examples, video coding system 40 may include only video encoder 20 or only video decoder 30.
In some instances, antenna 42 may be used to transmit or receive an encoded bitstream of video data. Additionally, in some instances, display device 45 may be used to present video data. The processing circuit 46 may comprise application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, or the like. Video coding system 40 may also include an optional processor 43, which may similarly comprise application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, or the like. In addition, the memory 44 may be any type of memory, such as a volatile memory (e.g., static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or a non-volatile memory (e.g., flash memory, etc.), and so on. In a non-limiting example, the memory storage 44 may be implemented by a cache memory. In other examples, the processing circuitry 46 may include memory (e.g., a cache, etc.) for implementing an image buffer, etc.
In some examples, video encoder 20, implemented by logic circuitry, may include an image buffer (e.g., implemented by processing circuitry 46 or memory storage 44) and a graphics processing unit (e.g., implemented by processing circuitry 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include video encoder 20 implemented by processing circuitry 46 to implement the various modules discussed with reference to fig. 2 and/or any other encoder system or subsystem described herein. Logic circuitry may be used to perform various operations discussed herein.
In some examples, video decoder 30 may be implemented by processing circuitry 46 in a similar manner to implement the various modules discussed with reference to video decoder 30 of fig. 3 and/or any other decoder system or subsystem described herein. In some examples, logic circuit implemented video decoder 30 may include an image buffer (implemented by processing circuit 46 or memory storage 44) and a graphics processing unit (implemented by processing circuit 46, for example). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include a video decoder 30 implemented by processing circuitry 46 to implement the various modules discussed with reference to fig. 3 and/or any other decoder system or subsystem described herein.
In some instances, antenna 42 may be used to receive an encoded bitstream of video data. As discussed, the encoded bitstream may include data related to an encoded video frame as discussed herein, such as data related to the encoding partition (e.g., transform coefficients or quantized transform coefficients, optional indicators as discussed, and/or data defining the encoding partition), as well as indicators, index values, and mode selection data. Video coding system 40 may also include a video decoder 30 coupled to antenna 42 and used to decode the encoded bitstream. The display device 45 is used to present video frames.
It should be understood that video decoder 30 may be used to perform the reverse process for the example described with reference to video encoder 20 in the embodiments of the present application. With respect to signaling syntax elements, video decoder 30 may be configured to receive and parse such syntax elements and decode the associated video data accordingly. In some examples, video encoder 20 may entropy encode the syntax elements into an encoded video bitstream. In such instances, video decoder 30 may parse such syntax elements and decode the relevant video data accordingly.
For ease of description, embodiments of the present application are described with reference to Versatile Video Coding (VVC) reference software or High Efficiency Video Coding (HEVC) developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Those of ordinary skill in the art will understand that the embodiments of the present application are not limited to HEVC or VVC.
The following describes an encoder and an encoding method, and a decoder and a decoding method.
Encoder and encoding method
Fig. 2 is a schematic block diagram of an example of a video encoder 20 for implementing the techniques of this application. In the example of fig. 2, the video encoder 20 includes an input terminal (or input interface) 201, a residual calculation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transform processing unit 212, a reconstruction unit 214, a loop filter 220, a Decoded Picture Buffer (DPB) 230, a mode selection unit 260, an entropy encoding unit 270, and an output terminal (or output interface) 272. Mode select unit 260 may include inter prediction unit 244, intra prediction unit 254, and partition unit 262. The inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown). The video encoder 20 shown in fig. 2 may also be referred to as a hybrid video encoder or a hybrid video codec-based video encoder.
Referring to fig. 2, the inter prediction module, the intra prediction module, and the loop filter module each include (or are) a trained target model (also referred to as a neural network) for processing an input image, image area, or image block to generate a prediction value for the input image block. For example, a neural network for inter prediction, intra prediction, or loop filtering is used to receive an input image, image area, or image block.
The residual calculation unit 204, the transform processing unit 206, the quantization unit 208, and the mode selection unit 260 constitute a forward signal path of the encoder 20, and the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the Decoded Picture Buffer (DPB) 230, the inter prediction unit 244, and the intra prediction unit 254 constitute a backward signal path of the encoder, wherein the backward signal path of the encoder 20 corresponds to a signal path of a decoder (see the decoder 30 in fig. 3). Inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, loop filter 220, decoded image buffer 230, inter prediction unit 244, and intra prediction unit 254 also constitute a "built-in decoder" of video encoder 20.
(1) Image and image segmentation (image and block)
The encoder 20 is operable to receive, via an input 201 or the like, an image (or image data) 17, e.g., an image in a sequence of images forming a video or video sequence. The received image or image data may also be a pre-processed image (or pre-processed image data) 19. For simplicity, the following description uses image 17. The image 17 may also be referred to as the current image, the original image, or the image to be encoded (in particular when, in video encoding, the current image is to be distinguished from other images, e.g., previously encoded and/or decoded images of the same video sequence, i.e., the video sequence that also comprises the current image).
The (digital) image is or can be considered as a two-dimensional array or matrix of pixels with intensity values. The pixels in the array may also be referred to as pixels or pels (short for picture elements). The number of pixels in the horizontal and vertical directions (or axes) of the array or image determines the size and/or resolution of the image. To represent color, three color components are typically employed, i.e., the image may be represented as or include three arrays of pixel points. In the RGB format or color space, the image includes corresponding arrays of red, green, and blue pixel points. However, in video coding, each pixel is typically represented in a luminance/chrominance format or color space, such as YCbCr, which includes a luminance component indicated by Y (sometimes also denoted by L) and two chrominance components denoted by Cb and Cr. The luminance (luma) component Y represents luminance or gray-level intensity (e.g., the two are the same in a gray-scale image), while the two chrominance (chroma) components Cb and Cr represent chrominance or color information components. Accordingly, an image in YCbCr format includes a luminance pixel point array of luminance pixel point values (Y) and two chrominance pixel point arrays of chrominance values (Cb and Cr). An image in RGB format may be converted or transformed into YCbCr format and vice versa; this process is also known as color transformation or conversion. If the image is black and white, the image may include only an array of luminance pixel points. Accordingly, the image may be, for example, an array of luminance pixel points in monochrome format, or an array of luminance pixel points and two corresponding arrays of chrominance pixel points in the 4:2:0, 4:2:2, or 4:4:4 color format.
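The RGB-to-YCbCr conversion mentioned above can be sketched in Python. The patent text does not fix a particular conversion matrix, so the common BT.601 full-range (JPEG-style) coefficients are used here purely for illustration:

```python
def rgb_to_ycbcr(r, g, b):
    # BT.601 full-range coefficients (illustrative; other standards
    # such as BT.709 use different weights).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    # Inverse of the conversion above (the "vice versa" in the text).
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b
```

A neutral gray pixel maps to chrominance values at the mid-point (128 for 8-bit samples), and the inverse conversion recovers the RGB values up to floating-point rounding.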
In one embodiment, an embodiment of the video encoder 20 may include an image partitioning unit (not shown in fig. 2) for partitioning the image 17 into a plurality of (typically non-overlapping) image blocks 203, an image block 203 being a set of pixels, an image block 203 sometimes also referred to as a current block 203, an original block 203, or a partitioned block 203. For example, an image 17 to be encoded is first divided into non-overlapping image blocks 203, and each image block 203 is processed in turn in a given order (e.g., line scan order). These image blocks 203 may also be referred to as root blocks, macroblocks (h.264/AVC), or Coding Tree Blocks (CTBs), or Coding Tree Units (CTUs) in the h.265/HEVC and VVC standards. The segmentation unit may be adapted to use the same block size for all images in the video sequence and to use a corresponding grid defining the block sizes, or to change the block sizes between images or subsets or groups of images and to segment any image into corresponding blocks.
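The division of an image into non-overlapping blocks processed in raster-scan order, as described above, can be sketched as follows. For simplicity this sketch assumes the image dimensions are multiples of the block size; a real encoder pads the image or uses smaller edge blocks:

```python
def split_into_blocks(image, n):
    """Split a 2-D pixel array into n x n blocks in raster-scan order
    (left to right, top to bottom), as an image-partitioning unit would."""
    h, w = len(image), len(image[0])
    blocks = []
    for y in range(0, h, n):          # block rows, top to bottom
        for x in range(0, w, n):      # block columns, left to right
            blocks.append([row[x:x + n] for row in image[y:y + n]])
    return blocks
```

For a 4×4 image split with n = 2, this yields four 2×2 blocks in raster-scan order.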
In other embodiments, the video encoder may be configured to receive image blocks 203 of an image 17 directly, e.g., one, several, or all of the blocks that make up the image 17. The image block 203 may also be referred to as a current image block or an image block to be encoded.
As with image 17, image block 203 is also or can be considered as a two-dimensional array or matrix of pixels having intensity values (pixel point values), but image block 203 is smaller than image 17. In other words, the image block 203 may include one pixel point array (e.g., a luminance array in the case of a monochrome image 17, or a luminance array or a chrominance array in the case of a color image) or three pixel point arrays (e.g., a luminance array and two chrominance arrays in the case of a color image 17) or any other number and/or type of arrays depending on the color format employed. The number of pixels in the horizontal and vertical directions (or axes) of the image block 203 defines the size of the image block 203. Accordingly, the image block 203 may be an M×N (M columns × N rows) array of pixel points, or an M×N array of transform coefficients, or the like. For example, if the image block 203 has a size of N×N, this means that the image block 203 is a two-dimensional pixel array whose horizontal and vertical sizes are both N.
In one embodiment, the video encoder 20 shown in fig. 2 is used to encode the image 17 on a block-by-block basis, e.g., to perform encoding and prediction on any image block 203.
In one embodiment, the video encoder 20 shown in fig. 2 may also be used to segment and/or encode pictures using slices (also referred to as video slices), where pictures may be segmented or encoded using one or more slices (which typically are non-overlapping). Any slice may comprise one or more blocks (e.g., coding tree unit CTU) or one or more groups of blocks (e.g., coded blocks (tile) in the h.265/HEVC/VVC standard and tiles (brick) in the VVC standard).
In one embodiment, the video encoder 20 shown in fig. 2 may be further configured to partition and/or encode a picture using one or more slice/coding block groups (generally non-overlapping) and/or coding blocks (also referred to as video coding blocks), wherein any slice/coding block group may include one or more blocks (e.g., CTUs) or one or more coding blocks, etc., wherein any coding block may be rectangular, etc., and may include one or more complete or partial blocks (e.g., CTUs).
(2) Residual calculation
The residual calculation unit 204 is configured to calculate a residual block 205 from the image block 203 and the prediction block 265 (the prediction block 265 is described in detail below) as follows: for example, pixel-point-by-pixel (pixel-by-pixel) values of the prediction block 265 are subtracted from pixel-point values of the image block 203, resulting in the residual block 205 in the pixel domain. The encoder 20 performs intra prediction or inter prediction on an image block to obtain prediction values of pixels therein, and a set of prediction values of pixels in the image block is referred to as prediction of the image block and is also referred to as a prediction block. Further, differences between the original values of the pixels in the image block and the predicted values of the pixels in the image block are calculated, and a set of the differences between the original values of the pixels in the image block and the predicted values of the pixels in the image block is called a residual of the image block, also called a residual block.
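The pixel-by-pixel subtraction described above can be sketched directly; the residual block is simply the element-wise difference between the original block and its prediction:

```python
def residual_block(original, prediction):
    # Pixel-wise difference between the original block and its
    # prediction, yielding the residual block in the pixel domain.
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]
```

A good prediction yields a residual with many small (or zero) values, which the subsequent transform and quantization stages compress efficiently.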
(3) Transformation of
The transform processing unit 206 is configured to perform Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), or the like on pixel point values of the residual block 205, to obtain transform coefficients 207 in a transform domain. The transform coefficients 207, which may also be referred to as transform residual coefficients, represent a residual block 205 in the transform domain.
Transform processing unit 206 may be used to apply an integer approximation of DCT/DST, such as the transform specified for h.265/HEVC. Such an integer approximation is typically scaled by some factor compared to the orthogonal DCT transform. To maintain the norm of the residual block that is processed by the forward and inverse transforms, other scaling factors are used as part of the transform process. The scaling factor is typically selected according to certain constraints, e.g., the scaling factor is a power of 2 for a shift operation, the bit depth of the transform coefficients, a tradeoff between accuracy and implementation cost, etc. For example, a specific scaling factor may be specified for the inverse transform by the inverse transform processing unit 212 on the encoder 20 side (and for the corresponding inverse transform by, for example, the inverse transform processing unit 312 on the decoder 30 side), and accordingly, a corresponding scaling factor may be specified for the forward transform by the transform processing unit 206 on the encoder 20 side.
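The 2-D DCT described above is separable: a 1-D DCT is applied first to the rows and then to the columns. The sketch below uses the floating-point orthonormal DCT-II; as the text notes, real codecs such as H.265 use scaled integer approximations of this transform:

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (rows, then columns)."""
    n = len(block)

    def dct_1d(v):
        out = []
        for k in range(n):
            s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i in range(n))
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            out.append(scale * s)
        return out

    rows = [dct_1d(r) for r in block]       # transform each row
    cols = [dct_1d(c) for c in zip(*rows)]  # then each column
    return [list(r) for r in zip(*cols)]    # back to row-major order
```

For a flat block all energy ends up in the single DC coefficient, and because the transform is orthonormal it preserves the total energy (Parseval), which is why it concentrates a smooth residual into few significant coefficients.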
In one embodiment, video encoder 20 (correspondingly, transform processing unit 206) may be configured to output transform parameters, such as the type of transform(s), e.g., directly or after being encoded or compressed by entropy encoding unit 270, e.g., such that video decoder 30 may receive and use the transform parameters for decoding.
(4) Quantization
The quantization unit 208 is configured to quantize the transform coefficients 207 by, for example, scalar quantization or vector quantization, to obtain quantized transform coefficients 209, simply referred to as quantized coefficients 209. The quantized transform coefficients 209 may also be referred to as quantized residual coefficients 209.
The quantization process may reduce the bit depth associated with some or all of the transform coefficients 207. For example, an n-bit transform coefficient may be rounded down to an m-bit transform coefficient during quantization, where n is greater than m. The quantization level may be modified by adjusting a quantization parameter (QP). For example, for scalar quantization, different degrees of scaling may be applied to achieve finer or coarser quantization. Smaller quantization step sizes correspond to finer quantization, and larger quantization step sizes correspond to coarser quantization. An appropriate quantization step size may be indicated by a quantization parameter (QP). For example, the quantization parameter may be an index into a predefined set of suitable quantization step sizes. For example, a smaller quantization parameter may correspond to fine quantization (a smaller quantization step size) and a larger quantization parameter may correspond to coarse quantization (a larger quantization step size), or vice versa. The quantization may comprise division by a quantization step size, while the corresponding or inverse dequantization performed by the inverse quantization unit 210, etc., may comprise multiplication by the quantization step size. In embodiments according to some standards such as HEVC, the quantization parameter may be used to determine the quantization step size. In general, the quantization step size may be calculated from the quantization parameter using a fixed-point approximation of an equation that includes division. Additional scaling factors may be introduced for quantization and dequantization to restore the norm of the residual block, which may be modified because of the scale used in the fixed-point approximation of the equation for the quantization step size and quantization parameter. In one exemplary implementation, the scaling of the inverse transform and the dequantization may be combined.
Alternatively, a custom quantization table may be used and indicated from the encoder to the decoder in the bitstream or the like. Quantization is a lossy operation, where the larger the quantization step size, the greater the loss.
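The QP-to-step-size relationship described above can be illustrated with the HEVC-style rule, in which the step size doubles every 6 QP values (with QP = 4 corresponding to a step size of 1). This is a floating-point sketch, not the fixed-point approximation a real codec uses:

```python
def quantization_step(qp):
    # HEVC-style relationship: step size doubles every 6 QP values.
    return 2 ** ((qp - 4) / 6)

def quantize(coeff, qp):
    # Forward quantization: division by the step size, then rounding.
    # Rounding is the lossy step; larger steps discard more information.
    return round(coeff / quantization_step(qp))

def dequantize(level, qp):
    # Inverse quantization: multiplication by the same step size.
    return level * quantization_step(qp)
```

Note that the dequantized value generally differs from the original coefficient (e.g., coefficient 103 at QP 22 reconstructs to 104), which is exactly the quantization loss referred to in the text.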
In one embodiment, video encoder 20 (correspondingly, quantization unit 208) may be used to output Quantization Parameters (QPs), e.g., directly or encoded or compressed by entropy encoding unit 270, e.g., such that video decoder 30 may receive and decode using the quantization parameters.
(5) Inverse quantization
The inverse quantization unit 210 is configured to perform, on the quantized coefficients 209, the inverse of the quantization performed by the quantization unit 208, to obtain dequantized coefficients 211, e.g., by applying, according to or using the same quantization step size as the quantization unit 208, the inverse of the quantization scheme performed by the quantization unit 208. The dequantized coefficients 211, which may also be referred to as dequantized residual coefficients 211, correspond to the transform coefficients 207, but the dequantized coefficients 211 are typically not identical to the transform coefficients 207 due to the loss caused by quantization.
(6) Inverse transformation
The inverse transform processing unit 212 is configured to perform an inverse transform of the transform performed by the transform processing unit 206, for example, inverse Discrete Cosine Transform (DCT) or inverse Discrete Sine Transform (DST), to obtain a reconstructed residual block 213 in the pixel domain. The reconstructed residual block 213 may also be referred to as a transform block 213.
(7) Reconstruction
The reconstruction unit 214 (e.g. summer 214) is configured to add the transform block 213 (i.e. the reconstructed residual block 213) to the prediction block 265 to obtain the reconstruction block 215 in the pixel domain, e.g. to add pixel point values of the reconstructed residual block 213 and pixel point values of the prediction block 265.
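The reconstruction step can be sketched as a pixel-wise addition. The clipping to the valid sample range shown here is standard codec practice for the given bit depth; the text above describes only the addition itself:

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    # Add the reconstructed residual to the prediction, clipping each
    # pixel to [0, 2^bit_depth - 1] so the result is a valid sample.
    lo, hi = 0, (1 << bit_depth) - 1
    return [[max(lo, min(hi, p + r)) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]
```

Because the residual was transformed, quantized, and inverse-transformed, this reconstruction matches what the decoder will produce, which is what allows the encoder's "built-in decoder" to use it as a prediction reference.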
(8) Filtering
The loop filter unit 220 (or simply "loop filter" 220) is used to filter the reconstruction block 215 to obtain a filter block 221, or more generally, to filter reconstructed pixel points to obtain filtered pixel point values. For example, the loop filter unit is used to smooth pixel transitions or improve video quality; coding distortion such as blocking artifacts and ringing artifacts can be removed by loop filtering. Loop filter unit 220 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or one or more other filters, such as an adaptive loop filter (ALF), a noise suppression filter (NSF), or any combination thereof. For example, the loop filter unit 220 may include a deblocking filter, an SAO filter, and an ALF filter, and the order of the filtering process may be the deblocking filter, then the SAO filter, and then the ALF filter. As another example, a process called luma mapping with chroma scaling (LMCS) (i.e., an adaptive in-loop reshaper) is added; this process is performed prior to deblocking. As another example, the deblocking filtering process may also be applied to internal sub-block edges, such as affine sub-block edges, ATMVP sub-block edges, sub-block transform (SBT) edges, and intra sub-partition (ISP) edges. Although loop filter unit 220 is shown in fig. 2 as an in-loop filter, in other configurations, loop filter unit 220 may be implemented as a post-loop filter. The filter block 221 may also be referred to as a filtered reconstruction block 221.
In one embodiment, video encoder 20 (correspondingly, loop filter unit 220) may be used to output loop filter parameters (e.g., SAO filtering parameters, ALF filtering parameters, or LMCS parameters), e.g., directly or after entropy encoding by entropy encoding unit 270, e.g., such that decoder 30 may receive and decode using the same or different loop filter parameters.
(9) Decoded picture buffer
Decoded Picture Buffer (DPB) 230 may be a reference picture memory that stores reference pictures (or reference picture data) for use by video encoder 20 in encoding video data. DPB 230 may be formed from any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM), including Synchronous DRAM (SDRAM), Magnetoresistive RAM (MRAM), Resistive RAM (RRAM), or other types of memory devices. The decoded picture buffer 230 may be used to store one or more filter blocks 221. The decoded picture buffer 230 may also be used to store other previously filtered blocks, such as previously reconstructed and filtered blocks 221, of the same current picture or of a different picture, such as a previously reconstructed picture, and may provide a complete previously reconstructed, i.e., decoded picture (and corresponding reference blocks and pixels) and/or a partially reconstructed current picture (and corresponding reference blocks and pixels), for example, for inter prediction. Decoded picture buffer 230 may also be used to store one or more unfiltered reconstructed blocks 215, or generally, unfiltered reconstructed pixel points, such as reconstructed blocks 215 that are not filtered by loop filtering unit 220, or reconstructed blocks or reconstructed pixel points that have not been subjected to any other processing.
(10) Mode selection (segmentation and prediction)
The mode selection unit 260 includes a segmentation unit 262, an inter prediction unit 244, and an intra prediction unit 254, and is configured to receive or obtain original image data, such as the original block 203 (the current block 203 of the current image 17), and reconstructed image data, such as filtered and/or unfiltered reconstructed pixel points or reconstructed blocks of the same (current) image and/or of one or more previously decoded images, from the decoded image buffer 230 or other buffers (e.g., a line buffer, not shown). The reconstructed image data is used as reference image data necessary for prediction, such as inter prediction or intra prediction, to obtain a prediction block 265 or a prediction value 265.
The mode selection unit 260 may be used to determine or select a partition for the current block prediction mode (including no partition) and the prediction mode (e.g., intra or inter prediction modes), generating a corresponding prediction block 265 for calculation of the residual block 205 and reconstruction of the reconstructed block 215.
In one embodiment, mode selection unit 260 may be used to select a partitioning and prediction mode (e.g., from among the prediction modes supported by or available to mode selection unit 260) that provides the best match or minimum residual (minimum residual means better compression for transmission or storage), or minimum signaling overhead (minimum signaling overhead means better compression for transmission or storage), or both. The mode selection unit 260 may be configured to determine the partitioning and prediction modes according to rate distortion optimization (RDO), i.e., to select the prediction mode that provides minimum rate distortion. The terms "best," "lowest," "optimal," and the like herein do not necessarily refer to an overall "best," "lowest," or "optimal," but may also refer to situations where a termination or selection criterion is met; e.g., a value above or below a threshold or other limit may result in a "sub-optimal selection" but reduce complexity and processing time.
In other words, the segmentation unit 262 may be configured to segment images in a video sequence into a sequence of Coding Tree Units (CTUs) 203, the CTUs 203 may be further segmented into smaller block portions or sub-blocks (again forming blocks), e.g. by iteratively using quad-tree (QT) segmentation, binary-tree (BT) segmentation or triple-tree (TT) segmentation, or any combination thereof, and to perform prediction, e.g. for each of the block portions or sub-blocks, wherein the mode selection comprises selecting a tree structure of the segmented block 203 and selecting a prediction mode to apply to each of the block portions or sub-blocks.
The partitioning (e.g., by partitioning unit 262) and prediction processing (e.g., by inter-prediction unit 244 and intra-prediction unit 254) performed by video encoder 20 will be described in detail below.
(11) Segmentation
The partition unit 262 may partition (or divide) one coding tree unit 203 into smaller parts, such as small blocks of square or rectangular shape. For an image with three pixel point arrays, one CTU consists of an N×N block of luminance pixel points and two corresponding blocks of chrominance pixel points. The maximum allowable size of the luminance block in a CTU is specified as 128×128 in the Versatile Video Coding (VVC) standard under development, but may be specified as a value other than 128×128, for example 256×256, in the future. The CTUs of an image may be grouped into slices/coded block groups, coded blocks, or tiles. A coded block covers a rectangular area of an image, and a coded block may be divided into one or more tiles. A tile consists of multiple rows of CTUs within a coded block. A coded block that is not partitioned into multiple tiles may be referred to as a tile. However, the tiles are a true subset of the coded blocks and are therefore not referred to as coded blocks. VVC supports two modes of coded block groups, namely the raster-scan slice/coded block group mode and the rectangular slice mode. In the raster-scan coded block group mode, a slice/coded block group contains a sequence of coded blocks in the coded block raster scan of an image. In the rectangular slice mode, the slice contains a plurality of tiles of an image that together make up a rectangular area of the image. The tiles in a rectangular slice are arranged in the tile raster scan order of the image. These smaller blocks (which may also be referred to as sub-blocks) may be further partitioned into even smaller parts. This is also referred to as tree partitioning or hierarchical tree partitioning, where a root block at root tree level 0 (hierarchy level 0, depth 0), etc., may be recursively partitioned into two or more blocks of the next lower tree level, e.g., nodes at tree level 1 (hierarchy level 1, depth 1).
These blocks may in turn be split into two or more next lower level blocks, e.g. tree level 2 (hierarchical level 2, depth 2), etc., until the split ends (because the end criteria are met, e.g. maximum tree depth or minimum block size is reached). The blocks that are not further divided are also referred to as leaf blocks or leaf nodes of the tree. A tree divided into two parts is called a binary-tree (BT), a tree divided into three parts is called a ternary-tree (TT), and a tree divided into four parts is called a quad-tree (QT).
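The recursive hierarchical splitting described above can be sketched for the quadtree (QT) case. The `should_split` predicate stands in for the encoder's split decision (in practice driven by rate-distortion optimization), and `min_size` is the minimum-block-size termination criterion mentioned in the text:

```python
def quadtree_split(x, y, size, min_size, should_split):
    """Recursively split a square block at (x, y) into four quadrants
    until should_split returns False or min_size is reached; return the
    leaf blocks as (x, y, size) tuples."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]          # leaf block: not divided further
    half = size // 2
    leaves = []
    for dy in (0, half):               # the four next-lower-level nodes
        for dx in (0, half):
            leaves += quadtree_split(x + dx, y + dy, half, min_size,
                                     should_split)
    return leaves
```

Binary-tree (BT) and ternary-tree (TT) splitting follow the same recursive pattern but divide a node into two or three parts instead of four.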
For example, a coding tree unit (CTU) may be or include a CTB of luminance pixels, two corresponding CTBs of chrominance pixels of an image having three pixel point arrays, a CTB of pixels of a monochrome image, or a CTB of pixels of an image encoded using three independent color planes and syntax structures (for encoding pixels). Accordingly, a coding tree block (CTB) may be a block of N×N pixels, where N may be set to a value such that a component is divided into CTBs; this is the partitioning. A coding unit (CU) may be or include a coding block of luminance pixels, two corresponding coding blocks of chrominance pixels of an image having three pixel point arrays, a coding block of pixels of a monochrome image, or a coding block of pixels of an image coded using three independent color planes and syntax structures (for coding pixels). Accordingly, a coding block (CB) may be an M×N block of pixels, where M and N may be set to values such that a CTB is divided into coding blocks; this is the partitioning.
For example, in an embodiment, a Coding Tree Unit (CTU) may be partitioned into CUs according to HEVC by using a quadtree structure represented as a coding tree. The decision whether to encode an image region using inter (temporal) prediction or intra (spatial) prediction is made at the leaf CU level. Any leaf-CU may be further divided into one, two, or four PUs according to PU partition type. The same prediction process is used within a PU and the relevant information is transmitted to the decoder in units of PU. After applying the prediction process according to the PU partition type to obtain the residual block, the leaf-CU may be partitioned into Transform Units (TUs) according to other quadtree structures similar to the coding tree used for the CU.
For example, in an embodiment, the segmentation structure used to partition the coding tree unit is a combined quadtree with nested multi-type tree (using binary and ternary splits) according to the latest video coding standard currently under development, referred to as versatile video coding (VVC). In the coding tree structure within a coding tree unit, a CU may be square or rectangular. For example, the coding tree unit (CTU) is first partitioned by a quadtree structure. The quadtree leaf nodes are then further partitioned by a multi-type tree structure. The multi-type tree structure has four split types: vertical binary tree split (SPLIT_BT_VER), horizontal binary tree split (SPLIT_BT_HOR), vertical ternary tree split (SPLIT_TT_VER), and horizontal ternary tree split (SPLIT_TT_HOR). The leaf nodes of the multi-type tree are called coding units (CUs); unless the CU is too large for the maximum transform length, this segmentation is used for prediction and transform processing without any further partitioning. This means that, in most cases, the CU, PU, and TU have the same block size in the coding block structure of the quadtree with nested multi-type tree. The exception occurs when the maximum supported transform length is smaller than the width or height of a color component of the CU. VVC specifies a unique signaling mechanism for the partitioning information in the coding structure with the quadtree and nested multi-type tree. In the signaling mechanism, a coding tree unit (CTU) is first partitioned by the quadtree structure as the root of the quadtree. Each quadtree leaf node (when sufficiently large to allow it) is then further partitioned by the multi-type tree structure.
In the multi-type tree structure, whether a node is further split is signaled by a first flag (mtt_split_cu_flag); when the node is further split, the split direction is first signaled by a second flag (mtt_split_cu_vertical_flag), and then whether the split is a binary or ternary split is signaled by a third flag (mtt_split_cu_binary_flag). From the values of mtt_split_cu_vertical_flag and mtt_split_cu_binary_flag, the decoder can derive the multi-type tree split mode (MttSplitMode) of the CU based on a predefined rule or table. It should be noted that for certain designs, such as the 64 × 64 luma block and 32 × 32 chroma pipeline design in VVC hardware decoders, TT splitting is not allowed when the width or height of a luma coding block is larger than 64, and is likewise not allowed when the width or height of a chroma coding block is larger than 32. The pipeline design divides the image into a plurality of virtual pipeline data units (VPDUs), which are defined as non-overlapping units in the image. In hardware decoders, successive VPDUs are processed simultaneously in multiple pipeline stages. In most pipeline stages, the VPDU size is roughly proportional to the buffer size, so it is desirable to keep the VPDU small. In most hardware decoders, the VPDU size can be set to the maximum transform block (TB) size. However, in VVC, ternary tree (TT) and binary tree (BT) partitioning may increase the VPDU size.
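The flag-based derivation described above may be sketched as a simple lookup. The following is an illustrative example consistent with the three flags quoted above; the table layout mirrors the VVC-style mapping but is presented here only as an assumed sketch, not as the normative derivation.

```python
# (mtt_split_cu_vertical_flag, mtt_split_cu_binary_flag) -> MttSplitMode
MTT_SPLIT_MODE = {
    (1, 1): "SPLIT_BT_VER",   # vertical, binary
    (0, 1): "SPLIT_BT_HOR",   # horizontal, binary
    (1, 0): "SPLIT_TT_VER",   # vertical, ternary
    (0, 0): "SPLIT_TT_HOR",   # horizontal, ternary
}

def derive_mtt_split_mode(mtt_split_cu_flag, vertical_flag, binary_flag):
    """Derive the multi-type tree split mode of a node from the parsed flags."""
    if mtt_split_cu_flag == 0:
        return "NO_SPLIT"     # node is not further split: it becomes a CU
    return MTT_SPLIT_MODE[(vertical_flag, binary_flag)]
```

For example, flags (1, 1, 1) would indicate a vertical binary split, while mtt_split_cu_flag = 0 terminates the splitting at this node.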
In addition, when a part of a tree node block exceeds the bottom or the right boundary of the image, the tree node block is forcibly divided until all pixel points of any encoded CU are located within the image boundary.
For example, the intra sub-partitions (ISP) tool may divide a luma intra predicted block into two or four sub-parts vertically or horizontally according to a block size.
In one example, mode select unit 260 of video encoder 20 may be used to perform any combination of the segmentation techniques described above.
As described above, video encoder 20 is configured to determine or select the best or optimal prediction mode from a set of (predetermined) prediction modes. The set of prediction modes may include, for example, intra-prediction modes and/or inter-prediction modes.
(12) Intra prediction
The set of intra prediction modes may include 35 different intra prediction modes, e.g., non-directional modes such as DC (or mean) mode and planar mode, or directional modes as defined in HEVC, or may include 67 different intra prediction modes, e.g., non-directional modes such as DC (or mean) mode and planar mode, or directional modes as defined in VVC. For example, several conventional angular intra prediction modes are adaptively replaced with wide-angle intra prediction modes for non-square blocks as defined in VVC. For another example, to avoid division operations for DC prediction, only the longer side is used to compute the average for non-square blocks. In addition, the intra prediction result of the planar mode may be modified using a position dependent intra prediction combination (PDPC) method.
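The longer-side rule for DC prediction mentioned above can be illustrated as follows. This is an assumed sketch (the function name and reference-pixel layout are hypothetical): by averaging only over a power-of-two number of reference pixels, the division reduces to a bit shift.

```python
def dc_prediction_value(left, above):
    """DC value for a (possibly non-square) block.

    left:  reconstructed reference pixels along the left edge (len = block height)
    above: reconstructed reference pixels along the top edge (len = block width)
    For non-square blocks only the longer side is averaged, so the divisor
    is a power of two and the division becomes a right shift.
    """
    if len(above) == len(left):
        refs = above + left      # square block: use both edges
    elif len(above) > len(left):
        refs = above             # wide block: use only the top edge
    else:
        refs = left              # tall block: use only the left edge
    n = len(refs)                # a power of two for typical block sizes
    return (sum(refs) + n // 2) >> (n.bit_length() - 1)   # rounded average
```

For an 8 × 4 block, for instance, only the eight top reference pixels are averaged, replacing a division by 12 with a shift by 3.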
The intra-prediction unit 254 is configured to generate an intra-prediction block 265 according to intra-prediction modes in the intra-prediction mode set by using reconstructed pixel points of neighboring blocks of the same current picture.
Intra-prediction unit 254 (or, generally, mode selection unit 260) is also used to output intra-prediction parameters (or, generally, information indicating the selected intra-prediction mode for the block) to entropy encoding unit 270 in the form of syntax elements 266 for inclusion into encoded image data 21 so that video decoder 30 may perform operations, such as receiving and using the prediction parameters for decoding.
(13) Inter prediction
In possible implementations, the set of inter prediction modes depends on the available reference pictures (i.e., at least some previously decoded pictures stored in the DPB 230, for example, as described above) and other inter prediction parameters, such as whether the entire reference picture or only a portion of it (e.g., a search window area around the area of the current block) is used to search for the best matching reference block, and/or such as whether pixel interpolation, e.g., half-pel, quarter-pel, and/or 1/16-pel interpolation, is applied.
In addition to the prediction mode described above, a skip mode and/or a direct mode may be employed.
For example, in extended merge prediction, the merge candidate list for this mode is constructed from the following five candidate types in order: spatial MVPs from spatially neighboring CUs, temporal MVPs from collocated CUs, history-based MVPs from a FIFO table, pairwise average MVPs, and zero MVs. Decoder-side motion vector refinement (DMVR) based on bilateral matching may be used to increase the accuracy of the MVs of the merge mode. Merge mode with MVD (MMVD) is derived from merge mode with motion vector differences. An MMVD flag is sent immediately after the skip flag and merge flag to specify whether MMVD mode is used for a CU. A CU-level adaptive motion vector resolution (AMVR) scheme may be used. AMVR supports coding the MVD of a CU at different precisions; the MVD precision of the current CU is selected adaptively according to the prediction mode of the current CU. When a CU is coded in merge mode, a combined inter/intra prediction (CIIP) mode may be applied to the current CU. The CIIP prediction is obtained by weighted averaging of the inter and intra prediction signals. For affine motion compensated prediction, the affine motion field of a block is described by the motion information of two control-point (4-parameter) or three control-point (6-parameter) motion vectors. Subblock-based temporal motion vector prediction (SbTMVP) is similar to the temporal motion vector prediction (TMVP) in HEVC, but predicts the motion vectors of the sub-CUs within the current CU. Bi-directional optical flow (BDOF), formerly known as BIO, is a simpler version requiring fewer computations, particularly in terms of the number of multiplications and the multiplier size. In the triangle partition mode, a CU is split evenly into two triangular parts using either diagonal or anti-diagonal partitioning. In addition, the bi-prediction mode is extended beyond simple averaging to support weighted averaging of the two prediction signals.
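Assembling the merge candidate list in the five-category order described above may be sketched as follows. This is a hypothetical simplification (the maximum list size, the pruning, and all names are assumptions for illustration): candidates are appended category by category, duplicates are pruned, and the list is padded with zero MVs.

```python
def build_merge_list(spatial, temporal, history, pairwise, max_size=6):
    """Concatenate MVP candidates in order and pad with zero MVs.

    Each argument is a list of (mvx, mvy) tuples already derived for its
    category; max_size is a hypothetical maximum merge list length.
    """
    cands = []
    for group in (spatial, temporal, history, pairwise):
        for mv in group:
            if mv not in cands:          # simple duplicate pruning
                cands.append(mv)
            if len(cands) == max_size:
                return cands
    while len(cands) < max_size:
        cands.append((0, 0))             # zero-MV padding
    return cands
```

A real codec prunes and limits each category with additional rules; the sketch only conveys the ordered construction.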
The inter prediction unit 244 may include a Motion Estimation (ME) unit and a Motion Compensation (MC) unit (both not shown in fig. 2). The motion estimation unit may be configured to receive or retrieve an image block 203 (a current image block 203 of a current image 17) and a decoded image 231, or at least one or more previously reconstructed blocks, e.g., of one or more other/different previously decoded images 231, for motion estimation. For example, the video sequence may comprise a current picture and a previous decoded picture 231, or in other words, the current picture and the previous decoded picture 231 may be part of or form the sequence of pictures forming the video sequence.
For example, the encoder 20 may be configured to select a reference block from a plurality of reference blocks of the same or different one of a plurality of other images, and to provide the reference image (or reference image index) and/or an offset (spatial offset) between the position (x, y coordinates) of the reference block and the position of the current block as an inter prediction parameter to the motion estimation unit. This offset is also called a Motion Vector (MV).
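The notion of a motion vector as a spatial offset can be illustrated with a minimal full-search motion estimation sketch. This is an illustrative example only (the SAD cost, the exhaustive search, and all names are assumptions): the MV returned is the offset between the best-matching reference block position and the current block position.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def block_at(frame, x, y, w, h):
    """Extract a w x h block whose top-left corner is at (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]

def motion_search(cur_block, ref_frame, x0, y0, search_range):
    """Full search within +/- search_range around (x0, y0); returns (dx, dy)."""
    w, h = len(cur_block[0]), len(cur_block)
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or x + w > len(ref_frame[0]) or y + h > len(ref_frame):
                continue                  # candidate block outside the frame
            cost = sad(cur_block, block_at(ref_frame, x, y, w, h))
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best                           # the motion vector
```

Practical encoders use fast search patterns and sub-pel refinement rather than an exhaustive search, but the returned offset plays exactly the role of the MV described above.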
The motion compensation unit is configured to obtain, e.g., receive, an inter prediction parameter and perform inter prediction based on or using the inter prediction parameter to obtain an inter prediction block 246. The motion compensation performed by the motion compensation unit may involve extracting or generating a prediction block based on a motion/block vector determined by motion estimation, and may further include performing interpolation at sub-pixel precision. Interpolation filtering may generate additional pixel values from known pixel values, thereby potentially increasing the number of candidate prediction blocks available to encode an image block. Upon receiving a motion vector corresponding to a PU of the current image block, the motion compensation unit may locate the prediction block to which the motion vector points in one of the reference picture lists.
Motion compensation unit may also generate syntax elements related to the block and the video slice for use by video decoder 30 in decoding an image block of the video slice. In addition, or as an alternative to slices and corresponding syntax elements, coding block groups and/or coding blocks and corresponding syntax elements may be generated or used.
(14) Entropy coding
Entropy encoding unit 270 is configured to apply an entropy encoding algorithm or scheme (e.g., a variable length coding (VLC) scheme, a context-adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, a binarization algorithm, context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique) to the quantized coefficients 209, inter prediction parameters, intra prediction parameters, loop filter parameters, and/or other syntax elements, to obtain encoded image data 21 that can be output via output 272, e.g., in the form of an encoded bitstream 21, so that the video decoder 30 or the like can receive and use these parameters for decoding. The encoded bitstream 21 may be transmitted to video decoder 30 or stored in memory for later transmission or retrieval by video decoder 30.
Other structural variations of video encoder 20 may be used to encode the video stream. For example, the non-transform based encoder 20 may directly quantize the residual signal without the transform processing unit 206 for some blocks or frames. In another implementation, the encoder 20 may have the quantization unit 208 and the inverse quantization unit 210 combined into a single unit.
Second, decoder and decoding method
Fig. 3 illustrates an exemplary video decoder 30 for implementing the techniques of the present application. The video decoder 30 is configured to receive encoded image data 21 (e.g., encoded bitstream 21), for example, encoded by the encoder 20, resulting in a decoded image 331. The encoded image data 21 or bitstream comprises information for decoding said encoded image data 21, such as data representing image blocks of an encoded video slice (and/or a group of encoded blocks or a coded block) and associated syntax elements.
In the example of fig. 3, the decoder 30 includes an entropy decoding unit 304, an inverse quantization unit 310, an inverse transform processing unit 312, a reconstruction unit 314 (e.g., a summer 314), a loop filter 320, a decoded picture buffer (DPB) 330, a mode application unit 360, an inter prediction unit 344, and an intra prediction unit 354. The inter prediction unit 344 may be or include a motion compensation unit. In some examples, video decoder 30 may perform a decoding process that is substantially the inverse of the encoding process described with reference to video encoder 20 of fig. 2.
Referring to fig. 3, the inter prediction module/intra prediction module/loop filter module includes (or is) a trained target model (also referred to as a neural network) for processing an input image, image area, or image block to generate a prediction value for the input image block. For example, a neural network for inter prediction/intra prediction/loop filtering is configured to receive an input image, image area, or image block.
As described for encoder 20, the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the loop filter 220, the decoded picture buffer 230, the inter prediction unit 244, and the intra prediction unit 254 also constitute the "built-in decoder" of video encoder 20. Accordingly, the inverse quantization unit 310 may be functionally identical to the inverse quantization unit 210, the inverse transform processing unit 312 may be functionally identical to the inverse transform processing unit 212, the reconstruction unit 314 may be functionally identical to the reconstruction unit 214, the loop filter 320 may be functionally identical to the loop filter 220, and the decoded picture buffer 330 may be functionally identical to the decoded picture buffer 230. Accordingly, the explanations of the corresponding units and functions of video encoder 20 apply correspondingly to the units and functions of video decoder 30.
(1) Entropy decoding
The entropy decoding unit 304 is configured to parse the bitstream 21 (or generally the encoded image data 21) and perform entropy decoding on the encoded image data 21 to obtain quantized coefficients 309 and/or decoded encoding parameters (not shown in fig. 3), such as any or all of inter-prediction parameters (e.g., reference image indexes and motion vectors), intra-prediction parameters (e.g., intra-prediction modes or indexes), transformation parameters, quantization parameters, loop filter parameters, and/or other syntax elements. Entropy decoding unit 304 may be used to apply a decoding algorithm or scheme corresponding to the encoding scheme of entropy encoding unit 270 of encoder 20. Entropy decoding unit 304 may also be used to provide inter-prediction parameters, intra-prediction parameters, and/or other syntax elements to mode application unit 360, as well as to provide other parameters to other units of decoder 30. Video decoder 30 may receive syntax elements at the video slice and/or video block level. In addition, or as an alternative to slices and corresponding syntax elements, coding block groups and/or coding blocks and corresponding syntax elements may be received or used.
(2) Inverse quantization
Inverse quantization unit 310 may be used to receive Quantization Parameters (QPs) (or, in general, information related to inverse quantization) and quantization coefficients from encoded image data 21 (e.g., parsed and/or decoded by entropy decoding unit 304), and inverse quantize the decoded quantization coefficients 309 based on the quantization parameters to obtain inverse quantization coefficients 311, which inverse quantization coefficients 311 may also be referred to as transform coefficients 311 or dequantized coefficients 311. The inverse quantization process may include using a quantization parameter calculated by video encoder 20 for any video block in the video slice to determine a degree of quantization, as well as a degree of inverse quantization that needs to be performed.
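The relationship between the quantization parameter and the degree of (inverse) quantization mentioned above can be sketched as follows. This is a simplified, hypothetical illustration: in H.26x-style codecs the quantization step size roughly doubles for every increase of 6 in QP; real codecs use integer scaling tables and shifts rather than the floating-point form shown here.

```python
def qp_to_step(qp):
    """Approximate quantization step size: doubles every 6 QP (QP 4 -> step 1)."""
    return 2 ** ((qp - 4) / 6.0)

def dequantize(levels, qp):
    """Scale decoded quantization levels back into (approximate) transform coefficients."""
    step = qp_to_step(qp)
    return [level * step for level in levels]
```

A higher QP therefore means coarser quantization at the encoder and a correspondingly larger scaling at the decoder.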
(3) Inverse transformation
The inverse transform processing unit 312 is operable to receive the dequantized coefficients 311, also referred to as transform coefficients 311, and apply a transform to the dequantized coefficients 311 to obtain a reconstructed residual block 313 in the pixel domain. The reconstructed residual block 313 may also be referred to as a transform block 313. The transform may be an inverse transform, such as an inverse DCT, an inverse DST, an inverse integer transform, or a conceptually similar inverse transform process. Inverse transform processing unit 312 may also be used to receive transform parameters or corresponding information from encoded image data 21 (e.g., parsed and/or decoded by entropy decoding unit 304) to determine the transform to apply to dequantized coefficients 311.
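As an illustration of the inverse DCT mentioned above, the following is a 1-D inverse DCT (type-III, the inverse of DCT-II) in floating point. Real codecs apply 2-D separable integer approximations; this sketch is for intuition only, and the function name is hypothetical.

```python
import math

def idct_1d(coeffs):
    """Orthonormal 1-D inverse DCT: maps transform coefficients back to samples."""
    n = len(coeffs)
    out = []
    for x in range(n):
        s = 0.0
        for u, c in enumerate(coeffs):
            # normalization factor: sqrt(1/n) for the DC term, sqrt(2/n) otherwise
            cu = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
            s += cu * c * math.cos((2 * x + 1) * u * math.pi / (2 * n))
        out.append(s)
    return out
```

A single DC coefficient, for example, reconstructs to a constant block, which is why quantizing away high-frequency coefficients leaves a smoothed residual.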
(4) Reconstruction
The reconstruction unit 314 (e.g., summer 314) is configured to add the reconstructed residual block 313 to the prediction block 365 to obtain a reconstruction block 315 in the pixel domain, e.g., to add pixel point values of the reconstructed residual block 313 and pixel point values of the prediction block 365.
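The pixel-wise addition performed by the reconstruction unit can be sketched as follows. This illustrative example (names assumed) also clips the sum to the valid sample range, as is customary so that the reconstructed pixel values stay within the bit depth.

```python
def reconstruct(pred_block, residual_block, bit_depth=8):
    """Add the residual to the prediction pixel-wise and clip to [0, 2^bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
        for prow, rrow in zip(pred_block, residual_block)
    ]
```
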
(5) Filtering
The loop filter unit 320 (in or after the coding loop) is used to filter the reconstruction block 315 to obtain a filter block 321, e.g., to smooth pixel transitions or otherwise improve video quality. Loop filter unit 320 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or one or more other filters, such as an adaptive loop filter (ALF), a noise suppression filter (NSF), or any combination thereof. For example, the loop filter unit 320 may include a deblocking filter, an SAO filter, and an ALF filter, applied in that order. As another example, a process called luma mapping with chroma scaling (LMCS) (i.e., an adaptive in-loop reshaper) may be added; this process is performed before deblocking. As another example, the deblocking filtering process may also be applied to internal sub-block edges, such as affine sub-block edges, ATMVP sub-block edges, sub-block transform (SBT) edges, and intra sub-partition (ISP) edges. Although loop filter unit 320 is shown in fig. 3 as an in-loop filter, in other configurations loop filter unit 320 may be implemented as a post filter.
(6) Decoded picture buffer
The decoded video blocks 321 of a picture are then stored in decoded picture buffer 330, which stores the decoded pictures 331 as reference pictures for subsequent motion compensation of other pictures and/or for output or display, respectively.
The decoder 30 is configured to output the decoded image 331, e.g., via the output 312, for presentation to or viewing by a user.
(7) Prediction
Inter prediction unit 344 may be functionally identical to inter prediction unit 244 (in particular to the motion compensation unit), and intra prediction unit 354 may be functionally identical to intra prediction unit 254, performing splitting or partitioning decisions and prediction based on the partitioning and/or prediction parameters or corresponding information received from the encoded image data 21 (e.g., parsed and/or decoded by entropy decoding unit 304). The mode application unit 360 may be configured to perform prediction (intra or inter prediction) for each block based on the reconstructed images, blocks, or corresponding pixels (filtered or unfiltered) to obtain a prediction block 365.
When a video slice is encoded as an intra-coded (I) slice, the intra-prediction unit 354 in the mode application unit 360 is used to generate a prediction block 365 for an image block of the current video slice according to the indicated intra-prediction mode and data from a previously decoded block of the current image. When the video picture is encoded as an inter-coded (i.e., B or P) slice, an inter prediction unit 344 (e.g., a motion compensation unit) in the mode application unit 360 is used to generate a prediction block 365 for the video block of the current video slice based on the motion vectors and other syntax elements received from the entropy decoding unit 304. For inter prediction, the prediction blocks may be generated from one of the reference pictures in one of the reference picture lists. Video decoder 30 may use a default construction technique to construct reference frame list 0 and list 1 from the reference pictures stored in decoded picture buffer 330. The same or similar process may be applied to embodiments of coding block groups (e.g., video coding block groups) and/or coding blocks (e.g., video coding blocks), in addition to or instead of slices (e.g., video slices), e.g., video may be encoded using I, P or B coding block groups and/or coding blocks.
Mode application unit 360 is to determine prediction information for the video blocks of the current video slice by parsing the motion vectors and other syntax elements, and to generate prediction blocks for the current video block being decoded using the prediction information. For example, mode application unit 360 uses some of the syntax elements received to determine a prediction mode (e.g., intra prediction or inter prediction) for encoding a video block of a video slice, an inter prediction slice type (e.g., B-slice, P-slice, or GPB-slice), construction information for one or more reference picture lists of the slice, a motion vector for any inter-coded video block of the slice, an inter prediction state for any inter-coded video block of the slice, other information, to decode the video block within the current video slice. The same or similar process may be applied to embodiments of coding block groups (e.g., video coding block groups) and/or coding blocks (e.g., video coding blocks), in addition to or instead of slices (e.g., video slices), e.g., video may be encoded using I, P or B coding block groups and/or coding blocks.
In one embodiment, the video decoder 30 shown in fig. 3 may also be used to partition and/or decode an image using slices (also referred to as video slices), where an image may be partitioned or decoded using one or more slices (which typically do not overlap). Each slice may include one or more blocks (e.g., CTUs) or one or more block groups (e.g., tiles in the H.265/HEVC and VVC standards, or bricks in the VVC standard).
In one embodiment, the video decoder 30 shown in fig. 3 may also be configured to partition and/or decode a picture using one or more slice/coding block groups (generally non-overlapping) and/or coding blocks (also referred to as video coding blocks), any slice/coding block group may include one or more blocks (e.g., CTUs) or one or more coding blocks, and the like, any coding block may be rectangular and the like, and may include one or more complete or partial blocks (e.g., CTUs).
Other variations of video decoder 30 may be used to decode encoded image data 21. For example, decoder 30 may generate an output video stream without loop filter unit 320. For example, the non-transform based decoder 30 may directly inverse quantize the residual signal without the inverse transform processing unit 312 for some blocks or frames. In another implementation, video decoder 30 may have inverse quantization unit 310 and inverse transform processing unit 312 combined into a single unit.
It should be understood that in the encoder 20 and the decoder 30, the processing result of the current step may be further processed and then output to the next step. For example, after interpolation filtering, motion vector derivation, or loop filtering, further operations, such as clip (clip) or shift (shift) operations, may be performed on the processing results of interpolation filtering, motion vector derivation, or loop filtering.
It should be noted that the motion vector derived for the current block (including but not limited to the control point motion vectors of affine mode, the sub-block motion vectors of affine, planar, or ATMVP mode, temporal motion vectors, and so on) may be further operated on. For example, the value of the motion vector is restricted to a predefined range according to the number of bits used to represent it. If the motion vector is represented with bitDepth bits, the range is −2^(bitDepth−1) to 2^(bitDepth−1) − 1, where "^" denotes exponentiation. For example, if bitDepth is set to 16, the range is −32768 to 32767; if bitDepth is set to 18, the range is −131072 to 131071. For example, the values of derived motion vectors (e.g., the MVs of the 4 × 4 sub-blocks within an 8 × 8 block) are restricted such that the maximum difference between the integer parts of the 4 × 4 sub-block MVs does not exceed N pixels, e.g., 1 pixel. Two methods of restricting the motion vector according to bitDepth are provided here.
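The two bitDepth-based restriction methods may be sketched as follows. This is an assumed illustration (function names hypothetical): the first method simply clips the MV component to [−2^(bitDepth−1), 2^(bitDepth−1) − 1]; the second wraps the value modulo 2^bitDepth in two's-complement fashion, as is sometimes done in codec reference software.

```python
def clip_mv(mv, bit_depth=16):
    """Method 1: saturate the MV component to the representable range."""
    lo = -(1 << (bit_depth - 1))
    hi = (1 << (bit_depth - 1)) - 1
    return min(max(mv, lo), hi)

def wrap_mv(mv, bit_depth=16):
    """Method 2: wrap the MV component modulo 2^bitDepth (two's-complement style)."""
    half = 1 << (bit_depth - 1)
    return (mv + half) % (1 << bit_depth) - half
```

With bitDepth = 16, for example, clipping maps 40000 to 32767, whereas wrapping maps 32768 to −32768; the two methods differ only for out-of-range inputs.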
Although the above embodiments primarily describe video codecs, it should be noted that embodiments of coding system 10, encoder 20, and decoder 30, as well as other embodiments described herein, may also be used for still image processing or codec, i.e., processing or codec of a single image independent of any previous or consecutive image in a video codec. In general, if image processing is limited to only a single image 17, the inter prediction unit 244 (encoder) and the inter prediction unit 344 (decoder) may not be available. All other functions (also referred to as tools or techniques) of video encoder 20 and video decoder 30 are equally applicable to still image processing, such as residual calculation 204/304, transform processing unit 206, quantization unit 208, inverse quantization unit 210/310, inverse transform processing unit 212/312, partition unit 262, intra prediction unit 254/354, loop filter 220/320, entropy encoding unit 270, and entropy decoding unit 304, among others.
It should be noted that the encoder 20 and the decoder 30 have corresponding operations. For example, across the encoder 20 and the decoder 30, the operations of the inter prediction unit 244 and the inter prediction unit 344 are identical, as are those of the intra prediction unit 254 and the intra prediction unit 354; the entropy encoding unit 270 and the entropy decoding unit 304, the transform processing unit 206 and the inverse transform processing units 212/312, the quantization unit 208 and the inverse quantization units 210/310, and so on, are all pairs of mutually inverse operations. Therefore, once the prediction, transform, quantization, entropy encoding, and other operations of the encoder 20 are defined, the prediction, inverse transform, inverse quantization, entropy decoding, and other operations of the decoder 30 are also determined.
Fig. 4 is a schematic diagram of a video coding apparatus 400 provided by an embodiment of the present application. Video coding apparatus 400 is suitable for implementing the disclosed embodiments described herein. In one embodiment, video coding device 400 may be a decoder, such as video decoder 30 in fig. 1A, or an encoder, such as video encoder 20 in fig. 1A.
Video coding apparatus 400 includes: an ingress port 410 (or input port 410) and a receiving unit (Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data, where, for example, the processor 430 may be a neural network processor 430; a transmitting unit (Tx) 440 and an egress port 450 (or output port 450) for transmitting data; and a memory 460 for storing data. The video coding apparatus 400 may also include optical-to-electrical (OE) and electrical-to-optical (EO) components coupled to the ingress port 410, the receiving unit 420, the transmitting unit 440, and the egress port 450, serving as the egress or ingress of optical or electrical signals.
The processor 430 is implemented by hardware and software. Processor 430 may be implemented as one or more processor chips, cores (e.g., multi-core processors), FPGAs, ASICs, and DSPs. Processor 430 communicates with ingress port 410, receiving unit 420, transmitting unit 440, egress port 450, and memory 460. Processor 430 includes a coding module 470 (e.g., a neural network-based coding module 470). The coding module 470 implements the embodiments disclosed above; for example, the coding module 470 performs, processes, prepares, or provides various coding operations. Thus, the coding module 470 provides a substantial improvement to the functionality of the video coding apparatus 400 and affects the transition of the video coding apparatus 400 to a different state. Alternatively, the coding module 470 may be implemented as instructions stored in memory 460 and executed by processor 430.
Memory 460, which may include one or more disks, tape drives, and solid state drives, may be used as an over-flow data storage device for storing programs when such programs are selected for execution, and for storing instructions and data that are read during program execution. The memory 460 may be volatile and/or nonvolatile, and may be read-only memory (ROM), Random Access Memory (RAM), ternary content-addressable memory (TCAM), and/or Static Random Access Memory (SRAM).
Fig. 5 is a simplified block diagram of an apparatus 500 provided by an exemplary embodiment, where the apparatus 500 may be used as either or both of the source device 12 and the destination device 14 in fig. 1A.
The processor 502 in the apparatus 500 may be a central processing unit. Alternatively, processor 502 may be any other type of device, or multiple devices, now existing or later developed, capable of manipulating or processing information. Although the disclosed implementations may be practiced with a single processor such as the processor 502 shown, advantages in speed and efficiency can be achieved by using more than one processor.
In one implementation, the memory 504 in the apparatus 500 may be a Read Only Memory (ROM) device or a Random Access Memory (RAM) device. Any other suitable type of storage device may be used for memory 504. The memory 504 may include code and data 506 that the processor 502 accesses over the bus 512. The memory 504 may also include an operating system 508 and application programs 510, the application programs 510 including at least one program that allows the processor 502 to perform the methods described herein. For example, applications 510 may include applications 1 through N, including video coding applications that perform the methods described herein.
The apparatus 500 may also include one or more output devices, such as a display 518. In one example, display 518 may be a touch-sensitive display that combines the display with touch-sensitive elements that may be used to sense touch inputs. A display 518 may be coupled to the processor 502 by the bus 512.
Although the bus 512 in the apparatus 500 is described herein as a single bus, the bus 512 may include multiple buses. Further, the secondary storage may be coupled directly to other components of the apparatus 500 or accessed over a network, and may comprise a single integrated unit such as a memory card or multiple units such as multiple memory cards. Accordingly, the apparatus 500 may have a variety of configurations.
In order to facilitate understanding of the present application by those skilled in the art, some terms used in the embodiments of the present application will be explained below, and related technical knowledge related to the embodiments of the present application will be introduced.
First, term interpretation
Auto-Encoder (AE): a particular neural network architecture.
End-to-end (E2E): generally refers to realizing a complete learning task with a single neural network, so that all parameters of the network can be optimized simultaneously through gradient back-propagation.
Coded picture (Coding picture): generally comprises three matrices of an image, which store the reconstructions of the YUV or RGB color-component intensity values of the image, together with coding information such as the block partitioning, coding modes, and quantization parameters of all coding units of the image.
Decoded picture (Decoded picture): typically comprises three matrices of an image, storing the reconstructions of the YUV or RGB color-component intensity values of the image.
Coding Unit (CU): generally comprises three matrices of an image block, which store the reconstructions of the three YUV or RGB color-component intensity values of the image block, together with coding information such as the block partitioning, coding mode, and quantization parameter of the image block.
Image reconstruction: refers to an image, containing coding distortion, obtained after an encoding operation.
Second, related technical knowledge
Video compression coding technology is widely applied in fields such as multimedia services, broadcasting, video communication, and storage. The two major standards organizations ITU-T and ISO/IEC jointly formulated and released the three video codec standards H.264/AVC, H.265/HEVC, and H.266/VVC in 2003, 2013, and 2020, respectively. Meanwhile, a series of video image codec standards such as AVS1, AVS2, and AVS3 were formulated and released by the AVS standards group. In addition, the AOM alliance promulgated the AV1 video codec scheme in 2018. These video codec technologies adopt a hybrid codec scheme based on block partitioning and transform quantization, and continuous technical iteration on modules such as block partitioning, prediction, transform, entropy coding, and in-loop filtering has steadily improved the compression efficiency of video images.
(1) Related art one
In recent years, the academic community has begun researching image codec schemes based on Deep Neural Networks (DNNs). Fig. 6 shows a typical DNN-based image codec scheme that adopts an auto-encoder network structure. The input x is the original image to be encoded, which can be expressed as a w_x × h_x × c_x array, where w_x, h_x, and c_x respectively represent the width, height, and number of color components of the input image. The analyzer and synthesizer are typically deep Convolutional Neural Networks (CNNs). The analyzer performs dimensionality reduction on the input image x to obtain its latent representation y, which can generally be expressed as a w_y × h_y × c_y array, where w_y, h_y, and c_y are the width, height, and number of channels of the latent representation. Each element of the latent representation y output by the analyzer may be a floating-point or integer number; quantization or ordinary rounding can be applied to obtain a more compact integer expression y'. Entropy coding is then performed on y' to obtain a compressed code stream. Entropy decoding is the inverse operation of entropy coding; its purpose is to parse y' from the compressed code stream. Entropy coding and entropy decoding use the same probability model, whose state can be updated synchronously in the encoder and the decoder to ensure that encoding and decoding match. Optionally, the encoder may transmit the probability model parameters to the decoder to ensure that entropy encoding and entropy decoding use the same probability model. The synthesizer obtains the reconstructed coded image x' from y'. The input image may also be divided into blocks, with each image block sent to the encoder shown in fig. 3 to be encoded into a compressed code stream, which is then decoded to obtain the reconstruction of the image block.
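As a rough sketch of the quantization/rounding step described above (all sizes, and the random stand-in for the analyzer output, are illustrative assumptions, not values from this application), the latent representation can be rounded to a compact integer expression as follows:

```python
import random

# Hypothetical sizes: the analyzer maps a w_x*h_x*c_x input image to a much
# smaller w_y*h_y*c_y latent representation (values here are illustrative).
w_x, h_x, c_x = 64, 64, 3
w_y, h_y, c_y = 4, 4, 192

rng = random.Random(0)

# Stand-in for the analyzer CNN output: a flat list of floating-point
# latent values y (a real analyzer would compute these from the image x).
y = [rng.gauss(0.0, 1.0) for _ in range(w_y * h_y * c_y)]

# Quantization / ordinary rounding yields the compact integer expression y',
# which is what entropy coding then operates on.
y_prime = [round(v) for v in y]

assert all(isinstance(v, int) for v in y_prime)
assert all(abs(v - q) <= 0.5 for v, q in zip(y, y_prime))  # rounding error <= 1/2
assert w_y * h_y * c_y < w_x * h_x * c_x  # latent is a reduced-dimension representation
```

Since the rounding error per element is bounded by 1/2, coarser quantization (scaling y before rounding) trades larger distortion for a smaller entropy-coded code stream.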
The quantization unit 208 in fig. 2 is configured to quantize the transform coefficients 207 output by the transform processing unit 206, by applying scalar quantization or vector quantization, to obtain quantization levels 209 of the transform coefficients 207. The quantization level 209 is the quantized output of a transform coefficient, and is also referred to as a quantized transform coefficient 209 or a quantized coefficient 209.
For example, for scalar quantization, different quantization step sizes (QS) may be applied to achieve finer or coarser quantization. Smaller quantization steps correspond to finer quantization, and larger quantization steps correspond to coarser quantization. The quantization step size used may be indicated by a Quantization Parameter (QP); for example, the quantization parameter may be an index into a predefined set of suitable quantization step sizes. For example, a smaller quantization parameter may correspond to fine quantization (a smaller quantization step size) and a larger quantization parameter to coarse quantization (a larger quantization step size), or vice versa. The inverse quantization unit 210 in fig. 2 performs the same inverse quantization operation as the inverse quantization unit 310 in fig. 3; their inputs are quantized coefficients and their outputs are dequantized coefficients.
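An illustrative sketch of the QP-to-step relationship follows. The exponential mapping below follows the well-known HEVC/H.264 convention in which Qstep roughly doubles for every increase of 6 in QP; it is used here only as an example and is not mandated by this application:

```python
def qstep_from_qp(qp: int) -> float:
    # HEVC/H.264-style mapping: Qstep ~= 2 ** ((QP - 4) / 6), so the
    # quantization step doubles for every increase of 6 in QP.
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeff: float, qp: int) -> int:
    # Uniform scalar quantization: round to the nearest multiple of Qstep.
    return round(coeff / qstep_from_qp(qp))   # quantization level

def dequantize(level: int, qp: int) -> float:
    # Inverse quantization performed identically at encoder and decoder.
    return level * qstep_from_qp(qp)          # dequantized coefficient

# A smaller QP gives finer quantization (smaller step), a larger QP coarser.
assert qstep_from_qp(22) < qstep_from_qp(40)
assert abs(qstep_from_qp(10) / qstep_from_qp(4) - 2.0) < 1e-9  # +6 QP doubles the step
assert dequantize(quantize(10.0, 4), 4) == 10.0  # Qstep = 1 at QP = 4
```

This also illustrates why the quantization step (or equivalent QP) used at the encoder and decoder must match: `dequantize` only inverts `quantize` when both use the same QP.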
Embodiments of encoder 20 (or quantization unit 208) may be configured to output the quantization scheme and quantization step size, e.g., directly or after entropy encoding by entropy encoding unit 270 or any other entropy encoding unit, so that decoder 30 can receive them and perform the corresponding inverse quantization operation.
The quantization step size, or equivalent quantization parameter, used in the quantization unit 208 and the inverse quantization unit 210/310 must be the same. The quantization parameter is specified by the encoder 20 and is directly output or is entropy-encoded by the entropy encoding unit 270 or any other entropy encoding unit and then output. The decoder 30 may receive the quantization parameter information output by the encoder 20, and entropy-decode the quantization parameter information by the entropy decoding unit 304 or any other entropy decoding unit to obtain the quantization parameter specified by the encoder 20. The HEVC standard scheme is used as an example to illustrate how the encoder passes the quantization parameter to the decoder, as shown in table 1.
TABLE 1
[Table 1 is rendered as an image in the original publication: the PPS syntax carrying init_qp_minus26 and cu_qp_delta_enabled_flag.]
As can be seen from Table 1, a QP initial value, i.e., init_qp_minus26 + 26, is passed in the Picture Parameter Set (PPS). In addition, whether different QPs can be specified for different CUs is governed by the control flag cu_qp_delta_enabled_flag. If the control flag is false, all CUs in the entire picture use the same QP, so a different QP cannot be specified for each CU in the picture. If the control flag is true, a different QP may be specified for each CU in the picture, and the QP information is written into the video stream when each CU is encoded.
Note that CUs in the HEVC standard may have different sizes, from 64 × 64 down to 8 × 8. In the extreme case, a coded picture is divided entirely into 8 × 8 CUs and one QP needs to be transmitted for each of them, which causes significant QP coding overhead and a significant increase in the coded video rate. To avoid this extreme case, the HEVC standard specifies the Quantization Group (QG) size via the syntax element diff_cu_qp_delta_depth in the PPS. For a 64 × 64 coding tree unit, the mapping between the two is shown in Table 2.
TABLE 2
diff_cu_qp_delta_depth value    0        1        2        3
Size of QG                      64×64    32×32    16×16    8×8
The QG is the basic transmission unit for QP. In other words, each QG transmits at most one QP. If the CU size is smaller than the QG size, i.e. one QG contains multiple CUs, then the QP is transmitted only in the first CU containing a non-zero quantization level, and that QP is used for the dequantization of all CUs within the QG. If the CU size is greater than or equal to the QG size, i.e. a CU contains one or more QGs, whether to transmit QP information for the CU is determined by whether the CU contains a non-zero quantization level.
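The Table 2 mapping amounts to halving the QG edge length once per depth increment; a minimal sketch (function name is hypothetical):

```python
def qg_size(ctu_size: int, diff_cu_qp_delta_depth: int) -> int:
    # Each increment of diff_cu_qp_delta_depth halves the QG edge length,
    # starting from the coding tree unit size.
    return ctu_size >> diff_cu_qp_delta_depth

# Reproduces Table 2 for a 64x64 coding tree unit.
assert [qg_size(64, d) for d in range(4)] == [64, 32, 16, 8]
```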
The QP initial value transmitted in the PPS applies to all coded pictures within the scope of that PPS. When each coded picture, slice, or sub-picture is specifically processed, the QP initial value may be further adjusted to obtain a QP reference value for it. For example, the HEVC standard scheme transmits the syntax element slice_qp_delta in the Slice Header (SH), meaning that a difference value is superimposed on the QP initial value transmitted in the PPS, thereby obtaining the QP reference value for the slice, as shown in Table 3.
TABLE 3
slice_segment_header( ) {               Descriptor
    slice_qp_delta                      se(v)
When HEVC processes each CU, it determines whether each Transform Unit (TU) is the first TU in the CU's QG that contains a non-zero quantization level. If so, the QP difference information of the CU is transmitted. Specifically, the QP difference information of the CU comprises a QP difference absolute value cu_qp_delta_abs and a QP difference sign cu_qp_delta_sign_flag, as shown in Table 4; the QP difference value of the CU is cu_qp_delta_abs × (1 - 2 × cu_qp_delta_sign_flag). Since one CU transmits at most one piece of QP difference information, in the case where a CU includes multiple TUs, the QP difference information is transmitted only when the first TU containing a non-zero quantization level is processed.
TABLE 4
[Table 4 is rendered as an image in the original publication: the TU syntax carrying cu_qp_delta_abs and cu_qp_delta_sign_flag.]
Even under the QG constraint, with at most one QP transmitted per QG, the overhead of encoding QP values can still significantly reduce video compression efficiency. Therefore, QP values are usually predictively encoded. Taking the HEVC standard as an example again, the QP predictor of a CU is derived from the QP values of the left neighboring QG, the above neighboring QG, and the previously coded QG; that is, the QP values of already-processed QGs in the neighborhood of the current QG are used to generate a prediction of the current QG's QP value. After the encoder determines the QP value of a CU according to content complexity and its coding control strategy, it only needs to transmit the difference between the CU's QP value and its QP predictor. After the decoder decodes the QP difference of a CU, it derives the same QP predictor through the same prediction operation, and the CU's QP value is obtained by adding the QP difference to the predictor.
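A simplified sketch of this predictive QP coding follows. The neighbour-averaging rule is modelled loosely on HEVC's (left + above + 1) >> 1 predictor with fallback to the previously coded QG; the function names are hypothetical, and the delta formula is the one given above for cu_qp_delta_abs and cu_qp_delta_sign_flag:

```python
def predict_qp(qp_left, qp_above, qp_prev):
    # Average the QPs of the left and above neighbouring QGs; an
    # unavailable neighbour (None) falls back to the previously coded
    # QG's QP. Identical logic runs at encoder and decoder.
    a = qp_left if qp_left is not None else qp_prev
    b = qp_above if qp_above is not None else qp_prev
    return (a + b + 1) >> 1

def reconstruct_qp(qp_pred, cu_qp_delta_abs, cu_qp_delta_sign_flag):
    # Decoder side: QP difference = cu_qp_delta_abs * (1 - 2 * sign_flag),
    # superimposed on the predictor to recover the CU's QP.
    delta = cu_qp_delta_abs * (1 - 2 * cu_qp_delta_sign_flag)
    return qp_pred + delta

pred = predict_qp(qp_left=30, qp_above=32, qp_prev=28)
assert pred == 31                          # (30 + 32 + 1) >> 1
assert reconstruct_qp(pred, 2, 1) == 29    # sign_flag = 1 means a negative delta
assert predict_qp(None, None, qp_prev=28) == 28  # both neighbours unavailable
```

The encoder thus transmits only the small delta (here ±2) instead of the full QP value.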
The strength of the quantization distortion incurred by quantizing a given signal with a given quantization step size Qstep can be analyzed theoretically. For example, for a uniformly distributed source, the mean squared error of uniform scalar quantization is Qstep²/12.
However, although the QP mechanism in the existing HEVC standard scheme, or in any other hybrid codec scheme, likewise uses Qstep-based uniform scalar quantization, it cannot accurately indicate the actual distortion of each image block, for the following reasons:
First, if the magnitude of every sample in the residual block of an image block is smaller than the quantization step Qstep, the encoder quantizes the residual of the image block to all zeros. This case is common in P- and B-frame coding, and occurs when the reference picture quality is high and the Qstep set for the currently encoded image block is large. In this case, although the decoding end can obtain valid QP information for the current block, that QP does not reflect the actual distortion of the image block.
Second, if the residual of an image block is quantized to all zeros, i.e. no residual is transmitted for the image block, the decoder skips the inverse quantization operation. To avoid transmitting useless QP information, the encoder does not pass the QP information of that image block to the decoding side either. In this case, the decoding end cannot obtain valid QP information for the current block at all.
This is determined by the design objective of the QP mechanism in current-generation video codec schemes. The purpose of the current QP mechanism is to perform correct inverse quantization at the decoding end, not to provide accurate distortion strength information to the decoding end.
(2) Related art two
In one learning-based depth image codec scheme, the image to be encoded passes through an Encoder sub-network to obtain the latent representation of the input image, which is then processed by a Quantization module to obtain a number of quantized coding blocks (quantized codes). Separately, the encoding side calculates an importance map (importance map) from the input image. The encoding end uses the importance map to crop the quantized coding blocks, entropy codes the cropped quantized coding blocks together with the importance map, and transmits them to the decoding end. In this scheme, the importance map can be used to control the number of quantized coding blocks that need to be transmitted, thereby implementing rate control. The importance map therefore essentially plays the role of the Quantization Parameter (QP), performing region-level bit allocation according to the image content.
The importance map in this scheme serves to tell the decoding end which coding blocks are contained in the code stream and to guide decoding of those coding blocks; coding blocks not contained in the code stream are set to 0 so as to obtain the complete latent representation, which is then input into the Decoder sub-network for the subsequent decoding operations.
In current mainstream deep-neural-network-based image compression coding schemes, the image to be encoded is first input into an analyzer sub-network (such as the Encoder sub-network) to extract the latent representation of the input image. On the one hand, since the latent representation is a reduced-dimension representation of the input image, information loss has already been introduced; that is, distortion of the image signal has been introduced into the latent representation, and this distortion cannot be represented by the importance map of the above scheme. On the other hand, the importance map is only used to determine which coding blocks need not be encoded and transmitted, and cannot indicate how much signal distortion is caused by discarding those coding blocks. Therefore, the importance map described above cannot indicate the distortion strength information of each region in the encoded image.
In summary, the various existing deep-neural-network-based video image codec schemes use trained models to encode and decode videos or images. Such methods usually optimize subjective visual quality for human perception, essentially performing flexible bit allocation between different coded images and between different regions within one image. On the one hand, these codec schemes do not transmit to the decoding end the signal quality (or distortion strength) of each encoded image or of each region within an encoded image. On the other hand, in many cases the true distortion strength of the encoded image cannot be judged by human observation; for example, some coded images produced by a codec network trained with a Generative Adversarial Network (GAN) method contain false texture information whose authenticity the human eye cannot distinguish. Therefore, the existing deep-neural-network-based codec schemes cannot provide the decoding end with the distortion strength information of the current coded image; specifically, the decoding end can obtain neither the distortion strength information of each region in an image nor the total distortion strength information of an image. Distortion strength information of the encoded image can be used to help determine whether the content of a certain region in the image is authentic, and is therefore very important for applications such as video surveillance. Consequently, if a deep-neural-network-based video image codec is to be used in an actual product or service, this information needs to be conveyed to the decoding end. It should be noted that even with a conventional hybrid codec scheme, accurate distortion strength information of the current coded image cannot be obtained from the transmitted quantization parameters, for the reasons explained above. Therefore, for some video products or services using the conventional hybrid codec scheme, it is also necessary to somehow enable the decoding end to obtain accurate distortion strength information of the encoded image.
In view of the technical problems described above, embodiments of the present application provide an encoding method, a decoding method, and related devices. In the embodiments of the present application, regardless of which video image codec scheme is used, the encoding end can compare the original image with the reconstructed image, calculate the fidelity information of the reconstructed image (including the fidelity information of each image region in the reconstructed image), and carry the fidelity information in the compressed code stream to inform the decoding end, where the reconstructed image is the reconstruction of the original image, that is, the encoded output image. When calculating the fidelity of the reconstructed image, any full-reference quality evaluation method, such as MSE (Mean Squared Error), SAD (Sum of Absolute Differences), or SSIM (Structural Similarity), may be used to obtain a distortion strength value of the reconstructed image relative to the original image, or to obtain an indication flag of whether the reconstructed image contains synthesized false image content. The fidelity can be calculated at any granularity, for example for a whole image, or for any M × N image block within an image. The fidelity may also be calculated for any color component; for example, three fidelity values may be calculated for the three RGB or YUV color components of an image, or one fidelity value may be obtained by fusing the distortion strengths of the three color components. Even for the conventional hybrid video image codec scheme, the fidelity of the current image block can be derived using the QP information already available at the decoding end together with the fidelity of the image region referenced by the current image block in the inter-frame prediction operation.
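Two of the full-reference metrics mentioned above, MSE and SAD, are straightforward to compute; a minimal sketch on toy sample lists follows (SSIM is omitted as it is considerably more involved; the sample values are illustrative):

```python
def mse(orig, recon):
    # Mean Squared Error between two equal-length sample sequences.
    return sum((o - r) ** 2 for o, r in zip(orig, recon)) / len(orig)

def sad(orig, recon):
    # Sum of Absolute Differences between two equal-length sample sequences.
    return sum(abs(o - r) for o, r in zip(orig, recon))

orig  = [100, 102, 98, 101]   # original samples (one color component)
recon = [101, 100, 98, 103]   # reconstructed samples

assert mse(orig, recon) == (1 + 4 + 0 + 4) / 4   # 2.25
assert sad(orig, recon) == 1 + 2 + 0 + 2         # 5
```

Either value (or an SSIM score) can serve as the per-region distortion strength carried in the fidelity information.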
The technical solutions provided in the present application are described in detail below with reference to specific embodiments.
FIG. 7 is a flow diagram illustrating a process 700 of an encoding method according to an embodiment of the present application. Process 700 may be performed by an encoding device, such as video encoder 20. Process 700 is described as a series of steps or operations; it should be understood that process 700 may be performed in various orders and/or concurrently, and is not limited to the execution order shown in FIG. 7. Process 700 includes, but is not limited to, the following steps or operations:
701. Encode the original image to obtain a first code stream.
It should be understood that the original image is also the image 17, so the original image is a coded picture; likewise, the first code stream is also the encoded image data 21.
702. Encode a fidelity map to obtain a second code stream, where the fidelity map is used to represent distortion between at least a partial region of the original image and at least a partial region of a reconstructed image, and the reconstructed image is obtained by decoding the first code stream.
It should be understood that the reconstructed image is also the decoded image 231, so the reconstructed image is a decoded image. Because the first code stream is obtained by encoding the original image, the reconstructed image obtained by decoding the first code stream is the reconstructed image of the original image, and the reconstructed image and the original image have the same image size.
The fidelity map may be calculated from the original image and the reconstructed image, or from a preset region of the original image and a preset region of the reconstructed image. When the fidelity map is calculated from the original image and the reconstructed image, it represents the fidelity of the whole reconstructed image; when it is calculated from the preset regions, it represents the fidelity of the preset region of the reconstructed image. The preset region of the original image, i.e. an image of a certain region within the original image, may be an image block; likewise, the preset region of the reconstructed image, i.e. an image of a certain region within the reconstructed image, may also be an image block.
In order to enable the decoding end to obtain the distortion strength information of the encoded image, the first code stream and the second code stream need to be transmitted from the encoding end to the decoding end; they may be combined and then transmitted together, or transmitted separately and independently. In addition, if the fidelity map is calculated from a preset region of the original image and a preset region of the reconstructed image, the position of the preset region in the original image and/or in the reconstructed image also needs to be transmitted from the encoding end to the decoding end. When the preset region is rectangular, its position may be represented by its coordinates, usually the coordinates of its upper-left luminance pixel. This position information may be combined with at least one of the first code stream and the second code stream before transmission, or transmitted separately without being combined with either code stream. The encoding end and the decoding end described in the embodiments of the present application may be different electronic devices, or different hardware units of the same electronic device, which is not specifically limited in this application.
It should be understood that, since the second code stream is obtained by encoding the fidelity map, the second code stream can be decoded to obtain a reconstruction of the fidelity map. Depending on the encoding mode, the decoded reconstruction of the fidelity map may be identical to, or different from, the fidelity map. Specifically, if the encoding is lossless, the reconstruction of the fidelity map is identical to the fidelity map; if the encoding is lossy, the reconstruction of the fidelity map includes coding distortion generated by encoding the fidelity map.
In the embodiments of the present application, the original image is encoded to obtain a first code stream, and the fidelity map is encoded to obtain a second code stream, where the fidelity map is used to represent distortion (including differences) between at least a partial region of the original image and at least a partial region of the reconstructed image. The decoding end decodes the first code stream to obtain the reconstructed image of the original image, and decodes the second code stream to obtain the reconstruction of the fidelity map. If the encoding is lossless, the reconstruction of the fidelity map is identical to the fidelity map; if the encoding is lossy, the reconstruction of the fidelity map includes coding distortion generated by encoding the fidelity map. The reconstruction of the fidelity map can thus be used to represent the distortion between at least a partial region of the original image and at least a partial region of the reconstructed image; therefore, the embodiments of the present application enable the decoding end to obtain the distortion strength information of the encoded image.
In one possible design, the method further includes: dividing the original image into a plurality of first image blocks and dividing the reconstructed image into a plurality of second image blocks, wherein the dividing strategy for dividing the original image is the same as the dividing strategy for dividing the reconstructed image, and the plurality of first image blocks and the plurality of second image blocks are in one-to-one correspondence; or dividing the preset area of the original image into a plurality of first image blocks and dividing the preset area of the reconstructed image into a plurality of second image blocks, wherein the dividing strategy for dividing the preset area of the original image is the same as the dividing strategy for dividing the preset area of the reconstructed image, and the plurality of first image blocks and the plurality of second image blocks are in one-to-one correspondence; and calculating a fidelity value of any second image block according to any second image block in the plurality of second image blocks and the first image block corresponding to the any second image block, wherein the fidelity map comprises the fidelity value of the any second image block, and the fidelity value of the any second image block is used for representing distortion between the any second image block and the first image block corresponding to the any second image block.
The position of any second image block in the plurality of second image blocks in the reconstructed image is the same as the position of a first image block corresponding to any second image block in the original image; the position of the preset area of the original image in the original image is the same as the position of the preset area of the reconstructed image in the reconstructed image, and the position of any one of the second image blocks in the preset area of the reconstructed image is the same as the position of the first image block corresponding to the any one of the second image blocks in the preset area of the original image.
The dividing strategy for dividing the original image is the same as the dividing strategy for dividing the reconstructed image, and the dividing strategy for dividing the preset region of the original image is the same as the dividing strategy for dividing the preset region of the reconstructed image, which means that the size of any first image block in the plurality of divided first image blocks is the same as the size of the corresponding second image block, and the position of any first image block in the plurality of divided first image blocks in the original image is the same as the position of the corresponding second image block in the reconstructed image; when an image block is rectangular, the position of the image block may be represented by the coordinates of the image block, which are usually represented as the luminance pixel coordinates of the upper left corner of the image block. When the original image or the reconstructed image is divided, the original image or the reconstructed image may be divided according to a basic unit, that is, divided according to the size of any image block, so that the size of any first image block obtained by the division is the same, the size of any second image block obtained by the division is also the same, and the size of the first image block and the size of the second image block are the size of the basic unit. It should be understood that the size of the basic unit may be determined according to the size of the original image or the reconstructed image or the number of the first image block or the second image block that needs to be divided, the basic unit may be as small as 1 × 1 pixel, and the basic unit may be as large as the size of the original image or the reconstructed image.
The fidelity may be calculated for any color component; for example, three fidelity values may be calculated for the three RGB or YUV color components of an image, or one fidelity value may be obtained by fusing the distortion strengths of the three color components. When a fidelity value is calculated per color component, the fidelity map is a three-dimensional array; when one fused fidelity value is calculated, the fidelity map is a two-dimensional array. Thus, the fidelity map calculated for a reconstructed image or a preset region of a reconstructed image is a two-dimensional (W/M) × (H/N) array or a three-dimensional (W/M) × (H/N) × C array, where W and H represent the width and height of the original image or reconstructed image (or of the preset region thereof), M and N represent the width and height of the basic unit used for the fidelity calculation, and C represents the number of color components of the original or reconstructed image. Without loss of generality, and for simplicity of description, the following description of specific implementations of the embodiments of the present application assumes C = 1.
Specifically, a basic unit of size M × N may be set, and the original image and the reconstructed image are divided according to this basic unit. The original image may be divided into R rows and S columns, where R is (W/M) and S is (H/N), for a total of R × S first image blocks; the reconstructed image is divided into R rows and S columns, for a total of R × S second image blocks; each first and second image block obtained by the division has size M × N. For example, the original image is divided into 4 rows and 7 columns, for a total of 4 × 7 = 28 first image blocks, as shown in fig. 8; and the reconstructed image is divided into 4 rows and 7 columns, for a total of 4 × 7 = 28 second image blocks, as shown in fig. 9. The preset region of the original image and the preset region of the reconstructed image are divided in the same way: the preset region of the original image may be divided into R rows and S columns, for a total of R × S first image blocks; the preset region of the reconstructed image is divided into R rows and S columns, for a total of R × S second image blocks; and each first and second image block obtained by the division has size M × N.
After the R × S first image blocks and the R × S second image blocks are obtained by division, the fidelity of any second image block with respect to the corresponding first image block may be calculated; that is, the fidelity of the second image block in the ith row and jth column of the reconstructed image is calculated with respect to the first image block in the ith row and jth column of the original image, where 1 ≤ i ≤ R and 1 ≤ j ≤ S. In this way, the fidelity of every one of the R × S second image blocks with respect to its corresponding first image block is obtained, i.e., the fidelity values of the R × S second image blocks; a fidelity map can then be obtained from the fidelity values of the R × S second image blocks.
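The division and per-block fidelity computation just described can be sketched as follows. This is only an illustration: mean squared error (MSE) is used as an example distortion measure (the text does not mandate a specific measure), and the function name and plain list-of-rows pixel representation are assumptions.

```python
# Illustrative sketch of the block division and per-block fidelity
# computation: divide both images into M x N basic units and compute
# one fidelity value (here, MSE) per unit, yielding an R x S map.

def fidelity_map(original, reconstructed, M, N):
    """Return an R x S fidelity map (R = H // N rows, S = W // M columns)."""
    H, W = len(original), len(original[0])
    R, S = H // N, W // M
    fmap = [[0.0] * S for _ in range(R)]
    for i in range(R):            # block row index
        for j in range(S):        # block column index
            acc = 0
            for y in range(i * N, (i + 1) * N):
                for x in range(j * M, (j + 1) * M):
                    d = original[y][x] - reconstructed[y][x]
                    acc += d * d
            fmap[i][j] = acc / (M * N)   # MSE of block (i, j)
    return fmap
```

For a 4 × 4 image divided into 2 × 2 basic units, this yields a 2 × 2 fidelity map, matching the R × S layout described above.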
In one example, the decoding is performed by a decoder or decoding device. The decoder or decoding device stores the size of the first image blocks and/or second image blocks; or it stores the number of first image blocks and/or second image blocks; or the size of the first image blocks and/or second image blocks is provided as an input to the decoding; or the number of first image blocks and/or second image blocks is provided as an input to the decoding; or the decoder or decoding device stores the size and/or number of the basic units; or the size and/or number of the basic units is provided as an input to the decoding.
Specifically, the decoding end stores the values of M and N, or of R and S, or receives them as an input to the decoding; therefore, after the decoding end decodes the reconstructed image and the reconstructed image of the fidelity map, it can determine which image block of the reconstructed image the value of any element in the reconstructed image of the fidelity map characterizes. When the values of M and N, or of R and S, are provided as an input to the decoding, they may be combined with at least one of the first code stream and the second code stream before being transmitted to the decoding end, or they may be transmitted to the decoding end separately, without being combined with either code stream.
It should be noted that the division of the original image in fig. 8 and the division of the reconstructed image in fig. 9 use a uniform division manner, so all of the obtained first image blocks or second image blocks have the same size. It can be understood that the original image or the reconstructed image may also be divided non-uniformly, in which case the sizes of the first image blocks or second image blocks obtained by the division may differ; when non-uniform division is used, the decoding end needs to know the size and position of each first image block and/or second image block.
In the embodiment of the application, the original image and the reconstructed image have the same size, and the preset region has the same size and position in the original image as in the reconstructed image. The original image is divided into a plurality of first image blocks and the reconstructed image into a plurality of second image blocks according to the same division strategy; or the preset area of the original image is divided into a plurality of first image blocks and the preset area of the reconstructed image into a plurality of second image blocks according to the same division strategy. The first image blocks and the second image blocks obtained by the division are in one-to-one correspondence; all first image blocks have the same size, all second image blocks have the same size, and the first and second image blocks have the same size as each other. Therefore, the first image blocks and the second image blocks can serve as the basic units for fidelity calculation: the fidelity value of any second image block can be calculated from that second image block and its corresponding first image block, the fidelity values of the plurality of second image blocks are the fidelity values of the respective areas of the reconstructed image, and a fidelity map can be obtained from these fidelity values. When the first image blocks are obtained by dividing the original image and the second image blocks by dividing the reconstructed image, the fidelity map characterizes the fidelity of the reconstructed image; when the first image blocks are obtained by dividing the preset area of the original image and the second image blocks by dividing the preset area of the reconstructed image, the fidelity map characterizes the fidelity of the preset area of the reconstructed image. This facilitates obtaining a fidelity map that characterizes the distortion intensity information of the encoded image.
In one possible design, the fidelity map includes a plurality of first elements, and the plurality of second image blocks correspond to the plurality of first elements one to one. The value of any first element is the fidelity value of the second image block corresponding to that first element, and the position of that first element in the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image, or in a preset area of the reconstructed image. A first element may also be referred to as a pixel point of the fidelity map.
The construction process of the fidelity map includes: determining a plurality of first elements according to the plurality of second image blocks, where the second image blocks correspond to the first elements one to one and the value of any first element is the fidelity value of its corresponding second image block; and obtaining the fidelity map according to the plurality of first elements, where the position of any first element in the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image, or in a preset area of the reconstructed image.
When the fidelity map is calculated, the fidelity of each basic unit obtained by dividing the image is calculated, so the fidelity map has as many first elements as there are basic units; that is, there are exactly as many first elements in the fidelity map as there are first image blocks or second image blocks. Any first element has two attributes: its fidelity value, and the position of that fidelity value in the fidelity map. Therefore, any first element in the fidelity map characterizes the fidelity value of its corresponding second image block, and the value of any first element in the fidelity map is the fidelity value of that second image block.
Specifically, the reconstructed image, or the predetermined area of the reconstructed image, is divided into R × S second image blocks, and the fidelity map is a two-dimensional array of R rows and S columns with R × S first elements in total; the R × S second image blocks and the R × S first elements are in one-to-one correspondence, and the value of any first element is the fidelity value of its corresponding second image block. If the fidelity map of the whole reconstructed image is calculated, the position of any of the R × S first elements in the fidelity map is the same as the position of its corresponding second image block in the reconstructed image; if the fidelity map of the preset area of the reconstructed image is calculated, the position of any first element in the fidelity map is the same as the position of its corresponding second image block in the preset area of the reconstructed image. That is, the first element in the ith row and jth column of the fidelity map corresponds to the second image block in the ith row and jth column of the reconstructed image or of its preset area, and its value is the fidelity value of that second image block, where 1 ≤ i ≤ R and 1 ≤ j ≤ S. For example, fig. 10 is a fidelity map of the reconstructed image shown in fig. 9, where each grid cell in fig. 10 represents a first element and the number in any cell is the value of that first element, i.e., the fidelity value of the corresponding second image block. In fig. 9, the reconstructed image is divided into 4 rows and 7 columns, for a total of 4 × 7 = 28 second image blocks; in fig. 10, the fidelity map is a two-dimensional array of 4 rows and 7 columns, with a total of 4 × 7 = 28 first elements. The first element in the ith row and jth column of fig. 10 corresponds to the second image block in the ith row and jth column of fig. 9, and its value is the fidelity value of that second image block, where 1 ≤ i ≤ 4 and 1 ≤ j ≤ 7.
In the embodiment of the present application, the fidelity map is a two-dimensional array. The reconstructed image is divided into a plurality of second image blocks, and the fidelity map is obtained from their fidelity values: a plurality of first elements are determined from the plurality of second image blocks, the second image blocks correspond to the first elements one to one, and the value of any first element is the fidelity value of its corresponding second image block. The position of any first element in the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image; specifically, the position of any first element in the fidelity map is the same as the position of its corresponding second image block in the reconstructed image or in the preset region of the reconstructed image. Thus the element at each position of the fidelity map represents the fidelity of the corresponding region of the reconstructed image or of its preset region, which makes the fidelity map well suited to characterizing the distortion intensity information of the encoded image.
In one possible design, the second image block includes three color components, and the fidelity map is a three-dimensional array with three dimensions: color component, width, and height. The two-dimensional array under any color component A in the fidelity map comprises a plurality of first elements; the value of any first element is the fidelity value of color component A of the second image block corresponding to that first element, and the position of any first element in the two-dimensional array under color component A is determined according to the position of its corresponding second image block in the reconstructed image, or in a preset area of the reconstructed image. When the fidelity map is a three-dimensional array with color-component, width, and height dimensions, the height gives the number of rows of first elements in the two-dimensional array under any color component A, the width gives the number of columns, the number of first elements equals the product of the width and the height, and color component A is any one of the three color components.
In the embodiment of the application, the original image or the reconstructed image includes three color components. When the fidelity map is calculated, a two-dimensional fidelity map is calculated under each color component, and the two-dimensional arrays under the three color components form a three-dimensional fidelity map. A first element in the two-dimensional array under any color component A of the three-dimensional fidelity map represents the fidelity, under color component A, of the corresponding region of the reconstructed image or of its preset region; therefore the three-dimensional fidelity map can represent the distortion intensity information of the three color components of the encoded image.
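The per-component case described above can be sketched as follows: each of the C color components keeps its own two-dimensional fidelity map, giving a C × R × S three-dimensional array. Per-component MSE and the pixel representation (each pixel a list of C component values) are illustrative assumptions, not mandated by the text.

```python
# Illustrative sketch: one 2-D fidelity map per color component, stacked
# into a 3-D array of shape C x R x S (MSE used as an example measure).

def fidelity_map_3d(original, reconstructed, M, N):
    """original/reconstructed: rows of pixels, each pixel a list of C values."""
    H, W = len(original), len(original[0])
    C = len(original[0][0])               # number of color components
    R, S = H // N, W // M
    fmap = [[[0.0] * S for _ in range(R)] for _ in range(C)]
    for c in range(C):                    # one 2-D map per component
        for i in range(R):
            for j in range(S):
                acc = 0
                for y in range(i * N, (i + 1) * N):
                    for x in range(j * M, (j + 1) * M):
                        d = original[y][x][c] - reconstructed[y][x][c]
                        acc += d * d
                fmap[c][i][j] = acc / (M * N)
    return fmap
```

With C = 1 this degenerates to the two-dimensional case used throughout the rest of the description.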
In one possible design, the encoding of the fidelity map to obtain the second code stream includes: entropy encoding any first element to obtain the second code stream, the entropy encoding of that first element being independent of the entropy encoding of the other first elements; or determining a probability distribution of the value of any first element, or a predicted value of that first element, according to the value of at least one already-encoded first element, and entropy encoding that first element according to this probability distribution or predicted value to obtain the second code stream; where the second code stream comprises the code streams of the plurality of first elements.
Specifically, in the entropy encoding process of any first element in the fidelity map: if there are no encoded first elements yet, that first element is entropy encoded directly to obtain its code stream; if encoded first elements exist, the probability distribution of the value of the current first element, or its predicted value, is determined according to the value of at least one encoded first element, and the current first element is entropy encoded according to this probability distribution or predicted value to obtain its code stream. The second code stream comprises the code streams of the plurality of first elements.
In the entropy encoding process, the value of an already-encoded first element may be used to determine the probability distribution of the value of the first element currently being encoded; that is, an encoded fidelity value of a second image block is used to determine the probability distribution of the fidelity value currently being encoded, which helps improve entropy coding efficiency. The input to arithmetic coding of symbols is the symbol probability distribution, and any encoded fidelity value may be used to determine the probability distribution of the current fidelity value. For example, the probability distribution of the currently encoded fidelity value is determined from the already-encoded fidelity values at positions to its left, upper-left, and so on in the fidelity map. According to this probability distribution, different Huffman code tables may be selected, or the subinterval division of arithmetic coding may be determined.
Alternatively, in the entropy encoding process, the values of already-encoded first elements may be used to determine a predicted value of the current first element; that is, encoded fidelity values are used to predict the fidelity value currently being encoded, and the difference between the value of the first element and its predicted value is then entropy encoded to obtain the code stream of that first element.
In the embodiment of the application, the fidelity map is encoded to obtain the second code stream; that is, each first element in the fidelity map is encoded to obtain its code stream, and the second code stream includes the code stream of every first element in the fidelity map. In the entropy coding process, the values of already-encoded first elements may be used to determine the probability distribution of the value of the currently encoded first element, or its predicted value; for example, the values of first elements adjacent to the current one on the left, above, upper-left, and so on may be used. The current first element is then encoded according to this probability distribution or predicted value, which helps improve entropy coding efficiency.
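The neighbour-based prediction described above can be sketched as follows: each first element is predicted from its already-coded left and above neighbours (in raster-scan order), and only the residual would then be entropy coded. The mean-of-neighbours predictor and integer values are illustrative assumptions; the text leaves the exact predictor open.

```python
# Illustrative sketch: raster-scan the fidelity map, predict each element
# from its already-coded left/above neighbours, and return the residuals
# that an entropy coder would then encode.

def prediction_residuals(fmap):
    """fmap: R x S integer fidelity map; returns per-element residuals."""
    R, S = len(fmap), len(fmap[0])
    residuals = [[0] * S for _ in range(R)]
    for i in range(R):
        for j in range(S):
            neighbours = []
            if j > 0:
                neighbours.append(fmap[i][j - 1])   # left, already coded
            if i > 0:
                neighbours.append(fmap[i - 1][j])   # above, already coded
            pred = sum(neighbours) // len(neighbours) if neighbours else 0
            residuals[i][j] = fmap[i][j] - pred     # residual to entropy-code
    return residuals
```

When neighbouring fidelity values are similar, the residuals cluster around zero, which is what makes them cheaper to entropy code than the raw values.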
It should be understood that the above is only an exemplary description of entropy coding. In the entropy coding process, the present application can adopt various existing entropy coding techniques, such as Huffman coding, arithmetic coding, context-modeled arithmetic coding (i.e., arithmetic coding assisted by a context modeling method), binary arithmetic coding, and so on; this is not specifically limited in the present application.
In one possible design, the encoding the fidelity map to obtain the second code stream includes: quantizing any first element to obtain a quantized first element; encoding the quantized first element to obtain the second code stream; wherein the second code stream comprises a code stream of the plurality of first elements.
The quantization step size for quantizing any one of the plurality of first elements may be the same or different.
In an example, the decoding is performed by a decoder or decoding device. The decoder or decoding device stores the quantization step size used for quantizing any of the plurality of first elements; or the quantization step size used for quantizing any of the plurality of first elements is provided as an input to the decoding. Thus, the decoding end can recover fidelity values at the original magnitude.
In the embodiment of the present application, the fidelity map is encoded to obtain the second code stream; that is, each first element in the fidelity map is encoded to obtain its code stream, and the second code stream includes the code stream of every first element in the fidelity map. In the process of encoding any first element, the first element may first be quantized, and the quantized first element is then encoded to obtain its code stream. Quantizing a first element means quantizing its value, i.e., scaling a fidelity value in the fidelity map; the purpose of quantization is to narrow the dynamic range of the fidelity values in the fidelity map and thereby reduce the coding overhead of the fidelity map.
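The quantization step just described can be sketched as a simple scale-and-round at the encoder and a rescale at the decoder. The round-to-nearest convention is an assumption for illustration; the text only requires that the decoder use the same step size as the encoder.

```python
# Illustrative sketch: quantize fidelity values by a step size to narrow
# their dynamic range before encoding, and inverse-quantize at the decoder
# to recover values at the original magnitude.

def quantize(fmap, step):
    """Scale each fidelity value down by the quantization step."""
    return [[round(v / step) for v in row] for row in fmap]

def dequantize(qmap, step):
    """Rescale quantized values back to the original magnitude."""
    return [[q * step for q in row] for row in qmap]
```

Note that quantization is lossy: the dequantized values are close to, but not generally equal to, the original fidelity values, which is exactly the coding distortion on the fidelity map discussed later for the lossy case.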
FIG. 11 is a flow diagram illustrating a process 1100 of a decoding method according to one embodiment of the present application. Process 1100 may be performed by a decoding device, such as video decoder 30. Process 1100 is depicted as a series of steps or operations, and it is to be understood that process 1100 can be performed in various orders and/or concurrently and is not limited to the order of execution depicted in FIG. 11. Process 1100 includes, but is not limited to, the following steps or operations:
1101. Decode the first code stream to obtain a reconstructed image of the original image.
The first code stream is a code stream obtained by encoding an original image, that is, encoded image data 21; the reconstructed image of the original image, i.e., the decoded image 331, is hereinafter referred to as a reconstructed image.
1102. Decode a second code stream to obtain a reconstructed image of a fidelity map, where the second code stream is obtained by encoding the fidelity map, and the reconstructed image of the fidelity map is used to characterize distortion between at least part of the original image and at least part of the reconstructed image.
The reconstructed image of the fidelity map is obtained by decoding the code stream of the fidelity map, so it has the same size and properties as the fidelity map. When the fidelity map characterizes the fidelity of the whole reconstructed image, the reconstructed image of the fidelity map also characterizes the fidelity of the whole reconstructed image; when the fidelity map characterizes the fidelity of the preset region of the reconstructed image, so does the reconstructed image of the fidelity map. If the fidelity map characterizes the fidelity of the preset region of the reconstructed image, the decoding end needs to obtain from the encoding end the position of the preset region in the original image and/or in the reconstructed image, so that after obtaining the reconstructed image of the fidelity map, the decoding end knows which region of the reconstructed image it characterizes.
It should be understood that since the second code stream is obtained by encoding the fidelity map, the second code stream can be decoded to obtain a reconstructed image of the fidelity map. Depending on the encoding mode, the decoded reconstructed image of the fidelity map may be the same as or different from the fidelity map. Specifically, if the encoding is lossless, the reconstructed image of the fidelity map is identical to the fidelity map; if the encoding is lossy, the reconstructed image of the fidelity map includes the coding distortion introduced by encoding the fidelity map.
In the embodiment of the application, an original image is encoded to obtain a first code stream, and a fidelity map is encoded to obtain a second code stream; the fidelity map characterizes distortion, including differences, between at least part of the original image and at least part of the reconstructed image. The decoding end decodes the first code stream to obtain a reconstructed image of the original image, and decodes the second code stream to obtain a reconstructed image of the fidelity map. If the encoding is lossless, the reconstructed image of the fidelity map is identical to the fidelity map; if the encoding is lossy, the reconstructed image of the fidelity map includes the coding distortion introduced by encoding the fidelity map. In either case, the reconstructed image of the fidelity map can represent distortion between at least part of the original image and at least part of the reconstructed image; therefore, the embodiment of the application enables the decoding end to obtain the distortion intensity information of the encoded image.
In one possible design, the fidelity map includes a fidelity value of any second image block in the plurality of second image blocks, and the fidelity value of any second image block is used to represent distortion between any second image block and an original image block corresponding to any second image block.
The plurality of second image blocks are obtained by dividing the reconstructed image, the plurality of second image blocks correspond to a plurality of original image blocks one by one, the original image blocks are image blocks in the original image, and for example, the original image blocks are the first image blocks; the plurality of original image blocks are obtained by dividing the original image, the plurality of second image blocks are obtained by dividing the reconstructed image, and the dividing strategy for dividing the original image is the same as the dividing strategy for dividing the reconstructed image; or the plurality of original image blocks are obtained by dividing the preset area of the original image, the plurality of second image blocks are obtained by dividing the preset area of the reconstructed image, and the dividing strategy for dividing the preset area of the original image is the same as the dividing strategy for dividing the preset area of the reconstructed image.
In one possible design, the fidelity map includes a plurality of first elements, the plurality of second image blocks correspond to the plurality of first elements one-to-one, a value of any first element in the plurality of first elements is a fidelity value of a second image block corresponding to the any first element, a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in the reconstructed image, or a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in a preset area of the reconstructed image.
In one possible design, the second image block includes three color components, and the fidelity map is a three-dimensional array with three dimensions: color component, width, and height. The two-dimensional array under any color component A in the fidelity map comprises a plurality of first elements; the value of any first element is the fidelity value of color component A of the second image block corresponding to that first element, and the position of any first element in the two-dimensional array under color component A is determined according to the position of its corresponding second image block in the reconstructed image, or in a preset area of the reconstructed image.
In one possible design, the decoding the second code stream to obtain a reconstructed picture of the fidelity map includes: decoding the second code stream to obtain a reconstruction fidelity value of any first element; and obtaining a reconstructed image of the fidelity map according to the reconstructed fidelity value of any first element.
The reconstruction fidelity value of a first element is the reconstructed value of that first element. The position of the reconstruction fidelity value of any first element in the reconstructed image of the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image. Alternatively, the second code stream includes the position of any first element in the fidelity map, and the position of the reconstruction fidelity value of that first element in the reconstructed image of the fidelity map is determined according to its position in the fidelity map.
And any first element in the reconstructed graph of the fidelity graph has two attributes, namely the reconstructed fidelity value of any first element and the position of the reconstructed fidelity value in the reconstructed graph of the fidelity graph.
It should be understood that if the encoding is lossless, the reconstruction fidelity value of any first element equals the value of that first element; as shown in figs. 10 and 12, a, b, c, d, e, f, g, and h in the reconstructed image of the fidelity map are 15, 67, 99, 134, 16, 76, 123, and 187, respectively, where each grid cell in fig. 12 represents one first element and the number in any cell is the reconstruction fidelity value of that first element. If the encoding is lossy, the reconstruction fidelity value of any first element is the sum of the value of that first element and the coding distortion.
The fidelity map has a plurality of first elements, and the reconstructed image or its preset area can be divided into a plurality of second image blocks; the first elements are in one-to-one correspondence with the second image blocks, and the value of any first element is the fidelity value of its corresponding second image block. Because the reconstructed image of the fidelity map contains the reconstruction fidelity values of the plurality of first elements, and these reconstruction fidelity values are in one-to-one correspondence with the first elements, they are also in one-to-one correspondence with the second image blocks, and the reconstruction fidelity value of any first element is the fidelity value of its corresponding second image block.
Specifically, as shown in fig. 12, since the reconstructed image or the predetermined region of the reconstructed image is divided into R × S second image blocks, the reconstructed image of the fidelity map is a two-dimensional array of R rows and S columns, which has R × S first elements in total, the R × S second image blocks and the R × S first elements are in one-to-one correspondence, and the reconstruction fidelity value of any one of the R × S first elements is the fidelity value of the second image block corresponding to the first element. If the reconstruction image of the fidelity image of the whole reconstruction image is calculated, the position of any first element in the R multiplied by S first elements in the reconstruction image of the fidelity image is the same as the position of the corresponding second image block in the reconstruction image; if the reconstruction map of the fidelity map of the preset area of the reconstructed image is calculated, the position of any first element in the R multiplied by S first elements in the reconstruction map of the fidelity map is the same as the position of the corresponding second image block in the preset area of the reconstructed image. That is, the first element in the ith row and the jth column in the reconstructed image of the fidelity map corresponds to the second image block in the ith row and the jth column in the reconstructed image or the preset area of the reconstructed image, and the reconstruction fidelity value of the first element in the ith row and the jth column in the reconstructed image of the fidelity map is the fidelity value of the second image block in the ith row and the jth column in the preset area of the reconstructed image or the reconstructed image, wherein i is greater than or equal to 1 and less than or equal to R, and j is greater than or equal to 1 and less than or equal to S.
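Given the row/column correspondence just described, the decoding end can map any pixel of the reconstructed image back to the reconstruction fidelity value that covers it. The helper below is an illustrative sketch assuming the uniform M × N division; the function name is hypothetical.

```python
# Illustrative sketch: look up the reconstruction fidelity value covering
# pixel (x, y) of the reconstructed image, given the decoded R x S map and
# the basic-unit size M x N known to the decoder.

def fidelity_at_pixel(rec_fmap, x, y, M, N):
    i, j = y // N, x // M        # block row / column containing the pixel
    return rec_fmap[i][j]        # reconstruction fidelity value of that block
```

This is the sense in which the decoder "knows which image block each element characterizes" once M and N (or R and S) are available to it.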
In the embodiment of the application, the second code stream includes the code stream of every first element of the fidelity map, so decoding the second code stream yields the reconstruction fidelity value of each first element. It should be understood that if the encoding is lossless, the reconstruction fidelity value of a first element equals its value; if the encoding is lossy, the reconstruction fidelity value includes the coding distortion introduced by encoding the first element, i.e., it is the sum of the element's value and the coding distortion. A reconstructed image of the fidelity map can then be obtained from the reconstruction fidelity values and used to represent the distortion between at least a partial region of the original image and at least a partial region of the reconstructed image; the embodiment of the application thus makes the distortion intensity information of the encoded image available at the decoding end.
In a possible design, the second code stream is obtained by encoding the quantized first elements. Decoding the second code stream to obtain a reconstructed image of the fidelity map then includes: decoding the second code stream to obtain the reconstruction fidelity value of each quantized first element; inverse-quantizing these values to obtain the reconstruction fidelity value of each first element; and obtaining the reconstructed image of the fidelity map from those reconstruction fidelity values.
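A minimal decoder-side sketch of this design, assuming the entropy decoder has already produced one quantized value per first element and that the quantization step is known at the decoder (function and parameter names are hypothetical):

```python
import numpy as np

def reconstruct_fidelity_map(decoded_values, step, rows, cols):
    # Decoder-side sketch: the entropy decoder yields one quantized
    # reconstruction fidelity value per first element; inverse
    # quantization by the known step recovers values at the original
    # magnitude, and reshaping restores the R-row, S-column layout
    # of the fidelity map.
    q = np.asarray(decoded_values, dtype=np.float64)
    return (q * step).reshape(rows, cols)
```

With a step of 4.0, decoded values [1, 2, 3, 4] for a 2 × 2 map recover fidelity values [[4, 8], [12, 16]].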
The quantization step sizes used to quantize the plurality of first elements may be identical or may differ from element to element, and likewise for the step sizes used in inverse quantization; for any given first element, however, the quantization step size used for inverse quantization is the same as the one used for quantization.
In an example, the decoding is performed by a decoder or a decoding apparatus. The decoder or decoding apparatus may store the quantization step size used to quantize each of the plurality of first elements, or store the quantization step size used to inverse-quantize each of them; alternatively, either step size may be provided as an input to the decoding. In this way, the decoding end can recover fidelity values at the original order of magnitude.
In the embodiment of the application, to reduce coding overhead, the encoding end may quantize a first element before encoding it to obtain that element's code stream, so the code stream received by the decoding end may have been produced from quantized first elements. In this case, decoding the second code stream yields the reconstruction fidelity value of each quantized first element, which must be inverse-quantized to obtain the reconstruction fidelity value of the first element itself; the reconstructed image of the fidelity map is then obtained from these values. The method therefore makes the distortion intensity information of the encoded image available at the decoding end while reducing coding overhead.
In one possible design, the method further includes: processing the reconstructed image or the preset area of the reconstructed image according to the reconstructed image of the fidelity map so as to improve the image quality of the reconstructed image or the preset area of the reconstructed image; or determining whether to apply the reconstructed image according to a reconstructed image of the fidelity map.
Specifically, the image quality of the reconstructed image, or of its preset region, can be determined from the reconstructed image of the fidelity map; when that quality is poor, an image quality enhancement algorithm is applied to the reconstructed image or its preset region to improve it. Alternatively, the fidelity of the reconstructed image or of its preset region is determined from the reconstructed image of the fidelity map, and whether to apply the reconstructed image is decided accordingly; for example, the reconstructed image is not applied when its fidelity value, or that of its preset region, is lower than a preset fidelity threshold.
When the reconstructed image or its preset region is processed, a learning-based post-processing enhancement algorithm may divide the distortion degree into B ranges, B being an integer greater than 1, and train one image enhancement model per range; the decoding end can then determine the distortion degree of different regions of the reconstructed image from the fidelity map and select a different model for each region, and using a trained model better matched to the distortion distribution yields a better quality improvement. Alternatively, if a single-model image enhancement algorithm is trained and used, the quantization parameter map can serve as additional input information to the network, and under its indication the network can output images with a better enhancement effect.
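The per-region model selection can be sketched as follows. The threshold values and function name are hypothetical illustrations, not values from the embodiment; B - 1 ascending distortion boundaries split the distortion degree into B ranges, and each element of the decoded fidelity map is mapped to the index of the model trained for its range:

```python
import numpy as np

def select_models(distortion_map, thresholds):
    # Sketch: `thresholds` holds B-1 ascending distortion boundaries
    # dividing the distortion degree into B ranges; np.digitize maps
    # each element of the decoded fidelity map to a model index in
    # [0, B-1], i.e. to the enhancement model trained for its range.
    return np.digitize(distortion_map, thresholds)
```

For instance, with boundaries [10.0, 100.0] (B = 3), per-block distortions 9, 50, and 200 select models 0, 1, and 2 respectively.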
In the embodiment of the application, the decoding end can process the reconstructed image, or its preset region, according to the reconstructed image of the fidelity map so as to improve its image quality, or can determine whether to apply the reconstructed image according to the reconstructed image of the fidelity map; this facilitates the application of the reconstructed image.
It should be noted that the decoding method described in fig. 11 is the inverse process of the encoding method described in fig. 7; for the steps or operations of fig. 11, reference may be made to the relevant description of the steps or operations of fig. 7.
The technical solutions provided in fig. 7 to 12 are further described below through the complete video image encoding and decoding process.
First, encoding end implementation mode
(1) Original image coding
Any video image to be encoded in a video sequence (i.e., an original image) is input into a video encoder; after the encoding operation, the encoder outputs the first code stream of the video image and a reconstructed image of the video image, i.e., the first code stream of the original image and the reconstructed image of the original image. While a video image is being encoded it is often called an encoded image; for the specific encoding operation, reference may be made to the foregoing description, which is not repeated here. The encoding operation may be any video image encoding method, such as existing standard or industry schemes (H.264, H.265, H.266, AVS2, AVS3, AV1), a video image encoding scheme based on deep neural networks under research in academia and industry, or any other scheme capable of compressing the input video image. A reconstructed image is an image that contains coding distortion after the encoding operation; typical coding distortions include blockiness, ringing, and blurring. Some codec schemes based on deep learning techniques, such as those based on a generative adversarial network (GAN), may also produce new kinds of coding distortion such as false image content or texture details.
Each video image in a video sequence can be processed by adopting the method, so that a code stream of the whole video sequence and a reconstructed image of each video image in the video sequence can be generated, wherein the code stream of the whole video sequence comprises a first code stream of each video image in the video sequence.
(2) Computation and representation of fidelity maps
For a video sequence, a fidelity map may be calculated for every video image, yielding a fidelity map for each image in the sequence; alternatively, only a part of the video images may be selected for fidelity map calculation, yielding fidelity maps for that part of the sequence. For example, only intra-coded images may be selected and their fidelity maps calculated; or only scene-change frames, i.e., the first frame after a scene switch in the video sequence; or only key frames, i.e., the lowest-temporal-layer images when coding with a hierarchical B-frame reference structure; or other selection rules and combinations of such rules may be used.
The fidelity map of an encoded image can be calculated according to a preset basic unit size. For example, the encoded image may be divided into a plurality of image blocks of size M × N, each serving as one basic unit for computing fidelity, where M and N are the width and height of the block and may or may not be equal. The basic unit size can be preset and stored in both the encoder and the decoder, or it can be set flexibly by the encoder according to the richness of the content of all, or a part of, the video images of the sequence, with the chosen size signalled to the decoder. It should be understood that, instead of directly specifying the width and height of a basic unit, the number of basic units in an encoded image may be specified, e.g., the number of basic units in the horizontal and vertical directions. Since an encoded image is divided into a set of basic units by uniform division, the two representations are equivalent.
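The equivalence of the two signalling forms can be made concrete (function and parameter names are hypothetical): under uniform division, the basic unit width M and height N follow directly from the number of basic units per direction.

```python
def basic_unit_size_from_count(img_w, img_h, units_horiz, units_vert):
    # The two ways of signalling the basic unit are equivalent under
    # uniform division: from the number of basic units in each
    # direction, the unit width M and height N follow directly.
    assert img_w % units_horiz == 0 and img_h % units_vert == 0
    return img_w // units_horiz, img_h // units_vert
```

For a 1920 × 1080 image divided into 30 × 27 basic units, each unit is 64 × 40 pixels.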
When calculating fidelity, any full-reference quality evaluation method, such as mean squared error (MSE), sum of absolute differences (SAD), or structural similarity (SSIM), may be used to compute a distortion intensity value of the reconstructed image relative to the original image, or to compute an indication flag of whether the reconstructed content contains synthesized false image content. Specifically, the original image and the reconstructed image are uniformly divided according to the basic unit size, or the number of basic units, into a plurality of first image blocks and a plurality of second image blocks respectively; the fidelity of each second image block relative to its corresponding first image block, called its fidelity value, is calculated, and from the fidelity values of the second image blocks the fidelity map of the reconstructed image is obtained. The fidelity map is an array containing a plurality of first elements that correspond one-to-one with the second image blocks obtained by dividing the reconstructed image; the value of any first element is the fidelity value of its corresponding second image block, and the position of any first element in the array is the same as the position of its corresponding second image block in the reconstructed image.
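A minimal sketch of the per-basic-unit computation, using MSE as the full-reference metric (SAD or SSIM could be substituted); it assumes the image height and width are multiples of the basic unit dimensions, and the function name is hypothetical:

```python
import numpy as np

def fidelity_map_mse(original, reconstructed, m, n):
    # Divide both images into m-wide, n-high basic units and compute
    # the per-block MSE; fmap[i, j] is the fidelity value of the second
    # image block at row i, column j.
    h, w = original.shape[:2]
    rows, cols = h // n, w // m
    fmap = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            o = original[i*n:(i+1)*n, j*m:(j+1)*m].astype(np.float64)
            r = reconstructed[i*n:(i+1)*n, j*m:(j+1)*m].astype(np.float64)
            fmap[i, j] = np.mean((o - r) ** 2)
    return fmap
```

The resulting array has the same layout as the fidelity map described above: one element per second image block, at the block's position.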
After the fidelity map is calculated, it may be quantized. Specifically, the value of each first element in the fidelity map, i.e., the fidelity value of its corresponding second image block, may be scaled by a specified quantization step. The purpose of quantization is to reduce the dynamic range of the fidelity values and thereby the coding overhead of the fidelity map. Since quantization distorts the fidelity values, the encoding end can set the quantization step according to the accuracy the decoding end requires of the reconstructed image's fidelity. For example, the same quantization step may be set for all encoded images of a video sequence, so that all fidelity maps are quantized with the same step; or one quantization step may be set for one part of the encoded images and a different step for another part; or different quantization steps may be set for different encoded images; or even different quantization steps for different image blocks of each encoded image, so that the fidelity values of different coded blocks are quantized with different steps.
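The encoder-side quantization described above reduces to a scale-and-round per element; a sketch under the assumption that rounding to the nearest integer is used (the embodiment only specifies scaling by the step, so the rounding rule is an assumption, as is the function name):

```python
import numpy as np

def quantize_fidelity_map(fmap, step):
    # Encoder side: scale each fidelity value by the specified
    # quantization step and round to an integer, reducing the dynamic
    # range (and thus the coding overhead) at the cost of quantization
    # distortion. np.rint rounds halves to even.
    return np.rint(fmap / step).astype(np.int64)
```

With step 4.0, fidelity values [4.0, 9.0] quantize to [1, 2]; the decoder's inverse quantization (multiplying by the same step) recovers [4.0, 8.0], illustrating the quantization distortion mentioned above.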
The quantization step set by the encoding end must be signalled to the decoding end; that is, the encoding end must inform the decoding end which quantization step is used for the fidelity map of which encoded image, or for the fidelity value of which coded block. Alternatively, instead of the encoding end specifying the step flexibly, preset quantization steps can be stored at both the encoding and decoding ends; that is, which step is used when quantizing which fidelity map or which fidelity value is pre-stored at both ends. Either way, both ends obtain the same quantization step, and the decoding end can recover fidelity values at the original order of magnitude used at the encoding end.
The fidelity map can be represented by the syntax structure of table 5. In table 5, fidelity_metric_idc indicates the quality evaluation method used to calculate fidelity; it selects an evaluation method from a preset quality evaluation method list, which may include quality evaluation indexes such as MSE, SSE, and SSIM. base_unit_width and base_unit_height are the width and height of the basic unit used in the fidelity calculation; quantization_step is the quantization step; fidelity_value is the fidelity value calculated in one basic unit.
TABLE 5
[The syntax structure of table 5 appears as an image in the original publication; it lists the syntax elements fidelity_metric_idc, base_unit_width, base_unit_height, quantization_step, and fidelity_value described above.]
It should be understood that the encoding end may calculate the fidelity map for the entire encoded image or only for a preset region of it; the calculation for a preset region proceeds exactly as for the whole image and is not repeated here. If the fidelity map is calculated for a preset region, the encoding end must inform the decoding end of the position or coordinates of that region within the encoded image.
(3) Fidelity map coding
As shown in table 5, when encoding the fidelity map, each fidelity_value in the fidelity map of the current encoded image may be traversed and compression-encoded to obtain the code streams of all the fidelity_values; the second code stream obtained by fidelity map encoding includes these code streams. Specifically, a value can be binarized by fixed-length coding, exponential-Golomb coding, or similar methods to obtain a binary string, and each binary symbol in the string is then entropy-coded; alternatively, the value may be entropy-coded directly using methods such as Huffman coding or multi-symbol arithmetic coding. During entropy coding, already-encoded fidelity values in the fidelity map may be used to determine the probability distribution, or a predicted value, of the fidelity value currently being encoded; for example, the spatially adjacent encoded fidelity values to the left, above, or above-left can serve this purpose and help improve entropy coding efficiency. Different Huffman code tables may then be selected, or the sub-interval division of the arithmetic coder determined, according to that probability distribution or predicted value.
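As one concrete instance of the binarization step, an order-0 exponential-Golomb code for a non-negative fidelity_value can be sketched as follows (the function name is hypothetical; the resulting binary string would then be entropy-coded symbol by symbol as described above):

```python
def exp_golomb(value):
    # Order-0 exponential-Golomb binarization of a non-negative integer:
    # write (bit_length(value + 1) - 1) leading zeros, then value + 1
    # in binary. Small values get short codewords.
    code = value + 1
    prefix_zeros = code.bit_length() - 1
    return '0' * prefix_zeros + format(code, 'b')
```

For example, 0 encodes as "1", 1 as "010", and 4 as "00101".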
Since the fidelity map is represented as a two-dimensional or three-dimensional array, it can also be compression-encoded using any existing monochrome or color image encoding method. In that case it is unnecessary to traverse the per-basic-unit fidelity values as in table 5; the fidelity map is fed directly into the existing encoding method to obtain the second code stream.
It should be noted that the second code stream output by the fidelity map coding may be embedded in the first code stream output by the original image coding, or may be independently managed for operations such as transmission or storage.
Through the encoding operation, the encoding end obtains the first code stream of the original image and the second code stream of the fidelity map, and then transmits both to the decoding end; the code streams transmitted from the encoding end to the decoding end may be collectively referred to as a compressed code stream.
Second, decoding end implementation mode
(1) Decoding to obtain a reconstructed image
After receiving the compressed code stream from the encoding end, the decoding end extracts the first code stream of the original image, inputs it into a video decoder, and obtains the reconstructed image through the decoding operation. This decoding operation is the inverse of the encoding end's operation on the original image, and the reconstructed image obtained at the decoding end is identical to the one obtained at the encoding end. For the specific decoding operation, reference may be made to the foregoing description, which is not repeated here.
(2) Decoding to obtain reconstructed image of fidelity image
After receiving the compressed code stream from the encoding end, the decoding end extracts the second code stream of the fidelity map and decodes it to obtain a reconstructed map of the fidelity map. The decoding operation on the second code stream is the inverse of the encoding end's operation on the fidelity map. If the encoding end used lossless encoding, i.e., traversed each fidelity value as in table 5 and losslessly entropy-coded it, then the reconstructed map of the fidelity map obtained at the decoding end is identical to the fidelity map calculated at the encoding end. If the encoding end used lossy encoding, e.g., processed the fidelity map as a monochrome image, then the decoded reconstructed map differs from the fidelity map calculated at the encoding end, because it contains the coding distortion introduced by encoding the fidelity map.
(3) Using fidelity mapping
The decoding end judges the signal distortion of the reconstructed image, or of a preset region in it, according to the reconstructed map of the fidelity map, and applies this judgment in different service environments. For example, in a video surveillance scene, the distortion degree of a certain image region in the reconstructed image is judged from the reconstructed map of the fidelity map, and if it exceeds a preset threshold, the reconstructed image is not applied; in a video conference scene, whether the viewed content is real or false can be judged from the fidelity map. As another example, one image enhancement method may be selected from a group of such methods, according to a preset rule and the distortion degree of an image region in the reconstructed image, and applied to that region to improve its quality.
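The surveillance-style check above reduces to comparing the decoded distortion values against a threshold; a sketch, where the threshold value and function name are hypothetical and the fidelity values are assumed to be distortion intensities (e.g., per-block MSE), so larger means more distorted:

```python
import numpy as np

def reconstruction_usable(distortion_map, threshold):
    # The reconstructed image is not applied if the distortion degree
    # of any image region exceeds the preset threshold.
    return bool(np.max(distortion_map) <= threshold)
```
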
Fig. 13 is a flow chart illustrating a process 1300 of a decoding method according to another embodiment of the present application. Process 1300 may be performed by a decoding device, such as video decoder 30. Process 1300 is described as a series of steps or operations; it should be understood that its steps may be performed in various orders and/or occur simultaneously, and are not limited to the execution order shown in fig. 13. Process 1300 includes, but is not limited to, the following steps or operations:
1301. Decode the first code stream to obtain a reconstructed image of the original image and target quantization parameter information, wherein the target quantization parameter information includes quantization parameter values of all or some of a plurality of second image blocks of the reconstructed image.
It should be understood that the first code stream is also the encoded image data 21; the reconstructed image is also the decoded image 331, and is thus a decoded image; and the original image is also the image 17, and is thus an encoded image.
In one possible design, the second image block is a coding unit.
The first code stream can be obtained by encoding the original image with an encoder specified by any existing mainstream video image coding standard (H.264, H.265, H.266, AVS3, etc.). During encoding, the original image is divided into a plurality of coding units (i.e., coding blocks) and each coding unit is encoded to obtain its code stream; the first code stream comprises the code streams of these coding units. It should be understood that the division of the original image into coding units, and specifically how it is divided, is determined by the encoder. Correspondingly, the decoder of the first code stream is any decoder capable of executing the decoding operation specified by the video image coding standard: it performs the standard-specified decoding on the input first code stream and outputs the reconstructed image of the original image.
Besides the reconstructed image of the original image, decoding the first code stream also outputs the target quantization parameter information obtained in the decoding process, i.e., the quantization parameter information used when the original image was encoded. The target quantization parameter information comprises the quantization parameter values of all or some of the coding units into which the original image was divided, together with the position of each such coding unit in the original image and its size. A quantization parameter value is the value of a quantization parameter; the quantization parameter is the one used by the quantization unit 208, where each coding unit is quantized with its corresponding quantization parameter. The position of a coding unit in the original image can be represented by its coordinates, usually expressed as the luminance pixel coordinates of its top-left corner. Since a coding unit is generally rectangular, its size can be expressed by its width and height, typically in numbers of luminance pixels; if a coding unit is square, a single side length or an area suffices.
1302. Construct a quantization parameter map of the reconstructed image according to the target quantization parameter information, wherein the quantization parameter map of the reconstructed image is used to represent distortion between at least a partial region of the original image and at least a partial region of the reconstructed image.
The quantization parameter map of the reconstructed image is a data structure indicating the quantization parameter values of the coding units in the reconstructed image; the quantization parameter value of a coding unit in the reconstructed image is also that coding unit's quantization parameter value in the original image. It should be understood that the primary purpose of the quantization parameter is the inverse quantization operation; nevertheless, the quantization parameter itself reflects signal distortion (fidelity), so the quantization parameter map can be used to characterize the fidelity of the reconstructed image, i.e., it is one form of fidelity map. A quantization parameter map constructed from the quantization parameters can therefore represent the fidelity of the reconstructed image or of its preset region. The quantization parameter map is a two-dimensional or three-dimensional array; the value of a second element in it is the quantization parameter value of a coding unit, and each second element has two attributes: a quantization parameter value and a position in the quantization parameter map. Each quantization parameter value in the map represents the degree of distortion of a certain image block in the corresponding reconstructed image, which again shows that the quantization parameter map is a form of fidelity map.
If a quantization parameter map is constructed from the quantization parameter values of the three RGB or YUV color components of a reconstructed image, it is a three-dimensional array; if it is constructed from the fused quantization parameter values of the three color components, it is a two-dimensional array. Without loss of generality, and to simplify the description, the specific implementations below treat the quantization parameter map as a two-dimensional array.
The quantization parameter map may have various representations. Some are listed below by way of example.
First, the original image is divided into a plurality of coding units, so the reconstructed image likewise comprises a plurality of coding units, and these coding units serve as the basic units for constructing the quantization parameter map; in this case the coding units in the reconstructed image are also the second image blocks. The basic units correspond one-to-one with the second elements of the quantization parameter map, i.e., the coding units correspond one-to-one with the second elements, and the value of each second element is set to the quantization parameter value of its corresponding coding unit.
In this representation, the position of any second element in the quantization parameter map is the same as the position of its corresponding coding unit in the reconstructed image. Further, the quantization parameter map may have the same spatial resolution as the corresponding original or reconstructed image, i.e., the same size. As shown in fig. 14, the original or reconstructed image contains 6 coding units and the quantization parameter map contains 6 cells, i.e., 6 second elements, in one-to-one correspondence; the number in each cell is the quantization parameter of its corresponding coding unit. If the size of the original image is W × H and the size of the reconstructed image is W × H, the size of the quantization parameter map is also W × H. Fig. 14 only illustrates the case where the quantization parameter map and the corresponding original or reconstructed image have the same size; it should be understood that the two sizes may also be related by scaling.
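A sketch of constructing such a same-resolution quantization parameter map. The in-memory layout of the target quantization parameter information is an assumption for illustration (a list of position/size/QP tuples), not a decoder API:

```python
import numpy as np

def build_qp_map(width, height, coding_units):
    # Construct a quantization parameter map with the same spatial
    # resolution as the reconstructed image. `coding_units` holds
    # (x, y, w, h, qp) tuples: the top-left position, size, and
    # quantization parameter value of each coding unit, as carried in
    # the target quantization parameter information.
    qp_map = np.zeros((height, width), dtype=np.int32)
    for x, y, w, h, qp in coding_units:
        qp_map[y:y+h, x:x+w] = qp  # every sample of the CU takes its QP
    return qp_map
```

For a 4 × 2 image split into two 2 × 2 coding units with QPs 30 and 34, the left half of the map holds 30 and the right half 34.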
Secondly, the reconstructed image is divided into a plurality of equally sized basic units for constructing the quantization parameter map, and in this case the basic units in the reconstructed image are the second image blocks. The size of a basic unit is smaller than or equal to the size of the smallest coding unit. For any one of the plurality of basic units, the quantization parameter value of the coding unit containing that basic unit is taken as the quantization parameter value of the basic unit. Since the plurality of basic units in the reconstructed image are in one-to-one correspondence with the plurality of second elements in the quantization parameter map, the value of any second element in the quantization parameter map is the quantization parameter value of its corresponding basic unit in the reconstructed image, that is, the quantization parameter value of the coding unit containing that basic unit.
For better understanding: the original image and the reconstructed image are each divided into a plurality of basic units; a basic unit obtained by dividing the original image is referred to as a basic unit i, and a basic unit obtained by dividing the reconstructed image is referred to as a basic unit j. The basic units i correspond one-to-one to the basic units j, and the basic units j correspond one-to-one to the second elements, so the basic units i correspond one-to-one to the second elements; the value of any second element is the quantization parameter value of its corresponding basic unit i, which in turn is the quantization parameter value of the coding unit in the original image that contains the basic unit i. Put another way: the plurality of coding units in the original image correspond one-to-one to a plurality of reconstructed blocks in the reconstructed image, and the quantization parameter value of any reconstructed block is the quantization parameter value of its corresponding coding unit; the reconstructed image is divided into equally sized basic units for constructing the quantization parameter map, and these basic units correspond one-to-one to the second elements, so the value of any second element in the quantization parameter map is the quantization parameter value of its corresponding basic unit in the reconstructed image, namely the quantization parameter value of the reconstructed block containing that basic unit.
In this way, the position of any second element in the quantization parameter map is the same as the position of its corresponding basic unit in the original image or the reconstructed image. For example, the size of the reconstructed image is W × H, and the reconstructed image is divided into R rows and S columns of basic units, taking the size M × N of coding unit 2 in fig. 14 as the size of the basic unit. The resulting quantization parameter map is a two-dimensional array with R rows and S columns, as shown in fig. 15, containing R × S second elements in total; the R × S second elements correspond one-to-one to the R × S basic units, the position of any second element in the two-dimensional array is the same as the position of its corresponding basic unit in the reconstructed image, and the value of any second element is the quantization parameter value of its corresponding basic unit. Specifically, the value 22 in fig. 15 is the quantization parameter value of coding unit 1 in fig. 14, the value 24 is that of coding unit 2, the value 26 is that of coding unit 3, the value 18 is that of coding unit 4, the value 20 is that of coding unit 5, and the value 16 is that of coding unit 6. It should be understood that the size of the quantization parameter map and the size of the corresponding original image or reconstructed image may be the same or may differ by a scaling factor; the quantization parameter map shown in fig. 15 is reduced relative to the size of the original image or the reconstructed image.
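The basic-unit-granularity construction can be sketched as follows: each basic unit takes the quantization parameter value of the coding unit that contains it, which naturally handles coding units of different sizes. The helper name and the rectangle representation of coding units are assumptions for illustration only:

```python
def build_basic_unit_qp_map(cu_rects, width, height, bu_w, bu_h):
    """cu_rects: list of (x, y, w, h, qp) rectangles tiling the image.
    Each basic unit of size bu_w x bu_h takes the QP of the coding unit
    that contains it; the map therefore has one element per basic unit."""
    rows, cols = height // bu_h, width // bu_w
    qp_map = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # centre of this basic unit, in pixel coordinates
            cx, cy = c * bu_w + bu_w // 2, r * bu_h + bu_h // 2
            for (x, y, w, h, qp) in cu_rects:
                if x <= cx < x + w and y <= cy < y + h:
                    qp_map[r][c] = qp
                    break
    return qp_map

# A 32x32 image split into one 32x16 coding unit (QP 22) on top and
# two 16x16 coding units (QP 24 and 26) below, with 16x16 basic units.
qp_map = build_basic_unit_qp_map(
    [(0, 0, 32, 16, 22), (0, 16, 16, 16, 24), (16, 16, 16, 16, 26)],
    width=32, height=32, bu_w=16, bu_h=16)
```

The top coding unit covers two basic units, so its QP value appears twice in the first row of the map.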
It should be noted that the quantization parameter map may also be used to characterize the fidelity of a preset region of the reconstructed image; in this case, the values of the second elements in the quantization parameter map are the quantization parameter values of only a part of the coding units in the original image, that is, only the quantization parameter values of that part of the coding units are used in constructing the quantization parameter map. It should be understood that a quantization parameter map used for characterizing the fidelity of a preset region of the reconstructed image may also adopt the representation described above; in that case, the plurality of basic units used for constructing the quantization parameter map are obtained by dividing the preset region of the reconstructed image.
In the embodiment of the application, a coding end divides an original image into a plurality of coding units, and codes the plurality of coding units obtained by dividing the original image to obtain a first code stream; the decoding end decodes the first code stream to obtain a reconstructed image of the original image and target quantization parameter information, wherein the target quantization parameter information comprises quantization parameter values of all or part of the plurality of coding units; a quantization parameter map of the reconstructed image can be constructed according to the target quantization parameter information; the quantization parameter map of the reconstructed image is a fidelity map in a form, and when the target quantization parameter information includes quantization parameter values of all the coding units in the plurality of coding units, the quantization parameter map of the reconstructed image is a fidelity map of the entire reconstructed image; when the target quantization parameter information includes quantization parameter values of a part of the encoding units among the plurality of encoding units, the quantization parameter map of the reconstructed image is a fidelity map of a preset region of the reconstructed image; therefore, the quantization parameter map of the reconstructed image can be used for representing the fidelity of the reconstructed image or representing the fidelity of a preset area of the reconstructed image; therefore, the embodiment of the application can obtain the distortion intensity information of the coded image at the decoding end.
In the embodiment of the application, a decoding end performs decoding operation on a first code stream obtained by encoding an original image to obtain quantization parameter values of each region (second image block) of a reconstructed image; the quantization parameter map of the reconstructed image can be constructed according to the quantization parameter values of all or part of the second image blocks in the plurality of second image blocks of the reconstructed image, and the quantization parameter map of the reconstructed image can be used for representing distortion between at least a partial area of the original image and at least a partial area of the reconstructed image.
In one possible design, the quantization parameter map of the reconstructed image includes a plurality of second elements, and the plurality of second image blocks are in one-to-one correspondence with the plurality of second elements. The value of any second element is the quantization parameter value of the second image block corresponding to that second element, and the position of any second element in the quantization parameter map of the reconstructed image is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of its corresponding second image block in a preset region of the reconstructed image. A second element may also be referred to as a pixel point of the quantization parameter map.
In one possible design, the second image block includes three color components, and the quantization parameter map of the reconstructed image is a three-dimensional array with three dimensions: color component, width and height. The two-dimensional array under any color component a in the quantization parameter map includes a plurality of second elements, and the value of any second element is the quantization parameter value of color component a of the second image block corresponding to that second element. The position of any second element in the two-dimensional array under color component a is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of its corresponding second image block in a preset region of the reconstructed image.
In one possible design, the constructing the quantization parameter map of the reconstructed image according to the target quantization parameter information includes: when the target quantization parameter information includes quantization parameter values of a part of the plurality of coding units, obtaining a quantization parameter value of a target coding unit according to the quantization parameter values of the part of the coding units and/or a reference quantization parameter map, wherein the reference quantization parameter map is a quantization parameter map of a reference image of the reconstructed image, and the target coding unit is a coding unit of the plurality of coding units except the part of the coding units; and obtaining a quantization parameter map of the reconstructed image according to the quantization parameter values of the partial coding units and the quantization parameter values of the target coding unit.
Specifically, when the target quantization parameter information includes quantization parameter values of all coding units in the plurality of coding units, a quantization parameter map of the reconstructed image is obtained according to the quantization parameter values of all coding units; when the target quantization parameter information includes quantization parameter values of a part of the plurality of coding units, obtaining a quantization parameter map of the reconstructed image according to the quantization parameter values of the part of the coding units, or obtaining the quantization parameter map of the reconstructed image according to the quantization parameter values of the part of the coding units and a reference quantization parameter map, where the reference quantization parameter map is a quantization parameter map of a reference image of the reconstructed image.
It should be understood that the reference quantization parameter map is the quantization parameter map of a reference picture of the reconstructed picture, i.e. a reference picture of the coded picture (coding picture). Under the current mainstream video and image coding standards, a true and effective quantization parameter value cannot be derived at the decoding end for every coding unit. Taking the H.265 standard as an example, if the quantized residual of a coding unit is all 0, i.e. no residual is transmitted, the decoding end skips the inverse quantization operation. To avoid transmitting useless quantization parameter information, the encoding end then does not signal the quantization parameter value of that coding unit to the decoding end, and the decoding end cannot obtain the quantization parameter value of the coding unit from the first code stream at all. Furthermore, even when the quantization parameter value of a coding unit can be obtained by a decoding operation at the decoding end, that value may not accurately represent the degree of distortion of the coding unit. Specifically, if the residual magnitude of a coding unit is smaller than the quantization step size (the quantization step size is calculated from the quantization parameter value), the encoding end quantizes the residual of the coding unit to all 0s. This is common in P- and B-frame coding, and occurs when the reference picture quality is high while the quantization step size of the coding unit is set large. In this case, although the decoding end can obtain the quantization parameter value of the coding unit through the decoding operation, that value does not reflect the actual distortion of the coding unit.
The scheme for constructing the quantization parameter map provided by the embodiment of the application can be combined with any existing mainstream video and image coding standard (H.264, H.265, H.266, AVS3, etc.), with the quantization parameter map constructed from the quantization parameter information obtained during decoding. In addition, inaccurate quantization parameter values identified in the quantization parameter map can be corrected to obtain a corrected quantization parameter map, namely a fidelity map, which is finally applied to the reconstructed image. Specifically, the quantization parameter values of some coding units in the current original image may be corrected using the quantization parameter values of other coding units in the same image, or the inaccurate quantization parameter data may be corrected using the quantization parameter map of a reference image of the current original image (hereinafter simply referred to as the reference quantization parameter map). After the construction and correction of the quantization parameter map of each reconstructed image are completed, the map is stored and retained for use as an input for constructing the quantization parameter maps of subsequent reconstructed images.
When the target quantization parameter information comprises quantization parameter values of all coding units in the multiple coding units in the reconstructed image, a quantization parameter map of the reconstructed image can be constructed according to the quantization parameter values of all coding units, and the quantization parameter map of the reconstructed image is used for representing the fidelity of the whole reconstructed image; or, the quantization parameter values of the coding units in the preset region of the reconstructed image may be selected from all the coding units, and a quantization parameter map of the reconstructed image is constructed according to the quantization parameter values of the coding units in the preset region of the reconstructed image, where the quantization parameter map of the reconstructed image is used to represent the fidelity of the preset region of the reconstructed image.
When the target quantization parameter information includes quantization parameter values of a part of the coding units in the plurality of coding units in the reconstructed image and the quantization parameter values of the part of the coding units completely include quantization parameter values of the coding units in the preset region of the reconstructed image, a quantization parameter map of the reconstructed image can be constructed according to the quantization parameter values of the coding units in the preset region of the reconstructed image; or, the quantization parameter values may be corrected according to the quantization parameter values of some of the plurality of coding units or the reference quantization parameter map to obtain the quantization parameter values of all of the plurality of coding units, and then the quantization parameter map of the reconstructed image may be constructed according to the quantization parameter values of all of the coding units.
When the target quantization parameter information includes quantization parameter values of a part of the coding units in the multiple coding units in the reconstructed image and the quantization parameter values of the part of the coding units do not completely include quantization parameter values of the coding units in the preset region of the reconstructed image, correcting the quantization parameter values according to the quantization parameter values of the part of the coding units in the multiple coding units or the reference quantization parameter map to obtain quantization parameter values of all the coding units in the multiple coding units; then, a quantization parameter map of the reconstructed image can be constructed according to the quantization parameter values of all the coding units; alternatively, the quantization parameter values of the coding units in the preset region of the reconstructed image may be selected from all the coding units, and the quantization parameter map of the reconstructed image may be constructed according to the quantization parameter values of the coding units in the preset region of the reconstructed image.
In the embodiment of the present application, when the target quantization parameter information includes the quantization parameter values of all of the plurality of coding units, the quantization parameter map of the reconstructed image may be obtained according to the quantization parameter values of all the coding units; the quantization parameter map obtained in this case may be used to characterize the fidelity of the entire reconstructed image. When the target quantization parameter information includes the quantization parameter values of only some of the plurality of coding units, the quantization parameter map of the reconstructed image may be obtained according to those quantization parameter values; the quantization parameter map obtained in this case may be used to characterize the fidelity of a preset region of the reconstructed image. Alternatively, when the target quantization parameter information includes the quantization parameter values of only some of the coding units, the quantization parameter map of the reconstructed image may be obtained according to those quantization parameter values together with the reference quantization parameter map: since the reference quantization parameter map is the quantization parameter map of a reference image of the reconstructed image, the quantization parameter value of any coding unit other than those coding units can be derived from it, so that a quantization parameter value is available for every one of the plurality of coding units; the quantization parameter map obtained in this case may be used to characterize the fidelity of the entire reconstructed image or of a preset region of it. Therefore, whether the target quantization parameter information obtained by decoding includes the quantization parameter values of all or only some of the plurality of coding units, the embodiment of the present application can obtain a quantization parameter map of the reconstructed image that characterizes the fidelity of the reconstructed image or of a preset region of the reconstructed image.
In one possible design, the obtaining a quantization parameter value of a target coding unit according to the quantization parameter value of the partial coding unit includes: determining a quantization parameter value of the target coding unit according to a quantization parameter value of at least one coding unit of the partial coding units.
When the decoding end cannot obtain the quantization parameter value of a coding unit from the first code stream, the value is filled in from the spatial neighborhood of that coding unit. Specifically, assuming that the decoding end cannot obtain the quantization parameter value of the target coding unit from the first code stream, determining the quantization parameter value of the target coding unit according to the quantization parameter value of at least one coding unit whose quantization parameter value was obtained by decoding includes: calculating the average value of those several decoded quantization parameter values and taking the average value as the quantization parameter value of the target coding unit. For example, the quantization parameter values of the coding units to the left, above-left, and so on of the target coding unit in the original image may be used to obtain the quantization parameter value of the target coding unit.
In this way, the quantization parameter values of coding units whose values cannot be obtained from the first code stream can be filled in according to the quantization parameter values of some of the plurality of coding units, so as to obtain the quantization parameter values of all of the plurality of coding units. Further, a quantization parameter map of the reconstructed image can be constructed according to the quantization parameter values of all the coding units; or the quantization parameter values of the coding units in a preset region of the reconstructed image may be selected from all the coding units, and the quantization parameter map of the reconstructed image constructed from those values.
In this embodiment of the present application, when the decoding end cannot obtain the quantization parameter value of a certain coding unit from the first code stream, the quantization parameter value of the spatial neighborhood of the coding unit may be used for filling. Specifically, when the target quantization parameter information includes quantization parameter values of a part of the plurality of coding units, the quantization parameter value of any one of the plurality of coding units other than the part of the plurality of coding units may also be determined according to the quantization parameter value of at least one of the part of the plurality of coding units, so that it is possible to ensure that the quantization parameter value of any one of the plurality of coding units is obtained; moreover, a quantization parameter map of the reconstructed image can be obtained according to quantization parameter values of part or all of the plurality of coding units; when the quantization parameter map of the reconstructed image is obtained according to the quantization parameter values of all the coding units in the plurality of coding units, the obtained quantization parameter map of the reconstructed image can also be used for representing the fidelity of the whole reconstructed image; when the quantization parameter map of the reconstructed image is obtained according to the quantization parameter values of some of the plurality of coding units, the obtained quantization parameter map of the reconstructed image may also be used to characterize the fidelity of the preset region of the reconstructed image.
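A minimal sketch of the spatial-neighborhood filling described above, assuming the quantization parameter map is held as a 2-D list with `None` marking entries that could not be obtained from the first code stream; averaging the left, above and above-left neighbours is one of the possible choices the text mentions:

```python
def fill_spatial(qp_map):
    """Fill each missing QP entry (None) with the rounded mean of its
    already-available left, above and above-left neighbours.
    Entries filled earlier in the raster scan can serve as neighbours."""
    filled = [row[:] for row in qp_map]
    for r in range(len(filled)):
        for c in range(len(filled[0])):
            if filled[r][c] is None:
                neigh = []
                if c > 0 and filled[r][c - 1] is not None:
                    neigh.append(filled[r][c - 1])          # left
                if r > 0 and filled[r - 1][c] is not None:
                    neigh.append(filled[r - 1][c])          # above
                if r > 0 and c > 0 and filled[r - 1][c - 1] is not None:
                    neigh.append(filled[r - 1][c - 1])      # above-left
                if neigh:
                    filled[r][c] = round(sum(neigh) / len(neigh))
    return filled

result = fill_spatial([[22, None],
                       [18, None]])
```

Here the top-right entry takes its left neighbour's value 22, and the bottom-right entry averages 18, 22 and 22.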
In one possible design, the reference quantization parameter map includes a plurality of reference elements, and the value of any reference element is the quantization parameter value of a coding unit in the reference image. The obtaining of the quantization parameter value of a target coding unit according to the reference quantization parameter map includes: taking the value of a target element as the quantization parameter value of any one of the target coding units, wherein the target element is a reference element in the reference quantization parameter map, and the position of the target element in the reference quantization parameter map is determined according to the position of that target coding unit in the reconstructed image, or according to the position of that target coding unit in the reconstructed image together with the motion vector of that target coding unit. A reference element is simply another name for a second element of the reference quantization parameter map.
When the decoding end cannot obtain the quantization parameter value of a certain coding unit from the first code stream, the quantization parameter value of the temporal neighborhood of the coding unit can be adopted for filling. Assuming that the decoding end cannot obtain the quantization parameter value of the target coding unit from the first code stream, the filling method includes:
In a first method, a value of a target element in a reference quantization parameter map of a reconstructed image is used as a quantization parameter value of a target coding unit, and a position of the target element in the reference quantization parameter map is determined according to a position of the target coding unit in the reconstructed image.
Specifically, a reference coding unit is determined on a reference image of a reconstructed image according to a target coding unit, wherein the position of the reference coding unit in the reference image is the same as the position of the target coding unit in the reconstructed image; and taking the value of a target element in the reference quantization parameter map as the quantization parameter value of the target coding unit, wherein the target element is a second element corresponding to the reference coding unit in the reference quantization parameter map.
In the second method, the value of the target element in the reference quantization parameter map of the reconstructed image is used as the quantization parameter value of the target coding unit, and the position of the target element in the reference quantization parameter map is determined according to the position of the target coding unit in the reconstructed image and the motion vector of the target coding unit.
Specifically, the coordinates (x, y) of the target coding unit on the reconstructed image are offset by the motion vector (mvx, mvy) of the target coding unit to obtain (x + mvx, y + mvy), and the reference coding unit is determined at position (x + mvx, y + mvy) of the reference image of the reconstructed image; the value of the target element in the reference quantization parameter map is then taken as the quantization parameter value of the target coding unit, wherein the target element is the second element corresponding to the reference coding unit in the reference quantization parameter map.
It should be noted that, in the above filling methods, the representation of the reference quantization parameter map and the representation of the quantization parameter map of the reconstructed image are the same; that is, it may be assumed that the quantization parameter map of the reconstructed image, the reference quantization parameter map and the corresponding original image have the same size, or are scaled by the same ratio.
Further, a plurality of quantization parameter values may be obtained using the various manners described above, and an arithmetic mean operation may be performed on the plurality of quantization parameter values to determine a quantization parameter value of the target coding unit.
In this way, the quantization parameter values of coding units whose values cannot be obtained from the first code stream can be filled in according to the reference quantization parameter map of the reconstructed image, so as to obtain the quantization parameter values of all of the plurality of coding units. Further, a quantization parameter map of the reconstructed image can be constructed according to the quantization parameter values of all the coding units; or the quantization parameter values of the coding units in a preset region of the reconstructed image may be selected from all the coding units, and the quantization parameter map of the reconstructed image constructed from those values.
In this embodiment of the present application, when a quantization parameter value of a certain coding unit cannot be obtained from the first code stream, the quantization parameter of the temporal neighborhood of the coding unit may be used for filling. Specifically, for any coding unit whose quantization parameter value cannot be obtained from the first code stream, the value of the target element in the reference quantization parameter map may be used as the quantization parameter value of the coding unit; the position of the target element in the reference quantization parameter map is determined according to the position of the coding unit in the reconstructed image, or according to that position together with the motion vector of the coding unit. Thus, the quantization parameter value of any one of the plurality of coding units can be obtained, and a quantization parameter map of the reconstructed image can be obtained according to the quantization parameter values of some or all of the plurality of coding units. When the quantization parameter map of the reconstructed image is obtained according to the quantization parameter values of all the coding units, it can be used to characterize the fidelity of the entire reconstructed image; when it is obtained according to the quantization parameter values of only some of the coding units, it can be used to characterize the fidelity of a preset region of the reconstructed image.
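The temporal filling can be sketched as follows, covering both the co-located lookup (first method) and the motion-vector-offset lookup (second method); the grid-coordinate representation, the `bu` basic-unit size parameter and the dictionary of motion vectors are assumptions for illustration:

```python
def fill_temporal(qp_map, ref_qp_map, motion_vectors=None, bu=16):
    """For each missing entry, take the co-located element of the reference
    QP map; when a motion vector is known for that position, offset the
    lookup position by it first (clamped to the map bounds).
    motion_vectors: optional dict mapping (row, col) -> (mvx, mvy) in pixels."""
    rows, cols = len(qp_map), len(qp_map[0])
    filled = [row[:] for row in qp_map]
    for r in range(rows):
        for c in range(cols):
            if filled[r][c] is None:
                rr, cc = r, c                      # co-located by default
                if motion_vectors and (r, c) in motion_vectors:
                    mvx, mvy = motion_vectors[(r, c)]
                    cc = min(max(c + mvx // bu, 0), cols - 1)
                    rr = min(max(r + mvy // bu, 0), rows - 1)
                filled[r][c] = ref_qp_map[rr][cc]
    return filled

ref = [[30, 31],
       [32, 33]]
# Entry (0,0) is filled co-located; entry (1,0) has a motion vector of
# 16 pixels to the right, i.e. one basic unit, so it reads ref[1][1].
result = fill_temporal([[None, 24], [None, 26]], ref,
                       motion_vectors={(1, 0): (16, 0)})
```

Both lookups resolve to a single element of the reference quantization parameter map, matching the two methods in the text.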
In one possible design, the method further includes: storing the reconstructed image and the quantization parameter map of the reconstructed image in association, so that the reconstructed image can be used as a reference image and the quantization parameter map of the reconstructed image can be used as a reference quantization parameter map.
After construction and correction are completed, the quantization parameter map is stored and retained for use as an input for constructing the quantization parameter maps of subsequent coded images; for example, a quantization parameter map buffer may be designed for storing quantization parameter maps.
The decoder of any mainstream video codec includes a decoded picture buffer for storing decoded pictures and a reference picture management mechanism for managing the addition and removal of decoded pictures. In the scheme of the present application, each quantization parameter map corresponds to one decoded picture and records the quantization parameter information of that decoded picture. Therefore, a quantization parameter map can be managed with exactly the same operations as those used to manage a decoded picture. In other words, a decoded picture and its quantization parameter map are managed in the decoded picture buffer and the reference quantization parameter map buffer, respectively, and the management operations of the two are fully synchronized.
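The synchronized management of the two buffers can be sketched as follows; this is a minimal sketch with hypothetical names, keyed by picture order count, not the management mechanism of any particular codec:

```python
class ReferenceBuffers:
    """Keep the decoded-picture buffer and the reference-QP-map buffer in
    lockstep: every add/remove applies to both, so every reference picture
    always has its quantization parameter map available."""

    def __init__(self):
        self.pictures = {}  # picture order count -> decoded picture
        self.qp_maps = {}   # picture order count -> its QP map

    def add(self, poc, picture, qp_map):
        self.pictures[poc] = picture
        self.qp_maps[poc] = qp_map   # synchronized insertion

    def remove(self, poc):
        self.pictures.pop(poc, None)
        self.qp_maps.pop(poc, None)  # synchronized removal

bufs = ReferenceBuffers()
bufs.add(0, "picture-0", [[22, 27], [32, 37]])
bufs.remove(0)
print(len(bufs.pictures), len(bufs.qp_maps))  # 0 0
```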
In the embodiment of the present application, the quantization parameter map of the reconstructed image may be stored for use as a reference quantization parameter map for constructing a quantization parameter map of a subsequent decoded image, thereby facilitating construction of the quantization parameter map of the subsequent decoded image.
In one possible design, the method further includes: processing the reconstructed image or a preset region of the reconstructed image according to the quantization parameter map of the reconstructed image so as to improve the image quality of the reconstructed image or the preset region of the reconstructed image; or determining whether to apply the reconstructed image according to the quantization parameter map of the reconstructed image.
Specifically, the decoding end judges the signal distortion of the reconstructed image, or of a preset region in the reconstructed image, according to the reconstructed map of the fidelity map, and applies this judgment in different service environments. For example, in a video surveillance scenario, the distortion degree of an image region in the reconstructed image is judged according to the reconstructed map of the fidelity map, and if the distortion degree is greater than a preset threshold, the reconstructed image is not applied. For another example, one image enhancement method may be selected from a group of image enhancement methods according to a preset rule based on the distortion degree of an image region in the reconstructed image, and applied to that image region to improve image quality.
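The threshold-based application decision above can be sketched as follows, assuming (as an illustrative convention, not stated in the application) that a larger fidelity value means more distortion, e.g. a per-block mean squared error:

```python
def usable_region(fidelity_map, x0, y0, x1, y1, threshold):
    """Decide whether a region of the reconstructed image is usable:
    reject it when the worst per-block distortion inside the region
    exceeds the preset threshold."""
    worst = max(fidelity_map[y][x]
                for y in range(y0, y1)
                for x in range(x0, x1))
    return worst <= threshold

fmap = [[1.0, 2.5],
        [9.0, 0.5]]
print(usable_region(fmap, 0, 0, 2, 1, threshold=5.0))  # True: top row only
print(usable_region(fmap, 0, 0, 2, 2, threshold=5.0))  # False: 9.0 exceeds it
```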
When processing the reconstructed image or a preset region of the reconstructed image with a learning-based post-processing enhancement algorithm, B distortion ranges may be divided according to the distortion degree of the image, and one image enhancement model may be trained for each range, where B is an integer greater than 1. The decoding end may determine the distortion degrees of different regions of the reconstructed image according to the quantization parameter map of the reconstructed image and select a different model for each region to perform image enhancement; using a trained model that better matches the distortion distribution yields a better image quality improvement. For another example, a single-model image enhancement algorithm may be trained and used, with the quantization parameter map serving as additional input information to the network; guided by the quantization parameter map, the network can output images with a better enhancement effect.
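The per-range model selection can be sketched as follows; the range bounds and model names are hypothetical placeholders for trained enhancement networks:

```python
def select_enhancement_model(qp_value, range_bounds, models):
    """Pick one of B enhancement models according to the distortion range
    the region's QP value falls into; range_bounds is a sorted list of
    B - 1 upper bounds partitioning the QP axis into B ranges."""
    for bound, model in zip(range_bounds, models):
        if qp_value <= bound:
            return model
    return models[-1]  # QP above the last bound: strongest model

# B = 3 distortion ranges: QP <= 25, 25 < QP <= 35, QP > 35.
range_bounds = [25, 35]
models = ["model_low", "model_mid", "model_high"]
print(select_enhancement_model(23, range_bounds, models))  # model_low
print(select_enhancement_model(40, range_bounds, models))  # model_high
```

In practice the decoding end would apply this lookup per region of the quantization parameter map, then run the selected model on the corresponding pixels.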
In the embodiment of the application, the decoding end can process the reconstructed image or a preset region of the reconstructed image according to the reconstructed map of the fidelity map so as to improve its image quality, or determine whether to apply the reconstructed image according to the reconstructed map of the fidelity map, thereby facilitating the application of the reconstructed image.
Fig. 16 is a schematic block diagram of an encoding apparatus provided in an embodiment of the present application; the encoding device comprises a video encoder and a fidelity map encoder, wherein:
The video encoder is used for encoding an original image to obtain a first code stream;
The fidelity map encoder is used for encoding a fidelity map to obtain a second code stream, where the fidelity map is used for representing distortion between at least a partial region of the original image and at least a partial region of a reconstructed image, and the reconstructed image is obtained by decoding the first code stream.
The compressed code stream in fig. 16 is a collective term for the code stream transmitted from the encoding end to the decoding end; the compressed code stream includes the first code stream and the second code stream.
In one possible design, the encoding apparatus further includes a fidelity map calculator configured to: divide the original image into a plurality of first image blocks and divide the reconstructed image into a plurality of second image blocks, where the division strategy used for the original image is the same as that used for the reconstructed image, and the plurality of first image blocks are in one-to-one correspondence with the plurality of second image blocks; or divide a preset region of the original image into a plurality of first image blocks and divide a preset region of the reconstructed image into a plurality of second image blocks, where the division strategy used for the preset region of the original image is the same as that used for the preset region of the reconstructed image, and the plurality of first image blocks are in one-to-one correspondence with the plurality of second image blocks; and calculate a fidelity value of any second image block according to that second image block and its corresponding first image block, where the fidelity map includes the fidelity value of the second image block, and the fidelity value is used for representing distortion between the second image block and its corresponding first image block.
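The fidelity map calculation can be sketched as follows, using mean squared error as an example fidelity metric (the application does not fix a particular metric) and a uniform block grid as the shared division strategy:

```python
def fidelity_map_mse(original, reconstructed, block):
    """Build a block-wise fidelity map: both images (2-D lists of samples
    of equal size) are divided with the same strategy, a uniform
    block x block grid, and each map element is the mean squared error
    between a first image block and the corresponding second image block."""
    h, w = len(original), len(original[0])
    fmap = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            diffs = [(original[y][x] - reconstructed[y][x]) ** 2
                     for y in range(by, by + block)
                     for x in range(bx, bx + block)]
            row.append(sum(diffs) / len(diffs))
        fmap.append(row)
    return fmap

orig = [[10, 10], [10, 10]]
recon = [[10, 12], [10, 10]]
print(fidelity_map_mse(orig, recon, block=2))  # [[1.0]]
```

The one-to-one correspondence between blocks and map elements falls out of iterating both images on the same grid.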
In one possible design, the fidelity map includes a plurality of first elements, and the plurality of second image blocks are in one-to-one correspondence with the plurality of first elements. The value of any first element is the fidelity value of the second image block corresponding to that first element, and the position of the first element in the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of its corresponding second image block in a preset region of the reconstructed image.
In one possible design, the second image block includes three color components, and the fidelity map is a three-dimensional array with three dimensions, namely color component, width, and height. The two-dimensional array under any color component A in the fidelity map includes a plurality of first elements, and the value of any first element is the fidelity value of color component A of the second image block corresponding to that first element. The position of the first element in the two-dimensional array under color component A of the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of its corresponding second image block in a preset region of the reconstructed image.
In one possible design, the fidelity map encoder is specifically configured to: perform entropy coding on any first element to obtain the second code stream, where the entropy coding of the first element is independent of the entropy coding of the other first elements; or determine a probability distribution of the value of any first element, or a predicted value of the first element, according to the value of at least one already encoded first element, and perform entropy coding on the first element according to that probability distribution or predicted value to obtain the second code stream; where the second code stream includes the code streams of the plurality of first elements.
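The prediction-based alternative can be sketched as follows. Each first element is predicted from the previously encoded element, and only the residual would be entropy coded (smaller residuals yield shorter codewords); the entropy coder itself, e.g. an arithmetic coder, is omitted here, and all names are hypothetical:

```python
def predict_and_residuals(elements):
    """Encoder side: predict each first element from the previously
    encoded one (the first element has no predictor, so predictor 0 is
    assumed) and emit the residuals to be entropy coded."""
    residuals, prev = [], 0
    for e in elements:
        residuals.append(e - prev)  # residual w.r.t. the predicted value
        prev = e                    # predictor for the next element
    return residuals

def reconstruct(residuals):
    """Decoder side: invert the prediction to recover the elements."""
    elements, prev = [], 0
    for r in residuals:
        prev += r
        elements.append(prev)
    return elements

vals = [30, 31, 31, 28]
res = predict_and_residuals(vals)
print(res)                       # [30, 1, 0, -3]
print(reconstruct(res) == vals)  # True
```

Neighboring fidelity values tend to be similar, which is what makes the residuals small and the prediction worthwhile.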
In one possible design, the fidelity map encoder is specifically configured to: quantize any first element to obtain a quantized first element, and encode the quantized first element to obtain the second code stream, where the second code stream includes the code streams of the plurality of first elements.
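The quantization step and its inverse (applied at the decoding end, as described for fig. 17 below) can be sketched as a simple uniform quantizer; the step size is a hypothetical choice, and real codecs may use non-uniform quantizers:

```python
def quantize(value, step):
    """Encoder side: map a fidelity value to an integer level."""
    return round(value / step)

def dequantize(level, step):
    """Decoder side (inverse quantization): reconstruct the fidelity
    value; the result differs from the input by at most step / 2."""
    return level * step

step = 0.5
level = quantize(3.7, step)
print(level)                    # 7
print(dequantize(level, step))  # 3.5
```

The reconstructed fidelity value 3.5 differs from the original 3.7 by 0.2, within the step/2 = 0.25 bound, which is the coding distortion of the lossy (quantized) fidelity map mentioned elsewhere in this application.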
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in fig. 7.
In the encoding apparatus depicted in fig. 16, the original image is encoded to obtain a first code stream, and the fidelity map, which is used for representing distortion between at least a partial region of the original image and at least a partial region of the reconstructed image, is encoded to obtain a second code stream. The decoding end decodes the first code stream to obtain a reconstructed image of the original image and decodes the second code stream to obtain a reconstructed map of the fidelity map. If the encoding is lossless, the reconstructed map of the fidelity map is identical to the fidelity map; if the encoding is lossy, the reconstructed map of the fidelity map includes the coding distortion generated by encoding the fidelity map. The reconstructed map of the fidelity map may be used to represent distortion between at least a partial region of the original image and at least a partial region of the reconstructed image; therefore, the embodiment of the application can obtain the distortion intensity information of the coded image at the decoding end.
Fig. 17 is a schematic block diagram of a decoding apparatus provided in an embodiment of the present application. The decoding apparatus includes a video decoder and a fidelity map decoder, wherein:
The video decoder is used for decoding a first code stream to obtain a reconstructed image of an original image; and
The fidelity map decoder is used for decoding a second code stream to obtain a reconstructed map of a fidelity map, where the second code stream is obtained by encoding the fidelity map, and the reconstructed map of the fidelity map is used for representing distortion between at least a partial region of the original image and at least a partial region of the reconstructed image.
The compressed code stream in fig. 17 is a collective term for the code stream transmitted from the encoding end to the decoding end; the compressed code stream includes the first code stream and the second code stream.
In one possible design, the fidelity map includes a fidelity value of any second image block of a plurality of second image blocks, and the fidelity value of the second image block is used to represent distortion between that second image block and its corresponding original image block.
In one possible design, the fidelity map includes a plurality of first elements, and the plurality of second image blocks are in one-to-one correspondence with the plurality of first elements. The value of any first element is the fidelity value of the second image block corresponding to that first element, and the position of the first element in the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of its corresponding second image block in a preset region of the reconstructed image.
In one possible design, the second image block includes three color components, and the fidelity map is a three-dimensional array with three dimensions, namely color component, width, and height. The two-dimensional array under any color component A in the fidelity map includes a plurality of first elements, and the value of any first element is the fidelity value of color component A of the second image block corresponding to that first element. The position of the first element in the two-dimensional array under color component A of the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of its corresponding second image block in a preset region of the reconstructed image.
In one possible design, the fidelity map decoder is specifically configured to: decode the second code stream to obtain a reconstructed fidelity value of any first element; and obtain the reconstructed map of the fidelity map according to the reconstructed fidelity value of each first element.
In one possible design, the second code stream is obtained by encoding quantized first elements; the fidelity map decoder is specifically configured to: decode the second code stream to obtain a reconstructed fidelity value of a quantized first element; perform inverse quantization on the reconstructed fidelity value of the quantized first element to obtain a reconstructed fidelity value of the first element; and obtain the reconstructed map of the fidelity map according to the reconstructed fidelity values of the first elements.
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in fig. 11.
In the decoding apparatus depicted in fig. 17, a reconstructed image of an original image can be obtained by decoding a first code stream, and a reconstructed map of a fidelity map can be obtained by decoding a second code stream, where the first code stream is obtained by encoding the original image, the second code stream is obtained by encoding the fidelity map, and the fidelity map is used to represent distortion between at least a partial region of the original image and at least a partial region of the reconstructed image. If the encoding is lossless, the reconstructed map of the fidelity map is identical to the fidelity map; if the encoding is lossy, the reconstructed map of the fidelity map includes the coding distortion generated by encoding the fidelity map. The reconstructed map of the fidelity map may be used to represent distortion between at least a partial region of the original image and at least a partial region of the reconstructed image; therefore, the embodiment of the application can obtain the distortion intensity information of the coded image at the decoding apparatus.
Fig. 18 is a schematic block diagram of a decoding apparatus provided in an embodiment of the present application. The decoding apparatus includes a video decoder and a quantization parameter map builder, wherein:
the video decoder is used for decoding the first code stream to obtain a reconstructed image of an original image and target quantization parameter information, wherein the target quantization parameter information comprises quantization parameter values of all or part of second image blocks in a plurality of second image blocks of the reconstructed image;
The quantization parameter map builder is used for building a quantization parameter map of the reconstructed image according to the target quantization parameter information, where the quantization parameter map of the reconstructed image is used for representing distortion between at least a partial region of the original image and at least a partial region of the reconstructed image.
The compressed code stream in fig. 18 is a collective term for the code stream transmitted from the encoding end to the decoding end; the compressed code stream includes the first code stream.
In one possible design, the second image block is a coding unit.
In one possible design, the quantization parameter map of the reconstructed image includes a plurality of second elements, and the plurality of second image blocks are in one-to-one correspondence with the plurality of second elements. The value of any second element is the quantization parameter value of the second image block corresponding to that second element, and the position of the second element in the quantization parameter map of the reconstructed image is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of its corresponding second image block in a preset region of the reconstructed image.
In one possible design, the second image block includes three color components, and the quantization parameter map of the reconstructed image is a three-dimensional array with three dimensions, namely color component, width, and height. The two-dimensional array under any color component A in the quantization parameter map includes a plurality of second elements, and the value of any second element is the quantization parameter value of color component A of the second image block corresponding to that second element. The position of the second element in the two-dimensional array under color component A of the quantization parameter map is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of its corresponding second image block in a preset region of the reconstructed image.
In one possible design, the quantization parameter map builder is specifically configured to: when the target quantization parameter information includes the quantization parameter values of some of the plurality of coding units, obtain the quantization parameter value of a target coding unit according to the quantization parameter values of those partial coding units and/or a reference quantization parameter map, where the reference quantization parameter map is the quantization parameter map of a reference image of the reconstructed image, and the target coding unit is a coding unit of the plurality of coding units other than the partial coding units; and obtain the quantization parameter map of the reconstructed image according to the quantization parameter values of the partial coding units and the quantization parameter value of the target coding unit.
In one possible design, the reference quantization parameter map includes a plurality of reference elements, and the value of any reference element is the quantization parameter value of a coding unit in the reference image. The quantization parameter map builder is specifically configured to: use the value of a target element as the quantization parameter value of any target coding unit, where the target element is a reference element in the reference quantization parameter map, and the position of the target element in the reference quantization parameter map is determined according to the position of the target coding unit in the reconstructed image, or according to the position of the target coding unit in the reconstructed image and the motion vector of the target coding unit.
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in fig. 13.
In the decoding apparatus depicted in fig. 18, a quantization parameter value of each region (second image block) of the reconstructed image can be obtained by performing a decoding operation on the first code stream obtained by encoding the original image; the quantization parameter map of the reconstructed image can be constructed according to the quantization parameter values of all or part of the second image blocks in the plurality of second image blocks of the reconstructed image, and the quantization parameter map of the reconstructed image can be used for representing distortion between at least a partial area of the original image and at least a partial area of the reconstructed image.
Fig. 19 is a schematic block diagram of an encoding apparatus according to an embodiment of the present application. The encoding apparatus 1900 is applied to an encoding device and includes a processing unit 1901 and a communication unit 1902. The processing unit 1901 is configured to execute any step in the method embodiment shown in fig. 7, and optionally invokes the communication unit 1902 to complete the corresponding operation when data is transmitted or acquired. The details are described below.
The processing unit 1901 is configured to: encode an original image to obtain a first code stream; and encode a fidelity map to obtain a second code stream, where the fidelity map is used for representing distortion between at least a partial region of the original image and at least a partial region of a reconstructed image, and the reconstructed image is obtained by decoding the first code stream.
In one possible design, the processing unit 1901 is further configured to: divide the original image into a plurality of first image blocks and divide the reconstructed image into a plurality of second image blocks, where the division strategy used for the original image is the same as that used for the reconstructed image, and the plurality of first image blocks are in one-to-one correspondence with the plurality of second image blocks; or divide a preset region of the original image into a plurality of first image blocks and divide a preset region of the reconstructed image into a plurality of second image blocks, where the division strategy used for the preset region of the original image is the same as that used for the preset region of the reconstructed image, and the plurality of first image blocks are in one-to-one correspondence with the plurality of second image blocks; and calculate a fidelity value of any second image block according to that second image block and its corresponding first image block, where the fidelity map includes the fidelity value of the second image block, and the fidelity value is used for representing distortion between the second image block and its corresponding first image block.
In one possible design, the fidelity map includes a plurality of first elements, and the plurality of second image blocks are in one-to-one correspondence with the plurality of first elements. The value of any first element is the fidelity value of the second image block corresponding to that first element, and the position of the first element in the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of its corresponding second image block in a preset region of the reconstructed image.
In one possible design, the second image block includes three color components, and the fidelity map is a three-dimensional array with three dimensions, namely color component, width, and height. The two-dimensional array under any color component A in the fidelity map includes a plurality of first elements, and the value of any first element is the fidelity value of color component A of the second image block corresponding to that first element. The position of the first element in the two-dimensional array under color component A of the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of its corresponding second image block in a preset region of the reconstructed image.
In one possible design, the processing unit 1901 is specifically configured to: perform entropy coding on any first element to obtain the second code stream, where the entropy coding of the first element is independent of the entropy coding of the other first elements; or determine a probability distribution of the value of any first element, or a predicted value of the first element, according to the value of at least one already encoded first element, and perform entropy coding on the first element according to that probability distribution or predicted value to obtain the second code stream; where the second code stream includes the code streams of the plurality of first elements.
In one possible design, the processing unit 1901 is specifically configured to: quantize any first element to obtain a quantized first element, and encode the quantized first element to obtain the second code stream, where the second code stream includes the code streams of the plurality of first elements.
The encoding apparatus 1900 may further include a storage unit 1903 for storing program codes and data of the encoding device, among others. The processing unit 1901 may be a processor, the communication unit 1902 may be a transceiver, and the storage unit 1903 may be a memory.
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in fig. 7.
In the encoding apparatus depicted in fig. 19, an original image is encoded to obtain a first code stream, and a fidelity map representing distortion between at least a partial region of the original image and at least a partial region of a reconstructed image is encoded to obtain a second code stream. The decoding end decodes the first code stream to obtain a reconstructed image of the original image and decodes the second code stream to obtain a reconstructed map of the fidelity map. If the encoding is lossless, the reconstructed map of the fidelity map is identical to the fidelity map; if the encoding is lossy, the reconstructed map of the fidelity map includes the coding distortion generated by encoding the fidelity map. The reconstructed map of the fidelity map can be used to represent distortion between at least a partial region of the original image and at least a partial region of the reconstructed image; therefore, the embodiment of the application can obtain the distortion intensity information of the coded image at the decoding end.
Fig. 20 is a schematic block diagram of a decoding apparatus according to an embodiment of the present application. The decoding apparatus 2000 is applied to a decoding device and includes a processing unit 2001 and a communication unit 2002. The processing unit 2001 is configured to execute any step in the method embodiment shown in fig. 11, and optionally invokes the communication unit 2002 to complete the corresponding operation when data is transmitted or acquired. The details are described below.
The processing unit 2001 is configured to: decoding the first code stream to obtain a reconstructed image of the original image; and decoding a second code stream to obtain a reconstructed image of a fidelity map, wherein the second code stream is obtained by encoding the fidelity map, and the reconstructed image of the fidelity map is used for representing distortion between at least partial areas of the original image and at least partial areas of the reconstructed image.
In one possible design, the fidelity map includes a fidelity value of any second image block of a plurality of second image blocks, and the fidelity value of the second image block is used to represent distortion between that second image block and its corresponding original image block.
In one possible design, the fidelity map includes a plurality of first elements, and the plurality of second image blocks are in one-to-one correspondence with the plurality of first elements. The value of any first element is the fidelity value of the second image block corresponding to that first element, and the position of the first element in the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of its corresponding second image block in a preset region of the reconstructed image.
In one possible design, the second image block includes three color components, and the fidelity map is a three-dimensional array with three dimensions, namely color component, width, and height. The two-dimensional array under any color component A in the fidelity map includes a plurality of first elements, and the value of any first element is the fidelity value of color component A of the second image block corresponding to that first element. The position of the first element in the two-dimensional array under color component A of the fidelity map is determined according to the position of its corresponding second image block in the reconstructed image, or according to the position of its corresponding second image block in a preset region of the reconstructed image.
In one possible design, the processing unit 2001 is specifically configured to: decode the second code stream to obtain a reconstructed fidelity value of any first element; and obtain the reconstructed map of the fidelity map according to the reconstructed fidelity value of each first element.
In one possible design, the second code stream is obtained by encoding quantized first elements; the processing unit 2001 is specifically configured to: decode the second code stream to obtain a reconstructed fidelity value of a quantized first element; perform inverse quantization on the reconstructed fidelity value of the quantized first element to obtain a reconstructed fidelity value of the first element; and obtain the reconstructed map of the fidelity map according to the reconstructed fidelity values of the first elements.
The decoding apparatus 2000 may further include a storage unit 2003 for storing program code and data of the decoding device. The processing unit 2001 may be a processor, the communication unit 2002 may be a transceiver, and the storage unit 2003 may be a memory.
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in fig. 11.
In the decoding apparatus depicted in fig. 20, a reconstructed image of an original image can be obtained by decoding a first code stream, and a reconstructed image of a fidelity map can be obtained by decoding a second code stream, where the first code stream is obtained by encoding the original image, the second code stream is obtained by encoding the fidelity map, and the fidelity map is used to represent distortion between at least a partial region of the original image and at least a partial region of the reconstructed image. If the encoding is lossless encoding, the reconstructed image of the fidelity map is the same as the fidelity map; if the encoding is lossy encoding, the reconstructed image of the fidelity map includes encoding distortion generated by encoding the fidelity map. In either case, the reconstructed image of the fidelity map may be used to represent distortion between at least a partial region of the original image and at least a partial region of the reconstructed image; therefore, the embodiment of the application can obtain the distortion strength information of the coded image at the decoding device.
Fig. 21 is a schematic block diagram of a decoding apparatus according to an embodiment of the present application. The decoding apparatus 2100 is applied to a decoding device, and the decoding apparatus 2100 comprises a processing unit 2101 and a communication unit 2102, wherein the processing unit 2101 is configured to execute any step in the method embodiment shown in fig. 13, and when performing data transmission such as acquisition, the communication unit 2102 is optionally invoked to complete the corresponding operation. The details are described below.
The processing unit 2101 is configured to: decoding the first code stream to obtain a reconstructed image of an original image and target quantization parameter information, wherein the target quantization parameter information comprises quantization parameter values of all or part of second image blocks in a plurality of second image blocks of the reconstructed image; and constructing a quantization parameter map of the reconstructed image according to the target quantization parameter information, wherein the quantization parameter map of the reconstructed image is used for representing distortion between at least partial areas of the original image and at least partial areas of the reconstructed image.
In one possible design, the second image block is a coding unit.
In a possible design, the quantization parameter map of the reconstructed image includes a plurality of second elements, the plurality of second image blocks are in one-to-one correspondence with the plurality of second elements, a value of any second element in the plurality of second elements is a quantization parameter value of a second image block corresponding to the any second element, a position of the any second element in the quantization parameter map of the reconstructed image is determined according to a position of the second image block corresponding to the any second element in the reconstructed image, or a position of the any second element in the quantization parameter map of the reconstructed image is determined according to a position of the second image block corresponding to the any second element in a preset area of the reconstructed image.
In one possible design, the second image block includes three color components, the quantization parameter map of the reconstructed image is a three-dimensional array including three dimensions, namely color component, width and height, the two-dimensional array under any color component A in the quantization parameter map of the reconstructed image includes a plurality of second elements, a value of any second element in the plurality of second elements is a quantization parameter value of the color component A of the second image block corresponding to the any second element, a position of any second element in the two-dimensional array under any color component A in the quantization parameter map of the reconstructed image is determined according to a position of the second image block corresponding to the any second element in the reconstructed image, or a position of any second element in the two-dimensional array under any color component A in the quantization parameter map of the reconstructed image is determined according to a position of the second image block corresponding to the any second element in a preset region of the reconstructed image.
In one possible design, the processing unit 2101 is specifically configured to: when the target quantization parameter information includes quantization parameter values of a part of the plurality of coding units, obtaining a quantization parameter value of a target coding unit according to the quantization parameter values of the part of the coding units and/or a reference quantization parameter map, wherein the reference quantization parameter map is a quantization parameter map of a reference image of the reconstructed image, and the target coding unit is a coding unit of the plurality of coding units except the part of the coding units; and obtaining a quantization parameter map of the reconstructed image according to the quantization parameter values of the partial coding units and the quantization parameter values of the target coding unit.
In one possible design, the reference quantization parameter map includes a plurality of reference elements, and a value of any one of the plurality of reference elements is a quantization parameter value of a coding unit in the reference image; the processing unit 2101 is specifically configured to: and taking the value of a target element as a quantization parameter value of any target coding unit in the target coding units, wherein the target element is a reference element in the reference quantization parameter map, and the position of the target element in the reference quantization parameter map is determined according to the position of the any target coding unit in the reconstructed image, or the position of the target element in the reference quantization parameter map is determined according to the position of the any target coding unit in the reconstructed image and the motion vector of the any target coding unit.
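The construction of a quantization parameter map from partial QP information and a reference QP map can be sketched as follows. This is an illustrative sketch only: the dictionary-based representation and the convention that motion vectors are given in element units are assumptions made for this example, not the patent's specified data layout.

```python
def fill_qp_map(partial_qp, ref_qp_map, motion_vectors, rows, cols):
    """Construct a QP map when the bitstream carries QP values for only
    some coding units.

    `partial_qp` maps (row, col) -> decoded QP value for the coding
    units whose QP was signalled; `ref_qp_map` is the QP map of the
    reference image; `motion_vectors` maps (row, col) -> (drow, dcol)
    in element units (an assumption for this sketch).  Missing entries
    are copied from the motion-compensated element of the reference QP
    map, as described above.
    """
    qp_map = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if (r, c) in partial_qp:
                qp_map[r][c] = partial_qp[(r, c)]
            else:
                dr, dc = motion_vectors.get((r, c), (0, 0))
                # clamp the motion-compensated position to the map bounds
                rr = min(max(r + dr, 0), rows - 1)
                cc = min(max(c + dc, 0), cols - 1)
                qp_map[r][c] = ref_qp_map[rr][cc]
    return qp_map
```

A target coding unit without a motion vector falls back to the co-located reference element, matching the first alternative in the design above (position determined by the coding unit's own position in the reconstructed image).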
The decoding apparatus 2100 may further include a storage unit 2103 for storing program code and data of the decoding device. The processing unit 2101 may be a processor, the communication unit 2102 may be a transceiver, and the storage unit 2103 may be a memory.
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in fig. 13.
In the decoding apparatus depicted in fig. 21, the quantization parameter values of each region (second image block) of the reconstructed image can be obtained by performing a decoding operation on the first code stream obtained by encoding the original image; the quantization parameter map of the reconstructed image can be constructed according to the quantization parameter values of all or part of the second image blocks in the plurality of second image blocks of the reconstructed image, and the quantization parameter map of the reconstructed image can be used for representing distortion between at least a partial area of the original image and at least a partial area of the reconstructed image.
An embodiment of the present application provides an apparatus for encoding a video stream, comprising a processor and a memory. The memory stores instructions that cause the processor to perform the method shown in fig. 7.
An embodiment of the present application provides an apparatus for decoding a video stream, comprising a processor and a memory. The memory stores instructions that cause the processor to perform the method shown in fig. 11.
An embodiment of the present application provides an apparatus for decoding a video stream, comprising a processor and a memory. The memory stores instructions that cause the processor to perform the method shown in fig. 13.
Embodiments of the present application provide a computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors to encode or decode video data. The instructions cause the one or more processors to perform the method illustrated in fig. 7, 11, or 13.
Embodiments of the present application provide a computer program product comprising program code which, when run, performs the method illustrated in fig. 7, fig. 11, or fig. 13.
An embodiment of the present application provides an encoder (20) comprising processing circuitry for performing the method shown in fig. 7.
An embodiment of the present application provides a decoder (30) comprising processing circuitry for performing the method shown in fig. 11 or fig. 13.
An embodiment of the present application provides an encoder, including: one or more processors; a non-transitory computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, causes the encoder to perform the method of fig. 7.
An embodiment of the present application provides a decoder, including: one or more processors; a non-transitory computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, causes the decoder to perform the method shown in fig. 11 or fig. 13.
Embodiments of the present application provide a non-transitory computer-readable storage medium including program code for performing the method illustrated in fig. 7, 11 or 13 when executed by a computer device.
An embodiment of the present application provides a non-transitory storage medium including a bitstream encoded according to the method shown in fig. 7.
Those of skill in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps described in the disclosure herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions described in the various illustrative logical blocks, modules, and steps may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium, such as a data storage medium, or any communication medium including a medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described herein. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this application to emphasize functional aspects of means for performing the disclosed techniques, but do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit, in conjunction with suitable software and/or firmware, or provided by an interoperating hardware unit (including one or more processors as described above).
The above description is only an exemplary embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (42)

1. An encoding method, comprising:
encoding an original image to obtain a first code stream;
and encoding a fidelity map to obtain a second code stream, wherein the fidelity map is used for representing distortion between at least a partial region of the original image and at least a partial region of a reconstructed image, and the reconstructed image is obtained by decoding the first code stream.
2. The method of claim 1, further comprising:
dividing the original image into a plurality of first image blocks and dividing the reconstructed image into a plurality of second image blocks, wherein the dividing strategy for dividing the original image is the same as the dividing strategy for dividing the reconstructed image, and the plurality of first image blocks and the plurality of second image blocks are in one-to-one correspondence;
or dividing the preset area of the original image into a plurality of first image blocks and dividing the preset area of the reconstructed image into a plurality of second image blocks, wherein the dividing strategy for dividing the preset area of the original image is the same as the dividing strategy for dividing the preset area of the reconstructed image, and the plurality of first image blocks and the plurality of second image blocks are in one-to-one correspondence;
and calculating a fidelity value of any second image block according to any second image block in the plurality of second image blocks and the first image block corresponding to the any second image block, wherein the fidelity map comprises the fidelity value of the any second image block, and the fidelity value of the any second image block is used for representing distortion between the any second image block and the first image block corresponding to the any second image block.
3. The method according to claim 2, wherein the fidelity map includes a plurality of first elements, the plurality of second tiles correspond to the plurality of first elements one-to-one, a value of any first element in the plurality of first elements is a fidelity value of a second tile corresponding to the any first element, a position of the any first element in the fidelity map is determined according to a position of the second tile corresponding to the any first element in the reconstructed image, or a position of the any first element in the fidelity map is determined according to a position of the second tile corresponding to the any first element in a preset area of the reconstructed image.
4. The method according to claim 2, wherein the second image block comprises three color components, the fidelity map is a three-dimensional array comprising three dimensions, namely color component, width and height, the two-dimensional array under any color component A in the fidelity map comprises a plurality of first elements, the value of any first element in the plurality of first elements is the fidelity value of the color component A of the second image block corresponding to the any first element, the position of any first element in the two-dimensional array under any color component A in the fidelity map is determined according to the position of the second image block corresponding to the any first element in the reconstructed image, or the position of any first element in the two-dimensional array under any color component A in the fidelity map is determined according to the position of the second image block corresponding to the any first element in a preset area of the reconstructed image.
5. The method of claim 3 or 4, wherein encoding the fidelity map to obtain the second code stream comprises:
entropy encoding said any first element to obtain said second code stream, said entropy encoding of said any first element being independent of entropy encoding of other first elements; or
determining a probability distribution of a value of any one first element or a predicted value of any one first element according to a value of at least one first element in the encoded first elements, and performing entropy encoding on any one first element according to the probability distribution of the value of any one first element or the predicted value of any one first element to obtain the second code stream;
wherein the second code stream comprises a code stream of the plurality of first elements.
6. The method of claim 3 or 4, wherein encoding the fidelity map to obtain the second code stream comprises:
quantizing any first element to obtain a quantized first element;
encoding the quantized first element to obtain the second code stream;
wherein the second code stream comprises a code stream of the plurality of first elements.
7. A method of decoding, comprising:
decoding the first code stream to obtain a reconstructed image of the original image;
and decoding the second code stream to obtain a reconstructed image of a fidelity map, wherein the second code stream is obtained by encoding the fidelity map, and the reconstructed image of the fidelity map is used for representing distortion between at least partial areas of the original image and at least partial areas of the reconstructed image.
8. The method of claim 7,
the fidelity map comprises a fidelity value of any second image block in the plurality of second image blocks, wherein the fidelity value of any second image block is used for representing distortion between any second image block and an original image block corresponding to any second image block.
9. The method according to claim 8, wherein the fidelity map includes a plurality of first elements, the plurality of second tiles correspond to the plurality of first elements one-to-one, a value of any first element in the plurality of first elements is a fidelity value of a second tile corresponding to the any first element, a position of the any first element in the fidelity map is determined according to a position of the second tile corresponding to the any first element in the reconstructed image, or a position of the any first element in the fidelity map is determined according to a position of the second tile corresponding to the any first element in a preset area of the reconstructed image.
10. The method according to claim 8, wherein the second image block comprises three color components, the fidelity map is a three-dimensional array comprising three dimensions, namely color component, width and height, the two-dimensional array under any color component A in the fidelity map comprises a plurality of first elements, the value of any first element in the plurality of first elements is the fidelity value of the color component A of the second image block corresponding to the any first element, the position of any first element in the two-dimensional array under any color component A in the fidelity map is determined according to the position of the second image block corresponding to the any first element in the reconstructed image, or the position of any first element in the two-dimensional array under any color component A in the fidelity map is determined according to the position of the second image block corresponding to the any first element in a preset area of the reconstructed image.
11. The method of claim 9 or 10, wherein decoding the second code stream to obtain the reconstructed picture of the fidelity map comprises:
decoding the second code stream to obtain a reconstruction fidelity value of any first element;
and obtaining a reconstructed image of the fidelity map according to the reconstructed fidelity value of any first element.
12. The method according to claim 9 or 10, wherein the second code stream is obtained by encoding the quantized first element; the decoding the second code stream to obtain a reconstructed image of the fidelity map comprises:
decoding the second code stream to obtain a reconstructed fidelity value of the quantized first element;
performing inverse quantization on the quantized reconstruction fidelity value of the first element to obtain a reconstruction fidelity value of any one first element;
and obtaining a reconstructed image of the fidelity map according to the reconstructed fidelity value of any first element.
13. A method of decoding, comprising:
decoding the first code stream to obtain a reconstructed image of an original image and target quantization parameter information, wherein the target quantization parameter information comprises quantization parameter values of all or part of second image blocks in a plurality of second image blocks of the reconstructed image;
and constructing a quantization parameter map of the reconstructed image according to the target quantization parameter information, wherein the quantization parameter map of the reconstructed image is used for representing distortion between at least partial areas of the original image and at least partial areas of the reconstructed image.
14. The method of claim 13, wherein the second image block is a coding unit.
15. The method according to claim 13 or 14, wherein the quantization parameter map of the reconstructed image includes a plurality of second elements, the plurality of second image blocks correspond to the plurality of second elements in a one-to-one manner, a value of any second element in the plurality of second elements is a quantization parameter value of a second image block corresponding to the any second element, a position of the any second element in the quantization parameter map of the reconstructed image is determined according to a position of a second image block corresponding to the any second element in the reconstructed image, or a position of the any second element in the quantization parameter map of the reconstructed image is determined according to a position of a second image block corresponding to the any second element in a preset area of the reconstructed image.
16. The method according to claim 13 or 14, wherein the second image block comprises three color components, the quantization parameter map of the reconstructed image is a three-dimensional array comprising three dimensions, namely color component, width and height, the two-dimensional array under any color component A in the quantization parameter map of the reconstructed image comprises a plurality of second elements, the value of any second element in the plurality of second elements is the quantization parameter value of the color component A of the second image block corresponding to the any second element, the position of any second element in the two-dimensional array under any color component A in the quantization parameter map of the reconstructed image is determined according to the position of the second image block corresponding to the any second element in the reconstructed image, or the position of any second element in the two-dimensional array under any color component A in the quantization parameter map of the reconstructed image is determined according to the position of the second image block corresponding to the any second element in a preset area of the reconstructed image.
17. The method according to any one of claims 14-16, wherein said constructing a quantization parameter map of the reconstructed image from the target quantization parameter information comprises:
When the target quantization parameter information includes quantization parameter values of a part of the plurality of coding units, obtaining a quantization parameter value of a target coding unit according to the quantization parameter values of the part of the coding units and/or a reference quantization parameter map, wherein the reference quantization parameter map is a quantization parameter map of a reference image of the reconstructed image, and the target coding unit is a coding unit of the plurality of coding units except the part of the coding units;
and obtaining a quantization parameter map of the reconstructed image according to the quantization parameter values of the partial coding units and the quantization parameter values of the target coding unit.
18. The method of claim 17, wherein the reference quantization parameter map comprises a plurality of reference elements, and wherein a value of any one of the plurality of reference elements is a quantization parameter value of a coding unit in the reference picture; the obtaining a quantization parameter value of a target coding unit according to a reference quantization parameter map includes:
and taking the value of a target element as a quantization parameter value of any target coding unit in the target coding units, wherein the target element is a reference element in the reference quantization parameter map, and the position of the target element in the reference quantization parameter map is determined according to the position of the any target coding unit in the reconstructed image, or the position of the target element in the reference quantization parameter map is determined according to the position of the any target coding unit in the reconstructed image and the motion vector of the any target coding unit.
19. An encoding device, characterized by comprising:
the video encoder is used for encoding an original image to obtain a first code stream;
and the fidelity map encoder is used for encoding the fidelity map to obtain a second code stream, wherein the fidelity map is used for representing distortion between at least partial area of the original image and at least partial area of a reconstructed image, and the reconstructed image is obtained after decoding the first code stream.
20. The encoding apparatus of claim 19, further comprising a fidelity map calculator, the fidelity map calculator being configured to:
dividing the original image into a plurality of first image blocks and dividing the reconstructed image into a plurality of second image blocks, wherein the dividing strategy for dividing the original image is the same as the dividing strategy for dividing the reconstructed image, and the plurality of first image blocks and the plurality of second image blocks are in one-to-one correspondence;
or dividing the preset area of the original image into a plurality of first image blocks and dividing the preset area of the reconstructed image into a plurality of second image blocks, wherein the dividing strategy for dividing the preset area of the original image is the same as the dividing strategy for dividing the preset area of the reconstructed image, and the plurality of first image blocks and the plurality of second image blocks are in one-to-one correspondence;
and calculating a fidelity value of any second image block according to any second image block in the plurality of second image blocks and the first image block corresponding to the any second image block, wherein the fidelity map comprises the fidelity value of the any second image block, and the fidelity value of the any second image block is used for representing distortion between the any second image block and the first image block corresponding to the any second image block.
21. The encoding apparatus according to claim 20, wherein the fidelity map includes a plurality of first elements, the plurality of second image blocks correspond to the plurality of first elements one-to-one, a value of any first element in the plurality of first elements is a fidelity value of a second image block corresponding to the any first element, a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in the reconstructed image, or a position of the any first element in the fidelity map is determined according to a position of the second image block corresponding to the any first element in a preset region of the reconstructed image.
22. The encoding device according to claim 20, wherein the second image block comprises three color components, the fidelity map is a three-dimensional array comprising three dimensions, namely color component, width and height, the two-dimensional array under any color component A in the fidelity map comprises a plurality of first elements, the value of any first element in the plurality of first elements is the fidelity value of the color component A of the second image block corresponding to the any first element, the position of any first element in the two-dimensional array under any color component A in the fidelity map is determined according to the position of the second image block corresponding to the any first element in the reconstructed image, or the position of any first element in the two-dimensional array under any color component A in the fidelity map is determined according to the position of the second image block corresponding to the any first element in a preset area of the reconstructed image.
23. The encoding device according to claim 21 or 22, characterized in that the fidelity map encoder is specifically configured to:
performing entropy encoding on said any first element to obtain the second code stream, the entropy encoding of said any first element being independent of the entropy encoding of the other first elements; or
determining a probability distribution of the value of said any first element, or a predicted value of said any first element, according to the value of at least one already-encoded first element, and performing entropy encoding on said any first element according to said probability distribution or said predicted value to obtain the second code stream;
wherein the second code stream comprises a code stream of the plurality of first elements.
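The second branch of claim 23 (prediction from already-encoded first elements) can be sketched as follows; the left-neighbor predictor and the scan order are assumptions, since the claim only requires that the prediction depend on at least one encoded first element, and the entropy coder actually applied to the residuals is left out here.

```python
def predict_and_residual(elements):
    """Encoder side of the second branch of claim 23: each first element is
    predicted from the previously encoded element (left-neighbor predictor,
    an assumed choice), and only the residual would be entropy-encoded."""
    residuals, prev = [], 0
    for e in elements:
        residuals.append(e - prev)  # residual passed to the entropy coder
        prev = e
    return residuals

def reconstruct_elements(residuals):
    """Matching decoder side: accumulate residuals back into element values."""
    out, prev = [], 0
    for r in residuals:
        prev += r
        out.append(prev)
    return out
```

Because each residual depends only on already-encoded elements, the decoder can invert the prediction exactly, which is what makes the round trip lossless.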
24. The encoding device according to claim 21 or 22, wherein the fidelity map encoder is specifically configured to:
quantizing any first element to obtain a quantized first element;
encoding the quantized first element to obtain the second code stream;
wherein the second code stream comprises a code stream of the plurality of first elements.
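Claims 24 and 30 describe a quantize-at-the-encoder / inverse-quantize-at-the-decoder pair for the first elements. A minimal sketch with uniform scalar quantization follows; the uniform quantizer and the step size are assumptions, as the claims do not mandate a particular quantizer.

```python
def quantize(fidelity_value, step=0.5):
    """Claim 24: quantize a first element before encoding.
    Uniform scalar quantization; the step size is an assumed encoder choice."""
    return round(fidelity_value / step)

def dequantize(index, step=0.5):
    """Claim 30: inverse quantization at the decoder, yielding the
    reconstructed fidelity value of the first element."""
    return index * step
```

The round-trip error is bounded by half the step size, which is the precision at which the decoder learns the per-block distortion.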
25. A decoding device, characterized by comprising:
a video decoder, configured to decode a first code stream to obtain a reconstructed image of an original image; and
a fidelity map decoder, configured to decode a second code stream to obtain a reconstructed image of a fidelity map, wherein the second code stream is obtained by encoding the fidelity map, and the reconstructed image of the fidelity map is used for representing the distortion between at least a partial region of the original image and at least a partial region of the reconstructed image.
26. The decoding device according to claim 25, wherein the reconstructed image comprises a plurality of second image blocks, and the fidelity map comprises a fidelity value of each second image block of the plurality of second image blocks, the fidelity value of any second image block being used for representing the distortion between said second image block and the original image block corresponding to said second image block.
27. The decoding device according to claim 26, wherein the fidelity map comprises a plurality of first elements in one-to-one correspondence with the plurality of second image blocks; the value of any first element of the plurality of first elements is the fidelity value of the second image block corresponding to said first element; and the position of said first element in the fidelity map is determined according to the position of the corresponding second image block in the reconstructed image, or according to the position of the corresponding second image block in a preset region of the reconstructed image.
28. The decoding device according to claim 26, wherein each second image block comprises three color components, and the fidelity map is a three-dimensional array with three dimensions: color component, width, and height; the two-dimensional array under any color component A in the fidelity map comprises a plurality of first elements, and the value of any first element of the plurality of first elements is the fidelity value of the color component A of the second image block corresponding to said first element; the position of said first element in the two-dimensional array under color component A is determined according to the position of the corresponding second image block in the reconstructed image, or according to the position of the corresponding second image block in a preset region of the reconstructed image.
29. The decoding device according to claim 27 or 28, wherein the fidelity map decoder is specifically configured to:
decoding the second code stream to obtain a reconstructed fidelity value of said any first element; and
obtaining the reconstructed image of the fidelity map according to the reconstructed fidelity value of said any first element.
30. The decoding device according to claim 27 or 28, wherein the second code stream is obtained by encoding the quantized first elements, and the fidelity map decoder is specifically configured to:
decoding the second code stream to obtain a reconstructed fidelity value of the quantized first element;
performing inverse quantization on the reconstructed fidelity value of the quantized first element to obtain a reconstructed fidelity value of said any first element; and
obtaining the reconstructed image of the fidelity map according to the reconstructed fidelity value of said any first element.
31. A decoding device, characterized by comprising:
a video decoder, configured to decode a first code stream to obtain a reconstructed image of an original image and target quantization parameter information, wherein the target quantization parameter information comprises quantization parameter values of all or some of a plurality of second image blocks of the reconstructed image; and
a quantization parameter map builder, configured to build a quantization parameter map of the reconstructed image according to the target quantization parameter information, wherein the quantization parameter map of the reconstructed image is used for representing the distortion between at least a partial region of the original image and at least a partial region of the reconstructed image.
32. The decoding device according to claim 31, wherein the second image block is a coding unit.
33. The decoding device according to claim 31 or 32, wherein the quantization parameter map of the reconstructed image comprises a plurality of second elements in one-to-one correspondence with the plurality of second image blocks; the value of any second element of the plurality of second elements is the quantization parameter value of the second image block corresponding to said second element; and the position of said second element in the quantization parameter map is determined according to the position of the corresponding second image block in the reconstructed image, or according to the position of the corresponding second image block in a preset region of the reconstructed image.
34. The decoding device according to claim 31 or 32, wherein each second image block comprises three color components, and the quantization parameter map of the reconstructed image is a three-dimensional array with three dimensions: color component, width, and height; the two-dimensional array under any color component A in the quantization parameter map comprises a plurality of second elements, and the value of any second element of the plurality of second elements is the quantization parameter value of the color component A of the second image block corresponding to said second element; the position of said second element in the two-dimensional array under color component A is determined according to the position of the corresponding second image block in the reconstructed image, or according to the position of the corresponding second image block in a preset region of the reconstructed image.
35. The decoding device according to any one of claims 32 to 34, wherein the quantization parameter map builder is specifically configured to:
when the target quantization parameter information comprises quantization parameter values of some of the plurality of coding units, obtaining a quantization parameter value of each target coding unit according to the quantization parameter values of those coding units and/or a reference quantization parameter map, wherein the reference quantization parameter map is the quantization parameter map of a reference image of the reconstructed image, and the target coding units are the coding units of the plurality of coding units other than the coding units whose quantization parameter values are comprised in the target quantization parameter information; and
obtaining the quantization parameter map of the reconstructed image according to the quantization parameter values of said some coding units and the quantization parameter values of the target coding units.
36. The decoding device according to claim 35, wherein the reference quantization parameter map comprises a plurality of reference elements, the value of any reference element of the plurality of reference elements being the quantization parameter value of a coding unit in the reference image; and the quantization parameter map builder is specifically configured to:
taking the value of a target element as the quantization parameter value of any target coding unit of the target coding units, wherein the target element is a reference element in the reference quantization parameter map, and the position of the target element in the reference quantization parameter map is determined according to the position of said target coding unit in the reconstructed image, or according to the position of said target coding unit in the reconstructed image together with the motion vector of said target coding unit.
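Claims 35 and 36 describe filling in unsignalled quantization parameter values from a reference quantization parameter map, optionally displaced by a motion vector. A minimal sketch follows, assuming the map is indexed by whole coding-unit positions and that motion vectors are given in the same units (both assumptions; the claims do not fix the granularity).

```python
def build_qp_map(signalled_qp, ref_qp_map, motion_vectors=None):
    """Claims 35-36: build the quantization parameter map of the reconstructed
    image from partially signalled QP values plus a reference QP map.

    `signalled_qp` maps (row, col) coding-unit positions to QP values;
    `ref_qp_map` is the QP map of the reference image (list of lists);
    `motion_vectors` optionally maps positions to (d_row, d_col) offsets in
    whole coding units (the integer-unit granularity is an assumption).
    """
    rows, cols = len(ref_qp_map), len(ref_qp_map[0])
    qp_map = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if (i, j) in signalled_qp:
                qp_map[i][j] = signalled_qp[(i, j)]       # signalled value (claim 31)
            else:
                di, dj = (motion_vectors or {}).get((i, j), (0, 0))
                ri = min(max(i + di, 0), rows - 1)        # clamp to the reference map
                rj = min(max(j + dj, 0), cols - 1)
                qp_map[i][j] = ref_qp_map[ri][rj]         # reference element (claim 36)
    return qp_map
```

With no motion vector the co-located reference element is used, matching the first alternative of claim 36.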
37. An encoder (20) characterized in that it comprises processing circuitry for performing the method of any one of claims 1-6.
38. A decoder (30) characterized in that it comprises processing circuitry for performing the method of any of claims 7-12 or 13-18.
39. An encoder, comprising:
one or more processors;
a non-transitory computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, causes the encoder to perform the method of any of claims 1-6.
40. A decoder, comprising:
one or more processors;
a non-transitory computer readable storage medium coupled to the processor and storing a program for execution by the processor, wherein the program, when executed by the processor, causes the decoder to perform the method of any of claims 7-12 or 13-18.
41. A non-transitory computer-readable storage medium comprising program code which, when executed by a computer device, causes the computer device to perform the method of any one of claims 1-6, 7-12, or 13-18.
42. A non-transitory storage medium comprising a bitstream encoded according to the method of any one of claims 1-6.
CN202110170984.8A 2021-02-08 2021-02-08 Encoding method, decoding method and related devices Pending CN114913249A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110170984.8A CN114913249A (en) 2021-02-08 2021-02-08 Encoding method, decoding method and related devices
PCT/CN2021/141403 WO2022166462A1 (en) 2021-02-08 2021-12-25 Encoding/decoding method and related device


Publications (1)

Publication Number Publication Date
CN114913249A true CN114913249A (en) 2022-08-16

Family

ID=82740847



Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
WO2024077576A1 (en) * 2022-10-13 2024-04-18 Oppo广东移动通信有限公司 Neural network based loop filter methods, video coding method and apparatus, video decoding method and apparatus, and system


Also Published As

Publication number Publication date
WO2022166462A1 (en) 2022-08-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination