WO2023279968A1 - Method and apparatus for encoding and decoding video image

Info

Publication number: WO2023279968A1
Application number: PCT/CN2022/100578
Authority: WO
Inventors: 杨海涛; 张恋; 傅佳莉; 毛珏; 刘东; 马海川; 李礼; 吴枫
Original assignee: 华为技术有限公司; 中国科学技术大学
Priority date: 2021-07-09
Filing date: 2022-06-22
Publication date: 2023-01-12
Also published as: CN115604486A

Abstract

The present application relates to the technical field of video or image compression based on artificial intelligence (AI), and in particular to the technical field of video compression based on a neural network. Provided are a method and apparatus for encoding and decoding a video image. The encoding method comprises: acquiring a first image, the first image being an image to be encoded or a decoded image (S801); performing probability estimation according to first context information to obtain a first probability estimation result, wherein the first context information is obtained from the first image (S802); and writing the first probability estimation result into a compressed bitstream (S803). A decoding end performs sampling according to the probability estimation result, so as to obtain an estimation coefficient, and obtains a reconstructed image on the basis of the estimation coefficient obtained by means of sampling. By means of the present application, a high-quality image can be obtained.

Description

Method and device for encoding and decoding video images

This application claims priority to Chinese patent application No. 202110781903.8, entitled "Method and device for encoding and decoding video images", filed with the State Intellectual Property Office of China on July 9, 2021, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of video encoding and decoding, and in particular to a method and device for encoding and decoding video images.

Background

A digital image is image information recorded in the form of digital signals. A digital image (hereinafter referred to as an image) can be regarded as a two-dimensional array of M rows and N columns containing M×N samples; the position of each sample is called a sampling position, and the value of each sample is called a sample value.

In applications such as image storage and transmission, images usually need to be encoded to reduce storage requirements and transmission bandwidth. Image coding comprises encoding and decoding. A typical encoding process includes three steps: transformation, quantization, and entropy coding. For an image to be encoded, the first step decorrelates the image through a transform to obtain transform coefficients with a more concentrated energy distribution; the second step quantizes the transform coefficients to obtain quantized coefficients; the third step entropy-encodes the quantized coefficients to obtain the compressed code stream. Correspondingly, after the decoder receives the compressed code stream, a typical decoding process performs entropy decoding, inverse quantization, and inverse transformation in sequence to obtain the reconstructed image.
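The conventional pipeline described above can be illustrated with a minimal sketch. It assumes a one-dimensional orthonormal DCT as the transform and uniform scalar quantization with a hypothetical step size; entropy coding is omitted because it is lossless and does not change the reconstruction. The function names and sample values are illustrative only and do not come from the application.

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        value = c[0] * math.sqrt(1.0 / n)
        value += sum(c[k] * math.sqrt(2.0 / n)
                     * math.cos(math.pi * (i + 0.5) * k / n)
                     for k in range(1, n))
        out.append(value)
    return out

def quantize(coeffs, step):
    """Uniform scalar quantization: coefficient -> integer level."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    """Inverse quantization: integer level -> reconstructed coefficient."""
    return [q * step for q in levels]

samples = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical row of sample values
step = 4.0                                   # hypothetical quantization step

levels = quantize(dct(samples), step)        # these levels would be entropy-coded
recon = idct(dequantize(levels, step))       # deterministic reconstruction
```

Decoding the same `levels` always yields the same `recon`, which is the deterministic behaviour the next paragraph refers to.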

In prior-art decoding methods, entropy decoding, inverse quantization, and inverse transformation are generally deterministic processes; that is, decoding a compressed code stream yields a unique reconstructed image, and the quality of that reconstructed image is not high.

Summary of the invention

Embodiments of the present application provide a video image encoding and decoding method and related equipment, which can improve image quality.

The above and other objects are achieved by the subject-matter of the independent claims. Other implementations are evident from the dependent claims, the detailed description and the figures.

Particular embodiments are outlined in the appended independent claims, other embodiments are outlined in the dependent claims.

According to a first aspect, the present application relates to a video image encoding method. The method is performed by an encoding device and includes:

Acquiring a first image, the first image being an image to be encoded or a decoded image; performing probability estimation according to first context information to obtain a first probability estimation result, the first context information being obtained from the first image; and writing the first probability estimation result into a compressed code stream.

Wherein, the first context information may be pixels in the first image or coefficients in the first transformed image obtained by transforming the first image.

The probability estimation is performed at the encoding end to obtain a probability estimation result, and the probability estimation result is transmitted to the decoding end, so that the decoding end performs sampling based on the probability estimation result to obtain a high-quality image.
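As a rough illustration of the decoding-end sampling described above, the sketch below assumes that each probability estimation result is a Gaussian given by a (mean, variance) pair, which is one of the forms mentioned later in this description. The function name `sample_coefficients` and the numeric values are hypothetical.

```python
import math
import random

def sample_coefficients(estimates, rng):
    """Draw one estimated coefficient per (mean, variance) pair."""
    return [rng.gauss(mean, math.sqrt(variance)) for mean, variance in estimates]

# Hypothetical probability estimation results parsed from the code stream.
estimates = [(10.0, 4.0), (-3.0, 1.0), (0.5, 0.25)]

# Two decoding runs with different random states give different
# estimated coefficients, unlike a deterministic decoder.
recon_a = sample_coefficients(estimates, random.Random(1))
recon_b = sample_coefficients(estimates, random.Random(2))
```

A reconstructed image would then be obtained from the sampled coefficients, for example by an inverse transform when the coefficients are transform coefficients.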

In a possible design, the method of this embodiment also includes:

Acquiring a second image, the second image being an image to be encoded or a decoded image and being different from the first image. In this case, performing probability estimation according to the first context information to obtain the first probability estimation result includes:

Performing probability estimation according to the first context information and the second context information to obtain the first probability estimation result, wherein the second context information is obtained from the second image.

By introducing the second context information, a probability estimation result with higher accuracy can be obtained, so that the decoding end performs sampling based on the probability estimation result to obtain an image with better quality.

In a possible design, performing probability estimation according to the first context information to obtain the first probability estimation result includes:

Performing probability estimation according to context information of first data to obtain a probability estimation result of the first data; and performing probability estimation according to context information of second data to obtain a probability estimation result of the second data; wherein the first data and the second data are obtained based on the first image, and the first context information includes the context information of the first data and the context information of the second data.

The encoding end calculates the probability estimation result of each piece of data in the first image one by one and transmits these results to the decoding end, so that the decoding end can sample accurately based on the probability estimation result of each piece of data, thereby obtaining a higher-quality reconstructed image.

In a possible design, the first probability estimation result includes the probability estimation result of a first preset area; the first preset area includes the first data and the second data and is located in the first image or in an image obtained by transforming the first image. Performing probability estimation according to the first context information to obtain the first probability estimation result includes:

Performing probability estimation according to the context information of the first data to obtain the probability estimation result of the first data; performing probability estimation according to the context information of the second data to obtain the probability estimation result of the second data, wherein the first context information includes the context information of the first data and the context information of the second data; and selecting the probability estimation result of the first preset area according to the probability estimation result of the first data and the probability estimation result of the second data, wherein the first probability estimation result includes the probability estimation result of the first preset area.

Wherein, the first preset area is an image block in the first image, or a subband obtained by performing a wavelet transform on the first image, or a frequency band obtained by performing a discrete cosine transform (DCT) on the first image, or a transform block obtained by performing DCT on the first image, or a channel in a three-dimensional feature map obtained by performing feature extraction on the first image.

Wherein, performing DCT on the first image in units of one or more image blocks yields one or more transform blocks.

For the data in a preset area, the encoding end uses one probability estimation result as the probability estimation result of all the data in the preset area, so that only one probability estimation result needs to be transmitted, thereby reducing the amount of code stream transmitted and the resources required to transmit it.
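The saving described above can be sketched as follows. The description does not state how the single result for a preset area is chosen from the per-data results, so the rule used here (taking the estimate with the median variance) is purely an assumption for illustration, as are the function name and the (mean, variance) representation.

```python
def select_area_estimate(per_data_estimates):
    """Pick one (mean, variance) pair to stand for a whole preset area.

    Assumed rule: the per-data estimate with the median variance.
    """
    ranked = sorted(per_data_estimates, key=lambda est: est[1])
    return ranked[len(ranked) // 2]

# Hypothetical per-data estimates for four data items in one preset area.
per_data = [(4.0, 1.0), (5.0, 9.0), (3.5, 2.0), (4.5, 3.0)]
area_estimate = select_area_estimate(per_data)

# Number of parameters written to the code stream in each case.
values_per_data = 2 * len(per_data)   # one (mean, variance) pair per data item
values_per_area = 2                   # one (mean, variance) pair for the area
```

Here a single pair replaces four, so two values are transmitted for the area instead of eight.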

In a possible design, the first probability estimation result includes the probability estimation result of a second preset area, the second preset area is located in the first image or in an image obtained by transforming the first image, and the first context information includes context information of the second preset area. Performing probability estimation according to the first context information to obtain the first probability estimation result includes: performing probability estimation according to the context information of the second preset area to obtain the probability estimation result of the second preset area, wherein the first probability estimation result includes the probability estimation result of the second preset area.

Wherein, the second preset area is an image block in the first image, or a subband obtained by performing a wavelet transform on the first image, or a frequency band obtained by performing DCT on the first image, or a transform block obtained by performing DCT on the first image, or a channel in a three-dimensional feature map obtained by performing feature extraction on the first image.

In a possible design, this encoding method also includes:

Setting the value of a first identifier of the first preset area to a first value, to indicate that the probability estimation result of the first preset area is used when sampling to obtain the estimated coefficients in the first preset area; and saving the probability estimation result of the first preset area into a probability estimation result set and recording the index of that result in the probability estimation result set. Writing the probability estimation result into the compressed code stream includes: writing the probability estimation result set, the index, the size information of the first preset area, and the first identifier into the compressed code stream.

For the probability estimation results of multiple preset areas, the encoding end saves them in the probability estimation result set and records the position (namely, the index) of each preset area's probability estimation result in the set, so that the decoding end can use the index to accurately determine the probability estimation result of each preset area from the probability estimation result set decoded from the code stream, thereby ensuring decoding accuracy. The size information indicates how many times sampling is to be performed based on the probability estimation result of the first preset area when sampling to obtain the estimated coefficients in the first preset area, so that all the estimated coefficients in the first preset area are obtained.
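One possible way to organise the result set, index, size information, and identifier is sketched below. The container `AreaHeader`, the constant `FIRST_VALUE`, the (rows, columns) form of the size information, and the Gaussian (mean, variance) form of each result are assumptions made for illustration; the application does not prescribe a concrete syntax.

```python
import math
import random
from dataclasses import dataclass

FIRST_VALUE = 1  # assumed encoding of "use the area-level estimate"

@dataclass
class AreaHeader:
    flag: int     # first identifier of the preset area
    index: int    # position of the area's estimate in the result set
    size: tuple   # (rows, cols) of the preset area

def encode_areas(area_estimates, area_sizes):
    """Build the result set and one header per preset area."""
    result_set, headers = [], []
    for estimate, size in zip(area_estimates, area_sizes):
        result_set.append(estimate)
        headers.append(AreaHeader(FIRST_VALUE, len(result_set) - 1, size))
    return result_set, headers

def decode_areas(result_set, headers, rng):
    """Sample rows*cols estimated coefficients for each preset area."""
    areas = []
    for header in headers:
        mean, variance = result_set[header.index]
        count = header.size[0] * header.size[1]
        sigma = math.sqrt(variance)
        areas.append([rng.gauss(mean, sigma) for _ in range(count)])
    return areas

# Two hypothetical preset areas: a 2x2 block and a 1x3 block.
result_set, headers = encode_areas([(0.0, 1.0), (2.0, 0.0)], [(2, 2), (1, 3)])
decoded = decode_areas(result_set, headers, random.Random(0))
```

The index lets the decoder look up the right entry, and the size tells it how many samples to draw from that entry.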

In a possible design, this encoding method also includes:

Setting the value of the first identifier of the first preset area to the first value, to indicate that the probability estimation result of the first preset area is used when sampling to obtain the estimated coefficients in the first preset area; preprocessing the probability estimation result of the first preset area according to a scaling factor of the first preset area to obtain a processed probability estimation result; and saving the processed probability estimation result into the probability estimation result set and recording the index of the processed probability estimation result in the probability estimation result set. Writing the probability estimation result into the compressed code stream includes: writing the probability estimation result set, the index, the size information of the first preset area, and the first identifier into the compressed code stream.

The encoding end preprocesses the probability estimation result of the first preset area to obtain a processed probability estimation result; the decoding end performs sampling based on the processed probability estimation result to obtain a reconstructed image. By setting different preprocessing methods, reconstructed images of different qualities can be obtained, such as images with high subjective quality or images with high objective quality.

In a possible design, this encoding method also includes:

Setting the value of the first identifier of the first preset area to the first value, to indicate that the probability estimation result of the first preset area is used when sampling to obtain the estimated coefficients in the first preset area. Writing the probability estimation result into the compressed code stream includes: writing the probability estimation result of the first preset area, the size information of the first preset area, and the first identifier into the compressed code stream.

Setting the value of the first identifier of the first preset area to the first value indicates to the decoding end that the probability estimation result of the first preset area is to be used when sampling to obtain the estimated coefficients in the first preset area; the size information indicates how many times sampling is to be performed based on the probability estimation result of the first preset area, so that all the estimated coefficients in the first preset area are obtained.

In a possible design, this encoding method also includes:

Preprocessing the probability estimation result of the first data to obtain a processed probability estimation result.

In a possible design, the probability estimation result of the first data includes the mean and variance of a Gaussian distribution, and preprocessing the probability estimation result of the first data to obtain the processed probability estimation result includes:

Setting the variance of the Gaussian distribution to 0 as the processed variance, wherein the processed probability estimation result includes the mean of the Gaussian distribution and the processed variance.
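The effect of the zero-variance preprocessing can be seen in a short sketch. It assumes the (mean, variance) representation above; the helper names `zero_variance` and `sample` are hypothetical.

```python
import math
import random

def zero_variance(estimate):
    """Preprocess a (mean, variance) estimate by setting the variance to 0."""
    mean, _variance = estimate
    return (mean, 0.0)

def sample(estimate, rng):
    """Draw one estimated coefficient from a Gaussian (mean, variance)."""
    mean, variance = estimate
    return rng.gauss(mean, math.sqrt(variance))

processed = zero_variance((10.0, 4.0))
value = sample(processed, random.Random(0))   # always equals the mean
```

With a processed variance of 0, sampling degenerates to returning the mean, so the decoder output becomes deterministic again for that data.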

In a possible design, the probability estimation result of the first data includes the mean and variance of a Gaussian distribution, and preprocessing the probability estimation result of the first data to obtain the processed probability estimation result includes: preprocessing the variance of the Gaussian distribution according to a scaling factor of the first data to obtain a processed variance, wherein the processed probability estimation result includes the mean of the Gaussian distribution and the processed variance; wherein

The scaling factor of the first data is the same as the scaling factor of the second data; or, the scaling factor of the first data is different from the scaling factor of the second data; or,

Preprocessing the probability estimation result of the first data according to content information of the preset area to which the first data belongs to obtain the processed probability estimation result includes: determining the scaling factor of the first data according to the content information of the preset area to which the first data belongs, and preprocessing the variance of the Gaussian distribution according to the scaling factor to obtain the processed variance. Wherein, the content information of the preset area includes the texture resolution level or the texture complexity of the preset area.

As an example, the texture complexity can be calculated: a preset area with complex texture is considered to have a high resolution level, and a preset area with smooth texture is considered to have a low resolution level. For first data and second data belonging to a preset area with a high resolution level, the scaling factor of the first data is different from the scaling factor of the second data; for first data and second data in a preset area with a low resolution level, the scaling factor of the first data is the same as the scaling factor of the second data. As another example, for first data and second data belonging to a preset area with high texture complexity, the scaling factor of the first data is different from the scaling factor of the second data; for first data and second data in a preset area with low texture complexity, the scaling factor of the first data is the same as the scaling factor of the second data.

The aforementioned preset area may be an image block, a subband, a frequency band, or a channel as mentioned below.

If the first data and the second data belong to the same image block in the first image, the scaling factor of the first data is the same as the scaling factor of the second data; or, if the first data and the second data belong to different image blocks, the scaling factor of the first data is different from the scaling factor of the second data; or, the scaling factor of the first data is determined according to the texture complexity of the image block to which the first data belongs;

or,

If the first data and the second data belong to the same subband among the plurality of subbands obtained by performing a wavelet transform on the first image, the scaling factor of the first data is the same as the scaling factor of the second data; or, if the first data and the second data belong to different subbands, the scaling factor of the first data is different from the scaling factor of the second data; or, the scaling factor of the first data is determined according to the texture complexity of the subband to which the first data belongs;

or,

If the first data and the second data belong to the same frequency band among the multiple frequency bands, or the same transform block among the multiple transform blocks, obtained by performing DCT on the first image, the scaling factor of the first data is the same as the scaling factor of the second data; or, if the first data and the second data belong to different frequency bands or transform blocks, the scaling factor of the first data is different from the scaling factor of the second data; or, the scaling factor of the first data is determined according to the texture complexity of the frequency band or transform block to which the first data belongs;

or,

If the first data and the second data belong to the same channel of the three-dimensional feature map obtained by performing feature extraction on the first image, the scaling factor of the first data is the same as the scaling factor of the second data; or, if the first data and the second data belong to different channels, the scaling factor of the first data is different from the scaling factor of the second data; or, the scaling factor of the first data is determined according to the texture complexity of the channel to which the first data belongs.

It should be noted that the texture complexity of the image block to which the first data belongs can be determined according to the content of the corresponding image block in the image to be encoded or the decoded image; the texture complexity of the subband to which the first data belongs can be determined according to the content of the part corresponding to the subband in the image to be encoded or the decoded image; the texture complexity of the frequency band to which the first data belongs can be determined according to the content of the part corresponding to the frequency band in the image to be encoded or the decoded image; and the texture complexity of the channel to which the first data belongs can be determined according to the content of the part corresponding to the channel in the image to be encoded or the decoded image. In one example, the greater the texture complexity of the first data, the larger the scaling factor of the first data.
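The content-dependent scaling described above might be sketched as follows. Three things here are assumptions, not taken from the application: texture complexity is measured as the variance of the pixel values in the block, the scaling factor is an increasing function `c / (c + softness)` of that complexity, and the variance is preprocessed by multiplying it by the scaling factor. The only property drawn from the text is that a larger texture complexity gives a larger scaling factor.

```python
def texture_complexity(block):
    """Assumed measure: population variance of the pixel values in a block."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def scaling_factor(complexity, softness=100.0):
    """Assumed mapping: increases with complexity, stays in [0, 1)."""
    return complexity / (complexity + softness)

def scale_variance(estimate, factor):
    """Assumed preprocessing: multiply the Gaussian variance by the factor."""
    mean, variance = estimate
    return (mean, factor * variance)

smooth_block = [[100, 101], [100, 101]]     # hypothetical smooth texture
textured_block = [[10, 200], [220, 30]]     # hypothetical complex texture

factor_smooth = scaling_factor(texture_complexity(smooth_block))
factor_textured = scaling_factor(texture_complexity(textured_block))

processed = scale_variance((5.0, 4.0), 0.5)  # -> (5.0, 2.0)
```

Under these assumptions the smooth block keeps its samples close to the mean, while the textured block retains most of the estimated variance.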

In a possible design, this encoding method also includes:

Preprocessing the probability estimation result of the second preset area to obtain a processed probability estimation result.

Where the probability estimation result of the second preset area includes the mean and variance of a Gaussian distribution, the variance of the Gaussian distribution is set to 0 as a first variance, wherein the processed probability estimation result includes the mean of the Gaussian distribution and the first variance; or, the variance of the Gaussian distribution is processed according to the scaling factor of the second preset area to obtain a second variance, wherein the processed probability estimation result includes the mean of the Gaussian distribution and the second variance, and the scaling factor of the first preset area is the same as or different from the scaling factor of the second preset area.

In a possible design, the first context information includes some or all pixel values in the first image.

By preprocessing the probability estimation results, reconstructed images with different properties can be obtained according to the user's needs, which improves the quality of the reconstructed images. For example, if the variance of the probability estimation result is set to 0 as the processed variance, the reconstructed image with the best signal quality (best objective quality) can be obtained, that is, the peak signal-to-noise ratio (PSNR) of the image is increased or the mean-square error (MSE) is reduced; by setting the scaling factors of multiple data to be the same, an image with the best subjective quality can be obtained, at the cost of a lower PSNR or a higher MSE; by setting the scaling factors of data belonging to the same part of the image to be the same, and setting the scaling factors of data belonging to different parts to be different, an image whose properties lie between the best subjective quality and the best objective quality can be obtained.
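The objective-quality side of this trade-off can be checked numerically with a small simulation. The data below are synthetic: the "original" values are drawn from the same Gaussian that the decoder is assumed to know, and the peak value of 255 in `psnr` assumes 8-bit samples. None of these numbers come from the application.

```python
import math
import random

def mse(a, b):
    """Mean-square error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(mse_value, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given MSE."""
    return 10.0 * math.log10(peak * peak / mse_value)

rng = random.Random(0)
mean, variance = 128.0, 4.0
sigma = math.sqrt(variance)

# Synthetic original values that follow the estimated distribution.
original = [rng.gauss(mean, sigma) for _ in range(20000)]

recon_zero_var = [mean] * len(original)                     # variance set to 0
recon_sampled = [rng.gauss(mean, sigma) for _ in original]  # variance kept

mse_zero_var = mse(original, recon_zero_var)   # close to the variance
mse_sampled = mse(original, recon_sampled)     # close to twice the variance
```

Reconstructing at the mean minimises MSE and therefore maximises PSNR, while sampling with the full variance roughly doubles the MSE. The sketch says nothing about subjective quality, which MSE and PSNR do not measure.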

In a possible design, this encoding method also includes:

Transforming the first image to obtain a first transformed image; wherein, if the transform is a wavelet transform, the first context information includes some or all of the coefficients in the first transformed image, and the coefficients are wavelet coefficients or quantized wavelet coefficients; or, if the transform is DCT, the first context information includes some or all of the coefficients in the first transformed image, and the coefficients are DCT coefficients or quantized DCT coefficients; or, if the transform is a feature transform, the first context information includes some or all of the coefficients in the first transformed image, and the coefficients are feature coefficients or quantized feature coefficients.

The first context information is input into a first probability estimation network for processing to obtain the parameters of a first probability distribution model; the probability estimation result includes the parameters of the first probability distribution model;

or,

The first context information is input into the second probability estimation network for processing to obtain the target probability distribution, and the probability estimation result includes the parameters of the target probability distribution; wherein, the first probability estimation network and the second probability estimation network are realized by a neural network.

According to a second aspect, the present application relates to a video image encoding method. The method is performed by an encoding device and includes:

A plurality of coefficients are obtained according to the image to be encoded, and the plurality of coefficients include a first coefficient; a first probability estimation result is obtained according to the context information of the first coefficient; and the first coefficient and the first probability estimation result are written into a compressed code stream.

Wherein, the first coefficient may be a pixel in the image to be coded or a coefficient in a transformed image obtained by transforming the image to be coded.

In a possible design, the multiple coefficients also include a second coefficient, and the encoding method also includes:

Obtaining a second probability estimation result according to the context information of the second coefficient. Writing the first coefficient and the first probability estimation result into the compressed code stream includes: writing the first coefficient, the first probability estimation result, the second coefficient, and the second probability estimation result into the compressed code stream.

The encoding end calculates the probability estimation result of each coefficient in the image to be encoded one by one and transmits these results to the decoding end, so that the decoding end can sample accurately based on the probability estimation result of each coefficient, thereby obtaining a higher-quality reconstructed image.

In a possible design, the plurality of coefficients further includes a second coefficient; the first coefficient and the second coefficient belong to the same preset area, and the preset area is located in the image to be encoded or in an image obtained by transforming the image to be encoded. Obtaining the first probability estimation result according to the context information of the first coefficient includes:

Performing probability estimation according to the context information of the first coefficient to obtain a third probability estimation result; performing probability estimation according to the context information of the second coefficient to obtain a second probability estimation result; and determining the first probability estimation result from the third probability estimation result and the second probability estimation result;

Writing the first coefficient and the first probability estimation result into the compressed code stream includes: writing the first coefficient, the second coefficient, and the first probability estimation result into the compressed code stream.

Wherein, the preset area is an image block in the image to be encoded, or a subband obtained by performing a wavelet transform on the image to be encoded, or a frequency band obtained by performing DCT on the image to be encoded, or a transform block obtained by performing DCT on the image to be encoded, or a channel in a three-dimensional feature map obtained by performing feature extraction on the image to be encoded.

Wherein, performing DCT on the image to be encoded in units of one or more image blocks yields one or more transform blocks.

For the coefficients in a preset area, the encoding end uses one probability estimation result as the probability estimation result of all the coefficients in the preset area, so that only one probability estimation result needs to be transmitted, thereby reducing the amount of code stream transmitted and the resources required to transmit it.

In a possible design, the plurality of coefficients further includes a second coefficient; the first coefficient and the second coefficient belong to the same preset area, and the preset area is located in the image to be encoded or in an image obtained by transforming the image to be encoded. Obtaining the first probability estimation result according to the context information of the first coefficient includes:

Performing probability estimation according to the context information of the preset area to obtain the first probability estimation result, wherein the context information of the preset area includes the context information of the first coefficient. Writing the first coefficient and the first probability estimation result into the compressed code stream includes: writing the first coefficient, the second coefficient, and the first probability estimation result into the compressed code stream.

In a possible design, this encoding method also includes:

Setting the value of a first identifier of the preset area to a first value, to indicate that the first probability estimation result is used when sampling to obtain the estimated coefficients in the preset area; and saving the first probability estimation result into a probability estimation result set and recording the index of the first probability estimation result in the probability estimation result set. Writing the first coefficient, the second coefficient, and the first probability estimation result into the compressed code stream includes: writing the first coefficient, the second coefficient, the probability estimation result set, the index, the size information of the preset area, and the first identifier into the compressed code stream.

For the probability estimation results of multiple preset areas, the encoding end saves them in the probability estimation result set and records the position (namely, the index) of each preset area's probability estimation result in the set, so that the decoding end can use the index to accurately determine the probability estimation result of each preset area from the probability estimation result set decoded from the code stream, thereby ensuring decoding accuracy. The size information indicates how many times sampling is to be performed based on the probability estimation result of the preset area when sampling to obtain the estimated coefficients in the preset area, so that all the estimated coefficients in the preset area are obtained.

In a possible design, this encoding method also includes:

Setting the value of the first identifier of the preset area to the first value, to indicate that the first probability estimation result is used when sampling to obtain the estimated coefficients in the preset area. Writing the first coefficient, the second coefficient, and the first probability estimation result into the compressed code stream includes: writing the first coefficient, the second coefficient, the first probability estimation result, the size information of the preset area, and the first identifier into the compressed code stream.

Setting the value of the first identifier of the preset area to the first value indicates to the decoding end that the probability estimation result of the preset area is to be used when sampling to obtain the estimated coefficients in the preset area; the size information indicates how many times sampling is to be performed based on the probability estimation result of the preset area, so that all the estimated coefficients in the preset area are obtained.

In a possible design, the first coefficient and the second coefficient belong to the same preset area, and the encoding method further includes:

Setting the value of the first identifier of the preset area to a second value, to indicate that each coefficient's own probability estimation result is used when sampling to obtain the estimated coefficients in the preset area. Writing the first coefficient, the first probability estimation result, the second coefficient, and the second probability estimation result into the compressed code stream includes: writing the first coefficient, the first probability estimation result, the second coefficient, the second probability estimation result, and the first identifier of the preset area into the compressed code stream.
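How a decoding end might act on the two values of the identifier is sketched below. The numeric values of `FIRST_VALUE` and `SECOND_VALUE`, the function name `sample_area`, and the Gaussian (mean, variance) form of each result are assumptions for illustration only.

```python
import math
import random

FIRST_VALUE = 1    # assumed: one shared estimate for the whole preset area
SECOND_VALUE = 0   # assumed: each coefficient uses its own estimate

def sample_area(flag, estimates, size, rng):
    """Sample the estimated coefficients of one preset area.

    flag      -- identifier of the preset area
    estimates -- list of (mean, variance) pairs read from the code stream
    size      -- number of coefficients in the preset area
    """
    if flag == FIRST_VALUE:
        mean, variance = estimates[0]
        sigma = math.sqrt(variance)
        return [rng.gauss(mean, sigma) for _ in range(size)]
    if flag == SECOND_VALUE:
        if len(estimates) != size:
            raise ValueError("need one estimate per coefficient")
        return [rng.gauss(m, math.sqrt(v)) for m, v in estimates]
    raise ValueError("unknown identifier value")

rng = random.Random(0)
# Zero variances are used so the outputs below are easy to verify by eye.
shared = sample_area(FIRST_VALUE, [(2.0, 0.0)], 3, rng)
individual = sample_area(SECOND_VALUE, [(1.0, 0.0), (5.0, 0.0)], 2, rng)
```

With the first value, one estimate is reused `size` times; with the second value, the number of estimates must match the number of coefficients.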

In a possible design, this encoding method also includes:

The probability estimation result of the first coefficient is preprocessed to obtain the probability estimation result after processing.

In a possible design, the probability estimation result of the first coefficient includes the mean and variance of the Gaussian distribution, and the probability estimation result of the first coefficient is preprocessed to obtain the processed probability estimation result, including:

Preprocessing the variance of the Gaussian distribution according to the scaling factor of the first coefficient to obtain the processed variance, wherein the processed probability estimation result includes the mean value of the Gaussian distribution and the processed variance;

The scaling factor of the first coefficient and the scaling factor of the second coefficient are the same; or, the scaling factor of the first coefficient and the scaling factor of the second coefficient are different; or,

Preprocessing the probability estimation result of the first coefficient according to the content information of the preset area to which the first coefficient belongs, to obtain the processed probability estimation result, including: determining the scaling factor of the first coefficient according to the content information of the preset area to which the first coefficient belongs, and preprocessing the variance of the Gaussian distribution according to the scaling factor to obtain the processed variance. The content information of the preset area includes the texture resolution level or the texture complexity of the preset area.

As an example, the texture complexity can be calculated: a preset area with complex texture is considered to have a high resolution level, and a preset area with smooth texture is considered to have a low resolution level. For a first coefficient and a second coefficient belonging to a preset area with a high resolution level, the scaling factor of the first coefficient is different from the scaling factor of the second coefficient; for a first coefficient and a second coefficient belonging to a preset area with a low resolution level, the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same. As another example, for a first coefficient and a second coefficient belonging to a preset area with high texture complexity, the scaling factor of the first coefficient and the scaling factor of the second coefficient are different; for a first coefficient and a second coefficient belonging to a preset area with low texture complexity, the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same.

The aforementioned preset area may be an image block, a subband, a frequency band, a transform block or a channel as mentioned below.

If the first coefficient and the second coefficient belong to the same image block in the image to be encoded, the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same; or if the first coefficient and the second coefficient belong to different image blocks, the scaling factor of the first coefficient is different from the scaling factor of the second coefficient; or the scaling factor of the first coefficient is determined according to the texture complexity of the image block to which the first coefficient belongs; or,

If the first coefficient and the second coefficient belong to the same subband among the multiple subbands obtained by performing wavelet transformation on the image to be encoded, the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; or if the first coefficient and the second coefficient belong to different subbands, the scaling factor of the first coefficient is different from the scaling factor of the second coefficient; or the scaling factor of the first coefficient is determined according to the texture complexity of the subband to which the first coefficient belongs;

or,

If the first coefficient and the second coefficient belong to the same frequency band among the multiple frequency bands obtained by performing DCT on the image to be encoded, the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same; or if the first coefficient and the second coefficient belong to different frequency bands, the scaling factor of the first coefficient is different from the scaling factor of the second coefficient; or the scaling factor of the first coefficient is determined according to the texture complexity of the frequency band to which the first coefficient belongs;

or,

If the first coefficient and the second coefficient belong to the same channel of the three-dimensional feature map obtained by performing feature extraction on the image to be encoded, the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same; or if the first coefficient and the second coefficient belong to different channels, the scaling factor of the first coefficient is different from the scaling factor of the second coefficient; or the scaling factor of the first coefficient is determined according to the texture complexity of the channel to which the first coefficient belongs.

It should be noted here that the texture complexity of the image block to which the first coefficient belongs can be determined according to the content of that image block in the image to be encoded; the texture complexity of the subband to which the first coefficient belongs can be determined according to the content of the part of the image to be encoded that corresponds to the subband; the texture complexity of the frequency band to which the first coefficient belongs can be determined according to the content of the part of the image to be encoded that corresponds to the frequency band; and the texture complexity of the channel to which the first coefficient belongs can be determined according to the content of the part of the image to be encoded that corresponds to the channel. The larger the texture complexity of the area to which the first coefficient belongs, the larger the scaling factor of the first coefficient.
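
The relation between texture complexity and scaling factor stated above can be sketched as follows. The application only requires that a larger texture complexity gives a larger scaling factor; the specific complexity measure (population variance of the pixel values), the clipped linear mapping, the `saturation` parameter, and the function names are all assumptions made for this illustration.

```python
from typing import List

def texture_complexity(block: List[List[float]]) -> float:
    """Assumed measure: population variance of the pixel values in the area."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def scaling_factor(complexity: float, saturation: float = 1000.0) -> float:
    """Assumed monotone mapping to [0, 1]: higher complexity, larger factor."""
    return min(complexity, saturation) / saturation

smooth = [[100, 100], [100, 100]]   # flat area, no texture
textured = [[0, 40], [40, 0]]       # checkerboard-like area
factor_smooth = scaling_factor(texture_complexity(smooth))
factor_textured = scaling_factor(texture_complexity(textured))
```

Any other non-decreasing mapping would satisfy the stated relation equally well; the linear form is chosen here only for brevity.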

In a possible design, this encoding method also includes:

The probability estimation result of the preset area is preprocessed to obtain the probability estimation result after processing.

In a possible design, the probability estimation result of the preset area includes the mean and variance of the Gaussian distribution, and the probability estimation result of the preset area is preprocessed to obtain the processed probability estimation result, including:

Set the variance of the Gaussian distribution to 0 as the first variance, wherein the processed probability estimation result includes the mean value and the first variance of the Gaussian distribution, or process the variance of the Gaussian distribution according to the scaling factor of the preset area, to obtain the second variance, wherein the processed probability estimation result includes the mean value and the second variance of the Gaussian distribution.

By preprocessing the probability estimation results, reconstructed images with different properties can be obtained according to the user's needs, which improves the quality of the reconstructed images. For example, if the variance of the probability estimation result is set to 0 as the processed variance, the reconstructed image with the best signal quality (best objective quality) can be obtained, that is, the PSNR of the image is increased or its MSE is reduced; if the scaling factors of the multiple coefficients are set to be the same, the image with the best subjective quality can be obtained, at the cost of a lower PSNR or a higher MSE; if the scaling factors of coefficients belonging to the same part of the image are set to be the same and the scaling factors of coefficients belonging to different parts are set to be different, an image whose properties lie between the best subjective quality and the best objective quality can be obtained.
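
The two preprocessing options can be illustrated with a short sketch. The application does not state in this passage how the variance is processed "according to the scaling factor"; multiplying the variance by the factor is an assumption, as are the names `Gaussian` and `preprocess`. Under that assumption, a scaling factor of 0 coincides with the option of setting the variance to 0.

```python
from typing import NamedTuple

class Gaussian(NamedTuple):
    """Probability estimation result: mean and variance of a Gaussian."""
    mean: float
    variance: float

def preprocess(estimate: Gaussian, scaling_factor: float) -> Gaussian:
    """Keep the mean and rescale the variance (multiplication is an assumption)."""
    if scaling_factor < 0:
        raise ValueError("scaling factor must be non-negative")
    return Gaussian(estimate.mean, estimate.variance * scaling_factor)

estimate = Gaussian(mean=3.0, variance=4.0)
objective = preprocess(estimate, 0.0)   # variance set to 0: samples equal the mean
scaled = preprocess(estimate, 0.5)      # variance reduced by a scaling factor
```

With zero variance every sampled coefficient equals the estimated mean, which matches the best-objective-quality case described above; a nonzero factor keeps some randomness in the sampled coefficients.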

In a possible design, if the multiple coefficients are multiple pixel values in the image to be coded, the first context information includes some or all pixel values in the first image; or,

Obtain multiple coefficients according to the image to be encoded, including:

If wavelet transformation is performed on the image to be encoded to obtain the multiple coefficients, the multiple coefficients are multiple wavelet coefficients, and the first context information includes some or all of the multiple wavelet coefficients; or, if wavelet transformation and quantization are performed on the image to be encoded to obtain the multiple coefficients, the multiple coefficients are multiple quantized wavelet coefficients, and the first context information includes some or all of the multiple quantized wavelet coefficients; or, if DCT is performed on the image to be encoded to obtain the multiple coefficients, the multiple coefficients are multiple DCT coefficients, and the first context information includes some or all of the multiple DCT coefficients; or, if DCT and quantization are performed on the image to be encoded to obtain the multiple coefficients, the multiple coefficients are multiple quantized DCT coefficients, and the first context information includes some or all of the multiple quantized DCT coefficients; or, if feature extraction is performed on the image to be encoded to obtain the multiple coefficients, the multiple coefficients are multiple feature coefficients, and the first context information includes some or all of the multiple feature coefficients; or, if feature extraction and quantization are performed on the image to be encoded to obtain the multiple coefficients, the multiple coefficients are multiple quantized feature coefficients, and the first context information includes some or all of the multiple quantized feature coefficients.

In a possible design, the first probability estimation result is obtained according to the context information of the first coefficient, including:

Obtaining a second probability distribution model; inputting the first context information into a third probability estimation network for processing to obtain the parameters of the second probability distribution model; and obtaining the first probability estimation result according to the second probability distribution model and its parameters;

or,

Inputting the first context information into a fourth probability estimation network for processing to obtain the first probability estimation result; wherein the third probability estimation network and the fourth probability estimation network are implemented by neural networks.

Based on a third aspect, the present application relates to a method for decoding video images. The method is performed by a decoding device, and the method includes:

Decoding the compressed code stream to obtain a first probability estimation result; performing sampling according to the first probability estimation result to obtain a first estimated coefficient; obtaining a first reconstructed image according to the first estimated coefficient.

In a possible design, the decoding method also includes:

Decoding the compressed code stream to obtain a second probability estimation result; and performing sampling according to the second probability estimation result to obtain a second estimated coefficient. Obtaining the first reconstructed image according to the first estimated coefficient includes: obtaining the first reconstructed image according to the first estimated coefficient and the second estimated coefficient.

In a possible design, the first probability estimation result is obtained from decoding the compressed code stream, including:

Decoding the first identifier from the compressed code stream; if the value of the first identifier is the first value, decoding the compressed code stream to obtain a first probability estimation result, including:

Decoding the probability estimation result set and the index of the preset area from the compressed code stream, where the preset area includes the first estimated coefficient and is an area in the first reconstructed image; and determining the probability estimation result of the preset area from the probability estimation result set according to the index, the first probability estimation result being the probability estimation result of the preset area. The value of the first identifier being the first value indicates that the probability estimation result of the preset area is used when all the estimated coefficients in the preset area are obtained by sampling.

In a possible design, the decoding method also includes:

Decoding the first identifier from the compressed code stream; if the value of the first identifier is the first value, decoding the compressed code stream to obtain the first probability estimation result includes: decoding the probability estimation result of the preset area and the size information of the preset area from the compressed code stream, where the preset area includes the first estimated coefficient and is an area in the first reconstructed image, and the probability estimation result of the preset area is the first probability estimation result. The value of the first identifier being the first value indicates that the probability estimation result of the preset area is used when all the estimated coefficients in the preset area are obtained by sampling.

In a possible design, the first estimated coefficient and the second estimated coefficient belong to the same preset area, and the preset area is an area in the first reconstructed image, and the decoding method further includes:

Decoding the first identifier from the compressed code stream, where the value of the first identifier being the second value indicates that each estimated coefficient in the preset area is obtained by sampling with its own probability estimation result.
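
The role of the first identifier at the decoding end can be sketched as a simple dispatch. The concrete numeric values `FIRST_VALUE` and `SECOND_VALUE`, the (mean, variance) representation, and the function name `estimates_for_area` are hypothetical; the application names only a "first value" and a "second value" without fixing them.

```python
from typing import List, Sequence, Tuple

Estimate = Tuple[float, float]  # (mean, variance), an assumed representation

FIRST_VALUE = 1   # hypothetical: one shared estimate for the whole preset area
SECOND_VALUE = 0  # hypothetical: each coefficient has its own estimate

def estimates_for_area(flag: int,
                       area_estimate: Estimate,
                       per_coefficient: Sequence[Estimate],
                       count: int) -> List[Estimate]:
    """Return the estimate to use for each of the `count` coefficients."""
    if flag == FIRST_VALUE:
        return [area_estimate] * count
    if flag == SECOND_VALUE:
        if len(per_coefficient) != count:
            raise ValueError("need one estimate per coefficient")
        return list(per_coefficient)
    raise ValueError(f"unknown flag value: {flag}")

shared = estimates_for_area(FIRST_VALUE, (0.0, 1.0), [], 3)
separate = estimates_for_area(SECOND_VALUE, (0.0, 1.0),
                              [(1.0, 0.5), (2.0, 0.25)], 2)
```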

In a possible design, the first probability estimation result includes the mean and variance of the Gaussian distribution, and sampling is performed according to the first probability estimation result to obtain the first estimated coefficient, including:

Acquiring a first random number; determining a first reference value according to the first random number, where the first reference value obeys a Gaussian distribution; and determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability estimation result.
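
A minimal sketch of this sampling step is given below. It assumes that the first random number is uniform on (0, 1), that the reference value is obtained through the inverse CDF of the standard Gaussian distribution, and that the estimated coefficient is the mean plus the standard deviation times the reference value. The application does not prescribe these particular choices, and the function names are illustrative.

```python
import random
from statistics import NormalDist

def reference_value(u: float) -> float:
    """Map a uniform random number in (0, 1) to a standard Gaussian value."""
    eps = 1e-12
    u = min(max(u, eps), 1.0 - eps)   # inv_cdf is undefined at exactly 0 and 1
    return NormalDist().inv_cdf(u)

def estimated_coefficient(reference: float, mean: float, variance: float) -> float:
    """Shift and scale the reference value by the estimated mean and variance."""
    return mean + (variance ** 0.5) * reference

rng = random.Random(2023)
u = rng.random()                       # first random number
z = reference_value(u)                 # first reference value
coeff = estimated_coefficient(z, mean=10.0, variance=4.0)
```

Separating the reference value from the shift-and-scale step means that a preprocessed variance can be substituted in the last call without changing how the random number is drawn.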

In a possible design, the decoding method also includes:

Preprocessing the variance of the first probability estimation result to obtain the processed variance;

Determining the first estimated coefficient according to the first reference value and the mean value and variance of the first probability estimation result, including:

The first estimated coefficient is determined according to the first reference value, the mean of the first probability estimation result and the processed variance.

In one possible design, the variance of the first probability estimation result is preprocessed to obtain the processed variance, including:

Set the variance of the first probability distribution to 0 as the processed variance.

In a possible design, the first estimated coefficient is a quantized wavelet coefficient, a wavelet coefficient, a quantized DCT coefficient, a DCT coefficient, a feature coefficient, or a quantized feature coefficient, and preprocessing the variance of the first probability distribution to obtain the processed variance includes:

Preprocess the variance of the first probability distribution according to the scaling factor of the first estimated coefficient to obtain the processed variance,

The scaling factor of the first estimated coefficient and the scaling factor of the second estimated coefficient are the same; or, the scaling factor of the first estimated coefficient and the scaling factor of the second estimated coefficient are different; or,

Perform preprocessing on the probability estimation result of the first estimated coefficient according to the content information of the preset area to which the first estimated coefficient belongs to obtain the processed probability estimation result, including: according to the content information of the preset area to which the first estimated coefficient belongs The scaling factor of the first estimated coefficient is determined, and the variance of the Gaussian distribution is preprocessed according to the scaling factor to obtain the processed variance. Wherein, the content information of the preset area includes texture resolution level or texture complexity of the preset area.

As an example, the texture complexity can be calculated: a preset area with complex texture is considered to have a high resolution level, and a preset area with smooth texture is considered to have a low resolution level. For a first estimated coefficient and a second estimated coefficient belonging to the same preset area with a high resolution level, the scaling factor of the first estimated coefficient and the scaling factor of the second estimated coefficient are different; for a first estimated coefficient and a second estimated coefficient belonging to a preset area with a low resolution level, the scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient. As another example, for a first estimated coefficient and a second estimated coefficient belonging to a preset area with high texture complexity, the scaling factor of the first estimated coefficient and the scaling factor of the second estimated coefficient are different; for a first estimated coefficient and a second estimated coefficient belonging to the same preset area with low texture complexity, the scaling factor of the first estimated coefficient and the scaling factor of the second estimated coefficient are the same.

When the first estimated coefficient and the second estimated coefficient are quantized wavelet coefficients or wavelet coefficients, if the first estimated coefficient and the second estimated coefficient belong to the same subband, the scaling factor of the first estimated coefficient and the scaling factor of the second estimated coefficient are the same; or if the first estimated coefficient and the second estimated coefficient belong to different subbands, the scaling factor of the first estimated coefficient and the scaling factor of the second estimated coefficient are different; or the scaling factor of the first estimated coefficient is determined according to the texture complexity of the subband to which the first estimated coefficient belongs;

or,

When the first estimated coefficient and the second estimated coefficient are quantized DCT coefficients or DCT coefficients, if the first estimated coefficient and the second estimated coefficient belong to the same frequency band or transform block, the scaling factor of the first estimated coefficient and the scaling factor of the second estimated coefficient are the same; or if the first estimated coefficient and the second estimated coefficient belong to different frequency bands or transform blocks, the scaling factor of the first estimated coefficient and the scaling factor of the second estimated coefficient are different; or the scaling factor of the first estimated coefficient is determined according to the texture complexity of the frequency band or transform block to which the first estimated coefficient belongs;

or,

When the first estimated coefficient and the second estimated coefficient are feature coefficients or quantized feature coefficients, if the first estimated coefficient and the second estimated coefficient belong to the same channel, the scaling factor of the first estimated coefficient and the scaling factor of the second estimated coefficient are the same; or if the first estimated coefficient and the second estimated coefficient belong to different channels, the scaling factor of the first estimated coefficient is different from the scaling factor of the second estimated coefficient; or the scaling factor of the first estimated coefficient is determined according to the texture complexity of the channel to which the first estimated coefficient belongs.

In a possible design, the first estimated coefficient and the second estimated coefficient are pixel values, and the variance of the first probability estimation result is preprocessed to obtain the processed variance, including:

Preprocessing the variance of the first probability estimation result according to the scaling factor of the first estimated coefficient to obtain the processed variance,

wherein the scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient; or the scaling factor of the first estimated coefficient is different from the scaling factor of the second estimated coefficient; or the scaling factor of the first estimated coefficient is determined according to the texture complexity of the image block to which the first estimated coefficient belongs.

It should be noted here that the texture complexity of the image block to which the first estimated coefficient belongs can be determined according to the content of that image block in the first reconstructed image or the second reconstructed image; the texture complexity of the subband to which the first estimated coefficient belongs can be determined according to the content of the part of the first reconstructed image or the second reconstructed image that corresponds to the subband; the texture complexity of the frequency band to which the first estimated coefficient belongs can be determined according to the content of the part of the first reconstructed image or the second reconstructed image that corresponds to the frequency band; and the texture complexity of the channel to which the first estimated coefficient belongs can be determined according to the content of the part of the first reconstructed image or the second reconstructed image that corresponds to the channel. The larger the texture complexity of the area to which the first estimated coefficient belongs, the larger the scaling factor of the first estimated coefficient.

In a possible design, the first reconstructed image is obtained according to the first estimated coefficient and the second estimated coefficient, including:

If the first estimated coefficient and the second estimated coefficient are quantized wavelet coefficients, inverse quantization and inverse wavelet transform are performed on the first estimated coefficient and the second estimated coefficient to obtain the first reconstructed image; or, if the first estimated coefficient and the second estimated coefficient are wavelet coefficients, inverse wavelet transform is performed on the first estimated coefficient and the second estimated coefficient to obtain the first reconstructed image; or, if the first estimated coefficient and the second estimated coefficient are quantized DCT coefficients, inverse quantization and inverse DCT are performed on the first estimated coefficient and the second estimated coefficient to obtain the first reconstructed image; or, if the first estimated coefficient and the second estimated coefficient are DCT coefficients, inverse DCT is performed on the first estimated coefficient and the second estimated coefficient to obtain the first reconstructed image.
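
For the quantized-DCT case, the reconstruction step can be sketched in plain Python as follows. This is only an illustration: it assumes uniform scalar quantization with a single step size and an orthonormal DCT, neither of which is specified in this passage, and the function names are hypothetical.

```python
import math
from typing import List

Block = List[List[float]]

def idct_1d(coeffs: List[float]) -> List[float]:
    """Orthonormal inverse DCT (the inverse of the DCT-II) of one row."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out

def idct_2d(block: Block) -> Block:
    """Separable 2-D inverse DCT: rows first, then columns."""
    rows = [idct_1d(r) for r in block]
    cols = [idct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def reconstruct_block(levels: Block, step: float) -> Block:
    """Inverse quantization (assumed uniform step) followed by inverse DCT."""
    dequantized = [[v * step for v in row] for row in levels]
    return idct_2d(dequantized)

# A 2x2 block whose only nonzero estimated coefficient is the DC term.
levels = [[8, 0], [0, 0]]
pixels = reconstruct_block(levels, step=0.5)
```

A block that contains only a DC coefficient reconstructs to a constant patch, which makes it a convenient sanity check for the transform.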

By preprocessing the probability estimation results, reconstructed images with different properties can be obtained according to the user's needs, which improves the quality of the reconstructed images. For example, if the variance of the probability estimation result is set to 0 as the processed variance, the reconstructed image with the best signal quality (best objective quality) can be obtained, that is, the PSNR of the image is increased or its MSE is reduced; if the scaling factors of the multiple estimated coefficients are set to be the same, the image with the best subjective quality can be obtained, at the cost of a lower PSNR or a higher MSE; if the scaling factors of estimated coefficients belonging to the same part of the image are set to be the same and the scaling factors of estimated coefficients belonging to different parts are set to be different, an image whose properties lie between the best subjective quality and the best objective quality can be obtained.

In a possible design, the decoding method also includes:

A plurality of reconstruction coefficients are obtained by decoding the compressed code stream; and a second reconstruction image is obtained according to the plurality of reconstruction coefficients.

In a possible design, obtaining the second reconstructed image according to the plurality of reconstruction coefficients includes:

If the plurality of reconstruction coefficients are quantized wavelet coefficients, inverse quantization and inverse wavelet transform are performed on the plurality of reconstruction coefficients to obtain the second reconstructed image; or, if the plurality of reconstruction coefficients are wavelet coefficients, inverse wavelet transform is performed on the plurality of reconstruction coefficients to obtain the second reconstructed image; or, if the plurality of reconstruction coefficients are quantized DCT coefficients, inverse quantization and inverse DCT are performed on the plurality of reconstruction coefficients to obtain the second reconstructed image; or, if the plurality of reconstruction coefficients are DCT coefficients, inverse DCT is performed on the plurality of reconstruction coefficients to obtain the second reconstructed image.

Due to the randomness of the sampling process, the sampling step can be repeated in the present application to obtain multiple reconstructed images. The multiple reconstructed images may include the reconstructed image with the best subjective quality and the reconstructed image with the best objective quality. A reconstructed image can be used inside the codec loop as a reference for intra-frame or inter-frame prediction; it can also be used outside the codec loop, as a post-processing method, to optimize image quality. For example, after multiple reconstructed images are obtained through the sampling step and the inverse transformation step, the reconstructed image with the best subjective quality is put into the decoded picture buffer (DPB) or the reference frame set and is used in the codec loop as the reference image for intra-frame or inter-frame prediction; the reconstructed image with the best objective quality is used for post-processing, in which subjective quality adjustment is performed on the coded reconstructed image to improve the image/video quality after compression and reconstruction.
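
Repeating the sampling step can be sketched as follows. The sketch stops at the coefficient level; each candidate would still go through the inverse transformation step to become a reconstructed image. The function names, the seed handling and the (mean, variance) representation are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Estimate = Tuple[float, float]  # (mean, variance), an assumed representation

def sample_once(estimates: Sequence[Estimate], rng: random.Random) -> List[float]:
    """Draw one set of estimated coefficients."""
    return [rng.gauss(mean, variance ** 0.5) for mean, variance in estimates]

def sample_candidates(estimates: Sequence[Estimate],
                      count: int, seed: int) -> List[List[float]]:
    """Repeat the sampling step `count` times to get several candidates."""
    rng = random.Random(seed)
    return [sample_once(estimates, rng) for _ in range(count)]

estimates = [(0.0, 1.0), (2.0, 0.25), (-1.0, 4.0)]
candidates = sample_candidates(estimates, count=3, seed=7)
# Zero-variance variant: every coefficient equals its estimated mean.
mean_only = sample_once([(m, 0.0) for m, _ in estimates], random.Random(7))
```

Fixing the seed makes the candidate set reproducible, which matters if a sampled image is to serve as a prediction reference on both the encoding and decoding sides; how the two sides would agree on the random numbers is not addressed in this passage.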

It should be pointed out here that the beneficial effects of the decoding end can refer to the beneficial effects of the encoding end, which will not be described here again.

Based on the fourth aspect, the present application relates to a video image-based encoding device, and the beneficial effects may refer to the description of the first aspect or the second aspect, which will not be repeated here. The coding device has the function of realizing the behavior in the method example of the first aspect or the second aspect above. The functions described above may be implemented by hardware, or may be implemented by executing corresponding software on the hardware. The hardware or software includes one or more modules corresponding to the above functions.

Based on the fifth aspect, the present application relates to a video image-based decoding device, and the beneficial effects may refer to the description of the third aspect and will not be repeated here. The decoding device has the function of realizing the behavior in the method example of the third aspect above. The functions described above may be implemented by hardware, or may be implemented by executing corresponding software on the hardware. The hardware or software includes one or more modules corresponding to the above functions.

The method described in the first aspect or the second aspect of the present application may be executed by the device described in the fourth aspect of the present application. Other features and implementations of the method described in the first or second aspect of the present application directly depend on the functionality and implementation of the device described in the fourth aspect of the present application.

The method described in the third aspect of the present application can be executed by the device described in the fifth aspect of the present application. Other features and implementations of the method described in the third aspect of the application depend directly on the functionality and implementations of the device described in the fifth aspect of the application.

Based on a sixth aspect, the present application relates to an apparatus for encoding a video stream, including a processor and a memory. The memory stores instructions, and the instructions cause the processor to execute the method described in the first aspect or the second aspect.

Based on a seventh aspect, the present application relates to an apparatus for decoding a video stream, including a processor and a memory. The memory stores instructions, and the instructions cause the processor to execute the method described in the third aspect.

According to an eighth aspect, there is provided a computer readable storage medium having stored thereon instructions which, when executed, cause one or more processors to encode video data. The instructions cause the one or more processors to execute the method in the first, second, or third aspect, or any possible embodiment of the first, second, or third aspect.

Based on the ninth aspect, the present application relates to a computer program product including program code, where the program code, when run, executes the method in the first, second, or third aspect, or in any possible embodiment of the first, second, or third aspect.

The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

Description of drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application or the prior art, the following briefly introduces the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative work.

FIG. 1 is a block diagram of an example of a video decoding system for implementing an embodiment of the present application;

FIG. 2 is a block diagram of another example of a video decoding system for implementing an embodiment of the present application;

FIG. 3 is a schematic block diagram of a video decoding device for implementing an embodiment of the present application;

FIG. 4 is a schematic block diagram of a video decoding device for implementing an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a video encoding and decoding device provided in an embodiment of the present application;

FIG. 6a is a schematic diagram of the result after one wavelet transform;

FIG. 6b is a schematic diagram of the first context information and the second context information of the first data;

FIG. 6c is a schematic diagram of the first context information and the second context information of the first preset area;

FIG. 6d is a schematic structural diagram of a probability estimation network provided by an embodiment of the present application;

FIG. 6e is a schematic structural diagram of a residual network provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a video codec provided in an embodiment of the present application;

FIG. 8 is a schematic diagram of an encoding process provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of another encoding process provided by the embodiment of the present application;

FIG. 10 is a schematic diagram of a decoding process provided by an embodiment of the present application.

detailed description

The embodiments of the present application provide an AI-based video image compression technology, in particular a neural network-based video compression technology, and specifically provide an encoding and decoding method based on probability distribution and sampling to improve the traditional hybrid video codec system.

Video coding generally refers to the processing of a sequence of images that forms a video or video sequence. In the field of video coding, the terms "picture", "frame", and "image" may be used as synonyms. Video coding (or coding in general) includes two parts: video encoding and video decoding. Video encoding is performed on the source side and typically involves processing (e.g., compressing) the raw video images to reduce the amount of data needed to represent them (and thus to store and/or transmit them more efficiently). Video decoding is performed on the destination side and typically involves inverse processing relative to the encoder to reconstruct the video images. The "coding" of video images (or images in general) referred to in the embodiments should be understood as the "encoding" or "decoding" of video images or video sequences. The encoding part and the decoding part are also collectively referred to as a codec (encoding and decoding, CODEC).

In the case of lossless video coding, the original video image can be reconstructed, i.e., the reconstructed video image has the same quality as the original video image (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy video coding, further compression is performed by quantization or the like to reduce the amount of data required to represent the video image, and the decoder side cannot completely reconstruct the video image, i.e., the quality of the reconstructed video image is lower or poorer than that of the original video image.

Since the embodiment of the present application involves the application of a neural network, for ease of understanding, some nouns or terms used in the embodiment of the present application are firstly explained below, and the nouns or terms are also part of the summary of the invention.

(1) neural network

The neural network can be composed of neural units. A neural unit can refer to an operation unit that takes $x_s$ and an intercept of 1 as input, and the output of the operation unit can be:

$$\mathrm{output} = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

Wherein, s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function. A neural network is a network formed by connecting multiple such single neural units, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field. The local receptive field can be an area composed of several neural units.
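As an illustration of the neural unit described above, the following is a minimal sketch in Python that computes the weighted sum of the inputs plus the bias and passes it through a sigmoid activation. The function names `sigmoid` and `neural_unit` and the numeric inputs are assumptions chosen for illustration only; they do not appear in the present application.

```python
import math


def sigmoid(z: float) -> float:
    """Sigmoid activation: f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(xs, ws, b):
    """Output of one neural unit: f(sum over s of W_s * x_s, plus b)."""
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return sigmoid(z)


# Hypothetical inputs and weights, chosen only for illustration.
# The weighted sum is 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0.
output = neural_unit(xs=[1.0, 2.0], ws=[0.5, -0.25], b=0.0)
```

Because the weighted sum in this example is zero, the sigmoid returns its midpoint value.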

(2) Deep Neural Network

A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers. According to their positions, the layers inside a DNN can be divided into three categories: input layer, hidden layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.

Although DNN looks complicated, it is actually not complicated in terms of the work of each layer. In simple terms, it is the following linear relationship expression:

$$\vec{y} = a(W\vec{x} + \vec{b})$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, W is the weight matrix (also called the coefficients), and a() is the activation function. Each layer simply applies this operation to the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Due to the large number of DNN layers, the number of coefficients W and offset vectors $\vec{b}$ is also large. These parameters are defined in a DNN as follows, taking the coefficient W as an example. Assume that in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as $W^{3}_{24}$. The superscript 3 represents the layer number of the coefficient W, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.

In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^{L}_{jk}$.

It should be noted that the input layer has no W parameter. In deep neural networks, more hidden layers make the network more capable of describing complex situations in the real world. Theoretically speaking, a model with more parameters has a higher complexity and a greater "capacity", which means that it can complete more complex learning tasks. Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).
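The per-layer operation of a DNN, an activation applied to a weight matrix times the input vector plus an offset vector, can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `dense_layer` and `forward`, the 2-3-1 layer sizes, and all numeric weights are hypothetical. The indexing `W[j][k]` mirrors the convention above, in which the first subscript is the output neuron and the second is the input neuron.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(W, x, b, a=sigmoid):
    """Compute y = a(W x + b). W[j][k] links input neuron k to output neuron j."""
    return [
        a(sum(W[j][k] * x[k] for k in range(len(x))) + b[j])
        for j in range(len(W))
    ]


def forward(layers, x):
    """Apply each (W, b) pair in order. The input layer itself has no W."""
    for W, b in layers:
        x = dense_layer(W, x, b)
    return x


# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output neuron.
layers = [
    ([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0.0, 0.0, 0.0]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]
y = forward(layers, [1.0, 0.0])
```

Training, as described above, would amount to learning the values stored in each `W` (and `b`); this sketch only shows the forward computation.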

(3) Convolutional neural network

Convolutional neural network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network contains a feature extractor composed of a convolutional layer and a subsampling layer, which can be regarded as a filter. The convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network. In the convolutional layer of a convolutional neural network, a neuron can only be connected to some adjacent neurons. A convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units of the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as a way to extract image information that is independent of location. The convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
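Weight sharing can be illustrated with a short sketch in which a single kernel is slid over every position of an input, so that all outputs of one feature plane are produced by the same weights. The function name `conv2d_valid`, the 3x3 input, and the 2x2 kernel are assumptions for illustration. As is common in CNN implementations, the kernel is applied without flipping (cross-correlation), with no padding and a stride of 1.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 3x3 input and 2x2 kernel.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_plane = conv2d_valid(image, kernel)
```

The kernel holds only four weights regardless of the image size, which is the reduction in connections that weight sharing provides.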

(4) Recurrent neural network

Recurrent neural networks (RNN) are used to process sequence data. In the traditional neural network model, from the input layer to the hidden layer to the output layer, adjacent layers are fully connected, while the nodes within each layer are not connected to each other. Although this ordinary neural network solves many problems, it is still powerless for many others. For example, to predict the next word in a sentence, the previous words are generally needed, because the words in a sentence are not independent of each other. An RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs. Specifically, the network remembers the previous information and applies it to the calculation of the current output; that is, the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. An RNN is trained in the same way as a traditional CNN or DNN. RNNs are designed to give machines the ability to remember as humans do. Therefore, the output of an RNN depends on the current input information and the historical memory information.
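The dependence of the current output on earlier inputs can be sketched with a single scalar recurrent unit whose hidden state is fed back at each step. The names `rnn_step` and `run_rnn`, the tanh activation, and the weight values are assumptions for illustration only.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of a scalar recurrent unit: h_t = tanh(w_x*x_t + w_h*h_prev + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Process a sequence of any length, carrying the hidden state forward."""
    h = 0.0
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


# Two hypothetical sequences that end with the same inputs but differ at the start.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])
```

Although both sequences end with the same input values, the final states differ, because the first sequence's initial input is still carried in the hidden state.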

(5) Loss function

In the process of training a deep neural network, the output of the network should be as close as possible to the value that is actually desired. Therefore, the predicted value of the current network can be compared with the desired target value, and the weight vector of each layer of the neural network can then be updated based on the difference between the two (there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer in the deep neural network). For example, if the predicted value of the network is too high, the weight vector is adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the desired target value or a value very close to it. It is therefore necessary to define in advance how to compare the difference between the predicted value and the target value. This is the role of the loss function or objective function, which are important equations used to measure the difference between the predicted value and the target value. Taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference. The training of the deep neural network then becomes a process of reducing the loss as much as possible.
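As a simple illustration of a loss function, the sketch below uses the mean squared error between predicted values and target values. The choice of mean squared error, the function name `mse_loss`, and the sample values are assumptions; the present application does not prescribe a particular loss function in this passage.

```python
def mse_loss(predicted, target):
    """Mean squared error: the average of the squared differences."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Hypothetical target and two sets of predictions.
target = [1.0, 2.0, 3.0]
loss_far = mse_loss([2.0, 4.0, 6.0], target)   # predictions far from the target
loss_near = mse_loss([1.1, 2.0, 2.9], target)  # predictions close to the target
```

The predictions that lie further from the target produce the larger loss, which is the property training relies on when it reduces the loss.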

(6) Back propagation algorithm

The neural network can use the error back propagation (back propagation, BP) algorithm to correct the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, passing the input signal forward until the output will generate an error loss, and updating the parameters in the initial neural network model by backpropagating the error loss information, so that the error loss converges. The backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrix.
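The forward pass, error loss, and backward parameter update can be sketched on a very small network with one sigmoid hidden unit and one linear output. This is a minimal sketch under stated assumptions: the function name `train`, the squared-error loss, the learning rate, and the initial weights are hypothetical and chosen only so that the example is easy to follow.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(x, target, w1=0.5, w2=0.5, lr=0.5, steps=200):
    """Gradient descent on loss = (w2 * sigmoid(w1 * x) - target)^2."""
    losses = []
    for _ in range(steps):
        # Forward pass: propagate the input to the output and measure the error.
        h = sigmoid(w1 * x)
        y = w2 * h
        losses.append((y - target) ** 2)
        # Backward pass: propagate the error back with the chain rule.
        d_y = 2.0 * (y - target)
        d_w2 = d_y * h
        d_h = d_y * w2
        d_w1 = d_h * h * (1.0 - h) * x
        # Update the parameters against the gradient.
        w1 -= lr * d_w1
        w2 -= lr * d_w2
    return w1, w2, losses


w1, w2, losses = train(x=1.0, target=0.8)
```

In this example the recorded loss shrinks over the iterations, which corresponds to the convergence of the error loss described above.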

Embodiments of the decoding system 10, the encoder 20, and the decoder 30 are described below with reference to FIGS. 1-3.

FIG. 1 is a schematic block diagram of an exemplary decoding system 10, such as a video decoding system 10 (or simply the decoding system 10), which may utilize the techniques of the present application. The video encoder 20 (or simply the encoder 20) and the video decoder 30 (or simply the decoder 30) in the video coding system 10 represent devices that may be used to perform techniques according to the various examples described in this application.

As shown in FIG. 1, the decoding system 10 includes a source device 12 for providing coded image data 21, such as coded images, to a destination device 14 for decoding the coded image data 21.

The source device 12 includes an encoder 20 , and optionally, an image source 16 , a preprocessor (or a preprocessing unit) 18 such as an image preprocessor, and a communication interface (or a communication unit) 22 .

Image source 16 may include or be any type of image capture device for capturing real-world images or the like, and/or any type of image generation device, such as a computer graphics processor, or any type of device for acquiring and/or providing real-world images, computer-generated images (e.g., screen content, virtual reality (VR) images), and/or any combination thereof (e.g., augmented reality (AR) images). The image source may be any type of memory or storage that stores any of the above images.

To distinguish the processing performed by the preprocessor (or preprocessing unit) 18 , the image (or image data) 17 may also be referred to as an original image (or original image data) 17 .

The preprocessor 18 is used to receive the (original) image data 17 and perform preprocessing on the image data 17 to obtain a preprocessed image (or preprocessed image data) 19. For example, the preprocessing performed by the preprocessor 18 may include cropping, color format conversion (e.g., from RGB to YCbCr), color grading, or denoising. It can be understood that the preprocessing unit 18 can be an optional component.
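As one example of the color format conversion mentioned above, the following sketch converts a single RGB pixel to YCbCr. It assumes 8-bit full-range values and the BT.601 coefficients used by JFIF; the present application does not specify which conversion matrix the preprocessor 18 uses, and the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit full-range RGB pixel to YCbCr (BT.601/JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# Hypothetical pixels: white has no chroma offset, red has a high Cr value.
white = rgb_to_ycbcr(255, 255, 255)
red = rgb_to_ycbcr(255, 0, 0)
```

For gray or white pixels the two chroma components stay at the neutral value of 128, since the chroma coefficients in each row sum to zero.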

A video encoder (or encoder) 20 is used to receive preprocessed image data 19 and provide encoded image data 21 (to be further described below with reference to FIG. 2 etc.).

The communication interface 22 in the source device 12 may be used to receive the encoded image data 21 and send the encoded image data 21 (or any other processed version) via the communication channel 13 to another device, such as the destination device 14 or any other device, for storage or direct reconstruction.

The destination device 14 includes a decoder 30 , and may also optionally include a communication interface (or communication unit) 28 , a post-processor (or post-processing unit) 32 and a display device 34 .

The communication interface 28 in the destination device 14 is used to receive the coded image data 21 (or any other processed version) directly from the source device 12 or from any other source device such as a storage device (for example, a coded image data storage device), and to supply the coded image data 21 to the decoder 30.

The communication interface 22 and the communication interface 28 can be used to send or receive the coded image data (or coded data) 21 through a direct communication link between the source device 12 and the destination device 14, such as a direct wired or wireless connection, or through any type of network, such as a wired network, a wireless network, or any combination thereof, or any type of private network and public network or any combination thereof.

For example, the communication interface 22 can be used to encapsulate the encoded image data 21 into a suitable format such as a message, and/or to process the encoded image data using any type of transmission encoding or processing, for transmission over a communication link or communication network.

The communication interface 28 corresponds to the communication interface 22, eg, can be used to receive the transmission data and process the transmission data using any type of corresponding transmission decoding or processing and/or decapsulation to obtain the encoded image data 21 .

Both the communication interface 22 and the communication interface 28 can be configured as a one-way communication interface as indicated by an arrow from the source device 12 to the corresponding communication channel 13 of the destination device 14 in FIG. 1, or a two-way communication interface, and can be used to send and receive messages etc., to establish the connection, confirm and exchange any other information related to the communication link and/or data transmission such as encoded image data transmission, etc.

The video decoder (or decoder) 30 is used to receive the encoded image data 21 and provide decoded image data (or a decoded image) 31 (which will be further described below with reference to FIG. 3, etc.).

The post-processor 32 is used to post-process the decoded image data 31 (also referred to as reconstructed image data), such as the decoded image, to obtain post-processed image data 33, such as the post-processed image. Post-processing performed by the post-processing unit 32 may include, for example, color format conversion (e.g., from YCbCr to RGB), color grading, cropping, or resampling, or any other processing for preparing the decoded image data 31 for display by the display device 34 or the like.

The display device 34 is used to receive the post-processed image data 33 to display the image to a user or viewer. The display device 34 may be or include any type of display for representing the reconstructed image, e.g., an integrated or external display screen or monitor. For example, the display screen may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP), or any other type of display.

The decoding system 10 also includes a training engine 25. The specific training process implemented by the training engine 25 can be found in the subsequent description and will not be described here.

Although FIG. 1 shows the source device 12 and the destination device 14 as independent devices, a device embodiment may also include both the source device 12 and the destination device 14, or the functions of both, that is, the source device 12 or corresponding function and the destination device 14 or corresponding function. In these embodiments, the source device 12 or corresponding function and the destination device 14 or corresponding function may be implemented using the same hardware and/or software, by separate hardware and/or software, or by any combination thereof.

It will be apparent to a skilled person from the description that the presence and (exact) division of the different units or functions in the source device 12 and/or destination device 14 shown in FIG. 1 may vary depending on the actual device and application.

Encoder 20 (e.g., video encoder 20) or decoder 30 (e.g., video decoder 30) or both may be implemented by processing circuitry as shown in FIG. 2, such as one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, dedicated video coding processors, or any combination thereof. Encoder 20 may be implemented by processing circuitry 46 to include the various modules discussed with reference to encoder 20 of FIG. 2 and/or any other encoder system or subsystem described herein. Decoder 30 may be implemented by processing circuitry 46 to include the various modules discussed with reference to decoder 30 of FIG. 3 and/or any other decoder system or subsystem described herein. The processing circuitry 46 may be used to perform the various operations discussed below. As shown in FIG. 4, if part of the techniques is implemented in software, the device may store the software instructions in a suitable non-transitory computer-readable storage medium and use one or more processors to execute the instructions in hardware, thereby performing the techniques of the present application. Either of the video encoder 20 and the video decoder 30 may be integrated in a single device as part of a combined codec (encoder/decoder, CODEC), as shown in FIG. 2.

Source device 12 and destination device 14 may comprise any of a variety of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, cell phone, smartphone, tablet or tablet computer, camera, desktop computer, set-top box, television, display device, digital media player, video game console, video streaming device (such as a content service server or content distribution server), broadcast receiving device, broadcast transmitting device, etc., and may use no operating system or any type of operating system. In some cases, source device 12 and destination device 14 may be equipped with components for wireless communication. Accordingly, source device 12 and destination device 14 may be wireless communication devices.

In some cases, the video coding system 10 shown in FIG. 1 is merely exemplary, and the techniques provided herein are applicable to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device. In other examples, data is retrieved from local storage, sent over a network, and so on. A video encoding device may encode and store data into memory, and/or a video decoding device may retrieve and decode data from memory. In some examples, encoding and decoding are performed by devices that do not communicate with each other but simply encode data to memory and/or retrieve and decode data from memory.

FIG. 2 is an illustrative diagram of an example of a video coding system 40 including video encoder 20 of FIG. 2 and/or video decoder 30 of FIG. 3, according to an example embodiment. The video decoding system 40 may include an imaging device 41, a video encoder 20, a video decoder 30 (and/or a video encoder/decoder implemented by a processing circuit 46), an antenna 42, one or more processors 43, one or more memory stores 44, and/or a display device 45.

As shown in FIG. 2 , imaging device 41 , antenna 42 , processing circuit 46 , video encoder 20 , video decoder 30 , processor 43 , memory storage 44 and/or display device 45 are capable of communicating with each other. In different examples, the video coding system 40 may include only the video encoder 20 or only the video decoder 30 .

In some examples, antenna 42 may be used to transmit or receive an encoded bitstream of video data. Additionally, in some examples, display device 45 may be used to present video data. The processing circuit 46 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like. The video decoding system 40 may also include an optional processor 43, which may similarly include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like. In addition, the memory storage 44 can be any type of memory, such as volatile memory (for example, static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or non-volatile memory (for example, flash memory, etc.). In a non-limiting example, memory storage 44 may be implemented by cache memory. In other examples, processing circuitry 46 may include memory (e.g., cache, etc.) for implementing an image buffer or the like.

In some examples, video encoder 20 implemented by logic circuitry may include an image buffer (e.g., implemented by processing circuitry 46 or memory storage 44) and a graphics processing unit (e.g., implemented by processing circuitry 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include video encoder 20 implemented by processing circuitry 46 to implement the various modules discussed with reference to FIG. 2 and/or any other encoder system or subsystem described herein. Logic circuits may be used to perform the various operations discussed herein.

In some examples, video decoder 30 may be implemented by processing circuitry 46 in a similar manner to implement the various modules discussed with reference to video decoder 30 of FIG. 3 and/or any other decoder system or subsystem described herein. In some examples, video decoder 30 implemented by logic circuitry may include an image buffer (e.g., implemented by processing circuitry 46 or memory storage 44) and a graphics processing unit (e.g., implemented by processing circuitry 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include video decoder 30 implemented by processing circuitry 46 to implement the various modules discussed with reference to FIG. 3 and/or any other decoder system or subsystem described herein.

In some examples, antenna 42 may be used to receive an encoded bitstream of video data. As discussed, the encoded bitstream may contain data related to encoded video frames, indicators, index values, mode selection data, and the like, as discussed herein, such as data related to coding partitions (e.g., transform coefficients or quantized transform coefficients, optional indicators (as discussed), and/or data defining the coding partitions). Video coding system 40 may also include video decoder 30 coupled to antenna 42 and used to decode the encoded bitstream. A display device 45 is used to present video frames.

It should be understood that, for the example described with reference to the video encoder 20 in the embodiment of the present application, the video decoder 30 may be used to perform a reverse process. With regard to signaling syntax elements, the video decoder 30 may be configured to receive and parse such syntax elements and decode the associated video data accordingly. In some examples, video encoder 20 may entropy encode the syntax elements into an encoded video bitstream. In such instances, video decoder 30 may parse such syntax elements and decode the related video data accordingly.

For ease of description, embodiments of the present invention are described with reference to the Versatile Video Coding (VVC) reference software or to High-Efficiency Video Coding (HEVC), developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). Those of ordinary skill in the art understand that embodiments of the present invention are not limited to HEVC or VVC.

FIG. 3 is a schematic diagram of a video decoding device 300 provided by an embodiment of the present invention. The video coding apparatus 300 is suitable for implementing the disclosed embodiments described herein. In one embodiment, the video decoding device 300 may be a decoder, such as the video decoder 30 in FIG. 1 , or an encoder, such as the video encoder 20 in FIG. 1 .

The video decoding device 300 includes: an ingress port 310 (or input port 310) and a receiving unit (receiver unit, Rx) 320 for receiving data; a processor, logic unit, or central processing unit (CPU) 330 for processing data, where, for example, the processor 330 may be a neural network processor 330; a sending unit (transmitter unit, Tx) 340 and an egress port 350 (or output port 350) for transmitting data; and a memory 360. The video decoding device 300 may also include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress port 310, the receiving unit 320, the sending unit 340, and the egress port 350, serving as the egress or ingress of optical or electrical signals.

The processor 330 is implemented by hardware and software. Processor 330 may be implemented as one or more processor chips, cores (e.g., multi-core processors), FPGAs, ASICs, and DSPs. Processor 330 is in communication with the ingress port 310, the receiving unit 320, the sending unit 340, the egress port 350, and the memory 360. The processor 330 includes a decoding module 370 (e.g., a neural network (NN) based decoding module 370). The decoding module 370 implements the embodiments disclosed above. For example, the decoding module 370 performs, processes, prepares, or provides various encoding operations. Thus, the decoding module 370 provides a substantial improvement to the functionality of the video decoding device 300 and effects the switching of the video decoding device 300 to different states. Alternatively, the decoding module 370 is implemented as instructions stored in the memory 360 and executed by the processor 330.

The memory 360 includes one or more magnetic disks, tape drives, and solid-state drives, and may be used as an overflow data storage device for storing programs when such programs are selected for execution, and for storing instructions and data that are read during program execution. The memory 360 can be volatile and/or nonvolatile, and can be a read-only memory (ROM), a random access memory (RAM), a ternary content-addressable memory (TCAM), and/or a static random-access memory (SRAM).

FIG. 4 is a simplified block diagram of an apparatus 400 provided by an exemplary embodiment. The apparatus 400 may be used as either or both of the source device 12 and the destination device 14 in FIG. 1 .

Processor 402 in apparatus 400 may be a central processing unit. Alternatively, processor 402 may be any other type of device or devices, existing or to be developed in the future, capable of manipulating or processing information. While the disclosed implementations can be implemented using a single processor, such as processor 402 as shown, using more than one processor may be faster and more efficient.

In one implementation, memory 404 in apparatus 400 may be a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may be used as memory 404 . Memory 404 may include code and data 406 accessed by processor 402 via bus 412 . Memory 404 may also include an operating system 408 and application programs 410, including at least one program that allows processor 402 to perform the methods described herein. For example, application programs 410 may include applications 1 through N, and also include a video coding application that performs the methods described herein.

Apparatus 400 may also include one or more output devices, such as display 418 . In one example, display 418 may be a touch-sensitive display that combines the display with touch-sensitive elements that may be used to sense touch input. Display 418 may be coupled to processor 402 via bus 412 .

Although bus 412 in device 400 is described herein as a single bus, bus 412 may include multiple buses. Additionally, secondary storage may be directly coupled to other components of device 400 or accessed over a network, and may include a single integrated unit such as a memory card or multiple units such as multiple memory cards. Accordingly, apparatus 400 may have a wide variety of configurations.

Codecs and Codec Methods

FIG. 5 is a schematic block diagram of an example of a video codec for implementing the technology of the present application. In the example of FIG. 5 , the video encoder 20 includes an encoding unit 501 , a forward transform unit 502 and a probability estimation unit 503 ; the video decoder 30 includes a decoding unit 504 , a sampling unit 505 and an inverse transform unit 506 . The video codec shown in FIG. 5 may also be referred to as an end-to-end video codec or a video codec based on an end-to-end video codec.

coding unit 501

The encoding unit 501 performs image encoding on the image to be encoded to obtain a compressed code stream.

Optionally, the above-mentioned image encoding may be a joint photographic experts group (JPEG) encoding method, a JPEG2000 encoding method, an H.264 intra-frame encoding method, an H.265 intra-frame encoding method, an H.266 intra-frame encoding method, or another image encoding method.

Forward transformation unit 502

The forward transformation unit 502 is used to transform the first image to obtain a first transformed image.

Wherein, the first image is an image to be encoded or an image that has been decoded.

Optionally, the forward transformation unit 502 is further configured to transform the second image to obtain a second transformed image.

Wherein, the second image is an image to be encoded or a decoded image, and the first image is different from the second image.

In an example, N levels of wavelet transform are performed on the first image to obtain 3N+1 subbands, where each subband includes one or more wavelet coefficients and N is an integer greater than 0.

The wavelet transform may be a traditional wavelet transform, a deep-network-based wavelet transform, or another similar transform, which is not limited here. The deep-network-based wavelet transform differs from the traditional wavelet transform in that the transformation and prediction are implemented by deep networks; the specific implementation of the deep network is not limited here. The present application takes one level of wavelet transform as an example, that is, N=1. As shown in FIG. 6a, four subbands LL1, HL1, LH1 and HH1 are obtained after the first image undergoes one level of wavelet transform.
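The relationship between the number of transform levels N and the 3N+1 subbands can be illustrated with a minimal sketch. It assumes an unnormalised Haar wavelet (the application does not fix a particular wavelet), even image dimensions, and one common naming convention in which the first letter refers to horizontal filtering; the function names are hypothetical.

```python
def haar_1d(values):
    """One level of an unnormalised Haar transform: pairwise averages and differences."""
    half = len(values) // 2
    low = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    high = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return low, high


def filter_columns(matrix):
    """Apply haar_1d down every column and return (low, high) as lists of rows."""
    pairs = [haar_1d(list(column)) for column in zip(*matrix)]
    low = [list(row) for row in zip(*[p[0] for p in pairs])]
    high = [list(row) for row in zip(*[p[1] for p in pairs])]
    return low, high


def haar_2d(image):
    """One 2-D level: filter rows, then columns; returns LL, HL, LH, HH."""
    row_low, row_high = [], []
    for row in image:
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)
    ll, lh = filter_columns(row_low)    # low-pass horizontally
    hl, hh = filter_columns(row_high)   # high-pass horizontally
    return ll, hl, lh, hh


def wavelet_decompose(image, levels):
    """Apply `levels` transforms, re-transforming the LL subband each time."""
    subbands = {}
    current = image
    for n in range(1, levels + 1):
        ll, hl, lh, hh = haar_2d(current)
        subbands[f"HL{n}"], subbands[f"LH{n}"], subbands[f"HH{n}"] = hl, lh, hh
        current = ll
    subbands[f"LL{levels}"] = current   # only the last LL subband is kept
    return subbands


image = [[float(8 * r + c) for c in range(8)] for r in range(8)]
one_level = wavelet_decompose(image, 1)   # LL1, HL1, LH1, HH1
two_level = wavelet_decompose(image, 2)   # 3 * 2 + 1 = 7 subbands
```

Each level replaces the current LL subband with four new subbands, which adds three subbands per level and gives 3N+1 in total.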

For the first image, the image composed of the subbands obtained by performing wavelet transform on the first image is the above-mentioned first transformed image. Similarly, for the second image, the image composed of the subbands obtained by performing wavelet transform on the second image is the above-mentioned second transformed image.

Optionally, after multiple wavelet coefficients are obtained by the wavelet transform, each wavelet coefficient is quantized to obtain multiple quantized wavelet coefficients. Specifically, when the wavelet coefficients are quantized, the subbands may be processed according to preset order 1, and the wavelet coefficients in the current subband may then be quantized according to preset order 2 to obtain the quantized wavelet coefficients. Preset order 1 may be an existing zigzag scanning order, for example: LL1→HL1→LH1→HH1. Preset order 2 may be an existing zigzag scanning order, horizontal scanning order or vertical scanning order.

It should be understood that preset order 1 and preset order 2 above are merely examples and do not limit the present application; other orders may also be used.
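The two nested orders can be sketched as follows. This is only an illustration under stated assumptions: the zigzag order follows the JPEG-style anti-diagonal pattern, quantization is a plain division by a hypothetical step size `step` followed by rounding, and the function names are not taken from the application.

```python
def zigzag_order(height, width):
    """Positions (row, col) visited along anti-diagonals, alternating direction."""
    order = []
    for s in range(height + width - 1):
        diagonal = [(r, s - r) for r in range(height) if 0 <= s - r < width]
        # Even diagonals run from bottom-left to top-right, odd ones the other way.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def quantize_in_order(subbands, step, subband_order=("LL1", "HL1", "LH1", "HH1")):
    """Visit subbands in preset order 1 and coefficients in preset order 2."""
    sequence = []
    for name in subband_order:                                  # preset order 1
        band = subbands[name]
        for r, c in zigzag_order(len(band), len(band[0])):      # preset order 2
            # Python's round() rounds exact halves to the nearest even integer.
            sequence.append((name, r, c, round(band[r][c] / step)))
    return sequence


subbands = {
    "LL1": [[10.0, 7.0], [3.0, -4.0]],
    "HL1": [[6.0, 0.0], [0.0, 0.0]],
    "LH1": [[0.0, 0.0], [0.0, 0.0]],
    "HH1": [[0.0, 0.0], [0.0, 0.0]],
}
sequence = quantize_in_order(subbands, step=3.0)
```

The quantized values themselves do not depend on the visiting order; the order matters when the coefficients are subsequently processed one after another, for example when previously processed coefficients serve as context.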

Optionally, before each wavelet coefficient is quantized, the wavelet coefficients may be preprocessed, and the preprocessed wavelet coefficients are then quantized. For example, feature extraction is performed on the obtained wavelet coefficients by a neural network, and the feature extraction result is then quantized. Processing the wavelet coefficients before quantization enables the decoder to obtain a high-quality first reconstructed image by decoding.

For the first image, the image composed of the quantized wavelet coefficients, which are obtained by quantizing the wavelet coefficients produced by performing wavelet transform on the first image, is the above-mentioned first transformed image. Similarly, for the second image, the image composed of the quantized wavelet coefficients, which are obtained by quantizing the wavelet coefficients produced by performing wavelet transform on the second image, is the above-mentioned second transformed image.

In another example, a discrete cosine transform (DCT) is performed on the first image to obtain a DCT image. The DCT image includes a plurality of frequency bands, and each frequency band includes one or more DCT coefficients. After the first image is transformed, its low-frequency components are concentrated in the upper left corner and its high-frequency components are distributed in the lower right corner. The coefficient in the first row and first column is the direct current (DC) coefficient, which represents the average value of the first image, and the other coefficients are alternating current (AC) coefficients. The DC coefficient and the AC coefficients are collectively referred to as DCT coefficients.

Optionally, block division is performed on the first image to obtain multiple image blocks, and DCT is then performed in units of image blocks to obtain transform blocks. For example: 1) the first image is divided into image blocks of a preset size, where the preset size may be 4x4, 8x8, 16x16, 32x32, 64x64, 128x128 or 256x256; or 2) the first image is divided to obtain one or more image blocks whose size is not limited. The first image may be divided using the quadtree, binary tree or ternary tree division method of an existing coding standard (H.266, H.265, H.264, AVS2 or AVS3) to obtain one or more image blocks.
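Block division followed by a block-wise DCT can be sketched as below. The sketch assumes square blocks of a preset size, image dimensions that are multiples of that size, and the orthonormal DCT-II; the application does not specify the DCT normalisation, and the function names are hypothetical. With this normalisation the DC coefficient of an n×n block equals n times the block mean, so it is proportional to the average value.

```python
import math


def split_into_blocks(image, size):
    """Divide the image into non-overlapping size x size blocks, row by row."""
    return [
        [row[c:c + size] for row in image[r:r + size]]
        for r in range(0, len(image), size)
        for c in range(0, len(image[0]), size)
    ]


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (direct O(n^4) evaluation)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


image = [[float(r + c) for c in range(8)] for r in range(8)]
blocks = split_into_blocks(image, 4)            # four 4x4 image blocks
transform_blocks = [dct_2d(b) for b in blocks]  # one coefficient block per image block

flat = [[10.0] * 4 for _ in range(4)]
flat_coeffs = dct_2d(flat)   # a constant block has only a DC coefficient
```

For the constant block, all energy ends up in the upper left (DC) position and the AC coefficients vanish up to floating-point error.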

It should be pointed out that a frequency band can be understood either as a coefficient block (that is, the coefficient block obtained by performing DCT on an image block, because DCT is performed block by block), or as the set of coefficients located at the same position in each coefficient block.
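The second interpretation, in which a frequency band collects the coefficients at the same position of every coefficient block, can be sketched as follows. The helper name and the small 2x2 coefficient blocks are illustrative assumptions.

```python
def bands_by_position(coefficient_blocks):
    """Group coefficients that share the same (row, col) position across blocks."""
    size = len(coefficient_blocks[0])
    return {
        (u, v): [block[u][v] for block in coefficient_blocks]
        for u in range(size)
        for v in range(size)
    }


coefficient_blocks = [
    [[40.0, 1.0], [2.0, 0.5]],
    [[36.0, -1.0], [0.0, 0.25]],
    [[44.0, 3.0], [-2.0, 0.0]],
]
bands = bands_by_position(coefficient_blocks)
dc_band = bands[(0, 0)]   # all DC coefficients form one frequency band
```

Under this interpretation a 2x2 block size yields four frequency bands, each holding one coefficient per block.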

It should be understood that the image formed based on the DCT coefficients obtained from the first image is the above-mentioned first transformed image. The second image may also be processed in the above manner to obtain DCT coefficients of the second image, and the DCT coefficients of the second image may constitute the second transformed image.

Optionally, the obtained DCT coefficients are quantized, for example uniformly quantized, to obtain quantized DCT coefficients. For the first image, the image formed based on the quantized DCT coefficients obtained for the first image is the above-mentioned first transformed image. Similarly, for the second image, the image formed based on the quantized DCT coefficients obtained for the second image is the above-mentioned second transformed image.
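Uniform quantization can be sketched as follows, assuming a single hypothetical step size `step` and rounding to the nearest level; the application does not prescribe the step size or the rounding rule.

```python
import math


def quantize(coefficient, step):
    """Map a coefficient to the index of the nearest multiple of `step`."""
    return math.floor(coefficient / step + 0.5)


def dequantize(level, step):
    """Reconstruct the representative value of a quantization level."""
    return level * step


step = 2.0
coefficients = [7.3, -0.4, 12.0, -5.1]
levels = [quantize(c, step) for c in coefficients]
reconstructed = [dequantize(q, step) for q in levels]
errors = [abs(c - r) for c, r in zip(coefficients, reconstructed)]
```

With this rule the reconstruction error of each coefficient is at most half a step, which is the usual trade-off between the step size and the fidelity of the transformed image.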

In another example, feature extraction is performed on the first image to obtain a three-dimensional feature map, and the three-dimensional feature map is the above-mentioned first transformed image. Optionally, the feature coefficients in the three-dimensional feature map are quantized to obtain quantized feature coefficients, and the three-dimensional feature map formed by the quantized feature coefficients is the first transformed image.

It should be understood that the above-mentioned processing can also be performed on the second image, and the obtained three-dimensional feature map is the above-mentioned second transformed image; or the three-dimensional feature map composed of the quantized feature coefficients, obtained by quantizing the feature coefficients in that three-dimensional feature map, is the above-mentioned second transformed image.

It should be pointed out here that the forward transformation unit 502 is optional, so it is represented by a dotted box in FIG. 5 . That is to say, when the forward transformation unit 502 does not exist, the image in the pixel domain is input to the probability estimation unit 503 .

Probability estimation unit 503

The probability estimation unit 503 performs probability estimation according to the first context information of the first data to obtain a probability estimation result of the first data.

In an example, the first data is a pixel of the first image, and the first context information of the pixel includes all or part of the pixels in the first image. Further, the first context information of the pixel includes the pixels surrounding the pixel in the first image, or includes some or all of the pixels in the image blocks adjacent to the pixel, or includes some or all of the pixels in the image block where the pixel is located.

It should be noted here that the above-mentioned "surrounding pixels" refer to pixels whose distance from the first data is smaller than a preset threshold, and the unit of the preset threshold is "pixel".
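A minimal sketch of collecting the surrounding pixels is given below. It assumes that the distance is the Chebyshev distance (the larger of the row and column offsets) and that the pixel itself is excluded; the application only states that the distance is smaller than a preset threshold measured in pixels, so both choices and the function name are assumptions.

```python
def surrounding_pixels(image, row, col, threshold):
    """Return ((r, c), value) for pixels whose distance to (row, col) is below threshold."""
    context = []
    for r in range(len(image)):
        for c in range(len(image[0])):
            distance = max(abs(r - row), abs(c - col))   # assumed distance measure
            if 0 < distance < threshold:
                context.append(((r, c), image[r][c]))
    return context


image = [[10 * r + c for c in range(5)] for r in range(5)]
inner = surrounding_pixels(image, 2, 2, threshold=2)    # the 8 direct neighbours
corner = surrounding_pixels(image, 0, 0, threshold=2)   # fewer neighbours at a corner
wide = surrounding_pixels(image, 2, 2, threshold=3)     # a 5x5 window minus the centre
```

Positions outside the image are simply skipped here; a practical codec might instead pad the image or restrict the context to data that is already available at the decoding end.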

In an example, the first data is a coefficient in the first transformed image. If the first data is a wavelet coefficient or a quantized wavelet coefficient, the first context information of the first data includes some or all of the coefficients in the first transformed image, the coefficients being wavelet coefficients or quantized wavelet coefficients. Further, the first context information of the first data includes the wavelet coefficients or quantized wavelet coefficients surrounding the first data in the first transformed image; or the first context information includes some or all of the coefficients in the subbands adjacent to the first data, the coefficients being wavelet coefficients or quantized wavelet coefficients; or the first context information includes some or all of the coefficients in the subband where the first data is located, the coefficients being wavelet coefficients or quantized wavelet coefficients;

or,

If the first data is a DCT coefficient or a quantized DCT coefficient, the first context information of the first data includes some or all of the coefficients in the first transformed image, the coefficients being DCT coefficients or quantized DCT coefficients. Further, the first context information of the first data includes the DCT coefficients or quantized DCT coefficients surrounding the first data in the first transformed image; or the first context information includes some or all of the coefficients in the frequency bands adjacent to the first data, the coefficients being DCT coefficients or quantized DCT coefficients; or the first context information includes some or all of the coefficients in the frequency band where the first data is located, the coefficients being DCT coefficients or quantized DCT coefficients;

Alternatively, if the first data is a feature coefficient or a quantized feature coefficient, and the first transformed image is a three-dimensional feature map obtained by performing feature extraction on the first image, the first context information of the first data includes some or all of the coefficients in the first transformed image, the coefficients being feature coefficients or quantized feature coefficients. Further, the first context information of the first data includes the feature coefficients or quantized feature coefficients surrounding the first data in the first transformed image, or the first context information includes some or all of the coefficients in the channel where the first data is located, the coefficients being feature coefficients or quantized feature coefficients.

The above-mentioned "surrounding wavelet coefficients or quantized wavelet coefficients" refer to wavelet coefficients or quantized wavelet coefficients whose distance from the first data is smaller than a preset threshold, and the unit of the preset threshold is "wavelet coefficient or quantized wavelet coefficient". The above-mentioned "surrounding DCT coefficients or quantized DCT coefficients" refer to DCT coefficients or quantized DCT coefficients whose distance from the first data is smaller than a preset threshold, and the unit of the preset threshold is "DCT coefficient or quantized DCT coefficient". The above-mentioned "surrounding feature coefficients or quantized feature coefficients" refer to feature coefficients or quantized feature coefficients whose distance from the first data is smaller than a preset threshold, and the unit of the preset threshold is "feature coefficient or quantized feature coefficient".

In an example, the probability estimation unit 503 performs probability estimation according to the first context information of the first data to obtain a probability estimation result of the first data, including:

The probability estimation unit 503 performs probability estimation according to the first context information and the second context information of the first data to obtain a probability estimation result of the first data, wherein the first context information and the second context information are obtained based on the first image and the second image respectively.

For example, as shown in Figure 6b, assuming that the first data is the pixel at position P in the image to be encoded, the first context information of the first data includes the pixels around the pixel at position P in the image to be encoded (shown as the gray blocks in Figure 6b), or includes some or all of the pixels in the image blocks adjacent to the pixel at position P, or includes some or all of the pixels in the image block where the pixel at position P is located. The second context information includes the pixels around the pixel at position P in the decoded image, or includes some or all of the pixels in the image blocks adjacent to the pixel at position P, or includes some or all of the pixels in the image block where the pixel at position P is located.

Assuming that the first data is the wavelet coefficient or quantized wavelet coefficient at position P in the first transformed image, the first context information of the first data includes the coefficients around the coefficient at position P in the first transformed image, or includes some or all of the coefficients in the subbands adjacent to the coefficient at position P, or includes some or all of the coefficients in the subband where the coefficient at position P is located. The second context information includes the coefficients around the coefficient at position P in the second transformed image, or includes some or all of the coefficients in the subbands adjacent to the coefficient at position P, or includes some or all of the coefficients in the subband where the coefficient at position P is located. In each case, the coefficients are wavelet coefficients or quantized wavelet coefficients.

Assuming that the first data is the DCT coefficient or quantized DCT coefficient at position P in the first transformed image, the first context information of the first data includes the coefficients around the coefficient at position P in the first transformed image, or includes some or all of the coefficients in the frequency bands adjacent to the coefficient at position P, or includes some or all of the coefficients in the frequency band where the coefficient at position P is located. The second context information includes the coefficients around the coefficient at position P in the second transformed image, or includes some or all of the coefficients in the frequency bands adjacent to the coefficient at position P, or includes some or all of the coefficients in the frequency band where the coefficient at position P is located. In each case, the coefficients are DCT coefficients or quantized DCT coefficients.

Assuming that the first data is the feature coefficient or quantized feature coefficient at position P in the first transformed image, that is, the first transformed image and the second transformed image are three-dimensional feature maps obtained by performing feature extraction on the first image and the second image respectively, the first context information of the first data includes the feature coefficients or quantized feature coefficients around the coefficient at position P in the first transformed image, or includes some or all of the coefficients in the channels adjacent to the coefficient at position P, or includes some or all of the coefficients in the channel where the coefficient at position P is located. The second context information includes the feature coefficients or quantized feature coefficients around the coefficient at position P in the second transformed image, or includes some or all of the coefficients in the channels adjacent to the coefficient at position P, or includes some or all of the coefficients in the channel where the coefficient at position P is located. In each case, the coefficients are feature coefficients or quantized feature coefficients.

In an example, the probability estimation unit 503 further performs probability estimation according to the first context information of the second data to obtain a probability estimation result of the second data.

It should be pointed out that the second data and the first data are data at different positions of the same image (such as the first image, or the first transformed image obtained by transforming the first image). For the specific process of performing probability estimation according to the first context information of the second data to obtain the probability estimation result of the second data, refer to the related description of performing probability estimation according to the first context information of the first data to obtain the probability estimation result of the first data; details are not described here again.

In a feasible embodiment, the first data and the second data belong to the same preset area. The preset area may be an image block in the first image, a subband obtained by performing wavelet transform on the first image, a frequency band obtained by performing DCT on the first image, or a channel of the three-dimensional feature map obtained by performing feature extraction on the first image. In this case, only one probability estimation result may be obtained during probability estimation, and this result may be called the probability estimation result of the preset area. Because only one probability estimation result is obtained for the data in a preset area, only that one result (that is, the probability estimation result of the preset area) needs to be transmitted, which saves bitstream.

The following describes how to obtain the probability estimation result of the first preset area.

Method 1: Each data item in the first preset area is processed according to the above method of obtaining the probability estimation result of the first data, so as to obtain the probability estimation results of all the data in the first preset area. For example, if there are 5 data items in the first preset area, 5 probability estimation results are obtained. A target probability estimation result is then selected from the probability estimation results of all the data in the first preset area as the probability estimation result of the first preset area. For example, the probability estimation result of the data located in the middle, the upper left corner, the upper right corner, the lower left corner or the lower right corner of the first preset area is used as the probability estimation result of the first preset area.
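Method 1 can be sketched as a simple selection step. The sketch assumes that the per-data probability estimation results are already available as a two-dimensional grid of (mean, variance) pairs; the grid layout, the position labels and the function name are illustrative assumptions.

```python
def select_area_estimate(estimates, position="middle"):
    """Pick one per-data estimate to represent the whole preset area."""
    height, width = len(estimates), len(estimates[0])
    candidates = {
        "middle": (height // 2, width // 2),
        "upper_left": (0, 0),
        "upper_right": (0, width - 1),
        "lower_left": (height - 1, 0),
        "lower_right": (height - 1, width - 1),
    }
    r, c = candidates[position]
    return estimates[r][c]


# Hypothetical per-data results for a 3x3 preset area: (mean, variance).
estimates = [
    [(0.0, 1.0), (0.1, 1.1), (0.2, 1.2)],
    [(1.0, 2.0), (1.1, 2.1), (1.2, 2.2)],
    [(2.0, 3.0), (2.1, 3.1), (2.2, 3.2)],
]
area_estimate = select_area_estimate(estimates)                 # middle element
corner_estimate = select_area_estimate(estimates, "lower_right")
```

Only the selected result then needs to be written for the area, instead of one result per data item.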

Method 2: Probability estimation is performed according to the first context information of the first preset area to obtain the probability estimation result of the first preset area; or probability estimation is performed according to the first context information and the second context information of the first preset area to obtain the probability estimation result of the first preset area.

In an example, if the first preset area is an image block of the first image, the first context information of the first preset area includes some or all of the pixels in the first image; further, the first context information of the first preset area includes some or all of the pixels in the image blocks around the first preset area in the first image;

If the first preset area is a subband of the first transformed image (obtained by performing wavelet transform on the first image), the first context information of the first preset area includes some or all of the coefficients in the first transformed image; further, the first context information of the first preset area includes some or all of the coefficients in the subbands around the first preset area in the first transformed image, the coefficients being wavelet coefficients or quantized wavelet coefficients;

If the first preset area is a frequency band of the first transformed image (obtained by performing DCT on the first image), the first context information of the first preset area includes some or all of the coefficients in the first transformed image; further, the first context information of the first preset area includes some or all of the coefficients in the frequency bands around the first preset area in the first transformed image, the coefficients being DCT coefficients or quantized DCT coefficients;

If the first preset area is a channel of the first transformed image (a three-dimensional feature map obtained by performing feature extraction on the first image), the first context information of the first preset area includes some or all of the coefficients in the first transformed image, the coefficients being feature coefficients or quantized feature coefficients. Further, the first context information of the first preset area includes some or all of the coefficients in the channels around the first preset area in the first transformed image, the coefficients being feature coefficients or quantized feature coefficients.

When the probability estimation unit 503 performs probability estimation according to the first context information and the second context information of the first preset area to obtain the probability estimation result of the first preset area, the first context information and the second context information are obtained based on the first image and the second image respectively, wherein the second image is an image to be encoded or a decoded image, and the first image is different from the second image.

For example, as shown in Figure 6c, assuming that the first preset area is area B in the first image, the first context information of the first preset area includes some or all of the pixels in the area surrounding area B in the first image (the gray blocks shown in the left part of Figure 6c); the second context information includes some or all of the pixels in the area surrounding area B in the second image (the gray blocks shown in the right part of Figure 6c). Area B in the first image is an image block in the first image.

Assuming that the first preset area is subband B in the first transformed image, the first context information of the first preset area includes all or part of the coefficients in the subbands around subband B in the first transformed image, and the second context information includes all or part of the coefficients in the subbands around subband B in the second transformed image, the coefficients being wavelet coefficients or quantized wavelet coefficients.

Assuming that the first preset area is frequency band B in the first transformed image, the first context information of the first preset area includes all or part of the coefficients in the frequency bands around frequency band B in the first transformed image, and the second context information includes all or part of the coefficients in the frequency bands around frequency band B in the second transformed image, the coefficients being DCT coefficients or quantized DCT coefficients.

Assuming that the first preset area is channel B in the first transformed image, the first context information of the first preset area includes all or part of the coefficients in the channels around channel B in the first transformed image, and the second context information includes all or part of the coefficients in the channels around channel B in the second transformed image, the coefficients being feature coefficients or quantized feature coefficients.

In an example, to obtain the probability estimation result of the first data, the probability estimation unit 503 obtains the probability distribution model of the first data; processes the first context information and/or the second context information of the first data through the first probability estimation network to obtain the parameters of the probability distribution model; and obtains the probability distribution of the first data according to the probability distribution model of the first data and the parameters of the probability distribution model. The probability estimation result of the first data includes the probability distribution of the first data, or the parameters of the probability distribution model of the first data;

or,

The probability estimation unit 503 processes the first context information and/or the second context information of the first data through the second probability estimation network to obtain the probability distribution of the first data. The probability estimation result of the first data includes the probability distribution of the first data, or includes the parameters of a probability distribution model corresponding to the probability distribution. The first probability estimation network and the second probability estimation network are implemented based on neural networks.

In the manner described above, the probability estimation result of the above-mentioned second data can be obtained.

In an example, the probability estimation result of the first preset area can be obtained in the following manner:

The probability estimation unit 503 obtains the probability distribution model of the first preset area; processes the first context information and/or the second context information of the first preset area through a third probability estimation network to obtain the parameters of the probability distribution model; and obtains the probability distribution of the first preset area according to the probability distribution model of the first preset area and the parameters of the probability distribution model. The probability estimation result of the first preset area includes the probability distribution of the first preset area, or the parameters of the probability distribution model of the first preset area;

or,

The probability estimation unit 503 processes the first context information and/or the second context information of the first preset area through the fourth probability estimation network to obtain the probability distribution of the first preset area. The probability estimation result of the first preset area includes the probability distribution of the first preset area, or the parameters of a probability distribution model corresponding to the probability distribution. The third probability estimation network and the fourth probability estimation network are implemented based on neural networks.

Optionally, the above probability distribution model may be a single Gaussian model (GSM), an asymmetric Gaussian model, a Gaussian mixture model (GMM) or a Laplace distribution model. The probability estimation network may be implemented based on a deep learning network, such as a recurrent neural network (RNN) or a pixel convolutional neural network (PixelCNN), which is not limited here.

As an example, when the probability distribution model is a Gaussian model (a single Gaussian model or an asymmetric Gaussian model or a mixed Gaussian model), the parameters of the probability distribution model are parameters of the Gaussian model, including mean μ and variance σ.

As an example, when the probability distribution model is a Laplace distribution model, the parameters of the probability distribution model are parameters of the Laplace distribution model, including a location parameter μ and a scale parameter b.
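How a probability distribution follows from a probability distribution model and its parameters can be illustrated with a minimal sketch. It assumes integer-valued (quantized) data, assigns to each integer the probability mass of the interval of width 1 centred on it, and treats `sigma` as the standard deviation of the Gaussian; these choices and the function names are assumptions rather than details taken from the application.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian; sigma is the standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def laplace_cdf(x, mu, b):
    """Cumulative distribution function of a Laplace distribution (location mu, scale b)."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def discretize(cdf, low, high):
    """Probability of each integer x in [low, high]: mass of [x - 0.5, x + 0.5]."""
    return {x: cdf(x + 0.5) - cdf(x - 0.5) for x in range(low, high + 1)}


gaussian_pmf = discretize(lambda x: gaussian_cdf(x, mu=1.2, sigma=2.0), -20, 20)
laplace_pmf = discretize(lambda x: laplace_cdf(x, mu=-0.3, b=1.5), -40, 40)

most_likely_gaussian = max(gaussian_pmf, key=gaussian_pmf.get)
most_likely_laplace = max(laplace_pmf, key=laplace_pmf.get)
```

In this sketch the two parameters fully determine the distribution, which is why either the distribution itself or only its parameters can serve as the probability estimation result.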

As an example, a typical probability estimation network based on PixelCNN (which may serve as the first, second, third or fourth probability estimation network) is shown in Figure 6d. In the figure, "h×w" indicates that the current convolutional layer uses a convolution kernel of size h×w; "ResB" indicates a residual module, whose structure is shown in Figure 6e; and "*/relu" indicates that a ReLU activation function is applied after the current layer.
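The structure in Figures 6d and 6e is not reproduced here. As general background, PixelCNN-style networks commonly restrict each convolution kernel with a causal mask so that the estimate for a position depends only on positions that precede it in raster-scan order. The sketch below builds such a mask for an h×w kernel with h = w; the function name is hypothetical, and whether the network of Figure 6d uses exactly these masks is an assumption.

```python
def causal_mask(kernel_size, include_center):
    """1 where a kernel tap may be used, 0 where it would look at later positions."""
    center = kernel_size // 2
    mask = [[0] * kernel_size for _ in range(kernel_size)]
    for r in range(kernel_size):
        for c in range(kernel_size):
            precedes = r < center or (r == center and c < center)
            is_center = r == center and c == center
            if precedes or (include_center and is_center):
                mask[r][c] = 1
    return mask


first_layer_mask = causal_mask(3, include_center=False)   # excludes the current position
later_layer_mask = causal_mask(3, include_center=True)    # may reuse the current position
```

The kernel weights are multiplied element-wise by the mask before the convolution, so that the context fed to the network consists only of data available at the decoding end.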

In an example, after obtaining the probability estimation result of the first data, the probability estimation unit 503 processes the probability estimation result of the first data to obtain a processed probability estimation result. Specifically, if the probability estimation result of the first data includes the mean and variance of a Gaussian distribution, the variance of the Gaussian distribution is processed to obtain a processed variance, and the mean of the Gaussian distribution and the processed variance are used as the processed probability estimation result of the first data; or,

the mean of the Gaussian distribution is processed to obtain a processed mean, and the variance of the Gaussian distribution and the processed mean are used as the processed probability estimation result of the first data.

In one example, processing the variance of the Gaussian distribution to obtain the processed variance includes:

setting the variance of the Gaussian distribution to 0 as the processed variance.

In another example, processing the variance of the Gaussian distribution to obtain the processed variance includes:

processing the variance of the Gaussian distribution according to the scaling factor of the first data to obtain the processed variance;

wherein the scaling factor of the first data is the same as the scaling factor of the second data; or,

the scaling factor of the first data is different from the scaling factor of the second data; or,

if the first data and the second data belong to the same frequency band among a plurality of frequency bands obtained by performing DCT on the first image, the scaling factor of the first data is the same as the scaling factor of the second data; or if the first data and the second data belong to different frequency bands, the scaling factor of the first data is different from the scaling factor of the second data; or the scaling factor of the first data is determined according to the texture complexity of the frequency band to which the first data belongs.

In one example, when the probability estimation result of the first data includes the location parameter and the scale parameter of a Laplace distribution, the scale parameter of the Laplace distribution is processed according to the scaling factor of the first data, and the processed probability estimation result of the first data includes the processed scale parameter and the location parameter of the Laplace distribution.

In one example, when the probability estimation result of the first data includes the location parameter and the scale parameter of a Laplace distribution, the location parameter of the Laplace distribution is processed according to the scaling factor of the first data, and the processed probability estimation result of the first data includes the processed location parameter and the scale parameter of the Laplace distribution.
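The processing of the probability estimation result can be sketched as follows. The application states only that the variance or scale parameter is processed according to a scaling factor; multiplying by the factor is an assumption made here for illustration, as are the function names and the per-band factor table.

```python
import math
import random


def process_gaussian(mean, variance, scaling_factor=None):
    """Return (mean, processed variance).

    With scaling_factor=None the variance is set to 0; otherwise it is
    multiplied by the factor (assumed form of the processing).
    """
    if scaling_factor is None:
        return mean, 0.0
    return mean, variance * scaling_factor


def process_laplace_scale(location, scale, scaling_factor):
    """Return (location, processed scale); multiplication is an assumed form."""
    return location, scale * scaling_factor


# Hypothetical per-band factors: data in the same DCT frequency band share a factor.
factors_per_band = {0: 1.0, 1: 0.5, 2: 0.25}

mean, zero_variance = process_gaussian(3.0, 4.0)
_, scaled_variance = process_gaussian(3.0, 4.0, factors_per_band[1])
location, scaled_scale = process_laplace_scale(-1.0, 2.0, factors_per_band[2])

# Sampling from a Gaussian whose variance has been set to 0 returns the mean.
sample = random.gauss(mean, math.sqrt(zero_variance))
```

Setting the variance to 0 makes the sampled coefficient deterministic, while a factor below 1 narrows the distribution without removing the randomness entirely.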

In an example, after obtaining the probability estimation result of the first preset area, the probability estimation unit 503 preprocesses the probability estimation result of the first preset area to obtain the processed probability estimation result. Specifically, if the probability estimation result of the first preset area includes the mean value and variance of the Gaussian distribution, the variance of the Gaussian distribution is processed to obtain the processed variance, and the mean value of the Gaussian distribution and the processed variance are used as the processed probability estimation result of the first preset area; or,

the mean value of the Gaussian distribution is processed to obtain the processed mean value, and the variance of the Gaussian distribution and the processed mean value are used as the processed probability estimation result of the first preset area. In an example, processing the variance of the Gaussian distribution to obtain the processed variance includes: setting the variance of the Gaussian distribution to 0 as the processed variance. In one example, processing the variance of the Gaussian distribution to obtain the processed variance includes:

Processing the variance of the Gaussian distribution according to the scaling factor of the first preset area to obtain the processed variance;

Wherein, the scaling factor of the first preset area is the same as that of other preset areas; or,

The scaling factor of the first preset area is different from that of other preset areas.

In an example, when the probability estimation result of the first preset area includes the location parameter and the scale parameter of the Laplace distribution, the scale parameter of the Laplace distribution is processed according to the scaling factor of the first preset area, and the processed probability estimation result of the first preset area includes the processed scale parameter and the location parameter of the Laplace distribution.

In an example, when the probability estimation result of the first preset area includes the location parameter and the scale parameter of the Laplace distribution, the location parameter of the Laplace distribution is processed according to the scaling factor of the first preset area, and the processed probability estimation result of the first preset area includes the processed location parameter and the scale parameter of the Laplace distribution.

In an example, after obtaining the probability estimation result of the first data and the probability estimation result of the second data, the encoding unit 501 directly writes the probability estimation result of the first data and the probability estimation result of the second data into the compressed code stream. In one example, in video compression, the probability estimation result of the first data and the probability estimation result of the second data can be carried in the sequence header, the picture header, the slice header, or supplemental enhancement information (SEI) and transmitted to the decoder 30.

In an example, after the probability estimation result of the first preset area is obtained, the first flag enable_flag of the first preset area is set to a first value (for example, 1 or true) to indicate that the decoding end uses the same probability distribution, that is, the probability estimation result of the first preset area, when sampling to obtain the estimated coefficients in the first preset area. The probability estimation result of the first preset area is saved in the probability estimation result set, and the index of that result in the probability estimation result set and the size information of the first preset area are recorded. The encoding unit 501 writes the probability estimation result set, the enable_flag, the index, and the size information of the first preset area into the compressed code stream.

It should be pointed out that multiple probability estimation results can be obtained for multiple different preset areas, and these probability estimation results form the probability estimation result set. The position of the probability estimation result of a preset area in the probability estimation result set is the index of that preset area.

In one example, the probability estimation result set may be transmitted to the decoder 30 through an adaptation parameter set (APS).

In an example, after the probability estimation result of the first preset area is obtained, the enable_flag of the first preset area is set to the first value (such as 1 or true) to indicate that the decoding end uses the same probability distribution, that is, the probability estimation result of the first preset area, when sampling to obtain the estimated coefficients in the first preset area; the encoding unit 501 writes the probability estimation result of the first preset area, the enable_flag, and the size information of the first preset area into the compressed code stream.

In an example, if each piece of data in the first preset area uses its own probability estimation result during sampling, the enable_flag of the first preset area is set to the second value (such as 0 or false), and the encoding unit 501 writes the respective probability estimation results of all the data in the first preset area and the enable_flag of the first preset area into the compressed code stream. Optionally, the encoding unit 501 also writes the size information of the first preset area into the compressed code stream.

In an example, the encoding unit does not write the size information of the preset area into the code stream. Before encoding and decoding, the encoding end and the decoding end can negotiate the size of the preset area and save it in advance at both the encoding end and the decoding end.
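
The two signaling modes described above can be sketched as follows. Only `enable_flag` is a name from the text; the dictionary keys `index`, `size`, and `estimates`, the function names, and the representation of an estimate as a `(mean, variance)` tuple are assumptions made for illustration. The sketch collects syntax elements and does not perform any real bitstream writing.

```python
from typing import Dict, List, Tuple

Estimate = Tuple[float, float]  # ASSUMPTION: (mean, variance) of a Gaussian


def signal_shared(area_estimate: Estimate, result_set: List[Estimate],
                  height: int, width: int) -> Dict[str, object]:
    """One estimate for the whole preset area: enable_flag is the first value."""
    if area_estimate not in result_set:
        result_set.append(area_estimate)
    # The position of the estimate in the set is the index of the area.
    return {"enable_flag": 1,
            "index": result_set.index(area_estimate),
            "size": (height, width)}


def signal_per_coefficient(estimates: List[Estimate],
                           height: int, width: int) -> Dict[str, object]:
    """Each coefficient keeps its own estimate: enable_flag is the second value."""
    if len(estimates) != height * width:
        raise ValueError("expected one estimate per coefficient")
    return {"enable_flag": 0,
            "estimates": list(estimates),
            "size": (height, width)}
```

In the shared mode only an index and the area size are signaled per area, which is where the bit-rate saving comes from; the result set itself is transmitted once.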

Decoding unit 504

The decoding unit 504 decodes the compressed code stream to obtain a first probability estimation result.

In an example, the decoding unit 504 further decodes the compressed code stream to obtain the second probability estimation result.

Optionally, the first probability estimation result includes parameters of the first probability distribution or the first probability distribution model. The second probability estimation result includes parameters of the second probability distribution or the second probability distribution model.

In one example, the decoding unit 504 also decodes the first flag from the compressed code stream. If the first flag is the first value, it indicates that the same probability estimation result (that is, the probability estimation result of the first preset area) is used when sampling to obtain all estimated coefficients in the first preset area, where the first preset area is an area in the enhanced image. The decoding unit 504 also decodes the probability estimation result set and the index of the first preset area from the compressed code stream, where the probability estimation result set includes the probability estimation results of multiple preset areas, and the decoding unit 504 obtains the probability estimation result of the first preset area from the probability estimation result set according to the index of the first preset area;

If the first flag is the second value, it indicates that each estimated coefficient in the first preset area is sampled using its own probability estimation result. The decoding unit 504 decodes the size information H1*W1 of the first preset area from the code stream, which indicates that the decoding unit 504 decodes H1*W1 probability estimation results from the compressed code stream. The sampling unit 505 obtains all estimated coefficients in the first preset area by sampling according to the H1*W1 probability estimation results, where H1 and W1 are both integers greater than 1.

In one example, the decoding unit 504 also decodes the first flag from the compressed code stream, where the first flag indicates that the same probability estimation result (that is, the probability estimation result of the first preset area) is used when sampling to obtain all estimated coefficients in the first preset area, and the first preset area is an area in the enhanced image. The decoding unit 504 also decodes the probability estimation result of the first preset area and H1*W1 from the code stream, and the sampling unit 505 samples H1*W1 times according to the probability estimation result to obtain H1*W1 estimated coefficients, that is, the first preset area includes H1*W1 estimated coefficients.
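
A minimal decoder-side sketch of the flag handling above, assuming Gaussian estimates stored as `(mean, variance)` tuples. The dictionary keys other than `enable_flag` and the function name `sample_area` are hypothetical, and `random.gauss` merely stands in for the sampling unit 505.

```python
import random
from typing import Dict, List, Tuple

Estimate = Tuple[float, float]  # ASSUMPTION: (mean, variance)


def sample_area(syntax: Dict[str, object], result_set: List[Estimate],
                rng: random.Random) -> List[float]:
    """Return H1*W1 estimated coefficients for one preset area."""
    height, width = syntax["size"]
    count = height * width
    if syntax["enable_flag"] == 1:
        # First value: one shared estimate, looked up by index, sampled H1*W1 times.
        params = [result_set[syntax["index"]]] * count
    else:
        # Second value: H1*W1 separate estimates, one per coefficient.
        params = list(syntax["estimates"])
        if len(params) != count:
            raise ValueError("expected one estimate per coefficient")
    # random.gauss takes a standard deviation, hence the square root.
    return [rng.gauss(mean, variance ** 0.5) for mean, variance in params]
```

With a variance of 0 every sample collapses to the mean, so the same function also covers the zero-variance preprocessing case.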

Sampling unit 505

The sampling unit 505 performs sampling according to the first probability estimation result to obtain the first estimated coefficient, and performs sampling according to the second probability estimation result to obtain the second estimated coefficient. Since the two sampling processes are the same, the following description takes sampling according to the first probability estimation result to obtain the first estimated coefficient as an example.

In an example, the first probability estimation result includes the mean and variance of the Gaussian distribution, and the sampling unit 505 performs sampling according to the first probability estimation result to obtain the first estimation coefficient, including:

Specifically, a uniformly distributed random number u on [0,1] is generated using the linear congruential method; let

$$z_1 = \sqrt{2}\,\operatorname{erf}^{-1}(2u - 1)$$

Then z₁ obeys the standard Gaussian distribution. Here, erf() is the Gaussian error function, from which the cumulative distribution function of the standard normal distribution is obtained, and it is defined as follows:

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt$$

Let z₂ = δ·z₁ + μ; then z₂ obeys the Gaussian distribution with mean value μ and standard deviation δ (that is, variance δ²), and z₂ is the above-mentioned first estimated coefficient, where μ is the mean value of the above-mentioned first probability estimation result and δ is the square root of its variance.
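
The Gaussian sampling procedure above can be sketched as inverse-transform sampling. The sketch makes several assumptions that are not in the text: the LCG constants are the widely used Numerical Recipes values chosen only for illustration, the class and function names are hypothetical, and `statistics.NormalDist().inv_cdf` is used as the inverse of the standard normal cumulative distribution function (mathematically equal to √2·erf⁻¹(2u − 1)) because the Python standard library has no inverse error function.

```python
import math
from statistics import NormalDist


class LinearCongruentialGenerator:
    """Minimal LCG; modulus 2**32 with illustrative multiplier and increment."""

    def __init__(self, seed: int = 1) -> None:
        self.state = seed % 2**32

    def uniform(self) -> float:
        self.state = (1664525 * self.state + 1013904223) % 2**32
        # Offset by 0.5 so the value lies strictly inside (0, 1),
        # which inv_cdf requires.
        return (self.state + 0.5) / 2**32


def sample_gaussian(mean: float, variance: float,
                    rng: LinearCongruentialGenerator) -> float:
    u = rng.uniform()
    z1 = NormalDist().inv_cdf(u)           # standard Gaussian sample
    return math.sqrt(variance) * z1 + mean  # shift and scale


rng = LinearCongruentialGenerator(seed=42)
coefficient = sample_gaussian(mean=1.0, variance=4.0, rng=rng)
```

Passing a variance of 0 returns the mean exactly, which is the behavior relied on by the zero-variance preprocessing option.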

Optionally, before sampling, the variance of the first probability estimation result is processed. Specifically, the variance of the first probability estimation result is set to 0 as the processed variance, and sampling is then performed in the above sampling manner according to the processed variance and the mean value of the first probability estimation result to obtain the first estimated coefficient.

Optionally, before sampling, the variance of the first probability estimation result is processed according to the scaling factor of the first estimated coefficient, and sampling is then performed in the above sampling manner according to the processed variance and the mean value of the first probability estimation result to obtain the first estimated coefficient.

Optionally, before sampling, the mean value of the first probability estimation result is processed according to the scaling factor of the first estimated coefficient, and sampling is then performed in the above sampling manner according to the processed mean value and the variance of the first probability estimation result to obtain the first estimated coefficient.

It should be understood that sampling may be performed according to the second probability estimation result in the above manner to obtain the second estimated coefficient.

Optionally, the scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient; or,

the scaling factor of the first estimated coefficient is different from the scaling factor of the second estimated coefficient; or,

when the first estimated coefficient and the second estimated coefficient are quantized wavelet coefficients or wavelet coefficients: if the first estimated coefficient and the second estimated coefficient belong to the same subband, the scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient; or, if the first estimated coefficient and the second estimated coefficient belong to different subbands, the scaling factor of the first estimated coefficient is different from the scaling factor of the second estimated coefficient; or, the scaling factor of the first estimated coefficient is determined according to the texture complexity of the subband to which the first estimated coefficient belongs;

or,

when the first estimated coefficient and the second estimated coefficient are quantized DCT coefficients or DCT coefficients: if the first estimated coefficient and the second estimated coefficient belong to the same frequency band, the scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient; or, if the first estimated coefficient and the second estimated coefficient belong to different frequency bands, the scaling factor of the first estimated coefficient is different from the scaling factor of the second estimated coefficient; or, the scaling factor of the first estimated coefficient is determined according to the texture complexity of the frequency band to which the first estimated coefficient belongs;

or,

when the first estimated coefficient and the second estimated coefficient are quantized feature coefficients or feature coefficients: if the first estimated coefficient and the second estimated coefficient belong to the same channel, the scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient; or, if the first estimated coefficient and the second estimated coefficient belong to different channels, the scaling factor of the first estimated coefficient is different from the scaling factor of the second estimated coefficient; or, the scaling factor of the first estimated coefficient is determined according to the texture complexity of the channel to which the first estimated coefficient belongs.

In an example, the first estimated coefficient and the second estimated coefficient are pixel values, and preprocessing the variance of the first probability estimation result to obtain the processed variance includes:

preprocessing the variance of the first probability distribution according to the scaling factor of the first estimated coefficient to obtain the processed variance,

where the scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient; or, the scaling factor of the first estimated coefficient is different from the scaling factor of the second estimated coefficient; or, the scaling factor of the first estimated coefficient is determined according to the texture complexity of the channel to which the first estimated coefficient belongs.

By preprocessing the first probability distribution, reconstructed images with different properties can be obtained according to user requirements. For example, if the variance of the first probability distribution is set to 0 as the processed variance, the reconstructed image with the best signal quality (best objective quality) can be obtained, that is, the PSNR of the image is increased or the MSE is reduced; if the scaling factors of multiple coefficients are set to be the same, the image with the best subjective quality can be obtained, which corresponds to a lower PSNR or a higher MSE of the image; if the scaling factors of multiple coefficients are set to be different, images whose properties lie between the best subjective quality and the best objective quality can be obtained.

When the first probability estimation result includes the location parameter and the scale parameter of the Laplace distribution, performing sampling according to the first probability estimation result to obtain the first estimated coefficient includes:

generating two uniformly distributed random numbers μ₁ and μ₂, and setting z₃ = b·log(μ₁) and z₄ = b·log(μ₂); the first estimated coefficient is z₅ = z₃ − z₄ + μ, where μ and b are the location parameter and the scale parameter of the Laplace distribution, respectively.
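
The Laplace sampling step above can be sketched as follows. The difference of the two scaled logarithms is a difference of two exponential variables, which follows a Laplace distribution with scale b. The function name is hypothetical, Python's `random` module stands in for whichever uniform generator is actually used, and drawing the uniform numbers from (0, 1] rather than [0, 1) is a choice made here to keep the logarithm finite.

```python
import math
import random


def sample_laplace(location: float, scale: float, rng: random.Random) -> float:
    # rng.random() lies in [0, 1); 1 - rng.random() lies in (0, 1],
    # so math.log never receives 0.
    u1 = 1.0 - rng.random()
    u2 = 1.0 - rng.random()
    z3 = scale * math.log(u1)
    z4 = scale * math.log(u2)
    return z3 - z4 + location


rng = random.Random(7)
coefficient = sample_laplace(location=2.0, scale=1.5, rng=rng)
```

Processing the scale or location parameter before sampling only changes the arguments passed in; the sampling step itself is unchanged.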

Optionally, before sampling, the scale parameter of the Laplace distribution is processed according to the scaling factor of the first estimated coefficient, and sampling is then performed in the above sampling manner according to the processed scale parameter and the location parameter of the Laplace distribution to obtain the first estimated coefficient.

Optionally, before sampling, the location parameter of the Laplace distribution is processed according to the scaling factor of the first estimated coefficient, and sampling is then performed in the above sampling manner according to the processed location parameter and the scale parameter of the Laplace distribution to obtain the first estimated coefficient.

In the manner described above, a plurality of estimated coefficients can be obtained, and the plurality of estimated coefficients include a first estimated coefficient and a second estimated coefficient.

Inverse transformation unit 506

The inverse transformation unit 506 obtains the enhanced image according to the plurality of estimated coefficients.

Specifically, if the multiple estimated coefficients are multiple quantized wavelet coefficients, the inverse transform unit 506 performs inverse quantization and wavelet inverse transform on the multiple estimated coefficients to obtain an enhanced image, or,

If the multiple estimated coefficients are multiple wavelet coefficients, the inverse transform unit 506 performs wavelet inverse transform on the multiple estimated coefficients to obtain an enhanced image, or,

If the multiple estimated coefficients are multiple quantized DCT coefficients, the inverse transform unit 506 performs inverse quantization and inverse DCT on the multiple estimated coefficients to obtain a reconstructed image, or,

If the multiple estimated coefficients are multiple DCT coefficients, the inverse transform unit 506 performs inverse DCT on the multiple estimated coefficients to obtain an enhanced image.

If the multiple estimated coefficients are multiple pixel values, that is, multiple reconstructed pixel values, the enhanced image is obtained based on the multiple estimated coefficients.
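
As an illustration of the quantized-DCT branch above, the following sketch performs inverse quantization followed by a separable two-dimensional inverse DCT on one square block. The assumptions are labeled: uniform quantization with a single step size, the orthonormal DCT-II convention, and hypothetical function names. None of these choices is fixed by the text.

```python
import math
from typing import List

Matrix = List[List[float]]


def idct_1d(coeffs: List[float]) -> List[float]:
    """Inverse of the orthonormal DCT-II for one row or column."""
    n = len(coeffs)
    out = []
    for x in range(n):
        total = coeffs[0] / math.sqrt(n)  # DC term
        for k in range(1, n):
            total += math.sqrt(2.0 / n) * coeffs[k] * math.cos(
                math.pi * (2 * x + 1) * k / (2 * n))
        out.append(total)
    return out


def inverse_transform(quantized: Matrix, step: float) -> Matrix:
    # ASSUMPTION: uniform quantization, so dequantization is a multiplication.
    coeffs = [[level * step for level in row] for row in quantized]
    # Separable inverse DCT: transform rows, then columns.
    rows = [idct_1d(row) for row in coeffs]
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    # cols holds transformed columns; transpose back to row-major order.
    return [list(row) for row in zip(*cols)]


block = inverse_transform([[4, 0], [0, 0]], step=2.0)
```

A block that carries only a DC level reconstructs to a flat block, which is a quick sanity check for the normalization.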

In an example, after obtaining a feature map composed of multiple feature elements, the feature map may be passed through a neural network to output the enhanced image. The neural network can adopt any structure, such as a fully connected network, a convolutional neural network, a recurrent neural network, and the like. The neural network can adopt a deep neural network structure with a multi-layer structure to obtain the first reconstructed image or the second reconstructed image with better quality.

In an example, after obtaining a feature map composed of multiple feature elements, the feature map can be input into a machine vision task module to perform corresponding machine tasks. For example, complete machine vision tasks such as object classification, recognition, and segmentation.

It should be pointed out here that the encoding end scheme of this embodiment is to obtain a compressed bit stream after encoding the image to be encoded, and then refer to the encoded information (such as the compressed bit stream or the coefficient information transformed during the encoding process) to perform probability estimation. The solution at the decoding end of this embodiment is performed on the premise that the compressed code stream is decoded to obtain a decoded image. It can also be said that the solution at the decoding end of this embodiment is a post-processing process.

It can be seen that probability estimation is performed at the encoding end to obtain the probability estimation result, and the probability estimation result is transmitted to the decoding end. The decoding end performs sampling based on the probability estimation result to obtain the estimated coefficients, and then obtains an enhanced image from the estimated coefficients obtained by sampling. Since the sampling process is random and therefore non-deterministic, multiple high-quality images with different properties can be obtained by decoding the same compressed code stream multiple times in the above-mentioned manner, for example, the image with the best subjective quality and the image with the best objective quality.

In one example, during entropy encoding, the encoding unit 501 first performs probability estimation on the first data to obtain a probability estimation result of the first data, referred to as probability estimation result A, and then performs entropy encoding on the first data according to probability estimation result A. During entropy decoding, the decoding unit 504 first performs probability estimation on the first data to obtain a probability estimation result of the first data, which may also be referred to as probability estimation result A, and then performs entropy decoding according to probability estimation result A. The probability estimation result mentioned in the above embodiment is referred to as probability estimation result B.

Optionally, the encoding end performs entropy encoding on the first data according to probability estimation result A; the decoding end performs probability estimation on the first data in the same manner as the encoding end to obtain a probability estimation result (which may also be referred to as probability estimation result A), performs entropy decoding according to probability estimation result A, and may also perform sampling according to probability estimation result A, where the sampling manner is consistent with the above-mentioned embodiment.

Optionally, the encoding end performs entropy encoding on the first data according to probability estimation result A and transmits probability estimation result A to the decoding end; the decoding end performs entropy decoding according to probability estimation result A, and may also perform sampling according to probability estimation result A, where the sampling manner is consistent with the above-mentioned embodiment.

Optionally, the encoding end performs entropy encoding on the first data according to probability estimation result B and sends probability estimation result B to the decoding end; the decoding end performs entropy decoding according to probability estimation result B, and may also perform sampling according to probability estimation result B, where the sampling manner is consistent with the above-mentioned embodiment.

Optionally, the encoding end performs entropy encoding on the first data according to probability estimation result B; the decoding end performs probability estimation on the first data to obtain probability estimation result B, performs entropy decoding according to probability estimation result B, and may also perform sampling according to probability estimation result B, where the sampling manner is consistent with the above-mentioned embodiment.

FIG. 7 is a schematic block diagram of an example of another video codec for implementing the technology of the present application. In the example of FIG. 7, the video encoder 20 includes a coefficient acquisition unit 701, a probability estimation unit 702, and an entropy encoding unit 703; the video decoder 30 includes an entropy decoding unit 704, a sampling unit 705, a first reconstruction unit 706, and a second reconstruction unit 707. The video codec shown in FIG. 5 may also be referred to as an end-to-end video codec or a video codec based on an end-to-end video codec.

Coefficient acquisition unit 701

The coefficient obtaining unit 701 obtains a plurality of coefficients from the image to be encoded, and the plurality of coefficients include a first coefficient.

Optionally, the multiple coefficients can be multiple pixels.

In an example, 1) the coefficient acquiring unit 701 divides the image to be coded into image blocks of a preset size, where the preset size may be 4x4, 8x8, 16x16, 32x32, 64x64, 128x128, or 256x256; or 2) the coefficient acquiring unit 701 divides the image to be coded to obtain one or more image blocks, and the size of the image blocks is not limited. The quadtree, binary tree, or ternary tree division method in existing encoding standards (H266, H265, H264, AVS2 or AVS3) can be used to divide the image to be coded to obtain one or more image blocks. Each image block includes one or more pixels.
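
Option 1) above, division into blocks of a preset size, can be sketched as follows. The function name is hypothetical, the image is assumed to be a list of pixel rows, and letting blocks at the right and bottom edges be smaller when the image size is not a multiple of the block size is an assumption, since the text does not say how edges are handled.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def split_into_blocks(image: Matrix,
                      block_size: int) -> List[Tuple[Tuple[int, int], Matrix]]:
    """Return ((top, left), block) pairs in raster-scan order."""
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            blocks.append(((top, left), block))
    return blocks


image = [[r * 6 + c for c in range(6)] for r in range(4)]
blocks = split_into_blocks(image, block_size=4)
```

For the 4x6 example image this yields one 4x4 block at (0, 0) and one 4x2 edge block at (0, 4).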

In an example, the image to be coded is subjected to wavelet transformation N times to obtain 3N+1 subbands, where each subband includes one or more wavelet coefficients and N is an integer greater than 0.

The wavelet transform method may be a traditional wavelet transform, a deep network-based wavelet transform, or another similar transform method, which is not limited here. The deep network-based wavelet transform differs from the traditional wavelet transform in that the transformation and prediction are implemented using a deep network, and the specific implementation of the deep network is not limited here. This application takes one wavelet transform as an example, that is, N=1. As shown in FIG. 6a, four subbands LL1, HL1, LH1 and HH1 are obtained after the image to be coded is subjected to one wavelet transform.
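
The subband count of 3N+1 can be illustrated with a small sketch that uses the Haar wavelet. The Haar filter is chosen here only because it is the simplest traditional wavelet; the text does not name a specific wavelet. The function names are hypothetical, and the assignment of the labels HL and LH to horizontal and vertical differences follows one common convention, which is also an assumption.

```python
from typing import Dict, List

Matrix = List[List[float]]


def haar_level(image: Matrix):
    """Apply one 2-D Haar level to an image with even height and width."""
    ll, hl, lh, hh = [], [], [], []
    for r in range(0, len(image), 2):
        rows = ([], [], [], [])
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2.0)  # low-pass in both directions
            rows[1].append((a - b + d - e) / 2.0)  # horizontal difference
            rows[2].append((a + b - d - e) / 2.0)  # vertical difference
            rows[3].append((a - b - d + e) / 2.0)  # diagonal difference
        for band, row in zip((ll, hl, lh, hh), rows):
            band.append(row)
    return ll, hl, lh, hh


def wavelet_subbands(image: Matrix, levels: int) -> Dict[str, Matrix]:
    """Decompose the LL band repeatedly; returns 3*levels + 1 subbands."""
    subbands: Dict[str, Matrix] = {}
    current = image
    for n in range(1, levels + 1):
        current, hl, lh, hh = haar_level(current)
        subbands[f"HL{n}"], subbands[f"LH{n}"], subbands[f"HH{n}"] = hl, lh, hh
    subbands[f"LL{levels}"] = current
    return subbands
```

Each level replaces the current low-pass band with four new bands, so the count grows by three per level, giving 4 subbands for N=1 and 7 for N=2.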

For the image to be coded, the image composed of the subbands obtained by performing wavelet transformation on the image to be coded is the first transformed image. Similarly, for the decoded image, the image composed of the subbands obtained by performing wavelet transformation on the decoded image is the above-mentioned second transformed image.

The above multiple coefficients may be multiple wavelet coefficients or quantized wavelet coefficients.

In another example, the coefficient acquisition unit 701 performs DCT on the image to be encoded to obtain a DCT image, where the DCT image includes multiple frequency bands and each frequency band includes one or more DCT coefficients. After the image to be encoded is transformed, its low-frequency components are concentrated in the upper left corner, and its high-frequency components are distributed toward the lower right corner. The coefficient in the first row and first column is the direct current (DC) coefficient, which represents the average value of the image to be encoded, and the other coefficients are alternating current (AC) coefficients; DC coefficients and AC coefficients are collectively referred to as DCT coefficients.
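
The relationship between the DC coefficient and the average value can be checked with a small forward DCT sketch. The orthonormal DCT-II convention used here is an assumption; under it, the DC coefficient of an N×N block equals N times the block mean, so it represents the average up to a normalization factor. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_1d(values: List[float]) -> List[float]:
    """Orthonormal DCT-II of one row or column."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, v in enumerate(values))
        norm = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(norm * total)
    return out


def dct_2d(block: Matrix) -> Matrix:
    # Separable transform: rows first, then columns.
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # Transpose back so that result[0][0] is the DC coefficient.
    return [list(row) for row in zip(*cols)]


coeffs = dct_2d([[1.0, 2.0], [3.0, 4.0]])
dc = coeffs[0][0]  # 2 * mean of the block under this normalization
```

For a flat block all AC coefficients vanish and only the upper-left DC coefficient is non-zero, which illustrates why the low-frequency energy sits in the upper left corner.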

Optionally, the coefficient acquiring unit 701 divides the image to be coded into blocks to obtain multiple image blocks, and then performs DCT in units of image blocks to obtain transform blocks. For example, 1) the image to be coded is divided into image blocks of a preset size, where the preset size may be 4x4, 8x8, 16x16, 32x32, 64x64, 128x128, or 256x256; or 2) the image to be coded is divided to obtain one or more image blocks, and the size of the image blocks is not limited. The quadtree, binary tree, or ternary tree division method in existing encoding standards (H266, H265, H264, AVS2 or AVS3) can be used to divide the image to be coded to obtain one or more image blocks.

It should be understood that the image formed based on the DCT coefficients obtained from the image to be encoded is the above-mentioned first transformed image.

Optionally, the obtained DCT coefficients are quantized, such as uniformly quantized, to obtain quantized DCT coefficients. For the image to be encoded, the image formed based on the quantized DCT coefficients obtained from the image to be encoded is the first transformed image.

The above multiple coefficients may be multiple DCT coefficients or quantized DCT coefficients.

In another example, feature extraction is performed on the image to be coded to obtain a three-dimensional feature map, and the three-dimensional feature map is the above-mentioned first transformed image. Optionally, the feature elements in the three-dimensional feature map are quantized to obtain quantized feature elements, and the three-dimensional feature map formed by the quantized feature elements is the above-mentioned first transformed image. The above multiple coefficients may be multiple feature coefficients or multiple quantized feature coefficients.

Probability Estimation Unit 702

The probability estimation unit 702 obtains a first probability estimation result according to the context information of the first coefficient.

In an example, the first coefficient is a pixel of the image to be encoded, and the first context information of the pixel includes all or part of the pixels in the image to be encoded. Further, the first context information of the pixel includes the pixels adjacent to the pixel in the image to be encoded, or includes part or all of the pixels in the image block adjacent to the pixel, or includes the pixels in the image block where the pixel is located. some or all of the pixels.

In an example, the first coefficient is a coefficient in the first transformed image. If the first coefficient is a wavelet coefficient or a quantized wavelet coefficient, the first context information of the first coefficient includes part or all of the coefficients in the first transformed image, where each coefficient is a wavelet coefficient or a quantized wavelet coefficient. Further, the first context information of the first coefficient includes wavelet coefficients or quantized wavelet coefficients around the first coefficient in the first transformed image; or the first context information includes part or all of the coefficients in a subband adjacent to the first coefficient; or the first context information includes part or all of the coefficients in the subband where the first coefficient is located;

or,

If the first coefficient is a DCT coefficient or a quantized DCT coefficient, the first context information of the first coefficient includes part or all of the coefficients in the first transformed image, where each coefficient is a DCT coefficient or a quantized DCT coefficient. Further, the first context information of the first coefficient includes DCT coefficients or quantized DCT coefficients around the first coefficient in the first transformed image; or the first context information includes part or all of the coefficients in a frequency band adjacent to the first coefficient; or the first context information includes part or all of the coefficients in the frequency band where the first coefficient is located;

Alternatively, if the first coefficient is a feature coefficient or a quantized feature coefficient, the first context information of the first coefficient includes part or all of the coefficients in the first transformed image, where each coefficient is a feature coefficient or a quantized feature coefficient. Further, the first context information of the first coefficient includes feature coefficients or quantized feature coefficients around the first coefficient in the first transformed image, or the first context information includes part or all of the coefficients in the channel where the first coefficient is located.

The aforementioned "surrounding wavelet coefficients or quantized wavelet coefficients" refer to wavelet coefficients or quantized wavelet coefficients whose distance from the first coefficient is smaller than a preset threshold, where the preset threshold is measured in units of wavelet coefficients or quantized wavelet coefficients. The aforementioned "surrounding DCT coefficients or quantized DCT coefficients" refer to DCT coefficients or quantized DCT coefficients whose distance from the first coefficient is smaller than a preset threshold, where the preset threshold is measured in units of DCT coefficients or quantized DCT coefficients. The aforementioned "surrounding feature coefficients or quantized feature coefficients" refer to feature coefficients or quantized feature coefficients whose distance from the first coefficient is smaller than a preset threshold, where the preset threshold is measured in units of feature coefficients or quantized feature coefficients.
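
The notion of "surrounding" coefficients above can be sketched as a neighborhood lookup. The text does not say which distance is meant, so the sketch assumes the Chebyshev distance (the larger of the row and column offsets, counted in coefficients); it also ignores any coding-order restriction on which neighbors are available. The function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def surrounding_coefficients(coeffs: Matrix, row: int, col: int,
                             threshold: int) -> List[float]:
    """Collect coefficients whose distance to (row, col) is below threshold.

    ASSUMPTION: distance is max(|dr|, |dc|), measured in coefficients.
    The coefficient at (row, col) itself is excluded.
    """
    height, width = len(coeffs), len(coeffs[0])
    context = []
    for r in range(max(0, row - threshold + 1), min(height, row + threshold)):
        for c in range(max(0, col - threshold + 1), min(width, col + threshold)):
            if (r, c) != (row, col):
                context.append(coeffs[r][c])
    return context


grid = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
context = surrounding_coefficients(grid, row=1, col=1, threshold=2)
```

With a threshold of 2 the context of the center coefficient is its eight immediate neighbors; at the image border the neighborhood is clipped.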

In an example, the plurality of coefficients further include a second coefficient, and the probability estimation unit 702 is further configured to perform probability estimation according to context information of the second coefficient to obtain a second probability estimation result.

The second coefficient and the first coefficient are located at different positions in the same image (for example, the image to be encoded, or the first transformed image obtained by transforming the image to be encoded). For the specific process of performing probability estimation according to the context information of the second coefficient to obtain the second probability estimation result, refer to the above description of obtaining the first probability estimation result according to the context information of the first coefficient; details are not repeated here.

In an example, the first coefficient and the second coefficient belong to the same preset area. The preset area may be an image block in the image to be encoded, a subband obtained by performing wavelet transform on the image to be encoded, a frequency band or image block obtained by performing DCT on the image to be encoded, or a channel of the three-dimensional feature map obtained by performing feature extraction on the image to be encoded. In this case, only one probability estimation result may be obtained during probability estimation, and this result may be called the probability estimation result of the preset area. Because only one probability estimation result is obtained for the data in a preset area, only that one result needs to be transmitted, which saves bits in the code stream. The probability estimation result of a preset area (referred to below as the first preset area) may be obtained in either of the following two ways.

Method 1: Each coefficient in the first preset area is processed in the same way as the first coefficient to obtain the probability estimation results of all coefficients in the first preset area; for example, if the first preset area contains 5 coefficients, 5 probability estimation results are obtained. A target probability estimation result is then selected from these results as the probability estimation result of the first preset area. For example, the probability estimation result of the coefficient located at the center, the upper left corner, the upper right corner, the lower left corner or the lower right corner of the first preset area is used as the probability estimation result of the first preset area.

Method 2: Perform probability estimation according to the context information of the first preset area to obtain the probability estimation result of the first preset area.
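The selection step of Method 1 can be sketched as follows. The function name, the position labels and the representation of each estimate as a (mean, variance) tuple are illustrative assumptions, not part of the described method.

```python
def region_estimate_from_representative(estimates, position="center"):
    """Pick one per-coefficient estimate to stand for the whole preset area.

    `estimates` is a 2-D list of (mean, variance) tuples, one per coefficient
    (a hypothetical representation of the per-coefficient results).
    """
    height, width = len(estimates), len(estimates[0])
    picks = {
        "center": (height // 2, width // 2),
        "top_left": (0, 0),
        "top_right": (0, width - 1),
        "bottom_left": (height - 1, 0),
        "bottom_right": (height - 1, width - 1),
    }
    row, col = picks[position]
    return estimates[row][col]

# Placeholder estimates for a 3x3 preset area.
estimates = [[(float(r), 1.0 + c) for c in range(3)] for r in range(3)]
region_estimate = region_estimate_from_representative(estimates, "center")
```

Only the selected estimate would then need to be transmitted for the area, which is the bit saving the text describes.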

Optionally, if the first preset area is an image block of the first image, the context information of the first preset area includes some or all pixels in the first image, further, the context information of the first preset area includes Part or all of the pixels in the image block around the first preset area in the first image;

If the first preset area is a subband of the first transformed image (obtained by performing wavelet transform on the first image), the context information of the first preset area includes some or all coefficients in the first transformed image; further, the context information of the first preset area includes some or all of the coefficients in the subbands around the first preset area in the first image, and the coefficients are wavelet coefficients or quantized wavelet coefficients;

If the first preset area is a frequency band of the first transformed image (obtained by performing DCT on the first image), the context information of the first preset area includes some or all coefficients in the first transformed image; further, the context information of the first preset area includes some or all coefficients in the frequency bands around the first preset area in the first image, and the coefficients are DCT coefficients or quantized DCT coefficients;

If the first preset area is a transform block of the first transformed image (obtained by performing DCT on the first image), performing DCT on the first image in units of one or more image blocks yields one or more transform blocks.

If the first preset area is a channel of the first transformed image (a three-dimensional feature map obtained by performing feature extraction on the first image), the context information of the first preset area includes some or all coefficients in the first transformed image, and the coefficients are feature coefficients or quantized feature coefficients. Further, the context information of the first preset area includes some or all coefficients in the channel to which the first preset area belongs in the first image, and the coefficients are feature coefficients or quantized feature coefficients.

In an example, to obtain the first probability estimation result, the probability estimation unit 702 obtains the probability distribution model of the first coefficient, processes the context information of the first coefficient through the fifth probability estimation network to obtain the parameters of the probability distribution model, and obtains the first probability distribution according to the probability distribution model of the first coefficient and the parameters of the probability distribution model. The first probability estimation result includes the first probability distribution, or the parameters of the probability distribution model of the first coefficient;

or,

the probability estimation unit 702 processes the context information of the first coefficient through the sixth probability estimation network to obtain the first probability distribution. The first probability estimation result includes the first probability distribution, or includes the parameters of the probability distribution model corresponding to the first probability distribution. The fifth probability estimation network and the sixth probability estimation network are implemented based on neural networks.

In the above manner, the probability estimation result of the above second coefficient can be obtained.

The probability estimation unit 503 obtains the probability distribution model of the first preset area, processes the context information of the first preset area through the seventh probability estimation network to obtain the parameters of the probability distribution model, and obtains the probability distribution of the first preset area according to the probability distribution model of the first preset area and the parameters of the probability distribution model. The probability estimation result of the first preset area includes the probability distribution of the first preset area, or the parameters of the probability distribution model of the first preset area;

or,

the context information of the first preset area is processed through the eighth probability estimation network to obtain the probability distribution of the first preset area. The probability estimation result of the first preset area includes the probability distribution of the first preset area, or includes the parameters of the probability distribution model corresponding to that probability distribution. The seventh probability estimation network and the eighth probability estimation network are implemented based on neural networks.

Optionally, the above probability distribution model may be: GSM, an asymmetric Gaussian model, GMM or a Laplace distribution model (Laplace distribution). Wherein, the probability estimation network can be implemented based on a deep learning network, such as RNN and PixelCNN, etc., which is not limited here.
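As a rough illustration of how a probability distribution can be obtained from a probability distribution model and its parameters, the sketch below turns Gaussian or Laplace parameters into probabilities for integer-valued coefficients. Integrating the density over unit-width bins is an assumption commonly made for quantized coefficients and is not stated in the text; in the described scheme the parameters would come from a probability estimation network, whereas here they are fixed placeholder values.

```python
import math

def gaussian_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def laplace_cdf(x, location, scale):
    """Cumulative distribution function of a Laplace distribution."""
    z = (x - location) / scale
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def discretized_pmf(cdf, symbols, *params):
    """Probability of each integer symbol s, taken as CDF(s + 0.5) - CDF(s - 0.5)."""
    return {s: cdf(s + 0.5, *params) - cdf(s - 0.5, *params) for s in symbols}

symbols = range(-20, 21)
gauss_pmf = discretized_pmf(gaussian_cdf, symbols, 0.0, 2.0)   # mean 0, standard deviation 2
laplace_pmf = discretized_pmf(laplace_cdf, symbols, 0.0, 1.5)  # location 0, scale 1.5
```

Either the resulting table or just the parameters could serve as the probability estimation result, matching the two alternatives given in the text.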

As an example, a typical PixelCNN-based probability estimation network (which may serve as the fifth, sixth, seventh or eighth probability estimation network) is shown in Fig. 6d. "h×w" indicates that the current convolutional layer uses a convolution kernel of size h×w; "ResB" denotes a residual module, whose structure is shown in Fig. 6e; and "*/relu" indicates that a ReLU activation function is applied after the current layer.

In an example, after obtaining the first probability estimation result, the probability estimation unit 702 preprocesses the first probability estimation result to obtain a processed probability estimation result. Specifically, if the first probability estimation result includes the mean value and variance of a Gaussian distribution, the variance of the Gaussian distribution is processed to obtain a processed variance, and the mean value of the Gaussian distribution and the processed variance are used as the processed probability estimation result; or,

The mean value of the Gaussian distribution is processed to obtain the processed mean value, and the variance of the Gaussian distribution and the processed mean value are used as the probability estimation result after processing.

Process the variance of the Gaussian distribution according to the scaling factor of the first coefficient to obtain the processed variance;

Wherein, the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; or,

the scaling factor of the first coefficient and the scaling factor of the second coefficient are different; or,

if the first coefficient and the second coefficient belong to the same frequency band or transform block among the multiple frequency bands or transform blocks obtained by performing DCT on the image to be encoded, the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; or, if the first coefficient and the second coefficient belong to different frequency bands or transform blocks, the scaling factor of the first coefficient is different from the scaling factor of the second coefficient; or, the scaling factor of the first coefficient is determined according to the texture complexity of the frequency band or transform block to which the first coefficient belongs.

In an example, the probability estimation result is implemented based on a Gaussian distribution model, and the probability estimation result includes a Gaussian distribution or a mean and/or a variance of the Gaussian distribution.

In one example, when the probability estimation result of the first coefficient includes the location parameter and the scale parameter of a Laplace distribution, the scale parameter of the Laplace distribution is processed according to the scaling factor of the first coefficient, and the processed probability estimation result of the first coefficient includes the processed scale parameter and the location parameter of the Laplace distribution.

In one example, when the probability estimation result of the first coefficient includes the location parameter and the scale parameter of a Laplace distribution, the location parameter of the Laplace distribution is processed according to the scaling factor of the first coefficient, and the processed probability estimation result of the first coefficient includes the processed location parameter and the scale parameter of the Laplace distribution.
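The preprocessing described above can be sketched as follows. The text states only that the variance or scale parameter is processed "according to the scaling factor"; applying the factor by multiplication, the function names and the per-band factor table are all assumptions made for illustration.

```python
def scale_gaussian(mean, variance, factor):
    """Return (mean, processed variance); the mean is left unchanged.

    Multiplying by the factor is an assumed form of the processing.
    """
    return mean, variance * factor

def scale_laplace(location, scale, factor):
    """Return (location, processed scale parameter); the location is left unchanged."""
    return location, scale * factor

# Hypothetical per-band scaling factors: coefficients in the same frequency band
# share one factor, and different bands may use different factors.
band_factors = {"low": 0.5, "high": 2.0}

processed_gaussian = scale_gaussian(0.0, 4.0, band_factors["low"])
processed_laplace = scale_laplace(1.0, 1.5, band_factors["high"])
```

A factor below 1 narrows the distribution, so samples drawn from it stay closer to the mean; a factor above 1 widens it.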

In an example, after obtaining the probability estimation result of the first preset area, the probability estimation unit 702 preprocesses the probability estimation result of the first preset area to obtain a processed probability estimation result. Specifically, if the probability estimation result of the first preset area includes the mean value and variance of a Gaussian distribution, the variance of the Gaussian distribution is processed to obtain a processed variance, and the mean value of the Gaussian distribution and the processed variance are used as the processed probability estimation result of the first preset area.

In an example, if the probability estimation result is obtained based on the Laplace distribution, the probability estimation result includes the Laplace distribution, or the scale parameter and/or the location parameter of the Laplace distribution.

Entropy coding unit 703

The entropy encoding unit 703 writes the first coefficient, the second coefficient, the first probability estimation result and the second probability estimation result into the compressed code stream.

In one example, in video compression, the first probability estimation result and the second probability estimation result may be stored in a sequence header, image header, slice or SEI and transmitted to the decoder 30.

In an example, after the probability estimation result of the first preset area is obtained, the first flag enable_flag of the first preset area is set to a first value (for example, 1 or true) to indicate that the decoding end uses the same probability distribution, namely the probability estimation result of the first preset area, when obtaining the estimated coefficients in the first preset area by sampling. The probability estimation result of the first preset area is saved in a probability estimation result set, and the index of that result in the probability estimation result set and the size information of the first preset area are recorded. The entropy encoding unit 703 writes all the coefficients in the first preset area, the probability estimation result set, and the enable_flag, index and size information of the first preset area into the compressed code stream.

In one example, the set of probability estimation results may be transmitted to decoder 30 via APS.

In an example, after the probability estimation result of the first preset area is obtained, the enable_flag of the first preset area is set to the first value (for example, 1 or true) to indicate that the decoding end uses the same probability distribution, namely the probability estimation result of the first preset area, when obtaining the estimated coefficients in the first preset area by sampling. The entropy coding unit 703 writes all the coefficients in the first preset area, the probability estimation result of the first preset area, the enable_flag and the size information of the first preset area into the compressed code stream.

In one example, if each coefficient in the first preset area uses its own probability estimation result during sampling, the enable_flag of the first preset area is set to a second value (for example, 0 or false), and the entropy coding unit 703 writes all the coefficients in the first preset area, the respective probability estimation results of all the coefficients in the first preset area, and the enable_flag of the first preset area into the compressed code stream. Optionally, the entropy coding unit 703 also writes the size information of the first preset area into the compressed code stream.
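The three signalling variants above can be summarized in a small sketch that collects the syntax elements of one preset area before entropy coding. The dictionary layout, the key names other than enable_flag, and the function name are hypothetical; the text does not define a concrete syntax.

```python
def build_region_payload(coeffs, estimate, estimate_set=None):
    """Collect the syntax elements of one preset area prior to entropy coding.

    `estimate` is either one (mean, variance) tuple shared by the whole area,
    or a 2-D list holding one tuple per coefficient.
    """
    height, width = len(coeffs), len(coeffs[0])
    payload = {"coefficients": coeffs, "size": (height, width)}
    if isinstance(estimate, tuple):
        payload["enable_flag"] = 1  # first value: one estimate for the whole area
        if estimate_set is None:
            payload["estimate"] = estimate  # estimate carried directly
        else:
            if estimate not in estimate_set:
                estimate_set.append(estimate)
            payload["index"] = estimate_set.index(estimate)  # position in the shared set
    else:
        payload["enable_flag"] = 0  # second value: one estimate per coefficient
        payload["estimates"] = estimate
    return payload

estimate_set = [(0.0, 1.0)]
payload = build_region_payload([[3, -1], [0, 2]], (0.5, 4.0), estimate_set)
```

In the indexed variant, the estimate set itself would be written once (the text mentions the APS as one carrier) and each area would carry only its index.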

It should be pointed out that when the entropy coding unit 703 is said to write the above data into the compressed code stream, this specifically means performing entropy coding on the data to obtain the compressed code stream. Optionally, entropy coding methods such as Huffman coding or CABAC, or the entropy coding methods used in H.264/H.265/H.266, may be used.

Entropy decoding unit 704

In an example, the entropy decoding unit 704 also decodes the compressed code stream to obtain the second probability estimation result.

In one example, the entropy decoding unit 704 also decodes the first flag from the compressed code stream. If the first flag is the first value, the same probability estimation result (that is, the probability estimation result of the first preset area) is used when all the estimated coefficients in the first preset area are obtained by sampling, the first preset area being an area in the enhanced image. The entropy decoding unit 704 also decodes the probability estimation result set and the index of the first preset area from the compressed code stream, the probability estimation result set including the probability estimation results of multiple preset areas, and obtains the probability estimation result of the first preset area from the probability estimation result set according to the index of the first preset area;

If the first flag is the second value, each estimated coefficient in the first preset area is sampled using its own probability estimation result. The entropy decoding unit 704 decodes the size information H1*W1 of the first preset area from the code stream, which indicates that the entropy decoding unit 704 is to decode H1*W1 probability estimation results from the compressed code stream; the sampling unit 705 obtains all the estimated coefficients in the first preset area by sampling from the H1*W1 probability estimation results. Both H1 and W1 are integers greater than 1.

In one example, the entropy decoding unit 704 also decodes the first flag from the compressed code stream, the first flag indicating that the same probability estimation result (that is, the probability estimation result of the first preset area) is used when all the estimated coefficients in the first preset area are obtained by sampling, the first preset area being an area in the enhanced image. The entropy decoding unit 704 also decodes the probability estimation result of the first preset area and H1*W1 from the code stream, and the sampling unit 705 samples the probability estimation result of the first preset area H1*W1 times to obtain H1*W1 estimated coefficients; that is, the first preset area includes H1*W1 estimated coefficients.
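The decoder-side behaviour for the two flag values can be sketched as below. The dictionary keys, the function name `sample_region` and the choice of a Gaussian model with (mean, variance) parameters are assumptions for illustration; the entropy decoding itself is not modelled, and the decoded syntax elements are passed in directly.

```python
import numpy as np

def sample_region(payload, estimate_set=None, seed=0):
    """Draw the H1 x W1 estimated coefficients of one preset area.

    `payload` is a hypothetical container for the decoded syntax elements.
    """
    rng = np.random.default_rng(seed)
    height, width = payload["size"]
    if payload["enable_flag"] == 1:
        # One shared estimate, either looked up by index or carried directly.
        if "index" in payload:
            mean, variance = estimate_set[payload["index"]]
        else:
            mean, variance = payload["estimate"]
        return rng.normal(mean, np.sqrt(variance), size=(height, width))
    # One estimate per coefficient: H1 x W1 (mean, variance) pairs.
    means = np.array([[m for m, _ in row] for row in payload["estimates"]])
    stds = np.sqrt(np.array([[v for _, v in row] for row in payload["estimates"]]))
    return rng.normal(means, stds)

shared = {"enable_flag": 1, "size": (2, 3), "index": 0}
estimated = sample_region(shared, estimate_set=[(0.0, 4.0)])
```

Because the draw is random, different seeds give different estimated coefficients for the same decoded parameters.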

The entropy decoding unit 704 is further configured to decode the compressed code stream to obtain multiple reconstruction coefficients.

It should be noted that the decoding method used by the entropy decoding unit 704 to decode the compressed code stream corresponds to the entropy coding method used by the entropy coding unit 703.

Sampling unit 705

For the specific process, reference may be made to the relevant description of the above-mentioned sampling unit 505, which will not be described here again.

The first reconstruction unit 706

The first reconstruction unit 706 obtains a first reconstructed image according to a plurality of estimated coefficients.

Specifically, if the multiple estimated coefficients are multiple pixel values, the first reconstructed image can be obtained based on the multiple pixel values.

If the multiple estimated coefficients are multiple quantized wavelet coefficients, the first reconstruction unit 706 performs inverse quantization and inverse wavelet transform on the multiple estimated coefficients to obtain the first reconstructed image; or,

if the multiple estimated coefficients are multiple wavelet coefficients, the first reconstruction unit 706 performs inverse wavelet transform on the multiple estimated coefficients to obtain the first reconstructed image; or,

if the multiple estimated coefficients are multiple quantized DCT coefficients, the first reconstruction unit 706 performs inverse quantization and inverse DCT on the multiple estimated coefficients to obtain the first reconstructed image; or,

if the multiple estimated coefficients are multiple DCT coefficients, the first reconstruction unit 706 performs inverse DCT on the multiple estimated coefficients to obtain the first reconstructed image; or,

if the multiple estimated coefficients are multiple feature coefficients, the first reconstruction unit 706 processes the feature map composed of the multiple feature coefficients to obtain the first reconstructed image; or,

if the multiple estimated coefficients are multiple quantized feature coefficients, the first reconstruction unit 706 dequantizes the multiple estimated coefficients to obtain multiple feature coefficients, and processes the feature map composed of the multiple feature coefficients to obtain the first reconstructed image.
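For the quantized-DCT case above, the reconstruction path (inverse quantization followed by inverse DCT) can be sketched with NumPy alone. The uniform quantization step, the square block size and the function names are assumptions; the text does not specify the quantizer or the block layout.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix; its transpose performs the inverse transform."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # sample index
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)
    return basis

def reconstruct_block(quantized, step):
    """Inverse quantization followed by a 2-D inverse DCT on a square block."""
    coeffs = quantized.astype(np.float64) * step  # uniform dequantization (assumed)
    d = dct_matrix(quantized.shape[0])
    return d.T @ coeffs @ d

# Round trip on a placeholder 4x4 block: forward DCT, quantize, then reconstruct.
block = np.arange(16, dtype=np.float64).reshape(4, 4)
d = dct_matrix(4)
quantized = np.round((d @ block @ d.T) / 0.5)
reconstructed = reconstruct_block(quantized, 0.5)
```

In the described scheme the input to the reconstruction would be the estimated coefficients obtained by sampling rather than coefficients computed from a known block; the round trip here only demonstrates that the two inverse steps are consistent.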

In an example, a plurality of estimated coefficients may be input into the second reconstruction unit for processing to obtain a reconstructed image, and the reconstructed image may be used as a reference image for subsequent image prediction.

Second reconstruction unit 707

The second reconstruction unit 707 obtains a second reconstructed image according to the plurality of reconstruction coefficients.

Specifically, if the multiple reconstruction coefficients are multiple pixel values, the second reconstructed image can be obtained based on the multiple pixel values.

If the multiple reconstruction coefficients are multiple quantized wavelet coefficients, the second reconstruction unit 707 performs inverse quantization and inverse wavelet transform on the multiple reconstruction coefficients to obtain the second reconstructed image; or,

if the multiple reconstruction coefficients are multiple wavelet coefficients, the second reconstruction unit 707 performs inverse wavelet transform on the multiple reconstruction coefficients to obtain the second reconstructed image; or,

if the multiple reconstruction coefficients are multiple quantized DCT coefficients, the second reconstruction unit 707 performs inverse quantization and inverse DCT on the multiple reconstruction coefficients to obtain the second reconstructed image; or,

if the multiple reconstruction coefficients are multiple DCT coefficients, the second reconstruction unit 707 performs inverse DCT on the multiple reconstruction coefficients to obtain the second reconstructed image; or,

if the multiple reconstruction coefficients are multiple feature coefficients, the second reconstruction unit 707 processes the feature map composed of the multiple feature coefficients to obtain the second reconstructed image; or,

if the multiple reconstruction coefficients are multiple quantized feature coefficients, the second reconstruction unit 707 dequantizes the multiple reconstruction coefficients to obtain multiple feature coefficients, and processes the feature map composed of the multiple feature coefficients to obtain the second reconstructed image.

Optionally, the implementation manner of the second reconstruction unit 707 may be the same as that of the first reconstruction unit 706, or may be different, which is not limited here.

In an example, after obtaining a feature map composed of multiple feature elements, the feature map may be passed through a neural network to output the above-mentioned first reconstructed image or the second reconstructed image. The neural network can adopt any structure, such as a fully connected network, a convolutional neural network, a recurrent neural network, and the like. The neural network can adopt a deep neural network structure with a multi-layer structure to obtain the first reconstructed image or the second reconstructed image with better quality.

In one example, the multiple estimated coefficients obtained by the sampling unit 705 may be input to the second reconstruction unit 707 together with the multiple reconstruction coefficients. Specifically, when the multiple estimated coefficients and the multiple reconstruction coefficients are feature coefficients, the second reconstruction unit 707 processes the multiple estimated coefficients in the manner of the first reconstruction unit 706 to obtain a first feature map, obtains a second feature map based on the multiple reconstruction coefficients, and then processes the first feature map and the second feature map through a neural network to obtain the second reconstructed image.

It can be seen that, owing to the randomness of the sampling process, the sampling step can be repeated in the present application to obtain multiple first reconstructed images. From the multiple first reconstructed images, the reconstructed image with the best subjective quality or the reconstructed image with the best objective quality may be selected. The first reconstructed image can be used inside the encoding and decoding loop as a reference for intra-frame or inter-frame prediction; it can also be used outside the loop to optimize image quality by way of post-processing. For example, after multiple first reconstructed images are obtained through the sampling step and the reconstruction step, the reconstructed image with the best subjective quality is put into the decoded picture buffer (DPB) or the reference frame set and used in the codec loop as the reference image for intra-frame or inter-frame prediction; the reconstructed image with the best objective quality is used for post-processing, in which the subjective quality of the decoded reconstructed image is adjusted to improve the image or video quality after compression and reconstruction. Optionally, the second reconstructed image, which is obtained according to the reconstruction coefficients, can be used as a reference frame when predicting the next frame in video compression.
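Repeating the sampling and reconstruction steps and keeping the best candidate can be sketched as follows. The text does not say how subjective or objective quality is measured, so the scoring function is left as a caller-supplied callback; the `smoothness` score, the identity reconstruction and all function names are hypothetical placeholders.

```python
import numpy as np

def best_of_n(sample_fn, reconstruct_fn, score_fn, n, seed=0):
    """Repeat sampling and reconstruction n times and keep the highest-scoring image."""
    rng = np.random.default_rng(seed)
    candidates = [reconstruct_fn(sample_fn(rng)) for _ in range(n)]
    scores = [float(score_fn(image)) for image in candidates]
    return candidates[int(np.argmax(scores))], scores

def smoothness(image):
    """Hypothetical quality proxy: negative total variation (higher means smoother)."""
    vertical = np.abs(np.diff(image, axis=0)).sum()
    horizontal = np.abs(np.diff(image, axis=1)).sum()
    return -(vertical + horizontal)

best_image, scores = best_of_n(
    sample_fn=lambda rng: rng.normal(0.0, 1.0, size=(4, 4)),  # stand-in for the sampling step
    reconstruct_fn=lambda coeffs: coeffs,                     # identity stand-in for reconstruction
    score_fn=smoothness,
    n=5,
)
```

Two different scoring callbacks could be used on the same set of candidates, one to choose the image placed in the DPB and another to choose the image used for post-processing.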

In one example, during entropy encoding, the entropy encoding unit 703 first performs probability estimation on the first coefficient to obtain a probability estimation result of the first coefficient, called probability estimation result C, and then performs entropy encoding on the first coefficient according to probability estimation result C. During entropy decoding, the entropy decoding unit 704 first performs probability estimation on the first coefficient to obtain a probability estimation result of the first coefficient, which may also be called probability estimation result C, and then performs entropy decoding according to probability estimation result C. The probability estimation result described in the above embodiments is called probability estimation result D.

Optionally, the encoding end performs entropy encoding on the first coefficient according to probability estimation result C; the decoding end performs probability estimation on the first coefficient in the same manner as the encoding end to obtain a probability estimation result (which may also be called probability estimation result C), performs entropy decoding according to probability estimation result C, and may also perform sampling according to probability estimation result C, the sampling method being consistent with the above embodiments.

Optionally, the encoding end performs entropy encoding on the first coefficient according to probability estimation result C and transmits probability estimation result C to the decoding end; the decoding end performs entropy decoding according to probability estimation result C, and may also perform sampling according to probability estimation result C, the sampling method being consistent with the above embodiments.

Optionally, the encoding end performs entropy encoding on the first coefficient according to probability estimation result D and sends probability estimation result D to the decoding end; the decoding end performs entropy decoding according to probability estimation result D, and may also perform sampling according to probability estimation result D, the sampling method being consistent with the above embodiments.

Optionally, the encoding end performs entropy encoding on the first coefficient according to probability estimation result D; the decoding end performs probability estimation on the first coefficient to obtain probability estimation result D, performs entropy decoding according to probability estimation result D, and may also perform sampling according to probability estimation result D, the sampling method being consistent with the above embodiments.

FIG. 8 is a flowchart showing a process 800 of an encoding method based on an embodiment of the present application. Process 800 may be performed by video encoder 20. The process 800 is described as a series of steps or operations. It should be understood that the process 800 may be performed in various orders and/or concurrently, and is not limited to the order of execution shown in FIG. 8.

As shown in Figure 8, the encoding method includes:

S801. Acquire a first image, where the first image is an image to be encoded or an image that has been decoded.

S802. Perform probability estimation according to the first context information to obtain a first probability estimation result; the first context information is obtained from the first image.

In a possible design, the method of this embodiment also includes:

Probability estimation is performed according to the first context information and the second context information to obtain the first probability estimation result; the second context information is obtained from the second image.

performing probability estimation according to the context information of the first data to obtain a probability estimation result of the first data;

Perform probability estimation according to the context information of the second data to obtain the probability estimation result of the second data; wherein, the first data and the second data are obtained according to the first image; the first context information includes the context information of the first data and the second Contextual information about the data.

S803. Write the first probability estimation result into the compressed code stream.

In a possible design, the encoding method further includes: setting the first flag of the first preset area to the first value, to indicate that the probability estimation result of the first preset area is used when the estimated coefficients in the first preset area are obtained by sampling; saving the probability estimation result of the first preset area in the probability estimation result set, and recording the index of that result in the probability estimation result set. Writing the probability estimation result into the compressed code stream includes: writing the probability estimation result set, the index, the size information of the first preset area and the first flag into the compressed code stream.

In a possible design, this encoding method also includes:

Preprocessing the variance of the Gaussian distribution according to the scaling factor of the first data to obtain the processed variance, wherein the processed probability estimation result includes the mean value of the Gaussian distribution and the processed variance;

or,

If the first data and the second data belong to the same frequency band or transform block among the multiple frequency bands or transform blocks obtained by performing DCT on the first image, the scaling factor of the first data is the same as the scaling factor of the second data; or, if the first data and the second data belong to different frequency bands or transform blocks, the scaling factor of the first data is different from the scaling factor of the second data; or, the scaling factor of the first data is determined according to the texture complexity of the frequency band or transform block to which the first data belongs;


It should be noted here that, for the specific implementation process of the embodiment shown in FIG. 8, reference may be made to the related descriptions of the coding unit 501, the forward transformation unit 502 and the probability estimation unit 503 in FIG. 5, which will not be described here again.

FIG. 9 is a flowchart of a process 900 of an encoding method according to an embodiment of the present application. Process 900 may be performed by video encoder 20. The process 900 is described as a series of steps or operations. It should be understood that the process 900 may be performed in various orders and/or concurrently, and is not limited to the order of execution shown in FIG. 9.

As shown in Figure 9, the encoding method includes:

S901. Obtain a plurality of coefficients according to an image to be encoded, where the plurality of coefficients include a first coefficient.

S902. Obtain a first probability estimation result according to the context information of the first coefficient.

S903. Write the first coefficient and the first probability estimation result into the compressed code stream.
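Steps S901 to S903 can be illustrated with the following toy sketch. It makes several assumptions that are not taken from the application: the coefficients are simply the pixel values, the context of a coefficient is the list of coefficients preceding it, the probability estimation is a sample mean and variance standing in for a trained probability estimation network, and the "compressed code stream" is a fixed binary layout without entropy coding. The function names are hypothetical.

```python
import struct
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def estimate_probability(context: Sequence[float]) -> Tuple[float, float]:
    """Stand-in estimator: Gaussian (mean, variance) computed from the context."""
    if not context:
        return 0.0, 1.0  # arbitrary prior used when no context is available
    return fmean(context), pvariance(context)


def encode(pixels: Sequence[int], first_index: int) -> bytes:
    # S901: obtain a plurality of coefficients from the image to be encoded.
    coefficients: List[int] = list(pixels)
    first_coefficient = coefficients[first_index]

    # S902: obtain the first probability estimation result from the context
    # information of the first coefficient (here: the preceding coefficients).
    mean, variance = estimate_probability(coefficients[:first_index])

    # S903: write the first coefficient and the first probability estimation
    # result into the code stream (int16 followed by two float32 values).
    return struct.pack("<hff", first_coefficient, mean, variance)


stream = encode([10, 12, 14, 13], first_index=3)
coefficient, mean, variance = struct.unpack("<hff", stream)
```

A practical encoder would entropy-code the coefficient and would signal the probability estimation result more compactly; the fixed layout is used here only to keep the example self-contained.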

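In a possible design, the encoding method further includes: preprocessing the probability estimation result of the first coefficient to obtain a processed probability estimation result.

In a possible design, the probability estimation result of the first coefficient includes the mean and variance of a Gaussian distribution, and preprocessing the probability estimation result of the first coefficient to obtain the processed probability estimation result includes: setting the variance of the Gaussian distribution to 0 as the processed variance; or preprocessing the variance of the Gaussian distribution according to the scaling factor of the first coefficient to obtain the processed variance. In either case, the processed probability estimation result includes the mean of the Gaussian distribution and the processed variance.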

In a possible design, this encoding method also includes:

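In a possible design, if the multiple coefficients are multiple pixel values in the image to be encoded, the first context information includes some or all of the pixel values in the image to be encoded; or, if the multiple coefficients are obtained by performing wavelet transformation or DCT on the image to be encoded, optionally followed by quantization, the first context information includes some or all of the resulting wavelet coefficients or DCT coefficients, or of their quantized versions.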

It should be noted here that, for the specific implementation process of the embodiment shown in FIG. 9, reference may be made to the related descriptions of the coefficient acquisition unit 701, the probability estimation unit 702 and the entropy encoding unit 703 in FIG. 7, which will not be repeated here.

FIG. 10 is a flowchart of a process 1000 of a decoding method according to an embodiment of the present application. Process 1000 may be performed by video decoder 30. The process 1000 is described as a series of steps or operations. It should be understood that the process 1000 may be performed in various orders and/or concurrently, and is not limited to the order of execution shown in FIG. 10.

As shown in Figure 10, the decoding method includes:

S1001. Decode a compressed code stream to obtain a first probability estimation result.

S1002. Perform sampling according to the first probability estimation result to obtain a first estimation coefficient.

S1003. Obtain a first reconstructed image according to the first estimation coefficient.
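Steps S1001 to S1003 can be sketched as follows, assuming that each probability estimation result is a Gaussian (mean, variance) pair stored as two float32 values, that the estimated coefficients are pixel values, and that reconstruction consists of rounding and clipping to the 8-bit range. The layout and the function names are hypothetical; in the application the reconstruction may instead involve an inverse transformation.

```python
import math
import random
import struct
from typing import List


def decode_and_sample(stream: bytes, rng: random.Random) -> List[float]:
    """Parse (mean, variance) pairs and draw one sample per pair."""
    coefficients = []
    for offset in range(0, len(stream), 8):
        # S1001: obtain a probability estimation result from the code stream.
        mean, variance = struct.unpack_from("<ff", stream, offset)
        # S1002: sample an estimated coefficient from the Gaussian distribution.
        coefficients.append(rng.gauss(mean, math.sqrt(variance)))
    return coefficients


def reconstruct(coefficients: List[float], width: int) -> List[List[int]]:
    """S1003: arrange the sampled coefficients into rows of 8-bit pixel values."""
    clipped = [min(255, max(0, round(c))) for c in coefficients]
    return [clipped[i:i + width] for i in range(0, len(clipped), width)]


# Toy code stream holding four results; a variance of 0 makes sampling
# return the mean itself.
results = [(10.0, 0.0), (200.0, 0.0), (50.0, 4.0), (90.0, 1.0)]
stream = b"".join(struct.pack("<ff", m, v) for m, v in results)
image = reconstruct(decode_and_sample(stream, random.Random(0)), width=2)
```

The first row of `image` is deterministic because its variances are 0, while the second row depends on the random number generator.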

In a possible design, the decoding method also includes:

Decoding a first identifier from the compressed code stream; if the value of the first identifier is the first value, decoding the compressed code stream to obtain the first probability estimation result includes: decoding, from the compressed code stream, the probability estimation result of a preset area and the size information of the preset area, where the preset area includes the first estimation coefficient, the preset area is an area in the first reconstructed image, and the probability estimation result of the preset area is the first probability estimation result. The first identifier having the first value indicates that the probability estimation result of the preset area is used when all coefficients to be estimated in the preset area are obtained by sampling.
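The role of the first identifier on the decoding side can be illustrated with the short sketch below. The numeric values of the first and second values, the `(height, width)` form of the size information and the function name `results_for_area` are assumptions for illustration only. The second value corresponds to the case in which each coefficient in the preset area uses its own probability estimation result.

```python
from typing import List, Optional, Sequence, Tuple

GaussianParams = Tuple[float, float]  # (mean, variance); assumed representation

FIRST_VALUE = 1   # hypothetical: one shared result for the whole preset area
SECOND_VALUE = 2  # hypothetical: a separate result for each coefficient


def results_for_area(
    identifier: int,
    area_size: Tuple[int, int],
    shared_result: Optional[GaussianParams] = None,
    per_coefficient_results: Optional[Sequence[GaussianParams]] = None,
) -> List[GaussianParams]:
    """Return the probability estimation result to use for each coefficient."""
    height, width = area_size
    count = height * width
    if identifier == FIRST_VALUE:
        if shared_result is None:
            raise ValueError("a shared result is required for the first value")
        return [shared_result] * count
    if identifier == SECOND_VALUE:
        if per_coefficient_results is None or len(per_coefficient_results) != count:
            raise ValueError("one result per coefficient is required")
        return list(per_coefficient_results)
    raise ValueError("unknown identifier value")


shared = results_for_area(FIRST_VALUE, (2, 2), shared_result=(0.0, 1.0))
separate = results_for_area(
    SECOND_VALUE, (1, 2), per_coefficient_results=[(1.0, 0.5), (2.0, 0.25)]
)
```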

In a possible design, the first probability estimation result includes the mean and variance of the Gaussian distribution, and the first estimation coefficient is obtained by sampling according to the first probability estimation result, including:

In a possible design, the decoding method also includes:

The scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient; or, the scaling factor of the first estimated coefficient is different from the scaling factor of the second estimated coefficient; or, the scaling factor of the first estimated coefficient is determined according to the texture complexity of the image block to which the first estimated coefficient belongs.

In a possible design, the decoding method also includes:

It should be noted here that, for the specific implementation process of the embodiment shown in FIG. 10, reference may be made to the related descriptions of the decoding unit 504, the sampling unit 505 and the inverse transformation unit 506 in the embodiment shown in FIG. 5, and of the entropy decoding unit 704, the sampling unit 705, the first reconstruction unit 706 and the second reconstruction unit 707 in the embodiment shown in FIG. 7, which will not be repeated here.

Those skilled in the art will appreciate that the functions described in conjunction with the various illustrative logical blocks, modules, and algorithm steps disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions described by the various illustrative logical blocks, modules, and steps may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., based on a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this application. A computer program product may include a computer-readable medium.

By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD) and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functionality described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.

The techniques of the present application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a group of ICs (e.g., a chipset). Various components, modules, or units are described in this application to emphasize functional aspects of apparatuses for performing the disclosed techniques, but they do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by interoperating hardware units (including one or more processors as described above).

The above is only an exemplary embodiment of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that a person skilled in the art can readily think of shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

An image processing method implemented by a coding device, comprising:

Acquiring a first image, where the first image is an image to be encoded or an image that has been decoded;

performing probability estimation according to the first context information to obtain a first probability estimation result, wherein the first context information is obtained from the first image;

Writing the first probability estimation result into the compressed code stream.
The method according to claim 1, further comprising:

Acquiring a second image, the second image is an image to be encoded or a decoded image, and the second image is different from the first image;

The performing probability estimation according to the first context information to obtain the first probability estimation result includes:

Performing probability estimation according to the first context information and second context information to obtain the first probability estimation result, wherein the second context information is obtained from the second image.
The method according to claim 1 or 2, wherein the probability estimation according to the first context information to obtain the first probability estimation result comprises:

performing probability estimation according to the context information of the first data to obtain a probability estimation result of the first data;

performing probability estimation according to the context information of the second data to obtain a probability estimation result of the second data;

Wherein, the first data and the second data are obtained according to the first image;

The first context information includes context information of the first data and context information of the second data.
The method according to claim 1 or 2, wherein the first probability estimation result comprises a probability estimation result of a first preset area, the first preset area includes first data and second data, the first preset area is located in the first image or in an image obtained by transforming the first image, and the performing probability estimation according to the first context information to obtain the first probability estimation result includes:

performing probability estimation according to the context information of the first data to obtain a probability estimation result of the first data;

performing probability estimation according to context information of the second data to obtain a probability estimation result of the second data, wherein the first context information includes context information of the first data and context information of the second data;

Selecting the probability estimation result of the first preset area according to the probability estimation result of the first data and the probability estimation result of the second data, wherein the first probability estimation result includes the probability estimation result of the first preset area.
The method according to claim 1 or 2, wherein the first probability estimation result includes a probability estimation result of a second preset area, the second preset area is located in the first image or in an image obtained by transforming the first image, the first context information includes the context information of the second preset area, and the performing probability estimation according to the first context information to obtain the first probability estimation result includes:

Probability estimation is performed according to the context information of the second preset area to obtain a probability estimation result of the second preset area, and the first probability estimation result includes the probability estimation result of the second preset area.
The method according to claim 4 or 5, characterized in that the method further comprises:

Setting the value of the first identifier of the first preset area to the first value, which is used to indicate that the probability estimation result of the first preset area is used when sampling the estimated coefficients in the first preset area;

saving the probability estimation result of the first preset area in a probability estimation result set, and recording the index of the probability estimation result of the first preset area in the probability estimation result set;

The writing the first probability estimation result into the compressed code stream includes:

Writing the probability estimation result set, the index, the size information of the first preset area and the first identifier into the compressed code stream.
The method according to claim 4 or 5, characterized in that the method further comprises:

Setting the value of the first identifier of the first preset area to the first value, which is used to indicate that the probability estimation result of the first preset area is used when sampling the estimated coefficients in the first preset area;

Preprocessing the probability estimation result of the first preset area according to the scaling factor of the first preset area to obtain a processed probability estimation result, saving the processed probability estimation result to a probability estimation result set, and recording the index of the processed probability estimation result in the probability estimation result set;

The writing the first probability estimation result into the compressed code stream includes:

Writing the probability estimation result set, the index, the size information of the first preset area and the first identifier into the compressed code stream.
The method according to claim 4 or 5, characterized in that the method further comprises:

Setting the value of the first identifier of the first preset area to the first value, which is used to indicate that the probability estimation result of the first preset area is used when sampling the estimated coefficients in the first preset area;

The writing the first probability estimation result into the compressed code stream includes:

Writing the probability estimation result of the first preset area, the size information of the first preset area, and the first identifier into the code stream.
The method according to any one of claims 3-4, 6 and 8, wherein the method further comprises:

The probability estimation result of the first data is preprocessed to obtain the probability estimation result after processing.
The method according to claim 9, wherein the probability estimation result of the first data includes the mean and variance of the Gaussian distribution, and the preprocessing of the probability estimation result of the first data to obtain the processed probability estimation result includes:

Setting the variance of the Gaussian distribution to 0 as the processed variance, wherein the processed probability estimation result includes the mean value of the Gaussian distribution and the processed variance.
The method according to claim 9, wherein the probability estimation result of the first data includes the mean and variance of the Gaussian distribution, and the preprocessing of the probability estimation result of the first data to obtain the processed probability estimation result includes: preprocessing the variance of the Gaussian distribution according to the scaling factor of the first data to obtain the processed variance, wherein the processed probability estimation result includes the mean value of the Gaussian distribution and the processed variance; the method also includes: preprocessing the variance of the second probability distribution according to the scaling factor of the second data, wherein:

The scaling factor of the first data is the same as the scaling factor of the second data; or,

a scaling factor of the first data and a scaling factor of the second data are different; or,

If the first data and the second data belong to the same image block in the first image, the scaling factor of the first data is the same as the scaling factor of the second data; or if the first data and the second data belong to different image blocks, the scaling factor of the first data is different from the scaling factor of the second data; or the scaling factor of the first data is determined according to the texture complexity of the image block to which the first data belongs; or,

If the first data and the second data belong to the same subband among the subbands obtained by performing wavelet transformation on the first image, the scaling factor of the first data and the scaling factor of the second data are the same; or if the first data and the second data belong to different subbands, the scaling factor of the first data is different from the scaling factor of the second data; or the scaling factor of the first data is determined according to the texture complexity of the subband to which the first data belongs;

or,

If the first data and the second data belong to the same frequency band among the frequency bands obtained by performing DCT on the first image, the scaling factor of the first data is the same as the scaling factor of the second data; or if the first data and the second data belong to different frequency bands, the scaling factor of the first data is different from the scaling factor of the second data; or the scaling factor of the first data is determined according to the texture complexity of the frequency band to which the first data belongs;

or,

If the first data and the second data belong to the same channel of the three-dimensional feature map obtained by performing feature extraction on the first image, the scaling factor of the first data is the same as the scaling factor of the second data; or if the first data and the second data belong to different channels, the scaling factor of the first data is different from the scaling factor of the second data; or the scaling factor of the first data is determined according to the texture complexity of the channel to which the first data belongs.
The method according to claim 5, wherein the method further comprises:

The probability estimation result of the second preset area is preprocessed to obtain a processed probability estimation result.
The method according to claim 12, wherein the probability estimation result of the first data includes the mean and variance of the Gaussian distribution, and the preprocessing of the probability estimation result of the first data to obtain the processed probability estimation result includes:

Setting the variance of the Gaussian distribution to 0 as the first variance, wherein the processed probability estimation result includes the mean value of the Gaussian distribution and the first variance, or,

The variance of the Gaussian distribution is processed according to the scaling factor of the second preset area to obtain a second variance, wherein the processed probability estimation result includes the mean value of the Gaussian distribution and the second variance, and the scaling factor of the first preset area is the same as or different from the scaling factor of the second preset area.
The method according to any one of claims 1-13, wherein the first context information includes some or all pixel values in the first image.
The method according to any one of claims 1-13, wherein the method further comprises:

transforming the first image to obtain a first transformed image;

Wherein, if the transformation is wavelet transformation, the first context information includes some or all coefficients in the first transformed image, and the coefficients are wavelet coefficients or quantized wavelet coefficients; or,

If the transform is a discrete cosine transform DCT, the first context information includes some or all coefficients in the first transformed image, and the coefficients are DCT coefficients or quantized DCT coefficients; or,

If the transformation is a feature transformation, the first context information includes part or all of the coefficients in the first transformed image, and the coefficients are feature coefficients or quantized feature coefficients.
The method according to any one of claims 1-15, wherein the performing probability estimation according to the first context information to obtain the first probability estimation result comprises:

Inputting the first context information into a first probability estimation network for processing to obtain parameters of the first probability distribution model; the first probability estimation result includes parameters of the first probability distribution model;

or,

inputting the first context information into a second probability estimation network for processing to obtain a target probability distribution, the first probability estimation result including parameters of the target probability distribution;

Wherein, the first probability estimation network and the second probability estimation network are realized by a neural network.
An encoding method implemented by an encoding device, characterized in that it comprises:

obtaining a plurality of coefficients according to the image to be encoded, the plurality of coefficients including a first coefficient;

Obtaining a first probability estimation result according to the context information of the first coefficient;

Writing the first coefficient and the first probability estimation result into a compressed code stream.
The method according to claim 17, wherein the plurality of coefficients further comprises a second coefficient, and the method further comprises:

Obtaining a second probability estimation result according to the context information of the second coefficient;

The writing the first coefficient and the first probability estimation result into the compressed code stream includes:

writing the first coefficient, the first probability estimation result, the second coefficient and the second probability estimation result into the compressed code stream.
The method according to claim 17, wherein the plurality of coefficients further include a second coefficient, the first coefficient and the second coefficient belong to the same preset area, and the preset area is located in the image to be encoded or in an image obtained by transforming the image to be encoded, and the obtaining the first probability estimation result according to the context information of the first coefficient includes:

Performing probability estimation according to the context information of the first coefficient to obtain a third probability estimation result; performing probability estimation according to the context information of the second coefficient to obtain a second probability estimation result; and determining the first probability estimation result from the third probability estimation result and the second probability estimation result;

The writing the first coefficient and the first probability estimation result into the compressed code stream includes:

Writing the first coefficient, the second coefficient and the first probability estimation result into the compressed code stream.
The method according to claim 17, wherein the plurality of coefficients further include a second coefficient, the first coefficient and the second coefficient belong to the same preset area, and the preset area is located in the image to be encoded or in an image obtained by transforming the image to be encoded, and the obtaining the first probability estimation result according to the context information of the first coefficient includes:

performing probability estimation according to context information of the preset area to obtain a first probability estimation result; the context information of the preset area includes context information of the first coefficient;

The writing the first coefficient and the first probability estimation result into the compressed code stream includes:

Writing the first coefficient, the second coefficient and the first probability estimation result into a compressed code stream.
The method according to claim 19 or 20, wherein the method further comprises:

Setting the value of the first identifier of the preset area as the first value, which is used to indicate that the first probability estimation result is used when sampling the estimated coefficients in the preset area;

saving the first probability estimation result into a probability estimation result set, and recording the index of the first probability estimation result in the probability estimation result set;

The writing the first coefficient, the second coefficient and the first probability estimation result into the compressed code stream includes:

Writing the first coefficient, the second coefficient, the probability estimation result set, the index, the size information of the preset area and the first identifier into the compressed code stream.
The method according to claim 19 or 20, wherein the method further comprises:

Setting the value of the first identifier of the preset area as the first value, which is used to indicate that the first probability estimation result is used when sampling the estimated coefficients in the preset area;

The writing the first coefficient, the second coefficient and the first probability estimation result into the compressed code stream includes:

Writing the first coefficient, the second coefficient, the first probability estimation result, the size information of the preset area and the first identifier into the compressed code stream.
The method according to claim 18, wherein the first coefficient and the second coefficient belong to the same preset area, and the method further comprises:

Setting the value of the first identifier of the preset area to a second value, which is used to indicate that the respective probability estimation results are used when sampling the estimated coefficients in the preset area;

The writing the first coefficient, the first probability estimation result, the second coefficient and the second probability estimation result into the compressed code stream includes:

Writing the first coefficient, the first probability estimation result, the second coefficient and the second probability estimation result, and the first identifier of the preset area into the compressed code stream.
The method according to any one of claims 17-19 and 21-23, wherein the method further comprises:

The probability estimation result of the first coefficient is preprocessed to obtain a processed probability estimation result.
The method according to claim 24, wherein the probability estimation result of the first coefficient includes the mean and variance of the Gaussian distribution, and the preprocessing of the probability estimation result of the first coefficient to obtain the processed probability estimation result includes:

Setting the variance of the Gaussian distribution to 0 as the processed variance, wherein the processed probability estimation result includes the mean value of the Gaussian distribution and the processed variance.
The method according to claim 24, wherein the probability estimation result of the first coefficient includes the mean and variance of the Gaussian distribution, and the preprocessing of the probability estimation result of the first coefficient to obtain the processed probability estimation result includes:

Preprocessing the variance of the Gaussian distribution according to the scaling factor of the first coefficient to obtain a processed variance, wherein the processed probability estimation result includes a mean value of the Gaussian distribution and a processed variance;

The method further includes: preprocessing the variance of the second probability distribution according to the scaling factor of the second coefficient, wherein:

the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; or,

a scaling factor of the first coefficient and a scaling factor of the second coefficient are different; or,

If the first coefficient and the second coefficient belong to the same image block in the image to be encoded, the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; or if the first coefficient and the second coefficient belong to different image blocks, the scaling factor of the first coefficient is different from the scaling factor of the second coefficient; or the scaling factor of the first coefficient is determined according to the texture complexity of the image block to which the first coefficient belongs; or,

If the first coefficient and the second coefficient belong to the same subband among the subbands obtained by performing wavelet transformation on the image to be encoded, the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same; or if the first coefficient and the second coefficient belong to different subbands, the scaling factor of the first coefficient and the scaling factor of the second coefficient are different; or the scaling factor of the first coefficient is determined according to the texture complexity of the subband to which the first coefficient belongs;

or,

If the first coefficient and the second coefficient belong to the same frequency band among the frequency bands obtained by performing DCT on the image to be encoded, the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; or if the first coefficient and the second coefficient belong to different frequency bands, the scaling factor of the first coefficient is different from the scaling factor of the second coefficient; or the scaling factor of the first coefficient is determined according to the texture complexity of the frequency band to which the first coefficient belongs;

or,

If the first coefficient and the second coefficient belong to the same channel of the three-dimensional feature map obtained by performing feature extraction on the image to be encoded, the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same; or if the first coefficient and the second coefficient belong to different channels, the scaling factor of the first coefficient is different from the scaling factor of the second coefficient; or the scaling factor of the first coefficient is determined according to the texture complexity of the channel to which the first coefficient belongs.
The method according to claim 20, further comprising:

The probability estimation result of the preset area is preprocessed to obtain the probability estimation result after processing.
The method according to claim 27, wherein the probability estimation result of the preset area includes the mean and variance of a Gaussian distribution, and the preprocessing of the probability estimation result of the preset area to obtain the processed probability estimation result comprises:

Setting the variance of the Gaussian distribution to 0 as the first variance, wherein the processed probability estimation result includes the mean value and the first variance of the Gaussian distribution, or,

The variance of the Gaussian distribution is processed according to the scaling factor of the preset area to obtain a second variance, wherein the processed probability estimation result includes a mean value and a second variance of the Gaussian distribution.
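The two preprocessing options can be sketched as follows. The claim does not state how the scaling factor is applied to the variance, so the multiplication below is an assumption made for illustration, and the function name is hypothetical.

```python
def preprocess_variance(variance, scaling_factor=None):
    """Return the processed variance for one preset area.

    scaling_factor is None -> first option: the variance is set to 0.
    otherwise              -> second option: the variance is processed with
                              the scaling factor (multiplication is an
                              assumption, not stated in the claim).
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    if scaling_factor is None:
        return 0.0
    return variance * scaling_factor


first_variance = preprocess_variance(4.0)                       # option 1
second_variance = preprocess_variance(4.0, scaling_factor=0.5)  # option 2
```

In both cases the mean of the Gaussian distribution is left unchanged; only the variance carried in the processed probability estimation result differs.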
The method according to any one of claims 17-28, wherein if the multiple coefficients are multiple pixel values in the image to be encoded, the first context information includes some or all of the pixel values; or,

If performing wavelet transformation on the image to be coded to obtain the multiple coefficients, the multiple coefficients are multiple wavelet coefficients, and the first context information includes part or all of the multiple wavelet coefficients; or,

If performing wavelet transformation and quantization on the image to be encoded to obtain the multiple coefficients, the multiple coefficients are multiple quantized wavelet coefficients, and the first context information includes part or all of the multiple quantized wavelet coefficients; or,

If DCT is performed on the image to be encoded to obtain the multiple coefficients, the multiple coefficients are multiple DCT coefficients, and the first context information includes part or all of the multiple DCT coefficients; or,

If performing DCT and quantization on the image to be encoded to obtain the multiple coefficients, the multiple coefficients are multiple quantized DCT coefficients, and the first context information includes part or all of the multiple quantized DCT coefficients; or,

If performing feature extraction on the image to be encoded to obtain the multiple coefficients, the multiple coefficients are multiple feature coefficients, and the first context information includes part or all of the multiple feature coefficients; or,

If feature extraction and quantization are performed on the image to be encoded to obtain the multiple coefficients, the multiple coefficients are multiple quantized feature coefficients, and the first context information includes part or all of the multiple quantized feature coefficients.
The method according to any one of claims 17-29, wherein the obtaining of the first probability estimation result according to the context information of the first coefficient comprises:

acquiring a second probability distribution model, inputting the first context information into a third probability estimation network for processing to obtain parameters of the second probability distribution model, and obtaining the first probability estimation result according to the second probability distribution model and the parameters of the second probability distribution model;

or,

inputting the first context information into a fourth probability estimation network for processing to obtain the first probability estimation result;

Wherein, the third probability estimation network and the fourth probability estimation network are implemented by a neural network.
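The first branch of this claim (a network that outputs the parameters of a chosen distribution model) can be sketched as below. This is a toy under stated assumptions: the distribution model is taken to be Gaussian, the "network" is a single hypothetical linear layer with hand-picked weights rather than a trained neural network, and the softplus mapping used to keep the variance positive is an illustrative choice, not something the application specifies.

```python
import math


def toy_estimation_network(context, weights, bias):
    """Hypothetical stand-in for the third probability estimation network.

    Maps the context to two raw outputs with one linear layer.
    """
    return [sum(w * c for w, c in zip(row, context)) + b
            for row, b in zip(weights, bias)]


def gaussian_parameters(context, weights, bias):
    """Turn the raw network outputs into (mean, variance) of a Gaussian."""
    raw_mean, raw_var = toy_estimation_network(context, weights, bias)
    variance = math.log1p(math.exp(raw_var))  # softplus keeps variance > 0
    return raw_mean, variance


# Context: previously processed neighbouring coefficients (illustrative).
context = [1.0, 2.0, 3.0]
weights = [[0.5, 0.25, 0.25], [0.0, 0.0, 0.0]]
bias = [0.0, 0.0]
mean, variance = gaussian_parameters(context, weights, bias)
```

The second branch of the claim would skip the explicit distribution model and have the network output the probability estimation result directly.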
An image processing method implemented by a decoding device, characterized in that it comprises:

Decoding a compressed code stream to obtain a first probability estimation result;

performing sampling according to the first probability estimation result to obtain a first estimated coefficient;

A first reconstructed image is obtained according to the first estimated coefficients.
The method according to claim 31, further comprising:

Decoding the compressed code stream to obtain a second probability estimation result;

performing sampling according to the second probability estimation result to obtain a second estimation coefficient;

The obtaining the first reconstructed image according to the first estimation coefficient includes:

The first reconstructed image is obtained according to the first estimated coefficient and the second estimated coefficient.
The method according to claim 31, wherein said obtaining the first probability estimation result from the decoding of the compressed code stream comprises:

Decode the first identifier from the compressed code stream;

If the value of the first identifier is the first value, the decoding of the compressed code stream to obtain a first probability estimation result includes:

Decoding a probability estimation result set and an index of a preset area from the compressed code stream; the preset area includes the first estimated coefficient, and the preset area is an area in the first reconstructed image,

determining the probability estimation result of the preset area from the probability estimation result set according to the index, the first probability estimation result being the probability estimation result of the preset area;

Wherein the value of the first identifier being the first value indicates that the probability estimation result of the preset area is used when sampling all the estimated coefficients in the preset area.
The method according to claim 31, further comprising:

Decode the first identifier from the compressed code stream;

If the value of the first identifier is the first value, the decoding of the compressed code stream to obtain a first probability estimation result includes:

The probability estimation result of the preset area and the size information of the preset area are decoded from the compressed code stream; the preset area includes the first estimated coefficient, and the preset area is the first reconstruction An area in the image; the probability estimation result of the preset area is the first probability estimation result;

Wherein the value of the first identifier being the first value indicates that the probability estimation result of the preset area is used when sampling all the estimated coefficients in the preset area.
The method according to claim 32, wherein the first estimated coefficient and the second estimated coefficient belong to the same preset area, and the preset area is an area in the first reconstructed image, the method further comprising:

Decode the first identifier from the compressed code stream;

Wherein the value of the first identifier is a second value, and the value of the first identifier being the second value indicates that each estimated coefficient in the preset area is obtained by sampling with its own respective probability estimation result.
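The role of the first identifier in the preceding claims can be sketched as a small selection routine. It assumes the syntax elements have already been parsed from the compressed code stream; the constants `FIRST_VALUE` and `SECOND_VALUE`, the function name, and the representation of a probability estimation result as a `(mean, variance)` pair are all hypothetical.

```python
FIRST_VALUE = 1   # hypothetical: one shared result for the whole preset area
SECOND_VALUE = 0  # hypothetical: one result per estimated coefficient


def results_for_area(first_identifier, num_coefficients,
                     result_set=None, index=None, per_coefficient=None):
    """Return one (mean, variance) pair per estimated coefficient in the area."""
    if first_identifier == FIRST_VALUE:
        shared = result_set[index]  # picked from the decoded set by its index
        return [shared] * num_coefficients
    if first_identifier == SECOND_VALUE:
        if len(per_coefficient) != num_coefficients:
            raise ValueError("need one result per coefficient")
        return list(per_coefficient)
    raise ValueError("unknown identifier value")


decoded_set = [(0.0, 1.0), (2.0, 0.5)]
shared = results_for_area(FIRST_VALUE, 3, result_set=decoded_set, index=1)
separate = results_for_area(SECOND_VALUE, 2,
                            per_coefficient=[(0.0, 1.0), (1.0, 4.0)])
```

Sharing one result across a preset area trades modelling precision for fewer transmitted parameters, which is presumably why the choice is signalled per area.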
The method according to any one of claims 31-35, wherein the first probability estimation result includes the mean and variance of a Gaussian distribution, and the sampling according to the first probability estimation result to obtain the first estimated coefficient comprises:

Obtain the first random number;

determining a first reference value according to the first random number, where the first reference value obeys a Gaussian distribution;

The first estimation coefficient is determined according to the first reference value and the mean value and variance of the first probability estimation result.
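The three steps of this claim can be sketched as follows. The sketch assumes the random number is uniform on (0, 1), maps it to a standard Gaussian reference value through the inverse cumulative distribution function, and combines it with the mean and variance in the usual location-scale form `mean + sqrt(variance) * reference`. These choices and the function name are assumptions; the claim only requires that the reference value obey a Gaussian distribution.

```python
import math
import random
from statistics import NormalDist


def sample_coefficient(mean, variance, rng):
    """Sample one estimated coefficient from N(mean, variance)."""
    # Step 1: obtain a random number (assumed uniform on [0, 1)).
    u = rng.random()
    # inv_cdf needs a value strictly between 0 and 1.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    # Step 2: map it to a reference value that obeys a standard Gaussian.
    reference = NormalDist().inv_cdf(u)
    # Step 3: combine the reference value with the mean and variance.
    return mean + math.sqrt(variance) * reference


rng = random.Random(2022)  # fixed seed so the sketch is repeatable
coefficient = sample_coefficient(mean=3.0, variance=4.0, rng=rng)
```

With a variance of 0 the reference value has no effect and the sampled coefficient equals the mean.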
The method of claim 36, further comprising:

Preprocessing the variance of the first probability estimation result to obtain the processed variance;

The determining the first estimated coefficient according to the first reference value and the mean value and variance of the first probability estimation result includes:

The first estimation coefficient is determined according to the first reference value, the mean value of the first probability estimation result, and the processed variance.
The method according to claim 37, wherein the preprocessing the variance of the first probability estimation result to obtain the processed variance comprises:

Setting the variance of the first probability estimation result to 0 as the processed variance.
The method according to claim 37, wherein when the first estimated coefficient is a quantized wavelet coefficient, a wavelet coefficient, a quantized discrete cosine transform (DCT) coefficient, a DCT coefficient, a feature coefficient, or a quantized feature coefficient, the preprocessing of the variance of the first probability estimation result to obtain the processed variance comprises:

preprocessing the variance of the first probability estimation result according to the scaling factor of the first estimated coefficient to obtain the processed variance, wherein

the scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient; or,

the scaling factor of the first estimated coefficient and the scaling factor of the second estimated coefficient are different; or

When the first estimated coefficient and the second estimated coefficient are quantized wavelet coefficients or wavelet coefficients, if the first estimated coefficient and the second estimated coefficient belong to the same subband, the scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient; or if the first estimated coefficient and the second estimated coefficient belong to different subbands, the scaling factor of the first estimated coefficient is different from the scaling factor of the second estimated coefficient; or the scaling factor of the first estimated coefficient is determined according to the texture complexity of the image block to which the first estimated coefficient belongs;

or,

When the first estimated coefficient and the second estimated coefficient are quantized DCT coefficients or DCT coefficients, if the first estimated coefficient and the second estimated coefficient belong to the same frequency band, the scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient; or if the first estimated coefficient and the second estimated coefficient belong to different frequency bands, the scaling factor of the first estimated coefficient is different from the scaling factor of the second estimated coefficient; or the scaling factor of the first estimated coefficient is determined according to the texture complexity of the frequency band to which the first estimated coefficient belongs;

or,

When the first estimated coefficient and the second estimated coefficient are feature coefficients or quantized feature coefficients, if the first estimated coefficient and the second estimated coefficient belong to the same channel, the scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient; or if the first estimated coefficient and the second estimated coefficient belong to different channels, the scaling factor of the first estimated coefficient is different from the scaling factor of the second estimated coefficient; or the scaling factor of the first estimated coefficient is determined according to the texture complexity of the channel to which the first estimated coefficient belongs.
The method according to claim 37, wherein when the first estimated coefficient and the second estimated coefficient are pixel values, the preprocessing of the variance of the first probability estimation result to obtain the processed variance comprises:

preprocessing the variance of the first probability estimation result according to the scaling factor of the first estimated coefficient to obtain the processed variance, wherein

The scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient, or the scaling factor of the first estimated coefficient is different from the scaling factor of the second estimated coefficient; or,

If the first estimated coefficient and the second estimated coefficient belong to the same image block, and the resolution of the image block is lower than a preset resolution, the scaling factor of the first estimated coefficient is different from the scaling factor of the second estimated coefficient; or if the first estimated coefficient and the second estimated coefficient belong to the same image block, and the resolution of the image block is not lower than the preset resolution, the scaling factor of the first estimated coefficient is the same as the scaling factor of the second estimated coefficient.
The method according to any one of claims 31-40, wherein the obtaining the first reconstructed image according to the first estimated coefficient and the second estimated coefficient comprises:

If the first estimated coefficient and the second estimated coefficient are quantized wavelet coefficients, performing inverse quantization and wavelet inverse transform on the first estimated coefficient and the second estimated coefficient to obtain the first reconstructed image, or,

If the first estimated coefficient and the second estimated coefficient are wavelet coefficients, performing inverse wavelet transform on the first estimated coefficient and the second estimated coefficient to obtain the first reconstructed image, or,

If the first estimated coefficient and the second estimated coefficient are quantized DCT coefficients, performing inverse quantization and inverse DCT on the first estimated coefficient and the second estimated coefficient to obtain the first reconstructed image, or,

If the first estimated coefficient and the second estimated coefficient are DCT coefficients, performing inverse DCT on the first estimated coefficient and the second estimated coefficient to obtain the first reconstructed image.
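The DCT branches of this claim can be sketched for a single square block. The sketch assumes an orthonormal DCT-II as the forward transform and a uniform quantizer, so that inverse quantization is a multiplication by a step size; the application does not fix either choice here, and all function names are hypothetical.

```python
import math


def dct_basis(n):
    """Orthonormal DCT-II basis; row k holds the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def inverse_dct_2d(coeffs):
    """Inverse 2-D DCT of a square block: X = C^T * Y * C."""
    n = len(coeffs)
    c = dct_basis(n)
    return [[sum(c[k][i] * coeffs[k][l] * c[l][j]
                 for k in range(n) for l in range(n))
             for j in range(n)]
            for i in range(n)]


def reconstruct_block(levels, step=None):
    """Inverse-quantize when a step is given, then apply the inverse DCT."""
    if step is not None:
        coeffs = [[v * step for v in row] for row in levels]
    else:
        coeffs = [list(row) for row in levels]
    return inverse_dct_2d(coeffs)


# A 4x4 block whose only non-zero quantized coefficient is the DC term.
levels = [[4, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
block = reconstruct_block(levels, step=2.0)
```

In this example the block carries only a DC term, so the reconstruction is a flat block. Passing `step=None` corresponds to the unquantized branch, where the sampled coefficients go straight to the inverse transform.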
The method according to any one of claims 31-41, further comprising:

Decoding the compressed code stream to obtain a plurality of reconstruction coefficients;

A second reconstructed image is obtained according to the plurality of reconstruction coefficients.
The method according to claim 42, wherein the obtaining of the second reconstructed image according to the plurality of reconstruction coefficients comprises:

If the multiple reconstruction coefficients are quantized wavelet coefficients, performing inverse quantization and wavelet inverse transform on the multiple reconstruction coefficients to obtain the second reconstructed image, or,

If the multiple reconstruction coefficients are wavelet coefficients, performing inverse wavelet transform on the multiple reconstruction coefficients to obtain the second reconstructed image, or,

If the multiple reconstruction coefficients are quantized DCT coefficients, performing inverse quantization and inverse DCT on the multiple reconstruction coefficients to obtain the second reconstructed image, or,

If the multiple reconstruction coefficients are DCT coefficients, performing inverse DCT on the multiple reconstruction coefficients to obtain the second reconstructed image.
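The wavelet branches of this claim can be illustrated on one row of samples. The sketch assumes a one-level orthonormal Haar wavelet and a uniform quantizer; the application does not name a particular wavelet here, so Haar is used only because it is the simplest invertible example, and the function names are hypothetical.

```python
import math


def inverse_haar_1d(approx, detail):
    """One-level inverse orthonormal Haar transform of a 1-D signal."""
    if len(approx) != len(detail):
        raise ValueError("subbands must have equal length")
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / math.sqrt(2))  # even-indexed sample
        out.append((a - d) / math.sqrt(2))  # odd-indexed sample
    return out


def reconstruct_row(approx_levels, detail_levels, step=None):
    """Inverse-quantize when a step is given, then inverse-transform."""
    if step is not None:
        approx_levels = [v * step for v in approx_levels]
        detail_levels = [v * step for v in detail_levels]
    return inverse_haar_1d(approx_levels, detail_levels)


# Forward Haar of the row [1, 3] gives approx = 4/sqrt(2), detail = -2/sqrt(2).
s = math.sqrt(2)
row = reconstruct_row([4 / s], [-2 / s])
```

A two-dimensional image would apply the same step along rows and columns for each decomposition level.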
A decoder, characterized by comprising a processing circuit configured to execute the method according to any one of claims 31-43.
An encoder, characterized by comprising a processing circuit configured to execute the method according to any one of claims 1-30.
A computer program product, characterized in that it includes program code, which is used to execute the method according to any one of claims 1-43 when it is executed on a computer or a processor.
A decoder, characterized in that it comprises:

one or more processors;

A non-transitory computer-readable storage medium, coupled to the processor, storing a program executed by the processor, wherein the program, when executed by the processor, causes the decoder to perform the method according to any one of claims 31-43.
An encoder, characterized in that it comprises:

one or more processors;

A non-transitory computer-readable storage medium, coupled to the processor, storing a program executed by the processor, wherein the program, when executed by the processor, causes the encoder to perform the method according to any one of claims 1-30.
A non-transitory computer-readable storage medium, characterized by comprising program code, which is used to execute the method according to any one of claims 1-43 when executed by a computer device.
A non-transitory storage medium, characterized by comprising a bit stream encoded based on the method according to any one of claims 1-43.