WO2023279961A1 - Video image encoding and decoding method and apparatus - Google Patents
Video image encoding and decoding method and apparatus
- Publication number: WO2023279961A1 (PCT/CN2022/100424)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- coefficient
- coefficients
- probability distribution
- estimated
- scaling factor
- Prior art date
Classifications
- G06N3/02—Neural networks
- H04N19/124—Quantisation
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/42—Implementation details or hardware specially adapted for video compression or decompression
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
- H04N19/625—Transform coding using discrete cosine transform [DCT]
- H04N19/63—Transform coding using sub-band based transform, e.g. wavelets
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Definitions
- the present application relates to the field of video images, and in particular to a method and apparatus for encoding and decoding video images.
- Digital images are image information recorded in the form of digital signals.
- a digital image (hereinafter referred to as an image) can be regarded as a two-dimensional array of M rows and N columns, including M ⁇ N samples, the position of each sample is called a sampling position, and the value of each sample is called a sample value.
- Image coding includes two processes: encoding and decoding.
- a typical coding process generally includes three steps of transformation, quantization and entropy coding.
- the first step is to decorrelate the image through a transformation to obtain transform coefficients whose energy distribution is more concentrated;
- the second step is to quantize the transform coefficients to obtain quantized coefficients;
- the third step is to entropy encode the quantized coefficients to obtain the compressed code stream.
- after the decoder receives the compressed code stream, a typical decoding process performs entropy decoding, inverse quantization, and inverse transformation in sequence to obtain the reconstructed image.
- the present application provides a method and device for decoding video images.
- decoding a single compressed code stream multiple times can yield images with different properties.
- the probability distribution used in sampling can be adjusted based on user requirements, thereby improving the quality of the reconstructed image.
- the invention relates to a method of decoding video images.
- the method is performed by a decoding device.
- the method includes: obtaining a plurality of coefficients according to the compressed code stream of the data to be decoded, where the plurality of coefficients includes a first coefficient; performing probability estimation according to the context information of the first coefficient to obtain a first probability distribution; sampling according to the first probability distribution to obtain a first estimated coefficient; and obtaining the reconstructed image according to the first estimated coefficient.
- the first estimated coefficient may be an estimated value of the first coefficient.
- the data to be decoded may be an image, an image block, a slice, or any region of an image.
- in a possible design, the above plurality of coefficients also includes a second coefficient, and the method of the present application further includes: performing probability estimation according to the context information of the second coefficient to obtain a second probability distribution, and sampling according to the second probability distribution to obtain a second estimated coefficient, where the second estimated coefficient is obtained after the first estimated coefficient.
- in each decoding pass over the compressed code stream, probability estimation is performed on the decoded coefficients, sampling is performed based on the probability estimation results to obtain estimated coefficients, and the reconstructed image is obtained from the sampled estimated coefficients. Since the sampling process is random and therefore non-deterministic, performing multiple decodings of the same compressed code stream in the above manner can produce multiple images with different properties, for example the image with the best subjective quality and the image with the best objective quality, as sketched below.
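To make the flow concrete, here is a minimal Python sketch of the decode-by-sampling loop under a Gaussian assumption; `decode_by_sampling` and `estimate_params` are hypothetical names for illustration, not from the patent:

```python
import numpy as np

rng = np.random.default_rng()

def decode_by_sampling(coeffs, estimate_params):
    """Sketch: coeffs are the entropy-decoded coefficients in scan order;
    estimate_params maps context -> (mean, std) of the estimated
    (here assumed Gaussian) probability distribution."""
    estimated = []
    for i in range(len(coeffs)):
        # context: some or all decoded coefficients and/or prior samples
        mu, sigma = estimate_params(coeffs[:i], estimated)
        # sampling is random, so each decoding pass yields different values
        estimated.append(rng.normal(mu, sigma))
    return np.array(estimated)
```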
- in a possible design, obtaining a plurality of coefficients according to the compressed code stream of the data to be decoded includes decoding the compressed code stream to obtain the plurality of coefficients, where:
- the plurality of coefficients are a plurality of quantized wavelet coefficients; or,
- the plurality of coefficients are a plurality of quantized discrete cosine transform (DCT) coefficients; or,
- the plurality of coefficients are reconstruction values of a plurality of initial pixels; or,
- the plurality of coefficients are of another type described below (for example, reconstructed wavelet coefficients, reconstructed DCT coefficients, characteristic coefficients, or transformed pixel values).
- in a possible design, performing probability estimation according to the context information of the first coefficient to obtain the first probability distribution includes:
- obtaining the probability distribution model of the first coefficient; processing the context information of the first coefficient through the first probability estimation network to obtain the parameters of the probability distribution model; and obtaining the first probability distribution according to the probability distribution model and its parameters.
- the first probability estimation network and the second probability estimation network are implemented based on a neural network
- the context information of the first coefficient includes some or all of the decoded coefficients, and/or some or all of the estimated coefficients already obtained by sampling.
- the above probability distribution model can be a Gaussian model, a Laplace model, a mixed Gaussian model, or another model; when the probability distribution model is a Gaussian model, the parameters of the probability distribution model include the mean and variance; when the probability distribution model is a Laplace model, the parameters of the probability distribution model include a location parameter and a scale parameter.
- the aforementioned neural network may be a convolutional neural network, a deep neural network, a recurrent neural network or other neural networks.
- the above-mentioned first probability estimation network and the second probability estimation network have different structures and parameters, or the first probability estimation network and the second probability estimation network have the same structure but different parameters.
- the first probability distribution can be obtained in the above manner, so as to prepare for subsequent sampling based on the first probability distribution.
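As an illustration of this step, the following minimal sketch (the helper `build_distribution` is hypothetical; SciPy supplies the distribution objects) turns network-estimated parameters into a concrete distribution:

```python
from scipy import stats

def build_distribution(model, params):
    # Hypothetical helper: turn parameters produced by a probability
    # estimation network into a probability distribution object.
    if model == "gaussian":
        mean, std = params            # mean and (square root of) variance
        return stats.norm(loc=mean, scale=std)
    if model == "laplace":
        loc, scale = params           # location and scale parameters
        return stats.laplace(loc=loc, scale=scale)
    raise ValueError(f"unsupported model: {model}")
```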
- in a possible design, the first probability distribution is a Gaussian distribution, and sampling according to the first probability distribution to obtain the first estimated coefficient includes: generating a first random number, where the first random number is a uniformly distributed random number on [0,1] generated using the linear congruential method; determining a first reference value according to the first random number, where the first reference value may follow a standard Gaussian distribution, an ordinary Gaussian distribution, an asymmetric Gaussian distribution, a single Gaussian model, a mixed Gaussian model, or another Gaussian distribution; and determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution.
- the first estimated coefficient obtained by sampling is therefore random, so the reconstructed image obtained based on the first estimated coefficient is also random, that is, non-deterministic.
- because the sampling process is a random, non-deterministic process, the multiple reconstructed images obtained from estimated coefficients produced by multiple samplings in the above manner have different properties.
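A minimal sketch of this sampling chain, assuming textbook linear-congruential constants and a Box-Muller mapping from uniform random numbers to a standard-Gaussian reference value (the patent fixes neither choice):

```python
import math

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    # Linear congruential generator: uniform random numbers on [0, 1)
    # (the multiplier and increment are common textbook values, assumed).
    while True:
        seed = (a * seed + c) % m
        yield seed / m

_uniforms = lcg(seed=1)

def sample_estimated_coefficient(mean, std):
    # Box-Muller: two uniforms give a standard-Gaussian reference value z;
    # the estimated coefficient follows from z, the mean, and the spread.
    u1, u2 = next(_uniforms), next(_uniforms)
    z = math.sqrt(-2.0 * math.log(1.0 - u1)) * math.cos(2.0 * math.pi * u2)
    return mean + std * z
```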
- in a possible design, the method of the present application further includes: preprocessing the variance of the first probability distribution according to the scaling factor of the first coefficient to obtain the processed variance; in this case, determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution includes:
- determining the first estimated coefficient according to the first reference value, the mean of the first probability distribution, and the processed variance.
- in another possible design, the method of the present application further includes: preprocessing the mean of the first probability distribution according to the scaling factor of the first coefficient to obtain the processed mean; in this case, determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution includes:
- determining the first estimated coefficient according to the first reference value, the variance of the first probability distribution, and the processed mean.
- in a possible design, when the plurality of coefficients are a plurality of quantized wavelet coefficients, reconstructed wavelet coefficients, quantized DCT coefficients, reconstructed DCT coefficients, or characteristic coefficients, preprocessing the variance of the first probability distribution to obtain the processed variance includes:
- preprocessing the variance of the first probability distribution according to the scaling factor of the first coefficient to obtain the processed variance.
- similarly, the variance of the second probability distribution can be preprocessed according to the scaling factor of the second coefficient, where the scaling factor of the first coefficient and the scaling factor of the second coefficient may be the same or different.
- when the plurality of coefficients are a plurality of quantized wavelet coefficients or a plurality of reconstructed wavelet coefficients: if the first coefficient and the second coefficient belong to the same subband, the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; if they belong to different subbands, the scaling factors are different.
- when the plurality of coefficients are a plurality of quantized DCT coefficients or a plurality of reconstructed DCT coefficients: if the first coefficient and the second coefficient belong to the same frequency band, the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; if they belong to different frequency bands, the scaling factors are different.
- when the plurality of coefficients are a plurality of characteristic coefficients: if the first coefficient and the second coefficient belong to the same channel, the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; if they belong to different channels, the scaling factors are different.
- when the plurality of coefficients are a plurality of initial pixel reconstruction values, or a plurality of transformed pixel values, preprocessing the variance of the first probability distribution to obtain the processed variance includes: preprocessing the variance according to the scaling factor of the first coefficient, where the scaling factor of the first coefficient and the scaling factor of the second coefficient may be the same or different.
- in this way, reconstructed images with different properties can be obtained according to user requirements, thereby improving the quality of the reconstructed images. For example, if the variance of the first probability distribution is set to 0 as the processed variance, the reconstructed image with the best signal quality (best objective quality) can be obtained, that is, the peak signal-to-noise ratio (PSNR) of the image is increased.
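For example, a sketch of the variance preprocessing, reusing `sample_estimated_coefficient` from the sketch above (the helper name and the example numbers are illustrative):

```python
def preprocess_std(std, scaling_factor):
    # Scale the spread of the estimated distribution before sampling.
    # scaling_factor = 0 collapses sampling onto the mean, which maximizes
    # PSNR (objective quality); larger factors keep more randomness,
    # which can favour subjective quality.
    return scaling_factor * std

mean, std = 4.2, 1.5                                              # example estimates
best_psnr = sample_estimated_coefficient(mean, preprocess_std(std, 0.0))   # == mean
subjective = sample_estimated_coefficient(mean, preprocess_std(std, 1.0))
```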
- in a possible design, when the plurality of coefficients are a plurality of quantized wavelet coefficients or a plurality of reconstructed wavelet coefficients, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient includes:
- Inverse wavelet transform is performed on the first estimated coefficient and the second estimated coefficient to obtain a reconstructed image.
- in a possible design, when the plurality of coefficients are a plurality of quantized or reconstructed DCT coefficients, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient includes:
- performing inverse quantization and inverse DCT on the first estimated coefficient and the second estimated coefficient to obtain the reconstructed image; or,
- performing inverse DCT on the first estimated coefficient and the second estimated coefficient to obtain the reconstructed image.
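A sketch of the DCT branch of this reconstruction step (SciPy's inverse DCT stands in for the patent's inverse transform; `qp` and the 2-D block shape are assumptions):

```python
import numpy as np
from scipy.fft import idctn

def reconstruct_block(estimated_coeffs, qp=None):
    # Optional inverse quantization (multiply by the quantization step QP),
    # then a 2-D inverse DCT to obtain the reconstructed image block.
    block = np.asarray(estimated_coeffs, dtype=np.float64)
    if qp is not None:
        block = block * qp
    return idctn(block, norm="ortho")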
- in a possible design, when the plurality of coefficients are a plurality of transformed pixel values, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient includes:
- Inverse transformation is performed on the first estimated coefficient and the second estimated coefficient to obtain a reconstructed image.
- the sampling step can be repeated in the present application to obtain multiple reconstructed images.
- the multiple reconstructed images may be the reconstructed images with the best subjective quality, or the reconstructed images with the best objective quality.
- the reconstructed image can be used in the codec loop as a reference for intra-frame or inter-frame prediction; it can also be used outside the codec loop to optimize image quality as a post-processing method.
- for example, the reconstructed image with the best subjective quality is put into the decoded picture buffer (DPB) or the reference frame set and used for in-loop encoding and decoding of frames.
- the present invention relates to a device for decoding a compressed code stream, and the beneficial effect can be referred to the description of the first aspect, which will not be repeated here.
- the decoding device has the function of implementing the actions in the method example of the first aspect above.
- the functions described above may be implemented by hardware, or may be implemented by executing corresponding software on the hardware.
- the hardware or software includes one or more modules corresponding to the above functions.
- the method described in the first aspect of the present invention can be performed by the device described in the second aspect of the present invention.
- Other features and implementations of the method according to the first aspect of the invention depend directly on the functionality and implementations of the device according to the second aspect of the invention.
- the invention relates to a device for decoding a video stream, comprising a processor and a memory.
- the memory stores instructions, and the instructions cause the processor to execute the method described in the first aspect.
- a computer readable storage medium having stored thereon instructions which, when executed, cause one or more processors to encode video data.
- the instructions cause the one or more processors to execute the method in any possible embodiment of the first aspect.
- the invention relates to a computer program product comprising program code which, when run, performs the method of any one of the possible embodiments of the first aspect.
- FIG. 1 is a block diagram of an example of a video decoding system for implementing an embodiment of the present application
- FIG. 2 is a block diagram of another example of a video decoding system for implementing an embodiment of the present application
- FIG. 3 is a schematic block diagram of a video decoding device for implementing an embodiment of the present application
- FIG. 4 is a schematic block diagram of a video decoding device for implementing an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of a video encoder and decoder provided in an embodiment of the present application.
- Figure 6a is a schematic diagram of the results after a wavelet transformation
- Fig. 6b is a schematic diagram of the processing flow of wavelet transform
- Fig. 6c is a schematic structural diagram of the deep network used for prediction and updating in Fig. 6b;
- Fig. 6d is a schematic structural diagram of a probability estimation network provided by an embodiment of the present application.
- Fig. 6e is a schematic diagram of the processing flow of wavelet inverse transform
- Fig. 7 is a schematic diagram of model training provided by the embodiment of the present application.
- FIG. 8a is a schematic structural diagram of another video decoder provided by an embodiment of the present application.
- FIG. 8b is a schematic structural diagram of another video decoder provided by an embodiment of the present application.
- FIG. 9 is a schematic structural diagram of another video decoder provided by an embodiment of the present application.
- FIG. 10a is a schematic structural diagram of another video decoder provided by an embodiment of the present application.
- FIG. 10b is a schematic structural diagram of another video decoder provided by an embodiment of the present application.
- FIG. 11 is a schematic diagram of a decoding process provided by an embodiment of the present application.
- the embodiments of the present application provide an AI-based video image compression technology, in particular a neural-network-based video compression technology, and specifically provide a decoding method based on probability distributions and sampling, to improve the traditional hybrid video codec system.
- Video coding generally refers to the processing of sequences of images that form a video or video sequence.
- the terms "picture”, “frame” or “image” may be used as synonyms.
- Video coding (or commonly referred to as coding) includes two parts: video coding and video decoding.
- Video encoding is performed on the source side and typically involves processing (e.g., compressing) raw video images to reduce the amount of data needed to represent the video images (and thus store and/or transmit them more efficiently).
- Video decoding is performed at the destination and typically involves inverse processing relative to the encoder to reconstruct the video image.
- the "encoding" of video images (or generally referred to as images) involved in the embodiments should be understood as “encoding” or “decoding” of video images or video sequences.
- the encoding part and the decoding part are also collectively referred to as codec (encoding and decoding, CODEC).
- the original video image can be reconstructed, ie the reconstructed video image has the same quality as the original video image (assuming no transmission loss or other data loss during storage or transmission).
- in the case of lossy video coding, further compression is performed by quantization and the like to reduce the amount of data required to represent the video image, and the decoder side cannot completely reconstruct the video image, that is, the quality of the reconstructed video image is lower or worse than that of the original video image.
- the neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes inputs x_s and an intercept of 1, and the output of the operation unit can be: h = f(Σ_s W_s · x_s + b), where:
- W_s is the weight of x_s,
- b is the bias of the neural unit, and
- f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal.
- the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
- a neural network is a network formed by connecting multiple above-mentioned single neural units, that is, the output of one neural unit can be the input of another neural unit.
- the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field.
- the local receptive field can be an area composed of several neural units.
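A minimal sketch of the single neural unit described above, with a sigmoid activation:

```python
import numpy as np

def neural_unit(x, w, b):
    # h = f(sum_s W_s * x_s + b): weighted inputs plus bias, then the
    # activation function f (here a sigmoid) produces the output signal.
    z = float(np.dot(w, x)) + b
    return 1.0 / (1.0 + np.exp(-z))

out = neural_unit(x=[0.5, -1.0, 2.0], w=[0.1, 0.4, 0.3], b=0.2)
```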
- a deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
- according to the positions of the different layers, the layers of a DNN can be divided into three categories: the input layer, the hidden layers, and the output layer.
- the first layer is the input layer
- the last layer is the output layer
- the layers in the middle are all hidden layers.
- the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
- although a DNN looks complicated, the work of each layer is actually not complicated: each layer simply evaluates the linear relationship expression y = a(W·x + b), where x is the input vector, y is the output vector, b is the offset (bias) vector, W is the weight matrix (also called the coefficients), and a(·) is the activation function.
- each layer just performs this simple operation on the input vector x to obtain the output vector y. Because a DNN has many layers, the numbers of coefficient matrices W and offset vectors b are correspondingly large.
- these parameters are defined in the DNN as follows. Take the coefficient W as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the 2nd layer to the 2nd neuron of the 3rd layer is defined as W^3_{24}, where the superscript 3 is the layer number of the coefficient W, and the subscript corresponds to output index 2 of the third layer and input index 4 of the second layer.
- in general, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as W^L_{jk}.
- note that the input layer has no W parameter.
- more hidden layers make the network more capable of describing complex situations in the real world. Theoretically speaking, a model with more parameters has a higher complexity and a greater "capacity", which means that it can complete more complex learning tasks.
- Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).
- Convolutional neural network is a deep neural network with a convolutional structure.
- the convolutional neural network contains a feature extractor composed of a convolutional layer and a subsampling layer, which can be regarded as a filter.
- the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
- a neuron can only be connected to some adjacent neurons.
- a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units of the same feature plane share weights, and the shared weights here are convolution kernels.
- Shared weights can be understood as a way to extract image information that is independent of location.
- the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
- the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
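A minimal sketch of weight sharing: one kernel (the shared weights) is applied at every position of the input, so the same feature is extracted regardless of location:

```python
import numpy as np

def conv2d_shared(image, kernel):
    # The same kernel slides over the whole image; the extracted feature is
    # independent of position, and far fewer parameters are needed than in
    # a fully connected layer.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

feature_plane = conv2d_shared(np.random.rand(8, 8), np.random.rand(3, 3))
```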
- Recurrent neural networks are used to process sequence data.
- RNN Recurrent neural networks
- in a traditional neural network, the layers are fully connected, while the nodes within each layer are unconnected.
- although this ordinary neural network solves many problems, it is still powerless for many others. For example, to predict the next word in a sentence, the previous words are generally needed, because the preceding and following words in a sentence are not independent. RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs.
- RNN can process sequence data of any length.
- the training of RNN is the same as that of traditional CNN or DNN.
- RNN is designed to allow machines to have the ability to remember like humans. Therefore, the output of RNN needs to depend on the current input information and historical memory information.
- the neural network can use the error backpropagation (BP) algorithm to correct the parameters of the initial neural network model during training, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, the input signal is passed forward until the output produces an error loss, and the parameters of the initial model are updated by backpropagating the error loss information, so that the error loss converges.
- the backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrix.
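A one-parameter illustration of the idea (squared-error loss on a single linear unit; purely a sketch):

```python
def backprop_step(w, b, x, target, lr=0.1):
    # Forward pass, then propagate the error loss backward and update the
    # parameters so that the error loss shrinks over repeated steps.
    y = w * x + b                 # forward: prediction
    grad_y = 2.0 * (y - target)   # d(loss)/dy for loss = (y - target)^2
    w -= lr * grad_y * x          # d(loss)/dw = grad_y * x
    b -= lr * grad_y              # d(loss)/db = grad_y
    return w, b
```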
- the encoder 20 and the decoder 30 are described with reference to FIGS. 1-3 .
- FIG. 1 is a schematic block diagram of an exemplary decoding system 10 , such as a video decoding system 10 (or simply referred to as the decoding system 10 ), which may utilize the techniques of the present application.
- Video encoder 20 (or simply encoder 20) and video decoder 30 (or simply decoder 30) in the video coding system 10 represent devices that may be used to perform techniques according to the various examples described in this application.
- the decoding system 10 includes a source device 12 for providing coded image data 21 such as coded images to a destination device 14 for decoding the coded image data 21 .
- the source device 12 includes an encoder 20 , and optionally, an image source 16 , a preprocessor (or a preprocessing unit) 18 such as an image preprocessor, and a communication interface (or a communication unit) 22 .
- Image source 16 may include or be any type of image capture device for capturing real-world images and the like, and/or any type of image generation device, such as a computer graphics processor, or any type of device for acquiring and/or providing real-world images or computer-generated images (for example, screen content, virtual reality (VR) images, and/or any combination thereof, such as augmented reality (AR) images).
- the image source may also be any type of memory or storage that stores any of the above images.
- the image (or image data) 17 may also be referred to as an original image (or original image data) 17 .
- the preprocessor 18 is used to receive (original) image data 17 and perform preprocessing on the image data 17 to obtain a preprocessed image (or preprocessed image data) 19 .
- preprocessing performed by preprocessor 18 may include cropping, color format conversion (e.g., from RGB to YCbCr), color grading, or denoising. It can be understood that the preprocessing unit 18 may be an optional component.
- a video encoder (or encoder) 20 is used to receive preprocessed image data 19 and provide encoded image data 21 (to be further described below with reference to FIG. 2 etc.).
- the communication interface 22 in the source device 12 may be used to receive the encoded image data 21 and send the encoded image data 21 (or any other processed version) via the communication channel 13 to another device, such as the destination device 14 or any other device, for storage or direct reconstruction.
- the destination device 14 includes a decoder 30 , and may also optionally include a communication interface (or communication unit) 28 , a post-processor (or post-processing unit) 32 and a display device 34 .
- the communication interface 28 in the destination device 14 is used to receive the coded image data 21 (or any other processed version) directly from the source device 12 or from any other source device, such as a storage device (for example, a coded-image-data storage device), and to supply the coded image data 21 to the decoder 30.
- the communication interface 22 and the communication interface 28 can be used to send or receive the coded image data (or coded data) 21 through a direct communication link between the source device 12 and the destination device 14, such as a direct wired or wireless connection, or through any type of network, such as a wired network, a wireless network, or any combination thereof, or any type of private and public network or any combination thereof.
- the communication interface 22 can be used to encapsulate the encoded image data 21 into a suitable format, such as a message, and/or process the encoded image data using any type of transmission encoding or processing, so that it can be transmitted over a communication link or communication network.
- the communication interface 28 corresponds to the communication interface 22 and, for example, can be used to receive the transmitted data and process it using any type of corresponding transmission decoding or processing and/or de-encapsulation to obtain the encoded image data 21.
- both the communication interface 22 and the communication interface 28 can be configured as one-way communication interfaces, as indicated by the arrow from the source device 12 along the communication channel 13 to the destination device 14 in FIG. 1, or as two-way communication interfaces, and can be used to send and receive messages and the like, to establish a connection, and to confirm and exchange any other information related to the communication link and/or data transmission, such as the transmission of encoded image data.
- the video decoder (or decoder) 30 is used to receive encoded image data 21 and provide decoded image data (or decoded image data) 31 (which will be further described below with reference to FIG. 3 , etc.).
- the post-processor 32 is used to perform post-processing on decoded image data 31 (also referred to as reconstructed image data) such as a decoded image to obtain post-processed image data 33 such as a post-processed image.
- Post-processing performed by post-processing unit 32 may include, for example, color format conversion (e.g., from YCbCr to RGB), color grading, cropping, or resampling, or any other processing for preparing the decoded image data 31 for display by the display device 34 or the like.
- the display device 34 is used to receive the post-processed image data 33 to display the image to a user or viewer or the like.
- Display device 34 may be or include any type of display for representing the reconstructed image, eg, an integrated or external display screen or display.
- the display screen may include a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, a plasma display, a projector, a micro-LED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP), or any other type of display.
- the decoding system 10 also includes a training engine 25.
- the specific training process implemented by the training engine 25 can be found in the subsequent description and will not be described here.
- although FIG. 1 shows the source device 12 and the destination device 14 as independent devices, a device embodiment may also include both the source device 12 and the destination device 14, or the functions of both, at the same time.
- in such embodiments, the source device 12 or corresponding function and the destination device 14 or corresponding function may be implemented using the same hardware and/or software, or by separate hardware and/or software, or any combination thereof.
- the encoder 20 (e.g., video encoder 20), the decoder 30 (e.g., video decoder 30), or both may be implemented by processing circuitry, such as one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, dedicated video coding processors, or any combination thereof.
- Encoder 20 may be implemented by processing circuitry 46 to include the various modules discussed with reference to encoder 20 of FIG. 2 and/or any other encoder system or subsystem described herein.
- Decoder 30 may be implemented by processing circuitry 46 to include the various modules discussed with reference to decoder 30 of FIG. 3 and/or any other decoder system or subsystem described herein.
- the processing circuitry 46 may be used to perform the various operations discussed below. As shown in FIG. 4, if part of the technology is implemented in software, the device may store the software instructions in a suitable non-transitory computer-readable storage medium and execute the instructions in hardware using one or more processors, thereby performing the techniques of this application.
- the video encoder 20 and the video decoder 30 may be integrated in a single device as part of a combined codec (encoder/decoder, CODEC), as shown in FIG. 2.
- Source device 12 and destination device 14 may comprise any of a variety of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, mobile phone, smartphone, tablet or tablet computer, camera, desktop computer, set-top box, television, display device, digital media player, video game console, video streaming device (such as a content service server or content distribution server), broadcast receiving device, or broadcast transmitting device, and may use no operating system or any type of operating system.
- source device 12 and destination device 14 may be equipped with components for wireless communication. Accordingly, source device 12 and destination device 14 may be wireless communication devices.
- the video coding system 10 shown in FIG. 1 is merely exemplary, and the techniques provided herein are applicable to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device.
- data is retrieved from local storage, sent over a network, and so on.
- a video encoding device may encode and store data into memory, and/or a video decoding device may retrieve and decode data from memory.
- encoding and decoding are performed by devices that do not communicate with each other but simply encode data to memory and/or retrieve and decode data from memory.
- FIG. 2 is an illustrative diagram of an example of a video coding system 40 including video encoder 20 of FIG. 2 and/or video decoder 30 of FIG. 3, according to an example embodiment.
- the video coding system 40 may include an imaging device 41, a video encoder 20, a video decoder 30 (and/or a video encoder/decoder implemented by the processing circuit 46), an antenna 42, one or more processors 43, one or more memory stores 44, and/or a display device 45.
- imaging device 41 , antenna 42 , processing circuit 46 , video encoder 20 , video decoder 30 , processor 43 , memory storage 44 and/or display device 45 are capable of communicating with each other.
- the video coding system 40 may include only the video encoder 20 or only the video decoder 30 .
- antenna 42 may be used to transmit or receive an encoded bitstream of video data.
- display device 45 may be used to present video data.
- the processing circuit 46 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
- the video decoding system 40 may also include an optional processor 43, and the optional processor 43 may similarly include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
- the memory storage 44 can be any type of memory, such as volatile memory (for example, static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or non-volatile memory (for example, flash memory, etc.).
- memory storage 44 may be implemented by cache memory.
- processing circuitry 46 may include memory (eg, cache, etc.) for implementing an image buffer or the like.
- video encoder 20 implemented by logic circuitry may include an image buffer (eg, implemented by processing circuitry 46 or memory storage 44 ) and a graphics processing unit (eg, implemented by processing circuitry 46 ).
- a graphics processing unit may be communicatively coupled to the image buffer.
- Graphics processing unit may include video encoder 20 implemented by processing circuitry 46 to implement the various modules discussed with reference to FIG. 2 and/or any other encoder system or subsystem described herein.
- Logic circuits may be used to perform the various operations discussed herein.
- video decoder 30 may be implemented by processing circuitry 46 in a similar manner to implement the various modules discussed with reference to video decoder 30 of FIG. 3 and/or any other decoder system or subsystem described herein.
- logic circuit implemented video decoder 30 may include an image buffer (implemented by processing circuit 46 or memory storage 44 ) and a graphics processing unit (eg, implemented by processing circuit 46 ).
- a graphics processing unit may be communicatively coupled to the image buffer.
- Graphics processing unit may include video decoder 30 implemented by processing circuitry 46 to implement the various modules discussed with reference to FIG. 3 and/or any other decoder system or subsystem described herein.
- antenna 42 may be used to receive an encoded bitstream of video data.
- an encoded bitstream may contain data related to encoded video frames, indicators, index values, mode selection data, and the like, as discussed herein, such as data related to encoding partitions (e.g., transform coefficients or quantized transform coefficients, optional indicators (as discussed), and/or data defining the encoding partitions).
- Video coding system 40 may also include video decoder 30 coupled to antenna 42 and used to decode the encoded bitstream.
- a display device 45 is used to present video frames.
- the video decoder 30 may be used to perform a reverse process.
- the video decoder 30 may be configured to receive and parse such syntax elements and decode the associated video data accordingly.
- video encoder 20 may entropy encode the syntax elements into an encoded video bitstream.
- video decoder 30 may parse such syntax elements and decode the related video data accordingly.
- FIG. 3 is a schematic diagram of a video decoding device 300 provided by an embodiment of the present invention.
- the video coding apparatus 300 is suitable for implementing the disclosed embodiments described herein.
- the video decoding device 300 may be a decoder, such as the video decoder 30 in FIG. 1 , or an encoder, such as the video encoder 20 in FIG. 1 .
- the video decoding device 300 includes: an ingress port 310 (or input port 310) and a receiving unit (receiver unit, Rx) 320 for receiving data; a processor, logic unit, or central processing unit (CPU) 330 for processing data, where the processor 330 may be a neural network processor 330; a transmitting unit (transmitter unit, Tx) 340 and an egress port 350 (or output port 350) for transmitting data; and a memory 360 for storing data.
- the video decoding device 300 may also include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress port 310, the receiving unit 320, the transmitting unit 340, and the egress port 350, serving as the egress or ingress of optical or electrical signals.
- the processor 330 is realized by hardware and software.
- Processor 330 may be implemented as one or more processor chips, cores (eg, multi-core processors), FPGAs, ASICs, and DSPs.
- Processor 330 is in communication with ingress port 310 , receiving unit 320 , transmitting unit 340 , egress port 350 and memory 360 .
- the processor 330 includes a decoding module 370 (eg, a neural network NN based decoding module 370 ).
- the decoding module 370 implements the embodiments disclosed above. For example, the decoding module 370 performs, processes, prepares, or provides various coding operations.
- decoding module 370 is implemented as instructions stored in memory 360 and executed by processor 330 .
- the memory 360, which may include one or more magnetic disks, tape drives, and solid-state drives, may be used as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
- the memory 360 can be volatile and/or non-volatile, and can be a read-only memory (ROM), a random access memory (RAM), a ternary content-addressable memory (TCAM), and/or a static random-access memory (SRAM).
- FIG. 4 is a simplified block diagram of an apparatus 400 provided by an exemplary embodiment.
- the apparatus 400 may be used as either or both of the source device 12 and the destination device 14 in FIG. 1 .
- Processor 402 in apparatus 400 may be a central processing unit.
- processor 402 may be any other type of device or devices, existing or to be developed in the future, capable of manipulating or processing information. While the disclosed implementations can be implemented using a single processor, such as processor 402 as shown, it is faster and more efficient to use more than one processor.
- memory 404 in apparatus 400 may be a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may be used as memory 404 .
- Memory 404 may include code and data 406 accessed by processor 402 via bus 412 .
- Memory 404 may also include an operating system 408 and application programs 410, including at least one program that allows processor 402 to perform the methods described herein.
- application programs 410 may include applications 1 through N, and also include a video coding application that performs the methods described herein.
- Apparatus 400 may also include one or more output devices, such as display 418 .
- display 418 may be a touch-sensitive display that combines the display with touch-sensitive elements that may be used to sense touch input.
- Display 418 may be coupled to processor 402 via bus 412 .
- although the bus 412 in the apparatus 400 is described herein as a single bus, it may include multiple buses. Additionally, secondary storage may be directly coupled to the other components of the apparatus 400 or accessed over a network, and may include a single integrated unit such as a memory card or multiple units such as multiple memory cards. Accordingly, the apparatus 400 may have a wide variety of configurations.
- video encoder 20 includes a wavelet transform unit 202, a quantization unit 204, and an entropy encoding unit 206.
- video decoder 30 includes an entropy decoding unit 208, a probability estimation unit 212, a sampling unit 214, and an inverse wavelet transform unit 216; optionally, video decoder 30 also includes an inverse quantization unit 210. The video codec shown in FIG. 5 may also be called an end-to-end video codec or a video codec based on an end-to-end structure.
- the wavelet transform unit 202 performs wavelet transform N times on the data to be coded 201 to obtain 3N+1 subbands 203, wherein each subband contains one or more wavelet coefficients.
- the data to be encoded 201 may be an image in YUV444 format, and the three channels are processed separately without utilizing the correlation between channels.
- This embodiment is described based on a single-channel signal. It can be understood that the solution of this embodiment can be extended to a multi-channel joint processing method.
- Performing N times of wavelet transformation on the data to be coded 201 may be understood as performing N times of wavelet transformation on an image block or an image region, which is not limited here.
- the image area may be an image, sub-image, slice (slice), patch (patch), etc., which is not limited here.
- the quadtree-based division method in existing coding standards can be used to divide the image area, or the image or image area can be divided into image blocks of the same size (for example, divided into 8x8 image blocks on average).
- a wavelet transformation is performed on the data to be coded 201 to obtain four two-dimensional subbands LL1, HL1, LH1, and HH1 as shown in FIG. 6a, where each subband contains one or more wavelet coefficients.
- LL1 is called an approximate subband, which is a low-resolution approximation of the data to be coded 201 ;
- the wavelet transform unit 202 may use traditional wavelet transform or deep neural network-based wavelet transform or other similar transform methods to perform wavelet transform on the data to be coded 201, which is not specifically limited here.
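For reference, a traditional one-level wavelet transform can be sketched with the PyWavelets library (this does not show the learned, deep-network lifting variant that the patent also allows):

```python
import numpy as np
import pywt  # PyWavelets

image = np.random.rand(256, 256)  # stand-in for the data to be coded 201
# One level of 2-D DWT yields an approximation subband plus three detail
# subbands, corresponding to LL1 and HL1/LH1/HH1 in Fig. 6a.
LL1, (detail_h, detail_v, detail_d) = pywt.dwt2(image, "haar")
```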
- wavelet transform can be performed based on the flowchart shown in Fig. 6b.
- Figure 6b takes a one-dimensional signal as an example to describe the wavelet transform process: first, the input signal is decomposed by sampling, usually into odd and even samples, to obtain two sampled signals; then prediction and update steps are performed between the two sampled signals; finally, two decomposition results are obtained, called the approximate component and the detail component, respectively.
- the prediction and updating steps can be alternately performed multiple times to obtain the final decomposition result, and are not limited to the two times shown in FIG. 6b.
- Predictions and updates are implemented based on deep networks.
- a and b in Fig. 6b denote scaling parameters to balance the energy of different components after the prediction and lifting steps.
- the quantization unit 204 quantizes the wavelet coefficients in the sub-bands obtained after the wavelet transformation to obtain the quantized wavelet coefficients 205 .
- when quantizing the wavelet coefficients, each subband can be processed according to a preset order one, and the wavelet coefficients within the current subband can then be quantized according to a preset order two to obtain the quantized wavelet coefficients, where the preset order one can be an existing zigzag scanning order, for example LL1→HL1→LH1→HH1, and the preset order two can be an existing zigzag, horizontal, or vertical scanning order.
- uniform quantization may be used, and the quantization step size may be optimized during joint training, with each jointly trained model adopting one quantization step size.
- denoting a wavelet coefficient as c and the corresponding quantized wavelet coefficient 205 as ĉ, the quantization process can be expressed as ĉ = [c / QP], where QP represents the quantization step size and [·] represents rounding.
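A direct transcription of this formula, together with the decoder-side inverse used later:

```python
import numpy as np

def quantize(c, qp):
    # c_hat = [c / QP]: divide by the quantization step size and round.
    return np.rint(c / qp)

def dequantize(c_hat, qp):
    # Decoder-side inverse quantization: multiply by the same step size.
    return c_hat * qp
```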
- optionally, the wavelet coefficients can be preprocessed to obtain processed wavelet coefficients, which are then quantized; for example, feature extraction is performed on the obtained wavelet coefficients through a neural network, and the feature extraction results are then quantized. Processing the wavelet coefficients before quantization can enable the decoder to decode high-quality reconstructed images.
- the entropy coding unit 206 performs entropy coding on the quantized wavelet coefficients 205 to obtain a compressed code stream 217 .
- when performing entropy coding on the quantized wavelet coefficients 205, each subband may be processed according to the preset order one, and the quantized wavelet coefficients 205 within the subband are then entropy coded according to the preset order two to obtain the compressed code stream.
- entropy encoding each quantized wavelet coefficient 205 (for convenience of description, referred to as a coefficient in this embodiment) includes: performing probability estimation on each coefficient to obtain the probability distribution of the coefficient, and then entropy encoding the coefficient according to its probability distribution.
- the probability distribution of the coefficients can be determined as follows:
- the probability distribution model may be: a single Gaussian model (Gaussian single model, GSM), an asymmetric Gaussian model, a mixed Gaussian model (Gaussian mixture model, GMM) or a Laplace distribution model (Laplace distribution).
- the probability estimation network can be implemented based on a deep learning network, such as a recurrent neural network (recurrent neural network, RNN) and a pixel convolutional neural network (Pixel convolutional neural network, PixelCNN), etc., which are not limited here.
- the probability distribution model is a Gaussian model (single Gaussian model or asymmetric Gaussian model or mixed Gaussian model)
- the context information of the current coefficient is input into the probability estimation network for processing to obtain the parameters of the Gaussian model, including the mean value μ and the variance σ; the mean value μ and the variance σ are then input into the probability distribution model used, to obtain the probability distribution of the current coefficient.
- the probability distribution model is a Laplace distribution model
- the context information of the current coefficient is input into the probability estimation network for processing, and the parameters of the Laplace distribution model are obtained, including the position parameter ⁇ and the scale parameter b;
- the position parameter μ and the scale parameter b are substituted into the probability distribution model to obtain the probability distribution of the current coefficient.
- a typical PixelCNN-based probability estimation network is shown in Fig. 6d.
- H ⁇ W indicates that the current convolutional layer uses a convolution kernel of size H ⁇ W
- ResB indicates the residual module (refer to the right image in Figure 6c)
- */relu indicates that the ReLU activation function is used after the current layer.
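- as an illustration of how such a network stays causal (each coefficient's distribution depends only on already coded or decoded context), the following is a minimal sketch of a PixelCNN-style masked convolution; the layer widths and kernel size are illustrative assumptions and do not reproduce Fig. 6d:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Convolution whose kernel is masked so that each output position
    only sees positions above it and to its left (already decoded)."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kh, kw = self.kernel_size
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2:] = 0  # current position and everything to its right
        mask[kh // 2 + 1:, :] = 0    # all rows below the current one
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding)

# Context model predicting Gaussian parameters (mu, log-variance) per coefficient.
net = nn.Sequential(
    MaskedConv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv2d(32, 2, kernel_size=1),
)
coeff_map = torch.randn(1, 1, 8, 8)   # e.g. coefficients of one subband
mu, log_var = net(coeff_map).chunk(2, dim=1)
```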
- the above context information of the current coefficient includes: coded coefficients in a preset area, where the preset area may be within the sub-band where the current coefficient is located, or outside that sub-band, which is not limited here. Taking Fig. 6a as an example, when the current coefficient is a coefficient in the subband LL1, coded coefficients in a certain area in the subband LL1 may be used as the context information of the current coefficient; when the current coefficient is a coefficient in the sub-band HL1, coded coefficients in a certain area in the sub-band LL1 or in HL1 may be used as the context information of the current coefficient.
- the entropy decoding unit 208 performs entropy decoding on the compressed code stream 207 to obtain a plurality of quantized wavelet coefficients 209 .
- when processing the compressed code stream 207, each subband can be processed according to the preset order 1, and entropy decoding can then be performed, according to the preset order 2, on the code stream corresponding to the wavelet coefficients in the current subband to obtain the quantized wavelet coefficients 209.
- the preset order 1 and the preset order 2 may be the same as those at the encoding end, which are not limited here.
- the inverse quantization unit 210 performs inverse quantization on multiple quantized wavelet coefficients 209 to obtain multiple reconstructed wavelet coefficients 211 .
- each subband may be processed according to the preset order 1, and then the quantized wavelet coefficient 209 in the current subband may be dequantized according to the preset order 2 to obtain the reconstructed wavelet coefficient 211; specifically, the quantized wavelet coefficient 209 is multiplied by the corresponding quantization step size to obtain the reconstructed wavelet coefficient 211.
- the quantization step size may be QP.
- the preset order 1 and the preset order 2 may be the same as those at the encoding end, which are not limited here.
- the inverse quantization unit 210 is optional, so it is represented by a dotted line in FIG. 5 .
- the input data may be multiple quantized wavelet coefficients or multiple reconstructed wavelet coefficients.
- the data input into the probability estimation unit 212 is referred to as multiple coefficients.
- the function of the probability estimation unit 212 will be described taking the first coefficient and the second coefficient among the plurality of coefficients as an example.
- the probability estimation unit 212 performs probability estimation according to the context information of the first coefficient to obtain a first probability distribution 213, and performs probability estimation according to the context information of the second coefficient and/or estimated coefficients obtained by sampling to obtain a second probability distribution 213, wherein the estimated coefficients obtained by sampling include a first estimated coefficient, and the first estimated coefficient is obtained before the second estimated coefficient.
- the probability estimation unit 212 performs probability estimation according to the context information of the first coefficient to obtain the first probability distribution 213, including:
- obtaining the probability distribution model of the first coefficient; processing the context information of the first coefficient through the first probability estimation network to obtain the parameters of the probability distribution model; and obtaining the first probability distribution according to the probability distribution model and its parameters;
- the first probability estimation network and the second probability estimation network are implemented based on a neural network
- the context information of the first coefficient includes some or all of the multiple coefficients, and/or, some or all of the estimated coefficients obtained by sampling.
- when the data input to the probability estimation unit 212 is quantized wavelet coefficients, the context information of the first coefficient may include the quantized wavelet coefficients in a first area and the estimated coefficients in a second area, wherein the first area is any area within the subband where the first coefficient is located in the quantized wavelet coefficient map, and the second area is any area within the subband where the first coefficient is located in the estimated coefficient map; when the data input to the probability estimation unit 212 is reconstructed wavelet coefficients, the context information of the first coefficient may include the reconstructed wavelet coefficients in the first area and the estimated coefficients in the second area, wherein the first area is any area within the subband where the first coefficient is located in the reconstructed wavelet coefficient map, and the second area is any area within the subband where the first coefficient is located in the estimated coefficient map.
- the quantized wavelet coefficient map is an image composed of the multiple quantized wavelet coefficients
- the reconstructed wavelet coefficient map is an image composed of the multiple reconstructed wavelet coefficients.
- the estimated coefficient map is an image composed of a plurality of estimated coefficients that have been sampled.
- when the first coefficient is in the sub-band LL1, the second area can be any area in LL1; when the first coefficient is in HL1, the second area can be any area within LL1 or within HL1.
- the second probability distribution can be determined as above, or the second probability distribution can be determined as follows:
- the probability estimation unit 212 performs probability estimation according to the context information of the second coefficient and/or the estimated coefficients obtained by sampling to obtain the second probability distribution 213, where the estimated coefficients obtained by sampling include the first estimated coefficient; that is to say, when the probability estimation obtains the second probability distribution, the data input into the following third probability estimation network or fourth probability estimation network includes the first estimated coefficient.
- the probability distribution model of the second coefficient is obtained; the context information of the second coefficient and/or the estimated coefficients obtained by sampling are processed through the third probability estimation network to obtain the parameters of the probability distribution model; and the second probability distribution is obtained according to the probability distribution model and its parameters;
- the third probability estimation network and the fourth probability estimation network are implemented based on a neural network
- the context information of the second coefficient includes some or all of the multiple coefficients, and/or, some or all of the estimated coefficients obtained by sampling.
- both the first probability distribution and the second probability distribution are output by the probability estimation unit 212 , so they are marked with the same identifier 213 .
- sampling is performed according to the first probability distribution 213 to obtain a first estimated coefficient 215, and sampling is performed according to the second probability distribution 213 to obtain a second estimated coefficient 215. Since the two sampling processes are consistent, the following uses the first probability distribution 213 as a Gaussian distribution to illustrate how the first estimated coefficient 215 is obtained by sampling according to the first probability distribution 213.
- erf() is the Gaussian error function, through which the cumulative distribution function of the standard normal distribution is expressed; it is defined as erf(x) = (2/√π)·∫₀ˣ e^(−t²) dt.
- the variance of the first probability distribution 213 is processed; the specific processing includes: setting the variance of the first probability distribution 213 to 0 as the processed variance, and then sampling in the above sampling manner according to the processed variance and the mean value of the first probability distribution 213 to obtain the first estimated coefficient 215.
- alternatively, the variance of the first probability distribution 213 is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above sampling manner according to the processed variance and the mean value of the first probability distribution 213 to obtain the first estimated coefficient 215.
- alternatively, the mean value of the first probability distribution 213 is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above sampling manner according to the processed mean value and the variance of the first probability distribution 213 to obtain the first estimated coefficient 215.
- sampling is performed according to the first probability distribution 213 to obtain the first estimated coefficient 215, including:
- the scale parameter of the first probability distribution 213 is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above sampling manner according to the processed scale parameter and the position parameter of the first probability distribution 213 to obtain the first estimated coefficient 215.
- the position parameter of the first probability distribution 213 is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above sampling manner according to the processed position parameter and the scale parameter of the first probability distribution 213 to obtain the first estimated coefficient 215.
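- a minimal sketch of the sampling step for the Gaussian case described above, assuming the inverse-CDF construction implied by the erf() definition (Python's statistics.NormalDist supplies the inverse standard-normal CDF), with an optional scaling factor applied to the variance:

```python
import random
from statistics import NormalDist

def sample_coefficient(mu, sigma, scale=1.0):
    """Map a uniform random number through the inverse standard-normal CDF
    to get z1 ~ N(0, 1), then return z2 = z1 * (scale * sigma) + mu.

    scale = 1 samples the estimated distribution unchanged; scale = 0
    collapses the sample to the mean (the best-objective-quality setting)."""
    u = min(max(random.random(), 1e-9), 1.0 - 1e-9)  # keep u inside (0, 1)
    z1 = NormalDist().inv_cdf(u)
    return z1 * (scale * sigma) + mu

print(sample_coefficient(0.5, 2.0))             # a random sample
print(sample_coefficient(0.5, 2.0, scale=0.0))  # exactly 0.5 (the mean)
```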
- the second estimation coefficient 215 can be obtained according to the second probability distribution 213 in the manner described above.
- the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; or the scaling factor of the first coefficient is different from that of the second coefficient; or, if the first coefficient and the second coefficient belong to the same subband, their scaling factors are the same, and if they belong to different subbands, their scaling factors are different; that is to say, the scaling factors of coefficients belonging to the same subband are the same, and the scaling factors of coefficients belonging to different subbands are different.
- reconstructed images with different properties can be obtained according to user requirements. For example, if the variance of the first probability distribution is set to 0 as the processed variance, the reconstructed image with the best signal quality (best objective quality) can be obtained, that is, the PSNR of the image is increased or the MSE is reduced; if the scaling factors of the multiple coefficients are set to be the same, the image with the best subjective quality can be obtained, that is, the PSNR of the image is reduced or the MSE of the image is increased; if the scaling factors of the coefficients are set to be different, images whose properties lie between the best subjective quality and the best objective quality can be obtained.
- both the first estimated coefficient 215 and the second estimated coefficient 215 are output by the probability estimation unit 212, so they are marked with the same identifier "215".
- the context information of the first coefficient also includes an estimated coefficient obtained before the first estimated coefficient.
- the wavelet inverse transform unit 216 performs wavelet inverse transform on a plurality of estimated coefficients (including the first estimated coefficient and the second estimated coefficient) to obtain the reconstructed image 217 .
- the wavelet inverse transform method at the decoding end may use traditional wavelet inverse transform or deep network-based wavelet inverse transform or other similar transform methods, which are not limited here.
- the flowchart of wavelet inverse transform based on deep network is shown in Fig. 6e.
- Figure 6e takes a one-dimensional signal as an example to describe the inverse wavelet transformation process: contrary to the forward transformation shown in Figure 6b, the approximate component and the detail component are first multiplied by the parameters 1/a and 1/b, the update and prediction steps are then applied in reverse to obtain two signals, corresponding respectively to the odd-numbered and even-numbered sampling components of the original input signal, and finally the two signals are merged to obtain the reconstructed signal.
- the above-mentioned encoder 20 and decoder 30 need to be cascaded for joint training.
- the purpose of the training is to optimize the parameters of the relevant deep network modules used in the encoding and decoding process, including wavelet forward and inverse transforms based on deep networks, entropy coding based on deep networks, and probability estimation networks based on deep neural networks.
- Figure 7 shows the block diagram of joint training.
- the loss function used is: L = −log q(c) + λ·MSE(x, x̂).
- the loss function includes: the code rate given by the deep-network-based entropy coding, namely the negative log-likelihood −log q(c) of the wavelet coefficient c, obtained by the forward wavelet transformation, under the probability distribution q; the mean square error MSE(x, x̂) between the reconstructed sample image x̂, obtained by inverse transformation using the mean value of q, and the input sample image x; and the weight λ, which adjusts the importance between the code rate and the reconstruction loss, with different λ values generating different models for compressing images at different compression ratios.
- for the second term of the above loss function, other loss functions can also be used, such as the multi-scale structural similarity (multi-scale structural similarity, MS-SSIM) between the reconstructed sample image and the sample image, deep feature loss, etc.
- the above-mentioned training process is realized by the training engine 50, and the training process includes initialization training and joint training, wherein the initialization training process includes:
- the training engine 50 trains the initialization codec model based on the sample images until the loss value obtained from the above loss function converges; at this point, the parameters of the probability estimation network and the parameters of the deep network used for entropy coding are kept unchanged, and the deep-network-based wavelet forward and inverse transformations replace the CDF9/7 wavelet forward and inverse transformations to obtain the joint model; the training engine 50 then trains the joint model based on the sample images until the loss value obtained from the above loss function converges, at which point the training is complete.
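- a sketch of the rate-distortion objective described above, assuming a Gaussian probability model q for the coefficients and MSE as the second term (the function and variable names are illustrative):

```python
import torch

def rd_loss(coeffs, mu, sigma, x, x_hat, lam):
    """Rate term: -log q(c) under the estimated Gaussian (the code rate);
    distortion term: MSE between the input x and the reconstruction x_hat.
    lam (the lambda of the text) trades rate against reconstruction loss;
    different lam values yield models for different compression ratios."""
    q = torch.distributions.Normal(mu, sigma)
    rate = -q.log_prob(coeffs).mean()           # nats per coefficient
    distortion = torch.mean((x - x_hat) ** 2)   # MSE
    return rate + lam * distortion

loss = rd_loss(torch.randn(8), torch.zeros(8), torch.ones(8),
               torch.rand(16), torch.rand(16), lam=0.01)
print(loss.item())
```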
- the above-mentioned deep networks for wavelet forward transformation, wavelet inverse transformation, entropy coding, and the probability estimation network for probability estimation may be obtained from a third-party device after the third-party device has trained them based on the above training method.
- Fig. 8a is a schematic block diagram of an example of a video decoder for implementing the technology of the present application.
- the video decoder 30 includes an entropy decoding unit 802, a probability estimation unit 806, a sampling unit 808, and an inverse transformation unit 810.
- the video decoder 30 may also include an inverse quantization unit 804, as shown in FIG. 8a.
- a video decoder can also be called an end-to-end video decoder or a video decoder based on an end-to-end video decoder.
- the data to be encoded includes image blocks; specifically, the original image or an image area is divided into image blocks of a preset size, and the preset size can be 4x4, 8x8, 16x16, 32x32, 64x64, 128x128, 256x256, etc.
- the original image is divided to obtain one or more image blocks, and the size of the image blocks is not limited.
- the original image can be divided using the quadtree, binary tree or ternary tree division method in existing encoding standards (H266, H265, H264, AVS2 or AVS3) to obtain one or more image blocks.
- DCT and quantization are performed on the data to be coded to obtain a plurality of quantized DCT coefficients.
- after the data to be encoded (that is, the image block) undergoes DCT, its low-frequency components are concentrated in the upper left corner, and its high-frequency components are distributed toward the lower right corner.
- the coefficient value in the first row and first column represents the direct current (DC) coefficient, that is, the average value of the image block; the other coefficients are alternating current (AC) coefficients.
- the AC coefficients and the DC coefficients are quantized to obtain quantized AC and DC coefficients, that is, a plurality of quantized DCT coefficients.
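- a sketch of this step for a single block, assuming 8x8 blocks, a single step size, and SciPy's orthonormal 2-D DCT:

```python
import numpy as np
from scipy.fft import dctn

def dct_quantize_block(block, qp):
    """2-D DCT of one image block followed by uniform quantization.
    coeffs[0, 0] is the DC coefficient (proportional to the block mean);
    the remaining AC coefficients trend toward higher frequencies in the
    lower-right corner."""
    coeffs = dctn(block.astype(np.float64), norm="ortho")
    return np.rint(coeffs / qp).astype(np.int32)

block = np.random.randint(0, 256, size=(8, 8))
quantized = dct_quantize_block(block, qp=16.0)
print(quantized[0, 0])  # the quantized DC coefficient
```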
- One of the following methods can be used to perform entropy coding on multiple quantized DCT coefficients, which is not limited here:
- Method 1: existing methods may be used to perform entropy coding on the multiple quantized DCT coefficients, such as Huffman coding in JPEG and CABAC coding in HEVC.
- Method 2: first, perform probability modeling on each quantized DCT coefficient to obtain a probability distribution model; then input the context information of the quantized coefficients into the probability estimation network to estimate the parameters of the probability distribution model, substitute these parameters into the probability distribution model to obtain the probability distribution of the quantized DCT coefficient, and perform entropy coding on the quantized DCT coefficient according to that probability distribution; entropy coding is performed on the multiple quantized DCT coefficients in this way to obtain a compressed code stream.
- the context information of the quantized DCT coefficients includes: part or all of the encoded quantized DCT coefficients.
- the above probability distribution model may be: a single Gaussian model, an asymmetric Gaussian model, a mixed Gaussian model, or a Laplace distribution model, etc., which are not limited here.
- the above-mentioned probability estimation network can use a network based on deep learning, such as RNN and PixelCNN, etc., which is not limited here.
- the entropy decoding unit 802 performs entropy decoding on the compressed code stream to obtain a plurality of quantized DCT coefficients.
- the compressed code stream includes a code stream of multiple DCT coefficients.
- entropy decoding is performed on the code stream corresponding to the DCT coefficients to obtain the quantized DCT coefficients.
- the method of performing probability estimation on each DCT coefficient to obtain the probability distribution of the coefficient is the same as that at the encoding end, and will not be repeated here.
- the Huffman decoding method in JPEG or the CABAC decoding method in HEVC can be adopted to decode the compressed code stream to obtain multiple quantized DCT coefficients.
- the inverse quantization unit 804 performs inverse quantization on multiple quantized DCT coefficients to obtain multiple reconstructed DCT coefficients.
- each quantized DCT coefficient is multiplied by the corresponding quantization step size to obtain a reconstructed DCT coefficient.
- the quantization step size may be QP.
- the inverse quantization unit 804 is optional, so it is represented by a dotted line in FIG. 8a.
- the input data may be multiple quantized DCT coefficients or multiple reconstructed DCT coefficients.
- the data input into the probability estimation unit 806 is referred to as multiple coefficients.
- the function of the probability estimation unit 806 will be described by taking the first coefficient and the second coefficient among the plurality of coefficients as an example.
- the probability estimation unit 806 performs probability estimation according to the context information of the first coefficient to obtain a first probability distribution, and performs probability estimation according to the context information of the second coefficient and/or estimated coefficients obtained by sampling to obtain a second probability distribution, wherein the estimated coefficients obtained by sampling include a first estimated coefficient, and the first estimated coefficient is obtained before the second estimated coefficient.
- the probability estimation unit 806 performs probability estimation according to the context information of the first coefficient to obtain the first probability distribution, including: obtaining the probability distribution model of the first coefficient; processing the context information of the first coefficient through the fifth probability estimation network to obtain the parameters of the probability distribution model; and obtaining the first probability distribution according to the probability distribution model and its parameters.
- the fifth probability estimation network and the sixth probability estimation network are implemented based on a neural network.
- the context information of the first coefficient includes some or all of the multiple coefficients, and/or, some or all of the estimated coefficients obtained by sampling.
- when the data input to the probability estimation unit 806 is quantized DCT coefficients, the context information of the first coefficient may include the quantized DCT coefficients in a third area and the estimated coefficients in a fourth area, where the third area is any area in the quantized DCT coefficient map; when the data input to the probability estimation unit 806 is reconstructed DCT coefficients, the context information of the first coefficient may include the reconstructed DCT coefficients in the third area and the estimated coefficients in the fourth area, where the third area is any area in the reconstructed DCT coefficient map and the fourth area is any area in the estimated coefficient map.
- the quantized DCT coefficient map is an image composed of the plurality of quantized DCT coefficients
- the reconstructed DCT coefficient map is an image composed of the plurality of reconstructed DCT coefficients.
- the estimated coefficient map is an image composed of a plurality of sampled estimated coefficients.
- the second probability distribution can be determined as above, or the second probability distribution can be determined as follows:
- the probability estimation unit 806 performs probability estimation according to the context information of the second coefficient and/or the estimated coefficients obtained by sampling to obtain the second probability distribution, where the estimated coefficients obtained by sampling include the first estimated coefficient; that is to say, when the second probability distribution is estimated, the data input into the seventh probability estimation network or the eighth probability estimation network below includes the first estimated coefficient.
- the probability distribution model of the second coefficient is obtained; the context information of the second coefficient and/or the estimated coefficients obtained by sampling are processed through the seventh probability estimation network to obtain the parameters of the probability distribution model; and the second probability distribution is obtained according to the probability distribution model and its parameters;
- the seventh probability estimation network and the eighth probability estimation network are implemented based on a neural network, and the context information of the second coefficient includes some or all of the multiple coefficients, and/or, some or all of the estimated coefficients obtained by sampling.
- the sampling unit 808 performs sampling according to the first probability distribution to obtain the first estimated coefficient; samples according to the second probability distribution to obtain the second estimated coefficient. Since the sampling processes of the two are consistent, the following uses the first probability distribution as a Gaussian distribution to illustrate how to obtain the first estimated coefficient by sampling according to the first probability distribution.
- erf() is the Gaussian error function, through which the cumulative distribution function of the standard normal distribution is expressed; it is defined as erf(x) = (2/√π)·∫₀ˣ e^(−t²) dt.
- z2 ← z1·σ + μ, where z1 is a standard Gaussian sample, so that z2 obeys the Gaussian distribution with mean value μ and variance σ; z2 is the above-mentioned first estimated coefficient, where σ and μ are respectively the variance and mean of the above-mentioned first probability distribution.
- the variance of the first probability distribution is processed; the specific processing includes: setting the variance of the first probability distribution to 0 as the processed variance, and then sampling in the above sampling manner according to the processed variance and the mean value of the first probability distribution to obtain the first estimated coefficient.
- alternatively, the variance of the first probability distribution is processed according to the scaling factor of the first coefficient, and the first estimated coefficient is then obtained by sampling in the above sampling manner according to the processed variance and the mean value of the first probability distribution.
- alternatively, the mean value of the first probability distribution is processed according to the scaling factor of the first coefficient, and the first estimated coefficient is then obtained by sampling in the above sampling manner according to the processed mean value and the variance of the first probability distribution.
- sampling is performed according to the first probability distribution to obtain the first estimated coefficient, including:
- the scale parameter of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above sampling manner according to the processed scale parameter and the position parameter of the first probability distribution to obtain the first estimated coefficient.
- the position parameter of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above sampling manner according to the processed position parameter and the scale parameter of the first probability distribution to obtain the first estimated coefficient.
- the second estimation coefficient can be obtained according to the second probability distribution in the above manner.
- the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; or the scaling factor of the first coefficient is different from that of the second coefficient; or, if the first coefficient and the second coefficient belong to the same frequency band, their scaling factors are the same, and if they belong to different frequency bands, their scaling factors are different; that is, the scaling factors of coefficients belonging to the same frequency band are the same, and the scaling factors of coefficients belonging to different frequency bands are different.
- the value range of the scaling factor is [0,1].
- a frequency band can be understood in two ways: as a coefficient block (a coefficient block obtained by performing the DCT on an image block, since the DCT is block-based), or as the set of coefficients at the same position in each coefficient block, which together form one frequency band.
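- a sketch of the second reading: coefficients at the same in-block position across all 8x8 DCT blocks form one frequency band, so a per-band scaling factor table can be looked up by position (the factor values below are illustrative assumptions):

```python
import numpy as np

# One scaling factor in [0, 1] per frequency band, i.e. per position
# inside the 8x8 DCT block; the values here are illustrative only.
band_factors = np.linspace(1.0, 0.2, 64).reshape(8, 8)

def scaling_factor(row, col, block_size=8):
    """All coefficients at the same in-block position (the same frequency
    band) share one factor; different bands may use different factors."""
    return band_factors[row % block_size, col % block_size]

print(scaling_factor(0, 0))   # DC band
print(scaling_factor(9, 17))  # same band as position (1, 1)
```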
- reconstructed images with different properties can be obtained according to user requirements. For example, if the variance of the first probability distribution is set to 0 as the processed variance, the reconstructed image with the best signal quality (best objective quality) can be obtained, that is, the PSNR of the image is increased or the MSE is reduced; if the scaling factors of the multiple coefficients are set to be the same, the image with the best subjective quality can be obtained, that is, the PSNR of the image is reduced or the MSE of the image is increased; if the scaling factors of the coefficients are set to be different, images whose properties lie between the best subjective quality and the best objective quality can be obtained.
- the context information of the first coefficient also includes an estimated coefficient obtained before the first estimated coefficient.
- the inverse transform unit 810 performs inverse DCT on a plurality of estimated coefficients (including the first estimated coefficient and the second estimated coefficient) to obtain a reconstructed image.
- alternatively, the multiple estimated coefficients are successively processed by the inverse quantization unit 804 and the inverse transform unit 810, that is, dequantized and then inverse-DCT-transformed, to obtain a reconstructed image, as shown in Figure 8b.
- FIG. 9 is a schematic block diagram of an example of a video decoder for implementing the techniques of the present application.
- the video decoder 30 includes an entropy decoding unit 902, a probability estimation unit 904, a sampling unit 906, and a reconstruction unit 908; the video decoder shown in FIG. 9 may also be referred to as an end-to-end video decoder or Video decoder based on end-to-end video decoder.
- the entropy decoding unit 902 performs entropy decoding on the compressed code stream to obtain a plurality of reconstruction feature coefficients.
- the entropy decoding unit 902 entropy-decodes the side information from the compressed code stream; then, based on the side information, probability estimation is performed on each reconstructed feature coefficient to obtain the probability distribution of each reconstructed feature coefficient.
- the entropy decoding unit 902 entropy-decodes a plurality of reconstruction feature coefficients from the compressed code stream according to the probability distribution of the reconstruction feature coefficients.
- the multiple reconstruction feature coefficients can constitute a reconstruction feature map, and the size of the reconstruction feature map can be expressed as CxWxH, where C generally refers to the number of channels (channel), and W and H are the width and height of each channel.
- the side information is also a kind of feature information, that is, a three-dimensional feature map, which contains fewer feature coefficients than the feature map y obtained by feature extraction of the data to be encoded.
- the input data may be multiple quantized feature coefficients or multiple reconstruction feature coefficients.
- the data input into the probability estimation unit 904 is referred to as multiple coefficients.
- the function of the probability estimating unit 904 will be described by taking the first coefficient and the second coefficient among the plurality of coefficients as an example.
- the probability estimation unit 904 performs probability estimation according to the context information of the first coefficient to obtain a first probability distribution, and performs probability estimation according to the context information of the second coefficient and/or estimated coefficients obtained by sampling to obtain a second probability distribution, wherein the estimated coefficients obtained by sampling include a first estimated coefficient, and the first estimated coefficient is obtained before the second estimated coefficient.
- the probability estimation unit 904 performs probability estimation according to the context information of the first coefficient to obtain the first probability distribution, including: obtaining the probability distribution model of the first coefficient; processing the context information of the first coefficient through the ninth probability estimation network to obtain the parameters of the probability distribution model; and obtaining the first probability distribution according to the probability distribution model and its parameters.
- the ninth probability estimation network and the tenth probability estimation network are implemented based on a neural network.
- the context information of the first coefficient includes some or all of the multiple coefficients, and/or, some or all of the estimated coefficients obtained by sampling.
- when the data input to the probability estimation unit 904 is quantized feature coefficients, the context information of the first coefficient may include the quantized feature coefficients in a fifth area and the estimated coefficients in a sixth area, wherein the fifth area is any area in the quantized feature coefficient map and the sixth area is any area in the estimated coefficient map.
- the above-mentioned quantization feature coefficient map is an image composed of the above-mentioned multiple quantization feature coefficients.
- the above estimated coefficient map is an image composed of a plurality of estimated coefficients that have been sampled.
- the second probability distribution can be determined as above, or the second probability distribution can be determined as follows:
- the probability estimation unit 904 performs probability estimation according to the context information of the second coefficient and/or the estimated coefficients obtained by sampling to obtain the second probability distribution, where the estimated coefficients obtained by sampling include the first estimated coefficient; that is to say, when the second probability distribution is estimated, the data input into the following eleventh probability estimation network or twelfth probability estimation network includes the first estimated coefficient.
- the probability distribution model of the second coefficient is obtained; the context information of the second coefficient and/or the estimated coefficients obtained by sampling are processed through the eleventh probability estimation network to obtain the parameters of the probability distribution model; and the second probability distribution is obtained according to the probability distribution model and its parameters;
- the eleventh probability estimation network and the twelfth probability estimation network are implemented based on a neural network
- the context information of the second coefficient includes some or all of the multiple coefficients, and/or, some or all of the estimated coefficients obtained by sampling.
- the sampling unit 906 performs sampling according to the first probability distribution to obtain the first estimated coefficient; samples according to the second probability distribution to obtain the second estimated coefficient. Since the sampling processes of the two are consistent, the following uses the first probability distribution as a Gaussian distribution to illustrate how to obtain the first estimated coefficient by sampling according to the first probability distribution.
- erf() is the Gaussian error function, through which the cumulative distribution function of the standard normal distribution is expressed; it is defined as erf(x) = (2/√π)·∫₀ˣ e^(−t²) dt.
- z2 ← z1·σ + μ, where z1 is a standard Gaussian sample, so that z2 obeys the Gaussian distribution with mean value μ and variance σ; z2 is the above-mentioned first estimated coefficient, where σ and μ are respectively the variance and mean of the above-mentioned first probability distribution.
- the variance of the first probability distribution is processed; the specific processing includes: setting the variance of the first probability distribution to 0 as the processed variance, and then sampling in the above sampling manner according to the processed variance and the mean value of the first probability distribution to obtain the first estimated coefficient.
- alternatively, the variance of the first probability distribution is processed according to the scaling factor of the first coefficient, and the first estimated coefficient is then obtained by sampling in the above sampling manner according to the processed variance and the mean value of the first probability distribution.
- alternatively, the mean value of the first probability distribution is processed according to the scaling factor of the first coefficient, and the first estimated coefficient is then obtained by sampling in the above sampling manner according to the processed mean value and the variance of the first probability distribution.
- sampling is performed according to the first probability distribution to obtain the first estimated coefficient, including:
- the scale parameter of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above sampling manner according to the processed scale parameter and the position parameter of the first probability distribution to obtain the first estimated coefficient.
- the position parameter of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above sampling manner according to the processed position parameter and the scale parameter of the first probability distribution to obtain the first estimated coefficient.
- the second estimation coefficient can be obtained according to the second probability distribution in the above manner.
- the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; or the scaling factor of the first coefficient is different from that of the second coefficient; or, if the first coefficient and the second coefficient belong to the same channel, their scaling factors are the same, and if they belong to different channels, their scaling factors are different; that is, coefficients belonging to the same channel have the same scaling factor, and coefficients belonging to different channels have different scaling factors.
- the value range of the scaling factor is [0,1].
- reconstructed images with different properties can be obtained according to user requirements. For example, if the variance of the first probability distribution is set to 0 as the processed variance, the reconstructed image with the best signal quality (best objective quality) can be obtained, that is, the PSNR of the image is increased or the MSE is reduced; if the scaling factors of the multiple coefficients are set to be the same, the image with the best subjective quality can be obtained, that is, the PSNR of the image is reduced or the MSE of the image is increased; if the scaling factors of the coefficients are set to be different, images whose properties lie between the best subjective quality and the best objective quality can be obtained.
- the context information of the first coefficient also includes an estimated coefficient obtained before the first estimated coefficient.
- a plurality of estimated coefficients can be obtained, and the estimated coefficients constitute a reconstructed feature map.
- the reconstructed feature map can be input into the machine vision task module, which performs the corresponding machine tasks, for example, machine vision tasks such as object classification, recognition, and segmentation; it can also be input into the reconstruction unit 908.
- the reconstruction unit 908 processes the reconstructed feature map to obtain a reconstructed image, that is, transforms the reconstructed image from the feature domain to the pixel domain.
- the reconstruction unit 908 can be implemented based on a neural network of any structure, such as a fully connected network, a convolutional neural network, a recurrent neural network, and the like.
- the neural network can adopt a multi-layer deep neural network structure to achieve a better estimation effect.
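- a minimal sketch of such a reconstruction unit as a small convolutional synthesis network mapping a CxWxH feature map back to the pixel domain; the channel counts and upsampling factor are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Illustrative feature-to-pixel synthesis network; the reconstruction unit
# could equally be a fully connected or recurrent network.
reconstruct = nn.Sequential(
    nn.ConvTranspose2d(192, 96, kernel_size=4, stride=2, padding=1),
    nn.ReLU(),
    nn.ConvTranspose2d(96, 3, kernel_size=4, stride=2, padding=1),
    nn.Sigmoid(),  # pixel values in [0, 1]
)

feature_map = torch.randn(1, 192, 16, 16)  # a reconstructed feature map
image = reconstruct(feature_map)
print(image.shape)  # torch.Size([1, 3, 64, 64])
```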
- Fig. 10a is a schematic block diagram of an example of a video decoder for implementing the technology of the present application.
- in one example, the video decoder 30 includes a decoding unit 1002, a probability estimation unit 1004 and a sampling unit 1006; in another example, the video decoder 30 includes a decoding unit 1002, a probability estimation unit 1004, a sampling unit 1006, a transformation unit 1008 and an inverse transformation unit 1010, as shown in FIG. 10b; the video decoders shown in FIG. 10a and FIG. 10b can also be called end-to-end video decoders or video decoders based on an end-to-end video decoder.
- the decoding unit 1002 decodes the compressed code stream, such as JPEG decoding, to obtain an initial reconstructed image, and the initial reconstructed image includes a plurality of initial pixel reconstruction values.
- the transformation unit 1008 transforms the initial reconstructed image, that is, transforms multiple initial pixel reconstruction values to obtain multiple transformed pixel values.
- the transformation method adopted by the transformation unit 1008 includes, but is not limited to, wavelet transformation, DCT, or feature extraction.
- the input data can be a plurality of initial pixel reconstruction values, or a plurality of transformed pixel values.
- the data input into the probability estimation unit 1004 is called a plurality of coefficients.
- the function of the probability estimation unit 1004 will be described by taking the first coefficient and the second coefficient among the plurality of coefficients as an example.
- the probability estimation unit 1004 performs probability estimation according to the context information of the first coefficient to obtain a first probability distribution, and performs probability estimation according to the context information of the second coefficient and/or estimated coefficients obtained by sampling to obtain a second probability distribution, wherein the estimated coefficients obtained by sampling include a first estimated coefficient, and the first estimated coefficient is obtained before the second estimated coefficient.
- the probability estimation unit 1004 performs probability estimation according to the context information of the first coefficient to obtain the first probability distribution, including:
- obtaining the probability distribution model of the first coefficient; processing the context information of the first coefficient through the thirteenth probability estimation network to obtain the parameters of the probability distribution model; and obtaining the first probability distribution according to the probability distribution model and its parameters;
- the thirteenth probability estimation network and the fourteenth probability estimation network are implemented based on a neural network; the context information of the first coefficient includes some or all of the plurality of coefficients, and/or, some or all of the estimated coefficients obtained by sampling.
- when the data input to the probability estimation unit 1004 is initial pixel reconstruction values, the context information of the first coefficient may include the initial pixel reconstruction values in a seventh area and the estimated coefficients in an eighth area, wherein the seventh area is any area in the initial reconstructed image; when the data input to the probability estimation unit 1004 is transformed pixel values, the context information of the first coefficient may include the transformed pixel values in the seventh area and the estimated coefficients in the eighth area, wherein the seventh area is any area in the transformed image obtained by transforming the initial reconstructed image, and the eighth area is any area in the estimated coefficient map.
- the estimated coefficient map is an image composed of a plurality of estimated coefficients that have been sampled.
- the second probability distribution can be determined as above, or the second probability distribution can be determined as follows:
- the probability estimation unit 1004 performs probability estimation according to the context information of the second coefficient and/or the estimated coefficients obtained by sampling to obtain the second probability distribution, where the estimated coefficients obtained by sampling include the first estimated coefficient; that is to say, when the second probability distribution is estimated, the data input into the following fifteenth probability estimation network or sixteenth probability estimation network includes the first estimated coefficient.
- the probability distribution model of the second coefficient is obtained; the context information of the second coefficient and/or the estimated coefficients obtained by sampling are processed through the fifteenth probability estimation network to obtain the parameters of the probability distribution model; and the second probability distribution is obtained according to the probability distribution model and its parameters;
- the fifteenth probability estimation network and the sixteenth probability estimation network are implemented based on a neural network
- the context information of the second coefficient includes some or all of the plurality of coefficients, and/or, some or all of the estimated coefficients obtained by sampling.
- the sampling unit 1006 performs sampling according to the first probability distribution to obtain the first estimated coefficient; samples according to the second probability distribution to obtain the second estimated coefficient. Since the sampling processes of the two are consistent, the following uses the first probability distribution as a Gaussian distribution to illustrate how to obtain the first estimated coefficient by sampling according to the first probability distribution.
- erf() is the Gaussian error function, through which the cumulative distribution function of the standard normal distribution is expressed; it is defined as erf(x) = (2/√π)·∫₀ˣ e^(−t²) dt.
- z2 ← z1·σ + μ, where z1 is a standard Gaussian sample, so that z2 obeys the Gaussian distribution with mean value μ and variance σ; z2 is the above-mentioned first estimated coefficient, where σ and μ are respectively the variance and mean of the above-mentioned first probability distribution.
- the variance of the first probability distribution is processed; the specific processing includes: setting the variance of the first probability distribution to 0 as the processed variance, and then sampling in the above sampling manner according to the processed variance and the mean value of the first probability distribution to obtain the first estimated coefficient.
- alternatively, the variance of the first probability distribution is processed according to the scaling factor of the first coefficient, and the first estimated coefficient is then obtained by sampling in the above sampling manner according to the processed variance and the mean value of the first probability distribution.
- alternatively, the mean value of the first probability distribution is processed according to the scaling factor of the first coefficient, and the first estimated coefficient is then obtained by sampling in the above sampling manner according to the processed mean value and the variance of the first probability distribution.
- sampling is performed according to the first probability distribution to obtain the first estimated coefficient, including:
- the scale parameter of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above sampling manner according to the processed scale parameter and the position parameter of the first probability distribution to obtain the first estimated coefficient.
- the position parameter of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above sampling manner according to the processed position parameter and the scale parameter of the first probability distribution to obtain the first estimated coefficient.
- the second estimation coefficient can be obtained according to the second probability distribution in the above manner.
- the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; or the scaling factor of the first coefficient is different from that of the second coefficient; or, when the above transformation is the DCT, if the first coefficient and the second coefficient belong to the same frequency band, their scaling factors are the same, and if they belong to different frequency bands, their scaling factors are different, that is, the scaling factors of coefficients belonging to the same frequency band are the same, and the scaling factors of coefficients belonging to different frequency bands are different;
- when the above transformation is a wavelet transformation, if the first coefficient and the second coefficient belong to the same subband, their scaling factors are the same, and if they belong to different subbands, their scaling factors are different, that is, the scaling factors of coefficients belonging to the same subband are the same, and the scaling factors of coefficients belonging to different subbands are different;
- when the above transformation is feature extraction, if the first coefficient and the second coefficient belong to the same channel, their scaling factors are the same, and if they belong to different channels, their scaling factors are different, that is, the scaling factors of coefficients belonging to the same channel are the same, and the scaling factors of coefficients belonging to different channels are different.
- the value range of the scaling factor is [0,1].
- reconstructed images with different properties can be obtained according to user requirements. For example, if the variance of the first probability distribution is set to 0 as the processed variance, the reconstructed image with the best signal quality (best objective quality) can be obtained, that is, the PSNR of the image is increased or the MSE is reduced; if the scaling factors of the multiple coefficients are set to be the same, the image with the best subjective quality can be obtained, that is, the PSNR of the image is reduced or the MSE of the image is increased; if the scaling factors of the coefficients are set to be different, images whose properties lie between the best subjective quality and the best objective quality can be obtained.
- the context information of the first coefficient also includes an estimated coefficient obtained before the first estimated coefficient.
- a plurality of estimated coefficients can be obtained. If a plurality of initial pixel reconstruction values are input into the probability estimation unit 1004, the plurality of coefficients are a plurality of reconstructed pixel values, and these reconstructed pixel values constitute the reconstructed image; if a plurality of transformed pixel values are input, the plurality of coefficients are a plurality of transformed pixel reconstruction values, and these transformed pixel reconstruction values are input to the inverse transform unit 1010.
- the inverse transformation unit 1010 inversely transforms the reconstructed values of the multiple transformed pixels to obtain multiple reconstructed pixel values, and the reconstructed pixel values constitute the reconstructed image.
- the actions performed by the above-mentioned transformation unit 1008, probability estimation unit 1004, sampling unit 1006, and inverse transformation unit 1010 are all based on the decoding results of the decoding unit 1002; this can be regarded as being implemented in a common decoder plus an auxiliary decoding device, wherein the common decoder realizes the function of the decoding unit 1002, and the auxiliary decoding device realizes the functions of the transformation unit 1008, the probability estimation unit 1004, the sampling unit 1006 and the inverse transformation unit 1010.
- in this way, probability estimation is performed on the decoded coefficients, sampling is performed based on the probability estimation results to obtain the estimated coefficients, and the reconstructed image is obtained from the sampled estimated coefficients.
- since the sampling process is random and therefore non-deterministic, multiple high-quality images with different properties can be obtained by performing multiple decodings of the same compressed code stream in the above-mentioned manner, for example, an image with the best subjective quality and an image with the best objective quality.
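- a sketch of this point, reusing the illustrative Gaussian sampling helper from earlier: decoding the same distributions twice yields two different coefficient sets, while a zero scaling factor makes every pass deterministic:

```python
import random
from statistics import NormalDist

def sample_coefficient(mu, sigma, scale=1.0):
    # Inverse-CDF Gaussian sampling, as in the earlier illustrative sketch.
    u = min(max(random.random(), 1e-9), 1.0 - 1e-9)
    return NormalDist().inv_cdf(u) * (scale * sigma) + mu

mus, sigmas = [0.5, -1.2, 3.0], [1.0, 0.8, 2.0]
pass_1 = [sample_coefficient(m, s) for m, s in zip(mus, sigmas)]
pass_2 = [sample_coefficient(m, s) for m, s in zip(mus, sigmas)]
print(pass_1 != pass_2)  # almost surely True: each decode differs
# With scale = 0 every pass returns the means, i.e. a deterministic decode.
print([sample_coefficient(m, s, scale=0.0) for m, s in zip(mus, sigmas)])
```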
- FIG. 11 is a flowchart showing a process 1100 of a decoding method based on an embodiment of the present application.
- Process 1100 may be performed by video decoder 30 .
- the process 1100 is described as a series of steps or operations. It should be understood that the process 1100 may be performed in various orders and/or concurrently, and is not limited to the order of execution shown in FIG. 11 .
- the decoding method includes:
- the data to be decoded may be an image, an image block, a slice, or any region of an image.
- the above multiple coefficients also include a second coefficient
- the method of the present application also includes:
- the second estimated coefficient is obtained after the first estimated coefficient.
- multiple coefficients are obtained according to the compressed code stream of the data to be decoded, including: entropy decoding the compressed code stream, where the multiple coefficients are multiple quantized wavelet coefficients; or,
- multiple coefficients are obtained according to the compressed code stream of the data to be decoded, including: entropy decoding the compressed code stream, where the multiple coefficients are multiple quantized discrete cosine transform (discrete cosine transform, DCT) coefficients; or,
- multiple coefficients are obtained according to the compressed code stream of the data to be decoded, including: entropy decoding the compressed code stream, where the multiple coefficients are multiple reconstruction feature coefficients; or,
- multiple coefficients are obtained according to the compressed code stream of the data to be decoded, including: decoding the compressed code stream to obtain a plurality of coefficients, where the plurality of coefficients are reconstruction values of a plurality of initial pixels.
- S1102. Perform probability estimation according to the context information of the first coefficient to obtain a first probability distribution.
- the probability estimation according to the context information of the first coefficient to obtain the first probability distribution includes:
- obtaining the probability distribution model of the first coefficient; processing the context information of the first coefficient through the first probability estimation network to obtain the parameters of the probability distribution model; and obtaining the first probability distribution according to the probability distribution model and its parameters;
- the first probability estimation network and the second probability estimation network are implemented based on a neural network
- the context information of the first coefficient includes some or all of the coefficients, and/or, some or all of the estimated coefficients obtained by sampling all.
- the above-mentioned probability distribution model can be a Gaussian model, a Laplace model, a mixed Gaussian model or other models; when the above-mentioned probability distribution model is a Gaussian model, the parameters of the probability distribution model include mean and variance; when the above-mentioned probability When the distribution model is a Laplace model, the parameters of the probability distribution model include position parameters and size parameters.
- the aforementioned neural network may be a convolutional neural network, a deep neural network, a recurrent neural network or other neural networks.
- the above-mentioned first probability estimation network and the second probability estimation network have different structures and parameters, or the first probability estimation network and the second probability estimation network have the same structure but different parameters.
- the first probability distribution can be obtained in the above manner, so as to prepare for subsequent sampling based on the first probability distribution.
- S1103. Sample according to the first probability distribution to obtain a first estimated coefficient.
- in a possible design, the first probability distribution is a Gaussian distribution, and sampling according to the first probability distribution to obtain the first estimated coefficient includes: obtaining a first random number; determining a first reference value according to the first random number, where the first reference value follows a Gaussian distribution; and determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution.
- the first random number is a uniformly distributed random number on [0, 1] generated using the linear congruential method.
- optionally, the first reference value may follow a standard Gaussian distribution, an ordinary Gaussian distribution, an asymmetric Gaussian distribution, a single Gaussian model, a Gaussian mixture model, or another Gaussian distribution.
- because the first random number is random, the first estimated coefficient obtained by sampling is also random, so the reconstructed image obtained based on the first estimated coefficient is likewise random, i.e., non-deterministic. A sketch of this sampling design is given below.
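The following Python sketch illustrates one way this sampling design could look. It is a minimal illustration, not the patented implementation: the linear congruential constants and the Box–Muller conversion from uniform random numbers to a standard-Gaussian reference value are assumptions chosen for the example.

```python
import math

def lcg_uniform(state, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: returns (next state, uniform sample in [0, 1))."""
    state = (a * state + c) % m
    return state, state / m

def sample_gaussian_coefficient(mu, sigma, state):
    """Draw one estimated coefficient from N(mu, sigma^2).

    Two uniform random numbers are converted into a standard-Gaussian
    reference value via the Box-Muller transform, then scaled and shifted
    by the distribution parameters estimated from the context.
    """
    state, u1 = lcg_uniform(state)
    state, u2 = lcg_uniform(state)
    u1 = max(u1, 1e-12)  # guard against log(0)
    z1 = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mu + sigma * z1, state  # the first estimated coefficient

# Example: mean/std predicted by the probability estimation network
coeff, state = sample_gaussian_coefficient(mu=0.7, sigma=1.3, state=42)
```

Running the sketch with different seed states reproduces the non-determinism discussed above: each decoding pass draws different estimated coefficients from the same distribution.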
- the sampling process is a random, non-deterministic process; multiple reconstructed images obtained from estimated coefficients sampled multiple times in this manner have different properties.
- in a possible design, the method of the present application further includes: preprocessing the variance of the first probability distribution to obtain a processed variance;
- determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution then includes:
- determining the first estimated coefficient according to the first reference value, the mean of the first probability distribution, and the processed variance.
- in a possible design, the method of the present application further includes: preprocessing the mean of the first probability distribution according to the scaling factor of the first coefficient, to obtain a processed mean;
- determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution then includes:
- determining the first estimated coefficient according to the first reference value, the variance of the first probability distribution, and the processed mean.
- in a possible design, preprocessing the variance of the first probability distribution to obtain the processed variance includes: setting the variance of the first probability distribution to 0 as the processed variance.
- in a possible design, when the multiple coefficients are multiple quantized wavelet coefficients, multiple reconstructed wavelet coefficients, multiple quantized DCT coefficients, multiple reconstructed DCT coefficients, or multiple feature coefficients, preprocessing the variance of the first probability distribution to obtain the processed variance includes: preprocessing the variance of the first probability distribution according to the scaling factor of the first coefficient, where:
- the scaling factor of the first coefficient is the same as the scaling factor of the second coefficient; or
- the scaling factor of the first coefficient and the scaling factor of the second coefficient are different; or
- when the multiple coefficients are multiple quantized or reconstructed wavelet coefficients, if the first coefficient and the second coefficient belong to the same subband, their scaling factors are the same, and if they belong to different subbands, their scaling factors are different; or
- when the multiple coefficients are multiple quantized or reconstructed DCT coefficients, if the first coefficient and the second coefficient belong to the same frequency band, their scaling factors are the same, and if they belong to different frequency bands, their scaling factors are different; or
- when the multiple coefficients are multiple quantized or reconstructed feature coefficients, if the first coefficient and the second coefficient belong to the same channel, their scaling factors are the same, and if they belong to different channels, their scaling factors are different.
- in a possible design, when the multiple coefficients are multiple initial pixel reconstruction values or multiple transformed pixel values, preprocessing the variance of the first probability distribution to obtain the processed variance includes: preprocessing the variance of the first probability distribution according to the scaling factor of the first coefficient, where the scaling factor of the first coefficient is the same as, or different from, the scaling factor of the second coefficient. A sketch of this preprocessing is given below.
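As a non-limiting illustration of the variance-preprocessing designs above, the following Python sketch assumes that "preprocessing according to a scaling factor" means multiplying the variance by the factor; the patent leaves the exact operation open, so that choice is an assumption of the example.

```python
import numpy as np

def preprocess_variance(variances, scale_map, mode="scale"):
    """Preprocess per-coefficient variances before sampling.

    variances : array of variances predicted for each coefficient
    scale_map : array of scaling factors in [0, 1]; coefficients of the
                same subband / frequency band / channel may share a factor
    mode      : "zero" reproduces the set-variance-to-0 design (sampling
                then collapses to the mean); "scale" multiplies each
                variance by its scaling factor.
    """
    if mode == "zero":
        return np.zeros_like(variances)
    return variances * scale_map

# e.g. one shared factor per wavelet subband, flattened to coefficient order:
variances = np.array([1.0, 0.5, 2.0, 0.8])
scales    = np.array([0.0, 0.0, 1.0, 1.0])  # subband A deterministic, B fully random
print(preprocess_variance(variances, scales))
```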
- S1104. Obtain the reconstructed image according to the first estimated coefficient.
- in a possible design, when the multiple coefficients are multiple quantized wavelet coefficients or multiple reconstructed wavelet coefficients, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient includes:
- performing an inverse wavelet transform on the first estimated coefficient and the second estimated coefficient to obtain the reconstructed image.
- in a possible design, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient includes:
- when the multiple coefficients are multiple quantized DCT coefficients, performing inverse quantization and an inverse DCT on the first estimated coefficient and the second estimated coefficient to obtain the reconstructed image; or,
- when the multiple coefficients are multiple reconstructed DCT coefficients, performing an inverse DCT on the first estimated coefficient and the second estimated coefficient to obtain the reconstructed image.
- in a possible design, when the multiple coefficients are multiple transformed pixel values, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient includes: performing an inverse transform on the first estimated coefficient and the second estimated coefficient to obtain the reconstructed image.
- it can be seen that with this solution, in each pass of decoding the compressed code stream, probability estimation is performed on the decoded coefficients, sampling is performed based on the probability estimation results to obtain estimated coefficients, and the reconstructed image is obtained from the sampled estimated coefficients.
- because the sampling process is random and non-deterministic, decoding the same compressed code stream multiple times in this manner yields multiple high-quality images with different properties, for example an image with the best subjective quality and an image with the best objective quality.
- Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol).
- in this manner, computer-readable media may generally correspond to (1) non-transitory tangible computer-readable storage media, or (2) communication media such as a signal or carrier wave.
- data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this application.
- a computer program product may include a computer-readable medium.
- by way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer.
- also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media.
- disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described with reference to the various illustrative logical blocks, modules, and steps may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
- the techniques of the present application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
- various components, modules, or units are described in this application to emphasize functional aspects of devices for performing the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperating hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
Abstract
This application provides a video image decoding method and apparatus. It relates to the field of artificial intelligence (AI)-based video or image compression, and specifically to the field of neural-network-based video compression. The method includes: obtaining multiple coefficients according to a compressed code stream of data to be decoded, the multiple coefficients including a first coefficient; performing probability estimation according to context information of the first coefficient to obtain a first probability distribution; sampling according to the first probability distribution to obtain a first estimated coefficient; and obtaining a reconstructed image according to the first estimated coefficient. With the solution of this application, decoding a single compressed code stream multiple times yields high-quality images with different properties.
Description
[Corrected under Rule 91, 07.07.2022]
This application claims priority to Chinese Patent Application No. 202110781958.9, filed with the China National Intellectual Property Administration on July 9, 2021 and entitled "Video Image Decoding Method and Apparatus", which is incorporated herein by reference in its entirety.
This application relates to the field of video images, and in particular to a video image decoding method and apparatus.
A digital image is image information recorded as a digital signal. A digital image (hereinafter, an image) can be regarded as a two-dimensional array of M rows and N columns containing M×N samples, where the position of each sample is called the sample position and the value of each sample is called the sample value.
In applications such as image storage and transmission, an image usually needs to be encoded to reduce storage capacity and transmission bandwidth. Image coding comprises two steps, encoding and decoding. A typical encoding pipeline generally includes three steps: transformation, quantization, and entropy coding. For an image to be encoded, the first step decorrelates the image through a transform to obtain transform coefficients with a more concentrated energy distribution; the second step quantizes the transform coefficients to obtain quantized coefficients; the third step entropy-encodes the quantized coefficients to obtain a compressed code stream. Correspondingly, in a typical decoding pipeline, after receiving the compressed code stream the decoder goes through entropy decoding, inverse quantization, and inverse transformation in sequence to obtain a reconstructed image.
Because entropy decoding, inverse quantization, and inverse transformation in the above decoding process are generally deterministic, a single compressed code stream usually decodes to an image with a single set of properties, no matter how many times it is decoded.
Summary
This application provides a video image decoding method and apparatus. With the solution of this application, decoding a single compressed code stream multiple times can yield images with different properties; moreover, during decoding the probability distribution used for sampling can be adjusted according to user requirements, improving the quality of the reconstructed image.
The above and other objects are achieved by the subject matter of the independent claims. Other implementations are apparent from the dependent claims, the detailed description, and the drawings.
Particular embodiments are outlined in the attached independent claims, and other embodiments in the dependent claims.
According to a first aspect, the invention relates to a method for decoding video images, performed by a decoding apparatus. The method includes: obtaining multiple coefficients according to a compressed code stream of data to be decoded, the multiple coefficients including a first coefficient; performing probability estimation according to context information of the first coefficient to obtain a first probability distribution; sampling according to the first probability distribution to obtain a first estimated coefficient; and obtaining a reconstructed image according to the first estimated coefficient. For example, the first estimated coefficient may be an estimated value of the first coefficient.
Optionally, the data to be decoded may be an image, an image block, a slice, or any region of an image.
In a possible design, the multiple coefficients further include a second coefficient, and the method of this application further includes:
performing probability estimation according to context information of the second coefficient and/or estimated coefficients already obtained by sampling, to obtain a second probability distribution, where the estimated coefficients already obtained by sampling include the first estimated coefficient; and sampling according to the second probability distribution to obtain a second estimated coefficient. Obtaining the reconstructed image according to the first estimated coefficient then includes: obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient.
It should be noted here that the second estimated coefficient is obtained after the first estimated coefficient.
In each pass of decoding the compressed code stream, probability estimation is performed on the decoded coefficients, sampling is performed based on the probability estimation results to obtain estimated coefficients, and the reconstructed image is obtained from the sampled estimated coefficients. Because the sampling process is random and therefore non-deterministic, decoding the same compressed code stream multiple times in this manner yields multiple images with different properties, for example an image with the best subjective quality and an image with the best objective quality.
In a possible design, obtaining multiple coefficients according to the compressed code stream of the data to be decoded includes:
entropy decoding the compressed code stream to obtain the multiple coefficients, the multiple coefficients being multiple quantized wavelet coefficients; or,
entropy decoding the compressed code stream to obtain multiple quantized wavelet coefficients, and inverse quantizing the multiple quantized wavelet coefficients to obtain the multiple coefficients, the multiple coefficients being multiple reconstructed wavelet coefficients.
In a possible design, obtaining multiple coefficients according to the compressed code stream of the data to be decoded includes:
entropy decoding the compressed code stream to obtain the multiple coefficients, the multiple coefficients being multiple quantized discrete cosine transform (DCT) coefficients; or,
entropy decoding the compressed code stream to obtain multiple quantized DCT coefficients, and inverse quantizing the multiple quantized DCT coefficients to obtain the multiple coefficients, the multiple coefficients being multiple reconstructed DCT coefficients.
In a possible design, obtaining multiple coefficients according to the compressed code stream of the data to be decoded includes:
entropy decoding the compressed code stream to obtain the multiple coefficients, the multiple coefficients being multiple feature coefficients.
In a possible design, obtaining multiple coefficients according to the compressed code stream of the data to be decoded includes:
decoding the compressed code stream to obtain the multiple coefficients, the multiple coefficients being multiple initial pixel reconstruction values; or,
decoding the compressed code stream to obtain multiple initial pixel reconstruction values, and transforming the multiple initial pixel reconstruction values to obtain the multiple coefficients, the multiple coefficients being multiple transformed pixel values.
Obtaining the multiple coefficients in these different manners allows the decoding method of this application to be applied to different decoding scenarios, for example the wavelet domain, the feature domain, the DCT domain, and the pixel domain.
In a possible design, performing probability estimation according to the context information of the first coefficient to obtain the first probability distribution includes:
obtaining a probability distribution model of the first coefficient; processing the context information of the first coefficient through a first probability estimation network to obtain parameters of the probability distribution model; and obtaining the first probability distribution according to the probability distribution model and its parameters;
or,
processing the context information of the first coefficient through a second probability estimation network to obtain the first probability distribution;
where the first and second probability estimation networks are implemented based on neural networks, and the context information of the first coefficient includes some or all of the multiple coefficients and/or some or all of the estimated coefficients already obtained by sampling.
Optionally, the probability distribution model may be a Gaussian model, a Laplace model, a Gaussian mixture model, or another model. When the probability distribution model is a Gaussian model, its parameters include a mean and a variance; when it is a Laplace model, its parameters include a location parameter and a scale parameter.
Optionally, the neural network may be a convolutional neural network, a deep neural network, a recurrent neural network, or another neural network.
Optionally, the first and second probability estimation networks differ in both structure and parameters, or have the same structure but different parameters.
The first probability distribution can be obtained in the above manner, in preparation for subsequent sampling based on the first probability distribution.
In a possible design, the first probability distribution is a Gaussian distribution, and sampling according to the first probability distribution to obtain the first estimated coefficient includes:
obtaining a first random number; determining a first reference value according to the first random number, the first reference value following a Gaussian distribution; and determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution.
The first random number is a uniformly distributed random number on [0, 1] generated using the linear congruential method.
Optionally, the first reference value may follow a standard Gaussian distribution, an ordinary Gaussian distribution, an asymmetric Gaussian distribution, a single Gaussian model, a Gaussian mixture model, or another Gaussian distribution.
Because the first random number is random, the sampled first estimated coefficient is also random, so the reconstructed image obtained based on the first estimated coefficient is likewise random, that is, non-deterministic. The sampling process is a random, non-deterministic process; the multiple reconstructed images obtained from estimated coefficients sampled multiple times in this manner have different properties.
In a possible design, the method of this application further includes:
preprocessing the variance of the first probability distribution to obtain a processed variance;
determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution then includes:
determining the first estimated coefficient according to the first reference value, the mean of the first probability distribution, and the processed variance.
In a possible design, the method of this application further includes: preprocessing the mean of the first probability distribution according to the scaling factor of the first coefficient, to obtain a processed mean;
determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution then includes:
determining the first estimated coefficient according to the first reference value, the variance of the first probability distribution, and the processed mean.
In a possible design, preprocessing the variance of the first probability distribution to obtain the processed variance includes:
setting the variance of the first probability distribution to 0 as the processed variance.
In a possible design, when the multiple coefficients are multiple quantized wavelet coefficients, multiple reconstructed wavelet coefficients, multiple quantized DCT coefficients, multiple reconstructed DCT coefficients, or multiple feature coefficients, preprocessing the variance of the first probability distribution to obtain the processed variance includes:
preprocessing the variance of the first probability distribution according to the scaling factor of the first coefficient, to obtain the processed variance.
Similarly, the variance of the second probability distribution may be preprocessed according to the scaling factor of the second coefficient, where
the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same; or,
the scaling factor of the first coefficient and the scaling factor of the second coefficient are different; or
when the multiple coefficients are multiple quantized or reconstructed wavelet coefficients, if the first and second coefficients belong to the same subband, their scaling factors are the same, and if they belong to different subbands, their scaling factors are different;
or,
when the multiple coefficients are multiple quantized or reconstructed DCT coefficients, if the first and second coefficients belong to the same frequency band, their scaling factors are the same, and if they belong to different frequency bands, their scaling factors are different;
or,
when the multiple coefficients are multiple quantized feature coefficients or multiple reconstructed feature coefficients, if the first and second coefficients belong to the same channel, their scaling factors are the same, and if they belong to different channels, their scaling factors are different.
In a possible design, when the multiple coefficients are multiple initial pixel reconstruction values or multiple transformed pixel values, preprocessing the variance of the first probability distribution to obtain the processed variance includes:
preprocessing the variance of the first probability distribution according to the scaling factor of the first coefficient, to obtain the processed variance,
where the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same, or are different.
By preprocessing the first probability distribution, reconstructed images with different properties can be obtained according to user requirements, improving reconstructed image quality. For example, setting the variance of the first probability distribution to 0 as the processed variance yields the reconstructed image with the best signal quality (best objective quality), i.e., it increases the image's peak signal to noise ratio (PSNR) or reduces its mean-square error (MSE); setting the scaling factors of the multiple coefficients to the same value yields the image with the best subjective quality, i.e., it lowers the image's PSNR or increases its MSE; and using the same scaling factor for coefficients belonging to the same part of the image while using different scaling factors across parts yields an image whose properties lie between best subjective quality and best objective quality. These two quality measures can be computed as in the sketch below.
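The PSNR/MSE trade-off described above can be quantified with the standard definitions; the following short Python sketch assumes 8-bit images (peak value 255), which is an assumption of the example rather than something the application fixes:

```python
import numpy as np

def mse(x, y):
    """Mean-square error between two images of the same shape."""
    return float(np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2))

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means better objective quality."""
    m = mse(x, y)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)
```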
In a possible design, when the multiple coefficients are multiple quantized wavelet coefficients or multiple reconstructed wavelet coefficients, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient includes:
performing an inverse wavelet transform on the first estimated coefficient and the second estimated coefficient, to obtain the reconstructed image.
In a possible design, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient includes:
when the multiple coefficients are multiple quantized DCT coefficients, performing inverse quantization and an inverse DCT on the first estimated coefficient and the second estimated coefficient, to obtain the reconstructed image; or,
when the multiple coefficients are multiple reconstructed DCT coefficients, performing an inverse DCT on the first estimated coefficient and the second estimated coefficient, to obtain the reconstructed image.
In a possible design, when the multiple coefficients are multiple transformed pixel values, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient includes:
performing an inverse transform on the first estimated coefficient and the second estimated coefficient, to obtain the reconstructed image.
Because the sampling process is random, the sampling step of this application can be repeated to obtain multiple reconstructed images, among them a reconstructed image with the best subjective quality and one with the best objective quality. A reconstructed image can be used inside the coding loop as a reference for intra or inter prediction, or outside the loop as post-processing to optimize image quality. For example, after multiple reconstructed images are obtained through the sampling and inverse transformation steps, the reconstructed image with the best subjective quality can be placed in the decoded picture buffer (DPB) or in a reference frame set as a reference image for intra or inter prediction inside the coding loop, while the reconstructed image with the best objective quality can be used for post-processing, adjusting the subjective quality of the coded reconstructed image and improving the quality of the compressed and reconstructed image/video.
According to a second aspect, the invention relates to an apparatus for decoding a compressed code stream; for the beneficial effects, see the description of the first aspect, not repeated here. The decoding apparatus has the functionality to implement the behavior in the method example of the first aspect. The functionality may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the functionality above.
The method of the first aspect of the invention may be performed by the apparatus of the second aspect of the invention. Other features and implementations of the method of the first aspect follow directly from the functionality and implementations of the apparatus of the second aspect.
According to a third aspect, the invention relates to an apparatus for decoding a video stream, including a processor and a memory. The memory stores instructions that cause the processor to perform the method of the first aspect.
According to a fourth aspect, a computer-readable storage medium is provided, having stored thereon instructions that, when executed, cause one or more processors to decode video data. The instructions cause the one or more processors to perform the method of any possible embodiment of the first aspect.
According to a fifth aspect, the invention relates to a computer program product including program code that, when run, performs the method of any possible embodiment of the first aspect.
One or more embodiments are described in detail in the accompanying drawings and the following description. Other features, objects, and advantages are apparent from the description, the drawings, and the claims.
To describe the technical solutions in the embodiments of this application or the prior art more clearly, the following briefly introduces the drawings needed for the description of the embodiments or the prior art. The drawings described below are merely some embodiments of this application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a block diagram of an example video coding system for implementing embodiments of this application;
FIG. 2 is a block diagram of another example video coding system for implementing embodiments of this application;
FIG. 3 is a schematic block diagram of a video coding apparatus for implementing embodiments of this application;
FIG. 4 is a schematic block diagram of a video coding apparatus for implementing embodiments of this application;
FIG. 5 is a schematic structural diagram of a video encoder and decoder according to an embodiment of this application;
FIG. 6a is a schematic diagram of the result of one wavelet transform;
FIG. 6b is a schematic diagram of the processing flow of the wavelet transform;
FIG. 6c is a schematic structural diagram of the deep networks used for prediction and update in FIG. 6b;
FIG. 6d is a schematic structural diagram of a probability estimation network according to an embodiment of this application;
FIG. 6e is a schematic diagram of the processing flow of the inverse wavelet transform;
FIG. 7 is a schematic diagram of model training according to an embodiment of this application;
FIG. 8a is a schematic structural diagram of another video decoder according to an embodiment of this application;
FIG. 8b is a schematic structural diagram of another video decoder according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of another video decoder according to an embodiment of this application;
FIG. 10a is a schematic structural diagram of another video decoder according to an embodiment of this application;
FIG. 10b is a schematic structural diagram of another video decoder according to an embodiment of this application;
FIG. 11 is a schematic diagram of a decoding process according to an embodiment of this application.
The embodiments of this application provide an AI-based video image compression technology, in particular a neural-network-based video compression technology, and specifically a decoding method based on probability distributions and sampling, to improve traditional hybrid video codec systems.
Video coding usually refers to processing a sequence of images that form a video or video sequence. In the field of video coding, the terms "picture", "frame", and "image" may be used as synonyms. Video coding (or coding in general) includes two parts, video encoding and video decoding. Video encoding is performed on the source side and usually includes processing (e.g., compressing) the original video image to reduce the amount of data needed to represent it (for more efficient storage and/or transmission). Video decoding is performed on the destination side and usually includes inverse processing relative to the encoder to reconstruct the video image. "Coding" of video images (or images in general) in the embodiments should be understood as "encoding" or "decoding" of video images or video sequences. The encoding part and the decoding part are also jointly referred to as CODEC (encoding and decoding).
In lossless video coding, the original video image can be reconstructed, i.e., the reconstructed video image has the same quality as the original (assuming no transmission loss or other data loss during storage or transmission). In lossy video coding, further compression is performed, e.g., by quantization, to reduce the amount of data representing the video image, and the decoder side cannot fully reconstruct the video image, i.e., the quality of the reconstructed video image is lower or worse than that of the original.
Since the embodiments of this application involve the application of neural networks, for ease of understanding some terms used in the embodiments are explained first; these terms are also part of the summary of the invention.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit taking x_s and an intercept of 1 as inputs; the output of the operation unit may be
h(x) = f( sum_{s=1..n} W_s * x_s + b )
where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce nonlinearity into the neural network so as to convert the input signal of the neural unit into an output signal. The output of the activation function may serve as the input of the next convolutional layer; the activation function may be a sigmoid function. A neural network is a network formed by connecting many such single neural units, i.e., the output of one neural unit may be the input of another. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of the local receptive field; the local receptive field may be a region composed of several neural units.
(2) Deep neural network
A deep neural network (DNN), also called a multi-layer neural network, can be understood as a neural network with multiple hidden layers. Dividing a DNN by the position of its layers, the layers can be classified into three kinds: input layer, hidden layers, and output layer. Generally the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, i.e., any neuron of the i-th layer is connected to every neuron of the (i+1)-th layer.
Although a DNN looks complicated, the work of each layer is actually simple: it is the linear relationship
y = a( W * x + b )
where x is the input vector, y is the output vector, b is the bias vector, W is the weight matrix (also called coefficients), and a() is the activation function. Each layer merely applies this simple operation to the input vector x to obtain the output vector y.
Because a DNN has many layers, it also has many coefficients W and bias vectors b. These parameters are defined in the DNN as follows, taking the coefficient W as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is written W^3_{24}, where the superscript 3 denotes the layer of the coefficient W, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
Note that the input layer has no W parameters. In a deep neural network, more hidden layers make the network better able to characterize complex real-world situations. In theory, a model with more parameters has higher complexity and larger "capacity", meaning it can accomplish more complex learning tasks. Training a deep neural network is the process of learning the weight matrices; its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers). A toy sketch of this layer computation follows.
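The layer relationship above can be made concrete with a few lines of Python; this is only a toy illustration of y = a(Wx + b), with randomly chosen sizes and tanh as an assumed activation:

```python
import numpy as np

def dense_layer(x, W, b, activation=np.tanh):
    """One fully connected layer: y = a(W x + b)."""
    return activation(W @ x + b)

# A three-layer toy network. In the text's notation, W2[1, 3] (0-based)
# would be the coefficient W^3_{24}: output neuron 2 of the third layer,
# input neuron 4 of the second layer.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
h = dense_layer(x, rng.normal(size=(5, 4)), np.zeros(5))
y = dense_layer(h, rng.normal(size=(2, 5)), np.zeros(2))
```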
(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A CNN contains a feature extractor composed of convolutional layers and subsampling layers; the feature extractor can be regarded as a filter. A convolutional layer is a neuron layer in a CNN that convolves the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in neighboring layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of rectangularly arranged neural units. Neural units of the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as extracting image information in a position-independent way. The convolution kernel may be initialized as a matrix of random size, and during training of the CNN the kernel can learn reasonable weights. A direct benefit of weight sharing is reducing the connections between the layers of the CNN while also lowering the risk of overfitting.
(4) Recurrent neural networks (RNN) are used for processing sequence data. In the traditional neural network model, from the input layer to the hidden layers to the output layer, the layers are fully connected while the nodes within each layer are unconnected. Although such ordinary neural networks solve many hard problems, they are still powerless for many others. For example, predicting the next word of a sentence generally requires the preceding words, because the words of a sentence are not independent. An RNN is called recurrent because the current output of a sequence also depends on the preceding outputs. Concretely, the network memorizes the preceding information and applies it to the computation of the current output: the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step. In theory, an RNN can process sequence data of any length. Training an RNN is the same as training a traditional CNN or DNN. RNNs aim to give machines the ability to remember like humans, so the output of an RNN depends on the current input information and the remembered historical information.
(5) Loss function
When training a deep neural network, because the output of the network should be as close as possible to the value one actually wants to predict, the current predicted value of the network can be compared with the desired target value, and the weight vectors of each layer can then be updated according to the difference between the two (of course, there is usually an initialization process before the first update, i.e., parameters are preconfigured for each layer of the deep neural network). For example, if the network's prediction is too high, the weight vectors are adjusted to predict lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, "how to compare the difference between the predicted value and the target value" must be predefined; this is the loss function or objective function — important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training the deep neural network becomes a process of minimizing this loss as far as possible.
(6) Back propagation algorithm
A neural network may use the back propagation (BP) algorithm to correct the parameter values of the initial neural network model during training, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, forward propagation of the input signal up to the output produces an error loss, and the parameters of the initial neural network model are updated by back-propagating the error loss information, making the error loss converge. The back propagation algorithm is a back propagation movement dominated by the error loss, aiming to obtain the parameters of the optimal neural network model, for example the weight matrices.
In the following embodiments of the coding system 10, the encoder 20 and the decoder 30 are described with reference to FIGS. 1 to 3.
FIG. 1 is a schematic block diagram of an example coding system 10, e.g., a video coding system 10 (or coding system 10 for short) that may use the techniques of this application. The video encoder 20 (or encoder 20 for short) and the video decoder 30 (or decoder 30 for short) of the video coding system 10 represent devices that may be used to perform the techniques according to the various examples described in this application.
As shown in FIG. 1, the coding system 10 includes a source device 12 for providing encoded image data 21, such as an encoded image, to a destination device 14 that decodes the encoded image data 21.
The source device 12 includes the encoder 20 and may additionally, i.e., optionally, include an image source 16, a preprocessor (or preprocessing unit) 18 such as an image preprocessor, and a communication interface (or communication unit) 22.
The image source 16 may include or be any type of image capture device for capturing real-world images and the like, and/or any type of image generation device, e.g., a computer graphics processor for generating computer-animated images, or any type of device for obtaining and/or providing real-world images or computer-generated images (e.g., screen content, virtual reality (VR) images, and/or any combination thereof, such as augmented reality (AR) images). The image source may be any type of memory or storage that stores any of the above images.
To distinguish it from the processing performed by the preprocessor (or preprocessing unit) 18, the image (or image data) 17 may also be called the original image (or original image data) 17.
The preprocessor 18 is configured to receive the (original) image data 17 and preprocess it to obtain a preprocessed image (or preprocessed image data) 19. For example, the preprocessing performed by the preprocessor 18 may include trimming, color format conversion (e.g., from RGB to YCbCr), toning, or denoising. It can be understood that the preprocessing unit 18 may be an optional component.
The video encoder (or encoder) 20 is configured to receive the preprocessed image data 19 and provide encoded image data 21 (described further below with reference to FIG. 2, etc.).
The communication interface 22 of the source device 12 may be configured to receive the encoded image data 21 and send it (or any further processed version of it) over the communication channel 13 to another device, such as the destination device 14, or any other device, for storage or direct reconstruction.
The destination device 14 includes the decoder 30 and may additionally, i.e., optionally, include a communication interface (or communication unit) 28, a post-processor (or post-processing unit) 32, and a display device 34.
The communication interface 28 of the destination device 14 is configured to receive the encoded image data 21 (or any further processed version of it) directly from the source device 12 or from any other source device such as a storage device, e.g., an encoded-image-data storage device, and to provide the encoded image data 21 to the decoder 30.
The communication interface 22 and the communication interface 28 may be configured to send or receive the encoded image data (or encoded data) 21 over a direct communication link between the source device 12 and the destination device 14, such as a direct wired or wireless connection, or over any type of network, such as a wired network, a wireless network, or any combination thereof, or any type of private and public network or any combination thereof.
For example, the communication interface 22 may be configured to encapsulate the encoded image data 21 into a suitable format, such as packets, and/or process the encoded image data using any type of transmission encoding or processing for transmission over a communication link or communication network.
The communication interface 28, corresponding to the communication interface 22, may, for example, be configured to receive the transmitted data and process it using any type of corresponding transmission decoding or processing and/or decapsulation to obtain the encoded image data 21.
Both the communication interface 22 and the communication interface 28 may be configured as unidirectional communication interfaces, as indicated in FIG. 1 by the arrow of the communication channel 13 pointing from the source device 12 to the destination device 14, or as bidirectional communication interfaces, and may be used to send and receive messages and the like, to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or to data transmission such as the transmission of encoded image data.
The video decoder (or decoder) 30 is configured to receive the encoded image data 21 and provide decoded image data 31 (described further below with reference to FIG. 3, etc.).
The post-processor 32 is configured to post-process the decoded image data 31 (also called reconstructed image data), such as a decoded image, to obtain post-processed image data 33, such as a post-processed image. The post-processing performed by the post-processing unit 32 may include, for example, color format conversion (e.g., from YCbCr to RGB), toning, trimming, or resampling, or any other processing for producing the decoded image data 31 for display by the display device 34 or the like.
The display device 34 is configured to receive the post-processed image data 33 to display the image to a user, viewer, or the like. The display device 34 may be or include any type of display for presenting the reconstructed image, e.g., an integrated or external display or monitor. For example, the display may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP), or any other type of display.
The coding system 10 further includes a training engine 25; the specific training process implemented by the training engine 25 is detailed in the subsequent description and not repeated here.
Although FIG. 1 shows the source device 12 and the destination device 14 as separate devices, device embodiments may also include both devices or the functionality of both, i.e., the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality at the same time. In these embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, by separate hardware and/or software, or by any combination thereof.
From the description, it is obvious to a person skilled in the art that the existence and (exact) division of the different units or functionalities in the source device 12 and/or the destination device 14 shown in FIG. 1 may vary with the actual device and application.
The encoder 20 (e.g., video encoder 20) or the decoder 30 (e.g., video decoder 30), or both, may be implemented by processing circuitry as shown in FIG. 2, such as one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, dedicated video coding processors, or any combination thereof. The encoder 20 may be implemented by the processing circuitry 46 to include the various modules discussed with reference to the encoder 20 of FIG. 2 and/or any other encoder system or subsystem described herein. The decoder 30 may be implemented by the processing circuitry 46 to include the various modules discussed with reference to the decoder 30 of FIG. 3 and/or any other decoder system or subsystem described herein. The processing circuitry 46 may be used to perform the various operations discussed below. As shown in FIG. 4, if part of the techniques is implemented in software, a device may store the software instructions in a suitable non-transitory computer-readable storage medium and execute the instructions in hardware using one or more processors, thereby performing the techniques of the invention. One of the video encoder 20 and the video decoder 30 may be integrated in a single device as part of a combined encoder/decoder (CODEC), as shown in FIG. 2.
The source device 12 and the destination device 14 may include any of a variety of devices, including any type of handheld or stationary device, e.g., a notebook or laptop computer, a mobile phone, a smartphone, a tablet or tablet computer, a camera, a desktop computer, a set-top box, a television, a display device, a digital media player, a video game console, a video streaming device (such as a content service server or a content distribution server), a broadcast receiving device, a broadcast transmitting device, and the like, and may use no operating system or any type of operating system. In some cases, the source device 12 and the destination device 14 may be equipped with components for wireless communication; thus, they may be wireless communication devices.
In some cases, the video coding system 10 shown in FIG. 1 is merely exemplary, and the techniques provided in this application may be applied to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device. In other examples, data is retrieved from local memory, sent over a network, and so on. A video encoding device may encode data and store it in memory, and/or a video decoding device may retrieve data from memory and decode it. In some examples, encoding and decoding are performed by devices that do not communicate with each other but simply encode data to memory and/or retrieve data from memory and decode it.
FIG. 2 is an illustration of an example video coding system 40 including the video encoder 20 of FIG. 2 and/or the video decoder 30 of FIG. 3 according to an exemplary embodiment. The video coding system 40 may include an imaging device 41, the video encoder 20, the video decoder 30 (and/or a video encoder/decoder implemented by the processing circuitry 46), an antenna 42, one or more processors 43, one or more memory stores 44, and/or a display device 45.
As shown in FIG. 2, the imaging device 41, the antenna 42, the processing circuitry 46, the video encoder 20, the video decoder 30, the processor 43, the memory store 44, and/or the display device 45 can communicate with one another. In different examples, the video coding system 40 may include only the video encoder 20 or only the video decoder 30.
In some examples, the antenna 42 may be used to transmit or receive an encoded bitstream of video data. Additionally, in some examples, the display device 45 may be used to present the video data. The processing circuitry 46 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like. The video coding system 40 may also include an optional processor 43, which may similarly include ASIC logic, a graphics processor, a general-purpose processor, and the like. Additionally, the memory store 44 may be any type of memory, such as volatile memory (e.g., static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or non-volatile memory (e.g., flash memory). In a non-limiting example, the memory store 44 may be implemented by cache memory. In other examples, the processing circuitry 46 may include memory (e.g., a cache) for implementing an image buffer or the like.
In some examples, the video encoder 20 implemented by logic circuitry may include an image buffer (e.g., implemented by the processing circuitry 46 or the memory store 44) and a graphics processing unit (e.g., implemented by the processing circuitry 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include the video encoder 20 implemented by the processing circuitry 46 to implement the various modules discussed with reference to FIG. 2 and/or any other encoder system or subsystem described herein. The logic circuitry may be used to perform the various operations discussed herein.
In some examples, the video decoder 30 may be implemented by the processing circuitry 46 in a similar manner to implement the various modules discussed with reference to the video decoder 30 of FIG. 3 and/or any other decoder system or subsystem described herein. In some examples, the video decoder 30 implemented by logic circuitry may include an image buffer (implemented by the processing circuitry 46 or the memory store 44) and a graphics processing unit (e.g., implemented by the processing circuitry 46). The graphics processing unit may be communicatively coupled to the image buffer and may include the video decoder 30 implemented by the processing circuitry 46 to implement the various modules discussed with reference to FIG. 3 and/or any other decoder system or subsystem described herein.
In some examples, the antenna 42 may be used to receive an encoded bitstream of video data. As discussed, the encoded bitstream may contain data, indicators, index values, mode selection data, and the like related to encoded video frames as discussed herein, such as data related to coding partitions (e.g., transform coefficients or quantized transform coefficients, optional indicators (as discussed), and/or data defining coding partitions). The video coding system 40 may also include the video decoder 30 coupled to the antenna 42 and used to decode the encoded bitstream. The display device 45 is used to present video frames.
It should be understood that in the embodiments of this application, for the examples described with reference to the video encoder 20, the video decoder 30 may be used to perform the reverse process. Regarding signaling syntax elements, the video decoder 30 may be used to receive and parse such syntax elements and decode the related video data accordingly. In some examples, the video encoder 20 may entropy-encode syntax elements into an encoded video bitstream. In such examples, the video decoder 30 may parse such syntax elements and decode the related video data accordingly.
For ease of description, embodiments of the invention are described with reference to Versatile Video Coding (VVC) reference software or High-Efficiency Video Coding (HEVC) developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). A person of ordinary skill in the art understands that the embodiments of the invention are not limited to HEVC or VVC.
FIG. 3 is a schematic diagram of a video coding device 300 according to an embodiment of the invention. The video coding device 300 is suitable for implementing the disclosed embodiments described herein. In one embodiment, the video coding device 300 may be a decoder, such as the video decoder 30 in FIG. 1, or an encoder, such as the video encoder 20 in FIG. 1.
The video coding device 300 includes: ingress ports 310 (or input ports 310) and a receiver unit (Rx) 320 for receiving data; a processor, logic unit, or central processing unit (CPU) 330 for processing data — for example, the processor 330 here may be a neural network processor 330; a transmitter unit (Tx) 340 and egress ports 350 (or output ports 350) for transmitting data; and a memory 360 for storing data. The video coding device 300 may also include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress ports 310, the receiver unit 320, the transmitter unit 340, and the egress ports 350 for the egress or ingress of optical or electrical signals.
The processor 330 is implemented by hardware and software. The processor 330 may be implemented as one or more processor chips, cores (e.g., multi-core processors), FPGAs, ASICs, and DSPs. The processor 330 communicates with the ingress ports 310, the receiver unit 320, the transmitter unit 340, the egress ports 350, and the memory 360. The processor 330 includes a coding module 370 (e.g., a neural-network (NN)-based coding module 370). The coding module 370 implements the embodiments disclosed above; for example, it performs, processes, prepares, or provides various encoding operations. Therefore, the coding module 370 provides a substantial improvement to the functionality of the video coding device 300 and affects the switching of the video coding device 300 to different states. Alternatively, the coding module 370 is implemented as instructions stored in the memory 360 and executed by the processor 330.
The memory 360 includes one or more disks, tape drives, and solid-state drives, and may be used as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data read during program execution. The memory 360 may be volatile and/or non-volatile and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
FIG. 4 is a simplified block diagram of an apparatus 400 according to an exemplary embodiment; the apparatus 400 may be used as either or both of the source device 12 and the destination device 14 in FIG. 1.
The processor 402 of the apparatus 400 may be a central processing unit. Alternatively, the processor 402 may be any other type of device, or multiple devices, existing or to be developed, capable of manipulating or processing information. Although the disclosed implementations may be implemented with a single processor such as the processor 402 shown, using more than one processor is faster and more efficient.
In one implementation, the memory 404 of the apparatus 400 may be a read-only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may be used as the memory 404. The memory 404 may include code and data 406 accessed by the processor 402 over a bus 412. The memory 404 may also include an operating system 408 and application programs 410, where the application programs 410 include at least one program that allows the processor 402 to perform the methods described herein. For example, the application programs 410 may include applications 1 to N, and further include a video coding application that performs the methods described herein.
The apparatus 400 may also include one or more output devices, such as a display 418. In one example, the display 418 may be a touch-sensitive display combining a display with a touch-sensitive element operable to sense touch input. The display 418 may be coupled to the processor 402 via the bus 412.
Although the bus 412 of the apparatus 400 is described herein as a single bus, the bus 412 may include multiple buses. Further, secondary storage may be directly coupled to the other components of the apparatus 400 or accessed over a network, and may include a single integrated unit such as one memory card or multiple units such as multiple memory cards. The apparatus 400 may thus have a wide variety of configurations.
Wavelet-coefficient codec and coding method
FIG. 5 is a schematic block diagram of an example video encoder and decoder for implementing the techniques of this application. In the example of FIG. 5, the video encoder 20 includes a wavelet transform unit 202, a quantization unit 204, and an entropy encoding unit 206. The video decoder 30 includes an entropy decoding unit 208, a probability estimation unit 212, a sampling unit 214, and an inverse wavelet transform unit 216; optionally, the video decoder 30 further includes an inverse quantization unit 210. The video codec shown in FIG. 5 may also be called an end-to-end video codec or a video codec based on an end-to-end video codec.
Wavelet transform unit 202
The wavelet transform unit 202 performs N wavelet transforms on the data to be encoded 201 to obtain 3N+1 subbands 203, where each subband contains one or more wavelet coefficients.
Optionally, the data to be encoded 201 may be an image in YUV444 format, with the three channels processed separately and without exploiting inter-channel correlation. This embodiment is described based on a single-channel signal. It can be understood that the solution of this embodiment can be extended to joint multi-channel processing.
Performing N wavelet transforms on the data to be encoded 201 can be understood as performing N wavelet transforms on an image block or an image region, without limitation here. The image region may be an image, a sub-image, a slice, a patch, and the like, without limitation. Specifically, the image region may be partitioned using the quadtree-based partitioning of existing coding standards, or the image or image region may be divided into image blocks of equal size (e.g., evenly into 8x8 blocks). This application takes one wavelet transform as an example, i.e., N=1, which is not repeated below. One wavelet transform on the data to be encoded 201 yields the four two-dimensional subbands LL1, HL1, LH1, HH1 shown in FIG. 6a, where each subband contains one or more wavelet coefficients. LL1 is called the approximation subband and is a low-resolution approximation of the data to be encoded 201; HL1, LH1, HH1 are called detail subbands and contain the high-frequency information of the data to be encoded 201.
Optionally, the wavelet transform unit 202 may use a traditional wavelet transform, a deep-neural-network-based wavelet transform, or another similar transform method for the wavelet transform of the data to be encoded 201, without specific limitation.
For the deep-neural-network-based wavelet transform, the wavelet transform may be performed according to the flowchart shown in FIG. 6b. Taking a one-dimensional signal as an example, FIG. 6b describes the wavelet transform flow: the input signal is first split by sampling, generally into odd and even samples, to obtain two sampled signals; the two sampled signals then undergo mutual prediction and update steps, finally yielding two decomposition results, called the approximation component and the detail component. The prediction and update steps may alternate multiple times to obtain the final decomposition result and are not limited to the two passes shown in FIG. 6b. Prediction and update are implemented with deep networks. In FIG. 6b, a and b denote scaling parameters used to balance the energy of the different components after the prediction and lifting steps.
Performing one wavelet transform on a two-dimensional image requires one one-dimensional wavelet transform in the row direction and one in the column direction in the manner of FIG. 6b, combined into one two-dimensional wavelet transform, yielding four subbands. Specifically, after one wavelet transform, an input image of width m and height n yields the four two-dimensional subbands of width m/2 and height n/2 shown in FIG. 6a.
Both the "deep-network-based prediction" and the "deep-network-based update" in FIG. 6b may be implemented with the deep network structure shown in the left diagram of FIG. 6c, but with different network parameters for prediction and update. In the left diagram of FIG. 6c, "H×W" indicates that the current convolutional layer uses a kernel of size H×W, and "ResB" denotes a residual block, as shown in the right diagram of FIG. 6c, where relu denotes the activation function. Of course, the "deep-network-based prediction" and "deep-network-based update" may also be implemented with other neural network structures, without limitation. The network parameters are built into the codec and need not be transmitted. A toy sketch of one lifting step follows.
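The lifting structure of FIG. 6b can be illustrated with a few lines of Python. This is a minimal sketch: the Haar-like predict/update callables stand in for the learned residual CNNs described above, and the scaling parameters a, b are omitted — both simplifications are assumptions of the example.

```python
import numpy as np

def lifting_forward(signal, predict, update):
    """One lifting step of a 1-D wavelet transform.

    The signal is split into even/odd samples; the odd samples are
    predicted from the even ones (detail = odd - predict(even)), and the
    even samples are updated with the detail (approx = even + update(detail)).
    In the scheme above, predict/update would be small residual CNNs
    with learned parameters; here they are plain callables.
    """
    even, odd = signal[0::2], signal[1::2]
    detail = odd - predict(even)
    approx = even + update(detail)
    return approx, detail

# Haar-like toy predictors instead of the deep networks
approx, detail = lifting_forward(np.arange(8.0),
                                 predict=lambda e: e,
                                 update=lambda d: 0.5 * d)
```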
Quantization unit 204
The quantization unit 204 quantizes the wavelet coefficients within the subbands obtained by the wavelet transform, to obtain quantized wavelet coefficients 205.
Specifically, when quantizing each wavelet coefficient, each subband may be processed according to a first preset order, and the wavelet coefficients within the current subband are then quantized according to a second preset order to obtain the quantized wavelet coefficients. The first preset order may be the existing zigzag scan order, e.g., LL1→HL1→LH1→HH1; the second preset order may be the existing zigzag, horizontal, or vertical scan order.
It should be understood that the first and second preset orders above are merely examples and not limitations of the application; other orders are of course possible. Consistent with the notation that follows, each coefficient c is quantized as [c/QP], where QP denotes the quantization step size and [·] denotes rounding to the nearest integer.
Optionally, before quantizing the wavelet coefficients, the wavelet coefficients may be preprocessed to obtain processed wavelet coefficients, and the quantization operation is then applied to the preprocessed wavelet coefficients — for example, feature extraction is applied to the obtained wavelet coefficients through a neural network, and the feature extraction result is then quantized. Processing the wavelet coefficients before quantization enables the decoder to decode a high-quality reconstructed image. A sketch of this quantization step follows.
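The quantization and inverse quantization described here and in unit 210 below amount to a divide-round/multiply pair; a minimal Python sketch (the function names are illustrative, not from the application):

```python
def quantize(coeff, qp):
    """Quantize a coefficient: nearest-integer rounding of coeff/QP."""
    return round(coeff / qp)

def dequantize(qcoeff, qp):
    """Inverse quantization: multiply the quantized coefficient by the step size."""
    return qcoeff * qp

assert dequantize(quantize(7.3, qp=2.0), qp=2.0) == 8.0  # lossy: 7.3 -> 8.0
```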
Entropy encoding unit 206
The entropy encoding unit 206 entropy-encodes the quantized wavelet coefficients 205 to obtain the compressed code stream 217.
Specifically, when entropy-encoding each quantized wavelet coefficient 205, each subband may be processed according to the first preset order, and the quantized wavelet coefficients 205 within a subband are then entropy-encoded according to the second preset order to obtain the compressed code stream.
Entropy-encoding each quantized wavelet coefficient 205 (called a coefficient in this embodiment for convenience of description) includes: performing probability estimation on each coefficient to obtain the coefficient's probability distribution, and then entropy-encoding the coefficient according to that probability distribution.
The probability distribution of a coefficient may be determined as follows:
obtain the probability distribution model of the current coefficient for modeling, then input the context information of the current coefficient into a probability estimation network for processing to obtain the parameters of the probability distribution model, and substitute those parameters into the probability distribution model to obtain the probability distribution of the current coefficient.
Optionally, the probability distribution model may be a Gaussian single model (GSM), an asymmetric Gaussian model, a Gaussian mixture model (GMM), or a Laplace distribution model. The probability estimation network may be implemented based on deep learning networks, such as recurrent neural networks (RNN) and pixel convolutional neural networks (PixelCNN), without limitation.
As an example, when the probability distribution model is a Gaussian model (single Gaussian, asymmetric Gaussian, or Gaussian mixture), the context information of the current coefficient is input into the probability estimation network for processing to obtain the Gaussian model's parameters, including mean μ and variance σ; the mean μ and variance σ are input into the probability distribution model used, yielding the probability distribution of the current coefficient.
As an example, when the probability distribution model is the Laplace distribution model, the context information of the current coefficient is input into the probability estimation network for processing to obtain the Laplace model's parameters, including location parameter μ and scale parameter b; substituting the location parameter μ and scale parameter b into the probability distribution model yields the probability distribution of the current coefficient.
As an example, a typical PixelCNN-based probability estimation network is shown in FIG. 6d. "H×W" indicates that the current convolutional layer uses a kernel of size H×W, "ResB" denotes a residual block (see the right diagram of FIG. 6c), and "*/relu" indicates that the relu activation function is used after the current layer.
It should be noted that the context information of the current coefficient includes already-encoded coefficients within a preset region, where the preset region includes a region within the subband of the current coefficient or a region outside that subband, without limitation. Taking FIG. 6a as an example, when the current coefficient is in subband LL1, the encoded coefficients within some region of LL1 may serve as its context information; when the current coefficient is in subband HL1, the encoded coefficients within some region of LL1 or HL1 may serve as its context information. A sketch of such a context-conditioned parameter estimator is given below.
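The following PyTorch sketch illustrates, in the spirit of the PixelCNN approach mentioned above, how a masked convolution can map the causal context of a coefficient to Gaussian parameters (μ, σ). It is an assumed architecture for illustration — the layer sizes, the softplus used to keep σ positive, and the use of torch itself are not from the application, and a recent PyTorch version is assumed.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Convolution that only sees already-decoded (causal) neighbors."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kh, kw = self.kernel_size
        mask = torch.ones_like(self.weight)
        mask[:, :, kh // 2, kw // 2:] = 0   # current position and to its right
        mask[:, :, kh // 2 + 1:, :] = 0     # rows below
        self.register_buffer("mask", mask)

    def forward(self, x):
        return self._conv_forward(x, self.weight * self.mask, self.bias)

class ProbabilityEstimator(nn.Module):
    """Maps the context of each coefficient to Gaussian parameters (mu, sigma)."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            MaskedConv2d(1, hidden, 5, padding=2), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, 2, 1),  # channel 0: mu, channel 1: raw sigma
        )

    def forward(self, context):
        out = self.net(context)
        mu = out[:, 0:1]
        sigma = torch.nn.functional.softplus(out[:, 1:2])  # sigma > 0
        return mu, sigma

mu, sigma = ProbabilityEstimator()(torch.zeros(1, 1, 16, 16))
```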
Entropy decoding unit 208
The entropy decoding unit 208 entropy-decodes the compressed code stream 207 to obtain multiple quantized wavelet coefficients 209.
Corresponding to the entropy encoding process performed by the entropy encoding unit 206, when processing each wavelet coefficient in the compressed code stream 207, each subband may be processed according to the first preset order, and the code stream corresponding to the wavelet coefficients within the current subband is then entropy-decoded according to the second preset order to obtain the quantized wavelet coefficients 209. The first and second preset orders may be the same as on the encoder side, without limitation here.
When entropy-decoding the code stream corresponding to each wavelet coefficient, probability estimation is first performed on each wavelet coefficient to obtain the coefficient's probability distribution, and the code stream corresponding to that wavelet coefficient is then entropy-decoded according to the probability distribution to obtain the quantized wavelet coefficient 209. The method of performing probability estimation on each coefficient to obtain its probability distribution is the same as on the encoder side and is not repeated here.
Inverse quantization unit 210
The inverse quantization unit 210 inverse-quantizes the multiple quantized wavelet coefficients 209 to obtain multiple reconstructed wavelet coefficients 211.
Specifically, when inverse-quantizing each quantized wavelet coefficient 209, each subband may be processed according to the first preset order, and the quantized wavelet coefficients 209 within the current subband are then inverse-quantized according to the second preset order to obtain the reconstructed wavelet coefficients 211. Specifically, each quantized wavelet coefficient 209 is multiplied by the corresponding quantization step size to obtain the reconstructed wavelet coefficient 211, where the quantization step size may be QP. The first and second preset orders may be the same as on the encoder side, without limitation here.
It should be pointed out here that for the video decoder 30 the inverse quantization unit 210 is optional and is therefore shown with a dashed line in FIG. 5.
Probability estimation unit 212
For the probability estimation unit 212, the input data may be multiple quantized wavelet coefficients or multiple reconstructed wavelet coefficients; for convenience of description, the data input into the probability estimation unit 212 is called multiple coefficients. The first and second coefficients of the multiple coefficients are taken as examples to illustrate the functionality of the probability estimation unit 212.
The probability estimation unit 212 performs probability estimation according to the context information of the first coefficient to obtain a first probability distribution 213, and performs probability estimation according to the context information of the second coefficient and/or the estimated coefficients already obtained by sampling to obtain a second probability distribution 213, where the estimated coefficients already obtained by sampling include the first estimated coefficient, and the first estimated coefficient is obtained before the second estimated coefficient.
Specifically, the probability estimation unit 212 performing probability estimation according to the context information of the first coefficient to obtain the first probability distribution 213 includes:
obtaining the probability distribution model of the first coefficient; processing the context information of the first coefficient through a first probability estimation network to obtain the parameters of the probability distribution model; and obtaining the first probability distribution according to the probability distribution model and its parameters;
or,
processing the context information of the first coefficient through a second probability estimation network to obtain the first probability distribution;
where the first and second probability estimation networks are implemented based on neural networks, and the context information of the first coefficient includes some or all of the multiple coefficients and/or some or all of the estimated coefficients already obtained by sampling.
Further, when the data input into the probability estimation unit 212 is quantized wavelet coefficients, the context information of the first coefficient may include the quantized wavelet coefficients in a first region and the estimated coefficients in a second region, where the first region is any region within the subband of the first coefficient in the quantized-wavelet-coefficient map, and the second region is any region within the subband of the first coefficient in the estimated-coefficient map. When the data input into the probability estimation unit 212 is reconstructed wavelet coefficients, the context information of the first coefficient may include the reconstructed wavelet coefficients in a first region and the estimated coefficients in a second region, where the first region is any region within the subband of the first coefficient in the reconstructed-wavelet-coefficient map, and the second region is any region within the subband of the first coefficient in the estimated-coefficient map.
It should be understood that the quantized-wavelet-coefficient map is the image formed by the multiple quantized wavelet coefficients, the reconstructed-wavelet-coefficient map is the image formed by the multiple reconstructed wavelet coefficients, and the estimated-coefficient map is the image formed by the multiple estimated coefficients already obtained by sampling.
For the second region, using FIG. 6a for illustration: when the first coefficient is in subband LL1, the second region may be any region within LL1; when the first coefficient is in HL1, the second region may be any region within LL1 or HL1.
For the second coefficient, the second probability distribution may be determined in the above manner, or in the following manner:
the probability estimation unit 212 performs probability estimation according to the context information of the second coefficient and/or the estimated coefficients already obtained by sampling to obtain the second probability distribution 213, where the estimated coefficients already obtained by sampling include the first estimated coefficient; that is, when performing probability estimation to obtain the second probability distribution, the data input into the third or fourth probability estimation network below includes the first estimated coefficient.
Specifically: obtain the probability distribution model of the second coefficient; process the context information of the second coefficient and/or the estimated coefficients already obtained by sampling through a third probability estimation network to obtain the parameters of the probability distribution model; and obtain the second probability distribution according to the probability distribution model and its parameters;
or,
process the context information of the second coefficient through a fourth probability estimation network to obtain the second probability distribution;
where the third and fourth probability estimation networks are implemented based on neural networks, and the context information of the second coefficient includes some or all of the multiple coefficients and/or some or all of the estimated coefficients already obtained by sampling.
It should be pointed out that for the context information of the second coefficient, reference may be made to the description of the context information of the first coefficient, not repeated here.
For the probability distribution model, see the description in the entropy encoding unit 206, not repeated here. The network structures of the first, second, third, and fourth probability estimation networks may be as shown in FIG. 6d, without limitation here; other network structures are of course possible.
It should be pointed out that the first probability distribution and the second probability distribution are both output by the probability estimation unit 212 and are therefore marked with the same reference numeral 213.
Sampling unit 214
Sampling is performed according to the first probability distribution 213 to obtain a first estimated coefficient 215, and according to the second probability distribution 213 to obtain a second estimated coefficient 215. Since the two sampling processes are identical, the following takes the first probability distribution 213 being a Gaussian distribution as an example of how to sample according to the first probability distribution 213 to obtain the first estimated coefficient 215.
Obtain a random number z1 following the standard Gaussian distribution and let z2 = δ·z1 + μ; then z2 follows a Gaussian distribution with mean μ and standard deviation δ, and z2 is the above first estimated coefficient 215, where μ is the mean of the first probability distribution 213 and δ is its standard deviation (the square root of its variance).
Optionally, before sampling, the variance of the first probability distribution 213 is processed, the specific processing being: setting the variance of the first probability distribution 213 to 0 as the processed variance; sampling is then performed in the above sampling manner according to the processed variance and the mean of the first probability distribution 213 to obtain the first estimated coefficient 215.
Optionally, before sampling, the variance of the first probability distribution 213 is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above manner according to the processed variance and the mean of the first probability distribution 213 to obtain the first estimated coefficient 215.
Optionally, before sampling, the mean of the first probability distribution 213 is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above manner according to the processed mean and the variance of the first probability distribution 213 to obtain the first estimated coefficient 215.
In one example, when the first probability distribution 213 is a Laplace distribution, sampling according to the first probability distribution 213 to obtain the first estimated coefficient 215 includes:
generating two uniformly distributed random numbers u1 and u2, and letting z3 = b·log(u1) and z4 = b·log(u2); the first estimated coefficient is z5 = z3 − z4 + μ, where μ and b are respectively the location parameter and the scale parameter of the first probability distribution.
Optionally, before sampling, the scale parameter of the first probability distribution 213 is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above manner according to the processed scale parameter and the location parameter of the first probability distribution 213 to obtain the first estimated coefficient 215.
Optionally, before sampling, the location parameter of the first probability distribution 213 is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above manner according to the processed location parameter and the scale parameter of the first probability distribution 213 to obtain the first estimated coefficient 215. A sketch of this Laplace sampling follows.
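A minimal Python sketch of the Laplace sampling construction above (the clipping of the uniforms away from 0 and the optional scale-factor argument are assumptions of the example):

```python
import math
import random

def sample_laplace(mu, b, scale_factor=1.0):
    """Draw one estimated coefficient from Laplace(mu, b).

    Follows the construction in the text: two uniform random numbers u1, u2
    give z3 = b*log(u1), z4 = b*log(u2), and the sample is z3 - z4 + mu.
    The difference of two i.i.d. Exponential variables with mean b is
    Laplace(0, b). The optional scale_factor preprocesses b.
    """
    b = b * scale_factor
    u1 = max(random.random(), 1e-12)  # guard against log(0)
    u2 = max(random.random(), 1e-12)
    z3, z4 = b * math.log(u1), b * math.log(u2)
    return z3 - z4 + mu
```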
It should be understood that the second estimated coefficient 215 can be obtained from the second probability distribution 213 in the above manner.
Optionally, the scaling factor of the first coefficient and that of the second coefficient are the same; or they are different; or, if the first and second coefficients belong to the same subband, their scaling factors are the same, and if they belong to different subbands, their scaling factors are different — that is, coefficients belonging to the same subband share a scaling factor, and coefficients belonging to different subbands have different scaling factors.
By preprocessing the first probability distribution, reconstructed images with different properties can be obtained according to user requirements. For example, setting the variance of the first probability distribution to 0 as the processed variance yields the reconstructed image with the best signal quality (best objective quality), i.e., it increases the image's PSNR or reduces its MSE; setting the scaling factors of the multiple coefficients to the same value yields the image with the best subjective quality, i.e., it lowers the PSNR or increases the MSE; and using the same scaling factor within one part of the image while using different scaling factors across parts yields an image whose properties lie between best subjective quality and best objective quality.
It should be pointed out that the first estimated coefficient 215 and the second estimated coefficient 215 are both output by the sampling unit 214 and are therefore marked with the same reference numeral 215.
It should be pointed out that if the first coefficient is not the first of the multiple coefficients, the context information of the first coefficient also includes the estimated coefficients obtained before the first estimated coefficient.
Inverse wavelet transform unit 216
The inverse wavelet transform unit 216 performs an inverse wavelet transform on the multiple estimated coefficients (including the first and second estimated coefficients) to obtain the reconstructed image 217.
Corresponding to the encoder side, the inverse wavelet transform on the decoder side may use a traditional inverse wavelet transform, a deep-network-based inverse wavelet transform, or another similar transform method, without limitation here. Optionally, the flowchart of the deep-network-based inverse wavelet transform is shown in FIG. 6e. Taking a one-dimensional signal as an example, FIG. 6e describes the inverse flow: contrary to the forward transform of FIG. 6b, the approximation and detail components are first multiplied by the parameters 1/a and 1/b and then undergo mutual update and prediction steps, yielding two signals corresponding to the odd and even sample components of the original input signal; finally the two signals are merged to obtain the reconstructed signal. For the decomposition result LL1, HL1, LH1, HH1 of a two-dimensional image (see FIG. 6a), one one-dimensional inverse wavelet transform is first applied to LH1, HH1 in the column direction using the flow of FIG. 6e to obtain H; one is applied to LL1, HL1 in the column direction to obtain L; then one one-dimensional inverse wavelet transform is applied to L and H in the row direction using the flow of FIG. 6e to obtain the reconstructed image 217.
It should be understood that the multiple estimated coefficients form a two-dimensional image.
To obtain an efficient image codec model, the encoder 20 and the decoder 30 above need to be cascaded and jointly trained. The goal of training is to optimize the parameters of the deep network modules used in the coding process, including the deep-network-based forward and inverse wavelet transforms, the deep-network-based entropy coding, and the deep-neural-network-based probability estimation networks. FIG. 7 shows a block diagram of the joint training.
The loss function used comprises a rate term and a distortion term; in a form consistent with the description, it can be written as
L = -log q(c) + λ · || x − x̂ ||²
where the rate given by the deep-network-based entropy coding is the negative log-likelihood −log q(c) of the wavelet coefficients c (obtained by the forward wavelet transform) under the probability distribution q; the reconstructed sample image x̂, obtained by inverse transformation using the mean of q, is compared with the input sample image x through the mean squared error; and λ adjusts the relative importance of the rate and the reconstruction loss — different λ values produce different models for compressing images at different compression ratios.
Optionally, other loss functions may be used for the distortion term of the above loss function, e.g., the multi-scale structural similarity (MS-SSIM) between the reconstructed sample image and the sample image, deep feature losses, and so on.
It should be pointed out that because the forward wavelet transform of a sample image yields multiple wavelet coefficients, multiple probability distributions are obtained; the means of the multiple probability distributions and their variances are processed separately, e.g., by averaging, to obtain a mean average and a variance average, and the probability distribution given by these two parameters is the q(c) above.
The above training process is implemented by the training engine 50; it includes initialization training and joint training, where the initialization training process includes:
replacing the deep-network-based forward and inverse wavelet transforms with the forward and inverse transforms of the CDF9/7 wavelet, keeping everything else unchanged, to obtain an initialization codec model; after obtaining sample images, the training engine 50 trains the initialization codec model on the sample images until the loss value based on the above loss function converges; then, keeping the parameters of the probability estimation network and of the deep network used for entropy coding unchanged, the forward and inverse transforms of the CDF9/7 wavelet are replaced by the deep-network-based forward and inverse wavelet transforms to obtain a joint model; the training engine 50 trains the joint model on the sample images until the loss value based on the above loss function converges, at which point training is complete.
It should be pointed out that for the specific process of feeding a sample image into the model to obtain the compressed code stream and the reconstructed sample image, reference may be made to the specific processes performed by the encoder 20 and the decoder 30 above, not repeated here.
It should be noted that the deep networks used for the forward wavelet transform, the inverse wavelet transform, and entropy coding, and the probability estimation network used for probability estimation, are obtained from a third-party device after being trained on that device in the above training manner.
DCT-domain decoding method and decoder
FIG. 8a is a schematic block diagram of an example video decoder for implementing the techniques of this application. In the example of FIG. 8a, the video decoder 30 includes an entropy decoding unit 802, a probability estimation unit 806, a sampling unit 808, and an inverse transformation unit 810; optionally, the video decoder 30 further includes an inverse quantization unit 804. The video decoder shown in FIG. 8a may also be called an end-to-end video decoder or a video decoder based on an end-to-end video decoder.
First, encoding in the DCT domain is introduced.
Obtain the data to be encoded, which includes image blocks; specifically: divide the original image or an image region into image blocks of a preset size, where the preset size may be 4x4, 8x8, 16x16, 32x32, 64x64, 128x128, 256x256, and so on. As another implementable manner, the original image is divided into one or more image blocks whose size is not limited. The original image may be divided using the quadtree, binary-tree, or ternary-tree partitioning methods of existing coding standards (H.266, H.265, H.264, AVS2, or AVS3) to obtain one or more image blocks.
Perform DCT on the data to be encoded to obtain multiple quantized DCT coefficients. After the data to be encoded (i.e., an image block) undergoes the DCT, its low-frequency components are concentrated in the upper-left corner and the high-frequency components are distributed toward the lower-right corner; the coefficient value in the first row and first column is the direct current (DC) coefficient, i.e., the average value of the image block, and the other coefficients are alternating current (AC) coefficients. The AC and DC coefficients are quantized to obtain the quantized AC and DC coefficients, i.e., the multiple quantized DCT coefficients. A sketch of this block transform and quantization follows.
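The block DCT plus uniform quantization just described can be sketched in Python with SciPy's DCT routines; the orthonormal normalization and the single scalar QP are assumptions of the example, not choices fixed by the application:

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_quantize_block(block, qp):
    """2-D DCT-II on an 8x8 block plus uniform quantization.

    coeffs[0, 0] is the DC coefficient (the block average up to a scale
    factor); all other entries are AC coefficients.
    """
    coeffs = dctn(block.astype(np.float64), type=2, norm="ortho")
    return np.round(coeffs / qp).astype(np.int32)

def dequantize_idct_block(qcoeffs, qp):
    """Inverse quantization followed by the inverse DCT."""
    return idctn(qcoeffs.astype(np.float64) * qp, type=2, norm="ortho")

block = np.arange(64, dtype=np.float64).reshape(8, 8)
rec = dequantize_idct_block(dct_quantize_block(block, qp=4.0), qp=4.0)
```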
Entropy-encoding the multiple quantized DCT coefficients may use either of the following methods, without limitation here:
Method 1: entropy-encoding the multiple quantized DCT coefficients may use existing methods, e.g., Huffman coding in JPEG or CABAC coding in HEVC.
Method 2: first perform probability modeling on each quantized DCT coefficient to obtain a probability distribution model; then input the context information of the quantized DCT coefficient into a probability estimation network to estimate the parameters of the probability distribution model, substitute those parameters into the probability distribution model to obtain the probability distribution of the quantized DCT coefficient, and entropy-encode the quantized DCT coefficient according to that probability distribution. Entropy-encoding the multiple quantized DCT coefficients in this manner yields the compressed code stream. The context information of a quantized DCT coefficient includes some or all of the already-encoded quantized DCT coefficients.
The probability distribution model may be a single Gaussian model, an asymmetric Gaussian model, a Gaussian mixture model, a Laplace distribution model, or the like, without limitation here.
The probability estimation network may be based on deep learning networks, e.g., RNN and PixelCNN, without limitation here.
Entropy decoding unit 802
The entropy decoding unit 802 entropy-decodes the compressed code stream to obtain multiple quantized DCT coefficients.
The compressed code stream includes the code streams of multiple DCT coefficients. When entropy-decoding the code stream corresponding to each DCT coefficient, probability estimation is first performed on each DCT coefficient to obtain the coefficient's probability distribution, and the code stream corresponding to that DCT coefficient is then entropy-decoded according to the probability distribution to obtain the quantized DCT coefficient. The method of performing probability estimation on each DCT coefficient to obtain its probability distribution is the same as on the encoder side and is not repeated here.
Optionally, the Huffman decoding method of JPEG or the CABAC decoding method of HEVC may be used to decode the compressed code stream to obtain the multiple quantized DCT coefficients.
Inverse quantization unit 804
The inverse quantization unit 804 inverse-quantizes the multiple quantized DCT coefficients to obtain multiple reconstructed DCT coefficients.
Specifically, each quantized DCT coefficient is multiplied by the corresponding quantization step size to obtain the reconstructed DCT coefficient, where the quantization step size may be QP.
It should be pointed out here that for the video decoder 30 the inverse quantization unit 804 is optional and is therefore shown with a dashed line in FIG. 8a.
Probability estimation unit 806
For the probability estimation unit 806, the input data may be multiple quantized DCT coefficients or multiple reconstructed DCT coefficients; for convenience of description, the data input into the probability estimation unit 806 is called multiple coefficients. The first and second coefficients of the multiple coefficients are taken as examples to illustrate the functionality of the probability estimation unit 806.
The probability estimation unit 806 performs probability estimation according to the context information of the first coefficient to obtain a first probability distribution, and performs probability estimation according to the context information of the second coefficient and/or the estimated coefficients already obtained by sampling to obtain a second probability distribution, where the estimated coefficients already obtained by sampling include the first estimated coefficient, and the first estimated coefficient is obtained before the second estimated coefficient.
Specifically, the probability estimation unit 806 performing probability estimation according to the context information of the first coefficient to obtain the first probability distribution includes:
obtaining the probability distribution model of the first coefficient; processing the context information of the first coefficient through a fifth probability estimation network to obtain the parameters of the probability distribution model; and obtaining the first probability distribution according to the probability distribution model and its parameters;
or,
processing the context information of the first coefficient through a sixth probability estimation network to obtain the first probability distribution;
where the fifth and sixth probability estimation networks are implemented based on neural networks, and the context information of the first coefficient includes some or all of the multiple coefficients and/or some or all of the estimated coefficients already obtained by sampling.
Further, when the data input into the probability estimation unit 806 is quantized DCT coefficients, the context information of the first coefficient may include the quantized DCT coefficients in a third region and the estimated coefficients in a fourth region, where the third region is any region in the quantized-DCT-coefficient map; when the data input into the probability estimation unit 806 is reconstructed DCT coefficients, the context information of the first coefficient may include the reconstructed DCT coefficients in a third region and the estimated coefficients in a fourth region, where the third region is any region of the reconstructed-DCT-coefficient map and the fourth region is any region of the estimated-coefficient map.
It should be understood that the quantized-DCT-coefficient map is the image formed by the multiple quantized DCT coefficients, the reconstructed-DCT-coefficient map is the image formed by the multiple reconstructed DCT coefficients, and the estimated-coefficient map is the image formed by the multiple estimated coefficients already obtained by sampling.
For the second coefficient, the second probability distribution may be determined in the above manner, or in the following manner:
the probability estimation unit 806 performs probability estimation according to the context information of the second coefficient and/or the estimated coefficients already obtained by sampling to obtain the second probability distribution, where the estimated coefficients already obtained by sampling include the first estimated coefficient; that is, when performing probability estimation to obtain the second probability distribution, the data input into the seventh or eighth probability estimation network below includes the first estimated coefficient.
Specifically: obtain the probability distribution model of the second coefficient; process the context information of the second coefficient and/or the estimated coefficients already obtained by sampling through a seventh probability estimation network to obtain the parameters of the probability distribution model; and obtain the second probability distribution according to the probability distribution model and its parameters;
or,
process the context information of the second coefficient through an eighth probability estimation network to obtain the second probability distribution;
where the seventh and eighth probability estimation networks are implemented based on neural networks, and the context information of the second coefficient includes some or all of the multiple coefficients and/or some or all of the estimated coefficients already obtained by sampling.
It should be pointed out that for the context information of the second coefficient, reference may be made to the description of the context information of the first coefficient, not repeated here.
For the probability distribution model, see the description in the entropy encoding unit 206, not repeated here. The network structures of the fifth, sixth, seventh, and eighth probability estimation networks may be as shown in FIG. 6d, without limitation here; other network structures are of course possible.
Sampling unit 808
The sampling unit 808 samples according to the first probability distribution to obtain the first estimated coefficient, and samples according to the second probability distribution to obtain the second estimated coefficient. Since the two sampling processes are identical, the following takes the first probability distribution being a Gaussian distribution as an example of how to sample according to the first probability distribution to obtain the first estimated coefficient.
Obtain a random number z1 following the standard Gaussian distribution and let z2 = δ·z1 + μ; then z2 follows a Gaussian distribution with mean μ and standard deviation δ, and z2 is the above first estimated coefficient, where μ is the mean of the first probability distribution and δ is its standard deviation (the square root of its variance).
Optionally, before sampling, the variance of the first probability distribution is processed, the specific processing being: setting the variance of the first probability distribution to 0 as the processed variance; sampling is then performed in the above manner according to the processed variance and the mean of the first probability distribution to obtain the first estimated coefficient.
Optionally, before sampling, the variance of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above manner according to the processed variance and the mean of the first probability distribution to obtain the first estimated coefficient.
Optionally, before sampling, the mean of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above manner according to the processed mean and the variance of the first probability distribution to obtain the first estimated coefficient.
In one example, when the first probability distribution is a Laplace distribution, sampling according to the first probability distribution to obtain the first estimated coefficient includes:
generating two uniformly distributed random numbers u1 and u2, and letting z3 = b·log(u1) and z4 = b·log(u2); the first estimated coefficient is z5 = z3 − z4 + μ, where μ and b are respectively the location parameter and the scale parameter of the first probability distribution.
Optionally, before sampling, the scale parameter of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling then proceeds as above with the processed scale parameter and the location parameter of the first probability distribution.
Optionally, before sampling, the location parameter of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling then proceeds as above with the processed location parameter and the scale parameter of the first probability distribution.
It should be understood that the second estimated coefficient can be obtained from the second probability distribution in the above manner.
Optionally, the scaling factor of the first coefficient and that of the second coefficient are the same; or they are different; or, if the first and second coefficients belong to the same frequency band, their scaling factors are the same, and if they belong to different frequency bands, their scaling factors are different — that is, coefficients belonging to the same frequency band share a scaling factor, and coefficients belonging to different frequency bands have different scaling factors.
The value range of the scaling factor is [0, 1].
It should be pointed out that a frequency band can be understood as one coefficient block (a coefficient block obtained by applying the DCT to an image block, since the DCT operates block by block), or as the coefficients at the same position across the coefficient blocks, which together form one frequency band.
By preprocessing the first probability distribution, reconstructed images with different properties can be obtained according to user requirements. For example, setting the variance of the first probability distribution to 0 as the processed variance yields the reconstructed image with the best signal quality (best objective quality), i.e., it increases the image's PSNR or reduces its MSE; setting the scaling factors of the multiple coefficients to the same value yields the image with the best subjective quality, i.e., it lowers the PSNR or increases the MSE; and using the same scaling factor within one part of the image while using different scaling factors across parts yields an image whose properties lie between best subjective quality and best objective quality.
It should be pointed out that if the first coefficient is not the first of the multiple coefficients, the context information of the first coefficient also includes the estimated coefficients obtained before the first estimated coefficient.
Inverse transformation unit 810
The inverse transformation unit 810 performs an inverse DCT on the multiple estimated coefficients (including the first and second estimated coefficients) to obtain the reconstructed image.
Optionally, if the input data of the probability estimation unit 806 is multiple quantized DCT coefficients, the multiple estimated coefficients (including the first and second estimated coefficients) successively pass through the inverse quantization unit 804 and the inverse transformation unit 810 for inverse quantization and inverse DCT, to obtain the reconstructed image, as shown in FIG. 8b.
Feature-domain decoding method and decoder
FIG. 9 is a schematic block diagram of an example video decoder for implementing the techniques of this application. In the example of FIG. 9, the video decoder 30 includes an entropy decoding unit 902, a probability estimation unit 904, a sampling unit 906, and a reconstruction unit 908. The video decoder shown in FIG. 9 may also be called an end-to-end video decoder or a video decoder based on an end-to-end video decoder.
Entropy decoding unit 902
The entropy decoding unit 902 entropy-decodes the compressed code stream to obtain multiple reconstructed feature coefficients.
Specifically, the entropy decoding unit 902 entropy-decodes side information from the compressed code stream, and then performs probability estimation on each reconstructed feature coefficient based on the side information to obtain the probability distribution of each reconstructed feature coefficient. The entropy decoding unit 902 entropy-decodes the multiple reconstructed feature coefficients from the compressed code stream according to the probability distributions of the reconstructed feature coefficients. The multiple reconstructed feature coefficients may form a reconstructed feature map whose size can be expressed as CxWxH, where C generally denotes the number of channels and W and H are the width and height of each channel. A sketch of building such a per-symbol probability table is given below.
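To make the probability-driven entropy decoding concrete, the following Python sketch discretizes a Gaussian predicted for a coefficient into a probability table over integer symbols; an arithmetic/range decoder would consume the bitstream against the cumulative version of such a table. The symbol range and the half-integer binning are assumptions of the example.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def symbol_probabilities(mu, sigma, lo=-64, hi=64):
    """Probability table for integer coefficient symbols in [lo, hi].

    Each integer symbol s gets the Gaussian mass on [s - 0.5, s + 0.5],
    then the table is renormalized over the truncated range.
    """
    probs = [gaussian_cdf(s + 0.5, mu, sigma) - gaussian_cdf(s - 0.5, mu, sigma)
             for s in range(lo, hi + 1)]
    total = sum(probs)
    return [p / total for p in probs]
```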
Probability estimation unit 904
For the probability estimation unit 904, the input data may be multiple quantized feature coefficients or multiple reconstructed feature coefficients; for convenience of description, the data input into the probability estimation unit 904 is called multiple coefficients. The first and second coefficients of the multiple coefficients are taken as examples to illustrate the functionality of the probability estimation unit 904.
The probability estimation unit 904 performs probability estimation according to the context information of the first coefficient to obtain a first probability distribution, and performs probability estimation according to the context information of the second coefficient and/or the estimated coefficients already obtained by sampling to obtain a second probability distribution, where the estimated coefficients already obtained by sampling include the first estimated coefficient, and the first estimated coefficient is obtained before the second estimated coefficient.
Specifically, the probability estimation unit 904 performing probability estimation according to the context information of the first coefficient to obtain the first probability distribution includes:
obtaining the probability distribution model of the first coefficient; processing the context information of the first coefficient through a ninth probability estimation network to obtain the parameters of the probability distribution model; and obtaining the first probability distribution according to the probability distribution model and its parameters;
or,
processing the context information of the first coefficient through a tenth probability estimation network to obtain the first probability distribution;
where the ninth and tenth probability estimation networks are implemented based on neural networks, and the context information of the first coefficient includes some or all of the multiple coefficients and/or some or all of the estimated coefficients already obtained by sampling.
Further, when the data input into the probability estimation unit 904 is quantized feature coefficients, the context information of the first coefficient may include the quantized feature coefficients in a fifth region and the estimated coefficients in a sixth region, where the fifth region is any region in the quantized-feature-coefficient map and the sixth region is any region in the estimated-coefficient map.
It should be understood that the quantized-feature-coefficient map is the image formed by the multiple quantized feature coefficients, and the estimated-coefficient map is the image formed by the multiple estimated coefficients already obtained by sampling.
For the second coefficient, the second probability distribution may be determined in the above manner, or in the following manner:
the probability estimation unit 904 performs probability estimation according to the context information of the second coefficient and/or the estimated coefficients already obtained by sampling to obtain the second probability distribution, where the estimated coefficients already obtained by sampling include the first estimated coefficient; that is, when performing probability estimation to obtain the second probability distribution, the data input into the eleventh or twelfth probability estimation network below includes the first estimated coefficient.
Specifically: obtain the probability distribution model of the second coefficient; process the context information of the second coefficient and/or the estimated coefficients already obtained by sampling through an eleventh probability estimation network to obtain the parameters of the probability distribution model; and obtain the second probability distribution according to the probability distribution model and its parameters;
or,
process the context information of the second coefficient through a twelfth probability estimation network to obtain the second probability distribution;
where the eleventh and twelfth probability estimation networks are implemented based on neural networks, and the context information of the second coefficient includes some or all of the multiple coefficients and/or some or all of the estimated coefficients already obtained by sampling.
It should be pointed out that for the context information of the second coefficient, reference may be made to the description of the context information of the first coefficient, not repeated here.
For the probability distribution model, see the description in the entropy encoding unit 206, not repeated here. The network structures of the ninth, tenth, eleventh, and twelfth probability estimation networks may be as shown in FIG. 6d, without limitation here; other network structures are of course possible.
Sampling unit 906
The sampling unit 906 samples according to the first probability distribution to obtain the first estimated coefficient, and samples according to the second probability distribution to obtain the second estimated coefficient. Since the two sampling processes are identical, the following takes the first probability distribution being a Gaussian distribution as an example of how to sample according to the first probability distribution to obtain the first estimated coefficient.
Obtain a random number z1 following the standard Gaussian distribution and let z2 = δ·z1 + μ; then z2 follows a Gaussian distribution with mean μ and standard deviation δ, and z2 is the above first estimated coefficient, where μ is the mean of the first probability distribution and δ is its standard deviation (the square root of its variance).
Optionally, before sampling, the variance of the first probability distribution is processed, the specific processing being: setting the variance of the first probability distribution to 0 as the processed variance; sampling is then performed in the above manner according to the processed variance and the mean of the first probability distribution to obtain the first estimated coefficient.
Optionally, before sampling, the variance of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above manner according to the processed variance and the mean of the first probability distribution to obtain the first estimated coefficient.
Optionally, before sampling, the mean of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above manner according to the processed mean and the variance of the first probability distribution to obtain the first estimated coefficient.
In one example, when the first probability distribution is a Laplace distribution, sampling according to the first probability distribution to obtain the first estimated coefficient includes:
generating two uniformly distributed random numbers u1 and u2, and letting z3 = b·log(u1) and z4 = b·log(u2); the first estimated coefficient is z5 = z3 − z4 + μ, where μ and b are respectively the location parameter and the scale parameter of the first probability distribution.
Optionally, before sampling, the scale parameter of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling then proceeds as above with the processed scale parameter and the location parameter of the first probability distribution.
Optionally, before sampling, the location parameter of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling then proceeds as above with the processed location parameter and the scale parameter of the first probability distribution.
It should be understood that the second estimated coefficient can be obtained from the second probability distribution in the above manner.
Optionally, the scaling factor of the first coefficient and that of the second coefficient are the same; or they are different; or, if the first and second coefficients belong to the same channel, their scaling factors are the same, and if they belong to different channels, their scaling factors are different — that is, coefficients belonging to the same channel share a scaling factor, and coefficients belonging to different channels have different scaling factors.
The value range of the scaling factor is [0, 1].
By preprocessing the first probability distribution, reconstructed images with different properties can be obtained according to user requirements. For example, setting the variance of the first probability distribution to 0 as the processed variance yields the reconstructed image with the best signal quality (best objective quality), i.e., it increases the image's PSNR or reduces its MSE; setting the scaling factors of the multiple coefficients to the same value yields the image with the best subjective quality, i.e., it lowers the PSNR or increases the MSE; and using the same scaling factor within one part of the image while using different scaling factors across parts yields an image whose properties lie between best subjective quality and best objective quality.
It should be pointed out that if the first coefficient is not the first of the multiple coefficients, the context information of the first coefficient also includes the estimated coefficients obtained before the first estimated coefficient.
Following the above method, multiple estimated coefficients can be obtained, and these estimated coefficients form a reconstructed feature map. The reconstructed feature map can be fed into machine-vision task modules to perform corresponding machine tasks, e.g., object classification, recognition, and segmentation; it can also be input into the reconstruction unit 908.
If applied to feature map coding for multiple machine-vision tasks, different sampling methods can be used for different machine tasks, yielding multiple reconstructed feature maps with different properties; the reconstructed feature maps are fed into their respective machine-vision task modules to perform the corresponding machine tasks.
Reconstruction unit 908
The reconstruction unit 908 processes the reconstructed feature map to obtain the reconstructed image, i.e., it transforms the reconstructed image from the feature domain to the pixel domain.
The reconstruction unit 908 may be implemented based on a neural network of any structure, e.g., a fully connected network, a convolutional neural network, a recurrent neural network, and so on. The neural network may adopt a multi-layer deep neural network structure to achieve a better estimation effect. A sketch of one such reconstruction network follows.
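The following PyTorch sketch shows one plausible shape for such a feature-to-pixel reconstruction network; the channel count 192, the three transposed-convolution stages, and the 3-channel output are all assumptions of the example and not taken from the application.

```python
import torch
import torch.nn as nn

class ReconstructionNet(nn.Module):
    """Maps a CxWxH reconstructed feature map back to pixel space by
    progressively upsampling with transposed convolutions."""
    def __init__(self, channels=192):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(channels, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),  # 3-channel image
        )

    def forward(self, feature_map):
        return self.net(feature_map)

image = ReconstructionNet()(torch.zeros(1, 192, 16, 16))  # -> (1, 3, 128, 128)
```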
Pixel-domain decoding method and decoder
FIG. 10a is a schematic block diagram of an example video decoder for implementing the techniques of this application. In the example of FIG. 10a, the video decoder 30 includes a decoding unit 1002, a probability estimation unit 1004, and a sampling unit 1006; in another example, the video decoder 30 includes a decoding unit 1002, a probability estimation unit 1004, a sampling unit 1006, a transformation unit 1008, and an inverse transformation unit 1010, as shown in FIG. 10b. The video decoders shown in FIGS. 10a and 10b may also be called end-to-end video decoders or video decoders based on an end-to-end video decoder.
Decoding unit 1002
The decoding unit 1002 decodes the compressed code stream, e.g., JPEG decoding, to obtain an initial reconstructed image comprising multiple initial pixel reconstruction values.
Transformation unit 1008
The transformation unit 1008 transforms the initial reconstructed image, i.e., transforms the multiple initial pixel reconstruction values, to obtain multiple transformed pixel values.
Optionally, the transform methods used by the transformation unit 1008 include but are not limited to the wavelet transform, DCT, feature extraction, and so on.
Probability estimation unit 1004
For the probability estimation unit 1004, the input data may be multiple initial pixel reconstruction values or multiple transformed pixel values; for convenience of description, the data input into the probability estimation unit 1004 is called multiple coefficients. The first and second coefficients of the multiple coefficients are taken as examples to illustrate the functionality of the probability estimation unit 1004.
The probability estimation unit 1004 performs probability estimation according to the context information of the first coefficient to obtain a first probability distribution, and performs probability estimation according to the context information of the second coefficient and/or the estimated coefficients already obtained by sampling to obtain a second probability distribution, where the estimated coefficients already obtained by sampling include the first estimated coefficient, and the first estimated coefficient is obtained before the second estimated coefficient.
Specifically, the probability estimation unit 1004 performing probability estimation according to the context information of the first coefficient to obtain the first probability distribution includes:
obtaining the probability distribution model of the first coefficient; processing the context information of the first coefficient through a thirteenth probability estimation network to obtain the parameters of the probability distribution model; and obtaining the first probability distribution according to the probability distribution model and its parameters;
or,
processing the context information of the first coefficient through a fourteenth probability estimation network to obtain the first probability distribution;
where the thirteenth and fourteenth probability estimation networks are implemented based on neural networks, and the context information of the first coefficient includes some or all of the multiple coefficients and/or some or all of the estimated coefficients already obtained by sampling.
Further, when the data input into the probability estimation unit 1004 is multiple initial pixel reconstruction values, the context information of the first coefficient may include the initial pixel reconstruction values in a seventh region and the estimated coefficients in an eighth region, where the seventh region is any region in the initial reconstructed image; when the data input into the probability estimation unit 1004 is transformed pixel values, the context information of the first coefficient may include the transformed pixel values in a seventh region and the estimated coefficients in an eighth region, where the seventh region is any region within the transformed image obtained by transforming the initial reconstructed image, and the eighth region is any region in the estimated-coefficient map.
It should be understood that the estimated-coefficient map is the image formed by the multiple estimated coefficients already obtained by sampling.
For the second coefficient, the second probability distribution may be determined in the above manner, or in the following manner:
the probability estimation unit 1004 performs probability estimation according to the context information of the second coefficient and/or the estimated coefficients already obtained by sampling to obtain the second probability distribution, where the estimated coefficients already obtained by sampling include the first estimated coefficient; that is, when performing probability estimation to obtain the second probability distribution, the data input into the fifteenth or sixteenth probability estimation network below includes the first estimated coefficient.
Specifically: obtain the probability distribution model of the second coefficient; process the context information of the second coefficient and/or the estimated coefficients already obtained by sampling through a fifteenth probability estimation network to obtain the parameters of the probability distribution model; and obtain the second probability distribution according to the probability distribution model and its parameters;
or,
process the context information of the second coefficient through a sixteenth probability estimation network to obtain the second probability distribution;
where the fifteenth and sixteenth probability estimation networks are implemented based on neural networks, and the context information of the second coefficient includes some or all of the multiple coefficients and/or some or all of the estimated coefficients already obtained by sampling.
It should be pointed out that for the context information of the second coefficient, reference may be made to the description of the context information of the first coefficient, not repeated here.
For the probability distribution model, see the description in the entropy encoding unit 206, not repeated here. The network structures of the thirteenth, fourteenth, fifteenth, and sixteenth probability estimation networks may be as shown in FIG. 6d, without limitation here; other network structures are of course possible.
Sampling unit 1006
The sampling unit 1006 samples according to the first probability distribution to obtain the first estimated coefficient, and samples according to the second probability distribution to obtain the second estimated coefficient. Since the two sampling processes are identical, the following takes the first probability distribution being a Gaussian distribution as an example of how to sample according to the first probability distribution to obtain the first estimated coefficient.
Obtain a random number z1 following the standard Gaussian distribution and let z2 = δ·z1 + μ; then z2 follows a Gaussian distribution with mean μ and standard deviation δ, and z2 is the above first estimated coefficient, where μ is the mean of the first probability distribution and δ is its standard deviation (the square root of its variance).
Optionally, before sampling, the variance of the first probability distribution is processed, the specific processing being: setting the variance of the first probability distribution to 0 as the processed variance; sampling is then performed in the above manner according to the processed variance and the mean of the first probability distribution to obtain the first estimated coefficient.
Optionally, before sampling, the variance of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above manner according to the processed variance and the mean of the first probability distribution to obtain the first estimated coefficient.
Optionally, before sampling, the mean of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling is then performed in the above manner according to the processed mean and the variance of the first probability distribution to obtain the first estimated coefficient.
When the first probability distribution is a Laplace distribution, sampling according to the first probability distribution to obtain the first estimated coefficient includes:
generating two uniformly distributed random numbers u1 and u2, and letting z3 = b·log(u1) and z4 = b·log(u2); the first estimated coefficient is z5 = z3 − z4 + μ, where μ and b are respectively the location parameter and the scale parameter of the first probability distribution.
Optionally, before sampling, the scale parameter of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling then proceeds as above with the processed scale parameter and the location parameter of the first probability distribution.
Optionally, before sampling, the location parameter of the first probability distribution is processed according to the scaling factor of the first coefficient, and sampling then proceeds as above with the processed location parameter and the scale parameter of the first probability distribution.
It should be understood that the second estimated coefficient can be obtained from the second probability distribution in the above manner.
Optionally, the scaling factor of the first coefficient and that of the second coefficient are the same; or they are different; or, when the above transform is the DCT, if the first and second coefficients belong to the same frequency band, their scaling factors are the same, and if they belong to different frequency bands, their scaling factors are different — that is, coefficients belonging to the same frequency band share a scaling factor, and coefficients belonging to different frequency bands have different scaling factors;
or, when the above transform is the wavelet transform, if the first and second coefficients belong to the same subband, their scaling factors are the same, and if they belong to different subbands, their scaling factors are different — that is, coefficients belonging to the same subband share a scaling factor, and coefficients belonging to different subbands have different scaling factors;
or, when the above transform is feature extraction, if the first and second coefficients belong to the same channel, their scaling factors are the same, and if they belong to different channels, their scaling factors are different — that is, coefficients belonging to the same channel share a scaling factor, and coefficients belonging to different channels have different scaling factors.
The value range of the scaling factor is [0, 1].
By preprocessing the first probability distribution, reconstructed images with different properties can be obtained according to user requirements. For example, setting the variance of the first probability distribution to 0 as the processed variance yields the reconstructed image with the best signal quality (best objective quality), i.e., it increases the image's PSNR or reduces its MSE; setting the scaling factors of the multiple coefficients to the same value yields the image with the best subjective quality, i.e., it lowers the PSNR or increases the MSE; and using the same scaling factor within one part of the image while using different scaling factors across parts yields an image whose properties lie between best subjective quality and best objective quality.
It should be pointed out that if the first coefficient is not the first of the multiple coefficients, the context information of the first coefficient also includes the estimated coefficients obtained before the first estimated coefficient.
In the above manner, multiple estimated coefficients can be obtained. If the input to the probability estimation unit 1004 was multiple initial pixel reconstruction values, the multiple estimated coefficients are multiple reconstructed pixel values, which form the reconstructed image; if the input to the probability estimation unit 1004 was multiple transformed pixel values, the multiple estimated coefficients are multiple transformed-pixel reconstruction values, which are input into the inverse transformation unit 1010.
Inverse transformation unit 1010
The inverse transformation unit 1010 inversely transforms the multiple transformed-pixel reconstruction values to obtain multiple reconstructed pixel values, which form the reconstructed image.
It should be pointed out that, as can be seen from the above description, the actions performed by the transformation unit 1008, the probability estimation unit 1004, the sampling unit 1006, and the inverse transformation unit 1010 are all based on the decoding result of the decoding unit 1002. The solution of this embodiment can therefore be regarded as being implemented by a common decoder plus an auxiliary decoding device, where the common decoder realizes the function of the decoding unit 1002 and the auxiliary decoding device realizes the functions of the transformation unit 1008, the probability estimation unit 1004, the sampling unit 1006, and the inverse transformation unit 1010.
It can be seen that with the solution of this application, in each pass of decoding the compressed code stream, probability estimation is performed on the decoded coefficients, sampling is performed based on the probability estimation results to obtain estimated coefficients, and the reconstructed image is obtained from the sampled estimated coefficients. Because the sampling process is random and non-deterministic, decoding the same compressed code stream multiple times in this manner yields multiple high-quality images with different properties, for example an image with the best subjective quality and an image with the best objective quality.
FIG. 11 is a flowchart of a process 1100 of a decoding method according to an embodiment of this application. The process 1100 may be performed by the video decoder 30. The process 1100 is described as a series of steps or operations; it should be understood that the process 1100 may be performed in various orders and/or concurrently, and is not limited to the execution order shown in FIG. 11.
As shown in FIG. 11, the decoding method includes:
S1101. Obtain multiple coefficients according to the compressed code stream of the data to be decoded, the multiple coefficients including a first coefficient.
Optionally, the data to be decoded may be an image, an image block, a slice, or any region of an image.
In a possible design, the multiple coefficients further include a second coefficient, and the method of this application further includes:
performing probability estimation according to context information of the second coefficient and/or estimated coefficients already obtained by sampling, to obtain a second probability distribution, where the estimated coefficients already obtained by sampling include the first estimated coefficient; and sampling according to the second probability distribution to obtain a second estimated coefficient. Obtaining the reconstructed image according to the first estimated coefficient then includes: obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient.
It should be noted here that the second estimated coefficient is obtained after the first estimated coefficient.
In a possible design, obtaining multiple coefficients according to the compressed code stream of the data to be decoded includes:
entropy decoding the compressed code stream to obtain the multiple coefficients, the multiple coefficients being multiple quantized wavelet coefficients; or,
entropy decoding the compressed code stream to obtain multiple quantized wavelet coefficients, and inverse quantizing the multiple quantized wavelet coefficients to obtain the multiple coefficients, the multiple coefficients being multiple reconstructed wavelet coefficients.
In a possible design, obtaining multiple coefficients according to the compressed code stream of the data to be decoded includes:
entropy decoding the compressed code stream to obtain the multiple coefficients, the multiple coefficients being multiple quantized discrete cosine transform (DCT) coefficients; or,
entropy decoding the compressed code stream to obtain multiple quantized DCT coefficients, and inverse quantizing the multiple quantized DCT coefficients to obtain the multiple coefficients, the multiple coefficients being multiple reconstructed DCT coefficients.
In a possible design, obtaining multiple coefficients according to the compressed code stream of the data to be decoded includes:
entropy decoding the compressed code stream to obtain the multiple coefficients, the multiple coefficients being multiple feature coefficients.
In a possible design, obtaining multiple coefficients according to the compressed code stream of the data to be decoded includes:
decoding the compressed code stream to obtain the multiple coefficients, the multiple coefficients being multiple initial pixel reconstruction values; or,
decoding the compressed code stream to obtain multiple initial pixel reconstruction values, and transforming the multiple initial pixel reconstruction values to obtain the multiple coefficients, the multiple coefficients being multiple transformed pixel values.
Obtaining the multiple coefficients in these different manners allows the decoding method of this application to be applied to different decoding scenarios, for example the wavelet domain, the feature domain, the DCT domain, and the pixel domain.
S1102. Perform probability estimation according to the context information of the first coefficient to obtain a first probability distribution.
In a possible design, performing probability estimation according to the context information of the first coefficient to obtain the first probability distribution includes:
obtaining a probability distribution model of the first coefficient; processing the context information of the first coefficient through a first probability estimation network to obtain parameters of the probability distribution model; and obtaining the first probability distribution according to the probability distribution model and its parameters;
or,
processing the context information of the first coefficient through a second probability estimation network to obtain the first probability distribution;
where the first and second probability estimation networks are implemented based on neural networks, and the context information of the first coefficient includes some or all of the multiple coefficients and/or some or all of the estimated coefficients already obtained by sampling.
Optionally, the probability distribution model may be a Gaussian model, a Laplace model, a Gaussian mixture model, or another model. When the probability distribution model is a Gaussian model, its parameters include a mean and a variance; when it is a Laplace model, its parameters include a location parameter and a scale parameter.
Optionally, the neural network may be a convolutional neural network, a deep neural network, a recurrent neural network, or another neural network.
Optionally, the first and second probability estimation networks differ in both structure and parameters, or have the same structure but different parameters.
The first probability distribution can be obtained in the above manner, in preparation for subsequent sampling based on the first probability distribution.
S1103. Sample according to the first probability distribution to obtain a first estimated coefficient.
In a possible design, the first probability distribution is a Gaussian distribution, and sampling according to the first probability distribution to obtain the first estimated coefficient includes:
obtaining a first random number; determining a first reference value according to the first random number, the first reference value following a Gaussian distribution; and determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution.
The first random number is a uniformly distributed random number on [0, 1] generated using the linear congruential method.
Optionally, the first reference value may follow a standard Gaussian distribution, an ordinary Gaussian distribution, an asymmetric Gaussian distribution, a single Gaussian model, a Gaussian mixture model, or another Gaussian distribution.
Because the first random number is random, the sampled first estimated coefficient is also random, so the reconstructed image obtained based on the first estimated coefficient is likewise random, that is, non-deterministic. The sampling process is a random, non-deterministic process; the multiple reconstructed images obtained from estimated coefficients sampled multiple times in this manner have different properties.
In a possible design, the method of this application further includes:
preprocessing the variance of the first probability distribution to obtain a processed variance;
determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution then includes:
determining the first estimated coefficient according to the first reference value, the mean of the first probability distribution, and the processed variance.
In a possible design, the method of this application further includes: preprocessing the mean of the first probability distribution according to the scaling factor of the first coefficient, to obtain a processed mean;
determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution then includes:
determining the first estimated coefficient according to the first reference value, the variance of the first probability distribution, and the processed mean.
In a possible design, preprocessing the variance of the first probability distribution to obtain the processed variance includes:
setting the variance of the first probability distribution to 0 as the processed variance.
In a possible design, when the multiple coefficients are multiple quantized wavelet coefficients, multiple reconstructed wavelet coefficients, multiple quantized DCT coefficients, multiple reconstructed DCT coefficients, or multiple feature coefficients, preprocessing the variance of the first probability distribution to obtain the processed variance includes:
preprocessing the variance of the first probability distribution according to the scaling factor of the first coefficient, to obtain the processed variance, where
the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same; or,
the scaling factor of the first coefficient and the scaling factor of the second coefficient are different; or
when the multiple coefficients are multiple quantized or reconstructed wavelet coefficients, if the first and second coefficients belong to the same subband, their scaling factors are the same, and if they belong to different subbands, their scaling factors are different;
or,
when the multiple coefficients are multiple quantized or reconstructed DCT coefficients, if the first and second coefficients belong to the same frequency band, their scaling factors are the same, and if they belong to different frequency bands, their scaling factors are different;
or,
when the multiple coefficients are multiple quantized feature coefficients or multiple reconstructed feature coefficients, if the first and second coefficients belong to the same channel, their scaling factors are the same, and if they belong to different channels, their scaling factors are different.
In a possible design, when the multiple coefficients are multiple initial pixel reconstruction values or multiple transformed pixel values, preprocessing the variance of the first probability distribution to obtain the processed variance includes:
preprocessing the variance of the first probability distribution according to the scaling factor of the first coefficient, to obtain the processed variance,
where the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same, or are different.
S1104. Obtain the reconstructed image according to the first estimated coefficient.
In a possible design, when the multiple coefficients are multiple quantized wavelet coefficients or multiple reconstructed wavelet coefficients, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient includes:
performing an inverse wavelet transform on the first estimated coefficient and the second estimated coefficient, to obtain the reconstructed image.
In a possible design, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient includes:
when the multiple coefficients are multiple quantized DCT coefficients, performing inverse quantization and an inverse DCT on the first estimated coefficient and the second estimated coefficient, to obtain the reconstructed image; or,
when the multiple coefficients are multiple reconstructed DCT coefficients, performing an inverse DCT on the first estimated coefficient and the second estimated coefficient, to obtain the reconstructed image.
In a possible design, when the multiple coefficients are multiple transformed pixel values, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient includes:
performing an inverse transform on the first estimated coefficient and the second estimated coefficient, to obtain the reconstructed image.
It can be seen that with the solution of this application, in each pass of decoding the compressed code stream, probability estimation is performed on the decoded coefficients, sampling is performed based on the probability estimation results to obtain estimated coefficients, and the reconstructed image is obtained from the sampled estimated coefficients. Because the sampling process is random and non-deterministic, decoding the same compressed code stream multiple times in this manner yields multiple high-quality images with different properties, for example an image with the best subjective quality and an image with the best objective quality.
A person skilled in the art will appreciate that the functions described with reference to the various illustrative logical blocks, modules, and algorithm steps disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions described by the various illustrative logical blocks, modules, and steps may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, computer-readable media may generally correspond to (1) non-transitory tangible computer-readable storage media, or (2) communication media such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this application. A computer program product may include a computer-readable medium.
By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described with reference to the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this application to emphasize functional aspects of devices for performing the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperating hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
The foregoing descriptions are merely exemplary specific implementations of this application, and the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (19)
- A video image decoding method, comprising: obtaining multiple coefficients according to a compressed code stream of data to be decoded, wherein the multiple coefficients include a first coefficient; performing probability estimation according to context information of the first coefficient to obtain a first probability distribution; sampling according to the first probability distribution to obtain a first estimated coefficient; and obtaining a reconstructed image according to the first estimated coefficient.
- The method according to claim 1, wherein the multiple coefficients further include a second coefficient, and the method further comprises: performing probability estimation according to context information of the second coefficient and/or estimated coefficients already obtained by sampling to obtain a second probability distribution, wherein the estimated coefficients already obtained by sampling include the first estimated coefficient; and sampling according to the second probability distribution to obtain a second estimated coefficient; and wherein obtaining the reconstructed image according to the first estimated coefficient comprises: obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient.
- The method according to claim 1 or 2, wherein obtaining multiple coefficients according to the compressed code stream of the data to be decoded comprises: entropy decoding the compressed code stream to obtain the multiple coefficients, the multiple coefficients being multiple quantized wavelet coefficients; or entropy decoding the compressed code stream to obtain multiple quantized wavelet coefficients, and inverse quantizing the multiple quantized wavelet coefficients to obtain the multiple coefficients, the multiple coefficients being multiple reconstructed wavelet coefficients.
- The method according to claim 1 or 2, wherein obtaining multiple coefficients according to the compressed code stream of the data to be decoded comprises: entropy decoding the compressed code stream to obtain the multiple coefficients, the multiple coefficients being multiple quantized discrete cosine transform (DCT) coefficients; or entropy decoding the compressed code stream to obtain the multiple quantized DCT coefficients, and inverse quantizing the multiple quantized DCT coefficients to obtain the multiple coefficients, the multiple coefficients being multiple reconstructed DCT coefficients.
- The method according to claim 1 or 2, wherein obtaining multiple coefficients according to the compressed code stream of the data to be decoded comprises: entropy decoding the compressed code stream to obtain the multiple coefficients, the multiple coefficients being multiple feature coefficients.
- The method according to claim 1 or 2, wherein obtaining multiple coefficients according to the compressed code stream of the data to be decoded comprises: decoding the compressed code stream to obtain the multiple coefficients, the multiple coefficients being multiple initial pixel reconstruction values; or decoding the compressed code stream to obtain multiple initial pixel reconstruction values, and transforming the multiple initial pixel reconstruction values to obtain the multiple coefficients, the multiple coefficients being multiple transformed pixel values.
- The method according to any one of claims 1-6, wherein performing probability estimation according to the context information of the first coefficient to obtain the first probability distribution comprises: obtaining a probability distribution model of the first coefficient; processing the context information of the first coefficient through a first probability estimation network to obtain parameters of the probability distribution model; and obtaining the first probability distribution according to the probability distribution model and the parameters of the probability distribution model; or processing the context information of the first coefficient through a second probability estimation network to obtain the first probability distribution; wherein the first probability estimation network and the second probability estimation network are implemented based on neural networks, and the context information of the first coefficient includes some or all of the multiple coefficients and/or some or all of the estimated coefficients already obtained by sampling.
- The method according to any one of claims 1-7, wherein the first probability distribution is a Gaussian distribution, and sampling according to the first probability distribution to obtain the first estimated coefficient comprises: obtaining a first random number; determining a first reference value according to the first random number, the first reference value following a Gaussian distribution; and determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution.
- The method according to claim 8, further comprising: preprocessing the variance of the first probability distribution to obtain a processed variance; wherein determining the first estimated coefficient according to the first reference value and the mean and variance of the first probability distribution comprises: determining the first estimated coefficient according to the first reference value, the mean of the first probability distribution, and the processed variance.
- The method according to claim 9, wherein preprocessing the variance of the first probability distribution to obtain the processed variance comprises: setting the variance of the first probability distribution to 0 as the processed variance.
- The method according to claim 9, wherein when the multiple coefficients are the multiple quantized wavelet coefficients, the multiple reconstructed wavelet coefficients, multiple quantized DCT coefficients, the multiple reconstructed DCT coefficients, or the multiple feature coefficients, preprocessing the variance of the first probability distribution to obtain the processed variance comprises: preprocessing the variance of the first probability distribution according to the scaling factor of the first coefficient, to obtain the processed variance; and the method further comprises: preprocessing the variance of the second probability distribution according to the scaling factor of the second coefficient, wherein the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same; or the scaling factor of the first coefficient and the scaling factor of the second coefficient are different; or, when the multiple coefficients are the multiple quantized wavelet coefficients or the multiple reconstructed wavelet coefficients, if the first coefficient and the second coefficient belong to the same subband, the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same, or if the first coefficient and the second coefficient belong to different subbands, the scaling factor of the first coefficient and the scaling factor of the second coefficient are different; or, when the multiple coefficients are the multiple quantized DCT coefficients or the multiple reconstructed DCT coefficients, if the first coefficient and the second coefficient belong to the same frequency band, the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same, or if the first coefficient and the second coefficient belong to different frequency bands, the scaling factor of the first coefficient and the scaling factor of the second coefficient are different; or, when the multiple coefficients are multiple quantized feature coefficients or the multiple reconstructed feature coefficients, if the first coefficient and the second coefficient belong to the same channel, the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same, or if the first coefficient and the second coefficient belong to different channels, the scaling factor of the first coefficient and the scaling factor of the second coefficient are different.
- The method according to claim 9, wherein when the multiple coefficients are multiple initial pixel reconstruction values or the multiple transformed pixel values, preprocessing the variance of the first probability distribution to obtain the processed variance comprises: preprocessing the variance of the first probability distribution according to the scaling factor of the first coefficient, to obtain the processed variance, wherein the scaling factor of the first coefficient and the scaling factor of the second coefficient are the same, or the scaling factor of the first coefficient and the scaling factor of the second coefficient are different.
- The method according to any one of claims 3-12, wherein when the multiple coefficients are the multiple quantized wavelet coefficients or the multiple reconstructed wavelet coefficients, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient comprises: performing an inverse wavelet transform on the first estimated coefficient and the second estimated coefficient to obtain the reconstructed image.
- The method according to any one of claims 3-12, wherein obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient comprises: when the multiple coefficients are the multiple quantized DCT coefficients, performing inverse quantization and an inverse DCT on the first estimated coefficient and the second estimated coefficient to obtain the reconstructed image; or, when the multiple coefficients are the multiple reconstructed DCT coefficients, performing an inverse DCT on the first estimated coefficient and the second estimated coefficient to obtain the reconstructed image.
- The method according to any one of claims 3-12, wherein when the multiple coefficients are the multiple transformed pixel values, obtaining the reconstructed image according to the first estimated coefficient and the second estimated coefficient comprises: performing an inverse transform on the first estimated coefficient and the second estimated coefficient to obtain the reconstructed image.
- A decoder, comprising processing circuitry for performing the method according to any one of claims 1-15.
- A computer program product, comprising program code that, when executed on a computer or processor, performs the method according to any one of claims 1-15.
- A decoder, comprising: one or more processors; and a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, causes the decoder to perform the method according to any one of claims 1-15.
- A non-transitory computer-readable storage medium, comprising program code that, when executed by a computer device, performs the method according to any one of claims 1-15.