CN110300301B - Image coding and decoding method and device - Google Patents
- Publication number
- CN110300301B (application CN201810242304.7A)
- Authority
- CN
- China
- Prior art keywords
- filter
- cnn
- upsampling
- image block
- sampling
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/46—Embedding additional information in the video signal during the compression process
-
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
-
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H04N19/124—Quantisation
-
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
-
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
-
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
-
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
- H04N19/82—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The application provides an image coding and decoding method and device. The image encoding method includes: determining a target up-sampling filter from a preset up-sampling filter set according to the coding cost of an image block to be coded, wherein the up-sampling filter set at least comprises a Finite Impulse Response (FIR) up-sampling filter and a Convolutional Neural Network (CNN) up-sampling filter; generating up-sampling filter indication information corresponding to the target up-sampling filter; adopting a preset FIR down-sampling filter to perform down-sampling on the image block to be encoded to obtain a first image block; coding the first image block to obtain a code stream; and writing the indication information of the up-sampling filter into the code stream. The coding effect can be improved.
Description
Technical Field
The present application relates to the field of image encoding and decoding technologies, and in particular, to an image encoding and decoding method and apparatus.
Background
In the process of storing and transmitting images, the images are often required to be compressed and encoded, so as to reduce the storage capacity and transmission bandwidth occupied by the image code stream. In addition, in consideration of the cost of software and hardware implementation in the image encoding process, when an image is encoded, an image to be encoded is usually divided into a plurality of image blocks, and then each image block is encoded, where a typical image encoding process is shown in fig. 1.
The encoding process shown in fig. 1 mainly comprises the following steps:
101. acquiring an input image, and dividing the input image into image blocks;
102. predicting the image block to obtain a prediction signal;
103. obtaining an original residual signal according to the image block and the prediction signal;
104. transforming and quantizing the original residual signal to obtain a quantization coefficient;
105. carrying out inverse quantization and inverse transformation operations on the quantized coefficients to obtain reconstructed residual signals;
106. obtaining a reconstructed signal of the image block according to the reconstructed residual signal and the prediction signal;
107. and entropy coding the quantization coefficient and other indication information in the coding process to obtain a compressed code stream.
Fig. 2 illustrates a decoding process of an image, corresponding to the encoding process illustrated in fig. 1. The decoding process shown in fig. 2 mainly comprises the following steps:
201. entropy decoding is carried out on the compressed code stream to obtain a quantization coefficient;
202. carrying out inverse quantization and inverse transformation on the quantized coefficients to obtain a reconstructed residual signal of the current image block;
203. acquiring a prediction signal of an image block to be decoded;
204. obtaining a reconstructed signal of the current image block according to the reconstructed residual signal and the prediction signal;
205. and outputting the decoded image.
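The encoding steps 101 to 107 and the decoding steps 201 to 205 can be illustrated with a minimal sketch. It is not the codec of any standard: the fixed mid-grey predictor, the omitted transform, the omitted entropy coding, and the quantization step `QSTEP` are all simplifying assumptions made only for illustration.

```python
# Toy block codec mirroring steps 101-107 (encode) and 201-205 (decode).
# Assumptions: fixed mid-grey prediction, no transform, no entropy coding,
# and a uniform scalar quantizer with a hypothetical step size QSTEP.

QSTEP = 8


def predict(height, width):
    """Hypothetical predictor: both ends predict every sample as 128."""
    return [[128] * width for _ in range(height)]


def encode_block(block):
    """Prediction, residual computation, and quantization (steps 102-104)."""
    h, w = len(block), len(block[0])
    pred = predict(h, w)
    residual = [[block[y][x] - pred[y][x] for x in range(w)] for y in range(h)]
    # Quantize the residual directly; a real encoder would transform it first.
    return [[round(r / QSTEP) for r in row] for row in residual]


def decode_block(coeffs):
    """Inverse quantization and reconstruction (steps 202-204)."""
    h, w = len(coeffs), len(coeffs[0])
    pred = predict(h, w)
    recon_residual = [[c * QSTEP for c in row] for row in coeffs]
    # Reconstructed signal = prediction + reconstructed residual, clipped to 8 bits.
    return [
        [min(255, max(0, pred[y][x] + recon_residual[y][x])) for x in range(w)]
        for y in range(h)
    ]
```

In this sketch the reconstruction error of each sample is bounded by half the quantization step, which is the loss introduced by step 104.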
In some application scenarios, for example, in a video conference, an online live broadcast, and the like, in order to improve the encoding quality of an image and reduce the bandwidth occupied by a compressed code stream, an image block may be encoded in a variable resolution mode.
Specifically, the encoding end performs downsampling operation on an image block according to a downsampling filter to obtain a low-resolution image block, then performs prediction, transformation and quantization on the low-resolution image block to obtain a quantization coefficient, and then performs entropy encoding on the quantization coefficient to obtain a compressed code stream. In addition, the encoding end also needs to perform inverse quantization and inverse transformation on the quantization coefficient to obtain a reconstructed residual signal, obtain an initial reconstructed image block according to the reconstructed residual signal and a prediction signal of the current image block to be decoded, and finally perform an upsampling operation on the initial reconstructed image block to obtain a final reconstructed image, wherein the reconstructed image can be used as a reference image when a subsequent image is encoded.
Correspondingly, the decoding end performs a decoding operation on the compressed code stream to obtain a low-resolution reconstructed image, and then performs up-sampling on the low-resolution reconstructed image according to the up-sampling filter, so as to obtain a reconstructed image with the same resolution as the original image block.
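The resolution change in the variable resolution mode can be sketched as follows. The 2x2 averaging filter and the sample-repetition upsampler are assumed stand-ins for the preset down-sampling and up-sampling filters, and the factor of 2 is likewise an assumption; the source does not fix any of them.

```python
def downsample_2x(block):
    """Halve the resolution with a 2x2 averaging filter (a very simple FIR filter).

    Assumes the block height and width are even.
    """
    h, w = len(block), len(block[0])
    return [
        [
            (block[y][x] + block[y][x + 1] + block[y + 1][x] + block[y + 1][x + 1] + 2) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def upsample_2x(block):
    """Double the resolution by repeating each sample horizontally and vertically."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out
```

The low-resolution block produced by `downsample_2x` is what would be predicted, transformed, quantized and entropy coded; `upsample_2x` restores the original block size at both the encoding end and the decoding end.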
However, in the conventional scheme, both the upsampling filter and the downsampling filter used for performing variable resolution coding on the image block are preset Finite Impulse Response (FIR) filters, and the coding effect is not ideal.
Disclosure of Invention
The application provides an image coding and decoding method for improving coding and decoding effects.
In a first aspect, an image encoding method is provided, and the method includes: determining a target up-sampling filter from a preset up-sampling filter set; generating up-sampling filter indication information corresponding to a target up-sampling filter; downsampling an image block to be encoded to obtain a first image block; coding the first image block to obtain a code stream; and writing the indication information of the up-sampling filter into the code stream.
The upsampling filter set at least includes an FIR upsampling filter and a Convolutional Neural Network (CNN) upsampling filter, where the FIR upsampling filter may be a bicubic upsampling filter, and the CNN upsampling filter may be an upsampling filter formed by a convolutional neural network.
The upsampling filter indication information may be used to indicate that a certain upsampling filter in the upsampling filter set is a target upsampling filter. For example, the upsampling filter indication information may indicate a FIR upsampling filter of the upsampling filter set as the target upsampling filter, or the upsampling filter indication information may also indicate a CNN upsampling filter of the upsampling filter set as the target upsampling filter.
Optionally, the upsampling filter indication information is specifically an upsampling filter selection Flag2, and a value of the Flag2 is used to indicate the target upsampling filter.
For example, when the FIR filter is determined to be the target upsampling filter, the value of Flag2 is 0; and when the CNN filter is determined to be the target upsampling filter, the value of Flag2 is 1. Or when the FIR filter is determined to be the target upsampling filter, the value of Flag2 is 1; and when the CNN filter is determined to be the target upsampling filter, the value of Flag2 is 0.
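A minimal sketch of signalling Flag2 is given below, using the first mapping from the text (FIR as 0, CNN as 1). The list-of-bits code stream and the function names are hypothetical and serve only to show that the encoding end and the decoding end must agree on the mapping.

```python
FIR, CNN = "FIR", "CNN"

# First mapping described in the text; the reverse mapping would work equally
# well as long as both ends use the same one.
FLAG2_VALUE = {FIR: 0, CNN: 1}
FLAG2_FILTER = {value: name for name, value in FLAG2_VALUE.items()}


def write_flag2(bits, target_filter):
    """Append the upsampling filter selection flag to a list-of-bits code stream."""
    bits.append(FLAG2_VALUE[target_filter])


def read_flag2(bits, pos):
    """Read the flag at position pos; return the filter name and the next position."""
    return FLAG2_FILTER[bits[pos]], pos + 1
```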
It should be appreciated that determining the target upsampling filter from the preset upsampling filter set and determining the target downsampling filter from the downsampling filter set are two separate processes. There is no required order between the two determinations: the target up-sampling filter may be determined first, the target down-sampling filter may be determined first, or the two may be determined simultaneously.
Optionally, encoding the first image block to obtain a code stream, including: and predicting, transforming, quantizing and entropy coding the first image block to obtain a code stream.
Specifically, a prediction signal can be obtained by predicting the first image block. After the prediction signal is obtained, it can be subtracted from the original signal of the first image block to obtain an original residual signal; the original residual signal is then transformed and quantized to obtain a quantized coefficient, and finally the quantized coefficient is entropy-encoded to obtain a code stream.
In this application, a target upsampling filter can be selected from a plurality of preset upsampling filters to perform the subsequent upsampling operation. Compared with directly performing the upsampling operation with an upsampling filter that has fixed parameter values, the range of selectable upsampling filters in this application is larger, and a suitable upsampling filter can be selected as the target upsampling filter according to the circumstances.
In a possible implementation manner, downsampling an image block to be encoded to obtain a first image block specifically includes: and performing downsampling on the image block to be encoded by adopting a preset FIR downsampling filter to obtain a first image block.
By adopting the preset FIR downsampling filter to directly carry out downsampling operation, a target downsampling filter does not need to be selected from various downsampling filters, the complexity of coding operation can be reduced, and the coding efficiency is improved.
In a possible implementation manner, the method further includes: performing inverse quantization and inverse transformation on the quantized coefficients to obtain an initial reconstructed image block; up-sampling the initial reconstructed image block by adopting the target up-sampling filter to obtain a target reconstructed image block; entropy coding the quantized coefficients to obtain a code stream; and writing the indication information of the up-sampling filter into the code stream.
Because the image block is subjected to a down-sampling operation, the resolution of the first image block obtained through down-sampling is smaller than the original resolution of the image block to be encoded. When the image block is reconstructed, the resolution of the initial reconstructed image block is also smaller than the original resolution of the image block to be encoded, and the resolution of the target reconstructed image block obtained after the initial reconstructed image block is up-sampled is the same as the original resolution of the image block to be encoded.
In a possible implementation manner, determining a target upsampling filter from the upsampling filter set specifically includes: and determining a target up-sampling filter from a preset up-sampling filter set according to the coding cost of the image block to be coded.
Optionally, determining a target upsampling filter from a preset upsampling filter set according to the coding cost of the image block to be coded, including: and determining the up-sampling filter with the corresponding coding cost meeting the preset requirement in the up-sampling filter set as the target up-sampling filter.
In one possible implementation manner, determining an upsampling filter in the upsampling filter set whose corresponding coding cost meets a preset requirement as a target upsampling filter includes: determining the coding cost of the image block to be coded when each up-sampling filter in the up-sampling filter set is used as the target up-sampling filter; and determining a first upsampling filter in the upsampling filter set as the target upsampling filter, wherein, among the upsampling filters in the set, the coding cost of the image block to be coded is the smallest when the first upsampling filter is used as the target upsampling filter.
By comparing the coding costs of the image blocks to be coded corresponding to the upsampling operation of each upsampling filter in the upsampling filter set, the upsampling filter with the minimum corresponding coding cost can be selected as the target upsampling filter, so that the coding cost generated in the coding process can be reduced as much as possible, and a better coding effect can be obtained.
It should be understood that the above-mentioned coding cost may specifically be a rate-distortion cost of the image block during the coding process, a distortion of the image block, and so on.
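The minimum-cost selection can be sketched as follows, assuming the coding cost is a rate-distortion cost of the common form J = D + lambda * R. The source only says the cost may be a rate-distortion cost or a distortion; the sum-of-squared-errors distortion, the Lagrange multiplier `lam`, and the function names are assumptions.

```python
def sse(original, reconstructed):
    """Sum of squared errors between two equally sized blocks (assumed distortion measure)."""
    return sum(
        (p - q) ** 2
        for row_a, row_b in zip(original, reconstructed)
        for p, q in zip(row_a, row_b)
    )


def rd_cost(distortion, rate_bits, lam):
    """Rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def select_target_filter(candidates, lam):
    """Return the filter name with the smallest rate-distortion cost.

    candidates maps a filter name to (distortion, rate_bits) measured when that
    filter is used as the target upsampling filter.
    """
    def cost(name):
        distortion, rate_bits = candidates[name]
        return rd_cost(distortion, rate_bits, lam)

    return min(candidates, key=cost)
```

With a small `lam` the decision is dominated by distortion, while a large `lam` favours the candidate that costs fewer bits, so the same pair of filters can yield different targets at different operating points.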
In a possible implementation manner, determining a target upsampling filter from a preset upsampling filter set specifically includes: determining the target up-sampling filter from the up-sampling filter set according to the texture characteristics of the image block to be coded.
For example, when the texture of an image block to be encoded is sparse, an FIR upsampling filter can be selected as a target upsampling filter; when the texture of the image block to be encoded is dense, the CNN upsampling filter may be selected as the target upsampling filter in order to ensure the encoding effect.
According to the texture characteristics of the image block, the up-sampling filter that better matches the image block can be flexibly selected as the target up-sampling filter, and the coding effect for image blocks with different texture characteristics can be improved.
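A texture-based selection rule of this kind can be sketched as below. The source does not define how texture density is measured, so the mean absolute difference between neighbouring samples and the threshold value of 10.0 are purely illustrative assumptions.

```python
def texture_density(block):
    """Mean absolute difference between horizontally and vertically adjacent samples."""
    h, w = len(block), len(block[0])
    total = 0
    count = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += abs(block[y][x + 1] - block[y][x])
                count += 1
            if y + 1 < h:
                total += abs(block[y + 1][x] - block[y][x])
                count += 1
    return total / count if count else 0.0


def select_by_texture(block, threshold=10.0):
    """Pick the CNN filter for dense texture, the FIR filter for sparse texture."""
    return "CNN" if texture_density(block) > threshold else "FIR"
```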
In a possible implementation manner, determining a target upsampling filter from a preset upsampling filter set specifically includes: determining the target up-sampling filter from the up-sampling filter set according to the spectral characteristics of the image block to be coded.
For example, when the image block to be encoded contains many high-frequency components, the CNN upsampling filter may be selected as the target upsampling filter; when the image block to be encoded contains few high-frequency components, the FIR upsampling filter may be selected as the target upsampling filter.
The up-sampling filter matched with the image block can be selected as the target up-sampling filter according to the spectral characteristics of the image block, and the coding effect of the image blocks with different spectral characteristics can be improved.
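One way to quantify the high-frequency content of a block is the share of non-DC spectral energy above a cut-off frequency. The sketch below uses a naive 2-D discrete Fourier transform, which is adequate for small blocks; the cut-off of 0.25 cycles per sample and the decision threshold of 0.5 are assumptions, not values from the source.

```python
import cmath
import math


def dft2_energy(block):
    """Squared magnitude of the naive 2-D DFT of a small block."""
    h, w = len(block), len(block[0])
    energy = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    acc += block[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
            energy[u][v] = abs(acc) ** 2
    return energy


def high_frequency_ratio(block, cutoff=0.25):
    """Share of non-DC energy at normalized frequencies above the cutoff."""
    h, w = len(block), len(block[0])
    energy = dft2_energy(block)
    total = 0.0
    high = 0.0
    for u in range(h):
        for v in range(w):
            if u == 0 and v == 0:
                continue  # the DC term carries the mean, not texture
            total += energy[u][v]
            # Fold each index to its distance from DC, in cycles per sample (0 to 0.5).
            fu = min(u, h - u) / h
            fv = min(v, w - v) / w
            if max(fu, fv) > cutoff:
                high += energy[u][v]
    # Treat numerically negligible non-DC energy as "no high-frequency content".
    if total <= 1e-9 * max(energy[0][0], 1.0):
        return 0.0
    return high / total


def select_by_spectrum(block, threshold=0.5):
    """Pick the CNN filter when high-frequency energy dominates, else the FIR filter."""
    return "CNN" if high_frequency_ratio(block) > threshold else "FIR"
```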
In a possible implementation manner, the parameter value of the CNN upsampling filter is preset, and the parameter value of the CNN upsampling filter is obtained by performing offline training on a preset image training set.
Because the parameter value of the CNN up-sampling filter is obtained through off-line training, if the CNN up-sampling filter is adopted to perform up-sampling in the encoding process, the information loss of the image in the up-sampling process can be reduced, and the image encoding quality is improved.
In a possible implementation manner, before determining the target upsampling filter from the preset upsampling filter set, the method further includes: and performing on-line training on the CNN up-sampling filter according to the image block to be encoded to obtain an updated parameter value of the CNN up-sampling filter.
And the updated parameter value of the CNN up-sampling filter is used for replacing the preset parameter value of the CNN up-sampling filter.
Optionally, the method further includes: and writing the updated parameter value of the CNN up-sampling filter into the code stream.
Training the CNN up-sampling filter online yields filter parameters that better match the texture characteristics of the image, so that, compared with the preset CNN up-sampling filter, the image quality of the image block output by up-sampling can be further improved.
In a possible implementation manner, before determining a target upsampling filter from a preset upsampling filter set according to the coding cost of the image block to be coded, the method further includes: determining a target coding mode from an original resolution coding mode and a variable resolution coding mode according to the coding cost of an image block to be coded; generating encoding mode indication information corresponding to a target encoding mode; and writing the coding mode indication information into the code stream.
Optionally, determining a target coding mode from the original resolution coding mode and the variable resolution coding mode according to the coding cost of the image block to be coded specifically includes: determining the coding cost of the image block to be coded in the original resolution coding mode; determining the coding cost of the image block to be coded in the variable resolution coding mode; and selecting the coding mode with the minimum coding cost from the original resolution coding mode and the variable resolution coding mode as the target coding mode.
The coding mode indication information is used to indicate which coding mode of the candidate coding modes is the target coding mode. For example, the encoding mode indication information may indicate that the original resolution encoding mode is the target encoding mode, or the encoding mode indication information may indicate that the variable resolution encoding mode is the target encoding mode.
According to the coding cost of the image block to be coded, the coding mode with lower coding cost can be selected as the target coding mode, and the loss of image coding in the coding process can be reduced.
It should be understood that when it is determined that the image block to be encoded is encoded in the variable resolution encoding mode, the target upsampling filter is determined from the preset upsampling filter set according to the encoding cost of the image block. When it is determined that the original resolution coding mode is adopted, the image block to be coded is directly coded in the original resolution coding mode, and a target up-sampling filter does not need to be determined, because no up-sampling operation is needed in the original resolution coding mode.
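The two-level decision (coding mode first, upsampling filter only in the variable resolution mode) can be sketched as follows. The flag values, the tie-breaking rule in favour of the original resolution mode, and the list-of-bits code stream are assumptions; the costs are taken as already computed inputs.

```python
ORIGINAL = "original_resolution"
VARIABLE = "variable_resolution"


def encode_decisions(cost_original, cost_variable, filter_costs):
    """Return (target mode, target upsampling filter or None, signalled bits).

    filter_costs maps each upsampling filter name to the coding cost obtained
    when that filter is used as the target upsampling filter.
    """
    bits = []
    if cost_original <= cost_variable:
        bits.append(0)  # hypothetical mode flag value for the original resolution mode
        return ORIGINAL, None, bits  # no upsampling, so no filter flag is written
    bits.append(1)  # hypothetical mode flag value for the variable resolution mode
    target_filter = min(filter_costs, key=filter_costs.get)
    bits.append(0 if target_filter == "FIR" else 1)  # upsampling filter flag
    return VARIABLE, target_filter, bits
```

Because the filter flag is written only in the variable resolution mode, a decoder following this sketch would read the mode flag first and parse the filter flag only when the mode flag indicates variable resolution.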
In a second aspect, there is provided an image encoding method, the method comprising: determining a target up-sampling filter from a preset up-sampling filter set according to the coding cost of an image block to be coded, wherein the up-sampling filter set at least comprises an FIR up-sampling filter and a CNN up-sampling filter; generating up-sampling filter indication information corresponding to a target up-sampling filter; determining a down-sampling filter with the same type as a target up-sampling filter in a preset down-sampling filter set as a target down-sampling filter, wherein the down-sampling filter set at least comprises an FIR down-sampling filter and a CNN down-sampling filter; a target downsampling filter is adopted to downsample an image block to be encoded to obtain a first image block; coding the first image block to obtain a code stream; and writing the indication information of the up-sampling filter into the code stream.
For example, the upsampling filter indication information may indicate a FIR upsampling filter of the upsampling filter set as the target upsampling filter, or the upsampling filter indication information may also indicate a CNN upsampling filter of the upsampling filter set as the target upsampling filter.
In the method, the target up-sampling filter can be determined from the up-sampling filter set according to the coding cost, and compared with the up-sampling operation directly performed by the up-sampling filter with fixed parameter values, the coding cost of the image block to be coded is fully considered when the target up-sampling filter is selected, and a better coding effect can be obtained.
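The pairing rule of the second aspect, in which the target down-sampling filter has the same type as the target up-sampling filter, can be sketched as a simple lookup. The filter identifiers are hypothetical placeholders for actual filter implementations.

```python
UPSAMPLING_FILTER_SET = ("FIR", "CNN")

# Hypothetical identifiers; each stands for a down-sampling filter of the given type.
DOWNSAMPLING_FILTER_SET = {"FIR": "fir_downsampler", "CNN": "cnn_downsampler"}


def select_filter_pair(costs):
    """Pick the cheapest upsampling filter and the downsampling filter of the same type.

    costs maps each upsampling filter type to the coding cost of the image block
    when that filter is used as the target upsampling filter.
    """
    target_up = min(UPSAMPLING_FILTER_SET, key=lambda name: costs[name])
    target_down = DOWNSAMPLING_FILTER_SET[target_up]
    return target_up, target_down
```

Only the up-sampling choice needs to be signalled in this arrangement, since the down-sampling filter is applied at the encoding end alone.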
In a possible implementation manner, determining a target upsampling filter from a preset upsampling filter set according to a coding cost of an image block to be coded includes: determining the coding cost of the image block to be coded when each up-sampling filter in the up-sampling filter set is used as the target up-sampling filter; and determining a first upsampling filter in the upsampling filter set as the target upsampling filter, wherein, among the upsampling filters in the set, the coding cost of the image block to be coded is the smallest when the first upsampling filter is used as the target upsampling filter.
In the application, the upsampling filter with the minimum coding cost is selected as the target upsampling filter, so that the coding cost generated in the coding process can be reduced as much as possible, and a better coding effect can be obtained.
In a possible implementation manner, the parameter value of the CNN upsampling filter is preset, and the parameter value of the CNN upsampling filter is obtained by performing offline training on a preset image training set.
Because the parameter value of the CNN up-sampling filter is obtained through off-line training, if the CNN up-sampling filter is adopted for up-sampling in the encoding process, the information loss of the image in the up-sampling process can be reduced, and the image encoding quality is improved.
In a possible implementation manner, the parameter value of the CNN downsampling filter is preset, and the parameter value of the CNN downsampling filter is obtained by performing offline training on a preset image training set.
Because the parameter value of the CNN downsampling filter is obtained through offline training, if the CNN downsampling filter is adopted for downsampling in the encoding process, the information loss of the image in the downsampling process can be reduced, and the image encoding quality is improved.
In a possible implementation manner, the parameter value of the CNN upsampling filter and the parameter value of the CNN downsampling filter are both preset, and the parameter value of the CNN upsampling filter and the parameter value of the CNN downsampling filter are obtained by performing joint training on a preset image training set under an offline condition.
Because the CNN up-sampling filter and the CNN down-sampling filter are obtained through joint training under an offline condition, the loss of image texture information in the up-sampling process and the down-sampling process can be reduced during encoding, and the quality of the encoded image is improved.
In one possible implementation, before downsampling the image block to be encoded by using the target downsampling filter, the method further includes: and performing online training on the CNN downsampling filter according to the image block to be encoded to obtain an updated parameter value of the CNN downsampling filter, wherein the updated parameter value of the CNN downsampling filter is used for replacing a preset parameter value of the CNN downsampling filter.
By training the parameters of the CNN downsampling filter on line, the information loss caused by downsampling operation can be reduced as much as possible, and the quality of the reconstructed image is improved. In addition, since the CNN downsampling filter is used only in the encoder, there is no need to transfer the CNN downsampling filter parameters to the decoder, and thus, there is no increase in encoding overhead.
In a possible implementation manner, before determining a target upsampling filter from a preset upsampling filter set according to the coding cost of the image block to be coded, the method further includes: and performing online training on the CNN up-sampling filter according to the image block to be encoded to obtain an update parameter value of the CNN up-sampling filter, wherein the update parameter value of the CNN up-sampling filter is used for replacing a preset parameter value of the CNN up-sampling filter.
Training the CNN up-sampling filter online yields filter parameters that better match the texture characteristics of the image, so that, compared with the preset CNN up-sampling filter, the image quality of the image block output by up-sampling can be further improved.
In one possible implementation, before downsampling the image block to be encoded by using the target downsampling filter, the method further includes: performing combined online training on the CNN down-sampling filter and the CNN up-sampling filter according to the image block to be encoded to obtain an update parameter value of the CNN down-sampling filter and an update parameter value of the CNN up-sampling filter; the updating parameter value of the CNN down-sampling filter is used for replacing the preset parameter value of the CNN down-sampling filter, and the updating parameter value of the CNN up-sampling filter is used for replacing the preset parameter value of the CNN up-sampling filter.
Optionally, performing joint online training on the CNN downsampling filter and the CNN upsampling filter according to the image block to be encoded includes: down-sampling the image block to be encoded by using the CNN downsampling filter to obtain a low-resolution image block to be encoded; performing image coding on the low-resolution image block to be encoded by using a coding end simulator to obtain a low-resolution reconstructed image block; up-sampling the low-resolution reconstructed image block by using the CNN up-sampling filter to obtain a target reconstructed image block; and determining the parameters of the up-sampling filter and the down-sampling filter that minimize the difference between the image block to be encoded and the target reconstructed image block as the update parameter value of the CNN up-sampling filter and the update parameter value of the CNN down-sampling filter.
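The joint online training loop (down-sample, simulate the encoder, up-sample, minimize the difference) can be illustrated with a deliberately tiny stand-in. This sketch does not train a CNN: each filter is reduced to a single hypothetical weight, the image block is a 1-D row of samples, the coding end simulator is a uniform quantizer with step `QSTEP`, and a coarse grid search replaces gradient-based training. All of these are assumptions chosen to keep the example self-contained.

```python
import itertools

QSTEP = 4  # hypothetical quantization step of the coding end simulator


def downsample(row, a):
    """One-parameter down-sampling filter: weighted mix of each sample pair (even length assumed)."""
    return [a * row[i] + (1 - a) * row[i + 1] for i in range(0, len(row) - 1, 2)]


def simulate_codec(low):
    """Coding end simulator: quantize and dequantize the low-resolution samples."""
    return [round(v / QSTEP) * QSTEP for v in low]


def upsample(low, b):
    """One-parameter up-sampling filter: keep each sample, interpolate the next one."""
    out = []
    for i, v in enumerate(low):
        nxt = low[min(i + 1, len(low) - 1)]  # clamp at the right edge
        out.append(v)
        out.append((1 - b) * v + b * nxt)
    return out


def loss(row, a, b):
    """Squared difference between the input row and the target reconstruction."""
    recon = upsample(simulate_codec(downsample(row, a)), b)
    return sum((p - q) ** 2 for p, q in zip(row, recon))


def joint_online_training(row, preset=(0.5, 0.5), grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Search both parameters jointly; return the update parameter values and their loss."""
    best, best_loss = preset, loss(row, *preset)
    for a, b in itertools.product(grid, grid):
        current = loss(row, a, b)
        if current < best_loss:
            best, best_loss = (a, b), current
    return best, best_loss
```

Because the search starts from the preset parameters, the returned update parameter values never give a larger reconstruction difference than the preset ones on the block used for training.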
Training the CNN up-sampling filter and the CNN down-sampling filter online yields filter parameters that better match the texture characteristics of the image for both the up-sampling and the down-sampling operations, so that, compared with the preset CNN up-sampling filter and CNN down-sampling filter, the image quality of the image block output by up-sampling can be further improved.
In a possible implementation manner, the method further includes: and writing the updated parameter value of the CNN up-sampling filter into the code stream.
In a possible implementation manner, before determining, as the target upsampling filter, an upsampling filter in the upsampling filter set whose corresponding coding cost meets a preset requirement, the method further includes:
and performing on-line joint training on the CNN up-sampling filter and the CNN down-sampling filter according to the image block to be encoded to obtain an update parameter value of the CNN up-sampling filter and an update parameter value of the CNN down-sampling filter.
In a possible implementation manner, before determining a target upsampling filter from a preset upsampling filter set according to the coding cost of the image block to be coded, the method further includes: determining a coding mode with a coding cost meeting preset requirements in candidate coding modes as a target coding mode, wherein the candidate coding modes comprise an original resolution coding mode and a variable resolution coding mode; generating coding mode indication information, wherein the coding mode indication information is used for indicating a target coding mode; and writing the coding mode indication information into the code stream.
In a possible implementation manner, before determining a target upsampling filter from a preset upsampling filter set according to the coding cost of the image block to be coded, the method further includes: determining a target coding mode from an original resolution coding mode and a variable resolution coding mode according to the coding cost of an image block to be coded; generating encoding mode indication information corresponding to a target encoding mode; and writing the coding mode indication information into the code stream.
Optionally, determining a target coding mode from the original resolution coding mode and the variable resolution coding mode according to the coding cost of the image block to be coded specifically includes: determining the coding cost of the image block to be coded in the original resolution coding mode; determining the coding cost of the image block to be coded in the variable resolution coding mode; and selecting the coding mode with the minimum coding cost from the original resolution coding mode and the variable resolution coding mode as the target coding mode.
The coding mode indication information is used to indicate which coding mode of the candidate coding modes is the target coding mode. For example, the encoding mode indication information may indicate that the original resolution encoding mode is the target encoding mode, or the encoding mode indication information may indicate that the variable resolution encoding mode is the target encoding mode.
According to the coding cost of the image block to be coded, the coding mode with lower coding cost can be selected as the target coding mode, and the loss of image coding in the coding process can be reduced.
It should be understood that when it is determined that the image block to be encoded is encoded in the variable resolution coding mode, the target upsampling filter is determined from the preset upsampling filter set according to the coding cost of the image block. When it is determined that the image block to be encoded is encoded in the original resolution coding mode, the original resolution coding mode is directly adopted for coding the image block to be encoded, and a target up-sampling filter does not need to be determined, because no up-sampling operation is needed in the original resolution coding mode.
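The mode decision and its consequence for filter selection can be sketched as follows. The function names and the string labels for the two modes are assumptions made for illustration, and the tie rule (prefer the original resolution mode) is likewise an assumption, since the text does not specify one.

```python
def choose_coding_mode(cost_original, cost_variable):
    """Return the candidate coding mode with the smaller coding cost."""
    # Assumed tie rule: keep the original resolution mode when costs are equal.
    if cost_variable < cost_original:
        return "variable_resolution"
    return "original_resolution"

def needs_target_upsampling_filter(mode):
    # Only the variable resolution mode involves an up-sampling operation.
    return mode == "variable_resolution"

mode = choose_coding_mode(cost_original=40.0, cost_variable=35.5)
select_filter = needs_target_upsampling_filter(mode)
```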
In a third aspect, an image encoding method is provided, the method including: determining a target downsampling filter from a preset downsampling filter set according to the coding cost of an image block to be coded, wherein the downsampling filter set at least comprises a Finite Impulse Response (FIR) downsampling filter and a Convolutional Neural Network (CNN) downsampling filter; a target downsampling filter is adopted to downsample an image block to be encoded to obtain a first image block; coding the first image block to obtain a code stream; determining an up-sampling filter with the same type as a target down-sampling filter in a preset up-sampling filter set as a target up-sampling filter, wherein the up-sampling filter set at least comprises an FIR up-sampling filter and a CNN up-sampling filter; generating up-sampling filter indication information corresponding to a target up-sampling filter; and writing the indication information of the up-sampling filter into the code stream.
For example, the upsampling filter indication information may indicate a FIR upsampling filter of the upsampling filter set as a target upsampling filter, or the upsampling filter indication information may also indicate a CNN upsampling filter of the upsampling filter set as a target upsampling filter.
According to the method and the device, the target down-sampling filter can be determined from the down-sampling filter set according to the coding cost, the up-sampling filter is determined according to the filter type of the down-sampling filter, and compared with the operation of directly adopting the up-sampling filter with fixed parameter values to perform up-sampling, the coding cost of the image block to be coded is fully considered when the target up-sampling filter is selected, and a better coding effect can be obtained.
In a possible implementation manner, determining a target downsampling filter from a preset downsampling filter set according to a coding cost of an image block to be coded includes: determining the coding cost of an image block to be coded when each downsampling filter in the downsampling filter set is used as a target downsampling filter; and determining a first downsampling filter in the downsampling filter set as a target downsampling filter, wherein in the downsampling filter set, the coding cost of the image block to be coded is the minimum when the first downsampling filter in the downsampling filter set is used as the target downsampling filter.
In the application, the downsampling filter with the minimum coding cost is selected as the target downsampling filter, so that the coding cost generated in the coding process can be reduced as much as possible, and a better coding effect can be achieved.
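As a rough sketch of the third aspect, the downsampling filter type with the smallest coding cost is chosen first, and the upsampling filter of the same type is then taken as the target upsampling filter. The function name and the dictionary layout are hypothetical, and the cost values are made-up inputs.

```python
def select_filter_pair(cost_by_downsampling_filter):
    """Pick the downsampling filter type with the minimum coding cost and
    pair it with the upsampling filter of the same type."""
    down_type = min(cost_by_downsampling_filter, key=cost_by_downsampling_filter.get)
    up_type = down_type  # same filter type on the up-sampling side
    return down_type, up_type

# Illustrative coding costs only; real values come from trial encoding.
pair = select_filter_pair({"FIR": 31.5, "CNN": 27.0})
```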
In a fourth aspect, there is provided an image decoding method, including: acquiring a code stream; entropy decoding, inverse quantization and inverse transformation are carried out on the code stream to obtain a reconstructed residual error signal of the image block to be decoded; acquiring a prediction signal of an image block to be decoded; adding the reconstructed residual signal and the prediction signal to obtain an initial reconstructed image block of the image block to be decoded; analyzing the code stream to obtain coding mode indication information; determining a target decoding mode from the original resolution decoding mode and the variable resolution decoding mode according to the encoding mode indication information; under the condition that the target decoding mode is a variable resolution decoding mode, analyzing a code stream of an image block to be coded, and acquiring indication information of an up-sampling filter; determining a target upsampling filter from a preset upsampling filter set according to the upsampling filter indication information, wherein the upsampling filter set at least comprises a finite impulse response FIR upsampling filter and a Convolutional Neural Network (CNN) upsampling filter; and upsampling the initial reconstructed image block by adopting a target upsampling filter to obtain a target reconstructed image block.
The encoding mode may include an original resolution encoding mode and a variable resolution encoding mode, and the encoding mode indication information is used to indicate whether the encoding end adopts the original resolution encoding mode or the variable resolution encoding mode during encoding. That is, the encoding mode indication information indicates in which encoding mode the code stream was obtained. For example, the encoding mode indication information may indicate that the code stream was encoded in the original resolution encoding mode or in the variable resolution encoding mode.
Optionally, determining the target decoding mode from the original resolution decoding mode and the variable resolution decoding mode according to the encoding mode indication information includes: when the coding mode is the variable resolution coding mode, determining that the target decoding mode is the variable resolution decoding mode; and when the coding mode is the original resolution coding mode, determining the target decoding mode to be the original resolution decoding mode.
The decoding end selects the decoding mode corresponding to the coding mode of the coding end as the target decoding mode, and the decoding effect of the decoding end can be ensured.
In the application, when an image block is decoded in a variable resolution decoding mode, a decoding end can select a corresponding upsampling filter from a preset upsampling filter set as a target upsampling filter according to the upsampling filter indication information, and then perform upsampling operation.
Optionally, the encoding mode indication information is specifically a value of an encoding mode Flag1, and different values of Flag1 indicate different encoding modes.
For example, when the value of Flag1 is 0, the original resolution coding mode is indicated, and when the value of Flag1 is 1, the variable resolution coding mode is indicated.
In a possible implementation manner, the parameter value of the CNN upsampling filter is preset, and the parameter value of the CNN upsampling filter is obtained by performing offline training on a preset image training set.
Because the parameter value of the CNN up-sampling filter is obtained through off-line training, the information loss of the image in the up-sampling process can be reduced and the image decoding quality is improved by adopting the CNN up-sampling filter for up-sampling in the decoding process.
In a possible implementation manner, the method further includes: and analyzing the code stream to obtain an update parameter value of the CNN up-sampling filter, wherein the update parameter value of the CNN up-sampling filter is used for replacing a preset parameter value of the CNN up-sampling filter.
By updating the parameter value of the CNN up-sampling filter, the filter parameter more matched with the image content can be obtained, so that the information loss of the image is further reduced when the CNN up-sampling filter is used for up-sampling operation, and the image decoding quality is improved.
In a possible implementation manner, before parsing the code stream to obtain the update parameter value of the CNN upsampling filter, the method further includes: parsing the code stream to obtain filter parameter update indication information, which is used for indicating whether to update the parameter value of the target up-sampling filter. Parsing the code stream to obtain the update parameter value of the CNN up-sampling filter includes: parsing the code stream to obtain the update parameter value of the CNN up-sampling filter in the case that the filter parameter update indication information indicates that the parameter of the target up-sampling filter is to be updated.
The update value of the CNN up-sampling filter is obtained from the code stream only when the filter parameter update indication information indicates that the target up-sampling filter needs to be updated, so that the decoding load can be reduced, and the decoding efficiency can be improved.
Optionally, the filter parameter update indication information is carried in a Sequence Parameter Set (SPS) or a Picture Parameter Set (PPS).
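The decoder-side decisions described in the fourth aspect can be summarized in a short sketch. It assumes the syntax elements have already been entropy-decoded into a dictionary; the key names (`flag1`, `flag2`, `param_update_flag`, `cnn_update_params`) and the mapping of the up-sampling filter indication (0 for FIR, 1 for CNN) are hypothetical choices, not syntax defined by the application.

```python
def parse_block_header(syntax, preset_cnn_params):
    """Derive the decoding decisions from already-decoded syntax elements.

    `syntax` is a hypothetical dict of syntax elements for one image block.
    """
    result = {"mode": "original_resolution", "filter": None, "cnn_params": None}
    if syntax["flag1"] == 0:
        # Original resolution decoding: no up-sampling filter is needed.
        return result
    result["mode"] = "variable_resolution"
    # Assumed mapping of the up-sampling filter indication: 0 = FIR, 1 = CNN.
    result["filter"] = "CNN" if syntax["flag2"] == 1 else "FIR"
    if result["filter"] == "CNN":
        params = preset_cnn_params
        # Update values are read only when the update indication says so.
        if syntax.get("param_update_flag", 0) == 1:
            params = syntax["cnn_update_params"]
        result["cnn_params"] = params
    return result

preset = [0.25, 0.5, 0.25]
decoded = parse_block_header(
    {"flag1": 1, "flag2": 1, "param_update_flag": 1, "cnn_update_params": [0.1, 0.8, 0.1]},
    preset,
)
```

Reading the update values only when the indication is set mirrors the stated goal of reducing the decoding load.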
Optionally, the update parameter value of the CNN upsampling filter is obtained by performing online training on the CNN upsampling network according to the image block to be encoded, where the image block to be decoded is obtained by encoding the image block to be encoded.
Training the CNN up-sampling filter online yields filter parameters that better match the texture characteristics of the image, so that, compared with the preset CNN up-sampling filter, the image quality of the image block output by up-sampling can be further improved.
In a fifth aspect, an image encoding apparatus is provided, where the image encoding apparatus includes means for performing the method of the first, second, or third aspect.
In a sixth aspect, there is provided an image decoding apparatus comprising means for performing the method of the fourth aspect.
A seventh aspect provides an image encoding device, including: a memory for storing a program; a processor configured to execute the program stored in the memory, and when the program is executed, the processor is configured to perform the method according to the first aspect, the second aspect, or the third aspect.
An eighth aspect provides an image decoding apparatus including: a memory for storing a program; a processor configured to execute the program stored in the memory, wherein when the program is executed, the processor is configured to perform the method of the fourth aspect.
A ninth aspect provides an image encoding apparatus, comprising a nonvolatile storage medium storing an executable program, and a central processing unit connected to the nonvolatile storage medium and executing the executable program to implement the method of the first, second or third aspect.
A tenth aspect provides an image decoding apparatus, comprising a nonvolatile storage medium and a central processing unit, wherein the nonvolatile storage medium stores an executable program, and the central processing unit is connected to the nonvolatile storage medium and executes the executable program to implement the method of the fourth aspect.
In an eleventh aspect, there is provided a computer readable medium storing program code for execution by a device, the program code comprising instructions for performing the method of the first, second, third or fourth aspect.
In a twelfth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first, second, third or fourth aspect described above.
In a thirteenth aspect, an electronic device is provided, which includes the image encoding apparatus in the above fifth aspect to tenth aspect, and/or the image decoding apparatus in the above fifth aspect to tenth aspect.
Drawings
FIG. 1 is a schematic diagram of an image encoding process;
FIG. 2 is a schematic diagram of an image decoding process;
FIG. 3 is a schematic view of a digital image;
FIG. 4 is a schematic flow chart of an image encoding method of an embodiment of the present application;
FIG. 5 is a schematic diagram of an on-line training process for a CNN filter;
FIG. 6 is a schematic flow chart of an image encoding method of an embodiment of the present application;
FIG. 7 is a schematic flow chart of an image encoding method of an embodiment of the present application;
FIG. 8 is a schematic flow chart of an image decoding method according to an embodiment of the present application;
FIG. 9 is a schematic of an encoder architecture;
FIG. 10 is a flowchart of an image encoding method according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a CNN downsampling network;
FIG. 12 is a flowchart of an image decoding method according to an embodiment of the present application;
FIG. 13 is a schematic diagram of a CNN upsampling network;
FIG. 14 is a schematic diagram of an offline training process of a CNN filter;
FIG. 15 is a schematic of the architecture of an encoder;
FIG. 16 is a schematic of the architecture of an encoder;
FIG. 17 is a flowchart of an image decoding method according to an embodiment of the present application;
FIG. 18 is a flowchart of an image decoding method according to an embodiment of the present application;
FIG. 19 is a schematic block diagram of an image encoding apparatus of an embodiment of the present application;
FIG. 20 is a schematic block diagram of an image encoding apparatus according to an embodiment of the present application;
FIG. 21 is a schematic block diagram of an image decoding apparatus according to an embodiment of the present application;
FIG. 22 is a schematic block diagram of an encoder of an embodiment of the present application;
FIG. 23 is a schematic block diagram of a decoder of an embodiment of the present application;
FIG. 24 is a schematic block diagram of a codec device according to an embodiment of the present application;
FIG. 25 is a schematic block diagram of a video codec system according to an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
The image coding and decoding method can code and decode digital images. Specifically, a digital image is image information recorded as a digital signal. Fig. 3 shows a digital image that can be viewed as an M × N two-dimensional array containing a total of M × N samples; the position of each sample is referred to as a sample position, and the value of each sample is referred to as a sample value. Generally, M × N may be referred to as the resolution of an image, i.e., the number of samples contained in the image. For example, the resolution of a 2K image is 1920 × 1080, and the resolution of a 4K video is 3840 × 2160. Conventionally, a sample may also be referred to as a pixel, and a pixel usually contains two pieces of information, namely, a pixel position and a pixel value.
An image, which may be a digital image in particular, typically contains one or more color components. For example, a black and white image contains only a luminance component, and a color image contains three color components of RGB. A color image is composed of samples of three color components, which can be intuitively understood as three color planes (planes), each of which is a sampling array as shown in fig. 3 obtained by sampling the corresponding color component. The three color components of a color image may have different representations, and commonly used representations include RGB, YCbCr, ICtCp, and the like.
The three color planes of the color image may have different resolutions. Taking a color image in YCbCr format as an example, since the luminance component Y contains detail texture information, and the two chrominance components Cb and Cr contain less high-frequency signals, a down-sampling operation of 2.
It should be understood that the image coding and decoding method of the embodiment of the present application can encode and decode video images.
Fig. 4 is a schematic flowchart of an image encoding method according to an embodiment of the present application. The method shown in fig. 4 may be executed by an encoding end device, where the encoding end device may specifically be an intelligent terminal, a Personal Digital Assistant (PAD), a video player, an internet of things device, or another device having a video image encoding function. The method shown in fig. 4 includes steps 301 to 305, and the steps 301 to 305 are described in detail below with reference to specific examples.
301. Determine a target up-sampling filter from a preset up-sampling filter set according to the coding cost of the image block to be coded.
Wherein the set of upsampling filters includes at least a FIR upsampling filter and a CNN upsampling filter. The FIR upsampling filter may specifically be a bicubic upsampling filter, and the CNN upsampling filter is a filter formed by a convolutional neural network.
In the application, besides determining the target upsampling filter from the upsampling filter set according to the coding cost, the target upsampling filter can also be determined from the preset upsampling filter set according to the texture features or the spectral characteristics of the image block to be coded.
In particular, the upsampling filter may be selected according to how dense the texture of the image block to be encoded is (a denser texture contains more image features). When the texture of the image block is dense, the CNN filter is selected as the up-sampling filter; when the texture of the image block to be coded is sparse, the FIR filter is selected as the up-sampling filter.
In addition, spectrum analysis can be performed on the image block to be coded to obtain its spectral characteristics. When the image block to be coded has more high-frequency components, the CNN filter can be selected as the up-sampling filter; when it has fewer high-frequency components, the FIR filter can be selected as the up-sampling filter.
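One way to picture the spectrum-based selection is to measure which share of the non-DC spectral energy of a block lies at high frequencies. The sketch below uses a naive 2-D DFT on a small square block; the band boundary (more than a quarter of the block size), the decision threshold of 0.5, and all function names are assumptions for illustration, since the application does not specify how "more" or "fewer" high-frequency components are judged.

```python
import cmath
import math

def dft2_energy(block):
    """Squared magnitude of a naive 2-D DFT of a small square block."""
    n = len(block)
    energy = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0j
            for y in range(n):
                for x in range(n):
                    acc += block[y][x] * cmath.exp(-2j * cmath.pi * (u * y + v * x) / n)
            energy[u][v] = abs(acc) ** 2
    return energy

def high_frequency_ratio(block):
    """Share of non-DC energy lying above an assumed band boundary."""
    n = len(block)
    energy = dft2_energy(block)
    total = high = 0.0
    for u in range(n):
        for v in range(n):
            if u == 0 and v == 0:
                continue  # ignore the DC term
            fu, fv = min(u, n - u), min(v, n - v)  # fold to 0..n/2
            total += energy[u][v]
            if max(fu, fv) > n // 4:  # assumed boundary between low and high
                high += energy[u][v]
    # Treat numerically flat blocks as having no high-frequency content.
    return high / total if total > 1e-9 else 0.0

def select_by_spectrum(block, threshold=0.5):
    # The threshold is a hypothetical tuning value.
    return "CNN" if high_frequency_ratio(block) > threshold else "FIR"

checkerboard = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
smooth = [[128 + 100 * math.cos(2 * math.pi * x / 8) for x in range(8)] for _ in range(8)]
```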
302. Generate upsampling filter indication information corresponding to the target upsampling filter.
The upsampling filter indication information may be used to indicate that a certain upsampling filter in the upsampling filter set is a target upsampling filter.
For example, the upsampling filter indication information may indicate a FIR upsampling filter of the upsampling filter set as a target upsampling filter, or the upsampling filter indication information may also indicate a CNN upsampling filter of the upsampling filter set as a target upsampling filter.
Optionally, the upsampling filter indication information is specifically an upsampling filter selection Flag2, and a value of the Flag2 is used to indicate the target upsampling filter.
For example, when the FIR filter is determined to be the target upsampling filter, the value of Flag2 is 0; and when the CNN filter is determined to be the target upsampling filter, the value of Flag2 is 1. Or when the FIR filter is determined to be the target upsampling filter, the value of Flag2 is 1; and when the CNN filter is determined to be the target upsampling filter, the value of Flag2 is 0.
In short, what value Flag2 is specifically taken to represent which filter is not limited in the present application, as long as different values of Flag2 can indicate different upsampling filters.
After the decoding end acquires the indication information of the up-sampling filter, the target up-sampling filter can be directly determined according to the corresponding relation between the value of Flag2 and the up-sampling filter.
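Because either assignment of Flag2 values is permitted, the only requirement is that the encoding end and the decoding end share the same table. A minimal sketch, assuming the first of the two mappings given above (0 for FIR, 1 for CNN) and hypothetical helper names:

```python
# Shared by the encoding end and the decoding end (assumed mapping).
FILTER_TO_FLAG2 = {"FIR": 0, "CNN": 1}
FLAG2_TO_FILTER = {value: name for name, value in FILTER_TO_FLAG2.items()}

def encode_flag2(target_filter):
    """Encoding end: turn the chosen target filter into a Flag2 value."""
    return FILTER_TO_FLAG2[target_filter]

def decode_flag2(flag2):
    """Decoding end: recover the target filter from the Flag2 value."""
    return FLAG2_TO_FILTER[flag2]
```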
303. Downsample the image block to be encoded by using a preset FIR downsampling filter to obtain a first image block.
It should be understood that steps 301 and 303 are two separate processes, and that step 301 may occur prior to step 303 or later than step 303, or that steps 301 and 303 may occur simultaneously.
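For illustration, a very simple FIR downsampling filter is sketched below: a 2 × 2 box filter followed by decimation by two in each direction. The application does not specify the filter taps or the sampling ratio of the preset FIR downsampling filter, so both are assumptions here, as is the function name.

```python
def fir_downsample_2x(block):
    """Average each non-overlapping 2x2 neighbourhood (box FIR, decimate by 2)."""
    height, width = len(block), len(block[0])
    return [
        [
            (block[y][x] + block[y][x + 1] + block[y + 1][x] + block[y + 1][x + 1]) / 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]

block = [
    [0, 2, 4, 6],
    [0, 2, 4, 6],
    [8, 8, 8, 8],
    [8, 8, 8, 8],
]
first_image_block = fir_downsample_2x(block)
```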
304. Encode the first image block to obtain a code stream.
Encoding the first image block may comprise: performing prediction, transformation, quantization and entropy coding on the first image block to obtain the code stream.
Specifically, a prediction signal can be obtained by predicting the first image block. After the prediction signal is obtained, it can be subtracted from the original signal of the first image block to obtain an original residual signal; the original residual signal is then transformed and quantized to obtain quantization coefficients; finally, the quantization coefficients are entropy-encoded to obtain the code stream.
When predicting the first image block, the first image block may be predicted using an encoded reconstructed signal within the same image as the first image block, and such a prediction operation is commonly referred to as intra prediction. Typical intra prediction includes direct current component prediction, planar prediction, horizontal prediction, vertical prediction, and the like.
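Three of the intra prediction modes named above can be sketched in a few lines. This is a simplified illustration: the function name and argument layout are hypothetical, planar prediction is omitted, and real codecs use integer arithmetic with rounding and reference-sample filtering that is not modelled here.

```python
def intra_predict(top, left, mode):
    """Predict an n x n block from already reconstructed neighbours.

    top:  reconstructed row directly above the block (length n)
    left: reconstructed column directly left of the block (length n)
    """
    n = len(top)
    if mode == "dc":
        # Direct current component prediction: one mean value for the block.
        dc = (sum(top) + sum(left)) / (2 * n)
        return [[dc] * n for _ in range(n)]
    if mode == "horizontal":
        # Each row repeats the reconstructed sample to its left.
        return [[left[y]] * n for y in range(n)]
    if mode == "vertical":
        # Each column repeats the reconstructed sample above it.
        return [list(top) for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)

top = [10, 20, 30, 40]
left = [1, 2, 3, 4]
```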
The quantization coefficient is entropy coded by the entropy coding module to obtain a code stream (also called as a compressed code stream), and in addition, in the process of generating the code stream, the entropy coding module also performs entropy coding operation on various indication information such as a prediction mode generated in the coding process to obtain the code stream.
Besides encoding an image block to be encoded to generate a code stream, an encoding end generally needs to acquire a reconstructed signal of the image block to be encoded as a reference for performing intra-frame prediction operation on a subsequent encoded image block. Therefore, the encoding end needs to perform inverse quantization and inverse transformation on the quantization coefficient obtained in the encoding process of the image block to be encoded to obtain a reconstructed residual signal, and then add the reconstructed residual signal to the prediction signal to obtain a reconstructed signal of the image block.
The original residual signal has information loss after transformation and quantization operations, and the information loss is irreversible, so that the reconstructed residual signal after inverse transformation after inverse quantization is inconsistent with the original residual signal, which further causes the reconstructed signal of the image block to be inconsistent with the original signal, and thus the compression coding method is called lossy compression.
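The irreversible loss can be seen in a small numerical sketch of the residual path. The transform is left out (treated as identity) and the quantizer is a plain uniform quantizer with step `qstep`; both simplifications, and the function names, are assumptions made to keep the example short.

```python
def encode_residual(original, prediction, qstep):
    """Original residual signal followed by uniform quantization."""
    residual = [o - p for o, p in zip(original, prediction)]
    return [round(r / qstep) for r in residual]

def reconstruct(levels, prediction, qstep):
    """Inverse quantization, then add the prediction signal back."""
    return [p + level * qstep for p, level in zip(prediction, levels)]

original = [52, 55, 61, 66]
prediction = [50, 50, 60, 60]
levels = encode_residual(original, prediction, qstep=4)
reconstructed = reconstruct(levels, prediction, qstep=4)
```

With these inputs the reconstructed samples differ from the original ones, and each error stays within half a quantization step, which is the behaviour the paragraph above refers to as lossy compression.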
In the image coding method based on block division, signal discontinuity usually occurs at the image block boundary position of the acquired reconstructed image, and the image block boundary position appears as a blocking effect. In order to eliminate the blocking effect, a smoothing filtering operation, called deblocking filtering, is usually performed on the boundary pixels of the image block to improve the quality of the reconstructed image. Besides the block effect removing filtering, the reconstructed image can be subjected to various filtering operations such as wiener filtering and bilateral filtering to improve the quality of the reconstructed image.
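The smoothing idea behind deblocking filtering can be sketched on a single row of samples that crosses a block boundary. The [1, 2, 1]/4 kernel and the choice to filter exactly one sample on each side of the boundary are assumptions; practical deblocking filters adapt their strength to the local signal and coding parameters.

```python
def deblock_row(row, boundary):
    """Smooth the two samples adjacent to a block boundary.

    `boundary` is the index of the first sample of the right-hand block.
    """
    out = list(row)
    for i in (boundary - 1, boundary):
        if 0 < i < len(row) - 1:
            # Assumed smoothing kernel [1, 2, 1] / 4 applied to unfiltered input.
            out[i] = (row[i - 1] + 2 * row[i] + row[i + 1]) / 4
    return out

row = [10, 10, 10, 10, 20, 20, 20, 20]
smoothed = deblock_row(row, boundary=4)
```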
305. Write the up-sampling filter indication information into the code stream.
By writing the up-sampling filter indication information into the code stream, the decoding end can obtain the information of the up-sampling filter through the code stream and select the corresponding filter from the preset filters as the up-sampling filter, so that the decoding end can adopt the same up-sampling filter for up-sampling.
According to the method and the device, one filter can be selected from multiple candidate filters to serve as the up-sampling filter according to the coding cost of the image block, and compared with a mode of adopting a fixed filter as the up-sampling filter, a better coding effect can be achieved.
Alternatively, the FIR upsampling filter and the CNN upsampling filter in the upsampling filter set are preset in the encoder and the decoder, and the filter parameters of the FIR upsampling filter and the CNN upsampling filter may be preset or calculated.
Optionally, as an embodiment, the determining, according to the coding cost of the image block to be coded, a target upsampling filter from a preset upsampling filter set includes: determining the coding cost of the image block to be coded when each up-sampling filter in the up-sampling filter set is used as a target up-sampling filter; and determining a first upsampling filter in the upsampling filter set as a target upsampling filter, wherein the coding cost of the image block to be coded when the first upsampling filter is used as the target upsampling filter in the upsampling filter set is the minimum.
By comparing the coding costs of the image block to be coded corresponding to each upsampling filter in the upsampling filter set when the upsampling operation is performed, the upsampling filter with the minimum corresponding coding cost can be selected from the upsampling filter set as the target upsampling filter, so that the coding cost generated in the coding process can be reduced as much as possible, and a better coding effect can be achieved.
It should be understood that the above coding cost may specifically be a rate distortion cost of an image block during coding, distortion of an image block, and the like.
When a target up-sampling filter is determined from a preset up-sampling filter set according to the coding cost of an image block to be coded, firstly, the coding cost of the image block when each filter in the up-sampling filter set is used as the target up-sampling filter is determined, and then, a corresponding filter with the minimum coding cost is selected as the up-sampling filter.
For example, if the coding cost of the image block when the FIR upsampling filter is used as the target upsampling filter is less than the coding cost of the image block when the CNN upsampling filter is used as the target upsampling filter, then the FIR upsampling filter is determined to be the upsampling filter.
The coding cost may specifically be a rate distortion cost of the image block, and may also be a distortion of the image block, and the like.
Because the upsampling filter set comprises the CNN upsampling filter and the FIR upsampling filter, when the target upsampling filter is determined, the coding cost of the image block to be coded when the CNN upsampling filter and the FIR upsampling filter are respectively used as the target upsampling filter for upsampling operation can be determined, and then the filter with lower corresponding coding cost is selected as the target upsampling filter.
The determining the coding cost of the image block to be coded when the FIR upsampling filter is used as the target upsampling filter specifically comprises the following processes:
(1) Performing downsampling operation on an image block to be encoded by adopting an FIR downsampling filter to obtain an image block with low resolution;
(2) Coding the image block with low resolution and outputting a compressed code stream;
(3) Acquiring a reconstructed image block of an image block to be encoded;
(4) Determining the mean square error D of the reconstructed image block and the image block to be encoded FIR ;
(5) Determining the size R of the compressed code stream of the image block to be coded FIR ;
(6) According to D FIR And R FIR Determining the coding cost of the image block to be coded when the FIR up-sampling filter is used as the target up-sampling filter, and recording the coding cost as cost FIR 。
Determining the coding cost of the image block to be coded when the CNN upsampling filter is used as the target upsampling filter specifically comprises the following processes:
(7) Adopting a CNN downsampling filter to carry out downsampling operation on an image block to be encoded to obtain an image block with low resolution;
(8) Coding the image block with low resolution and outputting a compressed code stream;
(9) Acquiring a reconstructed image block of an image block to be encoded;
(10) Determining the mean square error D of the reconstructed image block and the image block to be encoded CNN ;
(11) Determining the size R of the compressed code stream of the image block to be coded CNN ;
(12) According to D CNN And R CNN Determining the coding cost of the image block to be coded when the CNN up-sampling filter is used as the target up-sampling filter, and recording the coding cost as cost CNN 。
Through processes (1) to (6), the coding cost of the image block to be coded when the FIR upsampling filter is used as the target upsampling filter is determined as cost_FIR; through processes (7) to (12), the coding cost of the image block to be coded when the CNN upsampling filter is used as the target upsampling filter is determined as cost_CNN. Next, the target upsampling filter can be selected from the FIR upsampling filter and the CNN upsampling filter based on the relationship between cost_FIR and cost_CNN.
When cost_FIR < cost_CNN, the FIR upsampling filter is determined as the target upsampling filter;
when cost_FIR > cost_CNN, the CNN upsampling filter is determined as the target upsampling filter;
when cost_CNN = cost_FIR, either the FIR upsampling filter or the CNN upsampling filter is selected as the target upsampling filter.
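Processes (1) to (12) and the comparison above can be sketched as follows. This is only a minimal illustration, not the actual encoder: the Lagrangian form cost = D + lambda * R, the value of `lam`, the sample data, and the function names `mse`, `rd_cost`, and `select_upsampling_filter` are assumptions introduced here, since the text only states that the cost is determined from D and R. A tie is resolved in favor of FIR, which is one of the two permitted choices.

```python
def mse(original, reconstructed):
    """Mean square error between two equally sized blocks (lists of rows)."""
    total = 0.0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for a, b in zip(row_o, row_r):
            total += (a - b) ** 2
            count += 1
    return total / count


def rd_cost(distortion, rate_bits, lam=0.1):
    """Assumed Lagrangian rate-distortion cost: D + lambda * R."""
    return distortion + lam * rate_bits


def select_upsampling_filter(cost_fir, cost_cnn):
    """Return the filter with the lower cost; a tie may go either way (FIR here)."""
    return "FIR" if cost_fir <= cost_cnn else "CNN"


block = [[10, 20], [30, 40]]
recon_fir = [[12, 18], [29, 43]]   # hypothetical reconstruction via the FIR path
recon_cnn = [[10, 21], [30, 39]]   # hypothetical reconstruction via the CNN path

cost_fir = rd_cost(mse(block, recon_fir), rate_bits=40)
cost_cnn = rd_cost(mse(block, recon_cnn), rate_bits=48)
target = select_upsampling_filter(cost_fir, cost_cnn)
```

In this made-up example the CNN path spends more bits but reconstructs the block more accurately, so its cost is lower and it is selected.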
Optionally, in addition to determining the target upsampling filter from the preset upsampling filter set according to the coding cost, the target upsampling filter may also be determined from the preset upsampling filter set according to the texture feature or the spectral characteristic of the image block to be coded.
For example, when the texture of an image block to be encoded is sparse, an FIR upsampling filter can be selected as a target upsampling filter; when the texture of the image block to be encoded is dense, the CNN upsampling filter may be selected as the target upsampling filter in order to ensure the encoding effect.
According to the texture features of the image block, the upsampling filter that better matches the image block can be flexibly selected as the target upsampling filter, which can improve the coding effect for image blocks with different texture features.
For example, before the image block to be encoded is formally encoded, spectrum analysis is performed on it to obtain its spectral features. When the image block to be encoded contains many high-frequency components, the CNN upsampling filter can be selected as the target upsampling filter; when it contains few high-frequency components, the FIR upsampling filter can be selected as the target upsampling filter.
The up-sampling filter matched with the image block can be selected as the target up-sampling filter according to the spectral characteristics of the image block, and the coding effect of the image blocks with different spectral characteristics can be improved.
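A content-based selection of this kind might look like the sketch below. The measure used here (energy of neighboring-pixel differences divided by total pixel energy) is only a crude stand-in for texture density or high-frequency content; a real encoder would more likely analyze transform coefficients. The function names and the threshold of 0.5 are hypothetical and do not come from the text.

```python
def high_frequency_ratio(block):
    """Ratio of neighbor-difference energy to total pixel energy (crude high-pass proxy)."""
    height, width = len(block), len(block[0])
    diff_energy = 0.0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                diff_energy += (block[y][x + 1] - block[y][x]) ** 2
            if y + 1 < height:
                diff_energy += (block[y + 1][x] - block[y][x]) ** 2
    total_energy = sum(v * v for row in block for v in row)
    return diff_energy / total_energy if total_energy else 0.0


def select_by_content(block, threshold=0.5):
    """Dense texture or many high-frequency components -> CNN, otherwise FIR."""
    return "CNN" if high_frequency_ratio(block) > threshold else "FIR"


flat_block = [[100, 100, 100], [100, 100, 100], [100, 100, 100]]
checker_block = [[0, 255, 0], [255, 0, 255], [0, 255, 0]]

flat_choice = select_by_content(flat_block)        # smooth block
checker_choice = select_by_content(checker_block)  # strongly textured block
```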
Optionally, in the present application, the target upsampling filter may be further determined from the upsampling filter set according to at least two characteristics of the coding cost, the texture characteristic, and the spectral characteristic.
Specifically, the target upsampling filter may be determined from the upsampling filter set according to the coding cost and the texture feature of the image block to be coded.
For example, when the texture of the image block to be encoded is sparse, the FIR upsampling filter can be selected as the target upsampling filter; when the texture of the image block to be coded is dense, the coding cost of the image block to be coded when the FIR upsampling filter and the CNN upsampling filter perform upsampling respectively is determined, if the coding cost of the image block to be coded when the CNN upsampling filter performs upsampling is low, the CNN upsampling filter is selected as a target upsampling filter, and otherwise, the FIR upsampling filter is still selected as the target upsampling filter.
Specifically, the target upsampling filter may be determined from the upsampling filter set according to the coding cost and the spectral feature of the image block to be coded.
For example, when the high frequency components of the image block to be encoded are less, the FIR filter may be selected as the upsampling filter; when the high-frequency components of the image block to be coded are more, the coding cost of the image block to be coded when the FIR upsampling filter and the CNN upsampling filter perform upsampling respectively is determined, if the coding cost of the image block to be coded when the CNN upsampling filter performs upsampling is lower, the CNN upsampling filter is selected as a target upsampling filter, otherwise, the FIR upsampling filter is still selected as the target upsampling filter.
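The combined rule described above can be sketched as follows. The cost evaluations are passed in as callables so that the trial encodings run only for dense or high-frequency blocks; this laziness is a design choice of the sketch, not something the text prescribes. `select_hybrid` and `fake_cost` are hypothetical names, and the cost values are invented.

```python
def select_hybrid(is_dense, cost_fir_fn, cost_cnn_fn):
    """Pick FIR for sparse blocks; for dense blocks compare the coding costs."""
    if not is_dense:
        return "FIR"              # no trial encoding needed
    cost_fir = cost_fir_fn()      # trial encoding with the FIR filters
    cost_cnn = cost_cnn_fn()      # trial encoding with the CNN filters
    return "CNN" if cost_cnn < cost_fir else "FIR"


calls = []


def fake_cost(name, value):
    """Build a hypothetical cost function that records when it is invoked."""
    def run():
        calls.append(name)
        return value
    return run


sparse_choice = select_hybrid(False, fake_cost("FIR", 5.0), fake_cost("CNN", 3.0))
dense_choice = select_hybrid(True, fake_cost("FIR", 5.0), fake_cost("CNN", 3.0))
```

After both calls, `calls` holds only the two evaluations made for the dense block, showing that the sparse block was assigned the FIR filter without any cost computation.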
Optionally, as an embodiment, a parameter value of the CNN upsampling filter is preset, and the parameter value (also referred to as an initial parameter value) of the CNN upsampling filter is obtained by performing offline training on a preset image training set.
Specifically, when performing offline training on the CNN upsampling filter, a preset FIR downsampling filter may first be used to downsample each original image x_i in the image training set; the downsampled image y_i is then used as the input to the CNN upsampling filter, and the original image x_i is used as the training target.
Next, the CNN upsampling filter (which may also be referred to as the CNN upsampling network) may be trained according to a general CNN training method to obtain the optimal parameters of the CNN upsampling filter, and the optimal parameters are used as the parameter values of the CNN upsampling filter.
Specifically, the optimal parameters of the CNN upsampling filter may be obtained according to equation (1)
In formula (1), x_i represents an original image in the image training set, y_i represents the low-resolution image obtained by downsampling the original image, n represents the number of training samples, θ_g represents all parameters of the CNN upsampling filter, and g(y_i; θ_g) represents the upsampling operation performed by the upsampling network on the input y_i.
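Formula (1) itself is not reproduced in this text. Based on the symbol definitions above, and assuming a mean-square-error training loss, one plausible form is the following; the hat notation for the optimal parameters and the 1/n normalization are assumptions.

```latex
\hat{\theta}_g = \arg\min_{\theta_g} \frac{1}{n} \sum_{i=1}^{n} \left\| g(y_i; \theta_g) - x_i \right\|_2^2 \tag{1}
```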
Furthermore, in the training process described above, in addition to using the downsampled image y_i as the network input for training, the reconstructed image y'_i of y_i may first be obtained, and y'_i may then be used as the input for training the CNN upsampling filter. An upsampling network trained in this way can not only restore the image details of the original resolution but also, to a certain extent, remove the image artifacts introduced by the coding operation, which further improves the quality of the reconstructed original-resolution image block output by the upsampling operation.
Optionally, as an embodiment, before determining a target upsampling filter from a preset upsampling filter set according to a coding cost of an image block to be coded, the method further includes: performing online training on the CNN upsampling filter according to the image block to be encoded to obtain an update parameter value of the CNN upsampling filter.
The update parameter value of the CNN upsampling filter obtained by online training is used to replace the preset parameter value of the CNN upsampling filter. The preset parameter value of the CNN upsampling filter may be obtained by training the CNN upsampling filter on a preset image training set.
By training the CNN upsampling filter online, an upsampling filter whose parameters better match the texture features of the image can be used for the upsampling operation, and the quality of the image block output by upsampling can be further improved compared with the preset CNN upsampling filter.
Further, after the update parameter value of the CNN upsampling filter is obtained, the update parameter value of the CNN upsampling filter can be written into the code stream, so that the decoding end can also obtain the update parameter of the CNN upsampling filter by analyzing the code stream, and the preset parameter value of the CNN upsampling filter is updated according to the update parameter of the CNN upsampling filter.
Specifically, the online training procedure shown in fig. 5 may be employed to perform online training on the CNN upsampling filter (also referred to as the CNN upsampling network). As shown in FIG. 5, I_O is the current image to be encoded, and I_L represents the low-resolution image obtained by downsampling the input image I_O with a CNN downsampling filter. The encoder simulator is a CNN that simulates the image coding process and outputs a low-resolution reconstructed image I'_L and the coding bit overhead. The CNN downsampling network can be obtained through offline training and preset in the encoder, where it is used to downsample the original image block to be coded.
When the CNN upsampling filter is trained on line, the CNN upsampling network may be specifically trained according to formula (2) to obtain the optimal parameters of the CNN upsampling filter
In formula (2), θ_g represents the parameters of the upsampling filter, and g represents the mapping function of the upsampling filter.
After the optimal parameters of the CNN upsampling filter are obtained by formula (2), they can be determined as the update parameter values of the CNN upsampling filter.
By training the parameters of the CNN upsampling filter online, parameters that match the texture features of the image can be used for the upsampling operation, and the quality of the image block output by upsampling can be further improved compared with a CNN upsampling filter that uses the preset parameters. In addition, the encoding end can weigh the image quality gain brought by updating the upsampling filter parameters against the cost of transmitting those parameters, for example by using a rate distortion optimization (RDO) method to improve the overall rate distortion performance of the coding scheme of this embodiment.
Optionally, as an embodiment, before determining a target upsampling filter from a preset upsampling filter set according to a coding cost of an image block to be coded, the method further includes: determining a target coding mode from an original resolution coding mode and a variable resolution coding mode according to the coding cost of an image block to be coded; generating encoding mode indication information corresponding to the target encoding mode; and writing the coding mode indication information into the code stream.
The coding mode indication information may be used to indicate the target coding mode. Specifically, the coding mode indication information may be represented by the value of a variable resolution coding mode flag Flag1, where different values of Flag1 represent different coding modes. For example, when Flag1 is 0, the target coding mode is the original resolution coding mode, and when Flag1 is 1, the target coding mode is the variable resolution coding mode.
In addition, the above-mentioned coding mode information can also be represented by the value of the original resolution coding mode Flag1, and different values of Flag1 are used for representing different coding modes. For example, when Flag1 is 0, it indicates that the target coding mode is the variable resolution coding mode, and when Flag1 is 1, it indicates that the target coding mode is the original resolution coding mode.
Optionally, determining the target coding mode from the original resolution coding mode and the variable resolution coding mode according to the coding cost of the image block to be coded specifically includes: determining the coding cost of the image block to be coded in the original resolution coding mode; determining the coding cost of the image block to be coded in the variable resolution coding mode; and selecting the coding mode with the minimum coding cost from the original resolution coding mode and the variable resolution coding mode as the target coding mode.
According to the coding cost of the image block to be coded, the coding mode with lower coding cost can be selected as the target coding mode, and the loss of image coding in the coding process can be reduced.
The determining the coding cost of the image block to be coded in the original resolution coding mode specifically includes the following steps:
firstly, coding an image block to be coded in an original resolution mode according to a preset coding and decoding scheme to obtain a compressed code stream of the image block and a reconstructed image block of the image block to be coded;
secondly, determining the mean square error between the reconstructed image block and the image block to be encoded as D_org;
thirdly, determining the size of the compressed code stream of the image block to be coded as R_org;
and finally, calculating, according to D_org and R_org, a rate distortion cost cost_org for encoding the image block to be encoded in the original resolution coding mode, and determining cost_org as the coding cost of the image block to be encoded in the original resolution coding mode.
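The mode decision and its signaling can be sketched as below, using the first Flag1 convention described above (0 for original resolution, 1 for variable resolution). The cost values, the tie-breaking rule in favor of the original resolution mode, and the list standing in for the code stream are assumptions made for illustration only.

```python
ORIGINAL_RESOLUTION = 0   # Flag1 value under the first convention in the text
VARIABLE_RESOLUTION = 1


def choose_coding_mode(cost_org, cost_var):
    """Return Flag1 for the mode with the smaller coding cost (ties keep original resolution)."""
    return VARIABLE_RESOLUTION if cost_var < cost_org else ORIGINAL_RESOLUTION


def write_flag1(bitstream_bits, flag1):
    """Append the one-bit mode flag to a list standing in for the code stream."""
    bitstream_bits.append(flag1)
    return bitstream_bits


bits = write_flag1([], choose_coding_mode(cost_org=12.0, cost_var=9.5))
```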
The specific process of determining the coding cost of the image block to be coded in the variable resolution coding mode is the same as the specific process of determining the coding cost of the image block to be coded in the original resolution coding mode.
When the coding cost of the image block to be coded in the variable resolution coding mode is determined, the upsampling filter and the downsampling filter can be chosen in different ways. For example, the coding cost of the image block when the upsampling filter and the downsampling filter are both FIR filters may be used as the coding cost of the image block in the variable resolution coding mode, or the coding cost of the image block when both are CNN filters may be used. Alternatively, one filter may be arbitrarily selected from the CNN upsampling filter and the FIR upsampling filter as the upsampling filter at the encoding end, one filter may be arbitrarily selected from the CNN downsampling filter and the FIR downsampling filter as the downsampling filter, and the coding cost of the image block to be coded is then calculated with these filters.
It should be understood that the target upsampling filter is determined from the preset upsampling filter set according to the coding cost of the image block only when it is determined that the image block to be encoded is encoded in the variable resolution coding mode. When it is determined that the image block to be encoded is encoded in the original resolution coding mode, the image block is encoded directly in the original resolution coding mode and no target upsampling filter needs to be determined, because no upsampling operation is required in the original resolution coding mode.
Fig. 6 is a schematic flowchart of an image encoding method according to an embodiment of the present application. The method shown in fig. 6 may be executed by an encoding-side device, which may specifically be an intelligent terminal, a PAD, a video player, an internet of things device, or another device with a video image encoding function. The method shown in fig. 6 includes steps 401 to 406, which are described in detail below with reference to specific examples.
401. Determine a target upsampling filter from a preset upsampling filter set according to the coding cost of the image block to be coded.
Wherein the set of upsampling filters includes at least a FIR upsampling filter and a CNN upsampling filter. For example, the FIR upsampling filter is specifically a bicubic upsampling filter.
402. Generate upsampling filter indication information corresponding to the target upsampling filter.
The upsampling filter indication information may include an upsampling filter selection Flag2, and a value of Flag2 is used to indicate the target upsampling filter.
For example, when the FIR filter is determined to be the target upsampling filter, the value of Flag2 may be set to 0; and when it is determined that the CNN filter is the target upsampling filter, the value of Flag2 may be set to 1.
403. Determine, as the target downsampling filter, the downsampling filter in a preset downsampling filter set that is of the same type as the target upsampling filter.
Wherein the downsampling filter set includes at least a FIR downsampling filter and a CNN downsampling filter.
Specifically, when the FIR up-sampling filter is a target up-sampling filter, the FIR down-sampling filter is determined as a target down-sampling filter; when the CNN upsampling filter is the target upsampling filter, the CNN downsampling filter is determined as the target downsampling filter.
404. Downsample the image block to be encoded by using the target downsampling filter to obtain a first image block.
405. Encode the first image block to obtain a code stream.
406. Write the upsampling filter indication information into the code stream.
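Steps 401 to 406 can be strung together as in the sketch below. Every stage is a placeholder: `decimate` stands in for both downsampling filters, the lambda stands in for the block encoder, the cost values are invented, and a dictionary stands in for the code stream. The Flag2 values follow the example given above (0 for FIR, 1 for CNN).

```python
def encode_block(block, costs, downsamplers, encode):
    """Steps 401-406 with hypothetical callables standing in for real codec stages."""
    target_up = min(costs, key=costs.get)            # 401: lowest coding cost wins
    flag2 = {"FIR": 0, "CNN": 1}[target_up]          # 402: indication information
    target_down = downsamplers[target_up]            # 403: same type as the upsampler
    first_block = target_down(block)                 # 404: downsample
    payload = encode(first_block)                    # 405: encode the first image block
    return {"flag2": flag2, "payload": payload}      # 406: flag travels with the stream


def decimate(block):
    """Toy 2x downsampler: keep every second sample in both directions."""
    return [row[::2] for row in block[::2]]


downsamplers = {"FIR": decimate, "CNN": decimate}    # both stubs; a real CNN differs
stream = encode_block(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    costs={"FIR": 7.2, "CNN": 6.4},
    downsamplers=downsamplers,
    encode=lambda b: bytes(v for row in b for v in row),
)
```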
In the method, the target up-sampling filter can be determined from the up-sampling filter set according to the coding cost, and compared with the up-sampling operation directly performed by the up-sampling filter with fixed parameter values, the coding cost of the image block to be coded is fully considered when the target up-sampling filter is selected, and a better coding effect can be obtained.
It will be appreciated that steps 401 and 402 of the method shown in fig. 6 are substantially the same as steps 301 and 302 of the method shown in fig. 4, and the definitions and explanations given above for steps 301 and 302 also apply to steps 401 and 402. In addition, steps 405 and 406 of the method shown in fig. 6 are substantially the same as steps 304 and 305 of the method shown in fig. 4, and the definitions and explanations given above for steps 304 and 305 also apply to steps 405 and 406.
Optionally, as an embodiment, determining a target upsampling filter from a preset upsampling filter set according to the coding cost of the image block to be coded includes: determining the coding cost of the image block to be coded when each upsampling filter in the upsampling filter set is used as the target upsampling filter; and determining a first upsampling filter in the upsampling filter set as the target upsampling filter, where the first upsampling filter is the filter in the set for which the coding cost of the image block to be coded is the smallest.
In the application, the upsampling filter with the minimum coding cost is selected as the target upsampling filter, so that the coding cost generated in the coding process can be reduced as much as possible, and a better coding effect can be obtained.
The coding cost may specifically be a rate distortion cost of the image block during the coding process, distortion of the image block, and the like.
Optionally, in addition to determining the target upsampling filter from the preset upsampling filter set according to the coding cost, the target upsampling filter may also be determined from the preset upsampling filter set according to the texture feature or the spectral characteristic of the image block to be coded.
Optionally, as an embodiment, a parameter value of the CNN upsampling filter is preset, and the parameter value of the CNN upsampling filter is obtained by performing offline training on a preset image training set.
Because the parameter value of the CNN up-sampling filter is obtained through off-line training, if the CNN up-sampling filter is adopted for up-sampling in the encoding process, the information loss of the image in the up-sampling process can be reduced, and the image encoding quality is improved.
It should be understood that the specific process of obtaining the parameter values of the CNN upsampling filter through offline training can be referred to the above related description of offline training of the CNN upsampling filter.
Optionally, as an embodiment, a parameter value of the CNN downsampling filter is preset, and the parameter value of the CNN downsampling filter is obtained by performing offline training on a preset image training set.
Because the parameter value of the CNN downsampling filter is obtained through offline training, if the CNN downsampling filter is adopted for downsampling in the encoding process, the information loss of the image in the downsampling process can be reduced, and the image encoding quality is improved.
It should be understood that offline training of the CNN downsampling filter requires an upsampling filter to perform upsampling, and either the FIR upsampling filter or the CNN upsampling filter may be used for this purpose. The offline training process of the CNN downsampling filter when the FIR upsampling filter is used is described in detail below (the training process when the CNN upsampling filter is used for upsampling is similar).
When the CNN downsampling filter is trained, the original image in the training set may be used as the input of the CNN downsampling filter, the output of the CNN downsampling filter may be used as the input of the FIR upsampling filter, the mean square error between the image output by the FIR upsampling filter and the original image may be calculated as a loss function, and the filter parameter with the minimum loss function may be used as the parameter of the CNN downsampling filter.
Specifically, the parameters of the CNN downsampling filter can be obtained according to formula (3).
In formula (3), θ_f represents all parameters in the CNN downsampling network, f(x_i; θ_f) represents the downsampling operation performed by the downsampling network on the input x_i, g_FIR represents an upsampling operation using an FIR filter, and the value of θ_f that minimizes the loss function gives the initial parameters of the CNN downsampling network.
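Formula (3) is not reproduced in this text. Given the loss described above (the mean square error between the FIR-upsampled output and the original image), it might take the following form; the hat notation for the resulting parameters and the 1/n normalization are assumptions.

```latex
\hat{\theta}_f = \arg\min_{\theta_f} \frac{1}{n} \sum_{i=1}^{n} \left\| g_{FIR}\big(f(x_i; \theta_f)\big) - x_i \right\|_2^2 \tag{3}
```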
Optionally, as an embodiment, the parameter value of the CNN upsampling filter and the parameter value of the CNN downsampling filter are both preset, and the parameter value of the CNN upsampling filter and the parameter value of the CNN downsampling filter are obtained by performing joint training on a preset image training set in an offline condition.
Because the CNN upsampling filter and the CNN downsampling filter are obtained by joint training under offline conditions, the loss of image texture information in the upsampling and downsampling processes can be reduced during encoding, and the quality of the encoded image is improved.
Under the offline condition, the joint training of the preset image training set to obtain the parameter values of the CNN upsampling filter and the CNN downsampling filter specifically includes the following processes:
First, the CNN upsampling filter is trained to obtain the initial parameters of the CNN upsampling filter.
The pictures in the training set are downsampled, the resulting downsampled images are used as the input of the CNN upsampling network, and the original pictures in the training set are used as the training target. During training, the reconstructed images are made as close as possible to the training target, which yields the parameters of the CNN upsampling filter. Specifically, the CNN upsampling filter may be trained according to formula (4) to obtain the initial parameter values of the CNN upsampling filter.
In formula (4), x_i represents an original image in the training set, y_i represents the low-resolution image obtained by downsampling the original image, n represents the number of training samples, θ_g represents all parameters in the CNN upsampling network, g(y_i; θ_g) represents the upsampling operation performed by the upsampling network on the input y_i, and the value of θ_g that minimizes the loss gives the initial parameters of the CNN upsampling network obtained after the training in the first step is finished.
Second, the CNN downsampling filter is trained based on the trained CNN upsampling filter to obtain the initial parameter values of the CNN downsampling filter.
The original images in the training set are used as the input of the CNN downsampling filter, and the output of the CNN downsampling filter is used as the input of the CNN upsampling filter to obtain the image output by the CNN upsampling filter. The mean square error between the image output by the CNN upsampling network and the original image is used as the loss function, and minimizing this loss function yields the initial parameter values of the CNN downsampling filter.
Specifically, the CNN downsampling filter may be trained according to formula (5) to obtain the initial parameter values of the CNN downsampling filter.
In formula (5), x_i represents an original image in the training set, θ_f represents all parameters of the CNN downsampling filter, the upsampling network uses the initial parameters obtained after the training in the first step is finished, f(x_i; θ_f) represents the downsampling operation performed by the downsampling filter on the input x_i, and the value of θ_f that minimizes the loss gives the initial parameters of the CNN downsampling filter.
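Formulas (4) and (5) are not reproduced in this text. Under the assumption that both stages minimize a mean square error, the two sequential training stages described above could be written as follows; the hat notation for the initial parameters and the 1/n normalization are assumptions.

```latex
\begin{align}
\hat{\theta}_g &= \arg\min_{\theta_g} \frac{1}{n} \sum_{i=1}^{n} \left\| g(y_i; \theta_g) - x_i \right\|_2^2 \tag{4} \\
\hat{\theta}_f &= \arg\min_{\theta_f} \frac{1}{n} \sum_{i=1}^{n} \left\| g\big(f(x_i; \theta_f); \hat{\theta}_g\big) - x_i \right\|_2^2 \tag{5}
\end{align}
```

In the second stage the upsampling parameters are held fixed at the values obtained in the first stage, so only the downsampling parameters are optimized.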
And thirdly, performing joint training on the up-sampling filter and the down-sampling filter to obtain updated network parameters.
The parameters obtained through the above training processes are used as the initial parameter values of the upsampling filter and the downsampling filter, respectively. The original-resolution images in the image training set are then used as both the input and the training target, and the parameters of the upsampling filter and the downsampling filter are trained simultaneously.
Specifically, the CNN upsampling network and the downsampling network may be trained simultaneously according to formula (6) to obtain parameter values of the CNN upsampling filter and the CNN downsampling filter.
In formula (6), the value of θ_f that minimizes the loss is the optimal parameter of the CNN downsampling filter, and θ'_g is the update parameter value of the CNN upsampling filter during training (its initial value is the parameter value obtained in the first step). For the meanings of the other parameters in formula (6), see the explanations of formula (4) and formula (5) above.
The loss function expressed by formula (6) consists of two terms. The first term is the error between the image output by the CNN downsampling filter and the image output by the FIR downsampling filter, and the second term is the error between the image output by the CNN upsampling filter and the original-resolution image. The optimal parameters of the CNN downsampling filter can be obtained by training the upsampling filter and the downsampling filter according to formula (6).
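Formula (6) is not reproduced in this text. A two-term loss matching the description above might be written as follows; the symbol f_FIR for the FIR downsampling operation, the equal weighting of the two terms, the hat notation, and the 1/n normalization are all assumptions.

```latex
\left(\hat{\theta}_f, \hat{\theta}_g\right) = \arg\min_{\theta_f,\, \theta'_g} \frac{1}{n} \sum_{i=1}^{n} \left( \left\| f(x_i; \theta_f) - f_{FIR}(x_i) \right\|_2^2 + \left\| g\big(f(x_i; \theta_f); \theta'_g\big) - x_i \right\|_2^2 \right) \tag{6}
```

The first term keeps the CNN-downsampled image close to the FIR-downsampled image, and the second term drives the end-to-end reconstruction toward the original-resolution image.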
Optionally, as an embodiment, before downsampling the image block to be encoded by using the target downsampling filter, the method shown in fig. 6 further includes: performing online training on the CNN downsampling filter and/or the CNN upsampling filter according to the image block to be encoded to obtain an update parameter value of the CNN downsampling filter and/or an update parameter value of the CNN upsampling filter, where the update parameter value of the CNN downsampling filter is used to replace the preset parameter value of the CNN downsampling filter, and the update parameter value of the CNN upsampling filter is used to replace the preset parameter value of the CNN upsampling filter.
By training the parameters of the CNN downsampling filter on line, the information loss caused by downsampling operation can be reduced as much as possible, and the quality of the reconstructed image is improved. In addition, since the CNN downsampling filter is used only in the encoder, there is no need to transfer the CNN downsampling filter parameters to the decoder, and thus, there is no increase in encoding overhead.
By training the CNN up-sampling filter on line, the up-sampling filter with the filter parameters more matched can be used for performing up-sampling operation according to the image texture characteristics, and compared with the preset CNN up-sampling filter, the image quality of the image block output by up-sampling can be further improved.
By simultaneously training the CNN up-sampling filter and the CNN down-sampling filter on line, the up-sampling filter and the down-sampling filter which are matched with filter parameters can be used for performing up-sampling operation and down-sampling operation according to image texture characteristics, and compared with the preset CNN up-sampling filter and CNN down-sampling filter, the image quality of an image block output by up-sampling can be further improved.
Optionally, the method shown in fig. 6 further includes: and writing the updated parameter value of the CNN up-sampling filter into the code stream.
After the update parameter value of the CNN upsampling filter is obtained, it may be written into the code stream, so that the decoding end can obtain the update parameter value by parsing the code stream and update the preset parameter value of its CNN upsampling filter accordingly. The decoding end can then perform the upsampling operation with the same updated parameter values as the CNN upsampling filter at the encoding end.
Optionally, as an embodiment, before determining, as the target upsampling filter, the upsampling filter in the upsampling filter set whose corresponding coding cost meets the preset requirement, the method shown in fig. 6 further includes: and performing joint training on the CNN up-sampling filter and the CNN down-sampling filter under the online condition according to the image block to be encoded to obtain an update parameter value of the CNN up-sampling filter and an update parameter value of the CNN down-sampling filter.
Optionally, before determining the target upsampling filter from the set of upsampling filters set in advance according to the coding cost of the image block to be coded, the method shown in fig. 6 further includes: determining a target coding mode from an original resolution coding mode and a variable resolution coding mode according to the coding cost of an image block to be coded; generating encoding mode indication information corresponding to a target encoding mode; and writing the coding mode indication information into the code stream.
Optionally, determining the target coding mode from the original resolution coding mode and the variable resolution coding mode according to the coding cost of the image block to be coded specifically includes: determining the coding cost of the image block to be coded in the original resolution coding mode; determining the coding cost of the image block to be coded in the variable resolution coding mode; and selecting the coding mode with the minimum coding cost from the original resolution coding mode and the variable resolution coding mode as the target coding mode.
The coding mode indication information is used to indicate which coding mode of the candidate coding modes is the target coding mode. For example, the encoding mode indication information may indicate that the original resolution encoding mode is the target encoding mode, or the encoding mode indication information may indicate that the variable resolution encoding mode is the target encoding mode.
According to the coding cost of the image block to be coded, the coding mode with lower coding cost can be selected as the target coding mode, and the loss of image coding in the coding process can be reduced.
It should be understood that the target upsampling filter is determined from the preset upsampling filter set according to the coding cost of the image block only when it is determined that the image block to be encoded is encoded in the variable resolution coding mode. When it is determined that the image block to be encoded is encoded in the original resolution coding mode, the image block is encoded directly in the original resolution coding mode and no target upsampling filter needs to be determined, because no upsampling operation is required in the original resolution coding mode.
Fig. 7 is a schematic flowchart of an image encoding method according to an embodiment of the present application. The method shown in fig. 7 may be executed by an encoding end device, where the encoding end device may specifically be an intelligent terminal, a PAD, a video player, an internet of things device, or another device having a video image encoding function. The method shown in fig. 7 includes steps 501 to 506, and the steps 501 to 506 are described in detail below with reference to specific examples.
501. Determine a target downsampling filter from a preset downsampling filter set according to the coding cost of the image block to be coded.
The downsampling filter set includes at least an FIR downsampling filter and a CNN downsampling filter.
The method and specific process for determining the target downsampling filter from the preset downsampling filter set according to the coding cost of the image block to be coded are substantially the same as those for determining the target upsampling filter from the preset upsampling filter set according to the coding cost of the image block to be coded, and are not repeated here.
502. Downsample the image block to be encoded using the target downsampling filter to obtain a first image block.
503. Encode the first image block to obtain a code stream.
504. Determine, as the target upsampling filter, the upsampling filter in a preset upsampling filter set that is of the same type as the target downsampling filter, where the upsampling filter set includes at least a finite impulse response (FIR) upsampling filter and a convolutional neural network (CNN) upsampling filter.
505. Generate upsampling filter indication information corresponding to the target upsampling filter.
506. Write the upsampling filter indication information into the code stream.
It is to be understood that the specific definitions and explanations of the steps of the method shown in fig. 7 may be referred to the explanations and definitions of the corresponding steps in the methods described in fig. 4 and 6 above, which are not described in detail here.
Optionally, in the present application, the target upsampling filter and the target downsampling filter may be determined from a preset upsampling filter set and a preset downsampling filter set, respectively, according to the encoding cost of the image block to be encoded. In this case, the process of determining the target upsampling filter and the process of determining the target downsampling filter may be independent of each other.
In the present application, the target downsampling filter is determined from the downsampling filter set according to the coding cost, and the target upsampling filter is determined according to the filter type of the target downsampling filter. Compared with directly upsampling with an upsampling filter having fixed parameter values, the coding cost of the image block to be coded is taken into account when the target upsampling filter is selected, so that a better coding effect can be obtained.
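Steps 501, 504 and 505 can be illustrated with a minimal Python sketch. The function name `choose_filters`, the dictionary of per-filter coding costs, and the mapping of the indication value (0 for FIR, 1 for CNN, following one of the Flag2 conventions described in this application) are assumptions made for illustration only; they are not a definitive implementation.

```python
def choose_filters(downsampling_costs):
    """Pick the downsampling filter type with the lowest coding cost.

    downsampling_costs: dict mapping a filter type ("FIR" or "CNN") to the
    coding cost obtained when that downsampling filter is used (hypothetical
    input; how the cost is computed is outside this sketch).
    """
    # Step 501: the target downsampling filter is the one with minimum cost.
    target_down = min(downsampling_costs, key=downsampling_costs.get)
    # Step 504: the target upsampling filter is of the same type.
    target_up = target_down
    # Step 505: indication value; 0 = FIR, 1 = CNN is an assumed convention.
    indication = {"FIR": 0, "CNN": 1}[target_up]
    return target_down, target_up, indication


if __name__ == "__main__":
    print(choose_filters({"FIR": 5.0, "CNN": 3.5}))
```

In this sketch the decoder never needs to know the downsampling filter explicitly, because only the upsampling filter indication is written into the code stream.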
The image encoding method according to the embodiment of the present application is described in detail above from the perspective of the encoding end with reference to fig. 4 to 7, and the image decoding method according to the embodiment of the present application is described in detail below with reference to fig. 8. It should be understood that the image decoding method shown in fig. 8 corresponds to, or is the inverse of, the image encoding methods shown in fig. 4, 6, and 7 above.
Fig. 8 is a schematic flowchart of an image decoding method according to an embodiment of the present application. The method shown in fig. 8 may be executed by a decoding-side device, where the decoding-side device may specifically be an intelligent terminal, a PAD, a video player, an internet of things device, or another device having a video image decoding function. The method shown in fig. 8 includes steps 601 to 609, which are described in detail below with reference to specific examples.
601. Acquire a code stream.
The code stream may be a code stream obtained by the image encoding method of the present application. Specifically, the code stream may be obtained by the image encoding method shown in fig. 4, 6, and 7 above.
602. Perform entropy decoding, inverse quantization and inverse transformation on the code stream to obtain a reconstructed residual signal of the image block to be decoded.
603. Acquire a prediction signal of the image block to be decoded.
604. Add the reconstructed residual signal and the prediction signal to obtain an initial reconstructed image block of the image block to be decoded.
605. Parse the code stream to obtain the coding mode indication information.
The coding mode indication information is used for indicating whether the coding end adopts the original resolution coding mode or the variable resolution coding mode during coding. That is, the coding mode indication information indicates in which coding mode the code stream is obtained. For example, the encoding mode indication information may indicate that the code stream is encoded in an original resolution encoding mode or a resolution-variable encoding mode.
606. Determine a target decoding mode from the original resolution decoding mode and the variable resolution decoding mode according to the encoding mode indication information.
Optionally, determining the target decoding mode from the original resolution decoding mode and the variable resolution decoding mode according to the encoding mode indication information includes: when the coding mode is the variable resolution coding mode, determining that the target decoding mode is the variable resolution decoding mode; and when the coding mode is the original resolution coding mode, determining the target decoding mode to be the original resolution decoding mode.
Optionally, the coding mode indication information is specifically represented by a value of a variable resolution coding mode Flag1, and different values of Flag1 are used for representing different coding modes. For example, when Flag1 is 0, it indicates that the code stream is obtained in the original resolution encoding mode, and when Flag1 is 1, it indicates that the code stream is obtained in the variable resolution encoding mode.
When the encoding mode indication information is represented by the value of the variable resolution encoding mode Flag1, the decoding end may determine the target decoding mode according to the value of Flag1. For example, when Flag1 is 0, the decoding end determines that the target decoding mode is the original resolution decoding mode; when Flag1 is 1, the decoding end determines that the target decoding mode is the variable resolution decoding mode.
In addition, the above coding mode indication information may alternatively be represented by the value of an original resolution coding mode Flag1, with different values of Flag1 representing different coding modes. For example, when Flag1 is 0, it indicates that the target encoding mode is the variable resolution encoding mode, and when Flag1 is 1, it indicates that the target encoding mode is the original resolution encoding mode.
The decoding end selects the decoding mode corresponding to the coding mode of the coding end as the target decoding mode, and the decoding effect of the decoding end can be ensured.
607. When the target decoding mode is the variable resolution decoding mode, parse the code stream of the image block to be decoded to obtain the upsampling filter indication information.
When the decoding end determines that the target decoding mode is the variable resolution decoding mode, the code stream is obtained by encoding under the variable resolution encoding mode, and the encoding end writes the indication of the up-sampling filter into the code stream, so that the decoding end obtains the indication information of the up-sampling filter by analyzing the code stream.
The upsampling filter indication information may be used to indicate that an upsampling filter in the upsampling filter set is a target upsampling filter.
For example, the upsampling filter indication information may indicate a FIR upsampling filter of the upsampling filter set as the target upsampling filter, or the upsampling filter indication information may also indicate a CNN upsampling filter of the upsampling filter set as the target upsampling filter.
608. Determine a target upsampling filter from a preset upsampling filter set according to the upsampling filter indication information, where the upsampling filter set includes at least an FIR upsampling filter and a CNN upsampling filter.
Alternatively, the upsampling filter indication information may be an upsampling filter selection Flag2, and a value of Flag2 is used to indicate the target upsampling filter.
For example, when the value of Flag2 is 0, the FIR filter is the target upsampling filter, and when the value of Flag2 is 1, the CNN filter is the target upsampling filter.
Alternatively, when the value of Flag2 is 0, the CNN filter is the target upsampling filter, and when the value of Flag2 is 1, the FIR filter is the target upsampling filter.
The decoding end obtains the value of Flag2 by analyzing the code stream, and can directly determine the target up-sampling filter according to the value of Flag2.
609. Upsample the initial reconstructed image block using the target upsampling filter to obtain a target reconstructed image block.
In the application, when an image block is decoded in a variable resolution decoding mode, a decoding end can select a corresponding up-sampling filter from a preset up-sampling filter set as a target up-sampling filter according to up-sampling filter indication information, and then performs up-sampling operation.
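The flag-driven decisions of steps 605 to 608 can be sketched as follows. This is a minimal illustration that assumes the flags have already been entropy-decoded into a simple sequence of integers, and that it uses the conventions Flag1 = 0 for original resolution, Flag1 = 1 for variable resolution, Flag2 = 0 for FIR and Flag2 = 1 for CNN; the function name `decode_block_modes` is hypothetical.

```python
def decode_block_modes(flags):
    """Return (target decoding mode, target upsampling filter type or None).

    flags: iterable of already-parsed flag values for one image block, in
    the order Flag1 [, Flag2]. Flag2 is only present when Flag1 == 1.
    """
    it = iter(flags)
    flag1 = next(it)
    if flag1 == 0:
        # Original resolution decoding: no upsampling filter is needed,
        # so Flag2 is not read from the code stream.
        return "original_resolution", None
    # Variable resolution decoding: read the upsampling filter selection.
    flag2 = next(it)
    return "variable_resolution", "CNN" if flag2 == 1 else "FIR"


if __name__ == "__main__":
    print(decode_block_modes([0]))
    print(decode_block_modes([1, 1]))
```

Reading Flag2 only when Flag1 selects the variable resolution mode mirrors the order of steps 606 and 607.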
Optionally, as an embodiment, the parameter value of the CNN upsampling filter is preset, and the parameter value of the CNN upsampling filter is obtained by performing offline training on a preset image training set.
The off-line training process of the CNN upsampling filter may refer to specific processes described in the image coding methods shown in fig. 4, fig. 6, and fig. 7, and for brevity, the description is not repeated here.
Because the parameter value of the CNN up-sampling filter is obtained through off-line training, the information loss of the image in the up-sampling process can be reduced and the image decoding quality is improved by adopting the CNN up-sampling filter for up-sampling in the decoding process.
In a possible implementation manner, the method further includes: analyzing the code stream, and acquiring an update parameter value of the CNN up-sampling filter, wherein the update parameter value of the CNN up-sampling filter is used for replacing a parameter value preset by the CNN up-sampling filter.
The encoding end can update the CNN up-sampling filter periodically or aperiodically, and write the obtained parameter value of the CNN up-sampling filter into the code stream, so that the decoding end can update the initial parameter value of the CNN up-sampling filter according to the parameter value of the CNN up-sampling filter.
By updating the parameter value of the CNN up-sampling filter, the filter parameter more matched with the image content can be obtained, so that the information loss of the image is further reduced when the CNN up-sampling filter is used for up-sampling operation, and the image decoding quality is improved.
Optionally, the update parameter value of the CNN upsampling filter is obtained by the encoding end performing online training on the CNN upsampling filter according to the image block to be encoded, where the code stream of the image block to be decoded is obtained by encoding the image block to be encoded.
It should be understood that the image block to be encoded and the image block to be decoded correspond to each other, that is, the code stream of the image block to be decoded is obtained by encoding the image block to be encoded.
By training the CNN up-sampling filter on line, the up-sampling filter with filter parameters more matched can be used for performing up-sampling operation according to image texture characteristics, and compared with the preset CNN up-sampling filter, the image quality of an image block output by up-sampling can be further improved.
Optionally, the update parameter value of the CNN upsampling filter is obtained by performing offline training on a preset image training set.
An update parameter value obtained through offline training can be used to update the parameter value of the CNN upsampling filter, thereby improving the decoding quality.
In a possible implementation manner, before analyzing the code stream and obtaining the update parameter value of the CNN upsampling filter, the method further includes: analyzing the code stream, and acquiring filter parameter updating indication information which is used for indicating whether to update the parameter value of the target up-sampling filter; analyzing the code stream to obtain an update parameter value of the CNN up-sampling filter, wherein the update parameter value comprises the following steps: and under the condition that the filter parameter updating indication information indicates that the parameters of the target up-sampling filter are updated, analyzing the code stream to obtain the updating parameter value of the CNN up-sampling filter.
Optionally, the filter update indication information is specifically a value of Flag3, and different values of Flag3 are used to indicate whether to update a parameter value of the target upsampling filter.
For example, when the value of Flag3 is 0, it indicates that the parameter value of the CNN upsampling filter does not need to be updated; when the value of Flag3 is 1, it indicates that the parameter value of the CNN upsampling filter needs to be updated.
Optionally, the filter parameter update indication information is carried in the SPS or the PPS.
The update value of the CNN up-sampling filter is obtained from the code stream only when the filter parameter update indication information indicates that the target up-sampling filter needs to be updated, so that the decoding load can be reduced, and the decoding efficiency can be improved.
In the present application, the filter parameter update indication information may be sent periodically, so that the parameter value of the CNN upsampling filter may be updated according to the update parameter value of the CNN upsampling filter, so as to improve the coding effect as much as possible.
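The conditional parsing controlled by the filter parameter update indication information can be sketched as follows. The function name `apply_parameter_update` and the `parse_update` callable (standing in for whatever routine reads the update parameter values from the code stream) are hypothetical, and Flag3 = 1 is assumed to mean that an update is present.

```python
def apply_parameter_update(preset_params, flag3, parse_update):
    """Return the CNN upsampling filter parameters to use for decoding.

    preset_params: parameter values preset in the decoder.
    flag3: filter parameter update indication (assumed: 1 = update).
    parse_update: zero-argument callable that parses the update parameter
        values from the code stream; only invoked when an update is signaled.
    """
    if flag3 == 1:
        # The update values replace the preset parameter values.
        return parse_update()
    # No update signaled: the update parameters are never parsed.
    return preset_params


if __name__ == "__main__":
    print(apply_parameter_update([0.1, 0.2], 1, lambda: [0.3, 0.4]))
```

Because `parse_update` is skipped when Flag3 is 0, the sketch reflects the reduction in decoding load described above.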
The image encoding method and the image decoding method according to the embodiment of the present application are described in detail above from the perspective of the encoding end and the decoding end with reference to fig. 4 to 8. In order to better understand the image coding and decoding method according to the embodiment of the present application, a detailed description is given below of specific processes of the image coding and decoding method according to the embodiment of the present application with reference to specific embodiments.
The first embodiment is as follows: the up-sampling filter and the down-sampling filter are both CNN filters or FIR filters.
In the first embodiment, the same type of filter is used for the upsampling filter and the downsampling filter, and the specific type of filter used for the upsampling filter and the downsampling filter can be selected from the CNN filter and the FIR filter according to the coding cost of the image block.
In the first embodiment, a specific architecture of the encoder may be as shown in fig. 9. When an image block is coded, either the original resolution coding mode or the variable resolution coding mode may be used, and one of the two is selected according to the coding cost of the image block. When it is determined that the image block is to be encoded in the original resolution encoding mode, the image block is encoded directly by the original resolution encoding module shown in fig. 9; when it is determined that the image block is to be encoded in the variable resolution mode, an up-sampling filter needs to be selected from the CNN up-sampling filter and the FIR up-sampling filter, and a down-sampling filter needs to be selected from the CNN down-sampling filter and the FIR down-sampling filter. In order to ensure the encoding effect of the image block, the down-sampling filter and the up-sampling filter are of the same type. As shown in fig. 9, when the up-sampling filter is a CNN up-sampling filter, the down-sampling filter is a CNN down-sampling filter, and when the up-sampling filter is an FIR up-sampling filter, the down-sampling filter is an FIR down-sampling filter.
Among them, the CNN upsampling filter, the CNN downsampling filter, and the FIR upsampling filter and the FIR downsampling filter may be designed in advance (filter parameter values are already determined) and built in the encoder.
In the first embodiment, the encoding end needs to select both the encoding mode and the filter. When encoding an image block, the encoding end therefore generates two pieces of selection information, namely the original resolution or low resolution encoding mode selection information and the CNN or FIR filter selection information, and writes both into the code stream. The decoder can then identify whether an image block uses the variable resolution decoding mode and, if so, select the corresponding up-sampling filter to up-sample the decoded low-resolution reconstructed image block, so as to obtain a reconstructed image block with the original resolution.
As shown in fig. 10, in the first embodiment, the encoding process of the encoder is specifically as follows:
701. Obtain an image block.
Specifically, the image block may be a complete image of a frame, or may be a part of an image of a frame. When the image block is a part of a frame of image, the image may be divided to obtain a plurality of image blocks before step 701, and then the image block to be encoded is obtained in step 701.
702. Determine the coding cost of the current image block in the original resolution coding mode.
Specifically, in step 702, the coding cost of the current image block coded in the original resolution coding mode needs to be calculated.
The coding cost of the current image block in the original resolution coding mode can be calculated by the following procedure.
Firstly, coding a current image block in an original resolution mode according to a preset coding and decoding scheme to obtain a compressed code stream of the current image block and a reconstructed image block of the current image block;
secondly, calculating the mean square error $D_{org}$ between the reconstructed image block and the current image block, and determining the size $R_{org}$ of the compressed code stream of the current image block;
finally, calculating, according to $D_{org}$ and $R_{org}$, the rate-distortion cost $cost_{org}$ of encoding the current image block in the original resolution encoding mode, and determining the rate-distortion cost $cost_{org}$ as the coding cost of the current image block in the original resolution coding mode.
It should be understood that the coding cost has multiple concrete forms, for example, the coding cost may be specifically a rate distortion cost, may also be a distortion, and the concrete form of the coding cost is not limited in this application.
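The application does not give an explicit formula for the rate-distortion cost. A minimal sketch, assuming the commonly used Lagrangian form cost = D + λ·R, where the multiplier `lam` is a hypothetical parameter and the helper names are chosen for illustration only:

```python
def mean_squared_error(block_a, block_b):
    """Mean square error between two equally sized 2-D blocks (lists of rows)."""
    flat_a = [p for row in block_a for p in row]
    flat_b = [p for row in block_b for p in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("blocks must have the same number of samples")
    return sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)


def rate_distortion_cost(distortion, rate_bits, lam):
    """Assumed Lagrangian cost D + lam * R; lam is not specified by the source."""
    return distortion + lam * rate_bits


if __name__ == "__main__":
    d_org = mean_squared_error([[1, 2], [3, 4]], [[1, 2], [3, 6]])
    print(rate_distortion_cost(d_org, rate_bits=100, lam=0.5))
```

Setting `lam` to 0 reduces the cost to the distortion alone, which corresponds to the alternative form of the coding cost mentioned above.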
703. Determine the coding cost of the current image block when the FIR up-sampling filter is used as the target up-sampling filter.
Step 703 essentially calculates the coding cost for coding the current image block in the variable resolution coding mode (FIR upsampling filter as upsampling filter).
Specifically, the encoding cost of the current image block when the FIR upsampling filter is used as the target upsampling filter can be calculated by the following procedure.
(1) Performing downsampling operation on a current image block by using an FIR downsampling filter (a typical FIR downsampling filter is a bicubic downsampling filter) to obtain a low-resolution image block;
(2) Coding the low-resolution image block according to the selected coding and decoding scheme, and outputting a compressed code stream;
(3) Reconstructing the low-resolution image block to obtain a low-resolution reconstructed image block;
(4) An FIR up-sampling filter is adopted to up-sample the reconstructed image block with low resolution to obtain the reconstructed image block with the same original resolution as the current image block;
(5) Calculating the mean square error $D_{FIR}$ between the reconstructed image block and the current image block, and determining the size $R_{FIR}$ of the compressed code stream of the current image block;
(6) Calculating the rate-distortion cost $cost_{FIR}$ according to $D_{FIR}$ and $R_{FIR}$, and determining the rate-distortion cost $cost_{FIR}$ as the coding cost of the current image block when the FIR filter is used as the target upsampling filter.
In the above process of calculating the coding cost of the current image block when the FIR upsampling filter is used as the target upsampling filter, in order to obtain the same reconstructed image block as the decoding end, the FIR upsampling filter used here and the FIR upsampling filter used at the decoding end must be the same filter (with the same filter parameters).
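The distortion part of this procedure can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not part of the application: a 2 × 2 averaging (box) filter and pixel replication stand in for the FIR downsampling and upsampling filters (a bicubic filter would be more typical), the block has even dimensions, and the encode and reconstruct steps are replaced by a lossless placeholder.

```python
def downsample_2x(block):
    """2x downsampling with a 2x2 averaging filter (stand-in for an FIR filter)."""
    h, w = len(block), len(block[0])
    return [[(block[y][x] + block[y][x + 1] +
              block[y + 1][x] + block[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def upsample_2x(block):
    """2x upsampling by pixel replication (stand-in for an FIR filter)."""
    out = []
    for row in block:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def variable_resolution_distortion(block):
    """Mean square error after downsample -> (codec placeholder) -> upsample."""
    low = downsample_2x(block)
    low_rec = low  # placeholder: a real codec would encode and reconstruct here
    rec = upsample_2x(low_rec)
    h, w = len(block), len(block[0])
    return sum((block[y][x] - rec[y][x]) ** 2
               for y in range(h) for x in range(w)) / (h * w)


if __name__ == "__main__":
    print(variable_resolution_distortion([[0, 4], [4, 8]]))
```

A flat block survives the round trip without loss, whereas a block with detail incurs distortion, which is why the variable resolution mode is only selected when its overall cost is lower.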
704. Determine the coding cost of the current image block when the CNN filter is used as the target upsampling filter.
Step 704 is essentially to calculate the coding cost for coding the current image block in the variable resolution coding mode (CNN upsampling filter as upsampling filter).
Specifically, the coding cost of the current image block when the CNN filter is used as the target upsampling filter can be calculated by the following procedure.
(1) Adopting a CNN downsampling filter to carry out downsampling operation on the current image block to obtain an image block with low resolution;
(2) Coding the low-resolution image block according to the selected coding and decoding scheme, and outputting a compressed code stream;
(3) Reconstructing the low-resolution image block to obtain a low-resolution reconstructed image block;
(4) Adopting a CNN up-sampling filter to up-sample the reconstructed image block with low resolution to obtain a reconstructed image block with the same original resolution as the current image block;
(5) Calculating the mean square error $D_{CNN}$ between the reconstructed image block and the current image block, and determining the size $R_{CNN}$ of the compressed code stream of the current image block;
(6) Calculating the rate-distortion cost $cost_{CNN}$ according to $D_{CNN}$ and $R_{CNN}$, and determining the rate-distortion cost $cost_{CNN}$ as the coding cost of the current image block when the CNN filter is used as the target upsampling filter.
In the above process of calculating the coding cost of the current image block when the CNN upsampling filter is used as the target upsampling filter, in order to obtain the same reconstructed image block as the decoding end, the CNN upsampling filter used here and the CNN upsampling filter used by the decoding end must be the same filter (the filter parameters are the same).
Fig. 11 shows a specific structure of the CNN downsampling filter. The network consists of 10 convolutional layers, each with a 3 × 3 convolution kernel. A rectified linear unit (ReLU) is applied after every convolutional layer except the last, where the ReLUs following convolutional layers 1 to 9 are denoted relu.1 to relu.9, and the stride of the first convolutional layer is set to 2. In addition, the low-resolution image obtained by downsampling the input image block with an FIR filter is added to the output of the last layer of the network, so that the network has a residual learning characteristic.
The FIR Filter in the network structure shown in fig. 11 may be specifically a bicubic Filter, a Discrete Cosine Transform Based Interpolation Filter (DCTIF), or the like, wherein the Discrete Cosine Transform Based Interpolation Filter may be directly referred to as a DCTIF Filter for short.
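The effect of the stride-2 first layer on the block size can be checked with a small sketch. The padding of 1 per layer is an assumption (the application does not state the padding), and the names `conv_output_size`, `LAYERS` and `trace_size` are hypothetical.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Standard convolution output size along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Ten 3x3 convolutional layers: stride 2 only on the first layer,
# ReLU after layers 1 to 9 and no activation after the last layer.
LAYERS = [{"kernel": 3, "stride": 2 if i == 0 else 1, "relu": i < 9}
          for i in range(10)]


def trace_size(size):
    """Spatial size of the network output for an input of the given size."""
    for layer in LAYERS:
        size = conv_output_size(size, layer["kernel"], layer["stride"])
    return size


if __name__ == "__main__":
    print(trace_size(64))
```

Under these assumptions the network output has half the input width and height, so it matches the size of the FIR-downsampled image that is added to it in the residual connection.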
The above steps 703 and 704 are to substantially determine the cost of encoding the current image block in the variable resolution encoding mode, and since different upsampling filters may be used for upsampling when performing variable resolution encoding, the encoding cost of the current image block needs to be calculated when different filters (FIR upsampling filter and CNN upsampling filter) are used as the target upsampling filters.
705. Encode the current image block in the coding mode with the minimum coding cost to generate a code stream.
Specifically, in step 705, the encoding mode may be selected according to the magnitude of the encoding cost calculated in steps 702 to 704, and encoding information may be generated. The coding information may specifically include a variable resolution coding mode indication Flag1 and an upsampling filter selection Flag2.
The setting of Flag1 and Flag2 according to the coding cost described above specifically includes the following three cases.
(1) When $cost_{org} < cost_{FIR}$ and $cost_{org} < cost_{CNN}$, it is determined that the current image block is encoded in the original resolution mode; in this case Flag1 is set to 0, and Flag2 is meaningless and therefore does not need to be set;
(2) When $cost_{FIR} < cost_{org}$ and $cost_{FIR} < cost_{CNN}$, it is determined that the current image block is encoded in the variable resolution mode, with the up-sampling and down-sampling operations performed by FIR filters; in this case Flag1 is set to 1 and Flag2 is set to 0;
(3) When $cost_{CNN} < cost_{org}$ and $cost_{CNN} < cost_{FIR}$, it is determined that the current image block is encoded in the variable resolution mode, with the up-sampling and down-sampling operations performed by CNN filters; in this case Flag1 is set to 1 and Flag2 is set to 1.
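The three cases can be summarized in a short sketch. The function name `set_flags` is hypothetical, and the handling of equal costs (preferring the original resolution mode, then FIR) is an assumption, since the three cases above only cover strict inequalities.

```python
def set_flags(cost_org, cost_fir, cost_cnn):
    """Return the flags to signal for the lowest-cost option.

    Flag2 is omitted when Flag1 is 0, because no upsampling filter is used
    in the original resolution mode.
    """
    best = min(cost_org, cost_fir, cost_cnn)
    if best == cost_org:
        return {"Flag1": 0}              # case (1): original resolution
    if best == cost_fir:
        return {"Flag1": 1, "Flag2": 0}  # case (2): variable resolution, FIR
    return {"Flag1": 1, "Flag2": 1}      # case (3): variable resolution, CNN


if __name__ == "__main__":
    print(set_flags(10.0, 8.0, 9.0))
```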
706. Write the coding information of the current image block into the code stream.
In step 706, the coding information may include, in addition to the variable resolution coding mode Flag1 and the upsampling filter selection Flag2 determined in step 705, a prediction mode when the current image block is coded, a quantized coding coefficient, and the like.
A decoding process of the decoding end in the first embodiment is shown in fig. 12, the decoding process shown in fig. 12 corresponds to the encoding process shown in fig. 10, and fig. 12 specifically includes the following steps.
801. Acquire a compressed code stream.
The compressed code stream obtained in step 801 may be a code stream finally generated by the encoding method shown in fig. 10. When the compressed code stream is obtained, the decoding end can directly obtain the compressed code stream from the encoding end or obtain the compressed code stream from the server.
802. Parse the variable resolution coding mode indication Flag1 of the current image block.
The current image block may be an image block to be decoded currently, the compressed code stream acquired in step 801 may be a code stream of the current image block, and after the code stream of the current image block is acquired, the variable resolution coding mode indication Flag1 may be analyzed from the code stream of the current image block, and a value of Flag1 is acquired.
803. It is determined whether or not the variable resolution decoding mode is employed based on Flag1.
The Flag1 value may be used to indicate whether a variable resolution coding mode or an original resolution coding mode is used when generating the code stream of the current image block. Therefore, whether to decode the current image block in the variable resolution decoding mode can be determined according to the value of Flag1.
For example, when the value of Flag1 is 0, it indicates that the original resolution coding mode is adopted when the code stream of the current image block is generated; when the value of Flag1 is 1, it indicates that a variable resolution coding mode is adopted when the code stream of the current image block is generated.
Therefore, when the value of Flag1 is 0, it can be determined that the current image block is decoded in the original resolution decoding mode; when the value of Flag1 is 1, it can be determined that the current image block is decoded in the variable resolution decoding mode.
804. The upsampling filter select Flag2 of the current image block is parsed.
The value of Flag2 may be used to indicate whether to upsample using a FIR upsampling filter or a CNN upsampling filter as the upsampling filter in the encoding process.
805. Parse the coding information of the low-resolution coding block to obtain a low-resolution reconstructed image block.
Because the encoding end can select different encoding schemes to encode the image block, the decoding scheme corresponding to the encoding scheme is also selected to decode when the reconstructed image block is obtained by analyzing the decoding information.
For example, if an image block is encoded in a manner similar to Joint Photographic Experts Group (JPEG), an international image compression standard, then only the quantized transform coefficient information of the image block needs to be analyzed from the code stream during decoding, and inverse quantization and inverse transform operations are performed on the coefficient, so as to obtain the original resolution image block reconstruction.
In addition, if an intra-frame coding mode similar to H.264 or H.265 is selected to encode an image block, the coding information of the current block, such as the coding mode, the prediction mode and the quantized transform coefficients, needs to be decoded from the code stream during decoding. Inverse quantization and inverse transformation are performed on the quantized transform coefficients to obtain a reconstructed residual signal, a prediction operation is performed according to the prediction mode to obtain a prediction signal, and finally the reconstructed residual signal and the prediction signal are added to obtain the original resolution image block reconstruction.
Specifically, as an optional implementation manner, acquiring the low-resolution reconstructed block mainly includes the following processes:
firstly, entropy decoding is carried out on a compressed code stream of a current image block to obtain a quantization coefficient;
secondly, inverse quantization and inverse transformation are performed on the quantized coefficients to obtain a reconstructed residual signal of the current image block;
thirdly, acquiring a prediction signal (or called a prediction image block) of the current image block;
and finally, obtaining a low-resolution reconstructed image block of the current image block according to the residual signal and the prediction signal of the current image block.
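The last three stages of this process can be sketched as follows, starting from coefficients that have already been entropy-decoded. Uniform scalar quantization with a single step size `qstep` is an assumption, and the inverse transform is passed in as a callable because the application does not fix a particular transform; all names are hypothetical.

```python
def reconstruct_block(quantized, prediction, qstep, inverse_transform):
    """Reconstruct a block from quantized coefficients and a prediction.

    quantized: 2-D list of entropy-decoded quantized coefficients.
    prediction: 2-D list holding the prediction signal, same size as the block.
    qstep: quantization step size (assumed uniform scalar quantization).
    inverse_transform: callable mapping a 2-D coefficient block to a residual.
    """
    # Inverse quantization: scale each coefficient by the step size.
    dequantized = [[c * qstep for c in row] for row in quantized]
    # Inverse transformation yields the reconstructed residual signal.
    residual = inverse_transform(dequantized)
    # Add the residual signal and the prediction signal sample by sample.
    return [[r + p for r, p in zip(res_row, pred_row)]
            for res_row, pred_row in zip(residual, prediction)]


if __name__ == "__main__":
    identity = lambda coeffs: coeffs  # placeholder transform for the demo
    print(reconstruct_block([[1, 2]], [[10, 10]], 2, identity))
```

The same routine applies to step 806; only the resolution of the input code stream differs.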
806. Parse the coding information of the original-resolution coding block to obtain a reconstructed image block with the original resolution.
The process of obtaining the reconstructed image block with the original resolution in step 806 is the same as the specific process of obtaining the reconstructed image block with the low resolution in step 805, except that the input signal in step 805 is a code stream obtained in the low resolution encoding mode, and the input signal in step 806 is a code stream obtained in the original resolution encoding mode.
807. Whether to adopt a CNN up-sampling filter is determined according to Flag2.
Specifically, whether the CNN upsampling filter or the FIR upsampling filter is used for upsampling may be specifically determined according to the value of Flag2.
For example, when the value of Flag2 is 1, it means that the CNN upsampling filter is used for upsampling, and when the value of Flag2 is 0, it means that the FIR upsampling filter is used for upsampling. Therefore, the CNN upsampling filter is determined to be used when the value of Flag2 is 1, and the CNN upsampling filter is determined not to be used (the FIR upsampling filter is used) when the value of Flag2 is 0.
808. Upsample the low-resolution reconstructed block by using the CNN upsampling filter to obtain a reconstructed image block of the original resolution.
The CNN upsampling filter used in step 808 may be as shown in fig. 13, which shows the network structure of the CNN upsampling filter. The network is divided into five layers: the first two layers are convolutional layers that perform multi-scale feature extraction, the third layer is a deconvolution layer that performs multi-scale feature upsampling, and the last two layers are convolutional layers that complete multi-scale reconstruction. The convolution kernel size of the first layer is 5x5, and the number of feature maps is 64; the second layer uses convolution kernels of sizes 5x5 and 3x3 at the same time, with 16 and 32 feature maps respectively; the deconvolution kernel size of the third layer is 12x12, and the number of feature maps is 48; the fourth layer uses convolution kernels of sizes 3x3 and 1x1 at the same time, with 16 and 32 feature maps respectively; the convolution kernel size of the fifth layer is 3x3, and the number of feature maps is 1, that is, the output original-resolution image block. A ReLU layer is added after each of the first 4 convolutional layers as the activation function applied to the convolution output signal.
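To give a feel for the size of the network described above, the following sketch tallies its learnable parameters from the stated kernel sizes and feature-map counts. Several points are assumptions rather than statements from the text: the input is taken to be a single-channel (luma) block, each layer is given one bias per output feature map, and the two parallel branches of the second and fourth layers are assumed to be concatenated (16 + 32 = 48 channels, which matches the 48 feature maps of the third layer). The names `conv_params`, `LAYERS` and `count_parameters` are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus one bias per output map for a kernel x kernel layer."""
    return in_ch * out_ch * kernel * kernel + out_ch


# (layer name, input channels, [(kernel size, feature maps), ...])
LAYERS = [
    ("conv1", 1, [(5, 64)]),              # 5x5, 64 maps
    ("conv2", 64, [(5, 16), (3, 32)]),    # parallel 5x5 and 3x3 branches
    ("deconv3", 48, [(12, 48)]),          # 12x12 deconvolution, 48 maps
    ("conv4", 48, [(3, 16), (1, 32)]),    # parallel 3x3 and 1x1 branches
    ("conv5", 48, [(3, 1)]),              # 3x3, single output map
]


def count_parameters(layers=LAYERS):
    """Return a dict mapping each layer name to its parameter count."""
    per_layer = {}
    for name, in_ch, branches in layers:
        per_layer[name] = sum(conv_params(in_ch, maps, k) for k, maps in branches)
    return per_layer
```

Under these assumptions the deconvolution layer dominates: 331,824 of the 386,497 parameters belong to the 12x12 layer.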
809. Upsample the low-resolution reconstructed block by using the FIR upsampling filter to obtain a reconstructed image block of the original resolution.
In step 809, the low-resolution reconstruction may be interpolated by using the DCT-based interpolation filter (DCTIF) of the H.265 standard to obtain the original-resolution reconstruction.
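As an illustration of FIR upsampling with a DCTIF, the sketch below doubles the resolution of a block using the 8-tap half-sample luma filter of H.265, whose coefficients are (-1, 4, -11, 40, 40, -11, 4, -1) / 64. The text does not state the upsampling factor, the sample phase, or the border handling, so the following are assumptions: a factor of 2 in each direction, output samples at even positions coinciding with the input samples, edge replication at block borders, 8-bit samples, and rounding after each 1-D pass. The names `upsample_row_2x` and `upsample_block_2x` are hypothetical.

```python
# H.265 luma half-sample interpolation taps; they sum to 64.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def upsample_row_2x(row, max_val=255):
    """Double the length of a 1-D sample list.

    Even outputs copy the input samples; odd outputs are half-sample
    positions interpolated from input samples i-3 .. i+4.
    """
    n = len(row)
    out = []
    for i in range(n):
        out.append(row[i])
        acc = 0
        for t, coeff in enumerate(HALF_PEL_TAPS):
            j = min(max(i - 3 + t, 0), n - 1)   # replicate edge samples
            acc += coeff * row[j]
        # Round, normalize by 64, and clip to the valid sample range.
        out.append(min(max((acc + 32) >> 6, 0), max_val))
    return out


def upsample_block_2x(block):
    """Separable 2x upsampling: filter the rows, then the columns."""
    rows = [upsample_row_2x(r) for r in block]
    cols = [upsample_row_2x(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

Because the taps sum to 64, a flat block stays flat, and on a linear ramp the interpolated sample falls midway between its two neighbours.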
In steps 808 and 809, the original resolution refers to the resolution of the image block to be encoded corresponding to the reconstructed image block at the encoding end, and the reconstructed image block with the same resolution as the original image to be encoded can be obtained by performing upsampling.
810. It is determined whether the current image block is the last image block.
Specifically, whether the current image block is the last image block to be decoded or not can be determined according to other coding information carried in the code stream.
When the current image block is not the last image block to be decoded, continue to execute step 802 to continue decoding the next image block; in case the current image block is the last image block to be decoded, step 811 is performed and the decoding process of the current image ends.
811. The decoding of the current picture is finished.
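The branching of the decoding flow can be summarized in a short sketch. It assumes that Flag1 = 0 selects the original resolution mode and Flag1 = 1 the variable resolution mode, and that Flag2 = 1 selects the CNN upsampling filter; both conventions are only given as examples in the text. The flags are assumed to have been parsed from the code stream already, and the function names and returned labels are hypothetical.

```python
def select_decoding_path(flag1, flag2=None):
    """Return which decoding branch applies to one image block."""
    if flag1 == 0:
        return "original_resolution"              # step 806
    if flag2 == 1:
        return "low_resolution_cnn_upsampling"    # steps 805 and 808
    return "low_resolution_fir_upsampling"        # steps 805 and 809


def decode_image(block_flags):
    """block_flags: iterable of (flag1, flag2) pairs, one per image block.

    The loop ends after the last block, which corresponds to the check
    in step 810 and the end of decoding in step 811.
    """
    paths = []
    for flag1, flag2 in block_flags:
        paths.append(select_decoding_path(flag1, flag2))
    return paths
```

Flag2 is only consulted when Flag1 indicates the variable resolution mode, which is why it defaults to `None` here.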
In the first embodiment, the CNN upsampling filter and the CNN downsampling filter may be obtained by training in advance and preset in the encoder and the decoder. The encoder and the decoder directly use the preset CNN upsampling filter and CNN downsampling filter in the encoding and decoding process.
Specifically, the parameter values of the CNN upsampling filter and the CNN downsampling filter may be preset parameter values, and the preset parameter values may be obtained by performing offline training based on an image training set.
In the first embodiment, since the CNN upsampling filter and the CNN downsampling filter may be used as the upsampling filter and the downsampling filter respectively, the parameter values of the two CNN filters can be obtained through joint training.
The joint training of the CNN up-sampling filter and the CNN down-sampling filter comprises the following specific steps:
Step 1, training the CNN upsampling network independently to obtain initial parameters of the upsampling network.
The pictures in the training set are downsampled, the resulting downsampled images are used as the input of the CNN upsampling network, and the original pictures in the training set are used as training targets. Specifically, the CNN upsampling network may be trained through formula (7), so as to obtain initial parameters of the upsampling network.
In formula (7), $x_i$ represents an original image in the training set, $y_i$ represents the low-resolution image obtained by downsampling the original image, $n$ represents the number of training samples, $\theta_g$ represents all parameters of the CNN upsampling network, and $g(y_i; \theta_g)$ represents the upsampling operation performed by the upsampling network on the input $y_i$. The parameters that result from this training are the initial parameters of the CNN upsampling network obtained when the training in step 1 is finished.
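Formula (7) itself is not reproduced in this text. One form consistent with the symbol definitions for step 1, assuming a mean-squared-error loss and writing $\hat{\theta}_g$ for the resulting initial parameters (both the loss and this notation are assumptions), would be:

```latex
\hat{\theta}_g = \arg\min_{\theta_g} \frac{1}{n} \sum_{i=1}^{n}
    \left\| g(y_i; \theta_g) - x_i \right\|^2
```

The normalization by $1/n$ does not change the minimizer and may differ from the original formula.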
Step 2, training the CNN downsampling network based on the CNN upsampling network obtained in step 1, to obtain initial parameters of the CNN downsampling network.
The original images in the training set are used as the input of the CNN downsampling network, and the output of the CNN downsampling network is used as the input of the CNN upsampling network obtained in step 1. The mean square error between the image output by the CNN upsampling network and the original image is used as the loss function, and the loss function is minimized to obtain the initial parameters of the CNN downsampling network.
Specifically, the CNN downsampling network may be trained by formula (8), so as to obtain initial parameters of the downsampling network.
In formula (8), $\theta_f$ represents all parameters of the CNN downsampling network, and $f(x_i; \theta_f)$ represents the downsampling operation performed by the downsampling network on the input $x_i$. The parameters that result from this training are the initial parameters of the CNN downsampling network obtained when the training in step 2 is finished.
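Formula (8) is likewise not reproduced in this text. Following the description of step 2 (mean square error between the upsampled output and the original image, with the upsampling network from step 1 held fixed), one consistent form would be the following, where $\hat{\theta}_g$ denotes the step 1 upsampling parameters and $\hat{\theta}_f$ the resulting downsampling parameters (both symbols are assumed notation):

```latex
\hat{\theta}_f = \arg\min_{\theta_f} \frac{1}{n} \sum_{i=1}^{n}
    \left\| g\big(f(x_i; \theta_f); \hat{\theta}_g\big) - x_i \right\|^2
```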
Step 3, performing joint training on the upsampling network and the downsampling network to obtain updated network parameters.
The parameters obtained by training in step 1 and step 2 are used as the initial parameters of the upsampling network and the downsampling network, respectively. The original-resolution images in the training set are then used as both the input and the training target, and the parameters of the upsampling and downsampling networks are trained simultaneously.
Specifically, the CNN upsampling network and the CNN downsampling network may be trained simultaneously according to equation (9).
In equation (9), the loss function consists of two terms. The first term computes the error between the image output by the CNN downsampling network and the image output by the FIR downsampling filter; the second term computes the error between the image output by the CNN upsampling network and the original-resolution image. Training the upsampling network and the downsampling network according to equation (9) yields the optimal parameters of the CNN downsampling network. The CNN downsampling network with these optimal parameters is the CNN downsampling filter used in the encoding process of this embodiment.
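The two-term loss described for equation (9) can be sketched as follows. This is only an illustration of the structure of the loss: images are represented as flat lists of samples, the error is assumed to be a mean squared error, and the relative `weight` of the first term is an assumption because the text does not state one. The networks and the FIR filter are passed in as plain callables, and all names are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally long sample lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def joint_loss(images, cnn_down, cnn_up, fir_down, weight=1.0):
    """Average two-term loss over a training set.

    First term:  CNN-downsampled image vs. FIR-downsampled image.
    Second term: CNN-upsampled image vs. original-resolution image.
    """
    total = 0.0
    for x in images:
        low = cnn_down(x)
        total += weight * mse(low, fir_down(x))   # first term
        total += mse(cnn_up(low), x)              # second term
    return total / len(images)
```

The first term keeps the learned low-resolution image close to a conventional downsampled image, so that it remains suitable for coding, while the second term drives the reconstruction quality.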
Step 4, fine-tuning the upsampling network according to the images in the training set.
All images in the training set are downsampled by using the optimal downsampling network obtained after the training in step 3, and the low-resolution images are then compression-coded to obtain low-resolution reconstructed images. With the low-resolution reconstructed images as input and the corresponding original-resolution images in the training set as training targets, the upsampling network is retrained according to the method in step 1 to obtain the optimal parameters of the CNN upsampling network. The CNN upsampling network with these optimal parameters is the CNN upsampling filter used in the encoding and decoding process of this embodiment. It should be appreciated that the upsampling network is still trained using the image training set in step 4.
When the CNN downsampling filter and the CNN upsampling filter are trained in steps 1 to 4, the process shown in fig. 14 may be used for joint training: a reconstruction loss function and a regularization loss function are constructed, the CNN downsampling filter and the CNN upsampling filter are trained on a preset image training set so that the reconstruction loss function and the regularization loss function are minimized, and the parameter values at which they are minimized are taken as the parameter values of the CNN downsampling filter and the CNN upsampling filter.
In the first embodiment, when each image block is encoded, two encoding modes, namely the original resolution encoding mode and the low resolution encoding mode, are available for selection. When the image block is coded in the low resolution coding mode, the upsampling filter and the downsampling filter can be selected in two ways: either both the upsampling filter and the downsampling filter are CNN filters, or both are FIR filters. In the first embodiment, adaptively selecting the upsampling filter and the downsampling filter can significantly improve the compression coding efficiency of the existing block-level variable resolution coding scheme. In addition, in the first embodiment, the parameter values of the CNN upsampling filter and the CNN downsampling filter are obtained by joint offline training and are preset in the encoder and the decoder, so that the loss of image texture information in the upsampling and downsampling processes can be reduced, and the quality of the coded image can be improved.
The second embodiment: the down-sampling filter is a preset FIR down-sampling filter.
In the second embodiment, the filter parameters of the downsampling filter are fixed, but the upsampling filter is not fixed, the upsampling filter can be determined from the CNN filter and the FIR filter according to the image content of the image block, and the upsampling filter parameters are written into the code stream, so that the decoding end can know the parameters of the upsampling filter.
The system architecture of the encoder in the second embodiment can be as shown in fig. 15. When the encoder encodes an image block by using a variable resolution encoding mode, a preset FIR downsampling filter is fixedly used for downsampling the original resolution image block to obtain a low-resolution image block. When the encoder performs an upsampling operation on the reconstruction of the low-resolution image block, one of the preset CNN upsampling filter and the preset FIR upsampling filter can be flexibly selected as the upsampling filter. In addition, the encoder needs to write the variable resolution mode indication information and the upsampling filter selection flag information into the compressed code stream, so that the decoder can perform a decoding operation using the corresponding upsampling filter in the corresponding decoding mode.
It should be understood that in the second embodiment, the encoding process of the encoder is substantially the same as that of the encoder in the first embodiment, and in the second embodiment, the up-sampling filter can be selected from the FIR up-sampling filter and the CNN up-sampling filter in the manner of determining the up-sampling filter in the first embodiment, except that in the second embodiment, the down-sampling filter is fixed, so the down-sampling filter does not need to be determined in the second embodiment.
The decoding process of the decoder in the second embodiment is the same as that of the decoder in the first embodiment, and repeated description is omitted here for the sake of brevity.
In the second embodiment, by adaptively selecting the CNN upsampling filter or the FIR upsampling filter, the compression coding efficiency of the existing block-level variable resolution coding scheme can be significantly enhanced. In addition, because the FIR downsampling filter is fixedly used, the encoder does not need to perform calculation and comparison among a plurality of downsampling filters, so that the encoding operation complexity can be reduced and the encoding speed increased.
In the second embodiment, the CNN upsampling filter can be trained in advance and preset in the encoder and the decoder. The encoder and decoder directly use a preset CNN up-sampling filter in the encoding and decoding process.
Specifically, the CNN upsampling filter parameter value may be a preset parameter value, and the preset parameter value may be obtained by performing offline training based on an image training set.
In the second embodiment, when performing offline training on the CNN upsampling filter, the preset FIR downsampling filter may first be used to downsample the original images $x_i$ in the training set; the downsampled image $y_i$ is then used as the input of the CNN upsampling filter, and the original image $x_i$ is used as the training target.
Next, the CNN upsampling network can be trained according to the general training method for CNNs, so as to obtain the optimal parameters of the upsampling filter. Specifically, the CNN upsampling network may be trained according to formula (10).
In formula (10), $x_i$ represents an original image in the training set, $y_i$ represents the low-resolution image obtained by downsampling the original image, $n$ represents the number of training samples, $\theta_g$ represents all parameters of the CNN upsampling network, and $g(y_i; \theta_g)$ represents the upsampling operation performed by the upsampling network on the input $y_i$.
Furthermore, in the above training process, instead of using the downsampled image $y_i$ directly as the network input, an encoding operation may first be performed on $y_i$ to obtain the downsampled image reconstruction $y'_i$, and $y'_i$ is then used as the network input for training. An upsampling network trained in this way can, to a certain extent, remove image artifacts introduced by the encoding operation while recovering the details of the original-resolution image, which further improves the quality of the original-resolution image block reconstruction output by the upsampling operation.
The third embodiment: the up-sampling filter is a preset FIR filter.
In the third embodiment, the filter parameters of the upsampling filter are fixed, but the downsampling filter is not: it can be determined from the CNN downsampling filter and the FIR downsampling filter according to the image content of the image block. Since the upsampling filter parameters are fixed, that is, known to both the encoding end and the decoding end, the encoding end does not need to write upsampling filter parameter information into the code stream, which saves bits.
The system architecture of the encoder in the third embodiment can be as shown in fig. 16. In the third embodiment, when the encoder encodes an image block, a preset FIR upsampling filter may be fixedly used for upsampling, and a filter may be flexibly selected from a preset CNN downsampling filter and a FIR downsampling filter as a downsampling filter. In the third embodiment, since the decoder only needs to perform the upsampling operation, the encoder can flexibly select any downsampling filter without passing the selection information of the downsampling filter to the decoder.
The encoding process of the encoder in the third embodiment specifically includes the following processes.
Firstly, determining coding costs of coding an image block to be coded in an original resolution coding mode and a variable resolution coding mode respectively.
Specifically, determining the coding costs of the image block to be coded in different coding modes respectively may include step 1, step 2 and step 3.
Step 1, determining the coding cost of an image block to be coded in an original resolution coding mode.
It should be understood that, when determining the coding cost of the original resolution coding in step 1, reference may be made to the specific process in step 702: the rate-distortion cost $cost_{org}$ of the current image block is determined as the coding cost of the current image block in the original resolution coding mode.
When determining the coding cost of the low resolution coding, it is necessary to determine both the coding cost when the CNN downsampling filter is used as the downsampling filter and the coding cost when the FIR downsampling filter is used as the downsampling filter, as shown in the following steps 2 and 3.
Step 2, calculating the coding cost of the image block to be coded when the FIR downsampling filter is used for downsampling.
The calculation of the coding cost of the image block to be coded when the downsampling is performed by using the FIR downsampling filter is specifically the following process:
(1) Performing downsampling operation on the current image block by using an FIR downsampling filter to obtain an image block with low resolution;
(2) Performing coding operation on the low-resolution image block to output a compressed code stream;
(3) Acquiring a low-resolution reconstruction image block;
(4) Performing FIR (finite impulse response) up-sampling filtering operation on the low-resolution image block reconstruction to obtain a reconstructed image block with the original resolution;
(5) Calculating the mean square error $D_{FIR}$ between the reconstructed image block and the current image block, and determining the size $R_{FIR}$ of the compressed code stream of the current image block;
(6) Calculating the rate-distortion cost $cost_{downFIR}$ according to $D_{FIR}$ and $R_{FIR}$, and determining $cost_{downFIR}$ as the coding cost of the current image block when the FIR filter is used as the target downsampling filter.
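Sub-steps (1) to (6) can be sketched as a single function. The sketch assumes the usual Lagrangian form of the rate-distortion cost, D + λR, which this passage does not spell out; the downsampling filter, the codec, and the upsampling filter are passed in as callables, and the codec is assumed to return the code stream size in bits together with the low-resolution reconstruction. All names are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally sized 2-D blocks."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def variable_resolution_cost(block, downsample, codec, upsample, lam):
    """Rate-distortion cost of coding `block` at low resolution.

    downsample: original-resolution block -> low-resolution block
    codec:      low-resolution block -> (bits, low-resolution reconstruction)
    upsample:   low-resolution reconstruction -> original-resolution block
    lam:        assumed Lagrange multiplier weighting rate against distortion
    """
    low = downsample(block)               # (1) downsample
    bits, low_recon = codec(low)          # (2), (3) encode and reconstruct
    recon = upsample(low_recon)           # (4) upsample the reconstruction
    distortion = mse(block, recon)        # (5) distortion D; `bits` is R
    return distortion + lam * bits        # (6) cost = D + lambda * R
```

Calling the same function with a CNN downsampling filter instead of the FIR one gives the other candidate cost, so the two variable-resolution options differ only in the `downsample` argument.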
Step 3, calculating the coding cost of the image block to be coded when the CNN downsampling filter is used for downsampling.
The calculation of the variable resolution coding cost when using the CNN downsampling filter is specifically the following process:
(1) Using a CNN downsampling filter to carry out downsampling operation on a current image block to obtain a low-resolution image block to be coded;
(2) Performing coding operation on the low-resolution image block to output a compressed code stream and a low-resolution reconstructed image block;
(3) Performing the FIR up-sampling filtering operation on the low-resolution image block reconstruction to obtain a reconstructed image block of the original resolution;
(4) Calculating the mean square error $D_{CNN}$ between the reconstructed image block and the current image block, and determining the size $R_{CNN}$ of the compressed code stream of the current image block;
(5) Calculating the rate-distortion cost $cost_{downCNN}$ according to $D_{CNN}$ and $R_{CNN}$, and determining $cost_{downCNN}$ as the coding cost of the current image block when the CNN filter is used as the target downsampling filter.
Secondly, selecting a mode with low coding cost as a preferred mode to perform coding operation on the current image block.
Specifically, in step 902, an encoding mode may be selected according to the size of the encoding cost calculated in step 901, and encoding information may be generated. The coding information may specifically include a variable resolution coding mode indication Flag1.
The setting of Flag1 according to the coding cost described above specifically includes the following three cases.
(1) When $cost_{org} < cost_{downFIR}$ and $cost_{org} < cost_{downCNN}$, it is determined that the current image block is coded in the original resolution mode, and Flag1 is set to 0;
(2) When $cost_{downFIR} < cost_{org}$ and $cost_{downFIR} < cost_{downCNN}$, it is determined that the current image block is coded in the variable resolution mode with the FIR downsampling filter used for the downsampling operation, and Flag1 is set to 1;
(3) When $cost_{downCNN} < cost_{org}$ and $cost_{downCNN} < cost_{downFIR}$, it is determined that the current image block is coded in the variable resolution mode with the CNN downsampling filter used for the downsampling operation, and Flag1 is set to 1.
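The three cases amount to picking the candidate with the smallest cost. A minimal sketch follows; the function name and the returned labels are hypothetical, and because the text only covers strict inequalities, ties are resolved here in favour of the earlier candidate (original resolution first, then FIR, then CNN), which is an assumption.

```python
def select_mode(cost_org, cost_down_fir, cost_down_cnn):
    """Return (Flag1, downsampling filter) for the cheapest coding option.

    Flag1 is 0 for the original resolution mode and 1 for the variable
    resolution mode; the filter is None when no downsampling is done.
    """
    candidates = [
        (cost_org, 0, None),
        (cost_down_fir, 1, "FIR"),
        (cost_down_cnn, 1, "CNN"),
    ]
    # min() keeps the first minimal entry, which fixes the tie-breaking order.
    _, flag1, down_filter = min(candidates, key=lambda c: c[0])
    return flag1, down_filter
```

Only Flag1 is written to the code stream; the choice between the FIR and CNN downsampling filters stays inside the encoder because the decoder always upsamples with the preset FIR filter.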
And finally, determining the coding information of the current image block, and writing the coding information into a code stream.
In step 903, the coding information may include, in addition to the variable resolution coding mode Flag1 determined in step 902, a prediction mode when the current image block is coded, a quantized coding coefficient, and the like.
The decoding process of the decoding end in the third embodiment is shown in fig. 17, the decoding process shown in fig. 17 corresponds to the encoding process shown in the above steps 901 to 903, and fig. 17 specifically includes the following steps.
1001. Acquire a compressed code stream.
1002. Parse the variable resolution coding mode indication Flag1 of the current image block.
The current image block may be the image block currently to be decoded, and the compressed code stream acquired in step 1001 may be the code stream of the current image block. After the code stream of the current image block is acquired, the variable resolution coding mode indication Flag1 may be parsed from it to obtain the value of Flag1.
1003. It is determined whether or not the variable resolution decoding mode is employed based on Flag1.
The Flag1 value may be used to indicate whether a variable resolution coding mode or an original resolution coding mode is used when generating the code stream of the current image block. Therefore, whether to decode the current image block in the variable resolution decoding mode can be determined according to the value of Flag1.
For example, when the value of Flag1 is 0, it indicates that the original resolution coding mode is adopted when the code stream of the current image block is generated; when the value of Flag1 is 1, it indicates that a variable resolution coding mode is adopted when the code stream of the current image block is generated.
Therefore, when the value of Flag1 is 0, it can be determined that the current image block is decoded by adopting the original resolution decoding mode; when the value of Flag1 is 1, it may be determined that the current image block is decoded by using the variable resolution decoding mode.
1004. Parse the coding information of the low-resolution coding block to obtain a low-resolution reconstructed image block.
Because the encoding end can select different encoding schemes to encode the image block, the decoding end must likewise select the decoding scheme corresponding to that encoding scheme when parsing the coding information to obtain the reconstructed image block.
For example, if an image block is encoded in a manner similar to JPEG, only the quantized transform coefficient information of the image block needs to be parsed from the code stream during decoding, and inverse quantization and inverse transform operations are performed on the coefficients, so that the original resolution image block reconstruction can be obtained.
In addition, if an intra-frame coding mode similar to H.264 or H.265 is used to encode an image block, the coding information of the current block, such as the coding mode, the prediction mode, and the quantized transform coefficients, needs to be decoded from the code stream during decoding. After inverse quantization and inverse transformation are performed on the quantized transform coefficients to obtain a reconstructed residual signal, a prediction operation is performed according to the prediction mode to obtain a prediction signal, and finally the reconstructed residual signal and the prediction signal are added to obtain the original resolution image block reconstruction.
Specifically, as an optional implementation manner, acquiring the low-resolution reconstructed block mainly includes the following processes:
firstly, entropy decoding a compressed code stream of a current image block to obtain a quantization coefficient;
secondly, inverse quantization and inverse transformation are carried out on the quantization coefficients to obtain a reconstructed residual signal of the current image block;
thirdly, acquiring a prediction signal (or called a prediction image block) of the current image block;
and finally, obtaining a low-resolution reconstructed image block of the current image block according to the residual signal and the prediction signal of the current image block.
1005. Upsample the low-resolution reconstructed block by using the FIR upsampling filter to obtain a reconstructed image block with the original resolution.
In step 1005, the original resolution refers to the resolution of the image block to be encoded corresponding to the reconstructed image block at the encoding end, and the reconstructed image block with the same resolution as the original image to be encoded can be obtained by performing upsampling.
1006. Parse the coding information of the original-resolution coding block to obtain a reconstructed image block with the original resolution.
The process of obtaining the reconstructed image block with the original resolution in step 1006 is the same as the specific process of obtaining the reconstructed image block with the low resolution in step 1004, except that the input signal in step 1004 is a code stream obtained in the low resolution coding mode, and the input signal in step 1006 is a code stream obtained in the original resolution coding mode.
1007. It is determined whether the current image block is the last image block.
Specifically, whether the current image block is the last image block to be decoded or not can be determined according to other encoding information carried in the code stream.
If the current image block is not the last image block to be decoded, continue to perform step 1002 to continue decoding the next image block; in the case that the current image block is the last image block to be decoded, step 1008 is performed and the decoding process of the current image ends.
1008. The decoding of the current picture is finished.
In the third embodiment, the CNN downsampling filter can be obtained by training in advance and is preset in the encoder. The encoder directly uses a preset CNN downsampling filter in the encoding process.
Specifically, the CNN downsampling filter parameter value may be a preset parameter value, and the preset parameter value may be obtained by performing offline training based on an image training set.
In the third embodiment, the offline training of the CNN downsampling filter may include two steps: initial parameter training and parameter fine-tuning.
Step 1, training the CNN downsampling filter to obtain initial parameters of the CNN downsampling network.
The original images in the training set are used as the input of the CNN downsampling filter, and the output of the CNN downsampling filter is used as the input of the FIR upsampling filter. The mean square error between the image output by the FIR upsampling filter and the original image is used as the loss function, and the filter parameters that minimize the loss function are taken as the initial parameters of the CNN downsampling network. Specifically, offline training may be performed according to formula (11) to obtain initial parameters of the CNN downsampling network.
In formula (11), $\theta_f$ represents all parameters of the CNN downsampling network, $f(x_i; \theta_f)$ represents the downsampling operation performed by the downsampling network on the input $x_i$, and $g_{FIR}$ represents the upsampling operation using the FIR filter. The parameters that minimize the loss are the initial parameters of the CNN downsampling network.
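Formula (11) is not reproduced in this text. Based on the description of step 1 (mean square error between the FIR-upsampled output of the CNN downsampling network and the original image), one consistent form would be the following, where $n$ is the number of training samples and $\hat{\theta}_f$ is assumed notation for the resulting initial parameters:

```latex
\hat{\theta}_f = \arg\min_{\theta_f} \frac{1}{n} \sum_{i=1}^{n}
    \left\| g_{FIR}\big(f(x_i; \theta_f)\big) - x_i \right\|^2
```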
Step 2, fine-tuning the initial parameters of the CNN downsampling network.
The original-resolution images in the training set are used as both the input and the training target to train the parameters of the upsampling network and the downsampling network. Specifically, the CNN upsampling network and the downsampling network may be trained simultaneously according to equation (12).
In equation (12), the loss function consists of two terms. The first term computes the error between the image output by the CNN upsampling network and the original image; the second term computes the error between the image output by the CNN downsampling network and the bicubic-downsampled image. Training the upsampling network and the downsampling network according to equation (12) yields the optimal parameters of the CNN downsampling network. The CNN downsampling network with these optimal parameters is the CNN downsampling filter used in the encoding process of this embodiment.
It should be understood that, if saving computation is a concern, step 2 above can be skipped and the initial parameters obtained in step 1 can be used directly to set the CNN downsampling filter.
In the first to third embodiments, the parameters of the CNN upsampling filter and the CNN downsampling filter are obtained by an offline training mode, and in order to obtain a better coding effect, the parameters of the CNN upsampling filter and the CNN downsampling filter may also be obtained by an online training mode.
The following describes the process of obtaining the parameters of the CNN filter by online training in detail with reference to the fourth to sixth embodiments. It should be understood that the parameters of the CNN upsampling filter and the CNN downsampling filter may be updated by using the online training method in the fourth to sixth embodiments, that is, the online training method in the fourth to sixth embodiments may be applied to the first to third embodiments described above to update the parameters of the CNN upsampling filter or the CNN downsampling filter.
The fourth embodiment: the CNN downsampling filter is trained online.
In the fourth embodiment, the encoder may train the parameters of the CNN downsampling filter online according to the image block to be encoded, and perform the downsampling operation on the image block to be encoded by using the trained CNN downsampling filter. Whenever the encoder uses a CNN downsampling filter to downsample the image block to be encoded, the scheme in this embodiment may be used to update the parameters of the CNN downsampling filter.
In the fourth embodiment, the online training procedure shown in fig. 5 may be employed to update the CNN downsampling network parameters. As shown in fig. 5, $I_O$ is the current image to be coded, and $I_L$ represents the low-resolution image obtained by downsampling the input image $I_O$ with the CNN downsampling filter. The simulator at the encoding end is a CNN that simulates the image encoding process and outputs the low-resolution reconstructed image $I'_L$ and the coding bit overhead. The CNN upsampling network is obtained through offline training and is preset in the encoder and the decoder for upsampling the reconstruction of the low-resolution image block. The CNN downsampling network structure may be as in fig. 11 above, and the CNN upsampling network structure may be as in fig. 13 above.
The CNN downsampling network is trained according to formula (13), so that the optimal parameters of the CNN downsampling network can be obtained.
In formula (13), f represents the mapping function of the downsampling operation, θ_f represents the downsampling network parameters, h represents the mapping function of the encoding simulator, whose output is the coded reconstructed image, g represents the mapping function of the upsampling network, which uses the preset upsampling filter parameters, r represents the coding bit overhead output by the encoding simulator, and λ is the weighting factor.
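Formula (13) itself is not reproduced in this text. A form consistent with the symbol definitions above might be written as follows. The squared-error distortion term, the notation \hat{\theta}_g for the preset upsampling filter parameters and \theta_f^{*} for the optimal downsampling parameters are assumptions made for illustration, not the original formula.

```latex
\theta_f^{*} \;=\; \arg\min_{\theta_f}\;
  \left\| \, g\!\left( h\!\left( f(I_O;\theta_f) \right);\, \hat{\theta}_g \right) - I_O \, \right\|_2^2
  \;+\; \lambda \, r
```

Under this reading, the first term measures the distortion between the upsampled reconstruction and the original image I_O, and the second term penalizes the bit overhead reported by the encoding simulator.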
In the fourth embodiment, training the parameters of the CNN downsampling filter online reduces the information loss caused by downsampling as far as possible and improves the quality of the reconstructed image. In addition, since the CNN downsampling filter is used only at the encoder, its parameters do not need to be transmitted to the decoder, so no additional coding overhead is incurred and the overall rate-distortion cost of the scheme remains low.
Embodiment five: online training of the CNN upsampling filter.
In the fifth embodiment, the encoder may train the parameters of the CNN upsampling filter online according to the image block to be encoded, and then upsample the low-resolution reconstructed image block with the trained CNN upsampling filter. It should be understood that, in the present application, whenever the encoder upsamples a low-resolution reconstructed image block with the CNN upsampling filter, the scheme of the fifth embodiment can be used to update the parameters of the CNN upsampling filter. Moreover, since the decoding end must use the same CNN upsampling filter to upsample the low-resolution reconstructed image block, the encoding end needs to write the updated parameters of the CNN upsampling filter into the code stream, so that the decoding end can also obtain them.
In the fifth embodiment, the online training procedure shown in fig. 5 can be used to update the CNN upsampling filter parameters. As shown in fig. 5, I_O is the current image to be coded, and I_L is the low-resolution image obtained by downsampling the input image I_O with the CNN downsampling filter. The encoding-end simulator is a CNN network that simulates the image encoding process and outputs the low-resolution reconstructed image I'_L and the coding bit overhead. The CNN downsampling network structure may be as shown in fig. 11 above, and the CNN upsampling network structure may be as shown in fig. 13 above. The CNN downsampling network is obtained through offline training, is preset in the encoder, and is used for downsampling the original image block to be encoded.
The optimal parameters of the CNN upsampling network can be obtained by training the CNN upsampling network according to formula (14).
In formula (14), g represents the mapping function of the upsampling filter, and θ_g represents the upsampling filter parameters.
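Formula (14) is likewise not reproduced in this text. One form consistent with the definitions above is sketched below. It assumes a squared-error distortion between the upsampled low-resolution reconstruction I'_L and the original image I_O, and it assumes that no rate term appears because the downsampling filter is fixed; both points are assumptions, as is the notation \theta_g^{*}.

```latex
\theta_g^{*} \;=\; \arg\min_{\theta_g}\;
  \left\| \, g\!\left( I'_L;\, \theta_g \right) - I_O \, \right\|_2^2
```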
In the fifth embodiment, training the parameters of the CNN upsampling filter online allows upsampling to use parameters matched to the texture features of the image, so that the quality of the upsampled image block can be further improved compared with using the preset CNN upsampling filter parameters. In addition, the encoding end may weigh the image quality gain from updating the upsampling filter parameters against the cost of transmitting them, for example by using rate-distortion optimization (RDO) to improve the overall rate-distortion performance of the coding scheme of this embodiment.
It should be understood that the method of training the CNN upsampling filter parameters online in the fifth embodiment may also be applied to the first embodiment. In that case, the CNN upsampling filter may obtain its parameters by online training in addition to using the preset filter parameters. Specifically, only a CNN upsampling filter obtained by online training may be provided. Alternatively, a coded picture or a coded block may be allowed to choose between the preset filter and the online-trained filter.
Because the decoding end must use the same CNN upsampling filter to upsample the low-resolution reconstructed image block, the encoding end needs to transmit the trained CNN upsampling filter parameters to the decoder. Specifically, the encoding end may write the filter parameters into a parameter set such as the SPS or PPS in the compressed video code stream, or into a slice header. The number of filter parameters is determined by the structure of the CNN upsampling filter and includes the weights and biases of all convolutional layers in the CNN upsampling filter. To save parameter transmission overhead, one of the existing lossless data compression methods may be selected to compress this group of parameters, for example differential pulse code modulation (DPCM) coding, Huffman coding, or arithmetic coding.
When the fifth embodiment is applied to the first embodiment, the decoding process of the decoder is substantially the same as that of the first embodiment, except that the decoder additionally needs to parse the code stream to obtain the parameters of the CNN upsampling filter. The decoding process is shown in fig. 18 and specifically includes:
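As an illustration of the DPCM option mentioned above, the following minimal sketch codes a list of filter parameters as a first value followed by successive differences and then recovers them exactly. It assumes the weights and biases have already been converted to integers (for example by fixed-point quantization); that conversion, the function names `dpcm_encode` and `dpcm_decode`, and the sample values are hypothetical and are not taken from the original text.

```python
from typing import List


def dpcm_encode(values: List[int]) -> List[int]:
    """Return the first value followed by differences between neighbours."""
    residuals = []
    previous = 0  # predictor starts at zero, so the first residual is the value itself
    for value in values:
        residuals.append(value - previous)
        previous = value
    return residuals


def dpcm_decode(residuals: List[int]) -> List[int]:
    """Invert dpcm_encode by accumulating the residuals."""
    values = []
    previous = 0
    for residual in residuals:
        previous += residual
        values.append(previous)
    return values


# Hypothetical integer-valued filter parameters (assumed already quantized).
params = [128, 131, 125, 125]
coded = dpcm_encode(params)
restored = dpcm_decode(coded)
```

Because neighbouring parameters tend to be close in value, the residuals are usually small, which is what makes a subsequent entropy coder such as Huffman or arithmetic coding more effective.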
1101. Acquire a compressed code stream.
1102. Parse the code stream to obtain the parameters of the CNN upsampling filter.
The CNN upsampling filter parameters may be obtained by the encoder through online training of the CNN upsampling filter.
1103. Parse the variable resolution coding mode indication Flag1 of the current image block.
The current image block may be the image block currently to be decoded, and the compressed code stream acquired in step 1101 may be the code stream of the current image block. After the code stream of the current image block is acquired, the variable resolution coding mode indication Flag1 may be parsed from it to obtain the value of Flag1.
1104. It is determined whether or not the variable resolution decoding mode is employed based on Flag1.
The value of Flag1 may be used to indicate whether a variable resolution coding mode or an original resolution coding mode is adopted when generating the code stream of the current image block. Therefore, whether to decode the current image block in the variable resolution decoding mode can be determined according to the value of Flag1.
For example, when the value of Flag1 is 0, it indicates that the original resolution coding mode is adopted when the code stream of the current image block is generated; when the value of Flag1 is 1, it indicates that a variable resolution coding mode is adopted when the code stream of the current image block is generated.
Therefore, when the value of Flag1 is 0, it can be determined that the current image block is decoded in the original resolution decoding mode; when the value of Flag1 is 1, it can be determined that the current image block is decoded in the variable resolution decoding mode.
1105. Parse the upsampling filter selection indication Flag2 of the current image block.
The value of Flag2 may be used to indicate whether to upsample using a FIR upsampling filter or a CNN upsampling filter as the upsampling filter in the encoding process.
1106. Parse the coding information of the low-resolution coding block to obtain a low-resolution reconstructed image block.
Because the encoding end can select different encoding schemes to encode an image block, the decoding scheme corresponding to the selected encoding scheme is used when parsing the coding information to obtain the reconstructed image block.
1107. Parse the coding information of the original-resolution coding block to obtain a reconstructed image block of the original resolution.
1108. Determine, according to Flag2, whether to use the CNN upsampling filter.
Specifically, whether the CNN upsampling filter or the FIR upsampling filter is used for upsampling may be determined according to the value of Flag2.
For example, when the value of Flag2 is 0, the FIR upsampling filter is used for upsampling, and when the value of Flag2 is 1, the CNN upsampling filter is used for upsampling. Therefore, it is determined that the CNN upsampling filter is used when the value of Flag2 is 1, and that the CNN upsampling filter is not used (the FIR upsampling filter is used instead) when the value of Flag2 is 0.
1109. Upsample the low-resolution reconstructed block with the CNN upsampling filter to obtain a reconstructed image block of the original resolution.
1110. Upsample the low-resolution reconstructed block with the FIR upsampling filter to obtain a reconstructed image block of the original resolution.
In step 1109 and step 1110, the original resolution refers to the resolution of the image block to be encoded that corresponds to the reconstructed image block at the encoding end; upsampling yields a reconstructed image block with the same resolution as the original image to be encoded.
1111. It is determined whether the current image block is the last image block.
Specifically, whether the current image block is the last image block to be decoded or not can be determined according to other encoding information carried in the code stream.
If the current image block is not the last image block to be decoded, continue to perform step 1103 to continue decoding the next image block; in case the current image block is the last image block to be decoded, step 1112 is executed and the decoding process of the current image ends.
1112. The decoding of the current picture is finished.
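The per-block branching of steps 1103 to 1111 can be sketched as follows. This is a simplified illustration rather than the decoder of the embodiment: it assumes that Flag1 = 1 selects the variable resolution mode and Flag2 = 1 selects the CNN upsampling filter, it treats the parsed reconstruction as an opaque value, and the names `BlockHeader`, `decode_block` and `decode_picture` as well as the upsampler callables are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class BlockHeader:
    flag1: int      # 0: original resolution mode, 1: variable resolution mode (assumed mapping)
    flag2: int = 0  # 0: FIR upsampling, 1: CNN upsampling; only used when flag1 == 1


def decode_block(header: BlockHeader, recon: Any,
                 cnn_upsample: Callable[[Any], Any],
                 fir_upsample: Callable[[Any], Any]) -> Any:
    """Return the original-resolution reconstruction of one block."""
    if header.flag1 == 0:
        # Step 1107: the block was coded at the original resolution.
        return recon
    if header.flag2 == 1:
        # Step 1109: upsample with the CNN filter whose parameters were parsed in step 1102.
        return cnn_upsample(recon)
    # Step 1110: upsample with the FIR filter.
    return fir_upsample(recon)


def decode_picture(blocks: List[Tuple[BlockHeader, Any]],
                   cnn_upsample: Callable[[Any], Any],
                   fir_upsample: Callable[[Any], Any]) -> List[Any]:
    """Loop over all blocks until the last one (steps 1103 to 1111)."""
    return [decode_block(h, r, cnn_upsample, fir_upsample) for h, r in blocks]


# Toy usage: strings stand in for image blocks, lambdas for the two filters.
result = decode_picture(
    [(BlockHeader(0), "a"), (BlockHeader(1, 1), "b"), (BlockHeader(1, 0), "c")],
    cnn_upsample=lambda block: "cnn:" + block,
    fir_upsample=lambda block: "fir:" + block,
)
```

Passing the two filters as callables keeps the control flow independent of how the filters are implemented, which mirrors the way the CNN filter parameters are supplied separately through the code stream.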
Embodiment six: joint online training of the CNN downsampling filter and the CNN upsampling filter.
Specifically, the online training process of the CNN network shown in fig. 5 may be used to update the parameters of the CNN upsampling network and the CNN downsampling network simultaneously. As shown in fig. 5, I_O is the current image to be coded, and I_L is the low-resolution image obtained by downsampling the input image I_O with the CNN downsampling filter. The encoding-end simulator is a CNN network that simulates the image encoding process and outputs the low-resolution reconstructed image I'_L and the coding bit overhead. The CNN downsampling network structure may be as shown in fig. 11 above, and the CNN upsampling network structure may be as shown in fig. 13 above.
To train the CNN downsampling filter and the CNN upsampling filter jointly online, the CNN upsampling network and the CNN downsampling network may be trained according to formula (15) to obtain the optimal parameters of the CNN upsampling network and the optimal parameters of the CNN downsampling network.
In formula (15), f represents the mapping function of the downsampling operation, g represents the mapping function of the upsampling network, h represents the mapping function of the encoding simulator, whose output is the coded reconstructed image, θ_f represents the downsampling network parameters, θ_g represents the upsampling filter parameters, r represents the coding bit overhead output by the encoding simulator, and λ is the weighting coefficient.
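Formula (15) is not reproduced in this text. A joint objective consistent with the definitions above might take the following form, in which both parameter sets are optimized together. The squared-error distortion term and the starred notation for the optimal parameters are assumptions made for illustration.

```latex
\left( \theta_f^{*},\, \theta_g^{*} \right) \;=\; \arg\min_{\theta_f,\,\theta_g}\;
  \left\| \, g\!\left( h\!\left( f(I_O;\theta_f) \right);\, \theta_g \right) - I_O \, \right\|_2^2
  \;+\; \lambda \, r
```

Compared with training either filter alone, the only change in this reading is that the minimization runs over θ_f and θ_g at the same time instead of holding one of them at its preset value.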
In the sixth embodiment, the parameters of the CNN upsampling filter and the CNN downsampling filter are obtained by performing joint training under an online condition, and compared with a mode of training the filter parameters alone, the parameter values of the CNN upsampling filter and the CNN downsampling filter matched with the image block to be encoded can be obtained more accurately, so that information loss caused by image textures in the upsampling and downsampling processes can be reduced during encoding, and the quality of an encoded image is improved.
The image encoding method and the image decoding method of the embodiments of the present application have been described in detail above with reference to fig. 2 to 18. The image encoding apparatus and the image decoding apparatus of the embodiments of the present application are described below with reference to fig. 19 to 21. It should be understood that the image encoding apparatuses shown in fig. 19 and 20 can perform the steps of the image encoding method of the embodiments of the present application, and the image decoding apparatus shown in fig. 21 can perform the steps of the image decoding method of the embodiments of the present application. For brevity, repeated description is omitted where appropriate below.
In addition, the image encoding apparatus and the image decoding apparatus shown in fig. 19 to 21 may be specifically an intelligent terminal, a PAD, a video player, an internet of things device, or other devices having video image encoding or decoding functions.
Fig. 19 is a schematic block diagram of an image encoding device according to an embodiment of the present application. The image encoding device 1200 shown in fig. 19 specifically includes:
a determining module 1201, configured to determine a target upsampling filter from a preset upsampling filter set according to a coding cost of an image block to be coded, where the upsampling filter set at least includes a finite impulse response FIR upsampling filter and a convolutional neural network CNN upsampling filter;
a processing module 1202, the processing module 1202 specifically configured to: generating up-sampling filter indication information corresponding to the target up-sampling filter; adopting a preset FIR down-sampling filter to perform down-sampling on the image block to be encoded to obtain a first image block; coding the first image block to obtain a code stream; and writing the indication information of the up-sampling filter into the code stream.
In the present application, one filter can be selected from multiple candidate filters as the upsampling filter according to the coding cost of the image block, so that a better coding effect can be obtained than when a fixed filter is used as the upsampling filter.
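The selection performed by the determining module can be sketched as a minimum-cost choice among the candidate filters. The sketch below assumes that the coding cost is a rate-distortion cost of the form D + λR and that the indication written to the code stream is 0 for FIR and 1 for CNN; the function names, the cost figures and the flag mapping are hypothetical.

```python
from typing import Dict

# Assumed mapping from filter name to the indication value written to the code stream.
FILTER_FLAG = {"FIR": 0, "CNN": 1}


def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Rate-distortion cost J = D + lambda * R (assumed form of the coding cost)."""
    return distortion + lam * bits


def select_upsampling_filter(costs: Dict[str, float]) -> str:
    """Return the name of the candidate filter with the smallest coding cost."""
    return min(costs, key=costs.get)


# Hypothetical measurements for one image block.
costs = {
    "FIR": rd_cost(distortion=10.0, bits=100, lam=0.1),
    "CNN": rd_cost(distortion=6.0, bits=120, lam=0.1),
}
target = select_upsampling_filter(costs)
indication = FILTER_FLAG[target]
```

In this toy case the CNN filter costs more bits but reduces distortion enough to give the lower total cost, so it is selected and signalled.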
Fig. 20 is a schematic block diagram of an image encoding device according to an embodiment of the present application. The image encoding device 1300 shown in fig. 20 specifically includes:
an obtaining module 1301, configured to obtain a code stream;
a processing module 1302, which is specifically configured to: entropy decoding, inverse quantization and inverse transformation are carried out on the code stream to obtain a reconstructed residual error signal of the image block to be decoded; acquiring a prediction signal of the image block to be decoded; adding the reconstruction residual signal and the prediction signal to obtain an initial reconstruction image block of the image block to be decoded; analyzing the code stream to obtain coding mode indication information; determining a target decoding mode from an original resolution decoding mode and a variable resolution decoding mode according to the encoding mode indication information; under the condition that the target decoding mode is a variable resolution decoding mode, analyzing the code stream of the image block to be coded to acquire indication information of an up-sampling filter; determining a target upsampling filter from a preset upsampling filter set according to the upsampling filter indication information, wherein the upsampling filter set at least comprises a Finite Impulse Response (FIR) upsampling filter and a Convolutional Neural Network (CNN) upsampling filter;
In this scheme, the target upsampling filter can be determined from the upsampling filter set according to the coding cost. Compared with directly upsampling with an upsampling filter whose parameter values are fixed, the coding cost of the image block to be coded is fully taken into account when the target upsampling filter is selected, so that a better coding effect can be obtained.
Fig. 21 is a schematic block diagram of an image decoding apparatus according to an embodiment of the present application. The image decoding apparatus 1400 shown in fig. 21 specifically includes:
an obtaining module 1401, configured to obtain a code stream;
a processing module 1402, the processing module specifically configured to: perform entropy decoding, inverse quantization and inverse transformation on the code stream to obtain a reconstructed residual signal of the image block to be decoded; acquire a prediction signal of the image block to be decoded; add the reconstructed residual signal and the prediction signal to obtain an initial reconstructed image block of the image block to be decoded; parse the code stream to obtain coding mode indication information; determine a target decoding mode from an original resolution decoding mode and a variable resolution decoding mode according to the coding mode indication information; in the case that the target decoding mode is the variable resolution decoding mode, parse the code stream of the image block to be decoded to obtain upsampling filter indication information; determine a target upsampling filter from a preset upsampling filter set according to the upsampling filter indication information, wherein the upsampling filter set at least comprises a finite impulse response (FIR) upsampling filter and a convolutional neural network (CNN) upsampling filter; and upsample the initial reconstructed image block with the target upsampling filter to obtain a target reconstructed image block.
In the application, when an image block is decoded in a variable resolution decoding mode, a decoding end can select a corresponding up-sampling filter from a preset up-sampling filter set as a target up-sampling filter according to up-sampling filter indication information, and then performs up-sampling operation.
The specific implementation form of the image encoding apparatus 1200, the image encoding apparatus 1300, and the image decoding apparatus 1400 may be any one of the following devices: a desktop computer, a mobile computing device, a notebook (e.g., laptop) computer, a tablet computer, a set-top box, a smartphone, a handset, a television, a camera, a display device, a digital media player, a video game console, an on-board computer, or other similar apparatus.
Fig. 22 is a schematic block diagram of an encoder of an embodiment of the present application. The encoder 2000 shown in fig. 22 includes: a coding-end prediction module 2001, a transform quantization module 2002, an entropy coding module 2003, a coding reconstruction module 2004 and a coding-end filtering module.
The specific configurations of the image encoding apparatus 1200 and the image encoding apparatus 1300 described above may be as described above with reference to the encoder 2000, and the encoder 2000 is capable of executing the steps of the image encoding method according to the embodiment of the present application.
Fig. 23 is a schematic block diagram of a decoder of an embodiment of the present application. The decoder 3000 shown in fig. 23 includes: an entropy decoding module 3001, an inverse transform inverse quantization module 3002, a decoding side prediction module 3003, a decoding reconstruction module 3004 and a decoding side filtering module 3005.
The specific structure of the image decoding apparatus 1400 can be as shown in the above decoder 3000, and the decoder 3000 can execute the steps of the image decoding method according to the embodiment of the present application.
Fig. 24 is a schematic block diagram of a codec device according to an embodiment of the present application. The codec device 50 may be a device dedicated to encoding and/or decoding video images, or may be an electronic device having a video codec function, and further, the codec device 50 may be a mobile terminal or a user equipment of a wireless communication system.
It should be understood that the codec device shown in fig. 24 can be regarded as a specific structure of the image encoding apparatus 1200, the image encoding apparatus 1300, and the image decoding apparatus 1400 described above. The codec device shown in fig. 24 can execute the steps of the image encoding and decoding methods of the embodiments of the present application.
The codec device 50 may include the following modules or units: a controller 56, a codec 54, a radio interface 52, an antenna 44, a smart card 46, a card reader 48, a memory 58, an infrared port 42, and a display 32. In addition to the modules and units shown in fig. 24, the codec device 50 may also include a microphone or any suitable audio input module, which may accept a digital or analog signal input, and an audio output module, which may be a headphone, a speaker, or an analog or digital audio output connection. The codec device 50 may also include a battery, which may be a solar cell, a fuel cell, or the like. The codec device 50 may also include an infrared port for short-range line-of-sight communication with other devices, and may also communicate with other devices using any other suitable short-range communication means, for example a Bluetooth wireless connection or a USB/FireWire wired connection.
The memory 58 may store data in the form of images and audio data, as well as instructions for execution on the controller 56.
Codec 54 may enable encoding and decoding of audio and/or video data or auxiliary encoding and auxiliary decoding of audio and/or video data under the control of controller 56.
The smart card 46 and the card reader 48 may provide user information, as well as authentication information for authenticating and authorizing the user on a network. The smart card 46 and the card reader 48 may be implemented as a Universal Integrated Circuit Card (UICC) and a UICC reader.
The radio interface 52 may generate a wireless communication signal, for example a signal for communicating with a cellular communication network, a wireless communication system, or a wireless local area network.
The antenna 44 is used to transmit radio frequency signals generated at the radio interface circuit 52 to other devices (the number of devices may be one or more), and may also be used to receive radio frequency signals from other devices (the number of devices may be one or more).
In some embodiments of the present application, the codec device 50 may receive the video image data to be processed from another device prior to transmission and/or storage. In other embodiments of the present application, the codec device 50 may receive images via a wireless or wired connection and encode/decode the received images.
It should be understood that the processing object of the image encoding and decoding methods of the embodiments of the present application may be a video image, that is, the image encoding and decoding methods of the present application may encode and decode video images. Therefore, a video codec system can also perform the image encoding and decoding methods of the embodiments of the present application. The video codec system of the embodiments of the present application is described in detail below with reference to fig. 25.
Fig. 25 is a schematic block diagram of a video codec system according to an embodiment of the present application.
As shown in fig. 25, the video codec system 7000 includes a source device 4000 and a destination device 5000. The source device 4000 generates encoded video data, the source device 4000 may also be referred to as a video encoding device or a video encoding apparatus, the destination device 5000 may decode the encoded video data generated by the source device 4000, and the destination device 5000 may also be referred to as a video decoding device or a video decoding apparatus.
The source device 4000 corresponds to the image encoding device 1200 and the image encoding device 1300 in the above, and the destination device 5000 corresponds to the image decoding device 1400 in the above.
The source device 4000 and the destination device 5000 may be implemented in any one of the following devices: a desktop computer, a mobile computing device, a notebook (e.g., laptop) computer, a tablet computer, a set-top box, a smartphone, a handset, a television, a camera, a display device, a digital media player, a video game console, an on-board computer, or other similar apparatus.
Destination device 5000 may receive encoded video data from source device 4000 via channel 6000. Channel 6000 may include one or more media and/or devices capable of moving the encoded video data from source device 4000 to destination device 5000. In one example, channel 6000 may include one or more communication media that enable source device 4000 to transmit encoded video data directly to destination device 5000 in real-time, in which example source device 4000 may modulate the encoded video data according to a communication standard (e.g., a wireless communication protocol) and may transmit the modulated video data to destination device 5000. The one or more communication media may comprise wireless and/or wired communication media such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media described above may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the internet). The one or more communication media may comprise a router, switch, base station, or other device that enables communication from source device 4000 to destination device 5000.
In another example, channel 6000 may include a storage medium that stores encoded video data generated by source device 4000. In this example, destination device 5000 can access the storage medium via disk access or card access. The storage medium may comprise a variety of locally accessed data storage media such as Blu-ray discs, Digital Video Discs (DVDs), Compact Disc Read-Only Memories (CD-ROMs), flash memory, or other suitable digital storage media for storing encoded video data.
In another example, channel 6000 may include a file server or another intermediate storage device that stores the encoded video data generated by source device 4000. In this example, destination device 5000 may access the encoded video data stored at the file server or other intermediate storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to the destination device 5000. For example, a file server may be a World Wide Web server (e.g., for a website), a File Transfer Protocol (FTP) server, a Network Attached Storage (NAS) device, or a local disk drive.
Destination device 5000 may access the encoded video data via a standard data connection, such as an internet connection. Example types of data connections include a wireless channel, a wired connection (e.g., cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the file server may be a streaming transmission, a download transmission, or a combination of both.
The image coding and decoding method in the embodiment of the present application is not limited to a wireless application scenario, and for example, the image coding and decoding method in the embodiment of the present application may be applied to video coding and decoding supporting various multimedia applications such as the following applications: over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the internet), encoding of video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, video codec system 7000 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In fig. 25, a source device 4000 includes a video source 4001, a video encoder 4002, and an output interface 4003. In some examples, output interface 4003 can include a modulator/demodulator (modem) and/or a transmitter. Video source 4001 may comprise a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video input interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of the aforementioned video data sources.
Video encoder 4002 may encode video data from video source 4001. In some examples, source device 4000 transmits the encoded video data directly to destination device 5000 via output interface 4003. The encoded video data may also be stored on a storage medium or file server for later access by destination device 5000 for decoding and/or playback.
In the example of fig. 25, destination device 5000 includes an input interface 5003, a video decoder 5002, and a display device 5001. In some examples, input interface 5003 includes a receiver and/or a modem. The input interface 5003 may receive encoded video data via a channel 6000. The display device 5001 may be integrated with the destination device 5000 or may be external to the destination device 5000. In general, the display device 5001 displays decoded video data. The display device 5001 may include a variety of display devices such as a liquid crystal display, a plasma display, an organic light emitting diode display, or other types of display devices.
The video encoder 4002 and the video decoder 5002 may operate according to a video compression standard (e.g., the High Efficiency Video Coding (HEVC) H.265 standard) and may comply with the HEVC test model (HM). The text of the H.265 standard, ITU-T H.265 (V3) (04/2015), published on April 29, 2015, can be downloaded from http://handle.itu.int/11.1002/7000/12455, and its entire contents are incorporated herein by reference.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only a logical functional division, and other divisions are possible in practice. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (42)
1. An image encoding method, comprising:
determining a target up-sampling filter from a preset up-sampling filter set according to the coding cost of an image block to be coded, wherein the up-sampling filter set at least comprises a Finite Impulse Response (FIR) up-sampling filter and a Convolutional Neural Network (CNN) up-sampling filter;
generating up-sampling filter indication information corresponding to the target up-sampling filter;
adopting a preset FIR down-sampling filter to perform down-sampling on the image block to be encoded to obtain a first image block;
coding the first image block to obtain a code stream;
and writing the indication information of the up-sampling filter into the code stream.
2. The method as claimed in claim 1, wherein the determining a target upsampling filter from a preset upsampling filter set according to the coding cost of the image block to be coded comprises:
determining the coding cost of the image block to be coded when each upsampling filter in the upsampling filter set is used as the target upsampling filter;
determining a first upsampling filter in the upsampling filter set as the target upsampling filter, wherein in the upsampling filter set, the coding cost of the image block to be coded is the smallest when the first upsampling filter is used as the target upsampling filter.
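The selection in claims 1 and 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the claims only speak of a "coding cost", so the rate-distortion form J = D + λR, the 1-D toy block, the values of `lam` and `rate_bits`, and all function names are assumptions. The "CNN" entry is a placeholder callable rather than a trained network, and the actual encoding of the down-sampled block is omitted.

```python
from typing import Callable, Dict, List, Tuple

Block = List[float]  # toy 1-D stand-in for an image block


def fir_downsample(block: Block) -> Block:
    """2:1 down-sampling with a 2-tap averaging FIR filter."""
    return [(block[i] + block[i + 1]) / 2 for i in range(0, len(block) - 1, 2)]


def fir_upsample(block: Block) -> Block:
    """1:2 up-sampling by linear interpolation (a simple FIR interpolator)."""
    out: Block = []
    for i, value in enumerate(block):
        nxt = block[i + 1] if i + 1 < len(block) else value
        out += [value, (value + nxt) / 2]
    return out


def cnn_upsample_stand_in(block: Block) -> Block:
    """Placeholder for a trained CNN up-sampler; it only repeats samples."""
    out: Block = []
    for value in block:
        out += [value, value]
    return out


def coding_cost(block: Block, upsample: Callable[[Block], Block],
                rate_bits: float = 1.0, lam: float = 0.1) -> float:
    """Assumed cost J = D + lam * R, with D the squared reconstruction error."""
    reconstructed = upsample(fir_downsample(block))
    distortion = sum((a - b) ** 2 for a, b in zip(block, reconstructed))
    return distortion + lam * rate_bits


def select_target_filter(
    block: Block, candidates: Dict[str, Callable[[Block], Block]]
) -> Tuple[str, Dict[str, float]]:
    """Evaluate every candidate and return the one with the smallest cost."""
    costs = {name: coding_cost(block, f) for name, f in candidates.items()}
    return min(costs, key=costs.get), costs


candidates = {"FIR": fir_upsample, "CNN": cnn_upsample_stand_in}
best, costs = select_target_filter([0, 0, 4, 4, 0, 0, 4, 4], candidates)
print(best, costs)
```

On this piecewise-constant block the sample-repeating stand-in reconstructs the input exactly, so it has the lower cost; on other content the interpolating filter may win, which is the point of deciding per block.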
3. The method of claim 1 or 2, wherein the parameter values of the CNN upsampling filter are preset, and the parameter values of the CNN upsampling filter are obtained by performing offline training on a preset image training set.
4. The method according to any of claims 1-3, wherein before determining the target upsampling filter from the set of pre-set upsampling filters according to the coding cost of the image block to be coded, the method further comprises:
and performing online training on the CNN up-sampling filter according to the image block to be encoded to obtain an update parameter value of the CNN up-sampling filter, wherein the update parameter value of the CNN up-sampling filter is used for replacing a preset parameter value of the CNN up-sampling filter.
5. The method of claim 4, wherein the method further comprises:
and writing the update parameter value of the CNN up-sampling filter into the code stream.
6. The method according to any of claims 1-5, wherein before determining the target upsampling filter from the set of pre-set upsampling filters according to the coding cost of the image block to be coded, the method further comprises:
determining a target coding mode from an original resolution coding mode and a variable resolution coding mode according to the coding cost of the image block to be coded;
generating encoding mode indication information corresponding to the target encoding mode;
and writing the coding mode indication information into a code stream.
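The signalling described in claims 1 and 6 might look like the sketch below. The one-bit binarization of both syntax elements, the `BitWriter` class, and the rule that the filter index is written only in the variable resolution mode are assumptions made for illustration; the claims do not fix a bitstream syntax.

```python
from typing import List


class BitWriter:
    """Hypothetical bit sink; a real encoder would entropy-code these bits."""

    def __init__(self) -> None:
        self.bits: List[int] = []

    def write_flag(self, value: int) -> None:
        self.bits.append(1 if value else 0)


def signal_block(writer: BitWriter, cost_original: float,
                 cost_variable: float, filter_index: int) -> str:
    """Pick the cheaper coding mode and write the indication information.

    filter_index: 0 = FIR up-sampling filter, 1 = CNN up-sampling filter
    (an assumed mapping for a two-filter set).
    """
    use_variable = cost_variable < cost_original
    writer.write_flag(use_variable)      # coding mode indication information
    if use_variable:
        writer.write_flag(filter_index)  # up-sampling filter indication
    return "variable" if use_variable else "original"


writer = BitWriter()
mode = signal_block(writer, cost_original=10.0, cost_variable=8.0, filter_index=1)
print(mode, writer.bits)
```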
7. An image encoding method, comprising:
determining a target up-sampling filter from a preset up-sampling filter set according to the coding cost of an image block to be coded, wherein the up-sampling filter set at least comprises a Finite Impulse Response (FIR) up-sampling filter and a Convolutional Neural Network (CNN) up-sampling filter;
generating up-sampling filter indication information corresponding to the target up-sampling filter;
determining a down-sampling filter with the same type as the target up-sampling filter in a preset down-sampling filter set as a target down-sampling filter, wherein the down-sampling filter set at least comprises a Finite Impulse Response (FIR) down-sampling filter and a Convolutional Neural Network (CNN) down-sampling filter;
adopting the target downsampling filter to downsample the image block to be encoded to obtain a first image block;
coding the first image block to obtain a code stream;
and writing the indication information of the up-sampling filter into the code stream.
8. The method as claimed in claim 7, wherein the determining a target upsampling filter from a preset upsampling filter set according to the coding cost of the image block to be coded comprises:
determining the coding cost of the image block to be coded when each upsampling filter in the upsampling filter set is used as the target upsampling filter;
determining a first upsampling filter in the upsampling filter set as the target upsampling filter, wherein in the upsampling filter set, the coding cost of the image block to be coded is the smallest when the first upsampling filter in the upsampling filter set is used as the target upsampling filter.
9. The method of claim 7 or 8, wherein the parameter values of the CNN upsampling filter are preset, and the parameter values of the CNN upsampling filter are obtained by performing offline training on a preset image training set.
10. The method according to any one of claims 7 to 9, wherein the parameter values of the CNN downsampling filter are preset, and the parameter values of the CNN downsampling filter are obtained by performing offline training on a preset image training set.
11. The method of claim 7 or 8, wherein the parameter values of the CNN upsampling filter and the CNN downsampling filter are preset, and are obtained by joint offline training on a preset image training set.
12. The method of any one of claims 7-11, wherein prior to downsampling the image block to be encoded with the target downsampling filter, the method further comprises:
and performing on-line training on the CNN down-sampling filter according to the image block to be encoded to obtain an update parameter value of the CNN down-sampling filter, wherein the update parameter value of the CNN down-sampling filter is used for replacing a preset parameter value of the CNN down-sampling filter.
13. The method according to any of claims 7-12, wherein before determining the target upsampling filter from the set of pre-set upsampling filters according to the coding cost of the image block to be coded, the method further comprises:
and performing online training on the CNN up-sampling filter according to the image block to be encoded to obtain an updated parameter value of the CNN up-sampling filter, wherein the updated parameter value of the CNN up-sampling filter is used for replacing a preset parameter value of the CNN up-sampling filter.
14. The method of any one of claims 7-11, wherein prior to downsampling the image block to be encoded with the target downsampling filter, the method further comprises:
performing joint online training on the CNN down-sampling filter and the CNN up-sampling filter according to the image block to be encoded to obtain an update parameter value of the CNN down-sampling filter and an update parameter value of the CNN up-sampling filter;
the updated parameter value of the CNN downsampling filter is used for replacing the parameter value preset by the CNN downsampling filter, and the updated parameter value of the CNN upsampling filter is used for replacing the parameter value preset by the CNN upsampling filter.
15. The method of claim 13 or 14, wherein the method further comprises:
and writing the update parameter value of the CNN up-sampling filter into the code stream.
16. The method according to any of claims 7-15, wherein before determining the target upsampling filter from the set of pre-set upsampling filters according to the coding cost of the image block to be coded, the method further comprises:
determining a target coding mode from an original resolution coding mode and a variable resolution coding mode according to the coding cost of the image block to be coded;
generating encoding mode indication information corresponding to the target encoding mode;
and writing the coding mode indication information into a code stream.
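Claim 7 differs from claim 1 in that the down-sampling filter is chosen to have the same type as the target up-sampling filter. A minimal sketch of that pairing is given below; the type labels, the toy filters and the function names are hypothetical, and the "CNN" down-sampler is a placeholder rather than a trained network.

```python
from typing import Callable, Dict, List

Block = List[float]  # toy 1-D stand-in for an image block


def fir_downsample(block: Block) -> Block:
    """2:1 down-sampling with a 2-tap averaging FIR filter."""
    return [(block[i] + block[i + 1]) / 2 for i in range(0, len(block) - 1, 2)]


def cnn_downsample_stand_in(block: Block) -> Block:
    """Placeholder for a trained CNN down-sampler; it keeps every other sample."""
    return block[::2]


DOWN_FILTERS: Dict[str, Callable[[Block], Block]] = {
    "FIR": fir_downsample,
    "CNN": cnn_downsample_stand_in,
}


def downsample_with_matching_filter(block: Block, target_up_type: str) -> Block:
    """Down-sample with the filter whose type matches the target up-sampling filter."""
    if target_up_type not in DOWN_FILTERS:
        raise ValueError(f"no down-sampling filter of type {target_up_type!r}")
    return DOWN_FILTERS[target_up_type](block)


first_block = downsample_with_matching_filter([1, 3, 5, 7], "FIR")
print(first_block)
```

Keying both filter sets by the same type label keeps the encoder from pairing, say, a CNN down-sampler with an FIR up-sampler, which matters when the two CNN filters were trained jointly as in claim 11.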
17. An image decoding method, comprising:
acquiring a code stream;
performing entropy decoding, inverse quantization and inverse transformation on the code stream to obtain a reconstructed residual signal of an image block to be decoded;
acquiring a prediction signal of the image block to be decoded;
adding the reconstructed residual signal and the prediction signal to obtain an initial reconstructed image block of the image block to be decoded;
parsing the code stream to obtain coding mode indication information;
determining a target decoding mode from an original resolution decoding mode and a variable resolution decoding mode according to the coding mode indication information;
under the condition that the target decoding mode is the variable resolution decoding mode, parsing the code stream of the image block to be decoded to obtain up-sampling filter indication information;
determining a target upsampling filter from a preset upsampling filter set according to the upsampling filter indication information, wherein the upsampling filter set at least comprises a Finite Impulse Response (FIR) upsampling filter and a Convolutional Neural Network (CNN) upsampling filter;
and upsampling the initial reconstructed image block by adopting the target upsampling filter to obtain a target reconstructed image block.
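The decoding steps of claim 17 can be sketched as follows, under several stated assumptions: the residual is taken as already entropy-decoded, inverse-quantized and inverse-transformed; the mode flag and the filter index are each one bit; and `BitReader`, the toy 1-D blocks and the filter functions are hypothetical names. The "CNN" entry is again a placeholder callable.

```python
from typing import Callable, List, Sequence

Block = List[float]  # toy 1-D stand-in for an image block


class BitReader:
    """Hypothetical reader over already entropy-decoded flag bits."""

    def __init__(self, bits: Sequence[int]) -> None:
        self.bits = list(bits)
        self.pos = 0

    def read_flag(self) -> int:
        value = self.bits[self.pos]
        self.pos += 1
        return value


def fir_upsample(block: Block) -> Block:
    """1:2 up-sampling by linear interpolation."""
    out: Block = []
    for i, value in enumerate(block):
        nxt = block[i + 1] if i + 1 < len(block) else value
        out += [value, (value + nxt) / 2]
    return out


def cnn_upsample_stand_in(block: Block) -> Block:
    """Placeholder for a trained CNN up-sampler; it only repeats samples."""
    out: Block = []
    for value in block:
        out += [value, value]
    return out


UP_FILTERS: List[Callable[[Block], Block]] = [fir_upsample, cnn_upsample_stand_in]


def decode_block(residual: Block, prediction: Block, reader: BitReader) -> Block:
    # Initial reconstruction = reconstructed residual + prediction.
    initial = [r + p for r, p in zip(residual, prediction)]
    # Coding mode indication: 0 = original resolution, 1 = variable resolution.
    if reader.read_flag() == 0:
        return initial
    # The filter indication is parsed only in the variable resolution mode.
    index = reader.read_flag()
    return UP_FILTERS[index](initial)


print(decode_block([1, 1], [1, 3], BitReader([1, 0])))
```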
18. The method of claim 17, wherein the parameter values of the CNN upsampling filter are preset, and the parameter values of the CNN upsampling filter are obtained by performing offline training on a preset image training set.
19. The method of claim 17 or 18, wherein the method further comprises:
parsing the code stream to obtain an update parameter value of the CNN up-sampling filter, wherein the update parameter value of the CNN up-sampling filter is used for replacing a preset parameter value of the CNN up-sampling filter.
20. The method of claim 19, wherein prior to parsing the code stream to obtain updated parameter values for the CNN upsampling filter, the method further comprises:
parsing the code stream to obtain filter parameter update indication information, wherein the filter parameter update indication information is used for indicating whether to update the parameter value of the target up-sampling filter;
wherein the parsing the code stream to obtain an update parameter value of the CNN upsampling filter comprises:
parsing the code stream to obtain the update parameter value of the CNN up-sampling filter under the condition that the filter parameter update indication information indicates that the parameter value of the target up-sampling filter is to be updated.
21. The method according to claim 19 or 20, wherein the updated parameter values of the CNN upsampling filter are obtained by performing online training on a CNN upsampling network according to an image block to be encoded, wherein the image block to be decoded is obtained by encoding the image block to be encoded.
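The conditional parsing of claims 19 and 20 can be illustrated with the sketch below. It assumes the code stream has already been turned into a sequence of decoded syntax elements, that the update indication is a single flag, and that an update carries exactly as many values as there are preset parameters; none of these details are fixed by the claims, and the function name is hypothetical.

```python
from typing import Iterator, List, Sequence


def parse_cnn_parameters(stream: Iterator[float],
                         preset: Sequence[float]) -> List[float]:
    """Return the CNN up-sampling filter parameters the decoder should use.

    stream: iterator over already-decoded syntax elements (an assumption).
    preset: parameter values obtained by offline training.
    """
    update_flag = next(stream)  # filter parameter update indication information
    if not update_flag:
        return list(preset)     # keep the preset parameter values
    # Update parameter values replace the presets one for one.
    return [next(stream) for _ in preset]


preset = [0.5, 0.5]
print(parse_cnn_parameters(iter([0]), preset))
print(parse_cnn_parameters(iter([1, 0.25, 0.75]), preset))
```

Reading the flag first lets the decoder skip the parameter payload entirely when the encoder did not retrain the filter for the block.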
22. An image encoding device, comprising:
a determining module, configured to determine a target up-sampling filter from a preset up-sampling filter set according to the coding cost of an image block to be coded, wherein the up-sampling filter set at least comprises a Finite Impulse Response (FIR) up-sampling filter and a Convolutional Neural Network (CNN) up-sampling filter;
a processing module, the processing module specifically configured to:
generating up-sampling filter indication information corresponding to the target up-sampling filter;
adopting a preset FIR down-sampling filter to perform down-sampling on the image block to be encoded to obtain a first image block;
coding the first image block to obtain a code stream;
and writing the indication information of the up-sampling filter into the code stream.
23. The apparatus of claim 22, wherein the determination module is specifically configured to:
determining the coding cost of the image block to be coded when each upsampling filter in the upsampling filter set is used as the target upsampling filter;
determining a first upsampling filter in the upsampling filter set as the target upsampling filter, wherein in the upsampling filter set, the coding cost of the image block to be coded is the smallest when the first upsampling filter is used as the target upsampling filter.
24. The apparatus of claim 22 or 23, wherein the parameter values of the CNN upsampling filter are preset, and the parameter values of the CNN upsampling filter are obtained by performing offline training on a preset image training set.
25. The apparatus according to any of claims 22-24, wherein before the determining module determines the target upsampling filter from a preset set of upsampling filters according to the coding cost of the image block to be coded, the determining module is further configured to:
and performing online training on the CNN up-sampling filter according to the image block to be encoded to obtain an update parameter value of the CNN up-sampling filter, wherein the update parameter value of the CNN up-sampling filter is used for replacing a preset parameter value of the CNN up-sampling filter.
26. The apparatus of claim 25, wherein the processing module is further configured to:
and writing the update parameter value of the CNN up-sampling filter into the code stream.
27. The apparatus according to any of claims 22-26, wherein before the determining module determines the target upsampling filter from a preset set of upsampling filters according to the coding cost of the image block to be coded, the determining module is further configured to:
determining a target coding mode from an original resolution coding mode and a variable resolution coding mode according to the coding cost of the image block to be coded;
generating encoding mode indication information corresponding to the target encoding mode;
and writing the coding mode indication information into a code stream.
28. An image encoding device, comprising:
a determining module, configured to determine a target up-sampling filter from a preset up-sampling filter set according to the coding cost of an image block to be coded, wherein the up-sampling filter set at least comprises a Finite Impulse Response (FIR) up-sampling filter and a Convolutional Neural Network (CNN) up-sampling filter;
a processing module, the processing module specifically configured to:
generating up-sampling filter indication information corresponding to the target up-sampling filter;
determining a down-sampling filter with the same type as the target up-sampling filter in a preset down-sampling filter set as a target down-sampling filter, wherein the down-sampling filter set at least comprises a Finite Impulse Response (FIR) down-sampling filter and a Convolutional Neural Network (CNN) down-sampling filter;
adopting the target downsampling filter to downsample the image block to be encoded to obtain a first image block;
coding the first image block to obtain a code stream;
and writing the indication information of the up-sampling filter into the code stream.
29. The apparatus of claim 28, wherein the determination module is specifically configured to:
determining the coding cost of the image block to be coded when each upsampling filter in the upsampling filter set is used as the target upsampling filter;
determining a first upsampling filter in the upsampling filter set as the target upsampling filter, wherein in the upsampling filter set, the coding cost of the image block to be coded is the smallest when the first upsampling filter in the upsampling filter set is used as the target upsampling filter.
30. The apparatus according to claim 28 or 29, wherein the parameter values of the CNN upsampling filter are preset, and the parameter values of the CNN upsampling filter are obtained by performing offline training on a preset image training set.
31. The apparatus according to any of claims 28-30, wherein the CNN downsampling filter has preset parameter values, and the parameter values of the CNN downsampling filter are obtained by performing offline training on a preset image training set.
32. The apparatus according to claim 28 or 29, wherein the parameter values of the CNN upsampling filter and the CNN downsampling filter are preset, and are obtained by joint offline training on a preset image training set.
33. The apparatus as claimed in any one of claims 28-32, wherein before said processing module downsamples said image block to be encoded using said target downsampling filter, said processing module is further configured to:
and performing online training on the CNN downsampling filter according to the image block to be encoded to obtain an updated parameter value of the CNN downsampling filter, wherein the updated parameter value of the CNN downsampling filter is used for replacing a preset parameter value of the CNN downsampling filter.
34. The apparatus according to any of claims 28-33, wherein before the determining module determines the target upsampling filter from a preset set of upsampling filters according to the coding cost of the image block to be coded, the determining module is further configured to:
and performing online training on the CNN up-sampling filter according to the image block to be encoded to obtain an updated parameter value of the CNN up-sampling filter, wherein the updated parameter value of the CNN up-sampling filter is used for replacing a preset parameter value of the CNN up-sampling filter.
35. The apparatus as claimed in any one of claims 28-32, wherein before said processing module downsamples said image block to be encoded using said target downsampling filter, said processing module is further configured to:
performing joint online training on the CNN down-sampling filter and the CNN up-sampling filter according to the image block to be encoded to obtain an update parameter value of the CNN down-sampling filter and an update parameter value of the CNN up-sampling filter;
the update parameter value of the CNN down-sampling filter is used for replacing a preset parameter value of the CNN down-sampling filter, and the update parameter value of the CNN up-sampling filter is used for replacing a preset parameter value of the CNN up-sampling filter.
36. The apparatus of claim 34 or 35, wherein the processing module is further configured to:
and writing the update parameter value of the CNN up-sampling filter into the code stream.
37. The apparatus according to any of claims 28-36, wherein before the determining module determines the target upsampling filter from a preset set of upsampling filters according to the coding cost of the image block to be encoded, the determining module is further configured to:
determining a target coding mode from an original resolution coding mode and a variable resolution coding mode according to the coding cost of the image block to be coded;
generating encoding mode indication information corresponding to the target encoding mode;
and writing the coding mode indication information into a code stream.
38. An image decoding apparatus, comprising:
an acquisition module, configured to acquire a code stream;
a processing module, the processing module specifically configured to:
performing entropy decoding, inverse quantization and inverse transformation on the code stream to obtain a reconstructed residual signal of an image block to be decoded;
acquiring a prediction signal of the image block to be decoded;
adding the reconstructed residual signal and the prediction signal to obtain an initial reconstructed image block of the image block to be decoded;
parsing the code stream to obtain coding mode indication information;
determining a target decoding mode from an original resolution decoding mode and a variable resolution decoding mode according to the coding mode indication information;
under the condition that the target decoding mode is the variable resolution decoding mode, parsing the code stream of the image block to be decoded to obtain up-sampling filter indication information;
determining a target upsampling filter from a preset upsampling filter set according to the upsampling filter indication information, wherein the upsampling filter set at least comprises a Finite Impulse Response (FIR) upsampling filter and a Convolutional Neural Network (CNN) upsampling filter;
and upsampling the initial reconstructed image block by adopting the target upsampling filter to obtain a target reconstructed image block.
39. The apparatus of claim 38, wherein the parameter values of the CNN upsampling filter are preset, and the parameter values of the CNN upsampling filter are obtained by performing offline training on a preset image training set.
40. The apparatus of claim 38 or 39, wherein the processing module is further configured to:
parsing the code stream to obtain an update parameter value of the CNN up-sampling filter, wherein the update parameter value of the CNN up-sampling filter is used for replacing a preset parameter value of the CNN up-sampling filter.
41. The apparatus of claim 40, wherein before the processing module parses the code stream to obtain updated parameter values for the CNN upsampling filter, the processing module is further configured to:
parsing the code stream to obtain filter parameter update indication information, wherein the filter parameter update indication information is used for indicating whether to update the parameter value of the target up-sampling filter;
wherein the parsing the code stream to obtain an update parameter value of the CNN up-sampling filter comprises:
parsing the code stream to obtain the update parameter value of the CNN up-sampling filter under the condition that the filter parameter update indication information indicates that the parameter value of the target up-sampling filter is to be updated.
42. The apparatus of claim 40 or 41, wherein the updated parameter values of the CNN upsampling filter are obtained by performing online training on a CNN upsampling network according to an image block to be encoded, wherein the image block to be decoded is obtained by encoding the image block to be encoded.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810242304.7A CN110300301B (en) | 2018-03-22 | 2018-03-22 | Image coding and decoding method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110300301A (en) | 2019-10-01 |
CN110300301B (en) | 2023-01-13 |
Family
ID=68025753
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810242304.7A Active CN110300301B (en) | 2018-03-22 | 2018-03-22 | Image coding and decoding method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110300301B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102525578B1 (en) | 2018-10-19 | 2023-04-26 | 삼성전자주식회사 | Method and Apparatus for video encoding and Method and Apparatus for video decoding |
KR102287947B1 (en) | 2019-10-28 | 2021-08-09 | 삼성전자주식회사 | Apparatus and method for performing artificial intelligence encoding and artificial intelligence decoding of image |
KR102436512B1 (en) * | 2019-10-29 | 2022-08-25 | 삼성전자주식회사 | Method and Apparatus for video encoding and Method and Apparatus for video decoding |
CN110971784B (en) * | 2019-11-14 | 2022-03-25 | 北京达佳互联信息技术有限公司 | Video processing method and device, electronic equipment and storage medium |
CN111064958B (en) * | 2019-12-28 | 2021-03-30 | 复旦大学 | Low-complexity neural network filtering algorithm for B frame and P frame |
CN115700771A (en) * | 2021-07-31 | 2023-02-07 | 华为技术有限公司 | Encoding and decoding method and device |
CN114050987B (en) * | 2021-11-03 | 2023-08-22 | 猫岐智能科技(上海)有限公司 | Non-contact debugging system and method for Internet of things equipment |
WO2024061136A1 (en) * | 2022-09-19 | 2024-03-28 | Douyin Vision Co., Ltd. | Method, apparatus, and medium for video processing |
WO2024174071A1 (en) * | 2023-02-20 | 2024-08-29 | Rwth Aachen University | Video coding using signal enhancement filtering |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103716630A (en) * | 2012-09-29 | 2014-04-09 | 华为技术有限公司 | Upsampling filter generation method and device |
CN104811276A (en) * | 2015-05-04 | 2015-07-29 | 东南大学 | DL-CNN (deep learning-convolutional neural network) demodulator for super-Nyquist rate communication |
CN107016642A (en) * | 2015-11-09 | 2017-08-04 | 汤姆逊许可公司 | Method for upscaling the resolution of a noisy image, and apparatus for upscaling the resolution of a noisy image |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1769399B1 (en) * | 2004-06-07 | 2020-03-18 | Sling Media L.L.C. | Personal media broadcasting system |
Non-Patent Citations (1)
Title |
---|
Learning a Convolutional Neural Network for; Han Zhang et al.; VCIP; 2017-12-13; full text * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||