WO2022246809A1 - Encoding and decoding method, code stream, encoder, decoder, and storage medium - Google Patents

Encoding and decoding method, code stream, encoder, decoder, and storage medium

Info

Publication number
WO2022246809A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
information
feature information
residual
resolution
Prior art date
Application number
PCT/CN2021/096818
Other languages
English (en)
French (fr)
Inventor
元辉
姜东冉
初彦翰
杨烨
李明
Original Assignee
Oppo广东移动通信有限公司
Priority date
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Priority to EP21942383.7A (EP4351136A1)
Priority to PCT/CN2021/096818 (WO2022246809A1)
Priority to CN202180097906.8A (CN117280685A)
Publication of WO2022246809A1
Priority to US18/520,922 (US20240098271A1)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136: Incoming video signal characteristics or properties
    • H04N19/137: Motion inside a coding unit, e.g. average field, frame or block difference
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117: Filters, e.g. for pre-processing or post-processing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/176: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/513: Processing of motion vectors
    • H04N19/517: Processing of motion vectors by encoding
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the present application relates to the technical field of video processing, and in particular to a codec method, a code stream, an encoder, a decoder, and a storage medium.
  • the encoding and decoding of the current block may adopt intra-frame prediction and inter-frame prediction.
  • sub-pixel motion compensation is a key technique for improving compression efficiency by eliminating temporal redundancy in video, and it is mainly used in the motion compensation and motion estimation of inter-frame prediction.
  • Embodiments of the present application provide an encoding/decoding method, a code stream, an encoder, a decoder, and a storage medium, which can save bit rate and improve encoding/decoding efficiency while ensuring the same decoding quality.
  • the embodiment of the present application provides an encoding method applied to an encoder, and the method includes:
  • determining a first matching block of the current block; performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining motion information of the current block according to the at least one second matching block; and encoding the current block according to the motion information.
  • the embodiment of the present application provides a code stream, which is generated by performing bit coding according to the information to be coded; wherein,
  • the information to be coded includes at least the motion information of the current block, the residual block of the current block, and the value of the first syntax element identification information, and the first syntax element identification information is used to indicate whether the current block uses motion compensation enhancement processing.
  • the embodiment of the present application provides a decoding method, which is applied to a decoder, and the method includes:
  • parsing the code stream to determine the value of first syntax element identification information; if the first syntax element identification information indicates that the current block uses motion compensation enhancement processing, parsing the code stream to determine first motion information of the current block;
  • determining a first matching block of the current block according to the first motion information, and performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining a first prediction block of the current block according to the first motion information and the at least one second matching block; and determining a reconstructed block of the current block according to the first prediction block.
  • the embodiment of the present application provides an encoder, which includes a first determination unit, a first motion compensation unit, and an encoding unit; wherein,
  • the first determination unit is configured to determine a first matching block of the current block;
  • the first motion compensation unit is configured to perform motion compensation enhancement on the first matching block to obtain at least one second matching block;
  • the first determination unit is further configured to determine motion information of the current block according to the at least one second matching block;
  • the encoding unit is configured to encode the current block according to the motion information.
  • the embodiment of the present application provides an encoder, where the encoder includes a first memory and a first processor; wherein,
  • a first memory for storing a computer program capable of running on the first processor
  • the first processor is configured to execute the method of the first aspect when running the computer program.
  • the embodiment of the present application provides a decoder, which includes a parsing unit, a second determination unit, and a second motion compensation unit; wherein,
  • the parsing unit is configured to parse the code stream and determine the value of the first syntax element identification information
  • the parsing unit is further configured to parse the code stream and determine the first motion information of the current block if the first syntax element identification information indicates that the current block uses a motion compensation enhancement processing method;
  • the second motion compensation unit is configured to determine a first matching block of the current block according to the first motion information, and perform motion compensation enhancement on the first matching block to obtain at least one second matching block;
  • the second determination unit is configured to determine a first prediction block of the current block according to the first motion information and at least one second matching block; and determine a reconstruction block of the current block according to the first prediction block.
  • the embodiment of the present application provides a decoder, where the decoder includes a second memory and a second processor; wherein,
  • a second memory for storing a computer program capable of running on the second processor
  • the second processor is configured to execute the method of the third aspect when running the computer program.
  • the embodiment of the present application provides a computer storage medium, where the computer storage medium stores a computer program; when the computer program is executed by the first processor, the method of the first aspect is implemented, or when the computer program is executed by the second processor, the method of the third aspect is implemented.
  • Embodiments of the present application provide a codec method, code stream, encoder, decoder, and storage medium.
  • On the encoder side, a first matching block of the current block is determined; motion compensation enhancement is performed on the first matching block to obtain at least one second matching block; motion information of the current block is determined according to the at least one second matching block; and the current block is encoded according to the motion information.
  • On the decoder side, the code stream is parsed to determine the value of the first syntax element identification information; if the first syntax element identification information indicates that the current block uses motion compensation enhancement processing, the code stream is parsed to determine the first motion information of the current block; a first matching block of the current block is determined according to the first motion information, and motion compensation enhancement is performed on the first matching block to obtain at least one second matching block; a first prediction block of the current block is determined according to the first motion information and the at least one second matching block; and a reconstructed block of the current block is determined according to the first prediction block.
  • FIG. 1 is a schematic diagram of a representation form of inter-frame prediction provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of fractional positions of a luminance component with sub-pixel precision provided by an embodiment of the present application.
  • FIG. 3A is a schematic block diagram of a video encoding system provided by an embodiment of the present application.
  • FIG. 3B is a schematic block diagram of a video decoding system provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of an encoding method provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a network structure of a preset neural network model provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a network structure of a residual projection block provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a decoding method provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an encoder provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a specific hardware structure of an encoder provided in an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a decoder provided in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a specific hardware structure of a decoder provided in an embodiment of the present application.
  • references to "some embodiments" describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
  • The terms "first/second/third" in the embodiments of the present application are only used to distinguish similar objects and do not imply a specific ordering of the objects. Understandably, the specific order or sequence of "first/second/third" may be interchanged where allowed, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
  • the first image component, the second image component, and the third image component are generally used to represent a coding block (Coding Block, CB); these three image components are a luminance component, a blue chrominance component, and a red chrominance component, respectively. Specifically, the luminance component is usually represented by the symbol Y, the blue chrominance component by Cb or U, and the red chrominance component by Cr or V; thus, a video image can be expressed in the YCbCr format or in the YUV format.
  • MPEG: Moving Picture Experts Group
  • JVET: Joint Video Experts Team
  • VVC: Versatile Video Coding
  • VTM: VVC Test Model, the VVC reference software test platform
  • PSNR: Peak Signal to Noise Ratio
  • inter-frame prediction is the process of predicting the current frame by using a decoded and reconstructed reference frame; its core is to find the optimal matching block (also called the "best matching block") for the current block.
  • the motion information may include a prediction direction, an index number of a reference frame, and a motion vector.
  • FIG. 1 shows a schematic diagram of a representation form of inter-frame prediction provided by an embodiment of the present application. As shown in FIG. 1, the encoder uses a certain search algorithm to find, in the reference frame, an optimal matching block for the current block to be encoded in the current frame; the displacement between the two is called a motion vector, and this process is called motion estimation.
  • the encoder first needs to perform integer pixel motion estimation to obtain the optimal matching block at the integer pixel position.
  • Since the real motion of objects is not limited to integer pixel positions, the concept of sub-pixel motion compensation is proposed.
  • the so-called sub-pixel motion compensation is to interpolate the optimal matching block at the integer pixel position through an interpolation filter to generate 1/2 precision sub-pixel samples and 1/4 precision sub-pixel samples.
  • FIG. 2 shows a schematic diagram of a fractional position of a luminance component with sub-pixel accuracy provided by an embodiment of the present application.
  • In FIG. 2, uppercase letters represent integer pixel samples, that is, A_{i,j} denotes a pixel at an integer position; lowercase letters represent sub-pixel samples, where b_{i,j}, h_{i,j}, and j_{i,j} denote sub-pixels at half-precision positions, and the remaining lowercase letters denote sub-pixels at quarter-precision positions.
  • the essence of sub-pixel motion compensation is to further optimize the matching blocks at integer pixel positions by means of interpolation filtering, where the main functions of the interpolation filter include removing spectral aliasing caused by digital sampling and suppressing coding noise.
  • However, the existing technical solutions still have some defects; in particular, it is difficult for them to adapt to increasingly diverse video content and complex coding environments, resulting in low encoding and decoding efficiency.
  • the embodiment of the present application provides an encoding method: determining a first matching block of the current block; performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining motion information of the current block according to the at least one second matching block; and encoding the current block according to the motion information.
  • the embodiment of the present application provides a decoding method: parsing the code stream to determine the value of first syntax element identification information; if the first syntax element identification information indicates that the current block uses motion compensation enhancement processing, parsing the code stream to determine first motion information of the current block; determining a first matching block of the current block according to the first motion information, and performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining a first prediction block of the current block according to the first motion information and the at least one second matching block; and determining a reconstructed block of the current block according to the first prediction block.
  • the video coding system 10 includes a transform and quantization unit 101, an intra-frame estimation unit 102, an intra-frame prediction unit 103, a motion compensation unit 104, a motion estimation unit 105, an inverse transform and inverse quantization unit 106, a filter control analysis unit 107, a filtering unit 108, an encoding unit 109, a decoded image buffer unit 110, and so on; the filtering unit 108 can implement deblocking filtering and sample adaptive offset (Sample Adaptive Offset, SAO) filtering, and the encoding unit 109 can implement header information coding and context-based adaptive binary arithmetic coding (Context-based Adaptive Binary Arithmetic Coding, CABAC).
  • SAO: Sample Adaptive Offset
  • for an input original video signal, a video coding block can be obtained by partitioning a coding tree unit (Coding Tree Unit, CTU); the residual pixel information obtained after intra-frame or inter-frame prediction is then processed by the transform and quantization unit 101, which transforms the residual information from the pixel domain to the transform domain and quantizes the resulting transform coefficients to further reduce the bit rate;
  • the intra-frame estimation unit 102 and the intra-frame prediction unit 103 are used to perform intra-frame prediction on the video coding block; specifically, they are used to determine the intra-frame prediction mode to be used to encode the video coding block;
  • the motion compensation unit 104 and the motion estimation unit 105 are used to perform inter-frame predictive encoding of the received video coding block relative to one or more blocks in one or more reference frames, so as to provide temporal prediction information;
  • the motion estimation performed by the motion estimation unit 105 is the process of generating motion vectors, which estimate the motion of the video coding block; the motion compensation unit 104 then performs motion compensation based on the motion vectors determined by motion estimation;
  • the context can be based on adjacent coding blocks and can be used to encode information indicating the determined intra-frame prediction mode, after which the code stream of the video signal is output; the decoded image buffer unit 110 is used to store reconstructed video coding blocks for prediction reference. As video image encoding progresses, new reconstructed video coding blocks are continuously generated, and these reconstructed blocks are stored in the decoded image buffer unit 110.
  • the video decoding system 20 includes a decoding unit 201, an inverse transform and inverse quantization unit 202, an intra prediction unit 203, a motion compensation unit 204, a filtering unit 205, and a decoded image buffer unit 206, etc., wherein the decoding unit 201 can implement header information decoding and CABAC decoding, and filtering unit 205 can implement deblocking filtering and SAO filtering.
  • after the above encoding process, the code stream of the video signal is output; the code stream is input into the video decoding system 20 and first passes through the decoding unit 201 to obtain decoded transform coefficients; the transform coefficients are processed by the inverse transform and inverse quantization unit 202 to generate a residual block in the pixel domain; the intra prediction unit 203 is operable to generate prediction data based on the determined intra prediction mode and data from previously decoded blocks of the current frame or picture;
  • the motion compensation unit 204 determines prediction information for the video decoding block by parsing motion vectors and other associated syntax elements, and uses the prediction information to generate the predictive block of the video decoding block being decoded; a decoded video block is formed by summing the residual block from the inverse transform and inverse quantization unit 202 with the corresponding predictive block produced by the intra prediction unit 203 or the motion compensation unit 204; the decoded video signal then passes through the filtering unit 205 to remove blocking artifacts and improve video quality; the decoded video blocks are subsequently stored in the decoded picture buffer unit 206, which stores reference pictures for subsequent intra prediction or motion compensation and is also used for output of the video signal, that is, for obtaining the restored original video signal.
  • the embodiment of the present application can be applied to the inter-frame prediction part of the video encoding system 10 (which may be referred to as “encoder” for short), specifically the motion compensation unit 104 and the motion estimation unit 105 as shown in FIG. 3A;
  • the embodiments of the application can also be applied to the inter-frame prediction part of the video decoding system 20 (which may be referred to as “decoder” for short), specifically the motion compensation unit 204 as shown in FIG. 3B. That is to say, the embodiment of the present application can be applied to an encoder, a decoder, or even both an encoder and a decoder, but no specific limitation is made here.
  • when the method of the embodiment of the present application is applied to an encoder, the "current block" specifically refers to the coding block currently to be inter-frame predicted in the image to be encoded; when the method of the embodiment of the present application is applied to a decoder, the "current block" specifically refers to the decoding block currently to be inter-frame predicted in the image to be decoded.
  • FIG. 4 shows a schematic flowchart of an encoding method provided in an embodiment of the present application. As shown in Figure 4, the method may include:
  • S401 Determine the first matching block of the current block.
  • the video image can be divided into multiple image blocks, and each image block to be encoded can be called a coding block; the current block here specifically refers to the coding block currently to be subjected to inter-frame prediction.
  • the current block may be a CTU, or even a CU, PU, etc., which is not limited in this embodiment of the present application.
  • the encoding method in the embodiment of the present application is mainly applied to motion estimation and motion compensation of inter-frame prediction.
  • motion compensation is to use the partial image in the decoded and reconstructed reference frame to predict and compensate the current partial image, which can reduce the redundant information of the moving image
  • motion estimation is to extract motion information from the video sequence, that is, to estimate the displacement information of the moving object between the reference frame and the current frame; this process is called motion estimation, and the displacement information is the motion information described in the embodiment of the present application.
  • the first matching block here can be obtained based on integer pixel motion estimation, or can be obtained by using sub-pixel interpolation and filtering in related technologies, and this embodiment of the application does not make any limitation.
  • the determining the first matching block of the current block may include:
  • Integer pixel motion estimation is performed on the current block, and the first matching block of the current block is determined.
  • the target matching block at the integer pixel position (also referred to as the "first matching block") is the matching block with the smallest rate-distortion cost when motion estimation is performed at integer pixel positions for the current block.
  • motion estimation methods mainly include two categories: pixel recursive method and block matching method.
  • the former has high complexity and is rarely used in practice; the latter is widely used in video coding standards.
  • the block matching method mainly involves a block matching criterion and a search method; common block matching criteria include the following:
  • SAD: Sum of Absolute Differences
  • MSE: Mean Squared Error
  • NCCF: Normalized Cross-Correlation Function
  • among the candidate matching blocks at integer pixel positions, the one with the smallest rate-distortion cost is the optimal matching block, that is, the target matching block described in the embodiment of the present application. That is to say, the target matching block at the integer pixel position is the matching block corresponding to the minimum rate-distortion cost value selected from multiple matching blocks at integer pixel positions.
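  • For illustration, a minimal sketch of SAD-based full-search integer-pel motion estimation is given below. It is a simplified example, not the patent's search algorithm: a real encoder minimizes a rate-distortion cost (distortion plus a rate term) and uses faster search patterns than the raw full search shown here.

```python
import numpy as np

def sad(block_a, block_b):
    # Sum of Absolute Differences: the simplest block-matching criterion.
    return int(np.abs(block_a.astype(np.int64) - block_b.astype(np.int64)).sum())

def integer_pel_search(cur_block, ref_frame, x, y, search_range=8):
    # Full search over integer positions within +/-search_range of (x, y);
    # returns the motion vector of the best (lowest-SAD) matching block.
    h, w = cur_block.shape
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry and 0 <= rx and ry + h <= ref_frame.shape[0] and rx + w <= ref_frame.shape[1]:
                cost = sad(cur_block, ref_frame[ry:ry + h, rx:rx + w])
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```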
  • S402 Perform motion compensation enhancement on the first matching block to obtain at least one second matching block.
  • the embodiment of the present application may further perform motion compensation enhancement.
  • a DCT-based interpolation filter (DCT-based Interpolation Filter, DCTIF) is generally used in video coding standards to perform half-precision sub-pixel interpolation.
  • the basic idea is to forward transform the integer pixel samples to the DCT domain, and then use the DCT base sampled at the target sub-pixel position to inversely transform the DCT coefficients back to the spatial domain.
  • This process can be represented by a finite impulse response filtering process: assuming a given integer pixel is denoted f(i) and the pixel obtained by interpolation is denoted f̂(x), the DCTIF interpolation takes the form of the weighted sum in formula (1), f̂(x) = Σ_i h(i)·f(i), where h(i) are the filter tap coefficients.
  • the tap coefficients of the interpolation filter for half-precision sub-pixel samples are [-1, 4, -11, 40, 40, -11, 4, -1].
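  • As a minimal sketch of this FIR view of DCTIF, the following applies the 8-tap half-precision filter above along one row of integer samples; the taps sum to 64, so the result is rounded and right-shifted by 6 (the padding convention and clipping to 8-bit range are our assumptions).

```python
import numpy as np

# 8-tap DCTIF coefficients for the half-precision position, as listed above.
DCTIF_HALF = np.array([-1, 4, -11, 40, 40, -11, 4, -1], dtype=np.int64)

def half_pel_row(samples):
    # Interpolate the half-pel value between consecutive integer samples of a
    # 1-D row; `samples` needs 3 extra samples on the left and 4 on the right.
    s = np.asarray(samples, dtype=np.int64)
    out = []
    for i in range(3, len(s) - 4):
        acc = int((DCTIF_HALF * s[i - 3:i + 5]).sum())
        out.append((acc + 32) >> 6)  # taps sum to 64: round, then divide by 64
    return np.clip(out, 0, 255)

print(half_pel_row([10, 10, 10, 10, 20, 20, 20, 20]))  # one half-pel value: 15
```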
  • the embodiment of the present application proposes a motion compensation enhancement processing manner based on a preset neural network model.
  • the step of performing motion compensation enhancement on the first matching block may further include: performing motion compensation enhancement on the first matching block by using a preset neural network model.
  • performing motion compensation enhancement on the first matching block to obtain at least one second matching block may include:
  • performing super-resolution and quality enhancement processing on the first matching block through the preset neural network model to obtain a processing block; and performing first filtering processing on the processing block to obtain at least one second matching block.
  • the resolution of the processing block is higher than the resolution of the current block.
  • the processing block obtained after super-resolution and quality enhancement processing has both high quality and high resolution.
  • the first matching block has the same resolution as the current block
  • the second matching block obtained after the first filtering process also has the same resolution as the current block.
  • the first filtering process may include: downsampling. That is to say, after the processing block is obtained, at least one second matching block can be obtained by down-sampling the processing block.
  • performing motion compensation enhancement on the first matching block using a preset neural network model to obtain at least one second matching block may include:
  • in one possible implementation manner, the precision of the second matching blocks is half precision and the number of second matching blocks is four; in another possible implementation manner, the precision of the second matching blocks is quarter precision and the number of second matching blocks is 16; however, this embodiment of the present application does not impose any limitation on this.
  • the preset neural network model is a convolutional neural network (Convolutional Neural Networks, CNN) model.
  • CNN: Convolutional Neural Networks
  • CNN is a kind of feed-forward neural network with convolution calculation and deep structure, and it is one of the representative algorithms of deep learning.
  • the convolutional neural network has representation-learning ability and can perform shift-invariant classification of input information according to its hierarchical structure, so it is also called a "shift-invariant artificial neural network (Shift-Invariant Artificial Neural Networks, SIANN)".
  • Different from the interpolation filter in the foregoing embodiment, which interpolates three half-precision sub-pixel samples for the first matching block, this embodiment uses the convolutional neural network model to achieve end-to-end super-resolution and quality enhancement for the first matching block, and then downsamples the output high-resolution image to generate four half-precision sub-pixel samples (that is, the "second matching blocks").
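  • A minimal sketch of this downsampling step is shown below, assuming the network outputs a 2× super-resolved block and that the four half-precision sub-pixel samples correspond to the four 2× sampling phases (this phase mapping is our assumption based on the description above).

```python
import numpy as np

def half_pel_phases(sr_block):
    # sr_block: 2x super-resolved output of the network, shape (2H, 2W).
    # Taking every second sample at the four possible phase offsets yields
    # four blocks of the original resolution, one per half-pel position.
    return [sr_block[dy::2, dx::2] for dy in (0, 1) for dx in (0, 1)]

sr = np.arange(16).reshape(4, 4)   # a toy 2x-upscaled 2x2 block
for phase in half_pel_phases(sr):
    print(phase.shape)              # each phase block is (2, 2)
```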
  • the preset neural network model may include a feature extraction module, a residual projection module group, a sampling module and a reconstruction module.
  • the feature extraction module, the residual projection module group, the sampling module and the reconstruction module are connected in sequence.
  • performing super-resolution and quality enhancement processing on the first matching block to obtain a processing block may include: performing shallow feature extraction on the first matching block through the feature extraction module to obtain first feature information; performing residual feature learning on the first feature information through the residual projection module group to obtain second feature information; performing second filtering processing on the second feature information through the sampling module to obtain third feature information; and performing super-resolution reconstruction on the third feature information through the reconstruction module to obtain the processing block.
  • the feature extraction module is mainly used to extract shallow features, so the feature extraction module can also be called a "shallow feature extraction module".
  • the shallow features in the embodiments of the present application mainly refer to low-level simple features (such as edge features, etc.).
  • the feature extraction module may include a first convolutional layer.
  • performing shallow feature extraction on the first matching block through the feature extraction module to obtain the first feature information may include: performing a convolution operation on the first matching block through the first convolution layer to obtain the first feature information .
  • the size of the convolution kernel of the first convolution layer is K ⁇ L
  • the number of convolution kernels of the first convolution layer is an integer power of 2
  • K and L are positive integers greater than zero.
  • the size of the convolution kernel of the first convolution layer may be 3 ⁇ 3
  • the number of convolution kernels of the first convolution layer is 64, but there is no limitation here.
  • in some embodiments, the residual projection module group may include N residual projection blocks, a second convolutional layer, and a first connection layer, where N is an integer greater than or equal to 1.
  • the N residual projection blocks, the second convolutional layer, and the first connection layer are connected in sequence, and the first connection layer is also connected to the input of the first of the N residual projection blocks.
  • performing residual feature learning on the first feature information through the residual projection module group to obtain the second feature information includes:
  • performing residual feature learning on the first feature information through the N residual projection blocks to obtain first intermediate feature information; performing a convolution operation on the first intermediate feature information through the second convolutional layer to obtain second intermediate feature information; and adding the first feature information and the second intermediate feature information through the first connection layer to obtain the second feature information.
  • the size of the convolution kernel of the second convolution layer is K ⁇ L
  • the number of convolution kernels of the second convolution layer is an integer power of 2
  • K and L are positive integers greater than zero.
  • the size of the convolution kernel of the second convolution layer is 3 ⁇ 3
  • the number of convolution kernels of the second convolution layer is 64, but there is no limitation here.
  • the input of the d-th residual projection block is denoted by F_{d-1}, and the output of the d-th residual projection block is denoted by F_d.
  • the N residual projection blocks form a cascade structure; the input of the cascade structure is the first feature information, and the output of the cascade structure is the first intermediate feature information.
  • performing residual feature learning on the first feature information through N residual projection blocks to obtain the first intermediate feature information may include:
  • when N is equal to 1, the first feature information is input to the first residual projection block to obtain the output information of the first residual projection block, and the output information of the first residual projection block is determined as the first intermediate feature information;
  • when N is greater than 1, the first feature information is input to the first residual projection block, and the output information of the d-th residual projection block is input to the (d+1)-th residual projection block to obtain the output information of the (d+1)-th residual projection block, with d incremented by 1 until the output information of the N-th residual projection block is obtained; the output information of the N-th residual projection block is determined as the first intermediate feature information, where d is an integer greater than or equal to 1 and less than N.
  • In short, if N is equal to 1, that is, there is only one residual projection block in the residual projection module group, the output information of that residual projection block is the first intermediate feature information; if N is greater than 1, that is, there are two or more residual projection blocks in the residual projection module group, the output of each residual projection block serves as the input of the next, and the output information of the last residual projection block, obtained through this stacking, is the first intermediate feature information.
  • the sampling module may include a sub-pixel convolution layer.
  • said performing the second filtering process on the second feature information through the sampling module to obtain the third feature information may include:
  • the second filtering process is performed on the second feature information through the sub-pixel convolution layer to obtain the third feature information.
  • the resolution of the third feature information obtained after the second filtering process is higher than the resolution of the second feature information.
  • the second filtering process may include: upsampling. That is to say, the sampling module is mainly used for upsampling the second characteristic information, so the sampling module may also be called an "upsampling module”.
  • the sampling module may use a sub-pixel convolution layer, or may add a sub-pixel convolution layer.
  • in a specific example, the sub-pixel convolutional layer can be a PixelShuffle module (also called a pixel shuffle module), which converts an input low-resolution image of size H×W into a high-resolution image of size rH×rW.
  • the implementation does not directly generate the high-resolution image by interpolation; instead, it first obtains a feature map with r² channels through convolution (the spatial size of the feature map is consistent with the input low-resolution image), and then obtains the high-resolution image through periodic shuffling, where r is the magnification factor of the image.
  • specifically, for the feature map with r² channels, the r² channel values of each pixel are rearranged into an r×r region corresponding to an r×r sub-block of the high-resolution image, so that the feature map of size r²×H×W is rearranged into a high-resolution image of size 1×rH×rW.
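  • The rearrangement described above is available directly in PyTorch (the platform named later for model training); a minimal sketch:

```python
import torch
import torch.nn as nn

# PixelShuffle rearranges an (r^2*C, H, W) tensor into (C, r*H, r*W): the r^2
# channel values of each pixel become an r x r spatial patch of the output.
r = 2
shuffle = nn.PixelShuffle(upscale_factor=r)
x = torch.randn(1, r * r, 48, 48)  # r^2 single-channel feature maps, 48x48
y = shuffle(x)
print(y.shape)                     # torch.Size([1, 1, 96, 96])
```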
  • the reconstruction module may include a fifth convolutional layer.
  • the super-resolution reconstruction of the third feature information by the reconstruction module to obtain the processing block may include:
  • a convolution operation is performed on the third feature information through the fifth convolution layer to obtain a processing block.
  • the size of the convolution kernel of the fifth convolution layer is K ⁇ L
  • the number of convolution kernels of the fifth convolution layer is an integer power of 2
  • K and L are positive integers greater than zero.
  • the size of the convolution kernel of the fifth convolution layer is 3 ⁇ 3
  • the number of convolution kernels of the fifth convolution layer is 1, but there is no limitation here.
  • FIG. 5 shows a schematic diagram of a network structure of a preset neural network model provided by an embodiment of the present application.
  • the preset neural network model can be represented by RPNet.
  • RPNet mainly includes four parts: a shallow feature extraction network (Shallow Feature Extraction Net), residual projection blocks (Residual Projection Blocks), an up-sampling network (Up-sampling Net), and a reconstruction network (Reconstruction Net).
  • the shallow feature extraction network layer is the feature extraction module described in the embodiment of the present application, which may be the first convolutional layer;
  • the up-sampling network layer is the sampling module described in the embodiment of the present application, which may be a sub-pixel convolutional layer;
  • the reconstruction network layer is the reconstruction module described in the embodiment of the present application, which may be the fifth convolutional layer.
  • I_LR represents the first matching block described in the embodiment of the present application, that is, the low-resolution image input to RPNet, and I_SR represents the processing block described in the embodiment of the present application, that is, the high-resolution image (also called the super-resolution image); that is, I_LR and I_SR denote the input and output of RPNet, respectively.
  • the network structure of the model will be described in detail below with reference to FIG. 5 .
  • H_SFE(·) represents the convolution operation of the shallow feature extraction layer;
  • F_0 represents the shallow features extracted from the low-resolution image, which serve as the input of the residual projection module group;
  • W_LSC represents the weights of the convolutional layer after the N-th residual projection block;
  • GRL: Global Residual Learning;
  • H_UP(·) represents the convolution operation that implements up-sampling;
  • F_UP represents the extracted third feature information, which serves as the input of the reconstruction network layer;
  • H_REC(·) represents the convolution operation that implements super-resolution reconstruction.
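  • Putting the four parts together, the following PyTorch sketch mirrors the described data flow I_LR → H_SFE → residual projection blocks → global residual connection (W_LSC) → H_UP → H_REC → I_SR. It is an illustrative reconstruction, not the patent's exact network: the channel expansion before PixelShuffle and the single-channel (luma) input are our assumptions, and RPBSketch is the residual projection block sketched after FIG. 6 below.

```python
import torch
import torch.nn as nn

class RPNetSketch(nn.Module):
    def __init__(self, n_blocks=10, channels=64, scale=2):
        super().__init__()
        self.sfe = nn.Conv2d(1, channels, 3, padding=1)               # H_SFE
        self.blocks = nn.ModuleList(
            [RPBSketch(channels, scale) for _ in range(n_blocks)])    # N RPBs
        self.lsc = nn.Conv2d(channels, channels, 3, padding=1)        # W_LSC
        self.up = nn.Sequential(                                      # H_UP
            nn.Conv2d(channels, channels * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))
        self.rec = nn.Conv2d(channels, 1, 3, padding=1)               # H_REC

    def forward(self, i_lr):
        f0 = self.sfe(i_lr)          # shallow feature extraction
        f = f0
        for blk in self.blocks:      # cascaded residual projection blocks
            f = blk(f)
        f = self.lsc(f) + f0         # global residual learning (GRL)
        return self.rec(self.up(f))  # up-sample, then reconstruct I_SR
```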
  • in some embodiments, the residual projection block may include an up-projection module, M residual modules, a local feature fusion module, a down-projection module, and a second connection layer, where M is an integer greater than or equal to 1.
  • the up-projection module, the M residual modules, the local feature fusion module, the down-projection module, and the second connection layer are connected in sequence; the second connection layer is also connected to the input of the up-projection module, and the outputs of the M residual modules are also respectively connected to the local feature fusion module.
  • the method may also include:
  • performing third filtering processing on the input information of the residual projection block through the up-projection module to obtain first high-resolution feature information; performing residual feature learning on the first high-resolution feature information through the M residual modules to obtain M pieces of second high-resolution feature information; performing a fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain third high-resolution feature information; performing fourth filtering processing on the third high-resolution feature information through the down-projection module to obtain filtered feature information; and adding the input information and the filtered feature information through the second connection layer to obtain the output information of the residual projection block.
  • in some embodiments, the up-projection module may include a transposed convolutional layer.
  • accordingly, performing the third filtering processing on the input information of the residual projection block through the up-projection module to obtain the first high-resolution feature information may include:
  • performing the third filtering processing on the input information of the residual projection block through the transposed convolutional layer to obtain the first high-resolution feature information.
  • the resolution of the first high-resolution feature information obtained after the third filtering process is higher than the resolution of the input information of the residual projection block.
  • the third filtering process may include: upsampling. That is, the input information of the residual projection block is up-sampled through the transposed convolutional layer to obtain the first high-resolution feature information.
  • in some embodiments, the local feature fusion module may include a feature fusion layer and a third convolutional layer; performing the fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain the third high-resolution feature information includes:
  • performing a fusion operation on the M pieces of second high-resolution feature information through the feature fusion layer to obtain fused feature information, and performing a convolution operation on the fused feature information through the third convolutional layer to obtain the third high-resolution feature information.
  • the size of the convolution kernel of the third convolution layer is K ⁇ L
  • the number of convolution kernels of the third convolution layer is an integer power of 2
  • K and L are positive integers greater than zero.
  • the size of the convolution kernel of the third convolution layer is 1 ⁇ 1
  • the number of convolution kernels of the third convolution layer is 64, but there is no limitation here.
  • that is, the M pieces of second high-resolution feature information are fused through the feature fusion layer; in addition, in order to give full play to the learning ability of the residual network, a 1×1 convolutional layer is introduced here so that the fusion of the feature information learned by the residual modules can adaptively control the learned feature information.
  • in some embodiments, the down-projection module may include a fourth convolutional layer, and performing the fourth filtering processing on the third high-resolution feature information through the down-projection module to obtain the filtered feature information includes:
  • a fourth filtering process is performed on the third high-resolution feature information through the fourth convolution layer to obtain filtered feature information.
  • the resolution of the filtered feature information obtained after the fourth filtering process is lower than the resolution of the third high-resolution feature information.
  • the filtered feature information obtained after the fourth filtering process has the same resolution as the input information of the residual projection block.
  • the fourth filtering process may include: downsampling. That is to say, the third high-resolution feature information is down-sampled through the fourth convolutional layer to obtain filtered feature information.
  • FIG. 6 shows a schematic diagram of a network structure of a residual projection block provided by an embodiment of the present application.
  • the residual projection block can be represented by RPB.
  • RPB mainly includes an up-projection unit (Up-Projection Unit), residual blocks (Residual Block), local feature fusion (Local Feature Fusion), and a down-projection unit (Down-Projection Unit).
  • for the d-th residual projection block, it is assumed to contain M residual modules; the specific connection relationship is shown in FIG. 6.
  • the up-projection module uses a transposed convolutional layer to up-sample the input low-resolution features; its mathematical form is shown in formula (9), i.e., F_{d,0} = (F_{d-1} * p_t)↑_s, where * denotes the spatial convolution operation, F_{d-1} denotes the input of the d-th residual projection block, p_t denotes the transposed convolution kernel, ↑_s denotes up-sampling with scaling factor s, and F_{d,0} denotes the first input to the residual modules.
  • [F_{d,1}, ..., F_{d,M}] denote the outputs of the M residual modules, respectively.
  • the local feature fusion includes a feature fusion layer and a third convolutional layer, in which a 1×1 third convolutional layer is introduced to fuse the features learned by the residual modules and adaptively control the learned feature information; its mathematical form is shown in formula (10), i.e., F_{d,LFF} = W_{LFF} * [F_{d,1}, ..., F_{d,M}], where [·] denotes concatenation and W_{LFF} denotes the weights of the 1×1 convolutional layer.
  • the down-projection module uses the convolution operation of the fourth convolutional layer to down-sample F_{d,LFF}, achieving the effect of using high-resolution features to guide low-resolution features; finally, pixel-wise addition with F_{d-1} gives F_d, whose mathematical form is shown in formula (11), i.e., F_d = (F_{d,LFF} * p_d)↓_s + F_{d-1}, where p_d denotes the kernel of the fourth convolutional layer and ↓_s denotes down-sampling with scaling factor s.
  • the embodiment of the present application combines the transposed convolution and the residual module to propose the residual projection block RPB.
  • the basic idea is to use the transposed convolutional layer to project low-resolution features into the high-resolution feature space, then use the residual modules to learn high-resolution features at different levels, improve the expressive ability of the residual modules through local feature fusion, and finally use a convolutional layer to project the high-resolution features back to the low-resolution feature space.
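  • A PyTorch sketch of one residual projection block under the half-precision settings stated below (6×6 kernels, stride 2, padding 2, M = 3) follows; the internal structure of each residual module (two 3×3 convolutions with a ReLU) is our assumption, as the text does not spell it out.

```python
import torch
import torch.nn as nn

class RPBSketch(nn.Module):
    def __init__(self, channels=64, scale=2, m_blocks=3):
        super().__init__()
        # Up-projection: 6x6 transposed conv, stride 2, padding 2 (formula (9)).
        self.up_proj = nn.ConvTranspose2d(channels, channels, 6, stride=scale, padding=2)
        # M residual modules learning features in the high-resolution space.
        self.res_blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1))
            for _ in range(m_blocks)])
        # Local feature fusion: concatenate the M outputs, then 1x1 conv (formula (10)).
        self.fuse = nn.Conv2d(channels * m_blocks, channels, 1)
        # Down-projection: 6x6 conv, stride 2, padding 2, plus the skip (formula (11)).
        self.down_proj = nn.Conv2d(channels, channels, 6, stride=scale, padding=2)

    def forward(self, f_in):
        h = self.up_proj(f_in)                  # project into HR feature space
        outs = []
        for rb in self.res_blocks:
            h = h + rb(h)                       # residual learning at HR
            outs.append(h)
        fused = self.fuse(torch.cat(outs, 1))   # local feature fusion
        return self.down_proj(fused) + f_in     # back to LR space + skip
```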
  • based on the above network structure, the embodiment of the present application proposes the preset neural network model RPNet for half-precision sub-pixel interpolation and embeds the trained model into the coding platform VTM7.0.
  • it should be noted that the embodiment of the present application may select RPNet for motion compensation enhancement only for PUs with a size greater than or equal to 64×64; for PUs with a size smaller than 64×64, motion compensation enhancement is still performed using the interpolation filter of the related art.
  • it should also be noted that the method in the embodiment of the present application can realize sub-pixel motion compensation with half precision, can also realize sub-pixel motion compensation with quarter precision, and can even implement sub-pixel motion compensation with other precisions, which is not limited in this embodiment of the present application.
  • when the precision of the sub-pixel samples is half precision, the convolution kernel sizes of the transposed convolutional layer and the fourth convolutional layer are both 6×6, and the stride and padding values are both set to 2; or, when the precision of the sub-pixel samples is quarter precision, the convolution kernel sizes of the transposed convolutional layer and the fourth convolutional layer are both 8×8, and the stride and padding values are set to 4 and 2, respectively.
  • the number N of residual projection blocks in RPNet can be set to 10
  • the number M of residual modules in each residual projection block can be Set to 3.
  • the number of convolution kernels of the convolution layer in the reconstruction network layer is set to 1
  • the number of convolution kernels of other transposed convolution layers or convolution layers in the network model is set to 64.
  • the kernel size of the transposed convolutional layer in the up-projection module and of the convolutional layer in the down-projection module is set to 6×6, and the stride and padding are set to 2.
  • other convolutional layers in the network model use convolutional kernels with a size of 3 ⁇ 3, and the upsampling module can use sub-pixel convolutional layers.
  • the RPNet in the embodiment of the present application can also be used for PUs of all sizes to perform half-precision sub-pixel interpolation.
  • the RPNet in the embodiment of the present application can also adjust the number of residual projection blocks and the number of residual modules in the residual projection block. Even the RPNet in the embodiment of the present application can also be used for quarter-precision sub-pixel motion compensation.
  • for quarter-precision sub-pixel motion compensation, the kernel size of the transposed convolutional layer in the up-projection module and of the convolutional layer in the down-projection module is set to 8×8, the stride and padding are set to 4 and 2 respectively, and a sub-pixel convolutional layer is added in the up-sampling module.
  • the preset neural network model can be obtained through model training.
  • the method may also include:
  • acquiring a training data set, where the training data set includes at least one training image, and preprocessing the at least one training image to obtain at least one set of input image groups and corresponding ground truth regions;
  • the neural network model is trained using the at least one set of input image groups to obtain at least one set of candidate model parameters; the ground truth region is used to determine the loss value (Loss) of the loss function of the neural network model, and the at least one set of candidate model parameters is obtained when the loss value of the loss function converges to a preset threshold.
  • the embodiment of the present application may select a public data set (such as the DIV2K data set), which includes 800 training images and 100 verification images.
  • the preprocessing of the DIV2K dataset mainly includes two steps of format conversion and encoding reconstruction. First, format conversion is performed on 800 high-resolution images in the training set, 100 high-resolution images in the test set, and their corresponding low-resolution images, from the original PNG format to YUV420 format. Then, extract the luminance component from the high-resolution image data in YUV420 format and save it in PNG format as the ground truth area.
  • next, VTM7.0 is used for all-intra encoding, with the quantization parameters (Quantization Parameter, QP) set to 22, 27, 32, and 37 respectively; the luminance component is then extracted from each of the four sets of decoded and reconstructed data and saved in PNG format as the input of the neural network model. Thus, four training data sets are obtained.
  • the embodiment of the present application selects peak signal-to-noise ratio (Peak Signal-to-Noise Ratio, PSNR) as the evaluation standard of image reconstruction quality.
  • PSNR Peak Signal-to-Noise Ratio
  • the model is trained based on the Pytorch platform.
  • during training, low-resolution image patches of size 48×48 are taken as input, and the batch size is set to 16.
  • the mean absolute error (L1 loss) may be selected as the loss function;
  • adaptive moment estimation (Adam) may be used as the optimizer;
  • the momentum and weight decay may be set to 0.9 and 0.0001, respectively;
  • the initial learning rate is set to 0.0001 and is reduced by a factor of 0.1 every 100 epochs, for a total of 300 epochs.
  • four sets of model parameters can be obtained. These four sets of model parameters correspond to four models, which are represented by RPNet_qp22, RPNet_qp27, RPNet_qp32, and RPNet_qp37.
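  • A minimal PyTorch training sketch with the stated hyper-parameters is shown below; the synthetic data loader stands in for the DIV2K-derived luma patches, and mapping the stated momentum of 0.9 onto Adam's beta1 is our reading.

```python
import torch
from torch import nn, optim

model = RPNetSketch()                      # sketch defined earlier
criterion = nn.L1Loss()                    # mean absolute error
optimizer = optim.Adam(model.parameters(), lr=1e-4,
                       betas=(0.9, 0.999), weight_decay=1e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)

# Stand-in loader: 48x48 low-resolution luma patches and 96x96 ground truth.
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(
        torch.randn(64, 1, 48, 48), torch.randn(64, 1, 96, 96)),
    batch_size=16, shuffle=True)

for epoch in range(300):
    for lr_patch, hr_patch in loader:
        optimizer.zero_grad()
        loss = criterion(model(lr_patch), hr_patch)
        loss.backward()
        optimizer.step()
    scheduler.step()                       # decay lr by 0.1 every 100 epochs
```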
  • the method may also include:
  • a preset neural network model is determined.
  • the input image groups correspond to different quantization parameters, and there are correspondences between the multiple groups of candidate model parameters and different quantization parameters.
  • in this way, according to the quantization parameter used for encoding, the corresponding trained model parameters can be determined, and then the preset neural network model used in the embodiment of the present application can be determined.
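  • For example, a simple lookup could select among the four trained parameter sets; picking the set whose training QP is closest to the sequence QP is our assumed selection rule, since the text only states that a correspondence exists.

```python
RPNET_MODELS = {22: "RPNet_qp22", 27: "RPNet_qp27",
                32: "RPNet_qp32", 37: "RPNet_qp37"}

def select_rpnet(qp):
    # Pick the candidate parameter set trained at the QP closest to `qp`.
    return RPNET_MODELS[min(RPNET_MODELS, key=lambda k: abs(k - qp))]

print(select_rpnet(30))  # RPNet_qp32
```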
  • the method may further include: encoding the model parameters, and writing encoded bits into a code stream.
  • On the one hand, if the encoder and decoder use the same preset neural network model with fixed parameters, the parameters are already solidified and there is no need to transmit model parameters; on the other hand, if the code stream carries access information for the public training data set, such as a uniform resource locator (Uniform Resource Locator, URL), the decoder can be trained in the same way as the encoder; in addition, the encoder may also learn from the encoded video sequence.
  • URL: Uniform Resource Locator
  • if the encoder writes the model parameters into the code stream, the decoder does not need to perform model training; after obtaining the model parameters by parsing the code stream, the decoder can determine the preset neural network model used in the embodiment of the present application.
  • the preset neural network model can be used to perform motion compensation enhancement on the first matching block to obtain at least one second matching block.
  • S403 Determine motion information of the current block according to at least one second matching block.
  • the embodiment of the present application also needs to perform sub-pixel motion estimation.
  • the method may also include:
  • the target matching block at the sub-pixel position (which may be referred to as the "sub-pixel matching block") is the matching block with the smallest rate-distortion cost when motion estimation is performed at sub-pixel positions for the current block; that is to say, it is the matching block corresponding to the minimum rate-distortion cost value selected from the plurality of second matching blocks.
  • the determining the motion information of the current block according to at least one second matching block may include:
  • calculating a first rate-distortion cost and a second rate-distortion cost; if the first rate-distortion cost is greater than the second rate-distortion cost, determining that the current block uses motion compensation enhancement processing, and determining the motion information as the first motion information, where the first motion information is used to point to the sub-pixel position;
  • otherwise, if the first rate-distortion cost is less than or equal to the second rate-distortion cost, determining that the current block does not use motion compensation enhancement processing, and determining the motion information as the second motion information, where the second motion information is used to point to the integer pixel position.
  • in other words, whether the current block uses motion compensation enhancement processing is determined in the embodiment of the present application according to the calculated rate-distortion cost values; that is, the encoder finally selects the mode with the smallest rate-distortion cost for predictive encoding.
  • if it is determined that the current block uses motion compensation enhancement processing, the motion information is the first motion information, which points to the sub-pixel position (i.e., a "sub-pixel precision position"); in this case, motion compensation enhancement also needs to be performed in the decoder to interpolate the second matching block. Otherwise, if it is determined that the current block does not use motion compensation enhancement processing, the motion information is the second motion information, which points to the integer pixel position (i.e., an "integer pixel precision position"), and the decoder does not need to perform motion compensation enhancement.
  • the method may also include:
  • if the first rate-distortion cost is greater than the second rate-distortion cost, determining that the value of the first syntax element identification information is the first value;
  • if the first rate-distortion cost is less than or equal to the second rate-distortion cost, determining that the value of the first syntax element identification information is the second value.
  • the first value and the second value are different, and the first value and the second value may be in the form of a parameter or a form of a number.
  • the first syntax element identification information is a parameter written in the profile (profile), but the first syntax element identification information may also be a flag (flag), which is not limited here.
  • first syntax element identification information may also be set, where the first syntax element identification information is used to indicate whether the current block uses a motion compensation enhancement processing manner.
• Subsequently, in the decoder, whether the current block uses the motion compensation enhancement processing manner can be determined according to the value of the first syntax element identification information.
  • the first syntax element identification information is a flag
• In a specific example, the first value may be set to 1 and the second value to 0; in another specific example, the first value may be set to true and the second value to false; in yet another specific example, the first value may be set to 0 and the second value to 1; or the first value may be set to false and the second value to true.
  • the first value and the second value in the embodiment of the present application are not limited in any way.
  • the embodiment of the present application may also set second syntax element identification information, where the second syntax element identification information is used to indicate whether the current block uses the motion compensation enhancement method of the embodiment of the application.
• In some embodiments, the method may further include: if the second syntax element identification information indicates that the current block uses the motion compensation enhancement method of the embodiment of the present application, that is, the value of the second syntax element identification information is the first value, executing the process shown in FIG. 4; if the second syntax element identification information indicates that the current block does not use the motion compensation enhancement method of the embodiment of the present application, that is, the value of the second syntax element identification information is the second value, performing a motion compensation enhancement method of the related art, such as a DCTIF-based sub-pixel motion compensation method.
  • the method may further include: encoding the value of the identification information of the first syntax element, and writing the encoded bits into the code stream.
  • the decoder can directly determine whether the current block uses the motion compensation enhancement processing mode by analyzing the code stream, so as to facilitate the decoder to perform subsequent operations.
  • S404 Encode the current block according to the motion information.
  • the motion information may at least include: reference frame information and motion vectors.
  • the prediction block can be determined from the reference frame.
  • encoding the current block according to the motion information may include:
  • the residual block is encoded, and the encoded bits are written into the code stream.
  • the determining the residual block of the current block according to the current block and the first predicted block may include: performing a subtraction operation on the current block and the first predicted block to determine the residual block of the current block .
  • encoding the current block according to the motion information may include:
  • the residual block is encoded, and the encoded bits are written into the code stream.
• If the current block does not use the motion compensation enhancement processing manner, it means that the current block uses the integer pixel motion compensation method.
  • the determining the residual block of the current block according to the current block and the second predicted block may include: performing a subtraction operation on the current block and the second predicted block to determine The residual block of the current block.
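• A minimal sketch of this subtraction at the encoder, together with the matching addition performed later at the decoder; the 8-bit clipping range is an assumption:

```python
import numpy as np

def residual_block(current: np.ndarray, prediction: np.ndarray) -> np.ndarray:
    # Encoder side: residual block = current block - prediction block.
    return current.astype(np.int16) - prediction.astype(np.int16)

def reconstructed_block(residual: np.ndarray, prediction: np.ndarray) -> np.ndarray:
    # Decoder side: reconstructed block = residual + prediction,
    # clipped to the 8-bit sample range.
    rec = residual.astype(np.int16) + prediction.astype(np.int16)
    return np.clip(rec, 0, 255).astype(np.uint8)
```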
  • the method may further include: encoding the motion information, and writing the encoded bits into the code stream.
• In this way, the decoder can determine the motion information by parsing the code stream, and then determine the prediction block (the first prediction block or the second prediction block) of the current block according to the motion information, so that the decoder can perform subsequent operations.
  • the embodiment of the present application combines transposed convolution and residual network to propose a residual projection block. Then, based on the residual projection block, the embodiment of this application proposes a half-precision sub-pixel interpolation network RPNet, and applies it in VTM7.0.
  • An embodiment of the present application provides an encoding method, which is applied to an encoder.
• The method includes: determining the first matching block of the current block; performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining the motion information of the current block according to the at least one second matching block; and encoding the current block according to the motion information.
  • using the preset neural network model for motion compensation enhancement can not only reduce the computational complexity, but also save the code rate and improve the encoding and decoding efficiency under the premise of ensuring the same decoding quality.
  • the embodiment of the present application provides a code stream, where the code stream is generated by performing bit coding according to the information to be coded.
  • the information to be encoded includes at least the motion information of the current block, the residual block of the current block, and the value of the first syntax element identification information, and the first syntax element identification information is used to indicate whether the current block uses motion compensation enhancement processing.
  • the code stream may be transmitted from the encoder to the decoder, so that the decoder can perform subsequent operations conveniently.
  • FIG. 7 shows a schematic flowchart of a decoding method provided in an embodiment of the present application. As shown in Figure 7, the method may include:
  • S701 Parse the code stream, and determine a value of the first syntax element identification information.
• Here, the video image can be divided into multiple image blocks, each image block to be decoded can be called a decoding block, and the current block here specifically refers to the decoding block currently to be subjected to inter-frame prediction.
  • the current block may be a CTU, or even a CU, PU, etc., which is not limited in this embodiment of the present application.
  • the decoding method in the embodiment of the present application is mainly applied to the motion compensation part of the inter-frame prediction.
  • the motion compensation is to use the partial image in the decoded and reconstructed reference frame to predict and compensate the current partial image, which can reduce the redundant information of the moving image.
  • the parsing the code stream to determine the value of the first syntax element identification information may include:
• If the value of the first syntax element identification information is the first value, it is determined that the current block uses the motion compensation enhancement processing manner;
• if the value of the first syntax element identification information is the second value, it is determined that the current block does not use the motion compensation enhancement processing manner.
  • the identification information of the first syntax element is used to indicate whether the current block uses a motion compensation enhancement processing manner.
  • the first value and the second value are different, and the first value and the second value may be in the form of parameters or numbers.
  • the first syntax element identification information is a parameter written in the profile (profile), but the first syntax element identification information may also be a flag (flag), which is not limited here.
  • the first syntax element identification information is a flag
• In a specific example, the first value may be set to 1 and the second value to 0; in another specific example, the first value may be set to true and the second value to false; in yet another specific example, the first value may be set to 0 and the second value to 1; or the first value may be set to false and the second value to true.
  • the first value and the second value in the embodiment of the present application are not limited in any way.
  • the motion information obtained by decoding is the first motion information, which is used to point to the sub-pixel position.
  • motion compensation enhancement is also required in the decoder to interpolate the second matching block.
• Otherwise, the motion information obtained by decoding at this time is the second motion information, which is used to point to the integer pixel position; at this time, the decoder does not need to perform motion compensation enhancement.
  • S703 Determine a first matching block of the current block according to the first motion information, and perform motion compensation enhancement on the first matching block to obtain at least one second matching block.
  • the motion information may include reference frame information and motion vector (Motion Vector, MV) information.
• In some embodiments, whether to use sub-pixel motion compensation is determined by the MV precision obtained by parsing the code stream; for example, the MV precision identifies whether the MV has integer pixel precision or sub-pixel precision. If the MV is stored at sub-pixel precision, such as 1/4 pixel, and the lower 2 bits of each MV component are all 0, this indicates that the MV points to an integer-pixel-precision position; otherwise, it points to a sub-pixel-precision position.
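• A sketch of that low-bit test, assuming MV components are stored in quarter-pixel units:

```python
def mv_points_to_integer_pel(mvx: int, mvy: int) -> bool:
    # With quarter-pixel MV storage, the two lowest bits of each component
    # carry the fractional phase; both zero means the MV lands on an
    # integer-pixel-precision position.
    return (mvx & 0b11) == 0 and (mvy & 0b11) == 0
```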
  • the first matching block here may point to an integer-pixel precision position, or may be a sub-pixel precision position pointed to by a sub-pixel interpolation filtering method in the related art, which is not limited by this embodiment of the present application.
• Here, the decoder needs to perform sub-pixel motion compensation to interpolate the second matching block.
• Since the decoded reference frames in the decoder contain only integer pixel positions, the sub-pixel positions between the integer pixel positions need to be obtained by interpolation, which is realized by sub-pixel motion compensation.
  • the step of performing motion compensation enhancement on the first matching block may further include: performing motion compensation enhancement on the first matching block by using a preset neural network model.
  • performing motion compensation enhancement on the first matching block to obtain at least one second matching block may include:
  • the first filtering process is performed on the processing block to obtain at least one second matching block.
  • the resolution of the processing block is higher than the resolution of the current block.
• The processing block obtained after the super-resolution and quality enhancement processing thus has both high quality and high resolution.
  • the first matching block has the same resolution as the current block
  • the second matching block obtained after the first filtering process also has the same resolution as the current block.
  • the first filtering process may include: downsampling. That is to say, after the processing block is obtained, at least one second matching block can be obtained by down-sampling the processing block.
  • performing motion compensation enhancement on the first matching block using a preset neural network model to obtain at least one second matching block may include:
• In a possible implementation manner, the precision of the second matching block is half precision, and the number of second matching blocks is four; in another possible implementation manner, the precision of the second matching block is quarter precision, and the number of second matching blocks is 16; however, this embodiment of the present application does not impose any limitation on this.
  • the preset neural network model may be a convolutional neural network model.
• Here, the convolutional neural network model can be used to implement end-to-end super-resolution and quality enhancement for the first matching block, and the output high-resolution image is then down-sampled to generate four half-precision pixel samples (i.e., "second matching blocks").
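• For the half-precision case, the four second matching blocks can be read off the 2× super-resolved output as its four phase-shifted stride-2 downsamplings; a PyTorch sketch (the phase-to-block ordering is an assumption):

```python
import torch

def half_pel_blocks(hr_block: torch.Tensor) -> list:
    # hr_block: super-resolved processing block of shape (1, 1, 2H, 2W).
    # Each (dy, dx) phase of the stride-2 downsampling is one half-pel
    # second matching block of shape (1, 1, H, W).
    return [hr_block[..., dy::2, dx::2] for dy in (0, 1) for dx in (0, 1)]
```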
• In some embodiments, the preset neural network model may include a feature extraction module, a residual projection module group, a sampling module and a reconstruction module; wherein the feature extraction module, the residual projection module group, the sampling module and the reconstruction module are sequentially connected.
  • performing super-resolution and quality enhancement processing on the first matching block to obtain a processing block may include:
  • the feature extraction module can also be called a "shallow feature extraction module".
  • the feature extraction module may be the first convolutional layer.
  • performing shallow feature extraction on the first matching block through the feature extraction module to obtain the first feature information may include: performing a convolution operation on the first matching block through the first convolution layer to obtain the first feature information .
  • the shallow features here mainly refer to low-level simple features (such as edge features, etc.).
  • the size of the convolution kernel of the first convolution layer is K ⁇ L
  • the number of convolution kernels of the first convolution layer is an integer power of 2
  • K and L are positive integers greater than zero.
  • the size of the convolution kernel of the first convolution layer may be 3 ⁇ 3
  • the number of convolution kernels of the first convolution layer is 64, but there is no limitation here.
• In some embodiments, the residual projection module group may include N residual projection blocks, a second convolutional layer, and a first connection layer; where N is an integer greater than or equal to 1.
• The N residual projection blocks, the second convolutional layer and the first connection layer are sequentially connected, and the first connection layer is also connected with the input of the first residual projection block among the N residual projection blocks.
  • performing residual feature learning on the first feature information through the residual projection module group to obtain the second feature information includes:
  • the first feature information and the second intermediate feature information are added through the first connection layer to obtain the second feature information.
  • the size of the convolution kernel of the second convolution layer is K ⁇ L
  • the number of convolution kernels of the second convolution layer is an integer power of 2
  • K and L are positive integers greater than zero.
  • the size of the convolution kernel of the second convolution layer is 3 ⁇ 3
  • the number of convolution kernels of the second convolution layer is 64, but there is no limitation here.
  • the N residual projection blocks are a cascade structure
  • the input of the cascade structure is the first feature information
  • the output of the cascade structure is the second intermediate feature information.
  • performing residual feature learning on the first feature information through N residual projection blocks to obtain the first intermediate feature information may include:
• When N is equal to 1, input the first feature information to the first residual projection block, obtain the output information of the first residual projection block, and determine the output information of the first residual projection block as the first intermediate feature information;
• when N is greater than 1, after obtaining the output information of the first residual projection block, input the output information of the d-th residual projection block to the (d+1)-th residual projection block to obtain the output information of the (d+1)-th residual projection block, and add 1 to d until the output information of the N-th residual projection block is obtained; the output information of the N-th residual projection block is determined as the first intermediate feature information, where d is an integer greater than or equal to 1 and less than N.
• That is, if N is equal to 1, the output information of the single residual projection block is the first intermediate feature information; if N is greater than 1, that is, there are two or more residual projection blocks in the residual projection module group, the output of the previous residual projection block is the input of the next residual projection block, and so on through the cascade until the output information of the last residual projection block is obtained; at this time the output information of the last residual projection block is the first intermediate feature information.
  • the residual projection block may include an upper projection module, M residual modules, a local feature fusion module, a lower projection module and a second connection layer; wherein, M is an integer greater than or equal to 1.
• The upper projection module, the M residual modules, the local feature fusion module, the lower projection module and the second connection layer are sequentially connected, the second connection layer is also connected to the input of the upper projection module, and the outputs of the M residual modules are also respectively connected with the local feature fusion module.
  • the method may also include:
  • the input information and the filtered feature information are added through the second connection layer to obtain the output information of the residual projection block.
  • the up-projection module may include a transposed convolutional layer.
  • performing the third filtering process on the input information of the residual projection block through the up-projection module to obtain the first high-resolution feature information may include:
  • the third filtering process is performed on the input information of the residual projection block by transposing the convolution layer to obtain the first high-resolution feature information.
  • the resolution of the first high-resolution feature information obtained after the third filtering process is higher than the resolution of the input information of the residual projection block.
  • the third filtering process may include: upsampling. That is, the input information of the residual projection block is upsampled by transposing the convolutional layer to obtain the first high-resolution feature information.
• In some embodiments, the local feature fusion module may include a feature fusion layer and a third convolutional layer, and performing the fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain the third high-resolution feature information includes:
  • a third convolutional layer is used to perform a convolution operation on the fused feature information to obtain third high-resolution feature information.
  • the size of the convolution kernel of the third convolution layer is K ⁇ L
  • the number of convolution kernels of the third convolution layer is an integer power of 2
  • K and L are positive integers greater than zero.
  • the size of the convolution kernel of the third convolution layer is 1 ⁇ 1
  • the number of convolution kernels of the third convolution layer is 64, but there is no limitation here.
• Here, the M pieces of second high-resolution feature information are fused through the feature fusion layer; in addition, in order to give full play to the learning ability of the residual network, a 1×1 convolutional layer is also introduced, so that the fusion of the feature information learned by the residual modules can adaptively control the learned feature information.
• In some embodiments, the lower projection module may include a fourth convolutional layer, and performing the fourth filtering process on the third high-resolution feature information through the lower projection module to obtain the filtered feature information includes:
  • a fourth filtering process is performed on the third high-resolution feature information through the fourth convolution layer to obtain filtered feature information.
  • the resolution of the filtered feature information obtained after the fourth filtering process is lower than the resolution of the third high-resolution feature information.
  • the filtered feature information obtained after the fourth filtering process has the same resolution as the input information of the residual projection block.
  • the fourth filtering process may include: downsampling. That is to say, the third high-resolution feature information is down-sampled through the fourth convolutional layer to obtain filtered feature information.
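• Gathering the pieces above, a PyTorch sketch of one residual projection block for the half-precision case (6×6 transposed convolution/convolution with stride and padding 2, per the parameter settings given later); the internals of each residual module are not spelled out in the text, so a plain conv-ReLU-conv block with an identity skip is assumed:

```python
import torch
import torch.nn as nn

class ResidualModule(nn.Module):
    # Assumed residual module: conv-ReLU-conv with an identity skip.
    def __init__(self, ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class ResidualProjectionBlock(nn.Module):
    def __init__(self, ch: int = 64, m: int = 3):
        super().__init__()
        # Up-projection: transposed conv, doubles the spatial resolution.
        self.up = nn.ConvTranspose2d(ch, ch, 6, stride=2, padding=2)
        self.res_modules = nn.ModuleList(ResidualModule(ch) for _ in range(m))
        # Local feature fusion: concatenate the M outputs, then a 1x1 conv.
        self.fuse = nn.Conv2d(m * ch, ch, 1)
        # Down-projection: conv, back to the input resolution.
        self.down = nn.Conv2d(ch, ch, 6, stride=2, padding=2)

    def forward(self, x):
        h = self.up(x)                # project into high-resolution space
        feats, cur = [], h
        for rm in self.res_modules:   # HR features of different levels
            cur = rm(cur)
            feats.append(cur)
        fused = self.fuse(torch.cat(feats, dim=1))
        return x + self.down(fused)   # second connection layer: add input
```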
  • the sampling module of the preset neural network module may include a sub-pixel convolution layer.
  • the second filtering process is performed on the second feature information through the sampling module to obtain the third feature information, including:
  • the second filtering process is performed on the second feature information through the sub-pixel convolution layer to obtain the third feature information.
  • the resolution of the third feature information obtained after the second filtering process is higher than the resolution of the second feature information.
  • the second filtering process may include: upsampling. That is to say, the sampling module is mainly used for upsampling the second characteristic information, so the sampling module may also be called an "upsampling module”.
  • the sampling module may use a sub-pixel convolution layer, or may add a sub-pixel convolution layer.
• Here, the sub-pixel convolutional layer can also be a PixShuffle module (also called a PixelShuffle module), which realizes the function of taking a low-resolution H×W image as input and transforming it into an rH×rW high-resolution image.
• The implementation does not directly generate this high-resolution image through interpolation; instead, it first obtains a feature map of r² channels through convolution (the size of the feature map is consistent with that of the input low-resolution image), and then obtains the high-resolution image through periodic shuffling; wherein r is the magnification of the image.
• Specifically, the r² channels of each pixel are rearranged into an r×r area, corresponding to a sub-block of size r×r in the high-resolution image, so that the feature image of size r²×H×W is rearranged into a high-resolution image of size 1×rH×rW.
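• PyTorch exposes exactly this rearrangement as nn.PixelShuffle; a small demonstration with r = 2:

```python
import torch
import torch.nn as nn

r = 2                                                  # magnification factor
to_subpixels = nn.Conv2d(64, 1 * r * r, 3, padding=1)  # r^2-channel feature map
shuffle = nn.PixelShuffle(r)                           # periodic shuffling

x = torch.randn(1, 64, 48, 48)    # low-resolution feature map (H x W)
y = shuffle(to_subpixels(x))      # shape (1, 1, rH, rW) == (1, 1, 96, 96)
print(y.shape)
```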
  • the reconstruction module may include a fifth convolutional layer.
  • the super-resolution reconstruction of the third feature information by the reconstruction module to obtain the processing block may include:
  • a convolution operation is performed on the third feature information through the fifth convolution layer to obtain a processing block.
  • the size of the convolution kernel of the fifth convolution layer is K ⁇ L
  • the number of convolution kernels of the fifth convolution layer is an integer power of 2
  • K and L are positive integers greater than zero.
  • the size of the convolution kernel of the fifth convolution layer is 3 ⁇ 3
  • the number of convolution kernels of the fifth convolution layer is 1, but there is no limitation here.
  • FIG. 5 shows an example of a network structure of a preset neural network model provided in an embodiment of the present application
  • FIG. 6 shows an example of a network structure of a residual projection block provided in an embodiment of the present application. That is to say, the embodiment of the present application combines the transposed convolution and the residual module to propose the residual projection block RPB.
• The basic idea is to use the transposed convolution layer to project low-resolution features into the high-resolution feature space, then use the residual modules to learn high-resolution features of different levels, then improve the expressive ability of the residual modules through local feature fusion, and finally use the convolutional layer to project the high-resolution features back to the low-resolution feature space.
  • the embodiment of this application proposes a preset neural network model RPNet with half-precision sub-pixel interpolation, and embeds the trained model into the coding platform VTM7.0.
• In the embodiment of the present application, RPNet may be selected for motion compensation enhancement only for PUs with a size greater than or equal to 64×64; for PUs with a size smaller than 64×64, motion compensation enhancement is still performed with the interpolation filter of the related art.
  • the method in the embodiment of the present application can realize sub-pixel motion compensation with half precision, and can also realize sub-pixel motion compensation with quarter precision.
• It can even realize sub-pixel motion compensation with other precisions, which is not limited in this embodiment of the present application.
• When the precision of the sub-pixel sample value is half precision, the convolution kernel sizes of the transposed convolution layer and the fourth convolution layer are both 6×6, and the stride and padding are both set to 2; or, when the precision of the sub-pixel sample value is quarter precision, the convolution kernel sizes of the transposed convolution layer and the fourth convolution layer are both 8×8, and the stride and padding are set to 4 and 2, respectively.
  • the number N of residual projection blocks in RPNet can be set to 10
  • the number M of residual modules in each residual projection block can be Set to 3.
  • the number of convolution kernels of the convolution layer in the reconstruction network layer is set to 1
  • the number of convolution kernels of other transposed convolution layers or convolution layers in the network model is set to 64.
  • the size of the convolution kernel in the transposed convolutional layer in the upper projection module and the convolutional layer in the lower projection module is set to 6 ⁇ 6, and the stride and padding are set to 2.
  • other convolutional layers in the network model use convolutional kernels with a size of 3 ⁇ 3, and the upsampling module can use sub-pixel convolutional layers.
  • the RPNet in the embodiment of the present application can also be used for PUs of all sizes to perform half-precision sub-pixel interpolation.
  • the RPNet in the embodiment of the present application can also adjust the number of residual projection blocks and the number of residual modules in the residual projection block. Even the RPNet in the embodiment of the present application can also be used for quarter-precision sub-pixel motion compensation.
• In that case, the convolution kernel size of the transposed convolution layer in the upper projection module and of the convolution layer in the lower projection module is set to 8×8, the stride and padding are set to 4 and 2 respectively, and a sub-pixel convolutional layer is added in the upsampling module.
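• Assembling the stated hyperparameters (N = 10 residual projection blocks with M = 3 residual modules each, 64-kernel 3×3 convolutions elsewhere, and a single-kernel reconstruction layer) gives the following sketch of the half-precision RPNet; it reuses the ResidualProjectionBlock sketched earlier, and the single-channel (luminance) input and the channel layout of the conv + PixelShuffle sampling module are assumptions:

```python
import torch
import torch.nn as nn

class RPNet(nn.Module):
    def __init__(self, n: int = 10, m: int = 3, ch: int = 64, r: int = 2):
        super().__init__()
        self.head = nn.Conv2d(1, ch, 3, padding=1)        # shallow feature extraction
        self.rpbs = nn.Sequential(                        # N cascaded RPBs
            *[ResidualProjectionBlock(ch, m) for _ in range(n)])
        self.body_tail = nn.Conv2d(ch, ch, 3, padding=1)  # second convolutional layer
        self.upsample = nn.Sequential(                    # sampling module
            nn.Conv2d(ch, ch * r * r, 3, padding=1),
            nn.PixelShuffle(r))
        self.recon = nn.Conv2d(ch, 1, 3, padding=1)       # reconstruction, 1 kernel

    def forward(self, x):
        f1 = self.head(x)                        # first feature information
        f2 = f1 + self.body_tail(self.rpbs(f1))  # first connection layer (global skip)
        f3 = self.upsample(f2)                   # third feature information (2x)
        return self.recon(f3)                    # high-resolution processing block

# Example: a 48x48 luma block maps to a 96x96 processing block.
# print(RPNet()(torch.randn(1, 1, 48, 48)).shape)  # torch.Size([1, 1, 96, 96])
```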
  • the method may also include:
  • determining a training data set comprising at least one training image and at least one verification image
• Based on the true value area, the neural network model is trained by using the at least one set of input image groups to obtain at least one set of candidate model parameters; wherein the true value area is used to determine the loss value (Loss) of the loss function of the neural network model, and the at least one set of candidate model parameters is obtained when the loss value of the loss function converges to a preset threshold.
  • the embodiment of the present application may choose a public data set DIV2K, which contains 800 training images and 100 verification images.
  • the preprocessing of the DIV2K dataset mainly includes two steps of format conversion and encoding reconstruction. First, format conversion is performed on 800 high-resolution images in the training set, 100 high-resolution images in the test set, and their corresponding low-resolution images, from the original PNG format to YUV420 format. Then, extract the luminance component from the high-resolution image data in YUV420 format and save it in PNG format as the ground truth area.
• Next, VTM7.0 is used for all-intra encoding, with the quantization parameters (Quantization Parameter, QP) set to 22, 27, 32 and 37 respectively; the luminance component is then extracted from each of the four sets of decoded and reconstructed data and saved in PNG format as the input of the neural network model. Thus, four training data sets can be obtained.
  • the embodiment of the present application selects peak signal-to-noise ratio (Peak Signal-to-Noise Ratio, PSNR) as the evaluation standard of image reconstruction quality.
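• For reference, a minimal PSNR computation matching this evaluation standard for 8-bit images:

```python
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, peak: float = 255.0) -> float:
    # PSNR = 10 * log10(peak^2 / MSE)
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak * peak / mse)
```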
  • the model is trained based on the Pytorch platform.
  • a low-resolution image of size 48 ⁇ 48 is taken as input, and the batch is set to 16.
  • the mean absolute error can be selected as the loss function
  • the adaptive moment estimation can be used as the optimization function
  • the momentum and weight decay can be set to 0.9 and 0.0001, respectively.
• The initial learning rate is set to 0.0001 and is reduced by a factor of 0.1 every 100 epochs, for a total of 300 epochs.
• By training with the data set corresponding to each QP, four sets of model parameters are obtained. These four sets of model parameters correspond to four models, denoted RPNet_qp22, RPNet_qp27, RPNet_qp32 and RPNet_qp37 respectively.
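• A training-loop sketch matching the stated recipe (L1 loss, Adam with beta1 = 0.9 standing in for the quoted momentum, weight decay 1e-4, initial learning rate 1e-4 decayed by 0.1 every 100 epochs, 300 epochs in total); the data loader is assumed to yield paired 48×48 reconstructed/ground-truth luma patches in batches of 16, and one model is trained per QP:

```python
import torch
import torch.nn as nn

def train_rpnet(model: nn.Module, loader, epochs: int = 300,
                device: str = "cuda") -> nn.Module:
    model = model.to(device)
    criterion = nn.L1Loss()                   # mean absolute error
    optim = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.StepLR(optim, step_size=100, gamma=0.1)
    for _ in range(epochs):
        for lr_patch, hr_patch in loader:     # 48x48 luma patches, batch 16
            optim.zero_grad()
            loss = criterion(model(lr_patch.to(device)), hr_patch.to(device))
            loss.backward()
            optim.step()
        sched.step()                          # x0.1 every 100 epochs
    return model
```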
  • the determination of the preset neural network model can be realized in the following two ways.
  • the method may further include:
  • a preset neural network model is determined.
  • the input image groups correspond to different quantization parameters, and there are correspondences between the multiple groups of candidate model parameters and different quantization parameters.
  • the method may also include:
  • the preset neural network model is determined according to the model parameters.
  • the decoder can determine the trained model parameters corresponding to the quantization parameters of the current block according to the quantization parameters of the current block, and then determine the preset neural network model used in the embodiment of the present application; or , the decoder can also obtain model parameters by parsing the code stream, and then determine the preset neural network model used in the embodiment of the present application according to the model parameters; the embodiment of the present application does not specifically limit this.
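• One way this lookup might work in practice (the nearest-QP fallback for QPs other than the four trained ones is an assumption):

```python
TRAINED_QPS = (22, 27, 32, 37)   # one set of RPNet parameters per QP

def select_model_params(qp: int, params_by_qp: dict):
    # Pick the trained parameter set whose QP is closest to the current
    # block's QP, e.g. params_by_qp[27] holds the RPNet_qp27 weights.
    nearest = min(TRAINED_QPS, key=lambda q: abs(q - qp))
    return params_by_qp[nearest]
```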
• If the encoder and decoder use the same preset neural network model with fixed parameters, the parameters have already been solidified, so there is no need to transmit model parameters; alternatively, if the code stream transmits the access information of a public training data set, such as a Uniform Resource Locator (URL), the decoder can be trained in the same way as the encoder; as a further alternative, for the encoder, the encoded video sequence can also be used for learning.
• In this way, the preset neural network model can be used to perform motion compensation enhancement on the first matching block to obtain at least one second matching block.
  • S704 Determine a first prediction block of the current block according to the first motion information and at least one second matching block.
  • S705 Determine a reconstructed block of the current block according to the first predicted block.
  • the decoder also needs to decode to obtain the residual block of the current block.
  • the method may further include: parsing the code stream to obtain a residual block of the current block.
  • the determining the reconstruction block of the current block according to the first prediction block may include: determining the reconstruction block of the current block according to the residual block and the first prediction block.
  • the determining the reconstructed block of the current block according to the residual block and the first predicted block may include: performing an addition operation on the residual block and the first predicted block to determine the reconstructed block of the current block .
  • the first syntax element identification information may also indicate that the current block does not use the motion compensation enhancement processing method, that is, the current block uses the integer pixel motion compensation method.
  • the method may also include:
• If the first syntax element identification information indicates that the current block does not use the motion compensation enhancement processing manner, parse the code stream to obtain the second motion information of the current block, where the second motion information is used to point to the integer pixel position;
  • a reconstructed block of the current block is determined.
  • the second prediction block of the current block can be determined according to the second motion information obtained through decoding.
  • the decoder still needs to decode to obtain the residual block of the current block.
  • the method may further include: parsing the code stream to obtain a residual block of the current block.
  • the determining the reconstruction block of the current block according to the second prediction block may include: determining the reconstruction block of the current block according to the residual block and the second prediction block.
  • the determining the reconstructed block of the current block according to the residual block and the second predicted block may include: performing an addition operation on the residual block and the second predicted block to determine the reconstructed block of the current block .
  • the embodiment of this application proposes a half-precision sub-pixel interpolation network RPNet, and applies it in VTM7.0.
• In short, the embodiment of the present application provides a decoding method, which is applied to a decoder: determine the value of the first syntax element identification information by parsing the code stream; if the first syntax element identification information indicates that the current block uses the motion compensation enhancement processing manner, parse the code stream to determine the first motion information of the current block; determine the first matching block of the current block according to the first motion information, and perform motion compensation enhancement on the first matching block to obtain at least one second matching block; determine the first prediction block of the current block according to the first motion information and the at least one second matching block; and determine the reconstructed block of the current block according to the first prediction block.
• In this way, since the preset neural network model is used to perform motion compensation enhancement, not only can the computational complexity be reduced, but also the code rate can be saved, and the encoding and decoding efficiency can be improved.
  • FIG. 8 shows a schematic structural diagram of an encoder 80 provided in the embodiment of the present application.
  • the encoder 80 may include: a first determination unit 801, a first motion compensation unit 802, and an encoding unit 803; wherein,
  • the first determining unit 801 is configured to determine the first matching block of the current block
  • the first motion compensation unit 802 is configured to perform motion compensation enhancement on the first matching block to obtain at least one second matching block;
  • the first determining unit 801 is further configured to determine the motion information of the current block according to at least one second matching block;
  • the encoding unit 803 is configured to encode the current block according to the motion information.
  • the first motion compensation unit 802 is specifically configured to perform super-resolution and quality enhancement processing on the first matching block to obtain a processing block; and perform first filtering processing on the processing block to obtain at least one second matching block, wherein the second matching block obtained after the first filtering process has the same resolution as the current block.
  • the first filtering process includes: downsampling.
  • the first motion compensation unit 802 is further configured to use a preset neural network model to perform motion compensation enhancement on the first matching block; wherein the preset neural network model includes a feature extraction module, a residual projection module group, A sampling module and a reconstruction module, and the feature extraction module, the residual projection module group, the sampling module and the reconstruction module are sequentially connected;
  • the first motion compensation unit 802 is further configured to perform shallow feature extraction on the first matching block through the feature extraction module to obtain the first feature information; and perform residual feature extraction on the first feature information through the residual projection module group learning to obtain second feature information; and performing second filtering processing on the second feature information through the sampling module to obtain third feature information; and performing super-resolution reconstruction on the third feature information through the reconstruction module to obtain processing blocks.
  • the feature extraction module includes a first convolutional layer; correspondingly, the first motion compensation unit 802 is also configured to perform a convolution operation on the first matching block through the first convolutional layer to obtain the first feature information .
  • the residual projection module set includes N residual projection blocks, a second convolutional layer and a first connection layer, and the N residual projection blocks, the second convolutional layer and the first connection layer are sequentially connected , and the first connection layer is also connected to the input of the first residual projection block in the N residual projection blocks;
  • the first motion compensation unit 802 is further configured to perform residual feature learning on the first feature information through N residual projection blocks to obtain first intermediate feature information, where N is an integer greater than or equal to 1; and Performing a convolution operation on the first intermediate feature information through the second convolution layer to obtain the second intermediate feature information; and performing an addition calculation on the first feature information and the second intermediate feature information through the first connection layer to obtain the second feature information .
  • the N residual projection blocks are a cascade structure
  • the input of the cascade structure is the first feature information
  • the output of the cascade structure is the second intermediate feature information
• In some embodiments, the first motion compensation unit 802 is further configured to input the first feature information to the first residual projection block when N is equal to 1, obtain the output information of the first residual projection block, and determine the output information of the first residual projection block as the first intermediate feature information; and, when N is greater than 1, after obtaining the output information of the first residual projection block, input the output information of the d-th residual projection block to the (d+1)-th residual projection block to obtain the output information of the (d+1)-th residual projection block, and add 1 to d until the output information of the N-th residual projection block is obtained, and determine the output information of the N-th residual projection block as the first intermediate feature information; wherein d is an integer greater than or equal to 1 and less than N.
• In some embodiments, the residual projection block includes an upper projection module, M residual modules, a local feature fusion module, a lower projection module and a second connection layer; the upper projection module, the M residual modules, the local feature fusion module, the lower projection module and the second connection layer are sequentially connected, the second connection layer is also connected to the input of the upper projection module, and the outputs of the M residual modules are also respectively connected to the local feature fusion module;
• correspondingly, the first motion compensation unit 802 is further configured to perform the third filtering process on the input information of the residual projection block through the upper projection module to obtain the first high-resolution feature information; and perform different levels of high-resolution feature learning on the first high-resolution feature information through the M residual modules to obtain M pieces of second high-resolution feature information, where M is an integer greater than or equal to 1; and perform the fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain the third high-resolution feature information; and perform the fourth filtering process on the third high-resolution feature information through the lower projection module to obtain the filtered feature information; and perform an addition calculation on the input information and the filtered feature information through the second connection layer to obtain the output information of the residual projection block.
  • the upper projection module includes a transposed convolutional layer; correspondingly, the first motion compensation unit 802 is also configured to perform a third filtering process on the input information of the residual projection block through the transposed convolutional layer, to obtain The first high-resolution feature information, wherein the resolution of the first high-resolution feature information obtained after the third filtering process is higher than the resolution of the input information of the residual projection block.
  • the third filtering process includes: upsampling.
  • the local feature fusion module includes a feature fusion layer and a third convolutional layer; correspondingly, the first motion compensation unit 802 is also configured to fuse the M second high-resolution feature information through the feature fusion layer operation to obtain fusion feature information; and performing a convolution operation on the fusion feature information through a third convolution layer to obtain third high-resolution feature information.
  • the lower projection module includes a fourth convolutional layer; correspondingly, the first motion compensation unit 802 is also configured to perform fourth filtering processing on the third high-resolution feature information through the fourth convolutional layer, to obtain The filtered feature information, wherein the resolution of the filtered feature information obtained after the fourth filtering process is lower than the resolution of the third high-resolution feature information.
  • the fourth filtering process includes: downsampling.
  • the sampling module includes a sub-pixel convolution layer; correspondingly, the first motion compensation unit 802 is further configured to perform a second filtering process on the second feature information through the sub-pixel convolution layer to obtain the third feature information , wherein the resolution of the third feature information obtained after the second filtering process is higher than the resolution of the second feature information.
  • the second filtering process includes: upsampling.
  • the reconstruction module includes a fifth convolutional layer; correspondingly, the first motion compensation unit 802 is further configured to perform a convolution operation on the third feature information through the fifth convolutional layer to obtain a processing block.
• In some embodiments, the encoder 80 may further include a first training unit 804 configured to determine a training data set, where the training data set includes at least one training image; perform preprocessing on the training data set to obtain the true value area of the preset neural network model and at least one set of input image groups, where each input image group includes at least one input image; and, based on the true value area, use the at least one set of input image groups to train the neural network model to obtain at least one set of candidate model parameters; wherein the true value area is used to determine the loss value of the loss function of the neural network model, and the at least one set of candidate model parameters is obtained when the loss value of the loss function converges to a preset threshold.
• In some embodiments, the first determination unit 801 is further configured to determine the quantization parameter of the current block; determine the model parameters corresponding to the quantization parameter from the at least one set of candidate model parameters according to the quantization parameter; and determine the preset neural network model according to the model parameters; wherein, when the at least one set is multiple sets, the input image groups correspond to different quantization parameters, and there is a correspondence between the multiple sets of candidate model parameters and the different quantization parameters.
  • the encoding unit 803 is further configured to encode the model parameters, and write the encoded bits into the code stream.
• In some embodiments, the encoder 80 may further include a motion estimation unit 805 configured to perform integer pixel motion estimation on the current block and determine the first matching block of the current block; wherein the first matching block is the matching block with the smallest rate-distortion cost when motion estimation is performed for the current block at integer pixel positions;
• Correspondingly, the first motion compensation unit 802 is further configured to perform sub-pixel motion compensation on the first matching block by using the preset neural network model to obtain at least one second matching block.
• In some embodiments, the motion estimation unit 805 is further configured to perform sub-pixel motion estimation on the current block according to the at least one second matching block, and determine the sub-pixel matching block of the current block, where the sub-pixel matching block is the matching block with the smallest rate-distortion cost when motion estimation is performed for the current block at sub-pixel positions;
• Correspondingly, the first determining unit 801 is further configured to use the first matching block to perform precoding processing on the current block to determine a first rate-distortion cost value; and use the sub-pixel matching block to perform precoding processing on the current block to determine a second rate-distortion cost value; and, if the first rate-distortion cost value is greater than the second rate-distortion cost value, determine that the current block uses the motion compensation enhancement processing manner and that the motion information is the first motion information, which is used to point to the sub-pixel position; or, if the first rate-distortion cost value is less than or equal to the second rate-distortion cost value, determine that the current block does not use the motion compensation enhancement processing manner and that the motion information is the second motion information, which is used to point to the integer pixel position.
  • the first determining unit 801 is further configured to determine that the value of the first syntax element identification information is the first value if the first rate-distortion cost is greater than the second rate-distortion cost; or, if the first If the rate-distortion cost value is less than or equal to the second rate-distortion cost value, it is determined that the value of the first syntax element identification information is the second value; wherein, the first syntax element identification information is used to indicate whether the current block uses motion compensation enhancement processing Way.
  • the encoding unit 803 is further configured to encode the value of the first syntax element identification information, and write the encoded bits into the code stream.
• In some embodiments, the encoding unit 803 is further configured to, when the current block uses the motion compensation enhancement processing manner, determine the first prediction block of the current block according to the first motion information and the sub-pixel matching block; determine the residual block of the current block according to the current block and the first prediction block; and encode the residual block and write the encoded bits into the code stream;
  • the encoding unit 803 is further configured to determine a second prediction block of the current block according to the second motion information and the first matching block when the current block does not use the motion compensation enhancement processing mode; and determine the second prediction block of the current block according to the current block and the second prediction block , determine the residual block of the current block; and encode the residual block, and write the coded bits into the code stream.
  • the encoding unit 803 is further configured to encode the motion information, and write the encoded bits into the code stream.
  • a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a module, or it may be non-modular.
  • each component in this embodiment may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules.
  • the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
• The technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
• The aforementioned storage medium includes: a USB flash drive, a mobile hard disk, a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, and other media that can store program code.
  • the embodiment of the present application provides a computer storage medium, which is applied to the encoder 80, and the computer storage medium stores a computer program, and when the computer program is executed by the first processor, it implements any one of the preceding embodiments. Methods.
  • FIG. 9 shows a schematic diagram of a specific hardware structure of the encoder 80 provided by the embodiment of the present application.
  • it may include: a first communication interface 901 , a first memory 902 and a first processor 903 ; each component is coupled together through a first bus system 904 .
  • the first bus system 904 is used to realize connection and communication between these components.
  • the first bus system 904 also includes a power bus, a control bus and a status signal bus.
  • the various buses are labeled as the first bus system 904 in FIG. 9 . in,
  • the first communication interface 901 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
  • the first memory 902 is used to store computer programs that can run on the first processor 903;
  • the first processor 903 is configured to, when running the computer program, execute:
• determining the first matching block of the current block; performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining the motion information of the current block according to the at least one second matching block; and encoding the current block according to the motion information.
  • the first memory 902 in this embodiment of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
  • the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electronically programmable Erase Programmable Read-Only Memory (Electrically EPROM, EEPROM) or Flash.
  • the volatile memory can be Random Access Memory (RAM), which acts as external cache memory.
• By way of example but not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
  • the first memory 902 of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
  • the first processor 903 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method may be implemented by an integrated logic circuit of hardware in the first processor 903 or an instruction in the form of software.
• The above-mentioned first processor 903 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
  • Various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, register.
  • the storage medium is located in the first memory 902, and the first processor 903 reads the information in the first memory 902, and completes the steps of the above method in combination with its hardware.
  • the embodiments described in this application may be implemented by hardware, software, firmware, middleware, microcode or a combination thereof.
  • the processing unit may be implemented in one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field-Programmable Gate Arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application, or a combination thereof.
  • the techniques described herein can be implemented through modules (e.g., procedures, functions, and so on) that perform the functions described herein.
  • Software codes can be stored in memory and executed by a processor. Memory can be implemented within the processor or external to the processor.
  • the first processor 903 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • This embodiment provides an encoder, which may include a first determination unit, a first motion compensation unit, and an encoding unit. In this way, on the premise of ensuring the same decoding quality, the computational complexity can be reduced and the bit rate can be saved, thereby improving the coding and decoding efficiency.
  • FIG. 10 shows a schematic diagram of the composition and structure of a decoder 100 provided in the embodiment of the present application.
  • the decoder 100 may include: an analysis unit 1001, a second determination unit 1002, and a second motion compensation unit 1003; wherein,
  • the parsing unit 1001 is configured to parse the code stream and determine the value of the first syntax element identification information;
  • the parsing unit 1001 is further configured to parse the code stream and determine the first motion information of the current block if the first syntax element identification information indicates that the current block uses motion compensation enhancement processing;
  • the second motion compensation unit 1003 is configured to determine a first matching block of the current block according to the first motion information, and perform motion compensation enhancement on the first matching block to obtain at least one second matching block;
  • the second determining unit 1002 is configured to determine a first prediction block of the current block according to the first motion information and at least one second matching block; and determine a reconstruction block of the current block according to the first prediction block.
  • the parsing unit 1001 is further configured to parse the code stream to obtain the residual block of the current block;
  • the second determining unit 1002 is further configured to determine the reconstructed block of the current block according to the residual block and the first prediction block.
  • the parsing unit 1001 is further configured to parse the code stream to obtain the second motion information of the current block if the first syntax element identification information indicates that the current block does not use the motion compensation enhancement processing manner, where the second motion information points to an integer-pixel position;
  • the second determining unit 1002 is further configured to determine a second prediction block of the current block according to the second motion information of the current block; and determine a reconstruction block of the current block according to the second prediction block.
  • the parsing unit 1001 is further configured to parse the code stream to obtain the residual block of the current block;
  • the second determining unit 1002 is further configured to determine the reconstructed block of the current block according to the residual block and the second prediction block.
  • the second determining unit 1002 is further configured to determine that the current block uses the motion compensation enhancement processing manner if the value of the first syntax element identification information is the first value; or determine that the current block does not use the motion compensation enhancement processing manner if the value of the first syntax element identification information is the second value.
  • the second motion compensation unit 1003 is specifically configured to perform super-resolution and quality enhancement processing on the first matching block to obtain a processed block, where the resolution of the processed block is higher than the resolution of the current block; and perform first filtering processing on the processed block to obtain at least one second matching block, where the second matching block obtained after the first filtering processing has the same resolution as the current block.
  • the first filtering process includes: downsampling.
  • the second motion compensation unit 1003 is further configured to use a preset neural network model to perform motion compensation enhancement on the first matching block; where the preset neural network model includes a feature extraction module, a residual projection module group, a sampling module, and a reconstruction module, and the feature extraction module, the residual projection module group, the sampling module, and the reconstruction module are sequentially connected;
  • the second motion compensation unit 1003 is further configured to perform shallow feature extraction on the first matching block through the feature extraction module to obtain first feature information; perform residual feature learning on the first feature information through the residual projection module group to obtain second feature information; perform second filtering processing on the second feature information through the sampling module to obtain third feature information; and perform super-resolution reconstruction on the third feature information through the reconstruction module to obtain the processed block.
  • the feature extraction module is a first convolutional layer; correspondingly, the second motion compensation unit 1003 is further configured to perform a convolution operation on the first matching block through the first convolutional layer to obtain the first feature information.
  • the residual projection module group includes N residual projection blocks, a second convolutional layer, and a first connection layer; the N residual projection blocks, the second convolutional layer, and the first connection layer are sequentially connected, and the first connection layer is also connected to the input of the first residual projection block among the N residual projection blocks;
  • the second motion compensation unit 1003 is further configured to perform residual feature learning on the first feature information through the N residual projection blocks to obtain first intermediate feature information, where N is an integer greater than or equal to 1; perform a convolution operation on the first intermediate feature information through the second convolutional layer to obtain second intermediate feature information; and perform an addition calculation on the first feature information and the second intermediate feature information through the first connection layer to obtain the second feature information.
  • the N residual projection blocks form a cascade structure, where the input of the cascade structure is the first feature information and the output of the cascade structure is the second intermediate feature information.
  • the second motion compensation unit 1003 is further configured to: when N is equal to 1, input the first feature information to the first residual projection block to obtain the output information of the first residual projection block, and determine the output information of the first residual projection block as the first intermediate feature information; and when N is greater than 1, after the output information of the first residual projection block is obtained, input the output information of the d-th residual projection block to the (d+1)-th residual projection block to obtain the output information of the (d+1)-th residual projection block, and increment d by 1 until the output information of the N-th residual projection block is obtained, and determine the output information of the N-th residual projection block as the first intermediate feature information; where d is an integer greater than or equal to 1 and less than N.
  • the residual projection block includes an up-projection module, M residual modules, a local feature fusion module, a down-projection module, and a second connection layer; the up-projection module, the M residual modules, the local feature fusion module, the down-projection module, and the second connection layer are sequentially connected, the second connection layer is also connected to the input of the up-projection module, and the outputs of the M residual modules are also respectively connected to the local feature fusion module;
  • the second motion compensation unit 1003 is further configured to perform third filtering processing on the input information of the residual projection block through the up-projection module to obtain first high-resolution feature information; perform different levels of high-resolution feature learning on the first high-resolution feature information through the M residual modules to obtain M pieces of second high-resolution feature information, where M is an integer greater than or equal to 1; fuse the M pieces of second high-resolution feature information through the local feature fusion module to obtain third high-resolution feature information; perform fourth filtering processing on the third high-resolution feature information through the down-projection module to obtain filtered feature information; and perform an addition calculation on the input information and the filtered feature information through the second connection layer to obtain the output information of the residual projection block.
  • the up-projection module includes a transposed convolutional layer; correspondingly, the second motion compensation unit 1003 is further configured to perform the third filtering processing on the input information of the residual projection block through the transposed convolutional layer to obtain the first high-resolution feature information, where the resolution of the first high-resolution feature information obtained after the third filtering processing is higher than the resolution of the input information of the residual projection block.
  • the third filtering process includes: upsampling.
  • the local feature fusion module includes a feature fusion layer and a third convolutional layer; correspondingly, the second motion compensation unit 1003 is further configured to perform a fusion operation on the M pieces of second high-resolution feature information through the feature fusion layer to obtain fused feature information; and perform a convolution operation on the fused feature information through the third convolutional layer to obtain the third high-resolution feature information.
  • the down-projection module includes a fourth convolutional layer; correspondingly, the second motion compensation unit 1003 is further configured to perform the fourth filtering processing on the third high-resolution feature information through the fourth convolutional layer to obtain the filtered feature information, where the resolution of the filtered feature information obtained after the fourth filtering processing is lower than the resolution of the third high-resolution feature information.
  • the fourth filtering process includes: downsampling.
  • the sampling module includes a sub-pixel convolutional layer; correspondingly, the second motion compensation unit 1003 is further configured to perform the second filtering processing on the second feature information through the sub-pixel convolutional layer to obtain the third feature information, where the resolution of the third feature information obtained after the second filtering processing is higher than the resolution of the second feature information.
  • the second filtering process includes: upsampling.
  • the reconstruction module includes a fifth convolutional layer; correspondingly, the second motion compensation unit 1003 is further configured to perform a convolution operation on the third feature information through the fifth convolutional layer to obtain a processing block.
  • the decoder 100 may further include a second training unit 1004, configured to determine a training data set, where the training data set includes at least one training image; preprocess the training data set to obtain a ground-truth region of the preset neural network model and at least one input image group, where each input image group includes at least one input image; and train the neural network model with the at least one input image group based on the ground-truth region to obtain at least one group of candidate model parameters; where the ground-truth region is used to determine the loss value of the loss function of the neural network model, and the at least one group of candidate model parameters is obtained when the loss value of the loss function converges to a preset threshold.
  • the second determination unit 1002 is further configured to determine the quantization parameter of the current block; determine, from the at least one group of candidate model parameters according to the quantization parameter, the model parameters corresponding to the quantization parameter; and determine the preset neural network model according to the model parameters; where, when the at least one group is multiple groups, the input image groups correspond to different quantization parameters, and there is a correspondence between the multiple groups of candidate model parameters and the different quantization parameters.
  • the parsing unit 1001 is further configured to parse the code stream to obtain model parameters;
  • the second determining unit 1002 is further configured to determine a preset neural network model according to model parameters.
  • a "unit” may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a module, or it may be non-modular.
  • the components in this embodiment may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules.
  • if the integrated units are implemented in the form of software function modules and are not sold or used as independent products, they may be stored in a computer-readable storage medium.
  • this embodiment provides a computer storage medium, which is applied to the decoder 100; the computer storage medium stores a computer program, and when the computer program is executed by the second processor, the method described in any one of the preceding embodiments is implemented.
  • FIG. 11 shows a schematic diagram of a specific hardware structure of the decoder 100 provided by the embodiment of the present application.
  • it may include: a second communication interface 1101 , a second memory 1102 and a second processor 1103 ; each component is coupled together through a second bus system 1104 .
  • the second bus system 1104 is used to realize connection and communication between these components.
  • the second bus system 1104 also includes a power bus, a control bus and a status signal bus.
  • for clarity of illustration, however, the various buses are labeled as the second bus system 1104 in FIG. 11; wherein,
  • the second communication interface 1101 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
  • the second memory 1102 is used to store computer programs that can run on the second processor 1103;
  • the second processor 1103 is configured to, when running the computer program, execute: parsing the code stream to determine the value of the first syntax element identification information; if the first syntax element identification information indicates that the current block uses the motion compensation enhancement processing manner, parsing the code stream to determine the first motion information of the current block; determining a first matching block of the current block according to the first motion information, and performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining a first prediction block of the current block according to the first motion information and the at least one second matching block; and determining a reconstructed block of the current block according to the first prediction block.
  • the second processor 1103 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • the hardware function of the second memory 1102 is similar to that of the first memory 902, and the hardware function of the second processor 1103 is similar to that of the first processor 903; details will not be described here.
  • This embodiment provides a decoder, which may include an analysis unit, a second determination unit, and a second motion compensation unit.
  • On the encoder side, a first matching block of the current block is determined; motion compensation enhancement is performed on the first matching block to obtain at least one second matching block; the motion information of the current block is determined according to the at least one second matching block; and the current block is encoded according to the motion information.
  • In this way, both the encoder and the decoder can perform motion compensation enhancement on the first matching block; on the premise of ensuring the same decoding quality, this not only reduces the computational complexity but also saves the bit rate, thereby improving the coding and decoding efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiments of the present application disclose an encoding/decoding method, a bitstream, an encoder, a decoder, and a storage medium. Applied to an encoder, the method includes: determining a first matching block of a current block; performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining motion information of the current block according to the at least one second matching block; and encoding the current block according to the motion information. In this way, on the premise of ensuring the same decoding quality, the bit rate can be saved, and the coding and decoding efficiency can thereby be improved.

Description

编解码方法、码流、编码器、解码器以及存储介质 技术领域
本申请涉及视频处理技术领域,尤其涉及一种编解码方法、码流、编码器、解码器以及存储介质。
背景技术
在视频处理技术领域中,当前块的编解码可以采用帧内预测方式和帧间预测方式。其中,分像素运动补偿技术是通过消除视频时间冗余来提高压缩效率的关键技术,其主要应用于帧间预测的运动补偿和运动估计部分。
目前,针对二分之一精度的分像素运动补偿,虽然相关技术已经存在一些分像素运动补偿的技术方案,但是考虑到自然图像信号的非平稳性和编码噪声的非线性,现有的技术方案仍然存在一些缺陷,尤其是难以适应日益多样化的视频内容和复杂的编码环境,造成编解码效率偏低。
发明内容
本申请实施例提供一种编解码方法、码流、编码器、解码器以及存储介质,在保证相同解码质量的前提下,可以节省码率,进而能够提高编解码效率。
本申请实施例的技术方案可以如下实现:
第一方面,本申请实施例提供了一种编码方法,应用于编码器,该方法包括:
确定当前块的第一匹配块;
对第一匹配块进行运动补偿增强,得到至少一个第二匹配块;
根据至少一个第二匹配块,确定当前块的运动信息;
根据运动信息,对当前块进行编码。
第二方面,本申请实施例提供了一种码流,该码流是根据待编码信息进行比特编码生成的;其中,
待编码信息至少包括当前块的运动信息、当前块的残差块和第一语法元素标识信息的取值,该第一语法元素标识信息用于指示当前块是否使用运动补偿增强处理方式。
第三方面,本申请实施例提供了一种解码方法,应用于解码器,该方法包括:
解析码流,确定第一语法元素标识信息的取值;
若第一语法元素标识信息指示当前块使用运动补偿增强处理方式,则解析码流,确定当前块的第一运动信息;
根据第一运动信息确定当前块的第一匹配块,并对第一匹配块进行运动补偿增强,得到至少一个第二匹配块;
根据第一运动信息和至少一个第二匹配块,确定当前块的第一预测块;
根据第一预测块,确定当前块的重建块。
第四方面,本申请实施例提供了一种编码器,该编码器包括第一确定单元、第一运动补偿单元和编码单元;其中,
第一确定单元,配置为确定当前块的第一匹配块;
第一运动补偿单元,配置为对第一匹配块进行运动补偿增强,得到至少一个第二匹配块;
第一确定单元,还配置为根据至少一个第二匹配块,确定当前块的运动信息;
编码单元,配置为根据运动信息,对当前块进行编码。
第五方面,本申请实施例提供了一种编码器,该编码器包括第一存储器和第一处理器;其中,
第一存储器,用于存储能够在第一处理器上运行的计算机程序;
第一处理器,用于在运行计算机程序时,执行如第一方面的方法。
第六方面,本申请实施例提供了一种解码器,该解码器包括解析单元、第二确定单元和第二运动补偿单元;其中,
解析单元,配置为解析码流,确定第一语法元素标识信息的取值;
解析单元,还配置为若第一语法元素标识信息指示当前块使用运动补偿增强处理方式,则解析码流,确定当前块的第一运动信息;
第二运动补偿单元,配置为根据第一运动信息确定当前块的第一匹配块,并对第一匹配块进行运动补偿增强,得到至少一个第二匹配块;
第二确定单元,配置为根据第一运动信息和至少一个第二匹配块,确定当前块的第一预测块;以及根据第一预测块,确定当前块的重建块。
第七方面,本申请实施例提供了一种解码器,该解码器包括第二存储器和第二处理器;其中,
第二存储器,用于存储能够在第二处理器上运行的计算机程序;
第二处理器,用于在运行计算机程序时,执行如第三方面的方法。
第八方面,本申请实施例提供了一种计算机存储介质,该计算机存储介质存储有计算机程序,计算机程序被第一处理器执行时实现如第一方面的方法、或者被第二处理器执行时实现如第三方面的方法。
本申请实施例提供了一种编解码方法、码流、编码器、解码器以及存储介质,在编码器侧,通过确定当前块的第一匹配块;对第一匹配块进行运动补偿增强,得到至少一个第二匹配块;根据至少一个第二匹配块,确定当前块的运动信息;根据运动信息,对当前块进行编码。在解码器侧,通过解析码流,确定第一语法元素标识信息的取值;若第一语法元素标识信息指示当前块使用运动补偿增强处理方式,则解析码流,确定当前块的第一运动信息;根据第一运动信息确定当前块的第一匹配块,并对第一匹配块进行运动补偿增强,得到至少一个第二匹配块;根据第一运动信息和至少一个第二匹配块,确定当前块的第一预测块;根据第一预测块,确定当前块的重建块。这样,无论是编码器还是解码器,通过对第一匹配块进行运动补偿增强,在保证相同解码质量的前提下,不仅可以降低计算复杂度,还可以节省码率,进而能够提高编解码效率。
附图说明
图1为本申请实施例提供的一种帧间预测的表现形式示意图;
图2为本申请实施例提供的一种分像素精度的亮度分量的分数位置示意图;
图3A为本申请实施例提供的一种视频编码系统的组成框图示意图;
图3B为本申请实施例提供的一种视频解码系统的组成框图示意图;
图4为本申请实施例提供的一种编码方法的流程示意图;
图5为本申请实施例提供的一种预设神经网络模型的网络结构示意图;
图6为本申请实施例提供的一种残差投影块的网络结构示意图;
图7为本申请实施例提供的一种解码方法的流程示意图;
图8为本申请实施例提供的一种编码器的组成结构示意图;
图9为本申请实施例提供的一种编码器的具体硬件结构示意图;
图10为本申请实施例提供的一种解码器的组成结构示意图;
图11为本申请实施例提供的一种解码器的具体硬件结构示意图。
具体实施方式
为了能够更加详尽地了解本申请实施例的特点与技术内容,下面结合附图对本申请实施例的实现进行详细阐述,所附附图仅供参考说明之用,并非用来限定本申请实施例。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本申请实施例的目的,不是旨在限制本申请。
在以下的描述中,涉及到“一些实施例”,其描述了所有可能实施例的子集,但是可以理解,“一些实施例”可以是所有可能实施例的相同子集或不同子集,并且可以在不冲突的情况下相互结合。还需要指出,本申请实施例所涉及的术语“第一\第二\第三”仅是用于区别类似的对象,不代表针对对象的特定排序,可以理解地,“第一\第二\第三”在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本申请实施例能够以除了在这里图示或描述的以外的顺序实施。
在视频图像中,一般采用第一图像分量、第二图像分量和第三图像分量来表征编码块(Coding Block,CB);其中,这三个图像分量分别为一个亮度分量、一个蓝色色度分量和一个红色色度分量,具体地,亮度分量通常使用符号Y表示,蓝色色度分量通常使用符号Cb或者U表示,红色色度分量通常使用符号Cr或者V表示;这样,视频图像可以用YCbCr格式表示,也可以用YUV格式表示。
对本申请实施例进行进一步详细说明之前,先对本申请实施例中涉及的名词和术语进行说明,本申请实施例中涉及的名词和术语适用于如下的解释:
动态图像专家组(Moving Picture Experts Group,MPEG)
国际标准化组织(International Standardization Organization,ISO)
国际电工委员会(International Electrotechnical Commission,IEC)
联合视频专家组(Joint Video Experts Team,JVET)
开放媒体联盟(Alliance for Open Media,AOM)
新一代视频编码标准H.266/多功能视频编码(Versatile Video Coding,VVC)
VVC的参考软件测试平台(VVC Test Model,VTM)
编码单元(Coding Unit,CU)
编码树单元(Coding Tree Unit,CTU)
预测单元(Prediction Unit,PU)
离散余弦变换(Discrete Cosine Transform,DCT)
基于离散余弦变换的插值滤波器(Interpolation Filter based on Discrete Cosine Transform,DCTIF)
残差投影块(Residual Projection Block,RPB)
残差投影网络(Residual Projection Network,RPN)
峰值信噪比(Peak Signal to Noise Ratio,PSNR)
可以理解,在视频处理技术领域中,分像素运动补偿技术是通过消除视频时间冗余来提高压缩效率的关键技术,其主要应用于帧间预测的运动补偿和运动估计部分。这里,帧间预测是利用解码重建的参考帧对当前帧进行预测的过程,其核心是根据当前块的运动信息通过运动补偿从参考帧中获取最优匹配块(也可以称为“最佳匹配块”)。其中,运动信息可以包括预测方向、参考帧的索引序号和运动向量。参见图1,其示出了本申请实施例提供的一种帧间预测的表现形式示意图。如图1所示,对于编码器而言,编码器采用一定的搜索算法给当前帧要编码的当前块找到一个最优匹配块,两者之间的位移称为运动向量,这一过程可称为运动估计。
具体来讲，编码器首先需要进行整像素的运动估计，获得整像素位置的最优匹配块。为了进一步提高预测精度，分像素运动补偿的概念被提出。所谓分像素运动补偿，就是通过插值滤波器对整像素位置的最优匹配块进行插值，生成二分之一精度的分像素样本和四分之一精度的分像素样本。如图2所示，其示出了本申请实施例提供的一种分像素精度的亮度分量的分数位置示意图。在图2中，大写字母代表整像素样本，即 $A_{i,j}$ 表示整数位置的像素；小写字母代表分像素样本，其中，$b_{i,j}$、$h_{i,j}$、$j_{i,j}$ 均表示二分之一精度位置的分像素，其余小写字母则表示四分之一精度位置的分像素。
也就是说,分像素运动补偿的本质是对整像素位置的匹配块采用插值滤波方式进一步优化,其中,插值滤波器的主要作用包括去除由于数字采样造成的频谱混叠和抑制参考帧中存在的编码噪声。然而,针对二分之一精度的分像素运动补偿,但是考虑到自然图像信号的非平稳性和编码噪声的非线性,现有的技术方案仍然存在一些缺陷,尤其是难以适应日益多样化的视频内容和复杂的编码环境,造成编解码效率偏低。
本申请实施例提供了一种编码方法,通过确定当前块的第一匹配块;对第一匹配块进行运动补偿增强,得到至少一个第二匹配块;根据至少一个第二匹配块,确定当前块的运动信息;根据运动信息,对当前块进行编码。
本申请实施例提供了一种解码方法,通过解析码流,确定第一语法元素标识信息的取值;若第一语法元素标识信息指示当前块使用运动补偿增强处理方式,则解析码流,确定当前块的第一运动信息;根据第一运动信息确定当前块的第一匹配块,并对第一匹配块进行运动补偿增强,得到至少一个第二匹配块;根据第一运动信息和至少一个第二匹配块,确定当前块的第一预测块;根据第一预测块,确定当前块的重建块。
这样,无论是编码器还是解码器,通过对第一匹配块进行运动补偿增强,不仅可以适应多样化的视频内容和复杂的编码环境,而且在保证相同解码质量的前提下,可以降低计算复杂度,还可以节省码率,进而提高编解码效率。
下面将结合附图对本申请各实施例进行详细说明。
参见图3A,其示出了本申请实施例提供的一种视频编码系统的组成框图示意图。如图3A所示,该视频编码系统10包括变换与量化单元101、帧内估计单元102、帧内预测单元103、运动补偿单元104、运动估计单元105、反变换与反量化单元106、滤波器控制分析单元107、滤波单元108、编码单元109和解码图像缓存单元110等,其中,滤波单元108可以实现去方块滤波及样本自适应缩进(Sample  Adaptive Offset,SAO)滤波,编码单元109可以实现头信息编码及基于上下文的自适应二进制算术编码(Context-based Adaptive Binary Arithmetic Coding,CABAC)。针对输入的原始视频信号,通过编码树块(Coding Tree Unit,CTU)的划分可以得到一个视频编码块,然后对经过帧内或帧间预测后得到的残差像素信息通过变换与量化单元101对该视频编码块进行变换,包括将残差信息从像素域变换到变换域,并对所得的变换系数进行量化,用以进一步减少比特率;帧内估计单元102和帧内预测单元103是用于对该视频编码块进行帧内预测;明确地说,帧内估计单元102和帧内预测单元103用于确定待用以编码该视频编码块的帧内预测模式;运动补偿单元104和运动估计单元105用于执行所接收的视频编码块相对于一或多个参考帧中的一或多个块的帧间预测编码以提供时间预测信息;由运动估计单元105执行的运动估计为产生运动向量的过程,所述运动向量可以估计该视频编码块的运动,然后由运动补偿单元104基于由运动估计单元105所确定的运动向量执行运动补偿;在确定帧内预测模式之后,帧内预测单元103还用于将所选择的帧内预测数据提供到编码单元109,而且运动估计单元105将所计算确定的运动向量数据也发送到编码单元109;此外,反变换与反量化单元106是用于该视频编码块的重构建,在像素域中重构建残差块,该重构建残差块通过滤波器控制分析单元107和滤波单元108去除方块效应伪影,然后将该重构残差块添加到解码图像缓存单元110的帧中的一个预测性块,用以产生经重构建的视频编码块;编码单元109是用于编码各种编码参数及量化后的变换系数,在基于CABAC的编码算法中,上下文内容可基于相邻编码块,可用于编码指示所确定的帧内预测模式的信息,输出该视频信号的码流;而解码图像缓存单元110是用于存放重构建的视频编码块,用于预测参考。随着视频图像编码的进行,会不断生成新的重构建的视频编码块,这些重构建的视频编码块都会被存放在解码图像缓存单元110中。
参见图3B,其示出了本申请实施例提供的一种视频解码系统的组成框图示意图。如图3B所示,该视频解码系统20包括解码单元201、反变换与反量化单元202、帧内预测单元203、运动补偿单元204、滤波单元205和解码图像缓存单元206等,其中,解码单元201可以实现头信息解码以及CABAC解码,滤波单元205可以实现去方块滤波以及SAO滤波。输入的视频信号经过图3A的编码处理之后,输出该视频信号的码流;该码流输入视频解码系统20中,首先经过解码单元201,用于得到解码后的变换系数;针对该变换系数通过反变换与反量化单元202进行处理,以便在像素域中产生残差块;帧内预测单元203可用于基于所确定的帧内预测模式和来自当前帧或图片的先前经解码块的数据而产生当前视频解码块的预测数据;运动补偿单元204是通过剖析运动向量和其他关联语法元素来确定用于视频解码块的预测信息,并使用该预测信息以产生正被解码的视频解码块的预测性块;通过对来自反变换与反量化单元202的残差块与由帧内预测单元203或运动补偿单元204产生的对应预测性块进行求和,而形成解码的视频块;该解码的视频信号通过滤波单元205以便去除方块效应伪影,可以改善视频质量;然后将经解码的视频块存储于解码图像缓存单元206中,解码图像缓存单元206存储用于后续帧内预测或运动补偿的参考图像,同时也用于视频信号的输出,即得到了所恢复的原始视频信号。
需要说明的是,本申请实施例可以应用在视频编码系统10(可简称为“编码器”)的帧间预测部分,具体是如图3A所示的运动补偿单元104和运动估计单元105;本申请实施例还可以应用在视频解码系统20(可简称为“解码器”)的帧间预测部分,具体是如图3B所示的运动补偿单元204。也就是说,本申请实施例既可以应用于编码器,也可以应用于解码器,甚至还可以同时应用于编码器和解码器,但是这里不作具体限定。
还需要说明的是,当本申请实施例的方法应用于编码器时,“当前块”具体是指待编码图像中当前待进行帧间预测的编码块;当本申请实施例的方法应用于解码器时,“当前块”具体是指待解码图像中当前待进行帧间预测的解码块。
本申请的一实施例中,参见图4,其示出了本申请实施例提供的一种编码方法的流程示意图。如图4所示,该方法可以包括:
S401:确定当前块的第一匹配块。
需要说明的是,对于一视频图像而言,视频图像可以划分为多个图像块,每个待编码的图像块均可以称为编码块,而这里的当前块具体是指当前待进行帧间预测的编码块。其中,当前块可以是一个CTU,甚至可以是一个CU、PU等,本申请实施例不作任何限定。
还需要说明的是,本申请实施例的编码方法主要应用于帧间预测的运动估计和运动补偿部分。其中,运动补偿是利用解码重建的参考帧中的局部图像来预测和补偿当前的局部图像,可以减少运动图像的冗余信息;运动估计是从视频序列中抽取运动信息,即对运动物体从参考帧到当前帧之间的位移信息做出估计,也即本申请实施例所述的运动信息,这个过程称为运动估计。
在本申请实施例中,这里的第一匹配块可以是根据整像素运动估计得到的,也可以是使用相关技术 的分像素插值滤波方式得到的,本申请实施例并不作任何限定。
以整像素运动估计为例,在一种具体的示例中,所述确定当前块的第一匹配块,可以包括:
对当前块进行整像素运动估计,确定当前块的第一匹配块。
具体来讲,对当前块进行整像素运动估计,确定当前块在整像素位置的目标匹配块,将整像素位置的目标匹配块确定为第一匹配块;其中,整像素位置的目标匹配块(或称之为“第一匹配块”)为当前块在整像素位置进行运动估计时率失真代价值最小的匹配块。
在本申请实施例中,运动估计方法主要包括像素递归法和块匹配法两大类。其中,前者复杂度高,实际中应用较少;后者在视频编码标准中被广泛采用。具体地,在块匹配法中,主要包括块匹配准则及搜索方法。目前有三种常用的匹配准则:(1)绝对误差和(Sum Absolute Difference,SAD)准则;(2)均方误差(Mean Square Error,MSE)准则;(3)归一化互相关函数(Normalized Cross Correlation Function,NCCF)准则。在确定匹配准则后就需要进行寻找最优匹配块的搜索工作,例如可以采用全搜索法、三部搜索法、菱形搜索法等等。
这里,在整像素运动估计过程中,针对多个整像素位置的匹配块,需要计算每一个整像素位置的匹配块所对应的率失真代价值(Rate Distortion Cost),然后选取出率失真代价值最小情况下的整像素位置的匹配块,即为最优匹配块,也即本申请实施例所述的目标匹配块。也就是说,整像素位置的目标匹配块为从多个整像素位置的匹配块中选择出最小率失真代价值对应的匹配块。
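As a concrete illustration of the block-matching step described above, the following is a minimal sketch of integer-pixel full-search motion estimation under the SAD criterion. All names (`cur`, `ref`, `search_range`) are illustrative assumptions, and a real encoder would fold the motion-vector coding cost into a rate-distortion cost rather than ranking candidates by raw SAD alone.

```python
import numpy as np

def full_search_sad(cur, ref, cx, cy, search_range=8):
    """Integer-pel full search: find the displacement (dx, dy) that
    minimizes SAD between the current block and a reference block.

    cur: current block, shape (H, W)
    ref: reconstructed reference frame, shape (RH, RW)
    (cx, cy): column/row of the block's top-left corner in the frame
    """
    h, w = cur.shape
    best_cost, best_mv = np.inf, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = cx + dx, cy + dy
            if x < 0 or y < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand = ref[y:y + h, x:x + w]
            cost = np.abs(cur.astype(np.int64) - cand.astype(np.int64)).sum()
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```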
S402:对第一匹配块进行运动补偿增强,得到至少一个第二匹配块。
需要说明的是,在获取到第一匹配块之后,为了进一步提高预测精度,本申请实施例还可以进行运动补偿增强。
在一种可能的实施方式中，对于运动补偿增强而言，视频编码标准中通常采用DCTIF进行二分之一精度的分像素样本插值。其基本思想是对整像素样本进行正变换到DCT域，然后使用目标分像素位置采样的DCT基将DCT系数反变换回空域，这一过程可以用一个有限脉冲响应滤波过程来表示。其中，假定给定像素表示为 $f(i)$，插值得到的像素表示为 $\hat{f}(i)$，那么DCTIF插值过程的数学形式如式(1)所示（式(1)～式(3)原文为公式图像，此处按上下文重构为有限脉冲响应滤波形式）：
$\hat{f}(i)=\sum_{k}h_{k}\,f(i+k)$　　　　　　(1)
在式(1)中，滤波系数 $h_k$ 由式(2)、式(3)（原文为公式图像）定义，即由DCT正变换与目标分像素位置采样的DCT基反变换推导得到。
基于DCTIF的基本原理,在实际应用中,可以得出二分之一精度的分像素样本的插值滤波器的抽头系数为[-1,4,-11,40,40,-11,4,-1]。
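To make the half-pel filtering concrete, the sketch below applies the eight taps quoted above to one row of integer samples. Since the taps sum to 64, the result is rounded and right-shifted by 6; the edge-replication padding and the 8-bit clipping range are assumptions of this sketch, as codecs define their own reference-frame padding.

```python
import numpy as np

# Half-pel DCTIF taps quoted above; they sum to 64, so the filtered
# value is rounded and shifted right by 6 to return to the sample range.
DCTIF_HALF = np.array([-1, 4, -11, 40, 40, -11, 4, -1], dtype=np.int64)

def interp_half_pel_row(samples):
    """Interpolate the half-pel position between each pair of integer
    samples of a 1-D row (output[i] sits between samples i and i+1)."""
    padded = np.pad(samples.astype(np.int64), (3, 4), mode="edge")
    out = np.empty(len(samples), dtype=np.int64)
    for i in range(len(samples)):
        out[i] = (padded[i:i + 8] * DCTIF_HALF).sum()  # taps over f(i-3..i+4)
    return np.clip((out + 32) >> 6, 0, 255)            # assumes 8-bit samples
```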
在该实施方式中,针对二分之一精度的运动补偿增强,考虑到自然图像信号的非平稳性和编码噪声的非线性,如果采用基于固定抽头的线性插值滤波器DCTIF,将难以适应日益多样化的视频内容和复杂的编码环境。
在另一种可能的实施方式中,对于运动补偿增强而言,本申请实施例提出了一种基于预设神经网络模型的运动补偿增强处理方式。具体地,在一些实施例中,所述对第一匹配块进行运动补偿增强的步骤还可以包括:利用预设神经网络模型对第一匹配块进行运动补偿增强。
相应地,所述对第一匹配块进行运动补偿增强,得到至少一个第二匹配块,可以包括:
利用预设神经网络模型对第一匹配块进行超分辨率和质量增强处理,得到处理块;
对处理块进行第一滤波处理,得到至少一个第二匹配块。
需要说明的是,处理块的分辨率高于当前块的分辨率。或者,也可以说,经过超分辨率和质量增强处理后所得到的处理块具有高质量和高分辨率性能。
还需要说明的是,第一匹配块与当前块具有相同的分辨率,而且经过第一滤波处理后得到的第二匹配块也具有与当前块相同的分辨率。
进一步地,在一些实施例中,第一滤波处理可以包括:下采样。也就是说,在得到处理块之后,对处理块进行下采样,可以得到至少一个第二匹配块。
在一种具体的示例中,如果第一匹配块为整像素匹配块,所述利用预设神经网络模型对第一匹配块进行运动补偿增强,得到至少一个第二匹配块,可以包括:
利用预设神经网络模型对第一匹配块进行分像素运动补偿,得到至少一个第二匹配块。
在这里，如果第一匹配块为整像素匹配块，那么在一种可能的实施方式中，第二匹配块的精度为二分之一精度，该第二匹配块的数量为四个；在另一种可能的实施方式中，第二匹配块的精度为四分之一精度，该第二匹配块的数量为16个；但是本申请实施例不作任何限定。
可以理解地,预设神经网络模型为一种卷积神经网络(Convolutional Neural Networks,CNN)模型。其中,CNN是一类包含卷积计算且具有深度结构的前馈神经网络,是深度学习的代表算法之一。这里,卷积神经网络具有表征学习能力,能够按其阶层结构对输入信息进行平移不变分类(shift-invariant classification),故也可称为“平移不变人工神经网络(Shift-Invariant Artificial Neural Networks,SIANN)”。
也就是说,本实施方式与上述实施方式中插值滤波器对第一匹配块插值滤波出三个二分之一精度的分像素样本不同,本实施方式是利用卷积神经网络模型对第一匹配块实现端到端的超分辨率和质量增强,然后对输出的高分辨率图像进行下采样,可以生成四个二分之一精度的分像素样本(即“第二匹配块”)。
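A minimal sketch of how the single 2H×2W super-resolved output can be split by polyphase downsampling into four blocks at the resolution of the current block; the mapping between the four phases and the half-pel offsets (0, 0), (0, ½), (½, 0), (½, ½) is our assumption for illustration.

```python
import numpy as np

def split_half_pel_phases(sr):
    """Polyphase-downsample a 2x super-resolved block of shape (2H, 2W)
    into four (H, W) blocks, one per half-pel phase offset (dy, dx)."""
    return {
        (0.0, 0.0): sr[0::2, 0::2],
        (0.0, 0.5): sr[0::2, 1::2],
        (0.5, 0.0): sr[1::2, 0::2],
        (0.5, 0.5): sr[1::2, 1::2],
    }
```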
进一步地,针对预设神经网络模型而言,在一些实施例中,预设神经网络模型可以包括特征提取模块、残差投影模块组、采样模块和重建模块。其中,特征提取模块、残差投影模块组、采样模块和重建模块顺次连接。
在一种具体的示例中,基于预设神经网络模型的具体结构,所述对第一匹配块进行超分辨率和质量增强处理,得到处理块,可以包括:
通过特征提取模块对第一匹配块进行浅层特征提取,得到第一特征信息;
通过残差投影模块组对第一特征信息进行残差特征学习,得到第二特征信息;
通过采样模块对第二特征信息进行第二滤波处理,得到第三特征信息;
通过重建模块对第三特征信息进行超分辨率重建,得到处理块。
在这里,对于特征提取模块,其主要是用于进行浅层特征的提取,故特征提取模块又可以称为“浅层特征提取模块”。其中,本申请实施例的浅层特征主要是指低层次的简单特征(如边缘特征等)。
在一种具体的示例中,特征提取模块可以包括第一卷积层。相应地,所述通过特征提取模块对第一匹配块进行浅层特征提取,得到第一特征信息,可以包括:通过第一卷积层对第一匹配块进行卷积操作,得到第一特征信息。
在一些实施例中,第一卷积层的卷积核尺寸为K×L,第一卷积层的卷积核数量为2的整数次幂,K和L为大于零的正整数。在一种更具体的示例中,第一卷积层的卷积核尺寸可以为3×3,第一卷积层的卷积核数量为64,但是这里并不作任何限定。
进一步地,对于残差投影模块组,在一种具体的示例中,残差投影模块组可以包括N个残差投影块、第二卷积层和第一连接层;其中,N为大于或等于1的整数。
在本申请实施例中,N个残差投影块、第二卷积层和第一连接层顺次连接,且第一连接层还与N个残差投影块中第一个残差投影块的输入连接。
相应地,所述通过残差投影模块组对第一特征信息进行残差特征学习,得到第二特征信息,包括:
通过N个残差投影块对第一特征信息进行残差特征学习,得到第一中间特征信息;
通过第二卷积层对第一中间特征信息进行卷积操作,得到第二中间特征信息;
通过第一连接层对第一特征信息和第二中间特征信息进行加法计算,得到第二特征信息。
在一些实施例中,第二卷积层的卷积核尺寸为K×L,第二卷积层的卷积核数量为2的整数次幂,K和L为大于零的正整数。在一种更具体的示例中,第二卷积层的卷积核尺寸为3×3,第二卷积层的卷积核数量为64,但是这里也不作任何限定。
需要说明的是,第d个残差投影块的输入用F d-1表示,第d个残差投影块的输出用F d表示。如果仅通过堆叠残差投影块的方式并不会达到更好的效果,这时候可以引入第一连接层,即长跳跃连接(Long Skip Connection,LSC)层;甚至还在N个残差投影块之后引入了第二卷积层,通过这种残差学习的方式,可以简化预设神经网络模型中的信息流,使得预设神经网络模型的性能更加稳定。
还需要说明的是,针对堆叠残差投影块的方式,N个残差投影块是级联结构,该级联结构的输入为所述第一特征信息,该级联结构的输出为所述第二中间特征信息。在一些实施例中,所述通过N个残差投影块对第一特征信息进行残差特征学习,得到第一中间特征信息,可以包括:
当N等于1时,将第一特征信息输入到第一个残差投影块,得到第一个残差投影块的输出信息,并将第一个残差投影块的输出信息确定为第一中间特征信息;
当N大于1时,在得到第一个残差投影块的输出信息后,将第d个残差投影块的输出信息输入到第d+1个残差投影块,得到第d+1个残差投影块的输出信息,并对d执行加1处理,直至得到第N个残差投影块的输出信息,并将第N个残差投影块的输出信息确定为第一中间特征信息;其中,d为大于或等于1且小于N的整数。
在这里,如果N等于1,即残差投影模块组中仅包括1个残差投影块,该残差投影块的输出信息即为第一中间特征信息;如果N大于1,即残差投影模块组中包括有两个以上残差投影块,通过堆叠残差 学习方式,即前一个残差投影块的输出即为下一个残差投影块的输入,直至堆叠得到最后一个残差投影块的输出信息,这时候最后一个残差投影块的输出信息即为第一中间特征信息。
进一步地,对于采样模块,在一种具体的示例中,采样模块可以包括亚像素卷积层。相应地,所述通过采样模块对第二特征信息进行第二滤波处理,得到第三特征信息,可以包括:
通过亚像素卷积层对第二特征信息进行第二滤波处理,得到第三特征信息。
需要说明的是,经过第二滤波处理后得到的第三特征信息的分辨率高于第二特征信息的分辨率。
还需要说明的是,第二滤波处理可以包括:上采样。也就是说,对于采样模块,其主要是用于对第二特征信息进行上采样,故该采样模块又可以称为“上采样模块”。
还需要说明的是，该采样模块可以采用亚像素卷积层，也可以是增加一层亚像素卷积层。在这里，亚像素卷积层也可以是PixShuffle模块（或者，称为Pixelshuffle模块），其实现的功能是：将一个H×W的低分辨率输入图像，通过亚像素（Sub-pixel）操作将其变换为rH×rW的高分辨率输入图像。但是其实现过程不是直接通过插值等方式产生这个高分辨率图像，而是通过卷积先得到 $r^2$ 个通道的特征图（特征图大小和输入低分辨率图像一致），然后通过周期筛选（periodic shuffling）的方法得到这个高分辨率图像；其中，r可以为图像的放大倍数。具体地，针对特征图，其通道数为 $r^2$，将每个像素的 $r^2$ 个通道重新排列成一个r×r的区域，对应于高分辨率图像中的一个r×r大小的子块，从而大小为 $r^2\times H\times W$ 的特征图像被重新排列成 $1\times rH\times rW$ 大小的高分辨率图像。
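The periodic-shuffling operation described above is available directly in PyTorch as `nn.PixelShuffle`; the toy check below (r = 2, arbitrary 3×4 feature map) is only meant to make the r²·H·W to 1·rH·rW rearrangement concrete.

```python
import torch
import torch.nn as nn

r = 2
x = torch.arange(r * r * 3 * 4, dtype=torch.float32).reshape(1, r * r, 3, 4)
shuffle = nn.PixelShuffle(r)   # rearranges (B, r^2*C, H, W) -> (B, C, rH, rW)
y = shuffle(x)
print(x.shape, "->", y.shape)  # torch.Size([1, 4, 3, 4]) -> torch.Size([1, 1, 6, 8])
```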
进一步地,对于重建模块,在一种具体的示例中,重建模块可以包括第五卷积层。相应地,所述通过重建模块对第三特征信息进行超分辨率重建,得到处理块,可以包括:
通过第五卷积层对第三特征信息进行卷积操作,得到处理块。
在本申请实施例中,第五卷积层的卷积核尺寸为K×L,第五卷积层的卷积核数量为2的整数次幂,K和L为大于零的正整数。在一种更具体的示例中,第五卷积层的卷积核尺寸为3×3,第五卷积层的卷积核数量为1,但是这里也不作任何限定。
示例性地,参见图5,其示出了本申请实施例提供的一种预设神经网络模型的网络结构示意图。该预设神经网络模型可以用RPNet表示。如图5所示,RPNet主要包括浅层特征提取网络层(Shallow Feature Extraction Net)、残差投影模块组(Residual Projection Blocks)、上采样网络层(Up-sampling Net)和重建网络层(Reconstruction Net)等四部分。其中,浅层特征提取网络层即本申请实施例所述的特征提取模块,其可以为第一卷积层;上采样网络层即本申请实施例所述的采样模块,其可以为亚像素卷积层;重建网络层即本申请实施例所述的重建模块,其可以为第五卷积层。在这里,假定I LR表示本申请实施例所述的第一匹配块,即RPNet输入的低分辨率图像;I SR表示本申请实施例所述的处理块,即RPNet输出的高分辨率图像(也可称为超分辨率图像);也就是说,I LR和I SR分别表示RPNet的输入和输出。下面将结合图5对该模型的网络结构进行具体说明。
首先，本申请实施例只使用一层卷积层对输入的低分辨率图像进行浅层特征提取，其数学形式如式(4)所示：
$F_0=H_{\mathrm{SFE}}(I_{LR})$　　　　　　(4)
其中，$H_{\mathrm{SFE}}(\cdot)$ 表示卷积操作，$F_0$ 表示提取到的低分辨率图像的浅层特征，并作为残差投影模块的输入。
其次，假定RPNet中包含N个残差投影块，那么第d个残差投影块的运算可以用式(5)描述：
$F_d=H_{\mathrm{RPB},d}(F_{d-1})=H_{\mathrm{RPB},d}\big(H_{\mathrm{RPB},d-1}(\cdots(H_{\mathrm{RPB},1}(F_0))\cdots)\big)$　　　　　　(5)
其中，$F_{d-1}$ 和 $F_d$ 分别表示第d个残差投影块的输入和输出。经验表明，仅通过堆叠残差投影块的方式并不会取得更好的效果。为了解决这个问题，这里引入了长跳跃连接层，其数学形式如式(6)所示：
$F_{DF}=F_0+W_{LSC}\,F_N=F_0+W_{LSC}\,H_{\mathrm{RPB},N}\big(H_{\mathrm{RPB},N-1}(\cdots(H_{\mathrm{RPB},1}(F_0))\cdots)\big)$　　　　　　(6)
其中，$W_{LSC}$ 表示第N个残差投影块之后卷积层的权重值。通过这种全局残差学习(Global Residual Learning,GRL)的方式，简化了网络结构中的信息流，使网络模型在训练过程中变得更加稳定，并且利用残差投影块直接学习残差信息，为网络模型性能的提升带来了可能。
再次，通过上采样网络层对残差特征 $F_{DF}$ 进行上采样，其数学形式如式(7)所示：
$F_{UP}=H_{UP}(F_{DF})$　　　　　　(7)
其中，$H_{UP}(\cdot)$ 表示卷积操作以实现上采样，$F_{UP}$ 表示提取到的第三特征信息，并作为重建网络层的输入。
最后，通过重建网络层生成高分辨率图像，数学形式如式(8)所示：
$I_{SR}=H_{REC}(F_{UP})$　　　　　　(8)
其中，$H_{REC}(\cdot)$ 表示卷积操作以实现超分辨率重建。
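Read as code, equations (4) through (8) correspond to roughly the following PyTorch sketch. The `RPB` stand-in here is a plain residual block purely so the module runs; the actual residual projection block of equations (9) to (11) is sketched after those equations below. The single-channel luma input, the 64-channel width, and N = 10 follow the conventions stated later in this section; everything else is an illustrative assumption, not the definitive implementation.

```python
import torch
import torch.nn as nn

class RPB(nn.Module):
    """Placeholder residual projection block; see the sketch under
    equations (9)-(11) for the projection-based version."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class RPNet(nn.Module):
    def __init__(self, n_rpb=10, ch=64, scale=2):
        super().__init__()
        self.sfe = nn.Conv2d(1, ch, 3, padding=1)                 # eq. (4): H_SFE
        self.rpbs = nn.ModuleList(RPB(ch) for _ in range(n_rpb))  # eq. (5): H_RPB,d
        self.lsc_conv = nn.Conv2d(ch, ch, 3, padding=1)           # W_LSC in eq. (6)
        self.up = nn.Sequential(nn.Conv2d(ch, ch * scale * scale, 3, padding=1),
                                nn.PixelShuffle(scale))           # eq. (7): H_UP
        self.rec = nn.Conv2d(ch, 1, 3, padding=1)                 # eq. (8): H_REC

    def forward(self, x):               # x: low-resolution luma block, (B, 1, H, W)
        f0 = self.sfe(x)
        f = f0
        for rpb in self.rpbs:
            f = rpb(f)
        f_df = f0 + self.lsc_conv(f)    # long skip connection, eq. (6)
        return self.rec(self.up(f_df))  # (B, 1, 2H, 2W) super-resolved block
```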
还可以理解,考虑到低分辨率特征与高分辨率特征之间的联系,受投影思想的启发,本申请实施例提出了残差投影模块(或简称为“残差投影块”)。在一些实施例中,残差投影块可以包括上投影模块、M个残差模块、局部特征融合模块、下投影模块和第二连接层;其中,M为大于或等于1的整数。
在本申请实施例中,上投影模块、M个残差模块、局部特征融合模块、下投影模块和第二连接层顺次连接,且第二连接层还与上投影模块的输入连接,M个残差模块的输出还分别与局部特征融合模块连接。
相应地,对于残差投影块的具体网络结构,该方法还可以包括:
通过上投影模块对残差投影块的输入信息进行第三滤波处理,得到第一高分辨率特征信息;
通过M个残差模块对第一高分辨率特征信息进行不同等级的高分辨率特征学习,得到M个第二高分辨率特征信息;
通过局部特征融合模块对M个第二高分辨率特征信息进行融合操作,得到第三高分辨率特征信息;
通过下投影模块对第三高分辨率特征信息进行第四滤波处理,得到滤波后特征信息;
通过第二连接层对输入信息和滤波后特征信息进行加法计算,得到残差投影块的输出信息。
进一步地,对于上投影模块,在一种具体的示例中,上投影模块可以包括转置卷积层。相应地,所述通过上投影模块对残差投影块的输入信息进行第三滤波处理,得到第一高分辨率特征信息,可以包括:
通过转置卷积层对残差投影块的输入信息进行第三滤波处理,得到第一高分辨率特征信息。
需要说明的是,经过第三滤波处理后得到的第一高分辨率特征信息的分辨率高于残差投影块的输入信息的分辨率。
还需要说明的是,第三滤波处理可以包括:上采样。也就是说,通过转置卷积层对残差投影块的输入信息进行上采样,得到第一高分辨率特征信息。
进一步地,对于局部特征融合模块,在一种具体的示例中,局部特征融合模块可以包括特征融合层和第三卷积层,所述通过局部特征融合模块对M个第二高分辨率特征信息进行融合操作,得到第三高分辨率特征信息,包括:
通过特征融合层对M个第二高分辨率特征信息进行融合操作,得到融合特征信息;
通过第三卷积层对融合特征信息进行卷积操作,得到第三高分辨率特征信息。
在本申请实施例中,第三卷积层的卷积核尺寸为K×L,第三卷积层的卷积核数量为2的整数次幂,K和L为大于零的正整数。在一种更具体的示例中,第三卷积层的卷积核尺寸为1×1,第三卷积层的卷积核数量为64,但是这里并不作任何限定。
也就是说,在本申请实施例中,通过特征融合层对M个第二高分辨率特征信息进行融合操作,但是为了充分发挥残差网络的学习能力,这里还引入了1×1卷积层对残差模块学习到的特征信息进行融合操作,能够自适应地控制学习到的特征信息。
进一步地,对于下投影模块,在一种具体的示例中,下投影模块可以包括第四卷积层,所述通过下投影模块对第三高分辨率特征信息进行第四滤波处理,得到滤波后特征信息,包括:
通过第四卷积层对第三高分辨率特征信息进行第四滤波处理,得到滤波后特征信息。
需要说明的是,经过第四滤波处理后得到的滤波后特征信息的分辨率低于第三高分辨率特征信息的分辨率。另外,经过第四滤波处理后得到的滤波后特征信息与残差投影块的输入信息具有相同的分辨率。
还需要说明的是,第四滤波处理可以包括:下采样。也就是说,通过第四卷积层对第三高分辨率特征信息进行下采样,得到滤波后特征信息。
示例性地,参见图6,其示出了本申请实施例提供的一种残差投影块的网络结构示意图。该残差投影块可以用RPB表示。如图6所示,RPB主要包括上投影模块(Up-Projection Unit)、残差模块(Residual Block)、局部特征融合模块(Local Feature Fusion)和下投影模块(Down-Projection Unit)等。其中,对于第d个残差投影块,假定其中包含M个残差模块;其具体连接关系详见图6。
首先，上投影模块采用转置卷积层对输入的低分辨率特征进行上采样，数学形式如式(9)所示：
$F_{d,0}=(F_{d-1}*p_t)\uparrow_s$　　　　　　(9)
其中，$*$ 表示空间卷积操作，$F_{d-1}$ 表示第d个残差投影块的输入，$p_t$ 表示转置卷积，$\uparrow_s$ 表示缩放因子为s的上采样，$F_{d,0}$ 表示第1个残差模块的输入。在这里，$[F_{d,1},\ldots,F_{d,M}]$ 分别表示M个残差模块的输出。
其次，为了充分发挥残差网络的学习能力，局部特征融合包括特征融合层和第三卷积层，其中引入1×1的第三卷积层对残差模块学习到的特征进行融合操作，自适应地控制学习到的特征信息，其数学形式如式(10)所示：
$F_{d,\mathrm{LFF}}=W_{1\times 1}*\big[F_{d,1},\ldots,F_{d,M}\big]$　　　　　　(10)
（式(10)原为公式图像，此处按上下文重构：$[\cdot]$ 表示对M个残差模块输出的拼接，$W_{1\times 1}$ 表示第三卷积层的1×1卷积。）
再次，下投影模块采用第四卷积层的卷积操作对 $F_{d,\mathrm{LFF}}$ 进行下采样，实现使用高分辨率特征指导低分辨率特征的效果，最终与 $F_{d-1}$ 进行像素级相加得到 $F_d$，其数学形式如式(11)所示：
$F_d=F_{d-1}+(F_{d,\mathrm{LFF}}*q_t)\downarrow_s$　　　　　　(11)
其中，$*$ 表示空间卷积操作，$q_t$ 表示卷积层，$\downarrow_s$ 表示缩放因子为s的下采样。
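Equations (9) to (11) translate into the following sketch of a single residual projection block, using the 6×6 kernel, stride 2, padding 2 settings for the transposed convolution and the down-projection convolution given below for the half-pel case. The ReLU placement inside the residual modules is an assumption, since the text does not spell out the activations.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class ResidualProjectionBlock(nn.Module):
    def __init__(self, ch=64, m=3):
        super().__init__()
        # eq. (9): up-projection p_t, transposed conv 6x6, stride 2, padding 2
        self.up_proj = nn.ConvTranspose2d(ch, ch, 6, stride=2, padding=2)
        self.res_blocks = nn.ModuleList(ResBlock(ch) for _ in range(m))
        # eq. (10): local feature fusion, concatenation followed by a 1x1 conv
        self.fuse = nn.Conv2d(ch * m, ch, 1)
        # eq. (11): down-projection q_t, conv 6x6, stride 2, padding 2
        self.down_proj = nn.Conv2d(ch, ch, 6, stride=2, padding=2)

    def forward(self, x):                             # x = F_{d-1}, (B, ch, H, W)
        f = self.up_proj(x)                           # F_{d,0}, eq. (9), (B, ch, 2H, 2W)
        feats = []
        for rb in self.res_blocks:
            f = rb(f)
            feats.append(f)                           # F_{d,1} ... F_{d,M}
        f_lff = self.fuse(torch.cat(feats, dim=1))    # eq. (10)
        return x + self.down_proj(f_lff)              # F_d, eq. (11), back to (B, ch, H, W)
```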
也就是说，本申请实施例结合转置卷积和残差模块提出了残差投影块RPB，其基本思想是利用转置卷积层将低分辨率特征投影到高分辨率特征空间，再利用残差模块学习不同等级的高分辨率特征，然后通过局部特征融合提高残差模块的表达能力，最后利用卷积层将该高分辨率特征投影回低分辨率特征空间。这样，基于该模块，本申请实施例提出了二分之一精度的分像素插值的预设神经网络模型RPNet，并将训练好的模型嵌入到编码平台VTM7.0。如此，在视频编码过程中，本申请实施例可以只对尺寸大于或等于64×64的PU选择RPNet进行运动补偿增强；对于尺寸小于64×64的PU仍然按照相关技术中的插值滤波器进行运动补偿增强。
另外,以第一匹配块为整像素匹配块为例,对于运动补偿增强来说,本申请实施例的方法可以实现二分之一精度的分像素运动补偿,也可以实现四分之一精度的分像素运动补偿,甚至还可以实现其他精度的分像素运动补偿,本申请实施例不作任何限定。
在一种具体的示例中,当分像素样值的精度为二分之一精度时,转置卷积层和第四卷积层的卷积核尺寸均为6×6,步长和填充值均设置为2;或者,当分像素样值的精度为四分之一精度时,转置卷积层和第四卷积层的卷积核尺寸均为8×8,步长和填充值分别设置为4和2。
换句话说,在模型结构中,对于RPNet而言,在一种具体的示例中,RPNet中残差投影块的数量N可设置为10,每个残差投影块中残差模块的数量M可设置为3。除了重建网络层中的卷积层的卷积核个数设置为1外,网络模型中其他转置卷积层或卷积层的卷积核个数都设置为64。上投影模块中的转置卷积层和下投影模块中的卷积层中卷积核的尺寸设置为6×6,步长和填充设置为2。除此之外,网络模型中其他卷积层均采用尺寸3×3的卷积核,而上采样模块可以采用亚像素卷积层。
在另一种具体的示例中,本申请实施例中的RPNet还可以用于所有尺寸的PU进行二分之一精度的分像素样本插值。另外,虽然所有的更改都可能对最终视频的质量产生影响,但是本申请实施例中的RPNet还可以调整残差投影块的数量以及调整残差投影块中残差模块的数量。甚至本申请实施例中的RPNet还可以用于四分之一精度的分像素运动补偿,这时候将上投影模块中的转置卷积层和下投影模块中的卷积层中卷积核的尺寸设置为8×8,步长和填充分别设置为4和2,并在上采样模块中增加一层亚像素卷积层。
还可以理解,对于预设神经网络模型,其可以是通过模型训练得到的。在一些实施例中,该方法还可以包括:
确定训练数据集,该训练数据集包括至少一张训练图像;
对训练数据集进行预处理,得到预设神经网络模型的真值区域以及至少一组输入图像组;其中,该输入图像组包括至少一张输入图像;
基于真值区域,利用这至少一组输入图像组对神经网络模型进行训练,得到至少一组候选模型参数;其中,真值区域用于确定神经网络模型的损失函数的损失值(Loss),这至少一组候选模型参数是在损失函数的损失值收敛到预设阈值时得到的。
需要说明的是,在数据集方面,本申请实施例可以选择一种公开数据集(如DIV2K数据集),其中包含800张训练图像和100张验证图像。在具体的操作过程中,对DIV2K数据集的预处理主要包括格式转换与编码重建两个步骤。首先,对训练集中的800张高分辨率图像和测试集中的100张高分辨率图像以及二者对应的低分辨率图像进行格式转换,由原始的PNG格式转换为YUV420格式。然后,对YUV420格式的高分辨率图像数据提取亮度分量并保存为PNG格式,作为真值区域(Ground Truth)。对YUV420格式的低分辨率图像数据则利用VTM7.0进行全帧内编码,量化参数(Quantization Parameter,QP)可以分别设置为22、27、32、37,再对四组解码重建的数据分别提取亮度分量并保存为PNG格式,作为神经网络模型的输入。由此,可以获得四组训练数据集。
在评价标准方面,本申请实施例选择峰值信噪比(Peak Signal-to-Noise Ratio,PSNR)作为图像重建质量的评价标准。
在网络训练方面,模型是基于Pytorch平台训练的。将尺寸为48×48的低分辨率图像作为输入,批(batch)设置为16。本申请实施例可以选择平均绝对误差作为损失函数,自适应矩估计作为优化函数,并将动量和权重衰减分别设置为0.9和0.0001。初始学习率设置为0.0001,并且每经过100个时期(epoch)则按照0.1的比例下降,一共经过300个epoch。根据不同的QP,利用相应的数据集进行训练,可以得到四组模型参数,这四组模型参数对应四个模型,分别用RPNet_qp22、RPNet_qp27、RPNet_qp32、RPNet_qp37表示。
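The training configuration just described maps onto roughly the following PyTorch setup. `RPNet` is the model sketched after equations (4) to (8); the data loader is a stand-in; and because the text sets "momentum" 0.9 and weight decay 0.0001 alongside Adam, mapping the momentum onto Adam's first beta is our reading, not something the text specifies.

```python
import torch
import torch.nn as nn

model = RPNet()                      # the network sketched after equations (4)-(8)
criterion = nn.L1Loss()              # mean absolute error, as stated above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)

train_loader = []  # stand-in; a real DataLoader over 48x48 DIV2K luma patches, batch 16

for epoch in range(300):             # 300 epochs in total
    for lr_patch, hr_patch in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(lr_patch), hr_patch)
        loss.backward()
        optimizer.step()
    scheduler.step()                 # learning rate scaled by 0.1 every 100 epochs
```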
进一步地,模型训练之后,在一些实施例中,该方法还可以包括:
确定当前块的量化参数;
根据所述量化参数,从至少一组候选模型参数中确定所述量化参数对应的模型参数;
根据所述模型参数,确定预设神经网络模型。
在这里,当至少一组为多组时,输入图像组对应不同的量化参数,且多组候选模型参数与不同的量化参数之间具有对应关系。
也就是说,在模型训练之后,根据当前块的量化参数,可以确定出该量化参数对应的训练好后的模型参数,进而能够确定出本申请实施例使用的预设神经网络模型。
进一步地,在一些实施例中,该方法还可以包括:对所述模型参数进行编码,将编码比特写入码流。
需要说明的是,一方面,如果编码器和解码器使用相同的固定参数的预设神经网络模型,那么这时候参数已经固化,故不需要进行模型参数传输;另一方面,如果码流中传输公共训练数据集的访问信息,例如一个统一资源定位系统(Uniform Resource Locator,URL),解码器使用与编码器相同的方式进行训练;再一方面,对于编码器,可以使用已编码的视频序列进行学习。
还需要说明的是,如果编码器将模型参数写入码流,那么解码器可以不再进行模型训练,通过解析码流获得模型参数后,即可确定出本申请实施例使用的预设神经网络模型。
这样,在确定出预设神经网络模型之后,可以利用预设神经网络模型对第一匹配块进行运动补偿增强,得到至少一个第二匹配块。
S403:根据至少一个第二匹配块,确定当前块的运动信息。
需要说明的是,如果第一匹配块为整像素匹配块,在得到至少一个第二匹配块之后,本申请实施例还需要进行分像素运动估计。在一些实施例中,该方法还可以包括:
根据至少一个第二匹配块对当前块进行分像素运动估计,确定当前块的分像素匹配块。
具体来讲,根据至少一个第二匹配块对当前块进行分像素运动估计,确定当前块在分像素位置的目标匹配块,将分像素位置的目标匹配块确定为所述分像素匹配块;其中,分像素位置的目标匹配块(可称之为“分像素匹配块”)为当前块在分像素位置进行运动估计时率失真代价值最小的匹配块。
在这里,在分像素运动估计过程中,针对多个第二匹配块,需要计算每一个分像素位置的匹配块所对应的率失真代价值(Rate Distortion Cost),然后选取出率失真代价值最小情况下的分像素位置的匹配块,即为最优匹配块,也即本申请实施例所述的目标匹配块。也就是说,分像素位置的目标匹配块(或简称为“分像素匹配块”)为从这多个第二匹配块中选择出最小率失真代价值对应的匹配块。
在一些实施例中,所述根据至少一个第二匹配块,确定当前块的运动信息,可以包括:
利用第一匹配块对当前块进行预编码处理,确定第一率失真代价值;
利用分像素匹配块对当前块进行预编码处理,确定第二率失真代价值;
若第一率失真代价值大于第二率失真代价值,则确定当前块使用运动补偿增强处理方式,且确定运动信息为第一运动信息,该第一运动信息用于指向分像素位置;
若第一率失真代价值小于或等于第二率失真代价值,则确定当前块不使用运动补偿增强处理方式,且确定运动信息为第二运动信息,该第二运动信息用于指向整像素位置。
需要说明的是,当前块是使用运动补偿增强处理方式,还是不使用运动补偿增强处理方式(或者说,使用整像素运动补偿方式),本申请实施例是根据计算率失真代价值的大小确定的。也就是说,编码器最终选取率失真代价值最小的方式进行预测编码。
还需要说明的是,如果确定当前块使用运动补偿增强处理方式,那么这时候的运动信息为第一运动信息,即用于指向分像素位置(即“分像素精度位置”),此时在解码器中也需要进行运动补偿增强以插值出第二匹配块。否则,如果确定当前块不使用运动补偿增强处理方式,那么这时候的运动信息为第二运动信息,即用于指向整像素位置(即“整像素精度位置”),此时解码器是不需要进行运动补偿增强的。
进一步地,在一些实施例中,该方法还可以包括:
若第一率失真代价值大于第二率失真代价值,则确定第一语法元素标识信息的取值为第一值;
若第一率失真代价值小于或等于第二率失真代价值,则确定第一语法元素标识信息的取值为第二值。
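The mode decision and flag assignment described above can be summarized as the following sketch; `precode_cost` stands for the rate-distortion cost obtained by pre-encoding the current block with a given prediction, and the concrete flag values 1 and 0 are only one of the admissible choices listed below, so all names and values here are illustrative.

```python
def choose_motion_compensation_mode(cur_block, integer_match, subpel_match, precode_cost):
    """Return the chosen motion information and the first-syntax-element
    flag value, following the RD comparison described above."""
    cost_integer = precode_cost(cur_block, integer_match)  # first RD cost
    cost_subpel = precode_cost(cur_block, subpel_match)    # second RD cost
    if cost_integer > cost_subpel:
        # enhancement wins: first motion information, pointing to a sub-pel position
        return "first_motion_info", 1   # flag = first value (e.g. 1)
    # integer-pel compensation wins: second motion information
    return "second_motion_info", 0      # flag = second value (e.g. 0)
```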
在本申请实施例中,第一值和第二值不同,而且第一值和第二值可以是参数形式,也可以是数字形式。在通常情况下,第一语法元素标识信息是写入在概述(profile)中的参数,但是第一语法元素标识信息也可以是一个标志(flag),这里并不作任何限定。
也就是说,本申请实施例还可以设置第一语法元素标识信息,该第一语法元素标识信息用于指示当前块是否使用运动补偿增强处理方式。这样,后续在解码器中,根据第一语法元素标识信息的取值,即可以确定当前块是否使用运动补偿增强处理方式。
还需要说明的是,如果第一语法元素标识信息为一个flag,那么在一种具体的示例中,第一值可以设置为1,第二值可以设置为0;在另一具体的示例中,第一值还可以设置为true,第二值还可以设置为false;甚至在又一具体的示例中,第一值还可以设置为0,第二值还可以设置为1;或者,第一值还 可以设置为false,第二值还可以设置为true。本申请实施例第一值和第二值不作任何限定。
除此之外,本申请实施例还可以设置第二语法元素标识信息,该第二语法元素标识信息用于指示当前块是否使用本申请实施例的运动补偿增强方法。在一些实施例中,该方法还可以包括:若第二语法元素标识信息指示当前块使用本申请实施例的运动补偿增强方法,即第二语法元素标识信息的取值为第一值,则执行图4所示的流程;若第二语法元素标识信息指示当前块不使用本申请实施例的运动补偿增强方法,即第二语法元素标识信息的取值为第二值,则执行相关技术的运动补偿增强方法,比如基于DCTIF的分像素运动补偿方法。
进一步地,在一些实施例中,该方法还可以包括:对第一语法元素标识信息的取值进行编码,将编码比特写入码流。如此,编码器在将第一语法元素标识信息的取值写入码流后,使得解码器可以通过解析码流直接确定出当前块是否使用运动补偿增强处理方式,以方便解码器执行后续操作。
S404:根据运动信息,对当前块进行编码。
需要说明的是,运动信息可以至少包括:参考帧信息和运动向量。这样,根据当前块的运动信息,可以从参考帧中确定出预测块。
在一种可能的实施方式中,如果当前块使用运动补偿增强处理方式,那么所述根据运动信息,对当前块进行编码,可以包括:
根据第一运动信息和分像素匹配块,确定当前块的第一预测块;
根据当前块和第一预测块,确定当前块的残差块;
对残差块进行编码,将编码比特写入码流。
在一种具体的示例中,所述根据当前块和第一预测块,确定当前块的残差块,可以包括:对当前块和第一预测块进行减法运算,确定出当前块的残差块。
在另一种可能的实施方式中,如果当前块不使用运动补偿增强处理方式,那么所述根据运动信息,对当前块进行编码,可以包括:
根据第二运动信息和第一匹配块,确定当前块的第二预测块;
根据当前块和第二预测块,确定当前块的残差块;
对残差块进行编码,将编码比特写入码流。
需要说明的是,在本申请实施例中,如果当前块不使用运动补偿增强处理方式,那么即指当前块使用整像素运动补偿方式。
还需要说明的是,在一种具体的示例中,所述根据当前块和第二预测块,确定当前块的残差块,可以包括:对当前块和第二预测块进行减法运算,确定出当前块的残差块。
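In both branches the residual is a plain element-wise subtraction, and the decoder-side reconstruction described later mirrors it with an addition. A minimal sketch with assumed array names:

```python
import numpy as np

cur_block = np.random.randint(0, 256, (64, 64))   # current block (illustrative)
pred_block = np.random.randint(0, 256, (64, 64))  # first or second prediction block

residual = cur_block.astype(np.int32) - pred_block.astype(np.int32)  # encoder: cur - pred
recon = np.clip(residual + pred_block, 0, 255)                       # decoder: residual + pred
assert (recon == cur_block).all()  # lossless round trip before transform/quantization
```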
进一步地,在一些实施例中,该方法还可以包括:对运动信息进行编码,将编码比特写入码流。如此,编码器在将运动信息写入码流后,使得解码器通过解析码流确定出运动信息之后,根据运动信息即可以确定出当前块的预测块(第一预测块或第二预测块),以方便解码器执行后续操作。
简言之,考虑到低分辨率特征与高分辨特征之间的联系,受投影思想的启发,本申请实施例结合转置卷积和残差网络,提出了残差投影块。然后基于残差投影块,本申请实施例提出了一种二分之一精度的分像素插值网络RPNet,并将其应用在VTM7.0中。
这样,本申请实施例所提出的技术方案在VTM7.0上实现后,在低延迟(Low Delay)的P条件下对视频序列进行编码实验。如表1所示,其示出了应用基于RPNet的分像素运动补偿方法的VTM7.0编码结果。与VTM7.0相比,在同样解码质量的前提下,本技术方案提出的方法可以使得码率(BD-Rate)平均降低0.18%。尤其是对于SlideShow视频序列,该方法可以使得码率降低0.57%,进一步说明了基于RPNet的二分之一精度的分像素运动补偿方法的有效性。
表1　应用基于RPNet的分像素运动补偿方法的VTM7.0编码结果
（表1原为图像：逐测试序列的BD-Rate结果，平均码率降低0.18%，其中SlideShow序列降低0.57%。）
本申请实施例提供了一种编码方法,应用于编码器。通过确定当前块的第一匹配块;对第一匹配块进行运动补偿增强,得到至少一个第二匹配块;根据至少一个第二匹配块,确定当前块的运动信息;根据运动信息,对当前块进行编码。这样,利用预设神经网络模型进行运动补偿增强,在保证相同解码质量的前提下,不仅可以降低计算复杂度,还可以节省码率,进而能够提高编解码效率。
本申请的另一实施例中,本申请实施例提供一种码流,该码流是根据待编码信息进行比特编码生成的。其中,待编码信息至少包括当前块的运动信息、当前块的残差块和第一语法元素标识信息的取值,该第一语法元素标识信息用于指示当前块是否使用运动补偿增强处理方式。
在本申请实施例中,码流可以由编码器传输到解码器,以方便解码器执行后续操作。
本申请的又一实施例中,参见图7,其示出了本申请实施例提供的一种解码方法的流程示意图。如图7所示,该方法可以包括:
S701:解析码流,确定第一语法元素标识信息的取值。
需要说明的是,对于一视频图像而言,视频图像可以划分为多个图像块,每个待解码的图像块均可以称为解码块,而这里的当前块具体是指当前待进行帧间预测的解码块。其中,当前块可以是一个CTU,甚至可以是一个CU、PU等,本申请实施例不作任何限定。
还需要说明的是,本申请实施例的解码方法主要应用于帧间预测的运动补偿部分。其中,运动补偿是利用解码重建的参考帧中的局部图像来预测和补偿当前的局部图像,可以减少运动图像的冗余信息。
在一些实施例中,所述解析码流,确定第一语法元素标识信息的取值,可以包括:
若第一语法元素标识信息的取值为第一值,则确定当前块使用运动补偿增强处理方式;
若第一语法元素标识信息的取值为第二值,则确定当前块不使用运动补偿增强处理方式。
需要说明的是,第一语法元素标识信息用于指示当前块是否使用运动补偿增强处理方式。另外,第一值和第二值不同,而且第一值和第二值可以是参数形式,也可以是数字形式。在通常情况下,第一语法元素标识信息是写入在概述(profile)中的参数,但是第一语法元素标识信息也可以是一个标志(flag),这里并不作任何限定。
还需要说明的是,如果第一语法元素标识信息为一个flag,那么在一种具体的示例中,第一值可以设置为1,第二值可以设置为0;在另一具体的示例中,第一值还可以设置为true,第二值还可以设置为false;甚至在又一具体的示例中,第一值还可以设置为0,第二值还可以设置为1;或者,第一值还可以设置为false,第二值还可以设置为true。本申请实施例第一值和第二值不作任何限定。
以第一值为1,第二值为0为例,在本申请实施例中,如果第一语法元素标识信息的取值为1,那么可以确定当前块使用运动补偿增强处理方式,那么这时候解码获得的运动信息为第一运动信息,即用于指向分像素位置,此时在解码器中也需要进行运动补偿增强以插值出第二匹配块。否则,如果第一语法元素标识信息的取值为0,那么可以确定当前块不使用运动补偿增强处理方式,那么这时候解码获得的运动信息为第二运动信息,即用于指向整像素位置,此时解码器是不需要进行运动补偿增强的。
S702:若第一语法元素标识信息指示当前块使用运动补偿增强处理方式,则解析码流,确定当前块的第一运动信息。
S703:根据第一运动信息确定当前块的第一匹配块,并对第一匹配块进行运动补偿增强,得到至少一个第二匹配块。
在本申请实施例中,运动信息可以包括参考帧信息和运动向量(Motion Vector,MV)信息。以VVC为例,是否使用分像素运动补偿是由解析码流所确定的MV精度决定的。例如,标识MV是整像素精度还是分像素精度。如果是分像素精度,例如1/4像素,MV分量的低2个比特位均为0,那么可以表示该MV指向整像素精度位置;反之,指向分像素精度位置。
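Taking the quarter-pel MV storage cited above, the check on the two low-order bits can be written as follows; the field names are assumptions of this sketch.

```python
def mv_points_to_integer_pel(mv_x, mv_y):
    """With motion vector components stored in quarter-pel units, a component
    is an integer-pel displacement exactly when its two low-order bits are 0."""
    return (mv_x & 3) == 0 and (mv_y & 3) == 0

# Example: mv = (8, 4) is (2, 1) in integer pels -> True;
#          mv = (8, 2) has a half-pel vertical part -> False.
```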
需要说明的是,这里的第一匹配块可以是指向整像素精度位置,也可以是使用相关技术的分像素插值滤波方式所指向的分像素精度位置,本申请实施例并不作任何限定。
还需要说明的是,如果第一匹配块可以是指向整像素精度位置,那么当第一语法元素标识信息指示当前块使用运动补偿增强处理方式时,这时候解码器需要进行分像素运动补偿以插值出第二匹配块。在这里,由于解码器中已解码的参考帧均为整像素位置,对于整像素位置中间的分像素位置,则是需要通过插值方式得到,本申请实施例是采用基于预设神经网络模型的分像素运动补偿方式实现的。
在一些实施例中,所述对第一匹配块进行运动补偿增强的步骤还可以包括:利用预设神经网络模型对第一匹配块进行运动补偿增强。
相应地,所述对第一匹配块进行运动补偿增强,得到至少一个第二匹配块,可以包括:
利用预设神经网络模型对第一匹配块进行超分辨率和质量增强处理,得到处理块;
对处理块进行第一滤波处理,得到至少一个第二匹配块。
需要说明的是,处理块的分辨率高于当前块的分辨率。或者,也可以说,经过超分辨率和质量增强处理后所得到的处理块具有高质量和高分辨率性能。
还需要说明的是,第一匹配块与当前块具有相同的分辨率,而且经过第一滤波处理后得到的第二匹配块也具有与当前块相同的分辨率。
进一步地,在一些实施例中,第一滤波处理可以包括:下采样。也就是说,在得到处理块之后,对处理块进行下采样,可以得到至少一个第二匹配块。
在一种具体的示例中,如果第一匹配块为整像素匹配块,所述利用预设神经网络模型对第一匹配块进行运动补偿增强,得到至少一个第二匹配块,可以包括:
利用预设神经网络模型对第一匹配块进行分像素运动补偿,得到至少一个第二匹配块。
在这里,如果第一匹配块为整像素匹配块,那么在一种可能的实施方式中,第二匹配块的精度为二分之一精度,该第二匹配块的数量为四个;在另一种可能的实施方式中,第二匹配块的精度为四分之一精度,该第二匹配块的数量为16个;但是本申请实施例不作任何限定。
可以理解地,预设神经网络模型可以为一种卷积神经网络模型。本申请实施例可以利用卷积神经网络模型对第一匹配块实现端到端的超分辨率和质量增强,然后对输出的高分辨率图像进行下采样,能够生成四个二分之一精度的分像素样本(即“第二匹配块”)。
进一步地,在一些实施例中,预设神经网络模型可以包括特征提取模块、残差投影模块组、采样模块和重建模块;其中,特征提取模块、残差投影模块组、采样模块和重建模块顺次连接。
相应地,基于预设神经网络模型的具体结构,所述对第一匹配块进行超分辨率和质量增强处理,得到处理块,可以包括:
通过特征提取模块对第一匹配块进行浅层特征提取,得到第一特征信息;
通过残差投影模块组对第一特征信息进行残差特征学习,得到第二特征信息;
通过采样模块对第二特征信息进行第二滤波处理,得到第三特征信息;
通过重建模块对第三特征信息进行超分辨率重建,得到处理块。
在这里,特征提取模块又可以称为“浅层特征提取模块”。在一种具体的示例中,特征提取模块可以为第一卷积层。相应地,所述通过特征提取模块对第一匹配块进行浅层特征提取,得到第一特征信息,可以包括:通过第一卷积层对第一匹配块进行卷积操作,得到第一特征信息。需要注意的是,这里的浅层特征主要是指低层次的简单特征(如边缘特征等)。
在一些实施例中,第一卷积层的卷积核尺寸为K×L,第一卷积层的卷积核数量为2的整数次幂,K和L为大于零的正整数。在一种更具体的示例中,第一卷积层的卷积核尺寸可以为3×3,第一卷积层的卷积核数量为64,但是这里并不作任何限定。
进一步地,对于残差投影模块组,在一种具体的示例中,残差投影模块组可以包括N个残差投影块、第二卷积层和第一连接层;其中,N为大于或等于1的整数。
在本申请实施例中,N个残差投影块、第二卷积层和第一连接层顺次连接,且第一连接层还与N个残差投影块中第一个残差投影块的输入连接。
相应地,所述通过残差投影模块组对第一特征信息进行残差特征学习,得到第二特征信息,包括:
通过N个残差投影块对第一特征信息进行残差特征学习,得到第一中间特征信息;
通过第二卷积层对第一中间特征信息进行卷积操作,得到第二中间特征信息;
通过第一连接层对第一特征信息和第二中间特征信息进行加法计算,得到第二特征信息。
在一些实施例中,第二卷积层的卷积核尺寸为K×L,第二卷积层的卷积核数量为2的整数次幂,K和L为大于零的正整数。在一种更具体的示例中,第二卷积层的卷积核尺寸为3×3,第二卷积层的卷积核数量为64,但是这里也不作任何限定。
进一步地,针对堆叠残差投影块的方式,N个残差投影块是级联结构,该级联结构的输入为所述第一特征信息,该级联结构的输出为所述第二中间特征信息。在一些实施例中,所述通过N个残差投影块对第一特征信息进行残差特征学习,得到第一中间特征信息,可以包括:
当N等于1时,将第一特征信息输入到第一个残差投影块,得到第一个残差投影块的输出信息,并将第一个残差投影块的输出信息确定为第一中间特征信息;
当N大于1时,在得到第一个残差投影块的输出信息后,将第d个残差投影块的输出信息输入到第d+1个残差投影块,得到第d+1个残差投影块的输出信息,并对d执行加1处理,直至得到第N个残差投影块的输出信息,并将第N个残差投影块的输出信息确定为第一中间特征信息;其中,d为大于或等于1且小于N的整数。
在这里,如果N等于1,即残差投影模块组中仅包括1个残差投影块,该残差投影块的输出信息即 为第一中间特征信息;如果N大于1,即残差投影模块组中包括有两个以上残差投影块,通过堆叠残差学习方式,即前一个残差投影块的输出即为下一个残差投影块的输入,直至堆叠得到最后一个残差投影块的输出信息,这时候最后一个残差投影块的输出信息即为第一中间特征信息。
可以理解,考虑到低分辨率特征与高分辨率特征之间的联系,受投影思想的启发,本申请实施例提出了残差投影模块(或简称为“残差投影块”)。在一些实施例中,残差投影块可以包括上投影模块、M个残差模块、局部特征融合模块、下投影模块和第二连接层;其中,M为大于或等于1的整数。
在本申请实施例中,上投影模块、M个残差模块、局部特征融合模块、下投影模块和第二连接层顺次连接,且第二连接层还与上投影模块的输入连接,M个残差模块的输出还分别与局部特征融合模块连接。
相应地,对于残差投影块的具体网络结构,该方法还可以包括:
通过上投影模块对残差投影块的输入信息进行第三滤波处理,得到第一高分辨率特征信息;
通过M个残差模块对第一高分辨率特征信息进行不同等级的高分辨率特征学习,得到M个第二高分辨率特征信息;
通过局部特征融合模块对M个第二高分辨率特征信息进行融合操作,得到第三高分辨率特征信息;
通过下投影模块对第三高分辨率特征信息进行第四滤波处理,得到滤波后特征信息;
通过第二连接层对输入信息和滤波后特征信息进行加法计算,得到所述残差投影块的输出信息。
进一步地,对于上投影模块,在一种具体的示例中,上投影模块可以包括转置卷积层。相应地,所述通过上投影模块对残差投影块的输入信息进行第三滤波处理,得到第一高分辨率特征信息,可以包括:
通过转置卷积层对残差投影块的输入信息进行第三滤波处理,得到第一高分辨率特征信息。
需要说明的是,经过第三滤波处理后得到的第一高分辨率特征信息的分辨率高于残差投影块的输入信息的分辨率。
还需要说明的是,第三滤波处理可以包括:上采样。也就是说,通过转置卷积层对残差投影块的输入信息进行上采样,得到第一高分辨率特征信息。
进一步地,对于局部特征融合模块,在一种具体的示例中,局部特征融合模块可以包括特征融合层和第三卷积层,所述通过局部特征融合模块对M个第二高分辨率特征信息进行融合操作,得到第三高分辨率特征信息,包括:
通过特征融合层对M个第二高分辨率特征信息进行融合操作,得到融合特征信息;
通过第三卷积层对融合特征信息进行卷积操作,得到第三高分辨率特征信息。
在本申请实施例中,第三卷积层的卷积核尺寸为K×L,第三卷积层的卷积核数量为2的整数次幂,K和L为大于零的正整数。在一种更具体的示例中,第三卷积层的卷积核尺寸为1×1,第三卷积层的卷积核数量为64,但是这里并不作任何限定。
也就是说,在本申请实施例中,通过特征融合层对M个第二高分辨率特征信息进行融合操作,但是为了充分发挥残差网络的学习能力,这里还引入了1×1卷积层对残差模块学习到的特征信息进行融合操作,能够自适应地控制学习到的特征信息。
进一步地,对于下投影模块,在一种具体的示例中,下投影模块可以包括第四卷积层,所述通过下投影模块对第三高分辨率特征信息进行第四滤波处理,得到滤波后特征信息,包括:
通过第四卷积层对第三高分辨率特征信息进行第四滤波处理,得到滤波后特征信息。
需要说明的是,经过第四滤波处理后得到的滤波后特征信息的分辨率低于第三高分辨率特征信息的分辨率。另外,经过第四滤波处理后得到的滤波后特征信息与残差投影块的输入信息具有相同的分辨率。
还需要说明的是,第四滤波处理可以包括:下采样。也就是说,通过第四卷积层对第三高分辨率特征信息进行下采样,得到滤波后特征信息。
除此之外,对于预设神经网络模块的采样模块而言,在一种具体的示例中,采样模块可以包括亚像素卷积层。相应地,所述通过采样模块对第二特征信息进行第二滤波处理,得到第三特征信息,包括:
通过亚像素卷积层对第二特征信息进行第二滤波处理,得到第三特征信息。
需要说明的是,经过第二滤波处理后得到的第三特征信息的分辨率高于第二特征信息的分辨率。
还需要说明的是,第二滤波处理可以包括:上采样。也就是说,对于采样模块,其主要是用于对第二特征信息进行上采样,故该采样模块又可以称为“上采样模块”。
还需要说明的是，该采样模块可以采用亚像素卷积层，也可以是增加一层亚像素卷积层。在这里，亚像素卷积层也可以是PixShuffle模块（或者，称为Pixelshuffle模块），其实现的功能是：将一个H×W的低分辨率输入图像，通过亚像素（Sub-pixel）操作将其变换为rH×rW的高分辨率输入图像。但是其实现过程不是直接通过插值等方式产生这个高分辨率图像，而是通过卷积先得到 $r^2$ 个通道的特征图（特征图大小和输入低分辨率图像一致），然后通过周期筛选（periodic shuffling）的方法得到这个高分辨率图像；其中，r可以为图像的放大倍数。具体地，针对特征图，其通道数为 $r^2$，将每个像素的 $r^2$ 个通道重新排列成一个r×r的区域，对应于高分辨率图像中的一个r×r大小的子块，从而大小为 $r^2\times H\times W$ 的特征图像被重新排列成 $1\times rH\times rW$ 大小的高分辨率图像。
进一步地,对于预设神经网络模块的重建模块而言,在一种具体的示例中,重建模块可以包括第五卷积层。相应地,所述通过重建模块对第三特征信息进行超分辨率重建,得到处理块,可以包括:
通过第五卷积层对第三特征信息进行卷积操作,得到处理块。
在本申请实施例中,第五卷积层的卷积核尺寸为K×L,第五卷积层的卷积核数量为2的整数次幂,K和L为大于零的正整数。在一种更具体的示例中,第五卷积层的卷积核尺寸为3×3,第五卷积层的卷积核数量为1,但是这里也不作任何限定。
示例性地，图5示出了本申请实施例提供的一种预设神经网络模型的网络结构示例，图6示出了本申请实施例提供的一种残差投影块的网络结构示例。也就是说，本申请实施例结合转置卷积和残差模块提出了残差投影块RPB，其基本思想是利用转置卷积层将低分辨率特征投影到高分辨率特征空间，再利用残差模块学习不同等级的高分辨率特征，然后通过局部特征融合提高残差模块的表达能力，最后利用卷积层将该高分辨率特征投影回低分辨率特征空间。这样，基于该模块，本申请实施例提出了二分之一精度的分像素插值的预设神经网络模型RPNet，并将训练好的模型嵌入到编码平台VTM7.0。如此，在视频编码过程中，本申请实施例可以只对尺寸大于或等于64×64的PU选择RPNet进行运动补偿增强；对于尺寸小于64×64的PU仍然按照相关技术中的插值滤波器进行运动补偿增强。
另外,以第一匹配块为整像素匹配块为例,对于运动补偿增强来说,本申请实施例的方法可以实现二分之一精度的分像素运动补偿,也可以实现四分之一精度的分像素运动补偿,甚至还可以实现其他精度的分像素运动补偿,本申请实施例不作任何限定。
在一种具体的示例中,当分像素样值的精度为二分之一精度时,转置卷积层和第四卷积层的卷积核尺寸均为6×6,步长和填充值均设置为2;或者,当分像素样值的精度为四分之一精度时,转置卷积层和第四卷积层的卷积核尺寸均为8×8,步长和填充值分别设置为4和2。
换句话说,在模型结构中,对于RPNet而言,在一种具体的示例中,RPNet中残差投影块的数量N可设置为10,每个残差投影块中残差模块的数量M可设置为3。除了重建网络层中的卷积层的卷积核个数设置为1外,网络模型中其他转置卷积层或卷积层的卷积核个数都设置为64。上投影模块中的转置卷积层和下投影模块中的卷积层中卷积核的尺寸设置为6×6,步长和填充设置为2。除此之外,网络模型中其他卷积层均采用尺寸3×3的卷积核,而上采样模块可以采用亚像素卷积层。
在另一种具体的示例中,本申请实施例中的RPNet还可以用于所有尺寸的PU进行二分之一精度的分像素样本插值。另外,虽然所有的更改都可能对最终视频的质量产生影响,但是本申请实施例中的RPNet还可以调整残差投影块的数量以及调整残差投影块中残差模块的数量。甚至本申请实施例中的RPNet还可以用于四分之一精度的分像素运动补偿,这时候将上投影模块中的转置卷积层和下投影模块中的卷积层中卷积核的尺寸设置为8×8,步长和填充分别设置为4和2,并在上采样模块中增加一层亚像素卷积层。
进一步地,对于预设神经网络模型,其可以是通过模型训练得到的。在一些实施例中,该方法还可以包括:
确定训练数据集,该训练数据集包括至少一张训练图像和至少一张验证图像;
对训练数据集进行预处理,得到预设神经网络模型的真值区域以及至少一组输入图像组;其中,该输入图像组包括至少一张输入图像;
基于真值区域,利用这至少一组输入图像组对神经网络模型进行训练,得到至少一组候选模型参数;其中,真值区域用于确定神经网络模型的损失函数的损失值(Loss),这至少一组候选模型参数是在损失函数的损失值收敛到预设阈值时得到的。
需要说明的是,在数据集方面,本申请实施例可以选择公开数据集DIV2K,其中包含800张训练图像和100张验证图像。在具体的操作过程中,对DIV2K数据集的预处理主要包括格式转换与编码重建两个步骤。首先,对训练集中的800张高分辨率图像和测试集中的100张高分辨率图像以及二者对应的低分辨率图像进行格式转换,由原始的PNG格式转换为YUV420格式。然后,对YUV420格式的高分辨率图像数据提取亮度分量并保存为PNG格式,作为真值区域(Ground Truth)。对YUV420格式的低分辨率图像数据则利用VTM7.0进行全帧内编码,量化参数(Quantization Parameter,QP)可以分别设置为22、27、32、37,再对四组解码重建的数据分别提取亮度分量并保存为PNG格式,作为神经网络模型的输入。由此,可以获得四组训练数据集。
在评价标准方面,本申请实施例选择峰值信噪比(Peak Signal-to-Noise Ratio,PSNR)作为图像重建质量的评价标准。
在网络训练方面,模型是基于Pytorch平台训练的。将尺寸为48×48的低分辨率图像作为输入,批(batch)设置为16。本申请实施例可以选择平均绝对误差作为损失函数,自适应矩估计作为优化函数,并将动量和权重衰减分别设置为0.9和0.0001。初始学习率设置为0.0001,并且每经过100个时期(epoch)则按照0.1的比例下降,一共经过300个epoch。根据不同的QP,利用相应的数据集训练,得到四组模型参数,这四组模型参数对应四个模型,分别用RPNet_qp22、RPNet_qp27、RPNet_qp32、RPNet_qp37表示。
进一步地,对于预设神经网络模型的确定,可以通过下述两种方式来实现。
在一种可能的实施方式中,模型训练之后,该方法还可以包括:
确定当前块的量化参数;
根据所述量化参数,从至少一组候选模型参数中确定所述量化参数对应的模型参数;
根据所述模型参数,确定预设神经网络模型。
在这里,当至少一组为多组时,输入图像组对应不同的量化参数,且多组候选模型参数与不同的量化参数之间具有对应关系。
在另一种可能的实施方式中,该方法还可以包括:
解析所述码流,获取模型参数;
根据所述模型参数,确定所述预设神经网络模型。
需要说明的是,解码器可以在模型训练之后,根据当前块的量化参数来确定出该量化参数对应的训练好后的模型参数,然后确定出本申请实施例使用的预设神经网络模型;或者,解码器还可以通过解析码流获得模型参数,然后根据该模型参数即可确定出本申请实施例使用的预设神经网络模型;本申请实施例对此不作具体限定。
还需要说明的是,一方面,如果编码器和解码器使用相同的固定参数的预设神经网络模型,那么这时候参数已经固化,故不需要进行模型参数传输;另一方面,如果码流中传输公共训练数据集的访问信息,例如一个统一资源定位系统(Uniform Resource Locator,URL),解码器使用与编码器相同的方式进行训练;再一方面,对于编码器,可以使用已编码的视频序列进行学习。
这样,如果第一语法元素标识信息指示当前块使用运动补偿增强处理方式,那么在确定出第一运动信息和预设神经网络模型之后,可以利用预设神经网络模型对第一匹配块进行运动补偿增强,得到至少一个第二匹配块。
S704:根据第一运动信息和至少一个第二匹配块,确定当前块的第一预测块。
S705:根据第一预测块,确定当前块的重建块。
需要说明的是,解码器还需要解码获得当前块的残差块。在一些实施例中,该方法还可以包括:解析码流,获取当前块的残差块。这样,所述根据第一预测块,确定当前块的重建块,可以包括:根据残差块和第一预测块,确定当前块的重建块。
在一种具体的示例中,所述根据残差块和第一预测块,确定当前块的重建块,可以包括:对残差块和第一预测块进行加法运算,确定出当前块的重建块。
除此之外,在本申请实施例中,第一语法元素标识信息还可以指示当前块不使用运动补偿增强处理方式,即当前块使用整像素运动补偿方式。这时候,在一些实施例中,该方法还可以包括:
若第一语法元素标识信息指示当前块不使用运动补偿增强处理方式,则解析码流,获取当前块的第二运动信息,该第二运动信息用于指向整像素位置;
根据当前块的第二运动信息,确定当前块的第二预测块;
根据第二预测块,确定当前块的重建块。
需要说明的是,如果当前块使用整像素运动补偿方式,这时候不再需要进行运动补偿增强;根据解码获得的第二运动信息,即可以确定出当前块的第二预测块。
还需要说明的是,在确定当前块的重建块过程中,解码器仍然需要解码获得当前块的残差块。具体地,在一些实施例中,该方法还可以包括:解析码流,获取当前块的残差块。这样,所述根据第二预测块,确定当前块的重建块,可以包括:根据残差块和第二预测块,确定当前块的重建块。
在一种具体的示例中,所述根据残差块和第二预测块,确定当前块的重建块,可以包括:对残差块和第二预测块进行加法运算,确定出当前块的重建块。
简言之，在解码器中，如果解码获得当前块使用运动补偿增强处理方式，那么考虑到低分辨率特征与高分辨特征之间的联系，受投影思想的启发，这时候可以结合转置卷积和残差网络来提出残差投影块。然后基于残差投影块，本申请实施例提出了一种二分之一精度的分像素插值网络RPNet，并将其应用在VTM7.0中。
本申请实施例提供了一种解码方法,应用于解码器。通过解析码流,确定第一语法元素标识信息的 取值;若第一语法元素标识信息指示当前块使用运动补偿增强处理方式,则解析码流,确定当前块的第一运动信息;根据第一运动信息确定当前块的第一匹配块,并对第一匹配块进行运动补偿增强,得到至少一个第二匹配块;根据第一运动信息和至少一个第二匹配块,确定当前块的第一预测块;根据第一预测块,确定当前块的重建块。这样,当解码获得当前块使用运动补偿增强处理方式时,这时候利用预设神经网络模型进行运动补偿增强,在保证相同解码质量的前提下,不仅可以降低计算复杂度,还可以节省码率,进而能够提高编解码效率。
本申请的再一实施例中,基于前述实施例相同的发明构思,参见图8,其示出了本申请实施例提供的一种编码器80的组成结构示意图。如图8所示,该编码器80可以包括:第一确定单元801、第一运动补偿单元802和编码单元803;其中,
第一确定单元801,配置为确定当前块的第一匹配块;
第一运动补偿单元802,配置为对第一匹配块进行运动补偿增强,得到至少一个第二匹配块;
第一确定单元801,还配置为根据至少一个第二匹配块,确定当前块的运动信息;
编码单元803,配置为根据运动信息,对当前块进行编码。
在一些实施例中,第一运动补偿单元802,具体配置为对第一匹配块进行超分辨率和质量增强处理,得到处理块;以及对处理块进行第一滤波处理,得到至少一个第二匹配块,其中,经过第一滤波处理后得到的第二匹配块具有与当前块相同的分辨率。
在一些实施例中,第一滤波处理包括:下采样。
在一些实施例中,第一运动补偿单元802,还配置为利用预设神经网络模型对第一匹配块进行运动补偿增强;其中,预设神经网络模型包括特征提取模块、残差投影模块组、采样模块和重建模块,且特征提取模块、残差投影模块组、采样模块和重建模块顺次连接;
相应地,第一运动补偿单元802,还配置为通过特征提取模块对第一匹配块进行浅层特征提取,得到第一特征信息;以及通过残差投影模块组对第一特征信息进行残差特征学习,得到第二特征信息;以及通过采样模块对第二特征信息进行第二滤波处理,得到第三特征信息;以及通过重建模块对第三特征信息进行超分辨率重建,得到处理块。
在一些实施例中,特征提取模块包括第一卷积层;相应地,第一运动补偿单元802,还配置为通过第一卷积层对第一匹配块进行卷积操作,得到第一特征信息。
在一些实施例中,残差投影模块组包括N个残差投影块、第二卷积层和第一连接层,N个残差投影块、第二卷积层和第一连接层顺次连接,且第一连接层还与N个残差投影块中第一个残差投影块的输入连接;
相应地,第一运动补偿单元802,还配置为通过N个残差投影块对第一特征信息进行残差特征学习,得到第一中间特征信息,其中,N为大于或等于1的整数;以及通过第二卷积层对第一中间特征信息进行卷积操作,得到第二中间特征信息;以及通过第一连接层对第一特征信息和第二中间特征信息进行加法计算,得到第二特征信息。
在一些实施例中,N个残差投影块是级联结构,级联结构的输入为第一特征信息,级联结构的输出为第二中间特征信息。
在一些实施例中,第一运动补偿单元802,还配置为当N等于1时,将第一特征信息输入到第一个残差投影块,得到第一个残差投影块的输出信息,并将第一个残差投影块的输出信息确定为第一中间特征信息;以及当N大于1时,在得到第一个残差投影块的输出信息后,将第d个残差投影块的输出信息输入到第d+1个残差投影块,得到第d+1个残差投影块的输出信息,并对d执行加1处理,直至得到第N个残差投影块的输出信息,并将第N个残差投影块的输出信息确定为第一中间特征信息;其中,d为大于或等于1且小于N的整数。
在一些实施例中,残差投影块包括上投影模块、M个残差模块、局部特征融合模块、下投影模块和第二连接层,上投影模块、M个残差模块、局部特征融合模块、下投影模块和第二连接层顺次连接,且第二连接层还与上投影模块的输入连接,M个残差模块的输出还分别与局部特征融合模块连接;
相应地,第一运动补偿单元802,还配置为通过上投影模块对残差投影块的输入信息进行第三滤波处理,得到第一高分辨率特征信息;以及通过M个残差模块对第一高分辨率特征信息进行不同等级的高分辨率特征学习,得到M个第二高分辨率特征信息,其中,M为大于或等于1的整数;以及通过局部特征融合模块对M个第二高分辨率特征信息进行融合操作,得到第三高分辨率特征信息;以及通过下投影模块对第三高分辨率特征信息进行第四滤波处理,得到滤波后特征信息;以及通过第二连接层对输入信息和滤波后特征信息进行加法计算,得到残差投影块的输出信息。
在一些实施例中,上投影模块包括转置卷积层;相应地,第一运动补偿单元802,还配置为通过转 置卷积层对残差投影块的输入信息进行第三滤波处理,得到第一高分辨率特征信息,其中,经过第三滤波处理后得到的第一高分辨率特征信息的分辨率高于残差投影块的输入信息的分辨率。
在一些实施例中,第三滤波处理包括:上采样。
在一些实施例中,局部特征融合模块包括特征融合层和第三卷积层;相应地,第一运动补偿单元802,还配置为通过特征融合层对M个第二高分辨率特征信息进行融合操作,得到融合特征信息;以及通过第三卷积层对融合特征信息进行卷积操作,得到第三高分辨率特征信息。
在一些实施例中,下投影模块包括第四卷积层;相应地,第一运动补偿单元802,还配置为通过第四卷积层对第三高分辨率特征信息进行第四滤波处理,得到滤波后特征信息,其中,经过第四滤波处理后得到的滤波后特征信息的分辨率低于第三高分辨率特征信息的分辨率。
在一些实施例中,第四滤波处理包括:下采样。
在一些实施例中,采样模块包括亚像素卷积层;相应地,第一运动补偿单元802,还配置为通过亚像素卷积层对第二特征信息进行第二滤波处理,得到第三特征信息,其中,经过第二滤波处理后得到的第三特征信息的分辨率高于第二特征信息的分辨率。
在一些实施例中,第二滤波处理包括:上采样。
在一些实施例中,重建模块包括第五卷积层;相应地,第一运动补偿单元802,还配置为通过第五卷积层对第三特征信息进行卷积操作,得到处理块。
在一些实施例中,参见图8,编码器80还可以包括第一训练单元804,配置为确定训练数据集,训练数据集包括至少一张训练图像;以及对训练数据集进行预处理,得到预设神经网络模型的真值区域以及至少一组输入图像组;其中,输入图像组包括至少一张输入图像;以及基于真值区域,利用至少一组输入图像组对神经网络模型进行训练,得到至少一组候选模型参数;其中,真值区域用于确定神经网络模型的损失函数的损失值,至少一组候选模型参数是在损失函数的损失值收敛到预设阈值时得到的。
在一些实施例中,第一确定单元801,还配置为确定当前块的量化参数;根据量化参数,从至少一组候选模型参数中确定量化参数对应的模型参数;以及根据模型参数,确定预设神经网络模型;其中,当至少一组为多组时,输入图像组对应不同的量化参数,且多组候选模型参数与不同的量化参数之间具有对应关系。
在一些实施例中,编码单元803,还配置为对模型参数进行编码,将编码比特写入码流。
在一些实施例中,参见图8,编码器80还可以包括运动估计单元805,配置为对当前块进行整像素运动估计,确定当前块的第一匹配块;其中,第一匹配块为当前块在整像素位置进行运动估计时率失真代价值最小的匹配块;
第一运动补偿单元802,还配置为利用预设神经网络模型对第一匹配块进行分像素运动补偿,得到至少一个第二匹配块。
在一些实施例中,运动估计单元805,还配置为根据至少一个第二匹配块对当前块进行分像素运动估计,确定当前块的分像素匹配块,分像素匹配块为当前块在分像素位置进行运动估计时率失真代价值最小的匹配块;
第一确定单元801,还配置为利用第一匹配块对当前块进行预编码处理,确定第一率失真代价值;以及利用分像素匹配块对当前块进行预编码处理,确定第二率失真代价值;以及若第一率失真代价值大于第二率失真代价值,则确定当前块使用运动补偿增强处理方式,且确定运动信息为第一运动信息,第一运动信息用于指向分像素位置;或者,若第一率失真代价值小于或等于第二率失真代价值,则确定当前块不使用运动补偿增强处理方式,且确定运动信息为第二运动信息,第二运动信息用于指向整像素位置。
在一些实施例中,第一确定单元801,还配置为若第一率失真代价值大于第二率失真代价值,则确定第一语法元素标识信息的取值为第一值;或者,若第一率失真代价值小于或等于第二率失真代价值,则确定第一语法元素标识信息的取值为第二值;其中,第一语法元素标识信息用于指示当前块是否使用运动补偿增强处理方式。
在一些实施例中,编码单元803,还配置为对第一语法元素标识信息的取值进行编码,将编码比特写入码流。
在一些实施例中,编码单元803,还配置为当当前块使用运动补偿增强处理方式时,根据第一运动信息和分像素匹配块,确定当前块的第一预测块;以及根据当前块和第一预测块,确定当前块的残差块;以及对残差块进行编码,将编码比特写入码流;
或者,编码单元803,还配置为当当前块不使用运动补偿增强处理方式时,根据第二运动信息和第一匹配块,确定当前块的第二预测块;以及根据当前块和第二预测块,确定当前块的残差块;以及对残差块进行编码,将编码比特写入码流。
在一些实施例中,编码单元803,还配置为对运动信息进行编码,将编码比特写入码流。
可以理解地,在本申请实施例中,“单元”可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是模块,还可以是非模块化的。而且在本实施例中的各组成部分可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
所述集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中,基于这样的理解,本实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或processor(处理器)执行本实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
因此,本申请实施例提供了一种计算机存储介质,应用于编码器80,该计算机存储介质存储有计算机程序,所述计算机程序被第一处理器执行时实现前述实施例中任一项所述的方法。
基于上述编码器80的组成以及计算机存储介质,参见图9,其示出了本申请实施例提供的编码器80的具体硬件结构示意图。如图9所示,可以包括:第一通信接口901、第一存储器902和第一处理器903;各个组件通过第一总线系统904耦合在一起。可理解,第一总线系统904用于实现这些组件之间的连接通信。第一总线系统904除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图9中将各种总线都标为第一总线系统904。其中,
第一通信接口901,用于在与其他外部网元之间进行收发信息过程中,信号的接收和发送;
第一存储器902,用于存储能够在第一处理器903上运行的计算机程序;
第一处理器903,用于在运行所述计算机程序时,执行:
确定当前块的第一匹配块;
对第一匹配块进行运动补偿增强,得到至少一个第二匹配块;
根据至少一个第二匹配块,确定当前块的运动信息;
根据运动信息,对当前块进行编码。
可以理解,本申请实施例中的第一存储器902可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDRSDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DRRAM)。本申请描述的系统和方法的第一存储器902旨在包括但不限于这些和任意其它适合类型的存储器。
而第一处理器903可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过第一处理器903中的硬件的集成逻辑电路或者软件形式的指令完成。上述的第一处理器903可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于第一存储器902,第一处理器903读取第一存储器902中的信息,结合其硬件完成上述方法的步骤。
可以理解的是,本申请描述的这些实施例可以用硬件、软件、固件、中间件、微码或其组合来实现。对于硬件实现,处理单元可以实现在一个或多个专用集成电路(Application Specific Integrated Circuits,ASIC)、数字信号处理器(Digital Signal Processing,DSP)、数字信号处理设备(DSP Device,DSPD)、可编程逻辑设备(Programmable Logic Device,PLD)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)、通用处理器、控制器、微控制器、微处理器、用于执行本申请所述功能的其它电子单元或其 组合中。对于软件实现,可通过执行本申请所述功能的模块(例如过程、函数等)来实现本申请所述的技术。软件代码可存储在存储器中并通过处理器执行。存储器可以在处理器中或在处理器外部实现。
可选地,作为另一个实施例,第一处理器903还配置为在运行所述计算机程序时,执行前述实施例中任一项所述的方法。
本实施例提供了一种编码器,该编码器可以包括第一确定单元、第一运动补偿单元和编码单元。这样,在保证相同解码质量的前提下,不仅可以降低计算复杂度,还可以节省码率,进而能够提高编解码效率。
本申请的再一实施例中,基于前述实施例相同的发明构思,参见图10,其示出了本申请实施例提供的一种解码器100的组成结构示意图。如图10所示,该解码器100可以包括:解析单元1001、第二确定单元1002和第二运动补偿单元1003;其中,
解析单元1001,配置为解析码流,确定第一语法元素标识信息的取值;
解析单元1001,还配置为若第一语法元素标识信息指示当前块使用运动补偿增强处理方式,则解析码流,确定当前块的第一运动信息;
第二运动补偿单元1003,配置为根据第一运动信息确定当前块的第一匹配块,并对第一匹配块进行运动补偿增强,得到至少一个第二匹配块;
第二确定单元1002,配置为根据第一运动信息和至少一个第二匹配块,确定当前块的第一预测块;以及根据第一预测块,确定当前块的重建块。
在一些实施例中,解析单元1001,还配置为解析码流,获取当前块的残差块;
相应地,第二确定单元1002,还配置为根据残差块和第一预测块,确定当前块的重建块。
在一些实施例中,解析单元1001,还配置为若第一语法元素标识信息指示当前块不使用运动补偿增强处理方式,则解析码流,获取当前块的第二运动信息,第二运动信息用于指向整像素位置;
第二确定单元1002,还配置为根据当前块的第二运动信息,确定当前块的第二预测块;以及根据第二预测块,确定当前块的重建块。
在一些实施例中,解析单元1001,还配置为解析码流,获取当前块的残差块;
相应地,第二确定单元1002,还配置为根据残差块和第二预测块,确定当前块的重建块。
在一些实施例中,第二确定单元1002,还配置为若第一语法元素标识信息的取值为第一值,则确定当前块使用运动补偿增强处理方式;或者,若第一语法元素标识信息的取值为第二值,则确定当前块不使用运动补偿增强处理方式。
在一些实施例中,第二运动补偿单元1003,具体配置为对第一匹配块进行超分辨率和质量增强处理,得到处理块,其中,处理块的分辨率高于当前块的分辨率;以及对处理块进行第一滤波处理,得到至少一个第二匹配块,其中,经过第一滤波处理后得到的第二匹配块具有与当前块相同的分辨率。
在一些实施例中,第一滤波处理包括:下采样。
在一些实施例中,第二运动补偿单元1003,还配置为利用预设神经网络模型对第一匹配块进行运动补偿增强;其中,预设神经网络模型包括特征提取模块、残差投影模块组、采样模块和重建模块,且特征提取模块、残差投影模块组、采样模块和重建模块顺次连接;
相应地,第二运动补偿单元1003,还配置为通过特征提取模块对第一匹配块进行浅层特征提取,得到第一特征信息;以及通过残差投影模块组对第一特征信息进行残差特征学习,得到第二特征信息;以及通过采样模块对第二特征信息进行第二滤波处理,得到第三特征信息;以及通过重建模块对第三特征信息进行超分辨率重建,得到处理块。
在一些实施例中,特征提取模块为第一卷积层;相应地,第二运动补偿单元1003,还配置为通过第一卷积层对第一匹配块进行卷积操作,得到第一特征信息。
In some embodiments, the residual projection module group includes N residual projection blocks, a second convolutional layer and a first connection layer, the N residual projection blocks, the second convolutional layer and the first connection layer are connected in sequence, and the first connection layer is further connected to an input of the first one of the N residual projection blocks;
correspondingly, the second motion compensation unit 1003 is further configured to perform residual feature learning on the first feature information through the N residual projection blocks to obtain first intermediate feature information, where N is an integer greater than or equal to 1; perform a convolution operation on the first intermediate feature information through the second convolutional layer to obtain second intermediate feature information; and perform an addition calculation on the first feature information and the second intermediate feature information through the first connection layer to obtain the second feature information.
In some embodiments, the N residual projection blocks form a cascaded structure, an input of the cascaded structure is the first feature information, and an output of the cascaded structure is the second intermediate feature information.
In some embodiments, the second motion compensation unit 1003 is further configured to: when N is equal to 1, input the first feature information into the first residual projection block to obtain output information of the first residual projection block, and determine the output information of the first residual projection block as the first intermediate feature information; and when N is greater than 1, after the output information of the first residual projection block is obtained, input output information of the d-th residual projection block into the (d+1)-th residual projection block to obtain output information of the (d+1)-th residual projection block, and increment d by 1 until output information of the N-th residual projection block is obtained, and determine the output information of the N-th residual projection block as the first intermediate feature information, where d is an integer greater than or equal to 1 and less than N.
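A sketch of this group structure, under the same illustrative assumptions as above, might look as follows; RPBlock is a simplified stand-in, and the full block structure is sketched after the residual projection block description below. The loop makes the cascade explicit: the output of the d-th block feeds the (d+1)-th block.

import torch
import torch.nn as nn

class RPBlock(nn.Module):
    """Simplified stand-in for one residual projection block."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(c, c, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class ResidualProjectionGroup(nn.Module):
    def __init__(self, channels=64, n=4):
        super().__init__()
        self.blocks = nn.ModuleList(RPBlock(channels) for _ in range(n))
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # second conv layer

    def forward(self, f1):
        x = f1
        for block in self.blocks:   # cascade: block d's output feeds block d+1
            x = block(x)
        mid1 = x                    # first intermediate feature information
        mid2 = self.conv(mid1)      # second intermediate feature information
        return f1 + mid2            # first connection layer: addition with f1

f2 = ResidualProjectionGroup()(torch.rand(1, 64, 8, 8))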
In some embodiments, the residual projection block includes an up-projection module, M residual modules, a local feature fusion module, a down-projection module and a second connection layer, the up-projection module, the M residual modules, the local feature fusion module, the down-projection module and the second connection layer are connected in sequence, the second connection layer is further connected to an input of the up-projection module, and outputs of the M residual modules are further respectively connected to the local feature fusion module;
correspondingly, the second motion compensation unit 1003 is further configured to perform third filtering processing on input information of the residual projection block through the up-projection module to obtain first high-resolution feature information; perform high-resolution feature learning of different levels on the first high-resolution feature information through the M residual modules to obtain M pieces of second high-resolution feature information, where M is an integer greater than or equal to 1; perform a fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain third high-resolution feature information; perform fourth filtering processing on the third high-resolution feature information through the down-projection module to obtain filtered feature information; and perform an addition calculation on the input information and the filtered feature information through the second connection layer to obtain output information of the residual projection block.
In some embodiments, the up-projection module includes a transposed convolutional layer; correspondingly, the second motion compensation unit 1003 is further configured to perform the third filtering processing on the input information of the residual projection block through the transposed convolutional layer to obtain the first high-resolution feature information, wherein a resolution of the first high-resolution feature information obtained after the third filtering processing is higher than a resolution of the input information of the residual projection block.
In some embodiments, the third filtering processing includes upsampling.
In some embodiments, the local feature fusion module includes a feature fusion layer and a third convolutional layer; correspondingly, the second motion compensation unit 1003 is further configured to perform the fusion operation on the M pieces of second high-resolution feature information through the feature fusion layer to obtain fused feature information, and perform a convolution operation on the fused feature information through the third convolutional layer to obtain the third high-resolution feature information.
In some embodiments, the down-projection module includes a fourth convolutional layer; correspondingly, the second motion compensation unit 1003 is further configured to perform the fourth filtering processing on the third high-resolution feature information through the fourth convolutional layer to obtain the filtered feature information, wherein a resolution of the filtered feature information obtained after the fourth filtering processing is lower than a resolution of the third high-resolution feature information.
In some embodiments, the fourth filtering processing includes downsampling.
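Under the structure just described, one residual projection block might be sketched as follows. Kernel sizes, strides, channel counts and the scale factor are illustrative assumptions: the transposed convolution doubles the resolution (third filtering), the M outputs are concatenated and fused by a 1x1 convolution, and the strided fourth convolutional layer restores the input resolution (fourth filtering) before the skip addition.

import torch
import torch.nn as nn

class ResModule(nn.Module):
    """One of the M residual modules."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(c, c, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class ResidualProjectionBlock(nn.Module):
    def __init__(self, c=64, m=3, r=2):
        super().__init__()
        # Up-projection module: transposed convolution (third filtering).
        self.up = nn.ConvTranspose2d(c, c, kernel_size=2 * r, stride=r,
                                     padding=r // 2)
        self.res_modules = nn.ModuleList(ResModule(c) for _ in range(m))
        # Local feature fusion module: fusion layer (concat) + third conv layer.
        self.fuse = nn.Conv2d(c * m, c, kernel_size=1)
        # Down-projection module: strided fourth conv layer (fourth filtering).
        self.down = nn.Conv2d(c, c, kernel_size=2 * r, stride=r,
                              padding=r // 2)

    def forward(self, x):
        hi1 = self.up(x)                          # first high-resolution features
        outs, h = [], hi1
        for mod in self.res_modules:              # different levels of learning
            h = mod(h)
            outs.append(h)                        # M second high-res features
        hi3 = self.fuse(torch.cat(outs, dim=1))   # third high-res features
        filtered = self.down(hi3)                 # filtered feature information
        return x + filtered                       # second connection layer

y = ResidualProjectionBlock()(torch.rand(1, 64, 8, 8))  # same shape as input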
In some embodiments, the sampling module includes a sub-pixel convolutional layer; correspondingly, the second motion compensation unit 1003 is further configured to perform the second filtering processing on the second feature information through the sub-pixel convolutional layer to obtain the third feature information, wherein a resolution of the third feature information obtained after the second filtering processing is higher than a resolution of the second feature information.
In some embodiments, the second filtering processing includes upsampling.
In some embodiments, the reconstruction module includes a fifth convolutional layer; correspondingly, the second motion compensation unit 1003 is further configured to perform a convolution operation on the third feature information through the fifth convolutional layer to obtain the processed block.
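The sampling and reconstruction modules can be sketched together as below; the sub-pixel convolutional layer is expressed with PixelShuffle, and the channel width and scale factor remain illustrative assumptions.

import torch
import torch.nn as nn

c, r = 64, 2
# Sampling module: sub-pixel convolutional layer (second filtering / upsampling).
sample = nn.Sequential(
    nn.Conv2d(c, c * r * r, 3, padding=1),  # expand channels by r^2
    nn.PixelShuffle(r))                     # rearrange channels into space
# Reconstruction module: fifth convolutional layer.
reconstruct = nn.Conv2d(c, 1, 3, padding=1)

f2 = torch.rand(1, c, 8, 8)        # second feature information
f3 = sample(f2)                    # third feature information: (1, 64, 16, 16)
processed_block = reconstruct(f3)  # processed block: (1, 1, 16, 16)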
In some embodiments, referring to FIG. 10, the decoder 100 may further include a second training unit 1004, configured to determine a training data set, the training data set including at least one training image; preprocess the training data set to obtain a ground-truth region of the preset neural network model and at least one input image group, wherein the input image group includes at least one input image; and train a neural network model with the at least one input image group based on the ground-truth region to obtain at least one set of candidate model parameters, wherein the ground-truth region is used to determine a loss value of a loss function of the neural network model, and the at least one set of candidate model parameters is obtained when the loss value of the loss function converges to a preset threshold.
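A minimal training sketch consistent with this description follows. The stand-in network, the dummy image pairs, the L1 loss, the optimizer and the threshold value are all assumptions made for illustration; in the described scheme, the ground-truth region produced by preprocessing supplies the target of the loss function.

import torch
import torch.nn as nn

# Stand-in 2x super-resolution network (assumption, not the full model).
model = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True),
                      nn.Conv2d(64, 4, 3, padding=1), nn.PixelShuffle(2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()
threshold = 1e-3  # assumed preset convergence threshold

# Dummy (input image, ground-truth region) pairs standing in for the
# preprocessed training data set.
pairs = [(torch.rand(1, 1, 8, 8), torch.rand(1, 1, 16, 16))
         for _ in range(100)]

candidate_params = None
for inputs, ground_truth in pairs:
    loss = loss_fn(model(inputs), ground_truth)  # loss vs. ground-truth region
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if loss.item() < threshold:   # loss converged: keep candidate parameters
        candidate_params = model.state_dict()
        break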
In some embodiments, the second determination unit 1002 is further configured to determine a quantization parameter of the current block; determine, from the at least one set of candidate model parameters according to the quantization parameter, model parameters corresponding to the quantization parameter; and determine the preset neural network model according to the model parameters, wherein when the at least one set comprises multiple sets, the input image groups correspond to different quantization parameters, and there is a correspondence between the multiple sets of candidate model parameters and the different quantization parameters.
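The quantization-parameter lookup can be sketched as below. The QP values, the nearest-QP selection rule and the stand-in model are assumptions for illustration; the description only requires that each candidate parameter set correspond to a quantization parameter.

import torch
import torch.nn as nn

def make_model():
    # Stand-in 2x super-resolution model (assumption).
    return nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.PixelShuffle(2))

# Candidate parameter sets, one per quantization parameter used in training.
candidates = {qp: make_model().state_dict() for qp in (27, 32, 38, 45)}

def preset_model_for(qp):
    nearest = min(candidates, key=lambda k: abs(k - qp))  # QP -> parameters
    model = make_model()
    model.load_state_dict(candidates[nearest])
    return model

preset_model = preset_model_for(34)  # the current block's QP selects the model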
In some embodiments, the parsing unit 1001 is further configured to parse the bitstream to obtain model parameters;
the second determination unit 1002 is further configured to determine the preset neural network model according to the model parameters.
It can be understood that, in this embodiment, a "unit" may be part of a circuit, part of a processor, part of a program or software, etc., and may of course also be a module, or may be non-modular. Moreover, the components in this embodiment may be integrated into one processing unit, or each unit may physically exist separately, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of a software functional module.
If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, this embodiment provides a computer storage medium applied to the decoder 100. The computer storage medium stores a computer program which, when executed by a second processor, implements the method described in any one of the foregoing embodiments.
Based on the composition of the decoder 100 above and the computer storage medium, refer to FIG. 11, which is a schematic diagram of a specific hardware structure of the decoder 100 provided by an embodiment of the present application. As shown in FIG. 11, it may include a second communication interface 1101, a second memory 1102 and a second processor 1103, with the components coupled together through a second bus system 1104. It can be understood that the second bus system 1104 is used to implement connection and communication between these components. In addition to a data bus, the second bus system 1104 also includes a power bus, a control bus and a status signal bus. However, for clarity of description, the various buses are all labeled as the second bus system 1104 in FIG. 11. Here,
the second communication interface 1101 is configured to receive and send signals in the process of transmitting and receiving information with other external network elements;
the second memory 1102 is configured to store a computer program executable on the second processor 1103;
the second processor 1103 is configured to, when running the computer program, perform the following:
parsing a bitstream to determine a value of first syntax element flag information;
if the first syntax element flag information indicates that a current block uses a motion compensation enhancement processing mode, parsing the bitstream to determine first motion information of the current block;
determining a first matching block of the current block according to the first motion information, and performing motion compensation enhancement on the first matching block to obtain at least one second matching block;
determining a first prediction block of the current block according to the first motion information and the at least one second matching block;
determining a reconstructed block of the current block according to the first prediction block.
Optionally, as another embodiment, the second processor 1103 is further configured to perform the method described in any one of the foregoing embodiments when running the computer program.
It can be understood that the hardware functions of the second memory 1102 are similar to those of the first memory 902, and the hardware functions of the second processor 1103 are similar to those of the first processor 903; details are not repeated here.
This embodiment provides a decoder, which may include a parsing unit, a second determination unit and a second motion compensation unit. In this way, when it is determined by decoding that the current block uses the motion compensation enhancement processing mode, while the same decoding quality is maintained, not only can computational complexity be reduced, but the bit rate can also be saved, thereby improving coding and decoding efficiency.
It should be noted that, in the present application, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or apparatus that includes the element.
The above serial numbers of the embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
The methods disclosed in the several method embodiments provided in the present application can be combined arbitrarily without conflict to obtain new method embodiments.
The features disclosed in the several product embodiments provided in the present application can be combined arbitrarily without conflict to obtain new product embodiments.
The features disclosed in the several method or device embodiments provided in the present application can be combined arbitrarily without conflict to obtain new method or device embodiments.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in the present application, which shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Industrial Applicability
In the embodiments of the present application, on the encoder side, a first matching block of a current block is determined; motion compensation enhancement is performed on the first matching block to obtain at least one second matching block; motion information of the current block is determined according to the at least one second matching block; and the current block is encoded according to the motion information. On the decoder side, a bitstream is parsed to determine a value of first syntax element flag information; if the first syntax element flag information indicates that the current block uses a motion compensation enhancement processing mode, the bitstream is parsed to determine first motion information of the current block; a first matching block of the current block is determined according to the first motion information, and motion compensation enhancement is performed on the first matching block to obtain at least one second matching block; a first prediction block of the current block is determined according to the first motion information and the at least one second matching block; and a reconstructed block of the current block is determined according to the first prediction block. In this way, both the encoder and the decoder can perform motion compensation enhancement on the first matching block; while the same decoding quality is maintained, not only can computational complexity be reduced, but the bit rate can also be saved, thereby improving coding and decoding efficiency.

Claims (56)

  1. An encoding method, applied to an encoder, the method comprising:
    determining a first matching block of a current block;
    performing motion compensation enhancement on the first matching block to obtain at least one second matching block;
    determining motion information of the current block according to the at least one second matching block;
    encoding the current block according to the motion information.
  2. The method according to claim 1, wherein performing motion compensation enhancement on the first matching block to obtain at least one second matching block comprises:
    performing super-resolution and quality enhancement processing on the first matching block to obtain a processed block, wherein a resolution of the processed block is higher than a resolution of the current block;
    performing first filtering processing on the processed block to obtain the at least one second matching block, wherein the second matching block obtained after the first filtering processing has a same resolution as the current block.
  3. The method according to claim 2, wherein the first filtering processing comprises downsampling.
  4. The method according to claim 2, wherein the step of performing motion compensation enhancement on the first matching block further comprises: performing motion compensation enhancement on the first matching block by using a preset neural network model, wherein the preset neural network model comprises a feature extraction module, a residual projection module group, a sampling module and a reconstruction module, and the feature extraction module, the residual projection module group, the sampling module and the reconstruction module are connected in sequence;
    performing super-resolution and quality enhancement processing on the first matching block to obtain the processed block comprises:
    performing shallow feature extraction on the first matching block through the feature extraction module to obtain first feature information;
    performing residual feature learning on the first feature information through the residual projection module group to obtain second feature information;
    performing second filtering processing on the second feature information through the sampling module to obtain third feature information;
    performing super-resolution reconstruction on the third feature information through the reconstruction module to obtain the processed block.
  5. The method according to claim 4, wherein the feature extraction module comprises a first convolutional layer, and performing shallow feature extraction on the first matching block through the feature extraction module to obtain the first feature information comprises:
    performing a convolution operation on the first matching block through the first convolutional layer to obtain the first feature information.
  6. The method according to claim 4, wherein the residual projection module group comprises N residual projection blocks, a second convolutional layer and a first connection layer, the N residual projection blocks, the second convolutional layer and the first connection layer are connected in sequence, and the first connection layer is further connected to an input of a first one of the N residual projection blocks;
    performing residual feature learning on the first feature information through the residual projection module group to obtain the second feature information comprises:
    performing residual feature learning on the first feature information through the N residual projection blocks to obtain first intermediate feature information, wherein N is an integer greater than or equal to 1;
    performing a convolution operation on the first intermediate feature information through the second convolutional layer to obtain second intermediate feature information;
    performing an addition calculation on the first feature information and the second intermediate feature information through the first connection layer to obtain the second feature information.
  7. The method according to claim 6, wherein the N residual projection blocks form a cascaded structure, an input of the cascaded structure is the first feature information, and an output of the cascaded structure is the second intermediate feature information.
  8. The method according to claim 7, wherein performing residual feature learning on the first feature information through the N residual projection blocks to obtain the first intermediate feature information comprises:
    when N is equal to 1, inputting the first feature information into the first residual projection block to obtain output information of the first residual projection block, and determining the output information of the first residual projection block as the first intermediate feature information;
    when N is greater than 1, after the output information of the first residual projection block is obtained, inputting output information of a d-th residual projection block into a (d+1)-th residual projection block to obtain output information of the (d+1)-th residual projection block, and incrementing d by 1 until output information of an N-th residual projection block is obtained, and determining the output information of the N-th residual projection block as the first intermediate feature information, wherein d is an integer greater than or equal to 1 and less than N.
  9. The method according to claim 8, wherein the residual projection block comprises an up-projection module, M residual modules, a local feature fusion module, a down-projection module and a second connection layer, the up-projection module, the M residual modules, the local feature fusion module, the down-projection module and the second connection layer are connected in sequence, the second connection layer is further connected to an input of the up-projection module, and outputs of the M residual modules are further respectively connected to the local feature fusion module;
    the method further comprises:
    performing third filtering processing on input information of the residual projection block through the up-projection module to obtain first high-resolution feature information;
    performing high-resolution feature learning of different levels on the first high-resolution feature information through the M residual modules to obtain M pieces of second high-resolution feature information, wherein M is an integer greater than or equal to 1;
    performing a fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain third high-resolution feature information;
    performing fourth filtering processing on the third high-resolution feature information through the down-projection module to obtain filtered feature information;
    performing an addition calculation on the input information and the filtered feature information through the second connection layer to obtain output information of the residual projection block.
  10. The method according to claim 9, wherein the up-projection module comprises a transposed convolutional layer, and performing third filtering processing on the input information of the residual projection block through the up-projection module to obtain the first high-resolution feature information comprises:
    performing the third filtering processing on the input information of the residual projection block through the transposed convolutional layer to obtain the first high-resolution feature information, wherein a resolution of the first high-resolution feature information obtained after the third filtering processing is higher than a resolution of the input information of the residual projection block.
  11. The method according to claim 10, wherein the third filtering processing comprises upsampling.
  12. The method according to claim 9, wherein the local feature fusion module comprises a feature fusion layer and a third convolutional layer, and performing the fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain the third high-resolution feature information comprises:
    performing the fusion operation on the M pieces of second high-resolution feature information through the feature fusion layer to obtain fused feature information;
    performing a convolution operation on the fused feature information through the third convolutional layer to obtain the third high-resolution feature information.
  13. The method according to claim 9, wherein the down-projection module comprises a fourth convolutional layer, and performing fourth filtering processing on the third high-resolution feature information through the down-projection module to obtain the filtered feature information comprises:
    performing the fourth filtering processing on the third high-resolution feature information through the fourth convolutional layer to obtain the filtered feature information, wherein a resolution of the filtered feature information obtained after the fourth filtering processing is lower than a resolution of the third high-resolution feature information.
  14. The method according to claim 13, wherein the fourth filtering processing comprises downsampling.
  15. The method according to claim 4, wherein the sampling module comprises a sub-pixel convolutional layer, and performing second filtering processing on the second feature information through the sampling module to obtain the third feature information comprises:
    performing the second filtering processing on the second feature information through the sub-pixel convolutional layer to obtain the third feature information, wherein a resolution of the third feature information obtained after the second filtering processing is higher than a resolution of the second feature information.
  16. The method according to claim 15, wherein the second filtering processing comprises upsampling.
  17. The method according to claim 4, wherein the reconstruction module comprises a fifth convolutional layer, and performing super-resolution reconstruction on the third feature information through the reconstruction module to obtain the processed block comprises:
    performing a convolution operation on the third feature information through the fifth convolutional layer to obtain the processed block.
  18. The method according to any one of claims 1 to 17, wherein the method further comprises:
    determining a training data set, the training data set comprising at least one training image;
    preprocessing the training data set to obtain a ground-truth region of the preset neural network model and at least one input image group, wherein the input image group comprises at least one input image;
    training a neural network model with the at least one input image group based on the ground-truth region to obtain at least one set of candidate model parameters, wherein the ground-truth region is used to determine a loss value of a loss function of the neural network model, and the at least one set of candidate model parameters is obtained when the loss value of the loss function converges to a preset threshold.
  19. The method according to claim 18, wherein the method further comprises:
    determining a quantization parameter of the current block;
    determining, from the at least one set of candidate model parameters according to the quantization parameter, model parameters corresponding to the quantization parameter;
    determining the preset neural network model according to the model parameters, wherein when the at least one set comprises multiple sets, the input image groups correspond to different quantization parameters, and there is a correspondence between the multiple sets of candidate model parameters and the different quantization parameters.
  20. The method according to claim 19, wherein the method further comprises:
    encoding the model parameters, and writing encoded bits into a bitstream.
  21. The method according to claim 4, wherein determining the first matching block of the current block comprises:
    performing integer-pel motion estimation on the current block to determine the first matching block of the current block, wherein the first matching block is a matching block with a minimum rate-distortion cost when motion estimation is performed on the current block at integer-pel positions;
    performing motion compensation enhancement on the first matching block to obtain the at least one second matching block comprises:
    performing fractional-pel motion compensation on the first matching block by using the preset neural network model to obtain the at least one second matching block.
  22. The method according to claim 21, wherein after the at least one second matching block is obtained, the method further comprises:
    performing fractional-pel motion estimation on the current block according to the at least one second matching block to determine a fractional-pel matching block of the current block, the fractional-pel matching block being a matching block with a minimum rate-distortion cost when motion estimation is performed on the current block at fractional-pel positions;
    determining the motion information of the current block according to the at least one second matching block comprises:
    performing pre-encoding processing on the current block with the first matching block to determine a first rate-distortion cost;
    performing pre-encoding processing on the current block with the fractional-pel matching block to determine a second rate-distortion cost;
    if the first rate-distortion cost is greater than the second rate-distortion cost, determining that the current block uses a motion compensation enhancement processing mode, and determining the motion information to be first motion information, the first motion information being used to point to a fractional-pel position;
    if the first rate-distortion cost is less than or equal to the second rate-distortion cost, determining that the current block does not use the motion compensation enhancement processing mode, and determining the motion information to be second motion information, the second motion information being used to point to an integer-pel position.
  23. The method according to claim 22, wherein the method further comprises:
    if the first rate-distortion cost is greater than the second rate-distortion cost, determining a value of first syntax element flag information to be a first value;
    if the first rate-distortion cost is less than or equal to the second rate-distortion cost, determining the value of the first syntax element flag information to be a second value, wherein the first syntax element flag information is used to indicate whether the current block uses the motion compensation enhancement processing mode.
  24. The method according to claim 23, wherein the method further comprises:
    encoding the value of the first syntax element flag information, and writing encoded bits into a bitstream.
  25. The method according to claim 22, wherein encoding the current block according to the motion information comprises:
    when the current block uses the motion compensation enhancement processing mode, determining a first prediction block of the current block according to the first motion information and the fractional-pel matching block;
    determining a residual block of the current block according to the current block and the first prediction block;
    encoding the residual block, and writing encoded bits into a bitstream;
    or,
    when the current block does not use the motion compensation enhancement processing mode, determining a second prediction block of the current block according to the second motion information and the first matching block;
    determining a residual block of the current block according to the current block and the second prediction block;
    encoding the residual block, and writing encoded bits into a bitstream.
  26. The method according to any one of claims 1 to 25, wherein the method further comprises:
    encoding the motion information, and writing encoded bits into a bitstream.
  27. A bitstream, wherein the bitstream is generated by bit-encoding information to be encoded, wherein the information to be encoded comprises at least motion information of a current block, a residual block of the current block and a value of first syntax element flag information, the first syntax element flag information being used to indicate whether the current block uses a motion compensation enhancement processing mode.
  28. A decoding method, applied to a decoder, the method comprising:
    parsing a bitstream to determine a value of first syntax element flag information;
    if the first syntax element flag information indicates that a current block uses a motion compensation enhancement processing mode, parsing the bitstream to determine first motion information of the current block;
    determining a first matching block of the current block according to the first motion information, and performing motion compensation enhancement on the first matching block to obtain at least one second matching block;
    determining a first prediction block of the current block according to the first motion information and the at least one second matching block;
    determining a reconstructed block of the current block according to the first prediction block.
  29. The method according to claim 28, wherein the method further comprises:
    parsing the bitstream to obtain a residual block of the current block;
    determining the reconstructed block of the current block according to the first prediction block comprises:
    determining the reconstructed block of the current block according to the residual block and the first prediction block.
  30. The method according to claim 28, wherein the method further comprises:
    if the first syntax element flag information indicates that the current block does not use the motion compensation enhancement processing mode, parsing the bitstream to obtain second motion information of the current block;
    determining a second prediction block of the current block according to the second motion information of the current block;
    determining the reconstructed block of the current block according to the second prediction block.
  31. The method according to claim 30, wherein the method further comprises:
    parsing the bitstream to obtain a residual block of the current block;
    determining the reconstructed block of the current block according to the second prediction block comprises:
    determining the reconstructed block of the current block according to the residual block and the second prediction block.
  32. The method according to any one of claims 28 to 31, wherein parsing the bitstream to determine the value of the first syntax element flag information comprises:
    if the value of the first syntax element flag information is a first value, determining that the current block uses the motion compensation enhancement processing mode;
    if the value of the first syntax element flag information is a second value, determining that the current block does not use the motion compensation enhancement processing mode.
  33. The method according to claim 28, wherein performing motion compensation enhancement on the first matching block to obtain at least one second matching block comprises:
    performing super-resolution and quality enhancement processing on the first matching block to obtain a processed block, wherein a resolution of the processed block is higher than a resolution of the current block;
    performing first filtering processing on the processed block to obtain the at least one second matching block, wherein the second matching block obtained after the first filtering processing has a same resolution as the current block.
  34. The method according to claim 33, wherein the first filtering processing comprises downsampling.
  35. The method according to claim 33, wherein the step of performing motion compensation enhancement on the first matching block further comprises: performing motion compensation enhancement on the first matching block by using a preset neural network model, wherein the preset neural network model comprises a feature extraction module, a residual projection module group, a sampling module and a reconstruction module, and the feature extraction module, the residual projection module group, the sampling module and the reconstruction module are connected in sequence;
    performing super-resolution and quality enhancement processing on the first matching block to obtain the processed block comprises:
    performing shallow feature extraction on the first matching block through the feature extraction module to obtain first feature information;
    performing residual feature learning on the first feature information through the residual projection module group to obtain second feature information;
    performing second filtering processing on the second feature information through the sampling module to obtain third feature information;
    performing super-resolution reconstruction on the third feature information through the reconstruction module to obtain the processed block.
  36. The method according to claim 35, wherein the feature extraction module comprises a first convolutional layer, and performing shallow feature extraction on the first matching block through the feature extraction module to obtain the first feature information comprises:
    performing a convolution operation on the first matching block through the first convolutional layer to obtain the first feature information.
  37. The method according to claim 35, wherein the residual projection module group comprises N residual projection blocks, a second convolutional layer and a first connection layer, the N residual projection blocks, the second convolutional layer and the first connection layer are connected in sequence, and the first connection layer is further connected to an input of a first one of the N residual projection blocks;
    performing residual feature learning on the first feature information through the residual projection module group to obtain the second feature information comprises:
    performing residual feature learning on the first feature information through the N residual projection blocks to obtain first intermediate feature information, wherein N is an integer greater than or equal to 1;
    performing a convolution operation on the first intermediate feature information through the second convolutional layer to obtain second intermediate feature information;
    performing an addition calculation on the first feature information and the second intermediate feature information through the first connection layer to obtain the second feature information.
  38. The method according to claim 37, wherein the N residual projection blocks form a cascaded structure, an input of the cascaded structure is the first feature information, and an output of the cascaded structure is the second intermediate feature information.
  39. The method according to claim 38, wherein performing residual feature learning on the first feature information through the N residual projection blocks to obtain the first intermediate feature information comprises:
    when N is equal to 1, inputting the first feature information into the first residual projection block to obtain output information of the first residual projection block, and determining the output information of the first residual projection block as the first intermediate feature information;
    when N is greater than 1, after the output information of the first residual projection block is obtained, inputting output information of a d-th residual projection block into a (d+1)-th residual projection block to obtain output information of the (d+1)-th residual projection block, and incrementing d by 1 until output information of an N-th residual projection block is obtained, and determining the output information of the N-th residual projection block as the first intermediate feature information, wherein d is an integer greater than or equal to 1 and less than N.
  40. The method according to claim 39, wherein the residual projection block comprises an up-projection module, M residual modules, a local feature fusion module, a down-projection module and a second connection layer, the up-projection module, the M residual modules, the local feature fusion module, the down-projection module and the second connection layer are connected in sequence, the second connection layer is further connected to an input of the up-projection module, and outputs of the M residual modules are further respectively connected to the local feature fusion module;
    the method further comprises:
    performing third filtering processing on input information of the residual projection block through the up-projection module to obtain first high-resolution feature information;
    performing high-resolution feature learning of different levels on the first high-resolution feature information through the M residual modules to obtain M pieces of second high-resolution feature information, wherein M is an integer greater than or equal to 1;
    performing a fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain third high-resolution feature information;
    performing fourth filtering processing on the third high-resolution feature information through the down-projection module to obtain filtered feature information;
    performing an addition calculation on the input information and the filtered feature information through the second connection layer to obtain output information of the residual projection block.
  41. The method according to claim 40, wherein the up-projection module comprises a transposed convolutional layer, and performing third filtering processing on the input information of the residual projection block through the up-projection module to obtain the first high-resolution feature information comprises:
    performing the third filtering processing on the input information of the residual projection block through the transposed convolutional layer to obtain the first high-resolution feature information, wherein a resolution of the first high-resolution feature information obtained after the third filtering processing is higher than a resolution of the input information of the residual projection block.
  42. The method according to claim 41, wherein the third filtering processing comprises upsampling.
  43. The method according to claim 40, wherein the local feature fusion module comprises a feature fusion layer and a third convolutional layer, and performing the fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain the third high-resolution feature information comprises:
    performing the fusion operation on the M pieces of second high-resolution feature information through the feature fusion layer to obtain fused feature information;
    performing a convolution operation on the fused feature information through the third convolutional layer to obtain the third high-resolution feature information.
  44. The method according to claim 40, wherein the down-projection module comprises a fourth convolutional layer, and performing fourth filtering processing on the third high-resolution feature information through the down-projection module to obtain the filtered feature information comprises:
    performing the fourth filtering processing on the third high-resolution feature information through the fourth convolutional layer to obtain the filtered feature information, wherein a resolution of the filtered feature information obtained after the fourth filtering processing is lower than a resolution of the third high-resolution feature information.
  45. The method according to claim 44, wherein the fourth filtering processing comprises downsampling.
  46. The method according to claim 35, wherein the sampling module comprises a sub-pixel convolutional layer, and performing second filtering processing on the second feature information through the sampling module to obtain the third feature information comprises:
    performing the second filtering processing on the second feature information through the sub-pixel convolutional layer to obtain the third feature information, wherein a resolution of the third feature information obtained after the second filtering processing is higher than a resolution of the second feature information.
  47. The method according to claim 46, wherein the second filtering processing comprises upsampling.
  48. The method according to claim 35, wherein the reconstruction module comprises a fifth convolutional layer, and performing super-resolution reconstruction on the third feature information through the reconstruction module to obtain the processed block comprises:
    performing a convolution operation on the third feature information through the fifth convolutional layer to obtain the processed block.
  49. The method according to any one of claims 28 to 48, wherein the method further comprises:
    determining a training data set, the training data set comprising at least one training image;
    preprocessing the training data set to obtain a ground-truth region of the preset neural network model and at least one input image group, wherein the input image group comprises at least one input image;
    training a neural network model with the at least one input image group based on the ground-truth region to obtain at least one set of candidate model parameters, wherein the ground-truth region is used to determine a loss value of a loss function of the neural network model, and the at least one set of candidate model parameters is obtained when the loss value of the loss function converges to a preset threshold.
  50. The method according to claim 49, wherein the method further comprises:
    determining a quantization parameter of the current block;
    determining, from the at least one set of candidate model parameters according to the quantization parameter, model parameters corresponding to the quantization parameter;
    determining the preset neural network model according to the model parameters, wherein when the at least one set comprises multiple sets, the input image groups correspond to different quantization parameters, and there is a correspondence between the multiple sets of candidate model parameters and the different quantization parameters.
  51. The method according to any one of claims 28 to 50, wherein the method further comprises:
    parsing the bitstream to obtain model parameters;
    determining the preset neural network model according to the model parameters.
  52. An encoder, comprising a first determination unit, a first motion compensation unit and an encoding unit, wherein
    the first determination unit is configured to determine a first matching block of a current block;
    the first motion compensation unit is configured to perform motion compensation enhancement on the first matching block to obtain at least one second matching block;
    the first determination unit is further configured to determine motion information of the current block according to the at least one second matching block;
    the encoding unit is configured to encode the current block according to the motion information.
  53. An encoder, comprising a first memory and a first processor, wherein
    the first memory is configured to store a computer program executable on the first processor;
    the first processor is configured to perform the method according to any one of claims 1 to 26 when running the computer program.
  54. A decoder, comprising a parsing unit, a second determination unit and a second motion compensation unit, wherein
    the parsing unit is configured to parse a bitstream to determine a value of first syntax element flag information;
    the parsing unit is further configured to, if the first syntax element flag information indicates that a current block uses a motion compensation enhancement processing mode, parse the bitstream to determine first motion information of the current block;
    the second motion compensation unit is configured to determine a first matching block of the current block according to the first motion information, and perform motion compensation enhancement on the first matching block to obtain at least one second matching block;
    the second determination unit is configured to determine a first prediction block of the current block according to the first motion information and the at least one second matching block, and determine a reconstructed block of the current block according to the first prediction block.
  55. A decoder, comprising a second memory and a second processor, wherein
    the second memory is configured to store a computer program executable on the second processor;
    the second processor is configured to perform the method according to any one of claims 28 to 51 when running the computer program.
  56. A computer storage medium, wherein the computer storage medium stores a computer program which, when executed by a first processor, implements the method according to any one of claims 1 to 26, or when executed by a second processor, implements the method according to any one of claims 28 to 51.
PCT/CN2021/096818 2021-05-28 2021-05-28 Encoding method, decoding method, code stream, encoder, decoder and storage medium WO2022246809A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP21942383.7A EP4351136A1 (en) 2021-05-28 2021-05-28 Encoding method, decoding method, code stream, encoder, decoder and storage medium
PCT/CN2021/096818 WO2022246809A1 (zh) Encoding method, decoding method, code stream, encoder, decoder and storage medium
CN202180097906.8A CN117280685A (zh) Encoding method, decoding method, code stream, encoder, decoder and storage medium
US18/520,922 US20240098271A1 (en) 2021-05-28 2023-11-28 Encoding method, decoding method, bitstream, encoder, decoder and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/096818 WO2022246809A1 (zh) Encoding method, decoding method, code stream, encoder, decoder and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/520,922 Continuation US20240098271A1 (en) 2021-05-28 2023-11-28 Encoding method, decoding method, bitstream, encoder, decoder and storage medium

Publications (1)

Publication Number Publication Date
WO2022246809A1 true WO2022246809A1 (zh) 2022-12-01

Family

ID=84228340

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096818 WO2022246809A1 (zh) Encoding method, decoding method, code stream, encoder, decoder and storage medium

Country Status (4)

Country Link
US (1) US20240098271A1 (zh)
EP (1) EP4351136A1 (zh)
CN (1) CN117280685A (zh)
WO (1) WO2022246809A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080198934A1 (en) * 2007-02-20 2008-08-21 Edward Hong Motion refinement engine for use in video encoding in accordance with a plurality of sub-pixel resolutions and methods for use therewith
US20120063515A1 (en) * 2010-09-09 2012-03-15 Qualcomm Incorporated Efficient Coding of Video Parameters for Weighted Motion Compensated Prediction in Video Coding
CN109729363A (zh) * 2017-10-31 2019-05-07 腾讯科技(深圳)有限公司 Method and apparatus for processing video images
CN111010568A (zh) * 2018-10-06 2020-04-14 华为技术有限公司 Interpolation filter training method and apparatus, and video picture encoding/decoding method and codec
CN111586415A (zh) * 2020-05-29 2020-08-25 浙江大华技术股份有限公司 Video encoding method and apparatus, encoder, and storage device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
D. BULL, F. ZHANG, M. AFONSO (UNIV. OF BRISTOL): "Description of SDR video coding technology proposal by University of Bristol", 10. JVET MEETING; 20180410 - 20180420; SAN DIEGO; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), 12 April 2018 (2018-04-12), pages 1 - 35, XP030248160 *

Also Published As

Publication number Publication date
EP4351136A1 (en) 2024-04-10
US20240098271A1 (en) 2024-03-21
CN117280685A (zh) 2023-12-22

Similar Documents

Publication Title
JP7114153B2 (ja) Video encoding and decoding method, apparatus, computer device and computer program
WO2021203394A1 (zh) Loop filtering method and apparatus
CN115606179A (zh) Learning-based downsampling CNN filter for image and video coding using learned downsampling features
WO2023000179A1 (zh) Video super-resolution network, and video super-resolution, encoding and decoding processing method and apparatus
CN108848377B (zh) Video encoding and decoding method and apparatus, computer device, and storage medium
WO2023130333A1 (zh) Encoding and decoding method, encoder, decoder, and storage medium
US20230262212A1 (en) Picture prediction method, encoder, decoder, and computer storage medium
CN111800629A (zh) Video decoding method, encoding method, video decoder, and encoder
CN112534817A (zh) Prediction method and apparatus for video picture component, and computer storage medium
WO2021120122A1 (zh) Picture component prediction method, encoder, decoder, and storage medium
CN115552905A (zh) Global skip connection based CNN filter for image and video coding
WO2023142926A1 (zh) Image processing method and apparatus
CN116582685A (zh) AI-based hierarchical residual coding method, apparatus, device, and storage medium
JP2023537823A (ja) Video processing method, apparatus, device, decoder, system, and storage medium
CN112601095B (zh) Method and system for creating a fractional interpolation model for video luma and chroma
CN113784128A (zh) Picture prediction method, encoder, decoder, and storage medium
CN112866697B (zh) Video picture encoding and decoding method and apparatus, electronic device, and storage medium
CN113822824A (zh) Video deblurring method, apparatus, device, and storage medium
RU2683614C2 (ru) Encoder, decoder and operating method using interpolation
CN117441186A (zh) Picture decoding and processing method, apparatus, and device
US20230262251A1 (en) Picture prediction method, encoder, decoder and computer storage medium
WO2022246809A1 (zh) Encoding method, decoding method, code stream, encoder, decoder and storage medium
CN113261285A (zh) Encoding method, decoding method, encoder, decoder, and storage medium
CN110710204A (zh) Method and device for encoding and decoding a data stream representing at least one image
CN112313950A (zh) Prediction method and apparatus for video picture component, and computer storage medium

Legal Events

Code / Title / Description
121  Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 21942383; Country of ref document: EP; Kind code of ref document: A1
WWE  Wipo information: entry into national phase. Ref document number: 202180097906.8; Country of ref document: CN
NENP  Non-entry into the national phase. Ref country code: DE
WWE  Wipo information: entry into national phase. Ref document number: 2021942383; Country of ref document: EP
ENP  Entry into the national phase. Ref document number: 2021942383; Country of ref document: EP; Effective date: 20240102