WO2022246809A1 - Encoding/decoding method, code stream, encoder, decoder, and storage medium - Google Patents
Encoding/decoding method, code stream, encoder, decoder, and storage medium
- Publication number
- WO2022246809A1 (PCT/CN2021/096818)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- block
- information
- feature information
- residual
- resolution
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/59—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/137—Motion inside a coding unit, e.g. average field, frame or block difference
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
Definitions
- the present application relates to the technical field of video processing, and in particular to a codec method, a code stream, an encoder, a decoder, and a storage medium.
- the encoding and decoding of the current block may adopt intra-frame prediction and inter-frame prediction.
- sub-pixel motion compensation technology is a key technology to improve compression efficiency by eliminating video temporal redundancy, and it is mainly used in motion compensation and motion estimation of inter-frame prediction.
- Embodiments of the present application provide a codec method, a code stream, a coder, a decoder, and a storage medium, which can save a code rate and improve codec efficiency while ensuring the same decoding quality.
- the embodiment of the present application provides an encoding method applied to an encoder, and the method includes:
- the current block is encoded according to the motion information.
- the embodiment of the present application provides a code stream, which is generated by performing bit coding according to the information to be coded; wherein,
- the information to be coded includes at least the motion information of the current block, the residual block of the current block, and the value of the first syntax element identification information, and the first syntax element identification information is used to indicate whether the current block uses motion compensation enhancement processing.
- the embodiment of the present application provides a decoding method, which is applied to a decoder, and the method includes:
- the first syntax element identification information indicates that the current block uses a motion compensation enhancement processing method, then parse the code stream to determine the first motion information of the current block;
- a reconstructed block of the current block is determined.
- an encoder which includes a first determination unit, a first motion compensation unit, and a coding unit; wherein,
- a first determining unit configured to determine a first matching block of the current block
- the first motion compensation unit is configured to perform motion compensation enhancement on the first matching block to obtain at least one second matching block;
- the first determining unit is further configured to determine motion information of the current block according to at least one second matching block;
- the encoding unit is configured to encode the current block according to the motion information.
- the embodiment of the present application provides an encoder, where the encoder includes a first memory and a first processor; wherein,
- a first memory for storing a computer program capable of running on the first processor
- the first processor is configured to execute the method of the first aspect when running the computer program.
- the embodiment of the present application provides a decoder, which includes an analysis unit, a second determination unit, and a second motion compensation unit; wherein,
- the parsing unit is configured to parse the code stream and determine the value of the first syntax element identification information
- the parsing unit is further configured to parse the code stream and determine the first motion information of the current block if the first syntax element identification information indicates that the current block uses a motion compensation enhancement processing method;
- the second motion compensation unit is configured to determine a first matching block of the current block according to the first motion information, and perform motion compensation enhancement on the first matching block to obtain at least one second matching block;
- the second determination unit is configured to determine a first prediction block of the current block according to the first motion information and at least one second matching block; and determine a reconstruction block of the current block according to the first prediction block.
- the embodiment of the present application provides a decoder, where the decoder includes a second memory and a second processor; wherein,
- a second memory for storing a computer program capable of running on the second processor
- the second processor is configured to execute the method of the third aspect when running the computer program.
- the embodiment of the present application provides a computer storage medium, where the computer storage medium stores a computer program; when the computer program is executed by the first processor, the method of the first aspect is implemented, or, when the computer program is executed by the second processor, the method of the third aspect is implemented.
- Embodiments of the present application provide a codec method, code stream, encoder, decoder, and storage medium.
- On the encoder side, a first matching block of the current block is determined, and motion compensation enhancement is performed on the first matching block to obtain at least one second matching block; the motion information of the current block is determined according to the at least one second matching block; and the current block is encoded according to the motion information.
- On the decoder side, the code stream is parsed to determine the value of the first syntax element identification information; if the first syntax element identification information indicates that the current block uses motion compensation enhancement processing, the code stream is parsed to determine the first motion information of the current block; a first matching block of the current block is determined according to the first motion information, and motion compensation enhancement is performed on the first matching block to obtain at least one second matching block; a first prediction block of the current block is determined according to the first motion information and the at least one second matching block; and a reconstructed block of the current block is determined according to the first prediction block.
- FIG. 1 is a schematic diagram of a representation form of inter-frame prediction provided by an embodiment of the present application
- Fig. 2 is a schematic diagram of fractional positions of a luminance component with sub-pixel precision provided by an embodiment of the present application
- FIG. 3A is a schematic block diagram of a video encoding system provided by an embodiment of the present application.
- FIG. 3B is a schematic block diagram of a video decoding system provided by an embodiment of the present application.
- FIG. 4 is a schematic flowchart of an encoding method provided in an embodiment of the present application.
- FIG. 5 is a schematic diagram of a network structure of a preset neural network model provided by an embodiment of the present application.
- FIG. 6 is a schematic diagram of a network structure of a residual projection block provided by an embodiment of the present application.
- FIG. 7 is a schematic flowchart of a decoding method provided by an embodiment of the present application.
- FIG. 8 is a schematic structural diagram of an encoder provided in an embodiment of the present application.
- FIG. 9 is a schematic diagram of a specific hardware structure of an encoder provided in an embodiment of the present application.
- FIG. 10 is a schematic structural diagram of a decoder provided in an embodiment of the present application.
- FIG. 11 is a schematic diagram of a specific hardware structure of a decoder provided in an embodiment of the present application.
- references to “some embodiments” describe a subset of all possible embodiments; it is understood that “some embodiments” may be the same subset or different subsets of all possible embodiments, and may be combined with each other where there is no conflict.
- “first/second/third” in the embodiments of the present application is only used to distinguish similar objects and does not represent a specific ordering of objects. Understandably, the specific order or sequence of “first/second/third” can be interchanged where allowed, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
- the first image component, the second image component and the third image component are generally used to represent the coding block (Coding Block, CB); these three image components are a luminance component, a blue chrominance component and a red chrominance component, respectively. Specifically, the luminance component is usually represented by the symbol Y, the blue chrominance component by the symbol Cb or U, and the red chrominance component by the symbol Cr or V; thus, the video image can be expressed in the YCbCr format or, equivalently, in the YUV format.
- MPEG Moving Picture Experts Group
- JVET Joint Video Experts Team
- VVC Versatile Video Coding
- VTM VVC's reference software test platform (VVC Test Model)
- PSNR Peak Signal to Noise Ratio
- inter-frame prediction is the process of predicting the current frame by using decoded and reconstructed reference frames. Its core is to obtain the optimal matching block (also called the “best matching block”) for the current block.
- the motion information may include a prediction direction, an index number of a reference frame, and a motion vector.
- FIG. 1 shows a schematic diagram of a representation form of inter-frame prediction provided by an embodiment of the present application. As shown in Figure 1, the encoder uses a certain search algorithm to find, for the current block to be encoded in the current frame, an optimal matching block in the reference frame; the displacement between the two is called a motion vector. This process is called motion estimation.
- the encoder first needs to perform integer pixel motion estimation to obtain the optimal matching block at the integer pixel position.
- the concept of sub-pixel motion compensation is proposed.
- the so-called sub-pixel motion compensation is to interpolate the optimal matching block at the integer pixel position through an interpolation filter to generate 1/2 precision sub-pixel samples and 1/4 precision sub-pixel samples.
- FIG. 2 shows a schematic diagram of a fractional position of a luminance component with sub-pixel accuracy provided by an embodiment of the present application.
- uppercase letters represent integer pixel samples, that is, A i,j represents pixels at integer positions; lowercase letters represent sub-pixel samples, where b i,j , h i,j and j i,j represent sub-pixels at half-precision positions, and the remaining lowercase letters represent sub-pixels at quarter-precision positions.
- the essence of sub-pixel motion compensation is to further optimize the matching blocks at integer pixel positions by means of interpolation filtering, where the main functions of the interpolation filter include removing spectral aliasing caused by digital sampling and suppressing coding noise.
- the existing technical solutions still have some shortcomings; in particular, they are difficult to adapt to increasingly diverse video content and complex encoding environments, resulting in low encoding and decoding efficiency.
- the embodiment of the present application provides an encoding method: determining the first matching block of the current block; performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining the motion information of the current block according to the at least one second matching block; and encoding the current block according to the motion information.
- the embodiment of the present application provides a decoding method: determining the value of the first syntax element identification information by parsing the code stream; if the first syntax element identification information indicates that the current block uses motion compensation enhancement processing, parsing the code stream to determine the first motion information of the current block; determining a first matching block of the current block according to the first motion information, and performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining the first prediction block of the current block according to the first motion information and the at least one second matching block; and determining the reconstruction block of the current block according to the first prediction block.
- the video coding system 10 includes a transform and quantization unit 101, an intra-frame estimation unit 102, an intra-frame prediction unit 103, a motion compensation unit 104, a motion estimation unit 105, an inverse transform and inverse quantization unit 106, a filter control analysis unit 107, a filtering unit 108, an encoding unit 109, a decoded image buffer unit 110, and so on; the filtering unit 108 can realize deblocking filtering and sample adaptive offset (Sample Adaptive Offset, SAO) filtering, and the encoding unit 109 can realize header information coding and context-based adaptive binary arithmetic coding (Context-based Adaptive Binary Arithmetic Coding, CABAC).
- SAO Sample Adaptive Offset
- a video coding block can be obtained by dividing a coding tree unit (Coding Tree Unit, CTU); the residual pixel information obtained after intra-frame or inter-frame prediction is then processed by the transform and quantization unit 101, which transforms the video coding block, including transforming the residual information from the pixel domain to the transform domain, and quantizes the obtained transform coefficients to further reduce the bit rate;
- the intra-frame estimation unit 102 and the intra-frame prediction unit 103 are used to perform intra-frame prediction on the video coding block; specifically, they are used to determine the intra-frame prediction mode to be used to code the video coding block;
- the motion compensation unit 104 and the motion estimation unit 105 are used to perform inter-frame predictive encoding of the received video coding block relative to one or more blocks in one or more reference frames to provide temporal prediction information;
- the motion estimation performed by the motion estimation unit 105 is the process of generating motion vectors, which can estimate the motion of the video coding block;
- the context content can be based on adjacent coding blocks and can be used to encode the information indicating the determined intra-frame prediction mode and output the code stream of the video signal; the decoded image buffer unit 110 is used to store reconstructed video coding blocks for prediction reference. As the video image encoding progresses, new reconstructed video coding blocks will be continuously generated, and these reconstructed video coding blocks will be stored in the decoded image buffer unit 110.
- the video decoding system 20 includes a decoding unit 201, an inverse transform and inverse quantization unit 202, an intra prediction unit 203, a motion compensation unit 204, a filtering unit 205, and a decoded image buffer unit 206, etc., wherein the decoding unit 201 can implement header information decoding and CABAC decoding, and filtering unit 205 can implement deblocking filtering and SAO filtering.
- the code stream of the video signal is output by the encoder; the code stream is input into the video decoding system 20 and first passes through the decoding unit 201 to obtain decoded transform coefficients; the transform coefficients are processed by the inverse transform and inverse quantization unit 202 to generate a residual block in the pixel domain; the intra prediction unit 203 is operable to generate prediction data based on the determined intra prediction mode and data from previously decoded blocks of the current frame or picture;
- the motion compensation unit 204 determines prediction information for the video decoding block by parsing motion vectors and other associated syntax elements, and uses the prediction information to generate the predictive block of the video decoding block being decoded; a decoded video block is formed by summing the residual block from the inverse transform and inverse quantization unit 202 with the corresponding predictive block produced by the intra prediction unit 203 or the motion compensation unit 204; the quality of the decoded video signal can be improved by the filtering unit 205 in order to remove block artifacts; the decoded video blocks are then stored in the decoded picture buffer unit 206, which stores reference pictures for subsequent intra prediction or motion compensation and is also used for the output of the video signal, that is, the restored original video signal.
- the embodiment of the present application can be applied to the inter-frame prediction part of the video encoding system 10 (which may be referred to as “encoder” for short), specifically the motion compensation unit 104 and the motion estimation unit 105 as shown in FIG. 3A;
- the embodiments of the application can also be applied to the inter-frame prediction part of the video decoding system 20 (which may be referred to as “decoder” for short), specifically the motion compensation unit 204 as shown in FIG. 3B. That is to say, the embodiment of the present application can be applied to an encoder, a decoder, or even both an encoder and a decoder, but no specific limitation is made here.
- the "current block” specifically refers to the coding block currently to be inter-frame predicted in the image to be coded; when the method of the embodiment of the present application is applied to the decoding When using a device, the “current block” specifically refers to the currently decoded block to be inter-frame predicted in the image to be decoded.
- FIG. 4 shows a schematic flowchart of an encoding method provided in an embodiment of the present application. As shown in Figure 4, the method may include:
- S401 Determine the first matching block of the current block.
- the video image can be divided into multiple image blocks, each image block to be encoded can be called a coding block, and the current block here specifically refers to the coding block currently to be subjected to inter-frame prediction.
- the current block may be a CTU, or even a CU, PU, etc., which is not limited in this embodiment of the present application.
- the encoding method in the embodiment of the present application is mainly applied to motion estimation and motion compensation of inter-frame prediction.
- motion compensation is to use the partial image in the decoded and reconstructed reference frame to predict and compensate the current partial image, which can reduce the redundant information of the moving image
- motion estimation is to extract the motion information from the video sequence, that is, to estimate the displacement information of the moving object between the reference frame and the current frame; this displacement information is the motion information described in the embodiment of the present application, and the process of estimating it is called motion estimation.
- the first matching block here can be obtained based on integer pixel motion estimation, or can be obtained by using sub-pixel interpolation and filtering in related technologies, and this embodiment of the application does not make any limitation.
- the determining the first matching block of the current block may include:
- Integer pixel motion estimation is performed on the current block, and the first matching block of the current block is determined.
- the target matching block at the integer pixel position (also referred to as the "first matching block") is the matching block with the smallest rate-distortion cost when motion estimation is performed at integer pixel positions for the current block.
- motion estimation methods mainly include two categories: pixel recursive method and block matching method.
- the former has high complexity and is rarely used in practice; the latter is widely used in video coding standards.
- the block matching method mainly includes a block matching criterion and a search method.
- SAD Sum of Absolute Differences
- MSE Mean Square Error
- NCCF Normalized Cross Correlation Function
- the matching block at the integer pixel position with the smallest cost is the optimal matching block, that is, the target matching block described in the embodiment of the present application. That is to say, the target matching block at the integer pixel position is the matching block corresponding to the minimum rate-distortion cost value selected from multiple matching blocks at integer pixel positions.
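The block matching criterion and integer-pixel search described above can be illustrated with a minimal sketch using the SAD criterion. The function names and the exhaustive full-search strategy are assumptions for illustration; the patent's rate-distortion cost is approximated here by SAD alone.

```python
# Minimal integer-pixel block matching sketch using the SAD criterion.
# Full search over a square window; real encoders use faster search methods.

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def full_search(current, reference, top, left, search_range):
    """Search integer positions in `reference` around (top, left).

    Returns the motion vector (dy, dx) and cost of the candidate with the
    smallest SAD, i.e. the optimal integer-pixel matching block.
    """
    h, w = len(current), len(current[0])
    best_cost, best_mv = float("inf"), (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > len(reference) or x + w > len(reference[0]):
                continue  # candidate falls outside the reference frame
            candidate = [row[x:x + w] for row in reference[y:y + h]]
            cost = sad(current, candidate)
            if cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost
```

Swapping `sad` for an MSE or NCCF function changes only the matching criterion; the search structure stays the same.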
- S402 Perform motion compensation enhancement on the first matching block to obtain at least one second matching block.
- the embodiment of the present application may further perform motion compensation enhancement.
- DCTIF (DCT-based interpolation filtering) is generally used in video coding standards to perform half-precision sub-pixel interpolation.
- the basic idea is to forward transform the integer pixel samples to the DCT domain, and then use the DCT base sampled at the target sub-pixel position to inversely transform the DCT coefficients back to the spatial domain.
- this process can be represented by a finite impulse response (FIR) filtering process. Assuming that a given integer pixel is represented as f(i), the mathematical form of the DCTIF interpolation process that produces the interpolated pixel is shown in formula (1).
- the tap coefficients of the interpolation filter for the half-precision sub-pixel samples are [-1, 4, -11, 40, 40, -11, 4, -1].
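As a concrete illustration of these taps, a minimal half-precision interpolation sketch follows. The rounding offset of 32 and the shift by 6 (normalization by the tap sum, 64) follow common video-codec practice and are assumptions here, not details stated in the text.

```python
# The 8-tap half-precision DCTIF coefficients quoted above; they sum to 64,
# so the filtered value is normalized by 64 (shift by 6) with rounding.

HALF_PEL_TAPS = [-1, 4, -11, 40, 40, -11, 4, -1]

def dctif_half_pel(samples, i):
    """Interpolate the half-pixel sample between samples[i] and samples[i+1].

    `samples` is a 1-D list of integer pixels with at least 3 samples of
    context to the left of position i and 4 to the right.
    """
    acc = sum(tap * samples[i - 3 + k] for k, tap in enumerate(HALF_PEL_TAPS))
    return (acc + 32) >> 6  # normalize by the tap sum 64, with rounding
```

On a constant signal the filter reproduces the constant, and on a linear ramp it lands on the midpoint between the two neighboring integer pixels, as expected of an interpolation filter.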
- the embodiment of the present application proposes a motion compensation enhancement processing manner based on a preset neural network model.
- the step of performing motion compensation enhancement on the first matching block may further include: performing motion compensation enhancement on the first matching block by using a preset neural network model.
- performing motion compensation enhancement on the first matching block to obtain at least one second matching block may include:
- the first filtering process is performed on the processing block to obtain at least one second matching block.
- the resolution of the processing block is higher than the resolution of the current block.
- the resulting processed blocks after super-resolution and quality enhancement processing have high-quality and high-resolution performance.
- the first matching block has the same resolution as the current block
- the second matching block obtained after the first filtering process also has the same resolution as the current block.
- the first filtering process may include: downsampling. That is to say, after the processing block is obtained, at least one second matching block can be obtained by down-sampling the processing block.
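One plausible way to read this downsampling step (an illustrative sketch under the assumption that the processing block has r times the resolution of the current block): sampling the high-resolution block at each of the r×r phase offsets yields r² blocks at the original resolution, one per sub-pixel position:

```python
import numpy as np

def phase_downsample(hr_block, r=2):
    """Split an (r*H, r*W) high-resolution block into r*r blocks of size
    (H, W), one per sub-pixel phase, by taking every r-th sample starting
    at each offset (i, j)."""
    return [hr_block[i::r, j::r] for i in range(r) for j in range(r)]
```

With r=2 this produces four blocks, matching the four half-precision sub-pixel samples described below; with r=4 it would produce sixteen quarter-precision samples.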
- performing motion compensation enhancement on the first matching block using a preset neural network model to obtain at least one second matching block may include:
- in one possible implementation manner, the precision of the second matching block is half precision, and the number of second matching blocks is four; in another possible implementation manner, the precision of the second matching block is quarter precision, and the number of second matching blocks is 16; this embodiment of the present application does not impose any limitation thereto.
- the preset neural network model is a convolutional neural network (Convolutional Neural Networks, CNN) model.
- CNN is a kind of feed-forward neural network with convolution calculation and deep structure, and it is one of the representative algorithms of deep learning.
- the convolutional neural network has representation learning ability and can perform shift-invariant classification of input information according to its hierarchical structure, so it is also called "Shift-Invariant Artificial Neural Networks (SIANN)".
- this embodiment differs from the above-described embodiment, in which an interpolation filter interpolates three half-precision sub-pixel samples for the first matching block.
- this embodiment uses the convolutional neural network model on the block to achieve end-to-end super-resolution and quality enhancement, and then downsamples the output high-resolution image to generate four half-precision sub-pixel samples (i.e., the "second matching blocks").
- the preset neural network model may include a feature extraction module, a residual projection module group, a sampling module and a reconstruction module.
- the feature extraction module, the residual projection module group, the sampling module and the reconstruction module are connected in sequence.
- performing super-resolution and quality enhancement processing on the first matching block to obtain a processing block may include:
- the feature extraction module is mainly used to extract shallow features, so the feature extraction module can also be called a "shallow feature extraction module".
- the shallow features in the embodiments of the present application mainly refer to low-level simple features (such as edge features, etc.).
- the feature extraction module may include a first convolutional layer.
- performing shallow feature extraction on the first matching block through the feature extraction module to obtain the first feature information may include: performing a convolution operation on the first matching block through the first convolution layer to obtain the first feature information .
- the size of the convolution kernel of the first convolution layer is K ⁇ L
- the number of convolution kernels of the first convolution layer is an integer power of 2
- K and L are positive integers greater than zero.
- the size of the convolution kernel of the first convolution layer may be 3 ⁇ 3
- the number of convolution kernels of the first convolution layer is 64, but there is no limitation here.
- the residual projection module group may include N residual projection blocks, a second convolutional layer, and a first connection layer, where N is an integer greater than or equal to 1.
- the N residual projection blocks, the second convolutional layer, and the first connection layer are connected in sequence, and the first connection layer is also connected with the input of the first residual projection block of the N residual projection blocks.
- performing residual feature learning on the first feature information through the residual projection module group to obtain the second feature information includes:
- the first feature information and the second intermediate feature information are added through the first connection layer to obtain the second feature information.
- the size of the convolution kernel of the second convolution layer is K ⁇ L
- the number of convolution kernels of the second convolution layer is an integer power of 2
- K and L are positive integers greater than zero.
- the size of the convolution kernel of the second convolution layer is 3 ⁇ 3
- the number of convolution kernels of the second convolution layer is 64, but there is no limitation here.
- the input of the d-th residual projection block is denoted by F_{d-1};
- the output of the d-th residual projection block is denoted by F_d.
- the N residual projection blocks form a cascade structure;
- the input of the cascade structure is the first feature information;
- the output of the cascade structure is the first intermediate feature information.
- performing residual feature learning on the first feature information through N residual projection blocks to obtain the first intermediate feature information may include:
- when N is equal to 1, the first feature information is input to the single residual projection block, the output information of that residual projection block is obtained, and that output information is determined as the first intermediate feature information;
- when N is greater than 1, the output information of the d-th residual projection block is input to the (d+1)-th residual projection block to obtain the output information of the (d+1)-th residual projection block, and d is incremented by 1 until the output information of the N-th residual projection block is obtained; the output information of the N-th residual projection block is determined as the first intermediate feature information, where d is an integer greater than or equal to 1 and less than N.
- in other words, if N is equal to 1, the output information of the single residual projection block is the first intermediate feature information; if N is greater than 1, that is, there are two or more residual projection blocks in the residual projection module group, the output of each residual projection block serves as the input of the next, and the output information of the last residual projection block, obtained through this stacking, is the first intermediate feature information.
- the sampling module may include a sub-pixel convolution layer.
- said performing the second filtering process on the second feature information through the sampling module to obtain the third feature information may include:
- the second filtering process is performed on the second feature information through the sub-pixel convolution layer to obtain the third feature information.
- the resolution of the third feature information obtained after the second filtering process is higher than the resolution of the second feature information.
- the second filtering process may include: upsampling. That is to say, the sampling module is mainly used for upsampling the second feature information, so the sampling module may also be called an "upsampling module".
- the sampling module may use a sub-pixel convolution layer, or may add a sub-pixel convolution layer.
- the sub-pixel convolutional layer may also be a PixelShuffle module (also written PixShuffle), which takes a low-resolution H×W image as input and transforms it into a high-resolution rH×rW image.
- the implementation does not generate this high-resolution image directly by interpolation; instead, a feature map of r² channels is first obtained through convolution (the spatial size of the feature map is consistent with the input low-resolution image), and the high-resolution image is then obtained through periodic shuffling; here, r is the magnification factor of the image.
- for the feature map with r² channels, the r² channel values of each pixel are rearranged into an r×r area, corresponding to a sub-block of size r×r in the high-resolution image, so that the feature map of size r²×H×W is rearranged into a high-resolution image of size 1×rH×rW.
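The periodic shuffling described above can be sketched in NumPy (a minimal sketch, following the same channel-ordering convention as `torch.nn.PixelShuffle` for a single output channel):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (r*r, H, W) feature map into a (1, r*H, r*W) image:
    the r*r channel values at each spatial position become an r x r
    sub-block of the high-resolution output."""
    c, h, w = x.shape
    assert c == r * r
    # (r*r, H, W) -> (r, r, H, W) -> (H, r, W, r) -> (r*H, r*W)
    out = x.reshape(r, r, h, w).transpose(2, 0, 3, 1).reshape(h * r, w * r)
    return out[np.newaxis]
```

For example, a 4×1×1 input with channel values 0..3 becomes the 2×2 block [[0, 1], [2, 3]].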
- the reconstruction module may include a fifth convolutional layer.
- the super-resolution reconstruction of the third feature information by the reconstruction module to obtain the processing block may include:
- a convolution operation is performed on the third feature information through the fifth convolution layer to obtain a processing block.
- the size of the convolution kernel of the fifth convolution layer is K ⁇ L
- the number of convolution kernels of the fifth convolution layer is an integer power of 2
- K and L are positive integers greater than zero.
- the size of the convolution kernel of the fifth convolution layer is 3 ⁇ 3
- the number of convolution kernels of the fifth convolution layer is 1, but there is no limitation here.
- FIG. 5 shows a schematic diagram of a network structure of a preset neural network model provided by an embodiment of the present application.
- the preset neural network model can be represented by RPNet.
- RPNet mainly includes four parts: a shallow feature extraction network (Shallow Feature Extraction Net), residual projection blocks (Residual Projection Blocks), an up-sampling network (Up-sampling Net), and a reconstruction network (Reconstruction Net).
- the shallow feature extraction network layer is the feature extraction module described in the embodiment of the application, which may be the first convolutional layer;
- the upsampling network layer is the sampling module described in the embodiment of the application, which may be a sub-pixel convolutional layer;
- the reconstruction network layer is the reconstruction module described in the embodiment of the present application, which may be the fifth convolutional layer.
- I_LR represents the first matching block described in the embodiment of the present application, that is, the low-resolution image input to RPNet;
- I_SR represents the processing block described in the embodiment of the present application, that is, the high-resolution image (which may also be referred to as a super-resolution image); that is, I_LR and I_SR denote the input and output of RPNet, respectively.
- the network structure of the model will be described in detail below with reference to FIG. 5 .
- H_SFE(·) denotes the convolution operation of the shallow feature extraction network.
- F_0 denotes the extracted shallow features of the low-resolution image, which serve as the input of the residual projection module group.
- W_LSC denotes the weight of the convolutional layer after the N-th residual projection block.
- GRL: Global Residual Learning
- H_UP(·) denotes the convolution operation that implements upsampling.
- F_UP denotes the extracted third feature information, which serves as the input of the reconstruction network layer.
- H_REC(·) denotes the convolution operation that implements super-resolution reconstruction.
- the residual projection block may include an upper projection module, M residual modules, a local feature fusion module, a lower projection module and a second connection layer; wherein, M is an integer greater than or equal to 1.
- the up-projection module, the M residual modules, the local feature fusion module, the down-projection module, and the second connection layer are connected in sequence; the second connection layer is also connected to the input of the up-projection module,
- and the outputs of the M residual modules are also respectively connected with the local feature fusion module.
- the method may also include:
- the input information and the filtered feature information are added through the second connection layer to obtain the output information of the residual projection block.
- the up-projection module may include a transposed convolutional layer.
- performing the third filtering process on the input information of the residual projection block through the up-projection module to obtain the first high-resolution feature information may include:
- the third filtering process is performed on the input information of the residual projection block by transposing the convolution layer to obtain the first high-resolution feature information.
- the resolution of the first high-resolution feature information obtained after the third filtering process is higher than the resolution of the input information of the residual projection block.
- the third filtering process may include: upsampling. That is, the input information of the residual projection block is upsampled by transposing the convolutional layer to obtain the first high-resolution feature information.
- the local feature fusion module may include a feature fusion layer and a third convolutional layer; performing a fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain the third high-resolution feature information includes:
- a third convolutional layer is used to perform a convolution operation on the fused feature information to obtain third high-resolution feature information.
- the size of the convolution kernel of the third convolution layer is K ⁇ L
- the number of convolution kernels of the third convolution layer is an integer power of 2
- K and L are positive integers greater than zero.
- the size of the convolution kernel of the third convolution layer is 1 ⁇ 1
- the number of convolution kernels of the third convolution layer is 64, but there is no limitation here.
- the M pieces of second high-resolution feature information are fused through the feature fusion layer; moreover, in order to give full play to the learning ability of the residual network, a 1×1 convolutional layer is also introduced here
- to perform the fusion operation on the feature information learned by the residual modules, so that the learned feature information can be adaptively controlled.
- the down-projection module may include a fourth convolutional layer; performing the fourth filtering process on the third high-resolution feature information through the down-projection module to obtain the filtered feature information includes:
- a fourth filtering process is performed on the third high-resolution feature information through the fourth convolution layer to obtain filtered feature information.
- the resolution of the filtered feature information obtained after the fourth filtering process is lower than the resolution of the third high-resolution feature information.
- the filtered feature information obtained after the fourth filtering process has the same resolution as the input information of the residual projection block.
- the fourth filtering process may include: downsampling. That is to say, the third high-resolution feature information is down-sampled through the fourth convolutional layer to obtain filtered feature information.
- FIG. 6 shows a schematic diagram of a network structure of a residual projection block provided by an embodiment of the present application.
- the residual projection block can be represented by RPB.
- RPB mainly includes an Up-Projection Unit, Residual Blocks, Local Feature Fusion, and a Down-Projection Unit.
- the d-th residual projection block is assumed to contain M residual modules; the specific connection relationship is shown in FIG. 6.
- the up-projection module uses a transposed convolutional layer to up-sample the input low-resolution features; its mathematical form is shown in equation (9), where:
- * denotes the spatial convolution operation;
- F_{d-1} denotes the input of the d-th residual projection block;
- p_t denotes the transposed convolution;
- ↑s denotes upsampling with scaling factor s;
- F_{d,0} denotes the input of the first residual module.
- [F_{d,1}, ..., F_{d,M}] denote the outputs of the M residual modules, respectively.
- the local feature fusion includes a feature fusion layer and a third convolutional layer; a 1×1 third convolutional layer is introduced to perform the fusion operation on the features learned by the residual modules,
- so that the learned feature information can be adaptively controlled; its mathematical form is shown in formula (10).
- the down-projection module uses the convolution operation of the fourth convolutional layer to down-sample F_{d,LFF}, achieving the effect of using high-resolution features to guide low-resolution features; finally, pixel-wise addition with F_{d-1} yields F_d, whose mathematical form is shown in formula (11).
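The data flow of equations (9)-(11) can be sketched schematically; the operator arguments here are placeholders standing in for the learned transposed convolution, the M residual modules, the 1×1 fusion convolution, and the strided down-projection convolution:

```python
import numpy as np

def rpb_forward(f_prev, residual_modules, up, fuse, down):
    """Residual projection block data flow: up-project the input, run the
    M residual modules in sequence (collecting each output), fuse the
    collected outputs (local feature fusion), down-project the fused
    features, then add the block input as a local skip connection."""
    f = up(f_prev)                # eq. (9): F_{d,0} from F_{d-1}
    outs = []
    for m in residual_modules:    # M residual modules at high resolution
        f = m(f)
        outs.append(f)
    f_lff = fuse(outs)            # eq. (10): local feature fusion
    return down(f_lff) + f_prev   # eq. (11): down-projection + skip
```

A toy run with nearest-neighbor up/down operators and trivial "residual modules" shows the shape contract: the output has the same resolution as the input.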
- the embodiment of the present application combines the transposed convolution and the residual module to propose the residual projection block RPB.
- the basic idea is to use the transposed convolutional layer to project low-resolution features into the high-resolution feature space, then use the residual modules to learn high-resolution features of different levels, then improve the expressive ability of the residual modules through local feature fusion, and finally use the convolutional layer to project the high-resolution features back to the low-resolution feature space.
- the embodiment of this application proposes a preset neural network model RPNet with half-precision sub-pixel interpolation, and embeds the trained model into the coding platform VTM7.0.
- the embodiment of the present application may select RPNet for motion compensation enhancement only for PUs with a size greater than or equal to 64×64; for PUs smaller than 64×64, motion compensation enhancement is still performed with the interpolation filter of the related art.
- the method in the embodiment of the present application can realize sub-pixel motion compensation with half precision, and can also realize sub-pixel motion compensation with quarter precision.
- and can even implement sub-pixel motion compensation with other precisions, which is not limited in this embodiment of the present application.
- when the precision of the sub-pixel sample is half precision, the convolution kernel sizes of the transposed convolutional layer and the fourth convolutional layer are both 6×6, and the stride and padding value are both set to 2; or, when the precision of the sub-pixel sample is quarter precision, the convolution kernel sizes of the transposed convolutional layer and the fourth convolutional layer are both 8×8, and the stride and padding value are set to 4 and 2, respectively.
- the number N of residual projection blocks in RPNet can be set to 10, and the number M of residual modules in each residual projection block can be set to 3.
- the number of convolution kernels of the convolution layer in the reconstruction network layer is set to 1
- the number of convolution kernels of other transposed convolution layers or convolution layers in the network model is set to 64.
- the size of the convolution kernel in the transposed convolutional layer in the upper projection module and the convolutional layer in the lower projection module is set to 6 ⁇ 6, and the stride and padding are set to 2.
- other convolutional layers in the network model use convolutional kernels with a size of 3 ⁇ 3, and the upsampling module can use sub-pixel convolutional layers.
- the RPNet in the embodiment of the present application can also be used for PUs of all sizes to perform half-precision sub-pixel interpolation.
- the RPNet in the embodiment of the present application can also adjust the number of residual projection blocks and the number of residual modules in each residual projection block. The RPNet in the embodiment of the present application can even be used for quarter-precision sub-pixel motion compensation.
- in that case, the convolution kernel sizes of the transposed convolutional layer in the up-projection module and of the convolutional layer in the down-projection module are set to 8×8, the stride and padding are set to 4 and 2 respectively, and a sub-pixel convolutional layer is added in the upsampling module.
- the preset neural network model can be obtained through model training.
- the method may also include:
- the training data set includes at least one training image
- the neural network model is trained using the at least one group of input images to obtain at least one set of candidate model parameters; the ground-truth region is used to determine the loss value (Loss) of the loss function of the neural network model, and the at least one set of candidate model parameters is obtained when the loss value of the loss function converges to a preset threshold.
- the embodiment of the present application may select a public data set (such as the DIV2K data set), which includes 800 training images and 100 verification images.
- the preprocessing of the DIV2K dataset mainly includes two steps of format conversion and encoding reconstruction. First, format conversion is performed on 800 high-resolution images in the training set, 100 high-resolution images in the test set, and their corresponding low-resolution images, from the original PNG format to YUV420 format. Then, extract the luminance component from the high-resolution image data in YUV420 format and save it in PNG format as the ground truth area.
- VTM7.0 is used for all-intra encoding, with the quantization parameters (Quantization Parameter, QP) set to 22, 27, 32, and 37 respectively; the luminance components are then extracted from the four sets of decoded and reconstructed data and saved in PNG format as the input of the neural network model. Four sets of training data sets are thus obtained.
- the embodiment of the present application selects peak signal-to-noise ratio (Peak Signal-to-Noise Ratio, PSNR) as the evaluation standard of image reconstruction quality.
- the model is trained based on the Pytorch platform.
- a low-resolution image of size 48×48 is taken as input, and the batch size is set to 16.
- the mean absolute error can be selected as the loss function
- the adaptive moment estimation can be used as the optimization function
- the momentum and weight decay can be set to 0.9 and 0.0001, respectively.
- the initial learning rate is set to 0.0001 and is reduced by a factor of 0.1 every 100 epochs, for a total of 300 epochs.
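The step-decay schedule described above can be expressed as follows (a sketch; the function name and defaults mirror the stated hyperparameters):

```python
def learning_rate(epoch, base_lr=1e-4, drop=0.1, step=100):
    """Step-decay schedule: the learning rate is multiplied by `drop`
    every `step` epochs, starting from `base_lr`."""
    return base_lr * drop ** (epoch // step)
```

Over the stated 300 epochs this yields 1e-4 for epochs 0-99, 1e-5 for epochs 100-199, and 1e-6 for epochs 200-299.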
- four sets of model parameters can be obtained. These four sets of model parameters correspond to four models, which are represented by RPNet_qp22, RPNet_qp27, RPNet_qp32, and RPNet_qp37.
- the method may also include:
- a preset neural network model is determined.
- the input image groups correspond to different quantization parameters, and there are correspondences between the multiple groups of candidate model parameters and different quantization parameters.
- the trained model parameters corresponding to the quantization parameter can be determined, and then the preset neural network model used in the embodiment of the present application can be determined.
- the method may further include: encoding the model parameters, and writing encoded bits into a code stream.
- on the one hand, if the encoder and the decoder use the same preset neural network model with fixed parameters, the parameters are already solidified, so there is no need to transmit model parameters; on the other hand, if the code stream transmits access information for a public training data set, such as a Uniform Resource Locator (URL), the decoder can be trained in the same way as the encoder; on yet another hand, for the encoder, the encoded video sequence can be used for learning.
- if the encoder writes the model parameters into the code stream, the decoder does not need to perform model training; after obtaining the model parameters by parsing the code stream, the preset neural network model used in the embodiment of this application can be determined.
- the preset neural network model can be used to perform motion compensation enhancement on the first matching block to obtain at least one second matching block.
- S403 Determine motion information of the current block according to at least one second matching block.
- the embodiment of the present application also needs to perform sub-pixel motion estimation.
- the method may also include:
- the target matching block at the sub-pixel position (may be referred to as “sub-pixel matching block”) is the matching block with the smallest rate-distortion cost when motion estimation is performed at the sub-pixel position for the current block.
- the matching block at the sub-pixel position with the smallest rate-distortion cost is the optimal matching block, that is, the target matching block described in the embodiment of the present application. That is to say, the target matching block at the sub-pixel position (or "sub-pixel matching block" for short) is the matching block corresponding to the minimum rate-distortion cost value selected from the plurality of second matching blocks.
- the determining the motion information of the current block according to at least one second matching block may include:
- first rate-distortion cost is greater than the second rate-distortion cost, then determine that the current block uses motion compensation enhancement processing, and determine that the motion information is the first motion information, and the first motion information is used to point to the sub-pixel position;
- otherwise, if the first rate-distortion cost is less than or equal to the second rate-distortion cost, the motion information is determined as the second motion information, and the second motion information is used to point to the integer pixel position.
- whether the current block uses the motion compensation enhancement processing manner is determined in the embodiment of the present application according to the calculated rate-distortion cost values. That is to say, the encoder finally selects the mode with the smallest rate-distortion cost for predictive encoding.
- if it is determined that the current block uses the motion compensation enhancement processing method, the motion information at this time is the first motion information, which is used to point to the sub-pixel position (i.e., the "sub-pixel precision position"); in this case the decoder also needs to perform motion compensation enhancement to interpolate the second matching block. Otherwise, if it is determined that the current block does not use the motion compensation enhancement processing method, the motion information at this time is the second motion information, which is used to point to the integer pixel position (i.e., the "integer pixel precision position"), and the decoder does not need to perform motion compensation enhancement.
- the method may also include:
- first rate-distortion cost value is greater than the second rate-distortion cost value, determine that the value of the first syntax element identification information is the first value
- first rate-distortion cost is less than or equal to the second rate-distortion cost, it is determined that the value of the first syntax element identification information is the second value.
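The mode decision described above can be sketched as follows (illustrative; `cost_int` and `cost_sub` stand for the first and second rate-distortion costs, and the flag values 1/0 are one of the example encodings given below):

```python
def decide_mc_enhancement(cost_int, cost_sub):
    """Returns (flag, selected_cost).

    Per the text: motion compensation enhancement is used (flag = first
    value, here 1) if and only if the first rate-distortion cost exceeds
    the second; otherwise integer-pel motion compensation is kept
    (flag = second value, here 0). Ties favor the integer-pel mode."""
    if cost_int > cost_sub:
        return 1, cost_sub
    return 0, cost_int
```

The encoder would then write the returned flag as the first syntax element identification information and use the selected mode for predictive coding.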
- the first value and the second value are different, and the first value and the second value may be in the form of a parameter or a form of a number.
- the first syntax element identification information is a parameter written in the profile (profile), but the first syntax element identification information may also be a flag (flag), which is not limited here.
- first syntax element identification information may also be set, where the first syntax element identification information is used to indicate whether the current block uses a motion compensation enhancement processing manner.
- subsequently, in the decoder, according to the value of the first syntax element identification information, it can be determined whether the current block uses the motion compensation enhancement processing manner.
- the first syntax element identification information is a flag
- the first value can be set to 1 and the second value to 0; in another specific example, the first value can be set to true and the second value to false; in yet another specific example, the first value can be set to 0 and the second value to 1; or the first value can be set to false and the second value to true.
- the first value and the second value in the embodiment of the present application are not limited in any way.
- the embodiment of the present application may also set second syntax element identification information, where the second syntax element identification information is used to indicate whether the current block uses the motion compensation enhancement method of the embodiment of the application.
- the method may further include: if the second syntax element identification information indicates that the current block uses the motion compensation enhancement method of the embodiment of the present application, that is, the value of the second syntax element identification information is the first value, executing the process shown in FIG. 4; if the second syntax element identification information indicates that the current block does not use the motion compensation enhancement method of the embodiment of the present application, that is, the value of the second syntax element identification information is the second value, performing the motion compensation enhancement method of the related art, such as the DCTIF-based sub-pixel motion compensation method.
- the method may further include: encoding the value of the identification information of the first syntax element, and writing the encoded bits into the code stream.
- the decoder can directly determine whether the current block uses the motion compensation enhancement processing mode by analyzing the code stream, so as to facilitate the decoder to perform subsequent operations.
- S404 Encode the current block according to the motion information.
- the motion information may at least include: reference frame information and motion vectors.
- the prediction block can be determined from the reference frame.
- encoding the current block according to the motion information may include:
- the residual block is encoded, and the encoded bits are written into the code stream.
- the determining the residual block of the current block according to the current block and the first predicted block may include: performing a subtraction operation on the current block and the first predicted block to determine the residual block of the current block .
- encoding the current block according to the motion information may include:
- the residual block is encoded, and the encoded bits are written into the code stream.
- the current block does not use the motion compensation enhancement processing method, it means that the current block uses the integer pixel motion compensation method.
- the determining the residual block of the current block according to the current block and the second predicted block may include: performing a subtraction operation on the current block and the second predicted block to determine the residual block of the current block.
- the method may further include: encoding the motion information, and writing the encoded bits into the code stream.
- the decoder can determine the motion information by parsing the code stream, and then determine the prediction block (the first prediction block or the second prediction block) of the current block according to the motion information, so that the decoder can perform subsequent operations.
- the embodiment of the present application combines transposed convolution and a residual network to propose a residual projection block. Based on the residual projection block, the embodiment of this application proposes a half-precision sub-pixel interpolation network RPNet and applies it in VTM7.0.
- An embodiment of the present application provides an encoding method, which is applied to an encoder.
- The method includes: determining the first matching block of the current block; performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining the motion information of the current block according to the at least one second matching block; and encoding the current block according to the motion information.
- using the preset neural network model for motion compensation enhancement can not only reduce the computational complexity, but also save the code rate and improve the encoding and decoding efficiency under the premise of ensuring the same decoding quality.
- the embodiment of the present application provides a code stream, where the code stream is generated by performing bit coding according to the information to be coded.
- the information to be encoded includes at least the motion information of the current block, the residual block of the current block, and the value of the first syntax element identification information, and the first syntax element identification information is used to indicate whether the current block uses motion compensation enhancement processing.
- the code stream may be transmitted from the encoder to the decoder, so that the decoder can perform subsequent operations conveniently.
- FIG. 7 shows a schematic flowchart of a decoding method provided in an embodiment of the present application. As shown in Figure 7, the method may include:
- S701 Parse the code stream, and determine a value of the first syntax element identification information.
- the video image can be divided into multiple image blocks, and each image block to be decoded can be called a decoding block; the current block here specifically refers to the decoding block currently awaiting inter-frame prediction.
- the current block may be a CTU, or even a CU, PU, etc., which is not limited in this embodiment of the present application.
- the decoding method in the embodiment of the present application is mainly applied to the motion compensation part of the inter-frame prediction.
- the motion compensation is to use the partial image in the decoded and reconstructed reference frame to predict and compensate the current partial image, which can reduce the redundant information of the moving image.
- the parsing the code stream to determine the value of the first syntax element identification information may include:
- if the value of the identification information of the first syntax element is the first value, it is determined that the current block uses a motion compensation enhancement processing method;
- the value of the identification information of the first syntax element is the second value, it is determined that the current block does not use the motion compensation enhancement processing manner.
- the identification information of the first syntax element is used to indicate whether the current block uses a motion compensation enhancement processing manner.
- the first value and the second value are different, and the first value and the second value may be in the form of parameters or numbers.
- the first syntax element identification information is a parameter written in the profile, but it may also be a flag, which is not limited here.
- the first syntax element identification information is a flag
- the first value can be set to 1 and the second value to 0; in another specific example, the first value can be set to true and the second value to false; in yet another specific example, the first value can be set to 0 and the second value to 1; or the first value can be set to false and the second value to true.
- the first value and the second value in the embodiment of the present application are not limited in any way.
- the motion information obtained by decoding is the first motion information, which is used to point to the sub-pixel position.
- motion compensation enhancement is also required in the decoder to interpolate the second matching block.
- the motion information obtained by decoding at this time is the second motion information, which is used to point to the integer pixel position; in this case, the decoder does not need to perform motion compensation enhancement.
- S703 Determine a first matching block of the current block according to the first motion information, and perform motion compensation enhancement on the first matching block to obtain at least one second matching block.
- the motion information may include reference frame information and motion vector (Motion Vector, MV) information.
- whether to use sub-pixel motion compensation is determined by the MV precision obtained by parsing the code stream, i.e., whether the MV is of integer-pixel precision or sub-pixel precision. If the precision is sub-pixel, such as 1/4 pixel, and the lower 2 bits of each MV component are all 0, the MV points to an integer-pixel-precision position; otherwise, it points to a sub-pixel-precision position.
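- The low-bit check above can be sketched as follows, assuming (as is common but not stated here) that quarter-pel MV components are stored in units of 1/4 pixel, so the lowest 2 bits carry the fractional part:

```python
FRACTIONAL_BITS = 2  # quarter-pixel precision: MV components stored in 1/4-pel units

def points_to_integer_position(mv_x, mv_y, frac_bits=FRACTIONAL_BITS):
    """An MV points to an integer-pixel position when the fractional
    (low) bits of both components are zero."""
    mask = (1 << frac_bits) - 1
    return (mv_x & mask) == 0 and (mv_y & mask) == 0

print(points_to_integer_position(8, -4))   # (2, -1) whole pixels -> True
print(points_to_integer_position(9, -4))   # 9/4 px has a fractional part -> False
```

- Only when this check reports a sub-pixel position does the decoder need to run the motion compensation enhancement (interpolation) path.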
- the first matching block here may point to an integer-pixel precision position, or may be a sub-pixel precision position pointed to by a sub-pixel interpolation filtering method in the related art, which is not limited by this embodiment of the present application.
- the decoder needs to perform sub-pixel motion compensation to interpolate the second matching block.
- since the decoded reference frames in the decoder contain only integer pixel positions, the sub-pixel positions between the integer pixel positions need to be obtained by interpolation, which is realized by sub-pixel motion compensation.
- the step of performing motion compensation enhancement on the first matching block may further include: performing motion compensation enhancement on the first matching block by using a preset neural network model.
- performing motion compensation enhancement on the first matching block to obtain at least one second matching block may include:
- the first filtering process is performed on the processing block to obtain at least one second matching block.
- the resolution of the processing block is higher than the resolution of the current block.
- the resulting processed blocks after super-resolution and quality enhancement processing have high-quality and high-resolution performance.
- the first matching block has the same resolution as the current block
- the second matching block obtained after the first filtering process also has the same resolution as the current block.
- the first filtering process may include: downsampling. That is to say, after the processing block is obtained, at least one second matching block can be obtained by down-sampling the processing block.
- performing motion compensation enhancement on the first matching block using a preset neural network model to obtain at least one second matching block may include:
- the precision of the second matching block is half precision, and the number of second matching blocks is four; in another possible implementation manner, the precision of the second matching block is quarter precision, and the number of second matching blocks is 16; however, this embodiment of the present application does not make any limitation thereto.
- the preset neural network model may be a convolutional neural network model.
- the convolutional neural network model can be used to implement end-to-end super-resolution and quality enhancement for the first matching block, and then down-sample the output high-resolution image to generate four half-precision pixel samples (i.e., the "second matching blocks").
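- The down-sampling step above can be sketched as extracting the phase-shifted grids of a 2x super-resolved block. This toy sketch uses plain Python lists and assumes the four half-pel sample blocks correspond to the four sampling phases of the high-resolution output (the exact extraction order is an assumption):

```python
def phase_downsample(hr, r=2):
    """Split an (r*H) x (r*W) high-resolution block into r*r phase-shifted
    H x W blocks; for r=2 these are the four half-pel sample grids."""
    H, W = len(hr) // r, len(hr[0]) // r
    phases = []
    for dy in range(r):
        for dx in range(r):
            phases.append([[hr[r * y + dy][r * x + dx] for x in range(W)]
                           for y in range(H)])
    return phases

hr = [[ 0,  1,  2,  3],
      [10, 11, 12, 13],
      [20, 21, 22, 23],
      [30, 31, 32, 33]]
blocks = phase_downsample(hr)
assert len(blocks) == 4                      # four half-precision blocks
assert blocks[0] == [[0, 2], [20, 22]]       # phase (0, 0): integer positions
assert blocks[3] == [[11, 13], [31, 33]]     # phase (1, 1): diagonal half-pel
```

- With r=4 the same routine yields the 16 quarter-precision blocks mentioned above.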
- the preset neural network model may include a feature extraction module, a residual projection module group, a sampling module and a reconstruction module; wherein the feature extraction module, the residual projection module group, the sampling module and the reconstruction module are sequentially connected.
- performing super-resolution and quality enhancement processing on the first matching block to obtain a processing block may include:
- the feature extraction module can also be called a "shallow feature extraction module".
- the feature extraction module may be the first convolutional layer.
- performing shallow feature extraction on the first matching block through the feature extraction module to obtain the first feature information may include: performing a convolution operation on the first matching block through the first convolution layer to obtain the first feature information .
- the shallow features here mainly refer to low-level simple features (such as edge features, etc.).
- the size of the convolution kernel of the first convolution layer is K ⁇ L
- the number of convolution kernels of the first convolution layer is an integer power of 2
- K and L are positive integers greater than zero.
- the size of the convolution kernel of the first convolution layer may be 3 ⁇ 3
- the number of convolution kernels of the first convolution layer is 64, but there is no limitation here.
- the residual projection module group may include N residual projection blocks, a second convolutional layer, and a first connection layer; where N is an integer greater than or equal to 1.
- the N residual projection blocks, the second convolutional layer and the first connection layer are sequentially connected, and the first connection layer is also connected to the input of the first residual projection block among the N residual projection blocks.
- performing residual feature learning on the first feature information through the residual projection module group to obtain the second feature information includes:
- the first feature information and the second intermediate feature information are added through the first connection layer to obtain the second feature information.
- the size of the convolution kernel of the second convolution layer is K ⁇ L
- the number of convolution kernels of the second convolution layer is an integer power of 2
- K and L are positive integers greater than zero.
- the size of the convolution kernel of the second convolution layer is 3 ⁇ 3
- the number of convolution kernels of the second convolution layer is 64, but there is no limitation here.
- the N residual projection blocks are a cascade structure
- the input of the cascade structure is the first feature information
- the output of the cascade structure is the second intermediate feature information.
- performing residual feature learning on the first feature information through N residual projection blocks to obtain the first intermediate feature information may include:
- when N is equal to 1, input the first feature information to the first residual projection block, obtain the output information of the first residual projection block, and determine the output information of the first residual projection block as the first intermediate feature information;
- when N is greater than 1, input the output information of the d-th residual projection block to the (d+1)-th residual projection block to obtain the output information of the (d+1)-th residual projection block, and increment d by 1 until the output information of the N-th residual projection block is obtained; the output information of the N-th residual projection block is determined as the first intermediate feature information, where d is an integer greater than or equal to 1 and less than N.
- if N is equal to 1, the output information of that residual projection block is the first intermediate feature information; if N is greater than 1, that is, there are two or more residual projection blocks in the residual projection module group, the output of the previous residual projection block is the input of the next residual projection block, cascaded until the output information of the last residual projection block is obtained; at this time the output information of the last residual projection block is the first intermediate feature information.
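- The cascade plus global skip connection described above can be sketched as follows; the learned blocks are stood in for by arbitrary scalar functions, so this illustrates only the data flow (cascade of N blocks, then the second convolution, then the first connection layer adding back the first feature information):

```python
def residual_projection_group(first_features, rpb_blocks, second_conv):
    """Cascade of N residual projection blocks, followed by a convolution
    and a global skip connection back to the group input."""
    x = first_features
    for rpb in rpb_blocks:            # output of block d feeds block d+1
        x = rpb(x)                    # after the loop: 1st intermediate features
    x = second_conv(x)                # 2nd intermediate features
    return first_features + x        # first connection layer: global skip add

# Toy scalar stand-ins for the learned modules:
blocks = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
conv = lambda v: v * 0.5
out = residual_projection_group(4.0, blocks, conv)
# cascade: 4 -> 5 -> 10 -> 7; conv: 3.5; skip: 4 + 3.5 = 7.5
assert out == 7.5
```

- In the real model the "+" is an element-wise addition of feature tensors of equal shape, which is why the group's output resolution matches its input resolution.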
- the residual projection block may include an upper projection module, M residual modules, a local feature fusion module, a lower projection module and a second connection layer; wherein, M is an integer greater than or equal to 1.
- the upper projection module, the M residual modules, the local feature fusion module, the lower projection module and the second connection layer are sequentially connected; the second connection layer is also connected to the input of the upper projection module, and the outputs of the M residual modules are also respectively connected with the local feature fusion module.
- the method may also include:
- the input information and the filtered feature information are added through the second connection layer to obtain the output information of the residual projection block.
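- The forward pass of one residual projection block, as just described, can be sketched as follows; again the learned layers are replaced by scalar stand-ins, so this shows only the connectivity (up-projection, M residual modules whose outputs all feed local feature fusion, down-projection, and the second connection layer's local skip):

```python
def residual_projection_block(x, up_project, residual_modules, fuse, down_project):
    """One residual projection block: up-project the input, run M residual
    modules, fuse all M of their outputs, down-project, then add the
    block input back via the second connection layer."""
    h = up_project(x)                       # 1st high-resolution features
    outs = []
    for module in residual_modules:         # M second high-res feature maps
        h = module(h)
        outs.append(h)                      # every module output goes to fusion
    fused = fuse(outs)                      # 3rd high-resolution features
    return x + down_project(fused)          # filtered features + block input

# Toy scalar stand-ins for the learned layers:
out = residual_projection_block(
    2.0,
    up_project=lambda v: v * 4,             # e.g. transposed convolution
    residual_modules=[lambda v: v + 1, lambda v: v + 2],
    fuse=lambda vs: sum(vs) / len(vs),      # fusion layer + 1x1 conv stand-in
    down_project=lambda v: v / 4,           # e.g. strided convolution
)
assert out == 4.5
```

- Note that the down-projection restores the input resolution, which is what makes the final skip addition dimensionally valid.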
- the up-projection module may include a transposed convolutional layer.
- performing the third filtering process on the input information of the residual projection block through the up-projection module to obtain the first high-resolution feature information may include:
- the third filtering process is performed on the input information of the residual projection block by transposing the convolution layer to obtain the first high-resolution feature information.
- the resolution of the first high-resolution feature information obtained after the third filtering process is higher than the resolution of the input information of the residual projection block.
- the third filtering process may include: upsampling. That is, the input information of the residual projection block is upsampled by transposing the convolutional layer to obtain the first high-resolution feature information.
- the local feature fusion module may include a feature fusion layer and a third convolutional layer; performing a fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain the third high-resolution feature information includes:
- a third convolutional layer is used to perform a convolution operation on the fused feature information to obtain third high-resolution feature information.
- the size of the convolution kernel of the third convolution layer is K ⁇ L
- the number of convolution kernels of the third convolution layer is an integer power of 2
- K and L are positive integers greater than zero.
- the size of the convolution kernel of the third convolution layer is 1 ⁇ 1
- the number of convolution kernels of the third convolution layer is 64, but there is no limitation here.
- the M pieces of second high-resolution feature information are fused through the feature fusion layer; in addition, in order to give full play to the learning ability of the residual network, a 1×1 convolutional layer is introduced here for the fusion operation of the feature information learned by the residual modules, which can adaptively control the learned feature information.
- the lower projection module may include a fourth convolutional layer, and performing the fourth filtering process on the third high-resolution feature information through the lower projection module to obtain the filtered feature information includes:
- a fourth filtering process is performed on the third high-resolution feature information through the fourth convolution layer to obtain filtered feature information.
- the resolution of the filtered feature information obtained after the fourth filtering process is lower than the resolution of the third high-resolution feature information.
- the filtered feature information obtained after the fourth filtering process has the same resolution as the input information of the residual projection block.
- the fourth filtering process may include: downsampling. That is to say, the third high-resolution feature information is down-sampled through the fourth convolutional layer to obtain filtered feature information.
- the sampling module of the preset neural network model may include a sub-pixel convolution layer.
- the second filtering process is performed on the second feature information through the sampling module to obtain the third feature information, including:
- the second filtering process is performed on the second feature information through the sub-pixel convolution layer to obtain the third feature information.
- the resolution of the third feature information obtained after the second filtering process is higher than the resolution of the second feature information.
- the second filtering process may include: upsampling. That is to say, the sampling module is mainly used for upsampling the second feature information, so the sampling module may also be called an "upsampling module".
- the sampling module may use a sub-pixel convolution layer, or may add a sub-pixel convolution layer.
- the sub-pixel convolutional layer can also be a PixelShuffle module, which realizes the function of taking a low-resolution H×W image as input and transforming it into an rH×rW high-resolution output image.
- the implementation process does not directly generate this high-resolution image through interpolation; instead, a feature map of r² channels is first obtained through convolution (the size of the feature map is consistent with the input low-resolution image), and then this high-resolution image is obtained by the periodic shuffling method, where r is the magnification of the image.
- for a feature map whose number of channels is r², the r² channel values of each pixel are rearranged into an r×r area, corresponding to a sub-block of size r×r in the high-resolution image; thus the feature map of size r²×H×W is rearranged into a high-resolution image of size 1×rH×rW.
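- The periodic-shuffling rearrangement described above can be sketched in plain Python (channel ordering follows the usual row-major PixelShuffle convention, which is an assumption here):

```python
def pixel_shuffle(channels, r=2):
    """Rearrange an r^2 x H x W feature map into one rH x rW image:
    the r^2 channel values at each pixel become an r x r spatial area."""
    H, W = len(channels[0]), len(channels[0][0])
    out = [[0] * (r * W) for _ in range(r * H)]
    for y in range(H):
        for x in range(W):
            for c in range(r * r):
                dy, dx = divmod(c, r)       # channel index -> offset in r x r area
                out[r * y + dy][r * x + dx] = channels[c][y][x]
    return out

# Four 1x1 channels (r=2) -> one 2x2 high-resolution image.
hr = pixel_shuffle([[[0]], [[1]], [[2]], [[3]]], r=2)
assert hr == [[0, 1], [2, 3]]
```

- This is the same operation PyTorch exposes as `torch.nn.PixelShuffle(upscale_factor=r)`; the pure-Python version here is only to make the index arithmetic explicit.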
- the reconstruction module may include a fifth convolutional layer.
- the super-resolution reconstruction of the third feature information by the reconstruction module to obtain the processing block may include:
- a convolution operation is performed on the third feature information through the fifth convolution layer to obtain a processing block.
- the size of the convolution kernel of the fifth convolution layer is K ⁇ L
- the number of convolution kernels of the fifth convolution layer is an integer power of 2
- K and L are positive integers greater than zero.
- the size of the convolution kernel of the fifth convolution layer is 3 ⁇ 3
- the number of convolution kernels of the fifth convolution layer is 1, but there is no limitation here.
- FIG. 5 shows an example of a network structure of a preset neural network model provided in an embodiment of the present application
- FIG. 6 shows an example of a network structure of a residual projection block provided in an embodiment of the present application. That is to say, the embodiment of the present application combines the transposed convolution and the residual module to propose the residual projection block RPB.
- the basic idea is to use the transposed convolution layer to project low-resolution features into the high-resolution feature space, then use the residual modules to learn high-resolution features of different levels, improve the expressive ability of the residual modules through local feature fusion, and finally use a convolutional layer to project the high-resolution features back to the low-resolution feature space.
- the embodiment of this application proposes a preset neural network model RPNet with half-precision sub-pixel interpolation, and embeds the trained model into the coding platform VTM7.0.
- the embodiment of the present application may select RPNet for motion compensation enhancement only for PUs with a size greater than or equal to 64×64; for PUs with a size smaller than 64×64, motion compensation enhancement is still performed according to the interpolation filter in the related art.
- the method in the embodiment of the present application can realize sub-pixel motion compensation with half precision or quarter precision, and can even implement sub-pixel motion compensation with other precisions, which is not limited in this embodiment of the present application.
- when the precision of the sub-pixel sample values is half precision, the convolution kernel sizes of the transposed convolution layer and the fourth convolution layer are both 6×6, and the stride and padding are both set to 2; or, when the precision of the sub-pixel sample values is quarter precision, the convolution kernel sizes of the transposed convolution layer and the fourth convolution layer are both 8×8, and the stride and padding are set to 4 and 2 respectively.
- the number N of residual projection blocks in RPNet can be set to 10
- the number M of residual modules in each residual projection block can be Set to 3.
- the number of convolution kernels of the convolution layer in the reconstruction module is set to 1
- the number of convolution kernels of other transposed convolution layers or convolution layers in the network model is set to 64.
- the size of the convolution kernel in the transposed convolutional layer in the upper projection module and the convolutional layer in the lower projection module is set to 6 ⁇ 6, and the stride and padding are set to 2.
- other convolutional layers in the network model use convolutional kernels with a size of 3 ⁇ 3, and the upsampling module can use sub-pixel convolutional layers.
- the RPNet in the embodiment of the present application can also be used for PUs of all sizes to perform half-precision sub-pixel interpolation.
- in the RPNet of the embodiment of the present application, the number of residual projection blocks and the number of residual modules in each residual projection block can also be adjusted; RPNet can even be used for quarter-precision sub-pixel motion compensation.
- in that case, the convolution kernel size of the transposed convolution layer in the upper projection module and of the convolution layer in the lower projection module is set to 8×8, the stride and padding are set to 4 and 2 respectively, and a sub-pixel convolutional layer is added in the upsampling module.
- the method may also include:
- determining a training data set comprising at least one training image and at least one verification image
- the neural network model is trained by using the at least one set of input image groups to obtain at least one set of candidate model parameters; the true value area is used to determine the loss value (Loss) of the loss function of the neural network model, and the at least one set of candidate model parameters is obtained when the loss value of the loss function converges to a preset threshold.
- the embodiment of the present application may choose a public data set DIV2K, which contains 800 training images and 100 verification images.
- the preprocessing of the DIV2K dataset mainly includes two steps of format conversion and encoding reconstruction. First, format conversion is performed on 800 high-resolution images in the training set, 100 high-resolution images in the test set, and their corresponding low-resolution images, from the original PNG format to YUV420 format. Then, extract the luminance component from the high-resolution image data in YUV420 format and save it in PNG format as the ground truth area.
- VTM7.0 is used for all-intra encoding, and the quantization parameters (Quantization Parameter, QP) can be set to 22, 27, 32 and 37 respectively; the luminance components of the four sets of decoded and reconstructed data are then extracted separately and saved in PNG format as the input of the neural network model. Thus, four sets of training data sets can be obtained.
- the embodiment of the present application selects peak signal-to-noise ratio (Peak Signal-to-Noise Ratio, PSNR) as the evaluation standard of image reconstruction quality.
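- As a reminder of the evaluation metric, PSNR for 8-bit images is 10·log10(255²/MSE); a minimal sketch:

```python
import math

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE)."""
    n = len(ref) * len(ref[0])
    mse = sum((r - t) ** 2
              for rr, tr in zip(ref, test)
              for r, t in zip(rr, tr)) / n
    if mse == 0:
        return float("inf")                # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

a = [[50, 60], [70, 80]]
b = [[52, 60], [70, 80]]                   # one pixel off by 2 -> MSE = 1
print(round(psnr(a, b), 2))                # 10*log10(255^2) ~= 48.13 dB
```

- Higher PSNR indicates better reconstruction quality; the embodiment compares reconstructed luminance against the true value area with this measure.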
- the model is trained based on the Pytorch platform.
- a low-resolution image of size 48×48 is taken as input, and the batch size is set to 16.
- the mean absolute error can be selected as the loss function
- the adaptive moment estimation can be used as the optimization function
- the momentum and weight decay can be set to 0.9 and 0.0001, respectively.
- the initial learning rate is set to 0.0001 and is reduced by a factor of 0.1 every 100 epochs, for a total of 300 epochs.
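- The step-decay schedule just described can be written out explicitly (this is the standard interpretation of "reduced by a factor of 0.1 every 100 epochs"; the closed form below is an illustration, not code from the embodiment):

```python
def learning_rate(epoch, base_lr=1e-4, gamma=0.1, step=100):
    """Step decay: multiply the base rate by gamma once per completed step."""
    return base_lr * gamma ** (epoch // step)

assert learning_rate(0) == 1e-4                      # epochs 0..99
assert abs(learning_rate(100) - 1e-5) < 1e-12        # epochs 100..199
assert abs(learning_rate(299) - 1e-6) < 1e-13        # epochs 200..299
```

- Over the stated 300 epochs the rate therefore takes exactly three values: 1e-4, 1e-5 and 1e-6.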
- by training with the data set corresponding to each QP, four sets of model parameters are obtained. These four sets of model parameters correspond to four models, denoted RPNet_qp22, RPNet_qp27, RPNet_qp32 and RPNet_qp37 respectively.
- the determination of the preset neural network model can be realized in the following two ways.
- the method may further include:
- a preset neural network model is determined.
- the input image groups correspond to different quantization parameters, and there are correspondences between the multiple groups of candidate model parameters and different quantization parameters.
- the method may also include:
- the preset neural network model is determined according to the model parameters.
- the decoder can determine, according to the quantization parameter of the current block, the trained model parameters corresponding to that quantization parameter, and then determine the preset neural network model used in the embodiment of the present application; alternatively, the decoder can obtain model parameters by parsing the code stream and then determine the preset neural network model according to those model parameters; this embodiment of the present application does not specifically limit this.
- on the one hand, if the encoder and decoder use the same preset neural network model with fixed parameters, the parameters are already solidified, so there is no need to transmit model parameters; on the other hand, if the code stream carries access information for a public training dataset, such as a Uniform Resource Locator (URL), the decoder can perform training in the same way as the encoder; alternatively, for the encoder, the encoded video sequence can be used for training.
- the preset neural network model can be used to perform motion compensation enhancement on the first matching block to obtain at least one second matching block.
- S704 Determine a first prediction block of the current block according to the first motion information and at least one second matching block.
- S705 Determine a reconstructed block of the current block according to the first predicted block.
- the decoder also needs to decode to obtain the residual block of the current block.
- the method may further include: parsing the code stream to obtain a residual block of the current block.
- the determining the reconstruction block of the current block according to the first prediction block may include: determining the reconstruction block of the current block according to the residual block and the first prediction block.
- the determining the reconstructed block of the current block according to the residual block and the first predicted block may include: performing an addition operation on the residual block and the first predicted block to determine the reconstructed block of the current block.
- the first syntax element identification information may also indicate that the current block does not use the motion compensation enhancement processing method, that is, the current block uses the integer pixel motion compensation method.
- the method may also include:
- the first syntax element identification information indicates that the current block does not use the motion compensation enhancement processing method, then parse the code stream to obtain the second motion information of the current block, and the second motion information is used to point to the integer pixel position;
- a reconstructed block of the current block is determined.
- the second prediction block of the current block can be determined according to the second motion information obtained through decoding.
- the decoder still needs to decode to obtain the residual block of the current block.
- the method may further include: parsing the code stream to obtain a residual block of the current block.
- the determining the reconstruction block of the current block according to the second prediction block may include: determining the reconstruction block of the current block according to the residual block and the second prediction block.
- the determining the reconstructed block of the current block according to the residual block and the second predicted block may include: performing an addition operation on the residual block and the second predicted block to determine the reconstructed block of the current block.
- the embodiment of this application proposes a half-precision sub-pixel interpolation network RPNet, and applies it in VTM7.0.
- the embodiment of the present application provides a decoding method, which is applied to a decoder. Determine the value of the first syntax element identification information by parsing the code stream; if the first syntax element identification information indicates that the current block uses motion compensation enhancement processing, then parse the code stream to determine the first motion information of the current block; according to the first The motion information determines the first matching block of the current block, and performs motion compensation enhancement on the first matching block to obtain at least one second matching block; according to the first motion information and at least one second matching block, determines the first prediction of the current block block; determine the reconstructed block of the current block according to the first predicted block.
- the preset neural network model is used to perform motion compensation enhancement at this time.
- not only can the computational complexity be reduced, but the code rate can also be saved.
- the encoding and decoding efficiency can be improved.
- FIG. 8 shows a schematic structural diagram of an encoder 80 provided in the embodiment of the present application.
- the encoder 80 may include: a first determination unit 801, a first motion compensation unit 802, and an encoding unit 803; wherein,
- the first determining unit 801 is configured to determine the first matching block of the current block
- the first motion compensation unit 802 is configured to perform motion compensation enhancement on the first matching block to obtain at least one second matching block;
- the first determining unit 801 is further configured to determine the motion information of the current block according to at least one second matching block;
- the encoding unit 803 is configured to encode the current block according to the motion information.
- the first motion compensation unit 802 is specifically configured to perform super-resolution and quality enhancement processing on the first matching block to obtain a processing block; and perform first filtering processing on the processing block to obtain at least one second matching block, wherein the second matching block obtained after the first filtering process has the same resolution as the current block.
- the first filtering process includes: downsampling.
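The two steps above (super-resolution enhancement, then downsampling back to the current block's resolution) can be sketched as follows. This is only an illustrative stand-in: nearest-neighbour upsampling replaces the learned super-resolution and quality-enhancement network, and the phase-offset downsampling is one plausible reading of the first filtering process.

```python
import numpy as np

def enhance_matching_block(block, scale=2):
    # stand-in for the super-resolution / quality-enhancement processing;
    # the real network is learned, this is just nearest-neighbour upsampling
    return np.kron(block, np.ones((scale, scale)))

def first_filtering_process(block, scale=2):
    # downsample the processed (high-resolution) block at every phase offset,
    # yielding second matching blocks at the current block's resolution
    hi = enhance_matching_block(block, scale)
    return [hi[dy::scale, dx::scale] for dy in range(scale) for dx in range(scale)]

block = np.arange(16, dtype=float).reshape(4, 4)   # first matching block
candidates = first_filtering_process(block)        # at least one second matching block
assert len(candidates) == 4
assert all(c.shape == block.shape for c in candidates)
```

Each of the `scale * scale` candidates corresponds to one sub-pixel phase of the enhanced block, which matches the idea that several second matching blocks are derived from a single first matching block.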
- the first motion compensation unit 802 is further configured to use a preset neural network model to perform motion compensation enhancement on the first matching block; wherein the preset neural network model includes a feature extraction module, a residual projection module group, a sampling module and a reconstruction module, and the feature extraction module, the residual projection module group, the sampling module and the reconstruction module are sequentially connected;
- the first motion compensation unit 802 is further configured to perform shallow feature extraction on the first matching block through the feature extraction module to obtain the first feature information; perform residual feature learning on the first feature information through the residual projection module group to obtain the second feature information; perform second filtering processing on the second feature information through the sampling module to obtain the third feature information; and perform super-resolution reconstruction on the third feature information through the reconstruction module to obtain the processing block.
- the feature extraction module includes a first convolutional layer; correspondingly, the first motion compensation unit 802 is also configured to perform a convolution operation on the first matching block through the first convolutional layer to obtain the first feature information.
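The shallow feature extraction above is a plain convolution. A minimal single-channel "valid" 2-D convolution makes the operation concrete; the averaging kernel below is illustrative only, not the learned weights of the first convolutional layer:

```python
import numpy as np

def conv2d_valid(x, kernel):
    # minimal single-channel 'valid' 2-D convolution (sliding dot product)
    kh, kw = kernel.shape
    h = x.shape[0] - kh + 1
    w = x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

block = np.ones((5, 5))                    # first matching block
kernel = np.full((3, 3), 1.0 / 9.0)        # stand-in for the first conv layer
features = conv2d_valid(block, kernel)     # first feature information
assert features.shape == (3, 3)
assert np.allclose(features, 1.0)
```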
- the residual projection module group includes N residual projection blocks, a second convolutional layer and a first connection layer; the N residual projection blocks, the second convolutional layer and the first connection layer are sequentially connected, and the first connection layer is also connected to the input of the first residual projection block among the N residual projection blocks;
- the first motion compensation unit 802 is further configured to perform residual feature learning on the first feature information through the N residual projection blocks to obtain the first intermediate feature information, where N is an integer greater than or equal to 1; perform a convolution operation on the first intermediate feature information through the second convolution layer to obtain the second intermediate feature information; and perform an addition calculation on the first feature information and the second intermediate feature information through the first connection layer to obtain the second feature information.
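The data flow of the module group (N cascaded blocks, a convolution, then a skip addition with the input) can be sketched as below. The per-block update and the scalar "convolution" weight are hypothetical stand-ins for the learned layers; only the wiring mirrors the description:

```python
import numpy as np

def rpb_stub(x):
    # hypothetical stand-in for one residual projection block
    return x + 0.1 * np.tanh(x)

def residual_projection_group(f1, n=4, w=0.5):
    x = f1
    for _ in range(n):          # N cascaded residual projection blocks
        x = rpb_stub(x)         # first intermediate feature information
    x = w * x                   # second convolutional layer (1x1 stand-in)
    return f1 + x               # first connection layer: global skip addition

f1 = np.random.default_rng(0).standard_normal((8, 8))  # first feature information
f2 = residual_projection_group(f1)                     # second feature information
assert f2.shape == f1.shape
```

The global skip connection is what lets the group learn a residual on top of the shallow features rather than the features themselves.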
- the N residual projection blocks are a cascade structure
- the input of the cascade structure is the first feature information
- the output of the cascade structure is the second intermediate feature information
- the first motion compensation unit 802 is further configured to, when N is equal to 1, input the first feature information to the first residual projection block to obtain the output information of the first residual projection block, and determine the output information of the first residual projection block as the first intermediate feature information; and when N is greater than 1, after obtaining the output information of the first residual projection block, input the output information of the d-th residual projection block to the (d+1)-th residual projection block to obtain the output information of the (d+1)-th residual projection block, incrementing d by 1 until the output information of the N-th residual projection block is obtained, and determine the output information of the N-th residual projection block as the first intermediate feature information; wherein d is an integer greater than or equal to 1 and less than N.
- the residual projection block includes an upper projection module, M residual modules, a local feature fusion module, a lower projection module and a second connection layer; the upper projection module, the M residual modules, the local feature fusion module, the lower projection module and the second connection layer are sequentially connected, the second connection layer is also connected to the input of the upper projection module, and the outputs of the M residual modules are also respectively connected to the local feature fusion module;
- the first motion compensation unit 802 is further configured to perform a third filtering process on the input information of the residual projection block through the upper projection module to obtain the first high-resolution feature information; perform different levels of high-resolution feature learning on the first high-resolution feature information through the M residual modules to obtain M pieces of second high-resolution feature information, where M is an integer greater than or equal to 1; fuse the M pieces of second high-resolution feature information through the local feature fusion module to obtain the third high-resolution feature information; perform a fourth filtering process on the third high-resolution feature information through the lower projection module to obtain the filtered feature information; and add the input information and the filtered feature information through the second connection layer to obtain the output information of the residual projection block.
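A residual projection block can be sketched end to end as follows. The upsampling, downsampling, residual update, and mean-based fusion are all hypothetical stand-ins (the application describes a transposed convolution, learned residual modules, a concatenation-plus-convolution fusion, and a convolutional down-projection); only the topology follows the description:

```python
import numpy as np

def up_projection(x, s=2):
    # third filtering process (upsampling); stand-in for the transposed conv
    return np.kron(x, np.ones((s, s)))

def down_projection(x, s=2):
    # fourth filtering process (downsampling); stand-in for the fourth conv layer
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def residual_module(x):
    # hypothetical residual module learning high-resolution features
    return x + 0.1 * np.tanh(x)

def residual_projection_block(x, m=3):
    hi = up_projection(x)                # first high-resolution feature information
    outputs = []
    for _ in range(m):                   # M residual modules, different levels
        hi = residual_module(hi)
        outputs.append(hi)
    fused = sum(outputs) / m             # local feature fusion (mean as a stand-in)
    filtered = down_projection(fused)    # filtered feature information
    return x + filtered                  # second connection layer: skip addition

x = np.ones((4, 4))                      # input information of the block
y = residual_projection_block(x)         # output information of the block
assert y.shape == x.shape
```

Note that the block's output has the same resolution as its input: all high-resolution learning happens between the up- and down-projections.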
- the upper projection module includes a transposed convolutional layer; correspondingly, the first motion compensation unit 802 is also configured to perform a third filtering process on the input information of the residual projection block through the transposed convolutional layer to obtain the first high-resolution feature information, wherein the resolution of the first high-resolution feature information obtained after the third filtering process is higher than the resolution of the input information of the residual projection block.
- the third filtering process includes: upsampling.
- the local feature fusion module includes a feature fusion layer and a third convolutional layer; correspondingly, the first motion compensation unit 802 is also configured to fuse the M second high-resolution feature information through the feature fusion layer operation to obtain fusion feature information; and performing a convolution operation on the fusion feature information through a third convolution layer to obtain third high-resolution feature information.
- the lower projection module includes a fourth convolutional layer; correspondingly, the first motion compensation unit 802 is also configured to perform a fourth filtering process on the third high-resolution feature information through the fourth convolutional layer to obtain the filtered feature information, wherein the resolution of the filtered feature information obtained after the fourth filtering process is lower than the resolution of the third high-resolution feature information.
- the fourth filtering process includes: downsampling.
- the sampling module includes a sub-pixel convolution layer; correspondingly, the first motion compensation unit 802 is further configured to perform a second filtering process on the second feature information through the sub-pixel convolution layer to obtain the third feature information, wherein the resolution of the third feature information obtained after the second filtering process is higher than the resolution of the second feature information.
- the second filtering process includes: upsampling.
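The sub-pixel convolution layer's characteristic operation is the channel-to-space rearrangement (often called pixel shuffle). The sketch below shows only that rearrangement; the convolution that produces the `C*r*r` channels is omitted:

```python
import numpy as np

def pixel_shuffle(x, r):
    # sub-pixel convolution rearrangement: (C*r*r, H, W) -> (C, H*r, W*r),
    # i.e. the second filtering process raises the spatial resolution
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)          # reorder to (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

second_feat = np.arange(16, dtype=float).reshape(4, 2, 2)  # C*r*r = 4, r = 2
third_feat = pixel_shuffle(second_feat, 2)                 # higher resolution
assert third_feat.shape == (1, 4, 4)
```

Each group of `r*r` input channels supplies the `r*r` sub-pixel positions of one output channel, which is why this layer performs the upsampling mentioned above.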
- the reconstruction module includes a fifth convolutional layer; correspondingly, the first motion compensation unit 802 is further configured to perform a convolution operation on the third feature information through the fifth convolutional layer to obtain a processing block.
- the encoder 80 may further include a first training unit 804 configured to determine a training data set, where the training data set includes at least one training image; preprocess the training data set to obtain the true value area of the preset neural network model and at least one set of input image groups, where each input image group includes at least one input image; and, based on the true value area, use the at least one set of input image groups to train the neural network model to obtain at least one set of candidate model parameters; wherein the true value area is used to determine the loss value of the loss function of the neural network model, and the at least one set of candidate model parameters is obtained when the loss value of the loss function converges to a preset threshold.
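The train-until-the-loss-converges-to-a-threshold loop can be illustrated with a deliberately tiny one-parameter model; the model, learning rate, and threshold below are all hypothetical, only the stopping criterion mirrors the description:

```python
import numpy as np

rng = np.random.default_rng(1)

def loss_fn(pred, truth_region):
    # loss value of the loss function, measured against the true value area
    return float(np.mean((pred - truth_region) ** 2))

def train(inputs, truth_region, w=0.0, lr=0.5, threshold=1e-4, max_iter=1000):
    # toy one-parameter model (pred = w * input); training stops once the
    # loss converges below the preset threshold, yielding candidate parameter w
    for _ in range(max_iter):
        pred = w * inputs
        if loss_fn(pred, truth_region) < threshold:
            break
        grad = float(np.mean(2.0 * (pred - truth_region) * inputs))
        w -= lr * grad
    return w

inputs = rng.standard_normal(64)   # one input image group (flattened, toy data)
truth = 0.8 * inputs               # true value area for this toy setup
w_star = train(inputs, truth)      # one candidate model parameter
assert abs(w_star - 0.8) < 0.05
```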
- the first determination unit 801 is further configured to determine the quantization parameter of the current block; determine the model parameter corresponding to the quantization parameter from the at least one group of candidate model parameters according to the quantization parameter; and determine the preset neural network model according to the model parameter; wherein, when the at least one group is multiple groups, the input image groups correspond to different quantization parameters, and there is a corresponding relationship between the multiple groups of candidate model parameters and the different quantization parameters.
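The QP-to-model-parameter correspondence can be sketched as a simple lookup; the QP values, the closest-QP selection rule, and the parameter-set names below are all illustrative assumptions, not values fixed by this application:

```python
# hypothetical mapping from quantization parameters to trained candidate
# model parameter sets (QP values and names are illustrative only)
candidate_params = {
    27: "params_qp27",
    32: "params_qp32",
    38: "params_qp38",
    45: "params_qp45",
}

def select_model_params(block_qp):
    # pick the candidate set whose training QP is closest to the block's QP
    nearest_qp = min(candidate_params, key=lambda qp: abs(qp - block_qp))
    return candidate_params[nearest_qp]

assert select_model_params(33) == "params_qp32"
assert select_model_params(45) == "params_qp45"
```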
- the encoding unit 803 is further configured to encode the model parameters, and write the encoded bits into the code stream.
- the encoder 80 may further include a motion estimation unit 805 configured to perform integer pixel motion estimation on the current block and determine the first matching block of the current block; wherein the first matching block is the matching block with the least rate-distortion cost when performing motion estimation at an integer pixel position of the current block;
- the first motion compensation unit 802 is further configured to perform sub-pixel motion compensation on the first matching block by using the preset neural network model to obtain at least one second matching block.
- the motion estimation unit 805 is further configured to perform sub-pixel motion estimation on the current block according to the at least one second matching block to determine the sub-pixel matching block of the current block, where the sub-pixel matching block is the matching block with the least rate-distortion cost when performing motion estimation at a sub-pixel position of the current block;
- the first determining unit 801 is further configured to use the first matching block to perform precoding processing on the current block to determine a first rate-distortion cost value; use the sub-pixel matching block to perform precoding processing on the current block to determine a second rate-distortion cost value; and, if the first rate-distortion cost value is greater than the second rate-distortion cost value, determine that the current block uses the motion compensation enhancement processing mode and that the motion information is the first motion information, where the first motion information is used to point to a sub-pixel position; or, if the first rate-distortion cost value is less than or equal to the second rate-distortion cost value, determine that the current block does not use the motion compensation enhancement processing mode and that the motion information is the second motion information, where the second motion information is used to point to an integer pixel position.
- the first determining unit 801 is further configured to determine that the value of the first syntax element identification information is the first value if the first rate-distortion cost value is greater than the second rate-distortion cost value; or, if the first rate-distortion cost value is less than or equal to the second rate-distortion cost value, determine that the value of the first syntax element identification information is the second value; wherein the first syntax element identification information is used to indicate whether the current block uses the motion compensation enhancement processing mode.
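The mode decision above reduces to a cost comparison. In the sketch below, the concrete flag values (1 for the first value, 0 for the second) are an assumption for illustration; the comparison logic follows the description:

```python
def mode_decision(first_rd_cost, second_rd_cost, first_value=1, second_value=0):
    # compare precoding costs: first = integer-pixel matching block,
    # second = sub-pixel matching block; returns the syntax element value
    # and whether the enhancement mode is used (flag values hypothetical)
    if first_rd_cost > second_rd_cost:
        return first_value, True      # use motion compensation enhancement
    return second_value, False        # keep integer-pixel motion information

assert mode_decision(120.0, 95.5) == (1, True)
assert mode_decision(80.0, 95.5) == (0, False)
```

Note that ties (equal costs) fall to the non-enhanced branch, matching the "less than or equal to" condition in the text.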
- the encoding unit 803 is further configured to encode the value of the first syntax element identification information, and write the encoded bits into the code stream.
- the encoding unit 803 is further configured to, when the current block uses the motion compensation enhancement processing mode, determine the first prediction block of the current block according to the first motion information and the sub-pixel matching block; determine the residual block of the current block according to the current block and the first prediction block; and encode the residual block, writing the encoded bits into the code stream;
- the encoding unit 803 is further configured to, when the current block does not use the motion compensation enhancement processing mode, determine the second prediction block of the current block according to the second motion information and the first matching block; determine the residual block of the current block according to the current block and the second prediction block; and encode the residual block, writing the encoded bits into the code stream.
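In both branches the residual is simply the difference between the current block and its prediction, and the decoder inverts the step by adding the residual back; entropy coding of the residual is omitted in this sketch:

```python
import numpy as np

def residual_block(current_block, prediction_block):
    # residual written to the code stream (entropy coding omitted)
    return current_block - prediction_block

cur = np.array([[10.0, 12.0], [14.0, 16.0]])    # current block
pred = np.array([[9.0, 12.0], [13.0, 17.0]])    # prediction block
res = residual_block(cur, pred)
# decoder-side reconstruction inverts this step
assert np.array_equal(pred + res, cur)
```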
- the encoding unit 803 is further configured to encode the motion information, and write the encoded bits into the code stream.
- a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a module, or it may be non-modular.
- each functional unit in this embodiment may be integrated into one processing unit, each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules.
- if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
- the aforementioned storage medium includes: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other various media that can store program codes.
- the embodiment of the present application provides a computer storage medium, which is applied to the encoder 80; the computer storage medium stores a computer program, and when the computer program is executed by the first processor, the method described in any one of the preceding embodiments is implemented.
- FIG. 9 shows a schematic diagram of a specific hardware structure of the encoder 80 provided by the embodiment of the present application.
- it may include: a first communication interface 901 , a first memory 902 and a first processor 903 ; each component is coupled together through a first bus system 904 .
- the first bus system 904 is used to realize connection and communication between these components.
- the first bus system 904 also includes a power bus, a control bus and a status signal bus.
- for clarity of illustration, the various buses are labeled as the first bus system 904 in FIG. 9; wherein,
- the first communication interface 901 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
- the first memory 902 is used to store computer programs that can run on the first processor 903;
- the first processor 903 is configured to, when running the computer program, execute: determining the first matching block of the current block; performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining the motion information of the current block according to the at least one second matching block; and encoding the current block according to the motion information.
- the first memory 902 in this embodiment of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
- the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electronically programmable Erase Programmable Read-Only Memory (Electrically EPROM, EEPROM) or Flash.
- the volatile memory can be Random Access Memory (RAM), which acts as external cache memory.
- by way of example and not limitation, many forms of RAM are available, such as:
- Static Random Access Memory (SRAM)
- Dynamic Random Access Memory (DRAM)
- Synchronous Dynamic Random Access Memory (SDRAM)
- Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM)
- Enhanced Synchronous Dynamic Random Access Memory (ESDRAM)
- Synchlink Dynamic Random Access Memory (SLDRAM)
- Direct Rambus Random Access Memory (DR RAM)
- the first memory 902 of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
- the first processor 903 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method may be implemented by an integrated logic circuit of hardware in the first processor 903 or an instruction in the form of software.
- the above-mentioned first processor 903 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a ready-made programmable gate array (Field Programmable Gate Array, FPGA) Or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
- Various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed.
- a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
- the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
- the software module can be located in a storage medium mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
- the storage medium is located in the first memory 902, and the first processor 903 reads the information in the first memory 902, and completes the steps of the above method in combination with its hardware.
- the embodiments described in this application may be implemented by hardware, software, firmware, middleware, microcode or a combination thereof.
- the processing unit can be implemented in one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field-Programmable Gate Arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application, or a combination thereof.
- the techniques described herein can be implemented through modules (eg, procedures, functions, and so on) that perform the functions described herein.
- Software codes can be stored in memory and executed by a processor. Memory can be implemented within the processor or external to the processor.
- the first processor 903 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
- This embodiment provides an encoder, which may include a first determination unit, a first motion compensation unit, and an encoding unit.
- FIG. 10 shows a schematic diagram of the composition and structure of a decoder 100 provided in the embodiment of the present application.
- the decoder 100 may include: an analysis unit 1001, a second determination unit 1002, and a second motion compensation unit 1003; wherein,
- the parsing unit 1001 is configured to parse the code stream and determine the value of the first syntax element identification information
- the parsing unit 1001 is further configured to parse the code stream and determine the first motion information of the current block if the first syntax element identification information indicates that the current block uses motion compensation enhancement processing;
- the second motion compensation unit 1003 is configured to determine a first matching block of the current block according to the first motion information, and perform motion compensation enhancement on the first matching block to obtain at least one second matching block;
- the second determining unit 1002 is configured to determine a first prediction block of the current block according to the first motion information and at least one second matching block; and determine a reconstruction block of the current block according to the first prediction block.
- the parsing unit 1001 is further configured to parse the code stream to obtain the residual block of the current block;
- the second determining unit 1002 is further configured to determine the reconstructed block of the current block according to the residual block and the first prediction block.
- the parsing unit 1001 is further configured to parse the code stream to obtain the second motion information of the current block if the first syntax element identification information indicates that the current block does not use the motion compensation enhancement processing mode, where the second motion information is used to point to an integer pixel position;
- the second determining unit 1002 is further configured to determine a second prediction block of the current block according to the second motion information of the current block; and determine a reconstruction block of the current block according to the second prediction block.
- the parsing unit 1001 is further configured to parse the code stream to obtain the residual block of the current block;
- the second determining unit 1002 is further configured to determine the reconstructed block of the current block according to the residual block and the second prediction block.
- the second determining unit 1002 is further configured to determine that the current block uses the motion compensation enhancement processing mode if the value of the first syntax element identification information is the first value; or, determine that the current block does not use the motion compensation enhancement processing mode if the value of the first syntax element identification information is the second value.
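On the decoder side this check is a direct flag comparison; as before, the concrete values 1 and 0 for the first and second values are assumptions for illustration, not values fixed by this application:

```python
def uses_enhancement(flag_value, first_value=1, second_value=0):
    # decoder-side reading of the first syntax element identification
    # information; the concrete values are hypothetical here
    if flag_value == first_value:
        return True
    if flag_value == second_value:
        return False
    raise ValueError("unexpected syntax element value")

assert uses_enhancement(1) is True
assert uses_enhancement(0) is False
```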
- the second motion compensation unit 1003 is specifically configured to perform super-resolution and quality enhancement processing on the first matching block to obtain a processed block, wherein the resolution of the processed block is higher than the resolution of the current block; and perform the first filtering process on the processed block to obtain at least one second matching block, wherein the second matching block obtained after the first filtering process has the same resolution as the current block.
- the first filtering process includes: downsampling.
- the second motion compensation unit 1003 is further configured to use a preset neural network model to perform motion compensation enhancement on the first matching block; wherein the preset neural network model includes a feature extraction module, a residual projection module group, a sampling module and a reconstruction module, and the feature extraction module, the residual projection module group, the sampling module and the reconstruction module are sequentially connected;
- the second motion compensation unit 1003 is further configured to perform shallow feature extraction on the first matching block through the feature extraction module to obtain the first feature information; perform residual feature learning on the first feature information through the residual projection module group to obtain the second feature information; perform second filtering processing on the second feature information through the sampling module to obtain the third feature information; and perform super-resolution reconstruction on the third feature information through the reconstruction module to obtain the processing block.
- the feature extraction module is a first convolutional layer; correspondingly, the second motion compensation unit 1003 is also configured to perform a convolution operation on the first matching block through the first convolutional layer to obtain the first feature information.
- the residual projection module group includes N residual projection blocks, a second convolutional layer and a first connection layer; the N residual projection blocks, the second convolutional layer and the first connection layer are sequentially connected, and the first connection layer is also connected to the input of the first residual projection block among the N residual projection blocks;
- the second motion compensation unit 1003 is further configured to perform residual feature learning on the first feature information through the N residual projection blocks to obtain the first intermediate feature information, where N is an integer greater than or equal to 1; perform a convolution operation on the first intermediate feature information through the second convolution layer to obtain the second intermediate feature information; and perform an addition calculation on the first feature information and the second intermediate feature information through the first connection layer to obtain the second feature information.
- the N residual projection blocks are a cascade structure
- the input of the cascade structure is the first feature information
- the output of the cascade structure is the second intermediate feature information
- the second motion compensation unit 1003 is further configured to, when N is equal to 1, input the first feature information to the first residual projection block to obtain the output information of the first residual projection block, and determine the output information of the first residual projection block as the first intermediate feature information; and when N is greater than 1, after obtaining the output information of the first residual projection block, input the output information of the d-th residual projection block to the (d+1)-th residual projection block to obtain the output information of the (d+1)-th residual projection block, incrementing d by 1 until the output information of the N-th residual projection block is obtained, and determine the output information of the N-th residual projection block as the first intermediate feature information; wherein d is an integer greater than or equal to 1 and less than N.
- the residual projection block includes an upper projection module, M residual modules, a local feature fusion module, a lower projection module and a second connection layer; the upper projection module, the M residual modules, the local feature fusion module, the lower projection module and the second connection layer are sequentially connected, the second connection layer is also connected to the input of the upper projection module, and the outputs of the M residual modules are also respectively connected to the local feature fusion module;
- the second motion compensation unit 1003 is further configured to perform a third filtering process on the input information of the residual projection block through the upper projection module to obtain the first high-resolution feature information; perform different levels of high-resolution feature learning on the first high-resolution feature information through the M residual modules to obtain M pieces of second high-resolution feature information, where M is an integer greater than or equal to 1; fuse the M pieces of second high-resolution feature information through the local feature fusion module to obtain the third high-resolution feature information; perform a fourth filtering process on the third high-resolution feature information through the lower projection module to obtain the filtered feature information; and add the input information and the filtered feature information through the second connection layer to obtain the output information of the residual projection block.
- the upper projection module includes a transposed convolutional layer; correspondingly, the second motion compensation unit 1003 is also configured to perform a third filtering process on the input information of the residual projection block through the transposed convolutional layer to obtain the first high-resolution feature information, wherein the resolution of the first high-resolution feature information obtained after the third filtering process is higher than the resolution of the input information of the residual projection block.
- the third filtering process includes: upsampling.
- the local feature fusion module includes a feature fusion layer and a third convolutional layer; correspondingly, the second motion compensation unit 1003 is also configured to fuse the M second high-resolution feature information through the feature fusion layer operation to obtain fusion feature information; and performing a convolution operation on the fusion feature information through a third convolution layer to obtain third high-resolution feature information.
- the lower projection module includes a fourth convolutional layer; correspondingly, the second motion compensation unit 1003 is also configured to perform a fourth filtering process on the third high-resolution feature information through the fourth convolutional layer to obtain the filtered feature information, wherein the resolution of the filtered feature information obtained after the fourth filtering process is lower than the resolution of the third high-resolution feature information.
- the fourth filtering process includes: downsampling.
- the sampling module includes a sub-pixel convolution layer; correspondingly, the second motion compensation unit 1003 is further configured to perform a second filtering process on the second feature information through the sub-pixel convolution layer to obtain the third feature information, wherein the resolution of the third feature information obtained after the second filtering process is higher than the resolution of the second feature information.
- the second filtering process includes: upsampling.
- the reconstruction module includes a fifth convolutional layer; correspondingly, the second motion compensation unit 1003 is further configured to perform a convolution operation on the third feature information through the fifth convolutional layer to obtain a processing block.
- the decoder 100 may further include a second training unit 1004, configured to determine a training data set, the training data set including at least one training image; preprocess the training data set to obtain the true value area of the preset neural network model and at least one input image group, wherein the input image group includes at least one input image; and train the neural network model with the at least one input image group based on the true value area to obtain at least one group of candidate model parameters, wherein the true value area is used to determine the loss value of the loss function of the neural network model, and the at least one group of candidate model parameters is obtained when the loss value of the loss function converges to a preset threshold.
- the second determination unit 1002 is further configured to determine the quantization parameter of the current block; determine the model parameter corresponding to the quantization parameter from the at least one group of candidate model parameters according to the quantization parameter; and determine the preset neural network model according to the model parameter, wherein, when the at least one group is multiple groups, the input image groups correspond to different quantization parameters, and there is a correspondence between the multiple groups of candidate model parameters and the different quantization parameters.
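The quantization-parameter-to-model lookup described above can be sketched as a simple table. The QP ranges and parameter file names below are hypothetical placeholders; the application only requires that each group of candidate model parameters correspond to a quantization parameter.

```python
# Hypothetical mapping: one group of candidate model parameters per QP range.
CANDIDATE_PARAMS = {
    range(0, 27): "params_qp_low.bin",
    range(27, 38): "params_qp_mid.bin",
    range(38, 52): "params_qp_high.bin",
}

def select_model_params(qp):
    """Return the model-parameter group whose QP range contains qp."""
    for qp_range, params in CANDIDATE_PARAMS.items():
        if qp in qp_range:
            return params
    raise ValueError(f"unsupported QP: {qp}")

print(select_model_params(32))   # -> params_qp_mid.bin
```

Keeping separate parameter groups per QP range lets the same network architecture adapt to the different distortion levels that different quantization parameters produce.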
- the parsing unit 1001 is further configured to parse the code stream to obtain model parameters.
- the second determining unit 1002 is further configured to determine a preset neural network model according to model parameters.
- a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc.; of course, it may also be a module, or it may be non-modular.
- each component in this embodiment may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit.
- the above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules.
- if the integrated units are implemented in the form of software function modules and are not sold or used as independent products, they can be stored in a computer-readable storage medium.
- this embodiment provides a computer storage medium, which is applied to the decoder 100; the computer storage medium stores a computer program, and when the computer program is executed by the second processor, the method described in any one of the preceding embodiments is implemented.
- FIG. 11 shows a schematic diagram of a specific hardware structure of the decoder 100 provided by the embodiment of the present application.
- it may include: a second communication interface 1101, a second memory 1102 and a second processor 1103; the components are coupled together through a second bus system 1104.
- the second bus system 1104 is used to realize connection and communication between these components.
- the second bus system 1104 also includes a power bus, a control bus and a status signal bus.
- for clarity of illustration, the various buses are labeled as the second bus system 1104 in FIG. 11, wherein:
- the second communication interface 1101 is used for receiving and sending signals in the course of exchanging information with other external network elements;
- the second memory 1102 is used to store computer programs that can run on the second processor 1103;
- the second processor 1103 is configured to, when running the computer program, execute:
- if the first syntax element identification information indicates that the current block uses the motion compensation enhancement processing method, parsing the code stream to determine the first motion information of the current block;
- determining a reconstructed block of the current block.
- the second processor 1103 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
- the hardware function of the second memory 1102 is similar to that of the first memory 902, and the hardware function of the second processor 1103 is similar to that of the first processor 903; details will not be described here.
- This embodiment provides a decoder, which may include an analysis unit, a second determination unit, and a second motion compensation unit.
- on the encoder side, the first matching block of the current block is determined; motion compensation enhancement is performed on the first matching block to obtain at least one second matching block; and the motion information of the current block is determined according to the at least one second matching block;
- the current block is then encoded according to the motion information.
- in this way, both the encoder and the decoder can perform motion compensation enhancement on the first matching block; under the premise of ensuring the same decoding quality, this can not only reduce the computational complexity but also save the code rate, thereby improving the encoding and decoding efficiency.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Claims (56)
- An encoding method, applied to an encoder, the method comprising: determining a first matching block of a current block; performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining motion information of the current block according to the at least one second matching block; and encoding the current block according to the motion information.
- The method according to claim 1, wherein performing motion compensation enhancement on the first matching block to obtain the at least one second matching block comprises: performing super-resolution and quality enhancement processing on the first matching block to obtain a processing block, wherein a resolution of the processing block is higher than a resolution of the current block; and performing a first filtering process on the processing block to obtain the at least one second matching block, wherein the second matching block obtained after the first filtering process has the same resolution as the current block.
- The method according to claim 2, wherein the first filtering process comprises: downsampling.
- The method according to claim 2, wherein the step of performing motion compensation enhancement on the first matching block further comprises: performing motion compensation enhancement on the first matching block by using a preset neural network model, wherein the preset neural network model comprises a feature extraction module, a residual projection module group, a sampling module and a reconstruction module, and the feature extraction module, the residual projection module group, the sampling module and the reconstruction module are connected in sequence; and performing super-resolution and quality enhancement processing on the first matching block to obtain the processing block comprises: performing shallow feature extraction on the first matching block through the feature extraction module to obtain first feature information; performing residual feature learning on the first feature information through the residual projection module group to obtain second feature information; performing a second filtering process on the second feature information through the sampling module to obtain third feature information; and performing super-resolution reconstruction on the third feature information through the reconstruction module to obtain the processing block.
- The method according to claim 4, wherein the feature extraction module comprises a first convolutional layer, and performing shallow feature extraction on the first matching block through the feature extraction module to obtain the first feature information comprises: performing a convolution operation on the first matching block through the first convolutional layer to obtain the first feature information.
- The method according to claim 4, wherein the residual projection module group comprises N residual projection blocks, a second convolutional layer and a first connection layer, the N residual projection blocks, the second convolutional layer and the first connection layer are connected in sequence, and the first connection layer is further connected to an input of the first residual projection block among the N residual projection blocks; and performing residual feature learning on the first feature information through the residual projection module group to obtain the second feature information comprises: performing residual feature learning on the first feature information through the N residual projection blocks to obtain first intermediate feature information, where N is an integer greater than or equal to 1; performing a convolution operation on the first intermediate feature information through the second convolutional layer to obtain second intermediate feature information; and performing addition on the first feature information and the second intermediate feature information through the first connection layer to obtain the second feature information.
- The method according to claim 6, wherein the N residual projection blocks form a cascade structure, an input of the cascade structure is the first feature information, and an output of the cascade structure is the second intermediate feature information.
- The method according to claim 7, wherein performing residual feature learning on the first feature information through the N residual projection blocks to obtain the first intermediate feature information comprises: when N is equal to 1, inputting the first feature information into the first residual projection block to obtain output information of the first residual projection block, and determining the output information of the first residual projection block as the first intermediate feature information; and when N is greater than 1, after the output information of the first residual projection block is obtained, inputting output information of a d-th residual projection block into a (d+1)-th residual projection block to obtain output information of the (d+1)-th residual projection block, and incrementing d by 1 until output information of an N-th residual projection block is obtained, and determining the output information of the N-th residual projection block as the first intermediate feature information, where d is an integer greater than or equal to 1 and less than N.
- The method according to claim 8, wherein the residual projection block comprises an upper projection module, M residual modules, a local feature fusion module, a lower projection module and a second connection layer, the upper projection module, the M residual modules, the local feature fusion module, the lower projection module and the second connection layer are connected in sequence, the second connection layer is further connected to an input of the upper projection module, and outputs of the M residual modules are further respectively connected to the local feature fusion module; and the method further comprises: performing a third filtering process on input information of the residual projection block through the upper projection module to obtain first high-resolution feature information; performing high-resolution feature learning of different levels on the first high-resolution feature information through the M residual modules to obtain M pieces of second high-resolution feature information, where M is an integer greater than or equal to 1; performing a fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain third high-resolution feature information; performing a fourth filtering process on the third high-resolution feature information through the lower projection module to obtain filtered feature information; and performing addition on the input information and the filtered feature information through the second connection layer to obtain output information of the residual projection block.
- The method according to claim 9, wherein the upper projection module comprises a transposed convolutional layer, and performing the third filtering process on the input information of the residual projection block through the upper projection module to obtain the first high-resolution feature information comprises: performing the third filtering process on the input information of the residual projection block through the transposed convolutional layer to obtain the first high-resolution feature information, wherein a resolution of the first high-resolution feature information obtained after the third filtering process is higher than a resolution of the input information of the residual projection block.
- The method according to claim 10, wherein the third filtering process comprises: upsampling.
- The method according to claim 9, wherein the local feature fusion module comprises a feature fusion layer and a third convolutional layer, and performing the fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain the third high-resolution feature information comprises: performing the fusion operation on the M pieces of second high-resolution feature information through the feature fusion layer to obtain fused feature information; and performing a convolution operation on the fused feature information through the third convolutional layer to obtain the third high-resolution feature information.
- The method according to claim 9, wherein the lower projection module comprises a fourth convolutional layer, and performing the fourth filtering process on the third high-resolution feature information through the lower projection module to obtain the filtered feature information comprises: performing the fourth filtering process on the third high-resolution feature information through the fourth convolutional layer to obtain the filtered feature information, wherein a resolution of the filtered feature information obtained after the fourth filtering process is lower than a resolution of the third high-resolution feature information.
- The method according to claim 13, wherein the fourth filtering process comprises: downsampling.
- The method according to claim 4, wherein the sampling module comprises a sub-pixel convolutional layer, and performing the second filtering process on the second feature information through the sampling module to obtain the third feature information comprises: performing the second filtering process on the second feature information through the sub-pixel convolutional layer to obtain the third feature information, wherein a resolution of the third feature information obtained after the second filtering process is higher than a resolution of the second feature information.
- The method according to claim 15, wherein the second filtering process comprises: upsampling.
- The method according to claim 4, wherein the reconstruction module comprises a fifth convolutional layer, and performing super-resolution reconstruction on the third feature information through the reconstruction module to obtain the processing block comprises: performing a convolution operation on the third feature information through the fifth convolutional layer to obtain the processing block.
- The method according to any one of claims 1 to 17, wherein the method further comprises: determining a training data set, the training data set comprising at least one training image; preprocessing the training data set to obtain a true value area of the preset neural network model and at least one input image group, wherein the input image group comprises at least one input image; and training a neural network model with the at least one input image group based on the true value area to obtain at least one group of candidate model parameters, wherein the true value area is used to determine a loss value of a loss function of the neural network model, and the at least one group of candidate model parameters is obtained when the loss value of the loss function converges to a preset threshold.
- The method according to claim 18, wherein the method further comprises: determining a quantization parameter of the current block; determining, according to the quantization parameter, a model parameter corresponding to the quantization parameter from the at least one group of candidate model parameters; and determining the preset neural network model according to the model parameter, wherein, when the at least one group is multiple groups, the input image groups correspond to different quantization parameters, and there is a correspondence between the multiple groups of candidate model parameters and the different quantization parameters.
- The method according to claim 19, wherein the method further comprises: encoding the model parameter and writing encoded bits into a code stream.
- The method according to claim 4, wherein determining the first matching block of the current block comprises: performing integer-pixel motion estimation on the current block to determine the first matching block of the current block, wherein the first matching block is the matching block with the minimum rate-distortion cost when motion estimation is performed on the current block at integer-pixel positions; and performing motion compensation enhancement on the first matching block to obtain the at least one second matching block comprises: performing sub-pixel motion compensation on the first matching block by using the preset neural network model to obtain the at least one second matching block.
- The method according to claim 21, wherein, after the at least one second matching block is obtained, the method further comprises: performing sub-pixel motion estimation on the current block according to the at least one second matching block to determine a sub-pixel matching block of the current block, the sub-pixel matching block being the matching block with the minimum rate-distortion cost when motion estimation is performed on the current block at sub-pixel positions; and determining the motion information of the current block according to the at least one second matching block comprises: performing pre-encoding processing on the current block by using the first matching block to determine a first rate-distortion cost; performing pre-encoding processing on the current block by using the sub-pixel matching block to determine a second rate-distortion cost; if the first rate-distortion cost is greater than the second rate-distortion cost, determining that the current block uses the motion compensation enhancement processing method, and determining that the motion information is first motion information, the first motion information being used to point to a sub-pixel position; and if the first rate-distortion cost is less than or equal to the second rate-distortion cost, determining that the current block does not use the motion compensation enhancement processing method, and determining that the motion information is second motion information, the second motion information being used to point to an integer-pixel position.
- The method according to claim 22, wherein the method further comprises: if the first rate-distortion cost is greater than the second rate-distortion cost, determining that a value of first syntax element identification information is a first value; and if the first rate-distortion cost is less than or equal to the second rate-distortion cost, determining that the value of the first syntax element identification information is a second value, wherein the first syntax element identification information is used to indicate whether the current block uses the motion compensation enhancement processing method.
- The method according to claim 23, wherein the method further comprises: encoding the value of the first syntax element identification information and writing encoded bits into a code stream.
- The method according to claim 22, wherein encoding the current block according to the motion information comprises: when the current block uses the motion compensation enhancement processing method, determining a first prediction block of the current block according to the first motion information and the sub-pixel matching block, determining a residual block of the current block according to the current block and the first prediction block, encoding the residual block, and writing encoded bits into a code stream; or, when the current block does not use the motion compensation enhancement processing method, determining a second prediction block of the current block according to the second motion information and the first matching block, determining a residual block of the current block according to the current block and the second prediction block, encoding the residual block, and writing encoded bits into the code stream.
- The method according to any one of claims 1 to 25, wherein the method further comprises: encoding the motion information and writing encoded bits into a code stream.
- A code stream, wherein the code stream is generated by bit-encoding information to be encoded, the information to be encoded comprising at least motion information of a current block, a residual block of the current block, and a value of first syntax element identification information, the first syntax element identification information being used to indicate whether the current block uses a motion compensation enhancement processing method.
- A decoding method, applied to a decoder, the method comprising: parsing a code stream to determine a value of first syntax element identification information; if the first syntax element identification information indicates that a current block uses a motion compensation enhancement processing method, parsing the code stream to determine first motion information of the current block; determining a first matching block of the current block according to the first motion information, and performing motion compensation enhancement on the first matching block to obtain at least one second matching block; determining a first prediction block of the current block according to the first motion information and the at least one second matching block; and determining a reconstructed block of the current block according to the first prediction block.
- The method according to claim 28, wherein the method further comprises: parsing the code stream to obtain a residual block of the current block; and determining the reconstructed block of the current block according to the first prediction block comprises: determining the reconstructed block of the current block according to the residual block and the first prediction block.
- The method according to claim 28, wherein the method further comprises: if the first syntax element identification information indicates that the current block does not use the motion compensation enhancement processing method, parsing the code stream to obtain second motion information of the current block; determining a second prediction block of the current block according to the second motion information of the current block; and determining the reconstructed block of the current block according to the second prediction block.
- The method according to claim 30, wherein the method further comprises: parsing the code stream to obtain a residual block of the current block; and determining the reconstructed block of the current block according to the second prediction block comprises: determining the reconstructed block of the current block according to the residual block and the second prediction block.
- The method according to any one of claims 28 to 31, wherein parsing the code stream to determine the value of the first syntax element identification information comprises: if the value of the first syntax element identification information is a first value, determining that the current block uses the motion compensation enhancement processing method; and if the value of the first syntax element identification information is a second value, determining that the current block does not use the motion compensation enhancement processing method.
- The method according to claim 28, wherein performing motion compensation enhancement on the first matching block to obtain the at least one second matching block comprises: performing super-resolution and quality enhancement processing on the first matching block to obtain a processing block, wherein a resolution of the processing block is higher than a resolution of the current block; and performing a first filtering process on the processing block to obtain the at least one second matching block, wherein the second matching block obtained after the first filtering process has the same resolution as the current block.
- The method according to claim 33, wherein the first filtering process comprises: downsampling.
- The method according to claim 33, wherein the step of performing motion compensation enhancement on the first matching block further comprises: performing motion compensation enhancement on the first matching block by using a preset neural network model, wherein the preset neural network model comprises a feature extraction module, a residual projection module group, a sampling module and a reconstruction module, and the feature extraction module, the residual projection module group, the sampling module and the reconstruction module are connected in sequence; and performing super-resolution and quality enhancement processing on the first matching block to obtain the processing block comprises: performing shallow feature extraction on the first matching block through the feature extraction module to obtain first feature information; performing residual feature learning on the first feature information through the residual projection module group to obtain second feature information; performing a second filtering process on the second feature information through the sampling module to obtain third feature information; and performing super-resolution reconstruction on the third feature information through the reconstruction module to obtain the processing block.
- The method according to claim 35, wherein the feature extraction module comprises a first convolutional layer, and performing shallow feature extraction on the first matching block through the feature extraction module to obtain the first feature information comprises: performing a convolution operation on the first matching block through the first convolutional layer to obtain the first feature information.
- The method according to claim 35, wherein the residual projection module group comprises N residual projection blocks, a second convolutional layer and a first connection layer, the N residual projection blocks, the second convolutional layer and the first connection layer are connected in sequence, and the first connection layer is further connected to an input of the first residual projection block among the N residual projection blocks; and performing residual feature learning on the first feature information through the residual projection module group to obtain the second feature information comprises: performing residual feature learning on the first feature information through the N residual projection blocks to obtain first intermediate feature information, where N is an integer greater than or equal to 1; performing a convolution operation on the first intermediate feature information through the second convolutional layer to obtain second intermediate feature information; and performing addition on the first feature information and the second intermediate feature information through the first connection layer to obtain the second feature information.
- The method according to claim 37, wherein the N residual projection blocks form a cascade structure, an input of the cascade structure is the first feature information, and an output of the cascade structure is the second intermediate feature information.
- The method according to claim 38, wherein performing residual feature learning on the first feature information through the N residual projection blocks to obtain the first intermediate feature information comprises: when N is equal to 1, inputting the first feature information into the first residual projection block to obtain output information of the first residual projection block, and determining the output information of the first residual projection block as the first intermediate feature information; and when N is greater than 1, after the output information of the first residual projection block is obtained, inputting output information of a d-th residual projection block into a (d+1)-th residual projection block to obtain output information of the (d+1)-th residual projection block, and incrementing d by 1 until output information of an N-th residual projection block is obtained, and determining the output information of the N-th residual projection block as the first intermediate feature information, where d is an integer greater than or equal to 1 and less than N.
- The method according to claim 39, wherein the residual projection block comprises an upper projection module, M residual modules, a local feature fusion module, a lower projection module and a second connection layer, the upper projection module, the M residual modules, the local feature fusion module, the lower projection module and the second connection layer are connected in sequence, the second connection layer is further connected to an input of the upper projection module, and outputs of the M residual modules are further respectively connected to the local feature fusion module; and the method further comprises: performing a third filtering process on input information of the residual projection block through the upper projection module to obtain first high-resolution feature information; performing high-resolution feature learning of different levels on the first high-resolution feature information through the M residual modules to obtain M pieces of second high-resolution feature information, where M is an integer greater than or equal to 1; performing a fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain third high-resolution feature information; performing a fourth filtering process on the third high-resolution feature information through the lower projection module to obtain filtered feature information; and performing addition on the input information and the filtered feature information through the second connection layer to obtain output information of the residual projection block.
- The method according to claim 40, wherein the upper projection module comprises a transposed convolutional layer, and performing the third filtering process on the input information of the residual projection block through the upper projection module to obtain the first high-resolution feature information comprises: performing the third filtering process on the input information of the residual projection block through the transposed convolutional layer to obtain the first high-resolution feature information, wherein a resolution of the first high-resolution feature information obtained after the third filtering process is higher than a resolution of the input information of the residual projection block.
- The method according to claim 41, wherein the third filtering process comprises: upsampling.
- The method according to claim 40, wherein the local feature fusion module comprises a feature fusion layer and a third convolutional layer, and performing the fusion operation on the M pieces of second high-resolution feature information through the local feature fusion module to obtain the third high-resolution feature information comprises: performing the fusion operation on the M pieces of second high-resolution feature information through the feature fusion layer to obtain fused feature information; and performing a convolution operation on the fused feature information through the third convolutional layer to obtain the third high-resolution feature information.
- The method according to claim 40, wherein the lower projection module comprises a fourth convolutional layer, and performing the fourth filtering process on the third high-resolution feature information through the lower projection module to obtain the filtered feature information comprises: performing the fourth filtering process on the third high-resolution feature information through the fourth convolutional layer to obtain the filtered feature information, wherein a resolution of the filtered feature information obtained after the fourth filtering process is lower than a resolution of the third high-resolution feature information.
- The method according to claim 44, wherein the fourth filtering process comprises: downsampling.
- The method according to claim 35, wherein the sampling module comprises a sub-pixel convolutional layer, and performing the second filtering process on the second feature information through the sampling module to obtain the third feature information comprises: performing the second filtering process on the second feature information through the sub-pixel convolutional layer to obtain the third feature information, wherein a resolution of the third feature information obtained after the second filtering process is higher than a resolution of the second feature information.
- The method according to claim 46, wherein the second filtering process comprises: upsampling.
- The method according to claim 35, wherein the reconstruction module comprises a fifth convolutional layer, and performing super-resolution reconstruction on the third feature information through the reconstruction module to obtain the processing block comprises: performing a convolution operation on the third feature information through the fifth convolutional layer to obtain the processing block.
- The method according to any one of claims 28 to 48, wherein the method further comprises: determining a training data set, the training data set comprising at least one training image; preprocessing the training data set to obtain a true value area of the preset neural network model and at least one input image group, wherein the input image group comprises at least one input image; and training a neural network model with the at least one input image group based on the true value area to obtain at least one group of candidate model parameters, wherein the true value area is used to determine a loss value of a loss function of the neural network model, and the at least one group of candidate model parameters is obtained when the loss value of the loss function converges to a preset threshold.
- The method according to claim 49, wherein the method further comprises: determining a quantization parameter of the current block; determining, according to the quantization parameter, a model parameter corresponding to the quantization parameter from the at least one group of candidate model parameters; and determining the preset neural network model according to the model parameter, wherein, when the at least one group is multiple groups, the input image groups correspond to different quantization parameters, and there is a correspondence between the multiple groups of candidate model parameters and the different quantization parameters.
- The method according to any one of claims 28 to 50, wherein the method further comprises: parsing the code stream to obtain a model parameter; and determining the preset neural network model according to the model parameter.
- An encoder, the encoder comprising a first determination unit, a first motion compensation unit and an encoding unit, wherein: the first determination unit is configured to determine a first matching block of a current block; the first motion compensation unit is configured to perform motion compensation enhancement on the first matching block to obtain at least one second matching block; the first determination unit is further configured to determine motion information of the current block according to the at least one second matching block; and the encoding unit is configured to encode the current block according to the motion information.
- An encoder, the encoder comprising a first memory and a first processor, wherein: the first memory is configured to store a computer program capable of running on the first processor; and the first processor is configured to execute the method according to any one of claims 1 to 26 when running the computer program.
- A decoder, the decoder comprising a parsing unit, a second determination unit and a second motion compensation unit, wherein: the parsing unit is configured to parse a code stream to determine a value of first syntax element identification information; the parsing unit is further configured to, if the first syntax element identification information indicates that a current block uses a motion compensation enhancement processing method, parse the code stream to determine first motion information of the current block; the second motion compensation unit is configured to determine a first matching block of the current block according to the first motion information, and perform motion compensation enhancement on the first matching block to obtain at least one second matching block; and the second determination unit is configured to determine a first prediction block of the current block according to the first motion information and the at least one second matching block, and determine a reconstructed block of the current block according to the first prediction block.
- A decoder, the decoder comprising a second memory and a second processor, wherein: the second memory is configured to store a computer program capable of running on the second processor; and the second processor is configured to execute the method according to any one of claims 28 to 51 when running the computer program.
- A computer storage medium, wherein the computer storage medium stores a computer program which, when executed by a first processor, implements the method according to any one of claims 1 to 26, or, when executed by a second processor, implements the method according to any one of claims 28 to 51.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21942383.7A EP4351136A1 (en) | 2021-05-28 | 2021-05-28 | Encoding method, decoding method, code stream, encoder, decoder and storage medium |
- PCT/CN2021/096818 WO2022246809A1 (zh) | 2021-05-28 | 2021-05-28 | Encoding and decoding method, code stream, encoder, decoder, and storage medium |
- CN202180097906.8A CN117280685A (zh) | 2021-05-28 | 2021-05-28 | Encoding and decoding method, code stream, encoder, decoder, and storage medium |
US18/520,922 US20240098271A1 (en) | 2021-05-28 | 2023-11-28 | Encoding method, decoding method, bitstream, encoder, decoder and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
- PCT/CN2021/096818 WO2022246809A1 (zh) | 2021-05-28 | 2021-05-28 | Encoding and decoding method, code stream, encoder, decoder, and storage medium |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/520,922 Continuation US20240098271A1 (en) | 2021-05-28 | 2023-11-28 | Encoding method, decoding method, bitstream, encoder, decoder and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022246809A1 true WO2022246809A1 (zh) | 2022-12-01 |
Family
ID=84228340
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/096818 WO2022246809A1 (zh) | 2021-05-28 | 2021-05-28 | 编解码方法、码流、编码器、解码器以及存储介质 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20240098271A1 (zh) |
EP (1) | EP4351136A1 (zh) |
CN (1) | CN117280685A (zh) |
WO (1) | WO2022246809A1 (zh) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080198934A1 (en) * | 2007-02-20 | 2008-08-21 | Edward Hong | Motion refinement engine for use in video encoding in accordance with a plurality of sub-pixel resolutions and methods for use therewith |
US20120063515A1 (en) * | 2010-09-09 | 2012-03-15 | Qualcomm Incorporated | Efficient Coding of Video Parameters for Weighted Motion Compensated Prediction in Video Coding |
- CN109729363A (zh) * | 2017-10-31 | 2019-05-07 | Tencent Technology (Shenzhen) Co., Ltd. | Video image processing method and apparatus |
- CN111010568A (zh) * | 2018-10-06 | 2020-04-14 | Huawei Technologies Co., Ltd. | Interpolation filter training method and apparatus, and video image encoding and decoding method and codec |
- CN111586415A (zh) * | 2020-05-29 | 2020-08-25 | Zhejiang Dahua Technology Co., Ltd. | Video encoding method and apparatus, encoder, and storage device |
-
2021
- 2021-05-28 WO PCT/CN2021/096818 patent/WO2022246809A1/zh active Application Filing
- 2021-05-28 EP EP21942383.7A patent/EP4351136A1/en active Pending
- 2021-05-28 CN CN202180097906.8A patent/CN117280685A/zh active Pending
-
2023
- 2023-11-28 US US18/520,922 patent/US20240098271A1/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080198934A1 (en) * | 2007-02-20 | 2008-08-21 | Edward Hong | Motion refinement engine for use in video encoding in accordance with a plurality of sub-pixel resolutions and methods for use therewith |
US20120063515A1 (en) * | 2010-09-09 | 2012-03-15 | Qualcomm Incorporated | Efficient Coding of Video Parameters for Weighted Motion Compensated Prediction in Video Coding |
- CN109729363A (zh) * | 2017-10-31 | 2019-05-07 | Tencent Technology (Shenzhen) Co., Ltd. | Video image processing method and apparatus |
- CN111010568A (zh) * | 2018-10-06 | 2020-04-14 | Huawei Technologies Co., Ltd. | Interpolation filter training method and apparatus, and video image encoding and decoding method and codec |
- CN111586415A (zh) * | 2020-05-29 | 2020-08-25 | Zhejiang Dahua Technology Co., Ltd. | Video encoding method and apparatus, encoder, and storage device |
Non-Patent Citations (1)
Title |
---|
D. BULL, F. ZHANG, M. AFONSO (UNIV. OF BRISTOL): "Description of SDR video coding technology proposal by University of Bristol", 10. JVET MEETING; 20180410 - 20180420; SAN DIEGO; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), 12 April 2018 (2018-04-12), pages 1 - 35, XP030248160 * |
Also Published As
Publication number | Publication date |
---|---|
EP4351136A1 (en) | 2024-04-10 |
US20240098271A1 (en) | 2024-03-21 |
CN117280685A (zh) | 2023-12-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
- JP7114153B2 (ja) | Video encoding and decoding method, apparatus, computer device, and computer program | |
- WO2021203394A1 (zh) | Loop filtering method and apparatus | |
CN115606179A (zh) | 用于使用学习的下采样特征进行图像和视频编码的基于学习的下采样的cnn滤波器 | |
- WO2023000179A1 (zh) | Video super-resolution network, and video super-resolution and encoding/decoding processing method and apparatus | |
- CN108848377B (zh) | Video encoding and decoding method and apparatus, computer device, and storage medium | |
- WO2023130333A1 (zh) | Encoding and decoding method, encoder, decoder, and storage medium | |
US20230262212A1 (en) | Picture prediction method, encoder, decoder, and computer storage medium | |
- CN111800629A (zh) | Video decoding method, encoding method, video decoder, and encoder | |
- CN112534817A (zh) | Prediction method and apparatus for video image component, and computer storage medium | |
- WO2021120122A1 (zh) | Image component prediction method, encoder, decoder, and storage medium | |
CN115552905A (zh) | 用于图像和视频编码的基于全局跳过连接的cnn滤波器 | |
- WO2023142926A1 (zh) | Image processing method and apparatus | |
- CN116582685A (zh) | AI-based hierarchical residual coding method, apparatus, device, and storage medium | |
- JP2023537823A (ja) | Video processing method, apparatus, device, decoder, system, and storage medium | |
- CN112601095B (zh) | Method and system for creating a video luma and chroma fractional interpolation model | |
- CN113784128A (zh) | Image prediction method, encoder, decoder, and storage medium | |
- CN112866697B (zh) | Video image encoding and decoding method and apparatus, electronic device, and storage medium | |
- CN113822824A (zh) | Video deblurring method, apparatus, device, and storage medium | |
- RU2683614C2 (ru) | Encoder, decoder, and operating method using interpolation | |
- CN117441186A (zh) | Image decoding and processing method, apparatus, and device | |
US20230262251A1 (en) | Picture prediction method, encoder, decoder and computer storage medium | |
- WO2022246809A1 (zh) | Encoding and decoding method, code stream, encoder, decoder, and storage medium | |
- CN113261285A (zh) | Encoding method, decoding method, encoder, decoder, and storage medium | |
- CN110710204A (zh) | Method and device for encoding and decoding a data stream representing at least one image | |
- CN112313950A (zh) | Prediction method and apparatus for video image component, and computer storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21942383 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202180097906.8 Country of ref document: CN |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2021942383 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2021942383 Country of ref document: EP Effective date: 20240102 |